Summary of "Power Transformer | Box - Cox Transform | Yeo - Johnson Transform"

Overview — main ideas and lessons

The video introduces the “power transformer” family of data transformations (Box–Cox and Yeo–Johnson) used to make feature distributions more Gaussian-like and to improve performance of algorithms that assume or benefit from near-normal inputs (for example, linear and logistic regression).

Key points:

  • If features are skewed and your model benefits from normal-ish inputs, try Box–Cox or Yeo–Johnson and validate which gives the best downstream results.


Step-by-step methodology

  1. Inspect your dataset

    • Plot histograms or density plots for each feature to identify skewness/non-normality.
    • Check for zeros or negative values (important for Box–Cox).
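This inspection step can be sketched in code; the right-skewed feature below is synthetic and purely for illustration:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature; in practice this comes from your dataset
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

feature_skew = skew(x)   # well above 0 -> strong right skew
feature_min = x.min()    # strictly positive here, so Box-Cox is an option
```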
  2. Choose transformation strategy

    • If all values in a feature are strictly positive → Box–Cox is possible.
    • If features contain zeros or negatives → use Yeo–Johnson, or shift the data by a small positive constant before Box–Cox.
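The decision rule above can be written as a small helper (choose_power_method is an illustrative name, not part of scikit-learn):

```python
import numpy as np

def choose_power_method(x):
    """Use 'box-cox' only when every value is strictly positive;
    otherwise fall back to 'yeo-johnson'."""
    return 'box-cox' if np.min(x) > 0 else 'yeo-johnson'

method_pos = choose_power_method([0.5, 1.2, 3.0])    # strictly positive
method_neg = choose_power_method([-1.0, 0.0, 2.0])   # contains zero/negative
```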
  3. Prepare scikit-learn transformer

    • Import and create the transformer objects:

```python
from sklearn.preprocessing import PowerTransformer

pt_box = PowerTransformer(method='box-cox', standardize=True)
pt_yj = PowerTransformer(method='yeo-johnson', standardize=True)
```

    • If using Box–Cox and a feature has zeros/min=0, add a small epsilon (e.g., 1e-6 or a domain-appropriate constant) to that feature before fitting.
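A minimal sketch of the epsilon shift on toy data (the shift value is an assumption; pick one appropriate to your domain):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[0.0], [1.0], [4.0], [9.0]])  # minimum is 0, so plain Box-Cox fails
eps = 1e-6  # record this shift so new data can be preprocessed the same way
pt = PowerTransformer(method='box-cox', standardize=True)
X_shifted_t = pt.fit_transform(X + eps)
```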

  4. Fit and transform

    • Fit the transformer on the training features only:

```python
pt.fit(X_train)
```

    • Transform the training and test/validation sets with the fitted transformer:

```python
X_train_t = pt.transform(X_train)
X_test_t = pt.transform(X_test)
```

    • PowerTransformer estimates a separate λ for each feature internally; the fitted values are available via the lambdas_ attribute after fitting.
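For instance, after fitting on synthetic skewed data, the per-feature lambdas can be read back (the data here is invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_train = rng.lognormal(size=(200, 3))  # three synthetic, strictly positive features

pt = PowerTransformer(method='box-cox')
pt.fit(X_train)
lambdas = pt.lambdas_  # one estimated lambda per feature
```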

  5. Train and evaluate downstream model

    • Fit a model (e.g., LinearRegression) on transformed training data.
    • Evaluate via cross-validation or a hold-out test set (apply the same transform pipeline before scoring).
    • Compare metrics (R2, RMSE, etc.) against a model trained on untransformed features.
  6. Compare transformations

    • Try both Box–Cox and Yeo–Johnson (and possibly other transforms like log, sqrt).
    • Compare model metrics and inspect post-transform feature distributions.
    • Choose the transform that yields the best validation performance.
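Steps 5 and 6 together can be sketched as a cross-validated comparison on synthetic data (the dataset and target are invented; with log-normal features and a log-linear target, a power transform should help the linear model noticeably):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(300, 2))  # strictly positive, so both methods apply
y = 2 * np.log(X[:, 0]) + np.log(X[:, 1]) + rng.normal(scale=0.1, size=300)

scores = {}
for method in ['box-cox', 'yeo-johnson']:
    pipe = make_pipeline(PowerTransformer(method=method), LinearRegression())
    scores[method] = cross_val_score(pipe, X, y, cv=5, scoring='r2').mean()

# Untransformed baseline for comparison
baseline = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2').mean()
```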
  7. Additional practical tips

    • Use Pipelines to ensure the same transform is applied to train and test splits.
    • When using Box–Cox, store/record any small shift applied so you can invert or apply the same preprocessing to new data.
    • PowerTransformer's standardize=True (the default) already applies zero-mean, unit-variance scaling to the output, so an additional StandardScaler after the transform is usually unnecessary.
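The Pipeline tip can be sketched like this (data is synthetic; the pipeline fits the transformer on the training split only and reuses the fitted lambdas on later data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_train = rng.lognormal(size=(100, 2))   # synthetic training features
y_train = X_train.sum(axis=1)            # synthetic target
X_test = rng.lognormal(size=(20, 2))

pipe = Pipeline([
    ('power', PowerTransformer(method='yeo-johnson', standardize=True)),
    ('model', LinearRegression()),
])
pipe.fit(X_train, y_train)     # transformer lambdas learned from X_train only
preds = pipe.predict(X_test)   # the same fitted transform is applied to X_test
```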
