Summary of "Feature Scaling - Normalization | MinMaxScaling | MaxAbsScaling | RobustScaling"
Feature scaling / normalization — overview
Feature scaling changes numeric feature values to a comparable range while preserving relative differences. Proper scaling helps algorithms that depend on magnitudes or distances (gradient-based methods, many linear models, neural networks, KNN, SVM, PCA) perform better. Tree-based models usually do not require scaling.
Main ideas and practical workflow
- Decide whether scaling is necessary for your problem and algorithm. Algorithms that commonly benefit include distance-based and gradient-based methods, many linear models, neural networks, KNN, SVM, and PCA.
- Typical workflow:
- Split data into training and test (and validation) sets.
- Fit the scaler on the training set only.
- Transform the training set and then transform the test/validation sets with the fitted scaler.
- Use `inverse_transform` when you need to convert scaled values back to the original scale.
- Note: scikit-learn transformers convert DataFrames to NumPy arrays, so keep track of column names if you need to restore a DataFrame. A sketch of this workflow follows this list.
- Experimentation is important: no single scaler is best for every dataset. Try several and compare model performance.
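A minimal sketch of the workflow above, assuming scikit-learn and pandas are installed; the data and column names are illustrative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Illustrative data; any numeric DataFrame works the same way.
df = pd.DataFrame({"alcohol": [12.1, 13.5, 14.2, 12.8, 13.0, 14.8],
                   "malic_acid": [1.5, 2.2, 1.9, 3.1, 2.5, 1.2]})

# 1) Split first, so the test set never influences the scaler.
train, test = train_test_split(df, test_size=0.33, random_state=42)

# 2) Fit the scaler on the training set only.
scaler = MinMaxScaler()
scaler.fit(train)

# 3) Transform both splits with the fitted scaler (returns NumPy arrays).
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

# Restore a DataFrame if column names matter downstream.
train_scaled = pd.DataFrame(train_scaled, columns=df.columns, index=train.index)

# 4) inverse_transform maps scaled values back to the original units.
train_original = scaler.inverse_transform(train_scaled)
```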
Scaling methods — formulas, intuition, pros/cons
Min-Max Scaling (MinMaxScaler)
- Formula: `x' = (x - min) / (max - min)`
- Result: values mapped to `[0, 1]` (or another specified range)
- Intuition: compresses each feature into a unit box (a unit interval per feature)
- Pros:
- Preserves the shape of the original distribution (mostly).
- Useful when you know true feature bounds (e.g., image pixels 0–255) or when a model requires bounded inputs.
- Cons:
- Sensitive to outliers — extreme values will compress the rest of the data.
- Use when: you know the true min/max bounds or model requires bounded input ranges.
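For illustration, the formula computed by hand next to the scikit-learn scaler; the input values are made up:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[1.0], [2.0], [5.0], [10.0]])  # illustrative values

# Manual min-max: x' = (x - min) / (max - min)
x_manual = (x - x.min()) / (x.max() - x.min())

# scikit-learn equivalent; feature_range selects a different bounded range if needed.
x_sklearn = MinMaxScaler(feature_range=(0, 1)).fit_transform(x)

assert np.allclose(x_manual, x_sklearn)
print(x_sklearn.ravel())  # [0.         0.11111111 0.44444444 1.        ]
```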
Standardization / Z-score (StandardScaler) and Mean Normalization
- Standardization (common form)
- Formula: `x' = (x - mean) / std`
- Result: zero mean and unit variance
- Use when: algorithms expect centered data (many linear models, PCA); often performs well in practice.
- Mean normalization (less common)
- Formula (sometimes used): `x' = (x - mean) / (max - min)`
- Result: centers data around zero; range roughly between `-1` and `+1`, depending on the distribution.
- Note: nomenclature varies across tutorials; StandardScaler (z-score) is the most common “centered” approach.
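A small sketch contrasting the two variants; scikit-learn has no dedicated mean-normalization class, so that variant is computed by hand (values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[2.0], [4.0], [6.0], [8.0]])  # illustrative values

# Z-score standardization: x' = (x - mean) / std
x_std = StandardScaler().fit_transform(x)
print(x_std.mean(), x_std.std())  # ~0.0 and 1.0

# Mean normalization: x' = (x - mean) / (max - min), computed with NumPy.
x_mean_norm = (x - x.mean()) / (x.max() - x.min())
print(x_mean_norm.ravel())  # centered on zero, within roughly [-1, +1]
```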
MaxAbs Scaling (MaxAbsScaler)
- Formula: `x' = x / max(|x|)`
- Result: scales data to the range `[-1, 1]` without centering (zeros remain exactly zero)
- Pros:
- Preserves sparsity — useful for sparse data with many zeros.
- Use when: data is sparse and you do not want to shift the mean.
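A short sketch on a sparse matrix, assuming SciPy is available; the values are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

# Sparse, mostly-zero data (illustrative values).
X = csr_matrix(np.array([[0.0, -4.0],
                         [0.0,  2.0],
                         [3.0,  0.0]]))

# MaxAbsScaler divides each column by its maximum absolute value,
# so zeros stay zero and the sparsity pattern is preserved.
X_scaled = MaxAbsScaler().fit_transform(X)
print(X_scaled.toarray())
# [[ 0.  -1. ]
#  [ 0.   0.5]
#  [ 1.   0. ]]
```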
Robust Scaling (RobustScaler)
- Formula: `x' = (x - median) / IQR`, where `IQR = Q3 - Q1` (75th percentile − 25th percentile)
- Result: centers on the median and scales according to the IQR
- Pros:
- Much less sensitive to outliers than Min-Max or standard scaling.
- Use when: data contains significant outliers.
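A minimal sketch of why the median/IQR choice matters, comparing RobustScaler against StandardScaler on data with one extreme outlier (values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Illustrative data with one extreme outlier (1000.0).
x = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

x_robust = RobustScaler().fit_transform(x)      # uses median and IQR
x_standard = StandardScaler().fit_transform(x)  # uses mean and std

# The outlier inflates the mean and std, squashing the inliers toward a
# single point; the median and IQR keep the inliers sensibly spread.
print(x_robust[:4].ravel())    # [-1.  -0.5  0.   0.5]
print(x_standard[:4].ravel())  # all four values ~ -0.50
```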
Practical example (wine dataset demonstration)
Dataset: wine data (example features: alcohol and malic acid)
Steps in the demo (a code sketch follows the list):
- Inspect feature distributions (histograms / distribution plots).
- Split into train and test sets.
- Fit MinMaxScaler on the training set, transform both training and test sets.
- Convert results back to a DataFrame to inspect ranges and create plots.
- Visualize scatter plots before and after scaling.
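The demo's exact code isn't reproduced in this summary; the following sketch approximates its steps using scikit-learn's built-in wine dataset:

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load the wine data and keep the two features used in the demo.
wine = load_wine(as_frame=True)
X = wine.data[["alcohol", "malic_acid"]]

X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

# Fit on the training set only; transform both splits with the same scaler.
scaler = MinMaxScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train),
                              columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test),
                             columns=X_test.columns, index=X_test.index)

# Inspect ranges: training columns span exactly [0, 1]; test values can
# fall slightly outside because the scaler never saw the test extremes.
print(X_train_scaled.describe().loc[["min", "max"]])
print(X_test_scaled.describe().loc[["min", "max"]])
```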
Observations:
- After Min-Max scaling, both the alcohol and malic-acid columns were mapped to `[0, 1]`; the scatter plot becomes compressed into the unit rectangle.
- Distribution shapes are largely preserved by Min-Max scaling, though small distortions are possible depending on the original distribution.
- Min-Max guarantees min → 0 and max → 1 (with the default range), but because each feature is scaled by a different factor, multivariate distances and the relative geometry between points can change.
- RobustScaler is useful when outliers are present; MaxAbsScaler is useful for preserving sparsity.
Rules of thumb for choosing a scaler
- MinMaxScaler: when feature values are naturally bounded and you know min/max (e.g., image pixels 0–255).
- RobustScaler: when data contains outliers.
- MaxAbsScaler: when data is sparse with many zeros and you want to preserve sparsity.
- StandardScaler (z-score): when you need centered data (zero mean) and unit variance.
- Unsure: try multiple scalers and compare model performance (see the comparison sketch after this list).
- Always fit the scaler on training data only, then transform validation/test data.
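A minimal comparison loop in that spirit; the classifier and metric are illustrative choices, not the presenter's:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler,
                                   RobustScaler, StandardScaler)

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A pipeline guarantees the scaler is fit on training data only.
for scaler in [MinMaxScaler(), StandardScaler(), MaxAbsScaler(), RobustScaler()]:
    model = make_pipeline(scaler, LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(f"{type(scaler).__name__}: {model.score(X_test, y_test):.3f}")
```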
Implementation notes and gotchas
- Fit on train only; transform both train and test (and validation).
- scikit-learn scalers return NumPy arrays; convert back to a pandas DataFrame if you want column names preserved (see the sketch after this list).
- Use `inverse_transform` to map scaled data back to the original scale for interpretation or plotting.
- Be aware that scaling changes distances and relations between features; interpret plots in scaled space with caution.
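The manual DataFrame rebuild was shown earlier; on scikit-learn 1.2 or newer, the `set_output` API avoids the round-trip through NumPy entirely (a small sketch, assuming that version is available):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative two-column frame.
df = pd.DataFrame({"alcohol": [13.2, 12.8, 14.1], "malic_acid": [1.8, 2.4, 1.1]})

# Requires scikit-learn >= 1.2: transform/fit_transform return DataFrames
# with the original column names and index.
scaler = StandardScaler().set_output(transform="pandas")
scaled = scaler.fit_transform(df)
print(scaled.columns.tolist())  # ['alcohol', 'malic_acid']

# inverse_transform maps scaled values back to the original units.
restored = scaler.inverse_transform(scaled)
```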
Sources and references
- scikit-learn transformer classes: MinMaxScaler, MaxAbsScaler, RobustScaler, StandardScaler.
- Wine dataset (used in the demo; likely the UCI Wine dataset).
- Demonstration presenter / YouTube channel (unnamed) and general machine learning references.