Summary of "Encoding Categorical Data | Ordinal Encoding | Label Encoding"

Encoding Categorical Data (Ordinal Encoding, Label Encoding)

Main ideas and concepts

Important practical rules

Always split data into train and test before fitting encoders to avoid data leakage. Fit encoders on the training set, then transform both train and test with the fitted encoder.

Step-by-step methodology / instructions

  1. Identify categorical columns and their types
    • Determine which features are nominal vs ordinal using domain knowledge.
    • Example dataset columns:
      • gender (nominal)
      • reviews (ordinal: poor / average / good)
      • education (ordinal: school / undergraduate / postgraduate)
      • purchased (nominal target: yes / no)
  2. Split data into train and test
    • Use train_test_split (or equivalent).
    • Fit every encoder on the training set only, so that no information from the test set leaks into preprocessing.
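A minimal sketch of the split step, assuming a small made-up DataFrame that reuses the column names from the example dataset. The row values, the 80/20 split, and random_state=0 are illustrative assumptions, not taken from the video.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; only the column names come from the example above.
df = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "female",
               "male", "female", "male", "female", "male"],
    "reviews": ["poor", "good", "average", "good", "poor",
                "average", "good", "poor", "average", "good"],
    "education": ["school", "postgraduate", "undergraduate", "school", "undergraduate",
                  "postgraduate", "school", "undergraduate", "postgraduate", "school"],
    "purchased": ["no", "yes", "no", "yes", "no",
                  "yes", "yes", "no", "no", "yes"],
})

X = df.drop(columns="purchased")  # input features
y = df["purchased"]               # target

# Split first; encoders are fitted later, on the training part only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```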
  3. Prepare OrdinalEncoder for ordinal features
    • Import OrdinalEncoder from sklearn.preprocessing (or implement a custom mapping).
    • When constructing the encoder, pass a categories parameter that explicitly specifies the ordered categories for each ordinal column (a list of lists, one inner list per column, in the same order as the columns passed to the encoder), for example:
      • reviews: ['poor', 'average', 'good']
      • education: ['school', 'undergraduate', 'postgraduate']
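    • Fit the encoder on the ordinal columns of X_train, then transform the same columns of both X_train and X_test.
    • Result: each category becomes a numeric code that reflects the given order (e.g., poor → 0, average → 1, good → 2). OrdinalEncoder returns these codes as floats by default.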
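The ordinal step can be sketched as follows. The tiny X_train and X_test frames are hypothetical stand-ins for an already split dataset; the category orders are the ones listed above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical, already-split data containing only the ordinal columns.
X_train = pd.DataFrame({
    "reviews": ["poor", "good", "average", "good"],
    "education": ["school", "postgraduate", "undergraduate", "school"],
})
X_test = pd.DataFrame({
    "reviews": ["average", "poor"],
    "education": ["postgraduate", "undergraduate"],
})

ordinal_cols = ["reviews", "education"]

# One inner list per column, in the same order as ordinal_cols.
encoder = OrdinalEncoder(categories=[
    ["poor", "average", "good"],
    ["school", "undergraduate", "postgraduate"],
])

# Fit on the training data only, then reuse the fitted encoder for the test data.
X_train_ord = encoder.fit_transform(X_train[ordinal_cols])
X_test_ord = encoder.transform(X_test[ordinal_cols])
```

Because the order is given explicitly, "good" maps to 2 in the reviews column regardless of how often each value occurs in the training data.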
  4. Handle nominal features
    • Use OneHotEncoder (or another appropriate encoder) for nominal features to avoid introducing artificial order.
    • Apply OneHotEncoder via ColumnTransformer or a pipeline for selected nominal columns.
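A minimal sketch of one-hot encoding the nominal gender column on its own. The data is made up, and handle_unknown="ignore" is an assumption added here so that categories unseen during fitting do not raise an error at transform time; the video summary does not mention it.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical, already-split data containing only the nominal column.
X_train = pd.DataFrame({"gender": ["male", "female", "female", "male"]})
X_test = pd.DataFrame({"gender": ["female", "male"]})

# Assumption: ignore unseen categories instead of raising an error.
ohe = OneHotEncoder(handle_unknown="ignore")

# The encoder returns a sparse matrix by default; convert it for inspection.
gender_train = ohe.fit_transform(X_train[["gender"]]).toarray()
gender_test = ohe.transform(X_test[["gender"]]).toarray()

# Learned categories, in the order of the output columns.
print(ohe.categories_)
```

Each category gets its own 0/1 column, so no ordering between "female" and "male" is implied.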
  5. Encode the target variable (if needed)
    • Use LabelEncoder only for target labels (y) in classification tasks.
    • Fit LabelEncoder on y_train and transform y_train and y_test.
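    • Do not use LabelEncoder for input features: it assigns codes by sorted label order (alphabetical for strings), which rarely matches a meaningful ranking. Use it on features only if you intentionally want that mapping and understand the implications.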
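Encoding the target can be sketched like this, assuming hypothetical y_train and y_test lists of yes/no labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target labels after the train/test split.
y_train = ["yes", "no", "no", "yes"]
y_test = ["no", "yes"]

label_encoder = LabelEncoder()

# Fit on the training labels only, then reuse the mapping for the test labels.
y_train_enc = label_encoder.fit_transform(y_train)
y_test_enc = label_encoder.transform(y_test)

# classes_ holds the sorted labels; each label's position is its integer code.
print(label_encoder.classes_)
```

The codes follow sorted label order, so "no" becomes 0 and "yes" becomes 1 here. The original labels can be recovered with label_encoder.inverse_transform.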
  6. Combine transformations cleanly
    • Use ColumnTransformer to assign different transformers (OrdinalEncoder, OneHotEncoder, etc.) to different subsets of columns.
    • Use Pipeline to chain preprocessing and modeling steps and ensure reproducible transforms.
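The steps above can be combined in one place. The sketch below is one possible arrangement, not the video's exact code: the data is made up, LogisticRegression is only a placeholder model, and handle_unknown="ignore" is an added assumption.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical, already-split data using the example column names.
X_train = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "female", "male"],
    "reviews": ["poor", "good", "average", "good", "poor", "average"],
    "education": ["school", "postgraduate", "undergraduate",
                  "school", "undergraduate", "postgraduate"],
})
y_train = ["no", "yes", "no", "yes", "no", "yes"]
X_test = pd.DataFrame({
    "gender": ["female", "male"],
    "reviews": ["good", "poor"],
    "education": ["postgraduate", "school"],
})

# Route each group of columns to the encoder that suits it.
preprocess = ColumnTransformer(transformers=[
    ("ordinal",
     OrdinalEncoder(categories=[
         ["poor", "average", "good"],
         ["school", "undergraduate", "postgraduate"],
     ]),
     ["reviews", "education"]),
    ("nominal", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
])

# Chain preprocessing and a placeholder classifier.
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])

# fit() learns the encoders from the training data only;
# predict() applies the already-fitted encoders to the test data.
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Wrapping the encoders in a Pipeline means the "fit on train, transform on test" rule is applied automatically, including inside cross-validation.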

Examples & warnings highlighted

Tools / classes mentioned

Speakers / sources featured

Category: Educational

