Summary of "Handling Mixed Variables | Feature Engineering"

Handling mixed variables (feature engineering)

Problem: a “mixed” column in tabular data is one where a single field combines categorical (text) and numeric information, or where some rows hold numbers and others hold category labels. Left as-is, such a column is hard to engineer features from and tends to become a high-cardinality, noisy feature.

What the video covers (high level)

Two common patterns and recommended handling

1) Mixed within a single cell (concatenated category + number)

Example: a cabin value like C85 or a ticket value like A/5 21171.

Recommended steps:

  1. Split the cell into two new columns:
    • a categorical column for the alphabetic prefix (e.g., ticket_prefix, cabin_prefix)
    • a numeric column for the extracted number (e.g., ticket_number, cabin_number)
  2. Use regular expressions to capture the alphabetic prefix and the numeric part.
  3. Benefits of the split:
    • reduces cardinality, since the prefix column typically has far fewer distinct values than the raw strings
    • converts the numeric portion into a usable numeric feature for downstream models
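
The steps above can be sketched with pandas as follows. This is a minimal illustration, not the video's own code: the DataFrame, its sample rows, and both regular expressions are assumptions chosen to fit values shaped like C85 and A/5 21171, and real data may need different patterns.

```python
import pandas as pd

# Hypothetical sample data shaped like the examples above (not from the video).
df = pd.DataFrame({
    "cabin": ["C85", "E46", None, "B28"],
    "ticket": ["A/5 21171", "PC 17599", "113803", "STON/O2. 3101282"],
})

# Cabin: leading letters form the prefix, the digits that follow form the number.
cabin_parts = df["cabin"].str.extract(r"^([A-Za-z]+)(\d+)?")
df["cabin_prefix"] = cabin_parts[0]
df["cabin_number"] = pd.to_numeric(cabin_parts[1], errors="coerce")

# Ticket: an optional prefix, then whitespace, then a trailing run of digits.
ticket_parts = df["ticket"].str.extract(r"^(?:(.*\S)\s+)?(\d+)$")
df["ticket_prefix"] = ticket_parts[0]
df["ticket_number"] = pd.to_numeric(ticket_parts[1], errors="coerce")

print(df[["cabin_prefix", "cabin_number", "ticket_prefix", "ticket_number"]])
```

Rows that do not match a pattern (for example a missing cabin, or a ticket with no prefix) end up as missing values in the corresponding new column, so they still need an imputation or "missing" category decision before modelling.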

2) Mixed by row type (same column contains either numeric rows or categorical rows)
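
To make this pattern concrete, the sketch below builds a small column of this kind and separates it into a numeric column and a categorical column. This is one common way to handle such a column, shown as an assumption rather than as the video's stated method; the column name `value` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical column: some rows are numbers stored as text, others are labels.
df = pd.DataFrame({"value": ["3", "7", "A", "12", "B", None]})

# Rows that cannot be parsed as numbers become NaN instead of raising an error.
as_number = pd.to_numeric(df["value"], errors="coerce")

# Numeric rows go to one column; the remaining rows keep their label in another.
df["value_numeric"] = as_number
df["value_categorical"] = df["value"].where(as_number.isna())

print(df)
```

Each row then carries a value in at most one of the two new columns, and the gaps in the other column are explicit missing values that can be imputed or encoded separately.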

Recommended approach:

Implementation notes & practical tips

Tutorials / guides referenced

What’s next

Main speaker / source

Category

Technology

