Feature Engineering: Polynomials, Interactions, Binning, and Domain Features

Prerequisites:

Encoding Categorical Variables: Label, One-Hot, Target, and Ordinal Encoding
Feature Scaling: Standardization, Normalization, and Robust Scaling

Definition

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to a predictive model, with the goal of improving model accuracy. It covers creating new features through mathematical transformations (polynomials, logarithms), combining existing features (interactions, ratios), discretizing continuous variables (binning), and encoding domain knowledge as meaningful representations. Effective feature engineering can often provide greater performance gains than algorithm tuning or collecting more data. Doing it well requires understanding the data-generating process, the relationships between variables, and the inductive biases of the chosen algorithm.

Polynomial features capture non-linear relationships by adding higher-order terms and cross-terms. Interaction features model combined effects that individual features cannot express alone. Binning converts a continuous variable into categorical buckets, which can help with threshold-like non-linearities and reduce the influence of outliers. Domain-specific features use expert knowledge to build predictive variables that the raw columns do not expose directly. Feature engineering is both an art and a science: it takes creativity, domain expertise, and rigorous validation.

Intuition

Imagine you're trying to predict house prices. Raw data gives you square footage and number of bedrooms. But the price per square foot, the interaction between location and size, or whether the house is 'oversized' for its neighborhood might matter more. Feature engineering is like being a detective looking for hidden clues—a house's age matters, but its 'era' (Victorian, mid-century, modern) might capture style premiums better than raw year. Polynomial features are like asking 'what if the relationship curves instead of being a straight line?' Interactions ask 'does being large matter more in expensive neighborhoods?' Binning groups similar values—ages 20-30 might have similar buying patterns even though 20 and 30 are different numbers. Domain features come from asking an expert realtor what actually sells houses.

Mathematical Formula

\text{Polynomial Features (degree 2):} \quad \phi(x) = [1, x_1, x_2, x_1^2, x_2^2, x_1x_2]

\text{Interaction Term:} \quad x_{ij} = x_i \times x_j

\text{Log Transform:} \quad x' = \log(x + 1)

\text{Binning:} \quad b_i = \lfloor \frac{x - x_{min}}{w} \rfloor, \quad w = \frac{x_{max} - x_{min}}{n_{bins}}

\text{Ratio Feature:} \quad r = \frac{x_1}{x_2 + \epsilon}

Step-by-Step Explanation:

Polynomial: Generate all combinations of features up to specified degree including cross-terms
Interaction: Multiply two or more features to model combined effects
Log Transform: Apply logarithm to compress large values and handle skewed distributions
Binning: Divide continuous range into equal-width or equal-frequency buckets
Ratio: Divide one feature by another to create relative measures (add epsilon to avoid division by zero)
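
The formulas above can be checked by hand on a single sample. The following is a minimal pure-Python sketch, not a reference implementation: the function names are hypothetical, and the clamp that keeps x = x_max inside the last bin is an added assumption, since the floor formula as written would assign that value the index n_bins.

```python
import math

def poly_degree2(x1, x2):
    # phi(x) = [1, x1, x2, x1^2, x2^2, x1*x2]
    return [1, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

def log_transform(x):
    # log(x + 1); math.log1p is the numerically stable form
    return math.log1p(x)

def equal_width_bin(x, x_min, x_max, n_bins):
    w = (x_max - x_min) / n_bins
    b = math.floor((x - x_min) / w)
    # Assumption: clamp so that x == x_max falls in the last bin
    return min(b, n_bins - 1)

def ratio(x1, x2, eps=1e-8):
    # eps guards against a zero denominator (assumes x2 >= 0)
    return x1 / (x2 + eps)

print(poly_degree2(2, 3))              # [1, 2, 3, 4, 9, 6]
print(equal_width_bin(37, 25, 50, 4))  # 1
print(equal_width_bin(50, 25, 50, 4))  # 3
```

With ages ranging from 25 to 50 and four bins, each bin is 6.25 wide, so age 37 lands in bin 1 and the maximum age 50 is clamped into bin 3.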

Real-World Use Cases

Healthcare

BMI (weight/height²) is an engineered feature that is often more predictive than raw weight and height. Age bins (child, adult, elderly) capture different risk profiles. Polynomial terms model non-linear drug response curves. Interaction between smoking and age for cancer risk.
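
As a minimal sketch of these healthcare features, the snippet below builds BMI, age groups, and a smoking-by-age interaction with pandas. The column names and the age cutoffs (18 and 65) are illustrative assumptions, not clinical definitions.

```python
import pandas as pd

# Hypothetical patient records
patients = pd.DataFrame({
    "weight_kg": [70.0, 90.0, 20.0],
    "height_m": [1.75, 1.80, 1.10],
    "age": [34, 71, 6],
    "smoker": [0, 1, 0],
})

# Ratio-style domain feature: BMI = weight / height^2
patients["bmi"] = patients["weight_kg"] / patients["height_m"] ** 2

# Binning: left-closed intervals [0, 18), [18, 65), [65, 120)
patients["age_group"] = pd.cut(
    patients["age"],
    bins=[0, 18, 65, 120],
    right=False,
    labels=["child", "adult", "elderly"],
)

# Interaction: age only contributes for smokers
patients["smoker_x_age"] = patients["smoker"] * patients["age"]

print(patients)
```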

Finance

Debt-to-income ratio is often more predictive than raw debt. Moving averages and volatility (standard deviation) from price history. Binning credit scores into rating tiers. Polynomial features for non-linear interest rate impacts. Time-based features from transaction timestamps.
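
A small sketch of the ratio and rolling features mentioned here, using made-up numbers. The one-step `shift(1)` is an added assumption: it makes each row use only prices observed before that row, which is one common way to avoid look-ahead leakage in time series.

```python
import pandas as pd

# Hypothetical loan applicants
loans = pd.DataFrame({"debt": [18_000.0, 5_000.0], "income": [60_000.0, 50_000.0]})
loans["debt_to_income"] = loans["debt"] / loans["income"]

# Hypothetical daily closing prices
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0], name="close")
returns = prices / prices.shift(1) - 1  # simple daily returns

features = pd.DataFrame({
    "close": prices,
    # 3-day moving average, shifted so row t only sees days t-3..t-1
    "ma_3": prices.rolling(window=3).mean().shift(1),
    # 3-day volatility of returns, shifted the same way
    "volatility_3": returns.rolling(window=3).std().shift(1),
})

print(loans)
print(features)
```

The leading rows are NaN because the windows are not yet full; how to handle them (drop, fill, or use `min_periods`) depends on the model.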

Retail

Price per unit, discount percentage. Customer recency, frequency, monetary (RFM) features. Seasonality indicators from purchase dates. Interaction between category and season for demand prediction.
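
Recency, frequency, and monetary (RFM) features can be derived from a transaction log with a single group-by. This is a minimal sketch on invented data; the column names and the snapshot date are hypothetical.

```python
import pandas as pd

# Hypothetical transaction log
tx = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b", "c"],
    "date": pd.to_datetime([
        "2024-01-05", "2024-03-01",
        "2024-02-10", "2024-02-20", "2024-03-10",
        "2023-12-15",
    ]),
    "amount": [50.0, 30.0, 20.0, 25.0, 40.0, 200.0],
})

snapshot = pd.Timestamp("2024-03-31")  # assumed reference date
grouped = tx.groupby("customer_id")

rfm = pd.DataFrame({
    # Days since the customer's most recent purchase
    "recency_days": (snapshot - grouped["date"].max()).dt.days,
    # Number of purchases
    "frequency": grouped["amount"].count(),
    # Total spend
    "monetary": grouped["amount"].sum(),
})

print(rfm)
```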

Manufacturing

Efficiency ratios (output/input). Temperature-pressure interactions for process optimization. Binned vibration levels for maintenance alerts. Rate of change features from sensor time series.

Tech

User engagement ratios (clicks/views, time/pages). TF-IDF from text. Session features (duration, depth, bounce). Interaction between device type and feature usage for churn prediction.

Implementation

Manual Implementation (No Libraries)

import numpy as np
import pandas as pd

# Create sample data
np.random.seed(42)
data = {
    'length': [10, 15, 8, 20, 12, 18, 9, 22, 14, 11],
    'width': [5, 8, 4, 10, 6, 9, 5, 11, 7, 6],
    'height': [3, 4, 2, 5, 3, 5, 3, 6, 4, 3],
    'price': [100, 200, 80, 300, 150, 250, 90, 350, 180, 120],
    'age': [25, 35, 28, 45, 32, 40, 27, 50, 38, 30]
}
df = pd.DataFrame(data)

print("Original Dataset:")
print(df)

# 1. POLYNOMIAL FEATURES (Manual)
def polynomial_features_manual(df, columns, degree=2):
    """
    Manual polynomial feature generation.
    Creates powers x², ..., x^degree of each column plus pairwise products
    x_i * x_j (higher-order cross-terms such as x_i² * x_j are not generated).
    """
    result = df.copy()
    
    # Single feature powers
    for col in columns:
        for d in range(2, degree + 1):
            result[f'{col}_pow{d}'] = result[col] ** d
    
    # Interaction terms (pairwise products)
    for i, col1 in enumerate(columns):
        for col2 in columns[i+1:]:
            result[f'{col1}_x_{col2}'] = result[col1] * result[col2]
    
    return result

print("\n=== 1. POLYNOMIAL FEATURES (Manual, degree=2) ===")
df_poly = polynomial_features_manual(df, ['length', 'width'], degree=2)
new_poly_cols = [c for c in df_poly.columns if c not in df.columns]
print(f"New features created: {new_poly_cols}")
print("\nPolynomial features:")
print(df_poly[['length', 'width'] + new_poly_cols])

# 2. INTERACTION FEATURES (Manual)
def interaction_features_manual(df, interaction_pairs):
    """
    Create interaction features from specified pairs.
    Can also do sum, difference, ratio.
    """
    result = df.copy()
    
    for col1, col2 in interaction_pairs:
        # Product
        result[f'{col1}_mul_{col2}'] = result[col1] * result[col2]
        # Sum
        result[f'{col1}_add_{col2}'] = result[col1] + result[col2]
        # Difference
        result[f'{col1}_sub_{col2}'] = result[col1] - result[col2]
        # Ratio (with epsilon)
        result[f'{col1}_div_{col2}'] = result[col1] / (result[col2] + 1e-8)
    
    return result

print("\n=== 2. INTERACTION FEATURES (Manual) ===")
df_interact = interaction_features_manual(df, [('length', 'width'), ('price', 'age')])
print("New interaction features:")
interact_cols = [c for c in df_interact.columns if any(op in c for op in ['_mul_', '_add_', '_sub_', '_div_'])]
print(df_interact[interact_cols].head())

# 3. MATHEMATICAL TRANSFORMATIONS (Manual)
def mathematical_transforms_manual(series, transforms=('log', 'sqrt', 'reciprocal', 'exp')):
    """
    Apply various mathematical transformations.
    """
    result = pd.DataFrame(index=series.index)
    name = series.name
    
    if 'log' in transforms:
        # Log transform (add 1 for zeros)
        result[f'{name}_log'] = np.log(series + 1)
    if 'sqrt' in transforms:
        result[f'{name}_sqrt'] = np.sqrt(series)
    if 'reciprocal' in transforms:
        result[f'{name}_inv'] = 1 / (series + 1e-8)
    if 'exp' in transforms:
        # Normalize to avoid overflow
        normalized = (series - series.mean()) / series.std()
        result[f'{name}_exp'] = np.exp(normalized)
    if 'square' in transforms:
        result[f'{name}_sq'] = series ** 2
    if 'cube' in transforms:
        result[f'{name}_cube'] = series ** 3
    
    return result

print("\n=== 3. MATHEMATICAL TRANSFORMATIONS (Manual) ===")
df_math = df.copy()
transforms = mathematical_transforms_manual(df['price'], ['log', 'sqrt', 'square'])
df_math = pd.concat([df_math, transforms], axis=1)
print("Price transformations:")
print(df_math[['price', 'price_log', 'price_sqrt', 'price_sq']].head())

# 4. BINNING / DISCRETIZATION (Manual)
def equal_width_binning(series, n_bins=4):
    """
    Manual equal-width binning.
    """
    min_val, max_val = series.min(), series.max()
    bin_width = (max_val - min_val) / n_bins
    
    # Create bins
    bins = [min_val + i * bin_width for i in range(n_bins + 1)]
    bins[-1] = max_val + 1e-8  # Include max value
    
    # Assign bins
    binned = pd.cut(series, bins=bins, include_lowest=True, labels=range(n_bins))
    
    return binned, bins

def equal_freq_binning(series, n_bins=4):
    """
    Manual equal-frequency (quantile) binning.
    """
    quantiles = np.linspace(0, 1, n_bins + 1)
    bin_edges = series.quantile(quantiles).values
    bin_edges[0] -= 1e-8  # Ensure min is included
    bin_edges[-1] += 1e-8  # Ensure max is included
    
    binned = pd.cut(series, bins=bin_edges, labels=range(n_bins))
    
    return binned, bin_edges

print("\n=== 4. BINNING / DISCRETIZATION (Manual) ===")
df_bin = df.copy()

# Equal width
width_binned, width_bins = equal_width_binning(df['age'], n_bins=4)
df_bin['age_bin_width'] = width_binned
print(f"\nEqual-width bins for age: {['%.0f' % b for b in width_bins]}")
print("Value to bin mapping:")
print(df_bin[['age', 'age_bin_width']].head(8))

# Equal frequency
freq_binned, freq_bins = equal_freq_binning(df['price'], n_bins=4)
df_bin['price_bin_freq'] = freq_binned
print(f"\nEqual-frequency bins for price: {['%.0f' % b for b in freq_bins]}")
print("Bin counts:")
print(df_bin['price_bin_freq'].value_counts().sort_index())

# 5. DOMAIN-SPECIFIC FEATURES (Manual)
def domain_features_manual(df):
    """
    Create domain-specific features based on problem knowledge.
    Example: Product/box dimensions → volume, surface area, aspect ratios
    """
    result = df.copy()
    
    # Volume (for box-like objects)
    result['volume'] = result['length'] * result['width'] * result['height']
    
    # Surface area
    result['surface_area'] = 2 * (
        result['length'] * result['width'] +
        result['width'] * result['height'] +
        result['length'] * result['height']
    )
    
    # Aspect ratios
    result['aspect_ratio_lw'] = result['length'] / (result['width'] + 1e-8)
    result['aspect_ratio_lh'] = result['length'] / (result['height'] + 1e-8)
    
    # Price per unit volume (density proxy)
    result['price_per_volume'] = result['price'] / (result['volume'] + 1e-8)
    
    # Price category (binned)
    result['price_tier'] = pd.cut(
        result['price'],
        bins=[0, 100, 200, 500],
        labels=['budget', 'mid', 'premium']
    )
    
    # Age groups
    result['age_group'] = pd.cut(
        result['age'],
        bins=[0, 30, 40, 100],
        labels=['young', 'middle', 'senior']
    )
    
    # Is cubic (all dimensions similar)
    max_dim = result[['length', 'width', 'height']].max(axis=1)
    min_dim = result[['length', 'width', 'height']].min(axis=1)
    result['is_cubic_like'] = (max_dim / (min_dim + 1e-8)) < 1.5
    
    return result

print("\n=== 5. DOMAIN-SPECIFIC FEATURES (Manual) ===")
df_domain = domain_features_manual(df)
print("Domain features created:")
domain_cols = ['volume', 'surface_area', 'aspect_ratio_lw', 'price_per_volume', 'price_tier', 'age_group', 'is_cubic_like']
print(df_domain[domain_cols])

# 6. AGGREGATION FEATURES (Manual)
def aggregation_features_manual(df, group_col, agg_cols):
    """
    Create aggregation features based on group statistics.
    """
    result = df.copy()
    
    for col in agg_cols:
        # Group statistics
        group_mean = result.groupby(group_col)[col].transform('mean')
        group_std = result.groupby(group_col)[col].transform('std')
        group_min = result.groupby(group_col)[col].transform('min')
        group_max = result.groupby(group_col)[col].transform('max')
        group_count = result.groupby(group_col)[col].transform('count')
        
        # Features relative to group
        result[f'{col}_group_mean'] = group_mean
        result[f'{col}_group_std'] = group_std
        result[f'{col}_vs_group_mean'] = result[col] - group_mean
        result[f'{col}_group_zscore'] = (result[col] - group_mean) / (group_std + 1e-8)
        result[f'{col}_group_pct'] = (result[col] - group_min) / (group_max - group_min + 1e-8)
    
    return result

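print("\n=== 6. AGGREGATION FEATURES (Manual) ===")
# Add a category for grouping (on a copy, so df keeps its original columns
# for the feature count summary at the end)
df_grouped = df.assign(category=['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B'])
df_agg = aggregation_features_manual(df_grouped, 'category', ['price'])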
print("Aggregation features:")
agg_cols = [c for c in df_agg.columns if 'group' in c or 'vs_group' in c or '_zscore' in c or '_pct' in c]
print(df_agg[['category', 'price'] + agg_cols].head(8))

# 7. ROLLING/LAG FEATURES (Manual)
def rolling_features_manual(series, windows=(2, 3)):
    """
    Create rolling window statistics (for time series).
    """
    result = pd.DataFrame(index=series.index)
    
    for window in windows:
        result[f'{series.name}_roll_mean_{window}'] = series.rolling(window=window, min_periods=1).mean()
        result[f'{series.name}_roll_std_{window}'] = series.rolling(window=window, min_periods=1).std()
        result[f'{series.name}_roll_min_{window}'] = series.rolling(window=window, min_periods=1).min()
        result[f'{series.name}_roll_max_{window}'] = series.rolling(window=window, min_periods=1).max()
        result[f'{series.name}_diff_{window}'] = series.diff(window)
    
    return result

print("\n=== 7. ROLLING/LAG FEATURES (Manual) ===")
rolling_feats = rolling_features_manual(df['price'], windows=[2, 3])
df_rolling = pd.concat([df[['price']], rolling_feats], axis=1)
print("Rolling features for price:")
print(df_rolling.head())

# Feature count summary
print("\n=== FEATURE COUNT SUMMARY ===")
print(f"Original features: {len(df.columns)}")
print(f"After polynomial: {len(df_poly.columns)}")
print(f"After interactions: {len(df_interact.columns)}")
print(f"After domain features: {len(df_domain.columns)}")

Using Libraries

import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, KBinsDiscretizer, FunctionTransformer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')

# Create sample data
np.random.seed(42)
data = {
    'length': np.random.uniform(5, 25, 100),
    'width': np.random.uniform(3, 12, 100),
    'height': np.random.uniform(2, 8, 100),
    'price': np.random.uniform(50, 500, 100),
    'age': np.random.uniform(20, 60, 100),
    'category': np.random.choice(['A', 'B', 'C'], 100)
}
df = pd.DataFrame(data)

print("Original Dataset:")
print(df.head())
print(f"\nShape: {df.shape}")

# 1. POLYNOMIAL FEATURES (sklearn)
print("\n" + "="*60)
print("1. POLYNOMIAL FEATURES (sklearn)")
print("="*60)

# Simple polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
df_numeric = df[['length', 'width', 'height']]
poly_features = poly.fit_transform(df_numeric)

feature_names = poly.get_feature_names_out(['length', 'width', 'height'])
df_poly = pd.DataFrame(poly_features, columns=feature_names)

print(f"Original numeric features: {df_numeric.shape[1]}")
print(f"Polynomial features (degree 2): {df_poly.shape[1]}")
print(f"\nFeature names: {list(feature_names)}")
print("\nFirst 3 rows:")
print(df_poly.head(3))

# Interaction only
poly_interact = PolynomialFeatures(degree=2, include_bias=False, interaction_only=True)
interact_features = poly_interact.fit_transform(df_numeric)
interact_names = poly_interact.get_feature_names_out(['length', 'width', 'height'])
print(f"\nInteraction-only features: {len(interact_names)}")
print(f"Names: {list(interact_names)}")

# 2. KBINS DISCRETIZER
print("\n" + "="*60)
print("2. KBINS DISCRETIZER")
print("="*60)

# Equal width binning
kbin_uniform = KBinsDiscretizer(n_bins=4, encode='ordinal', strategy='uniform')
age_binned = kbin_uniform.fit_transform(df[['age']])
print("Equal-width binning (uniform):")
print(f"  Bin edges: {kbin_uniform.bin_edges_[0].round(2)}")
print(f"  Sample bins: {age_binned[:10].flatten()}")

# Quantile binning (equal frequency)
kbin_quantile = KBinsDiscretizer(n_bins=4, encode='ordinal', strategy='quantile')
price_binned = kbin_quantile.fit_transform(df[['price']])
print("\nQuantile binning (equal frequency):")
print(f"  Bin edges: {kbin_quantile.bin_edges_[0].round(2)}")
print(f"  Bin counts: {pd.Series(price_binned.flatten()).value_counts().sort_index().values}")

# One-hot encoded bins
kbin_onehot = KBinsDiscretizer(n_bins=3, encode='onehot-dense', strategy='kmeans')
price_onehot = kbin_onehot.fit_transform(df[['price']])
print(f"\nOne-hot encoded bins: {price_onehot.shape[1]} columns")

# 3. FUNCTION TRANSFORMER (Custom transforms)
print("\n" + "="*60)
print("3. FUNCTION TRANSFORMER (Custom)")
print("="*60)

# Log transform
log_transformer = FunctionTransformer(
    func=lambda x: np.log1p(x),  # log(1+x) handles zeros
    inverse_func=lambda x: np.expm1(x),
    validate=True
)

price_log = log_transformer.fit_transform(df[['price']])
print(f"Log transform applied to price")
print(f"  Original range: [{df['price'].min():.1f}, {df['price'].max():.1f}]")
print(f"  Log range: [{price_log.min():.2f}, {price_log.max():.2f}]")

# Reciprocal transform
recip_transformer = FunctionTransformer(
    func=lambda x: 1 / (x + 1e-8),
    validate=True
)

# Box-Cox (requires positive values)
from sklearn.preprocessing import PowerTransformer
pt = PowerTransformer(method='box-cox', standardize=True)
price_positive = df[['price']] - df[['price']].min() + 1
price_boxcox = pt.fit_transform(price_positive)
print(f"\nBox-Cox transform lambda: {pt.lambdas_[0]:.4f}")

# 4. COLUMN TRANSFORMER (Different transforms per column)
print("\n" + "="*60)
print("4. COLUMN TRANSFORMER (Pipeline)")
print("="*60)

# Define different transformations for different columns
preprocessor = ColumnTransformer(
    transformers=[
        ('poly', PolynomialFeatures(degree=2, include_bias=False), ['length', 'width']),
        ('bins', KBinsDiscretizer(n_bins=3, encode='onehot-dense'), ['age']),
        ('log', FunctionTransformer(np.log1p), ['price']),
        ('pass', 'passthrough', ['height'])
    ],
    remainder='drop'
)

processed = preprocessor.fit_transform(df)
print(f"Input shape: {df.shape}")
print(f"Output shape: {processed.shape}")

# Get feature names
poly_features = preprocessor.named_transformers_['poly'].get_feature_names_out(['length', 'width'])
bin_features = [f'age_bin_{i}' for i in range(3)]
all_features = list(poly_features) + bin_features + ['price_log', 'height']
print(f"Output features ({len(all_features)}): {all_features}")

# 5. PIPELINE INTEGRATION
print("\n" + "="*60)
print("5. COMPLETE PIPELINE WITH FEATURE ENGINEERING")
print("="*60)

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Create target (synthetic)
df['target'] = (
    df['length'] * df['width'] * df['height'] * 0.5 +
    df['price'] * 0.3 +
    np.random.normal(0, 10, 100)
)

# Split
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Pipeline 1: No feature engineering
pipeline_simple = Pipeline([
    ('model', RandomForestRegressor(n_estimators=50, random_state=42))
])

# Pipeline 2: With polynomial features
pipeline_poly = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('model', RandomForestRegressor(n_estimators=50, random_state=42))
])

# Pipeline 3: With custom features
def add_custom_features(X_df):
    X = X_df.copy()
    X['volume'] = X['length'] * X['width'] * X['height']
    X['aspect_ratio'] = X['length'] / (X['width'] + 1e-8)
    X['price_per_volume'] = X['price'] / (X['volume'] + 1e-8)
    return X

custom_transformer = FunctionTransformer(add_custom_features)

pipeline_custom = Pipeline([
    ('features', custom_transformer),
    ('model', RandomForestRegressor(n_estimators=50, random_state=42))
])

# Evaluate
for name, pipeline in [('Simple', pipeline_simple), ('Polynomial', pipeline_poly), ('Custom', pipeline_custom)]:
    pipeline.fit(X_train.select_dtypes(include=[np.number]), y_train)
    pred = pipeline.predict(X_test.select_dtypes(include=[np.number]))
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name}: RMSE = {rmse:.2f}")

# 6. ADVANCED: TARGET ENCODING WITH SMOOTHING
print("\n" + "="*60)
print("6. TARGET-AWARE FEATURE ENGINEERING")
print("="*60)

# Requires the third-party category_encoders package (pip install category_encoders)
from category_encoders import TargetEncoder

# Target encoding for categorical
te = TargetEncoder(smoothing=1.0)
X_train_encoded = X_train.copy()
X_test_encoded = X_test.copy()

X_train_encoded['category_encoded'] = te.fit_transform(X_train['category'], y_train)
X_test_encoded['category_encoded'] = te.transform(X_test['category'])

print("Category target encoding:")
for cat in X_train['category'].unique():
    mask = X_train['category'] == cat
    mean_target = y_train[mask].mean()
    encoded_val = X_train_encoded.loc[mask, 'category_encoded'].iloc[0]
    print(f"  {cat}: mean_target={mean_target:.2f}, encoded={encoded_val:.2f}")

# 7. FEATURE IMPORTANCE ANALYSIS
print("\n" + "="*60)
print("7. FEATURE IMPORTANCE ANALYSIS")
print("="*60)

# Fit model with engineered features
X_engineered = add_custom_features(X.select_dtypes(include=[np.number]))
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_engineered, y)

# Get feature importance
importance = pd.DataFrame({
    'feature': X_engineered.columns,
    'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)

print("Feature importance (top 10):")
print(importance.head(10).to_string(index=False))

# 8. AUTOMATED FEATURE ENGINEERING (Featuretools-style)
print("\n" + "="*60)
print("8. AUTOMATED FEATURE GENERATION")
print("="*60)

def auto_generate_features(df, numeric_cols, max_degree=2, include_ratios=True, include_bins=True):
    """
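    Brute-force feature generation (powers, pairwise products and ratios,
    log transforms, quartile bins), loosely in the spirit of tools such as
    Featuretools or AutoFeat.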
    """
    result = df.copy()
    new_features = []
    
    # Polynomial features
    for i, col1 in enumerate(numeric_cols):
        for d in range(2, max_degree + 1):
            result[f'{col1}_pow{d}'] = result[col1] ** d
            new_features.append(f'{col1}_pow{d}')
        
        for col2 in numeric_cols[i+1:]:
            result[f'{col1}_mul_{col2}'] = result[col1] * result[col2]
            new_features.append(f'{col1}_mul_{col2}')
            # Ratios are optional; the include_ratios flag was previously ignored
            if include_ratios:
                result[f'{col1}_div_{col2}'] = result[col1] / (result[col2] + 1e-8)
                new_features.append(f'{col1}_div_{col2}')
    
    # Log transforms
    for col in numeric_cols:
        if result[col].min() >= 0:
            result[f'{col}_log'] = np.log1p(result[col])
            new_features.append(f'{col}_log')
    
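    # Bins (quartiles); also recorded in new_features so the reported count is complete
    if include_bins:
        for col in numeric_cols:
            result[f'{col}_bin'] = pd.qcut(result[col], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
            new_features.append(f'{col}_bin')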
    
    return result, new_features

df_auto, auto_features = auto_generate_features(
    df[['length', 'width', 'price']],
    ['length', 'width', 'price'],
    max_degree=2
)

print(f"Auto-generated {len(auto_features)} new features:")
print(auto_features[:10], "...")
print(f"\nTotal features: {len(df_auto.columns)}")

# 9. DIMENSIONALITY WARNING
print("\n" + "="*60)
print("9. DIMENSIONALITY CONSIDERATIONS")
print("="*60)

original_features = 5
poly_degree_2 = sum(1 for i in range(original_features) for j in range(i, original_features)) + original_features
poly_degree_3 = len(PolynomialFeatures(degree=3, include_bias=False).fit(np.zeros((1, original_features))).get_feature_names_out())

print(f"Original features: {original_features}")
print(f"Polynomial degree 2: {poly_degree_2} features")
print(f"Polynomial degree 3: {poly_degree_3} features")
print("\nWARNING: Feature explosion can cause overfitting!")
print("Use feature selection after engineering:")
print("  - Remove highly correlated features")
print("  - Use feature importance from tree models")
print("  - Apply L1 regularization for automatic selection")
print("  - Use PCA for dimensionality reduction")

# 10. BEST PRACTICES SUMMARY
print("\n" + "="*60)
print("10. BEST PRACTICES FOR FEATURE ENGINEERING")
print("="*60)

best_practices = {
    'Practice': [
        'Domain Knowledge',
        'Start Simple',
        'Validate Each Feature',
        'Avoid Data Leakage',
        'Handle Outliers',
        'Scale After Engineering',
        'Document Transformations',
        'Cross-Validate'
    ],
    'Description': [
        'Consult experts for meaningful feature creation',
        'Begin with basic interactions before complex transforms',
        'Check correlation with target and feature importance',
        'Never use target mean before train/test split',
        'Winsorize or use robust scaling for extreme values',
        'Apply scaling after polynomial/log transforms',
        'Record all transformations for production',
        'Use CV to assess true impact of new features'
    ]
}
print(pd.DataFrame(best_practices).to_string(index=False))

When to Use

✅ Appropriate Use Cases:

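Polynomial features: When non-linear relationships are suspected and the model is linear (e.g., polynomial regression); an SVM with a polynomial kernel computes such terms implicitly
Interaction features: When combined effects are expected (e.g., price × quality), especially for linear models, which cannot learn interactions on their own the way tree models can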
Binning: When non-linear threshold effects exist, for handling outliers, creating interpretable ranges
Log transforms: For right-skewed distributions, multiplicative relationships, heteroscedastic data
Domain features: Always use expert knowledge—often the highest value features
Rolling features: For time series, capturing trends and seasonality patterns
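
To illustrate the log-transform guideline, the sketch below measures skewness before and after `log1p` on synthetic right-skewed data. The lognormal parameters and the variable names are arbitrary choices for the demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed, strictly positive feature
income = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000), name="income")

# log(1 + x) compresses the long right tail
income_log = np.log1p(income)

skew_before = income.skew()
skew_after = income_log.skew()
print(f"skew before: {skew_before:.2f}, after log1p: {skew_after:.2f}")
```

A skewness close to zero after the transform suggests the log was a reasonable choice; if it overshoots into negative skew, a milder transform such as a square root may fit better.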

❌ Avoid When:

Avoid high-degree polynomials with limited data—causes overfitting and numerical instability
Don't create interactions for every pair—explodes dimensionality and creates multicollinearity
Avoid binning when precise continuous values matter—loss of information
Don't engineer features using test set information—causes data leakage
Avoid complex features when simple ones perform equally—parsimony principle
Don't transform features without understanding the distribution—wrong transforms hurt performance
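
One way to keep test-set information out of engineered features is to learn every data-dependent parameter from the training split and then reuse it unchanged. The sketch below does this for quantile binning with NumPy; the data is synthetic and the 150/50 split is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=200)  # synthetic feature
train, test = values[:150], values[150:]

# Learn the three interior quartile edges from the training split only
edges = np.quantile(train, [0.25, 0.5, 0.75])

# Reuse the same edges for both splits; test values outside the training
# range simply fall into the first or last bin
train_bins = np.digitize(train, edges)
test_bins = np.digitize(test, edges)

print("train bin counts:", np.bincount(train_bins, minlength=4))
print("test bin counts: ", np.bincount(test_bins, minlength=4))
```

The training bins come out nearly equal in size by construction, while the test bins need not be, which is expected and not a sign of a bug.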

Common Pitfalls

Data leakage through target-based features—always compute statistics on training data only
Feature explosion causing overfitting—limit polynomial degree, use regularization
Multicollinearity from redundant features—check VIF, remove highly correlated features
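Not handling division by zero: add a small epsilon (e.g., 1e-8) to non-negative denominators, or handle zero denominators explicitly, since dividing by epsilon alone produces extreme values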
Forgetting inverse transforms for predictions—needed when target was transformed
Not validating feature importance—some engineered features add noise, not signal
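
The multicollinearity check mentioned above can be sketched with plain NumPy. The variance inflation factor of feature j is 1 / (1 - R²), where R² comes from regressing feature j on the remaining features. The helper name `vif` and the synthetic data are assumptions for illustration; libraries such as statsmodels provide a maintained equivalent.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Least-squares fit of column j on the other columns plus an intercept
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)
x = rng.uniform(5, 25, size=200)
z = rng.normal(size=200)  # unrelated feature

# x and its square are strongly correlated on a positive range
X = np.column_stack([x, x ** 2, z])
print(vif(X).round(2))
```

A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity; here the polynomial pair should exceed that while the unrelated column stays near 1. Centering a feature before squaring it is one standard way to reduce this effect.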
