Feature Engineering: Polynomials, Interactions, Binning, and Domain Features
Definition
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to a predictive model, with the aim of improving its accuracy. It covers creating new features through mathematical transformations (polynomials, logarithms), combining existing features (interactions, ratios), discretizing continuous variables (binning), and encoding domain knowledge as meaningful representations. Good features can often deliver larger performance gains than algorithm tuning, and sometimes more than collecting additional data, although the benefit depends on the data and the model. The process requires understanding the data-generating mechanism, the relationships between variables, and the inductive biases of the chosen algorithm.
Polynomial features capture non-linear relationships by creating higher-order terms and cross-terms. Feature interactions model combined effects that individual features cannot express alone. Binning transforms continuous variables into categorical buckets, which helps with non-linearities and outliers. Domain-specific features use expert knowledge to create predictive variables that the raw columns do not expose directly. Feature engineering is both an art and a science, requiring creativity, domain expertise, and rigorous validation.
Intuition
Imagine you're trying to predict house prices. Raw data gives you square footage and number of bedrooms. But the price per square foot, the interaction between location and size, or whether the house is 'oversized' for its neighborhood might matter more. Feature engineering is like being a detective looking for hidden clues—a house's age matters, but its 'era' (Victorian, mid-century, modern) might capture style premiums better than raw year. Polynomial features are like asking 'what if the relationship curves instead of being a straight line?' Interactions ask 'does being large matter more in expensive neighborhoods?' Binning groups similar values—ages 20-30 might have similar buying patterns even though 20 and 30 are different numbers. Domain features come from asking an expert realtor what actually sells houses.
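The house example can be sketched in a few lines of pandas. The listings, the era cut-offs, and the 1.25 "oversized" threshold below are all made up for illustration; only non-target columns are used so that the sale price itself never leaks into a feature.

```python
import pandas as pd

# Hypothetical toy listings (all values invented for this sketch)
houses = pd.DataFrame({
    "sqft": [1500, 1800, 1000, 3200],
    "bedrooms": [3, 3, 2, 4],
    "year_built": [1895, 1958, 2005, 2015],
    "neighborhood": ["north", "north", "south", "south"],
})

# Ratio feature: living space per bedroom
houses["sqft_per_bedroom"] = houses["sqft"] / houses["bedrooms"]

# Binning: map the raw construction year to a coarse "era" (cut-offs are assumptions)
houses["era"] = pd.cut(
    houses["year_built"],
    bins=[1800, 1901, 1979, 2100],
    labels=["victorian", "mid_century", "modern"],
)

# Group-relative feature: size compared with the neighborhood median
neighborhood_median = houses.groupby("neighborhood")["sqft"].transform("median")
houses["size_vs_neighborhood"] = houses["sqft"] / neighborhood_median
houses["oversized"] = houses["size_vs_neighborhood"] > 1.25  # threshold is an assumption

print(houses)
```

In a real project the neighborhood medians would be computed on the training split only and then applied to new listings.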
Mathematical Formula
Step-by-Step Explanation:
- Polynomial: Generate all combinations of features up to specified degree including cross-terms
- Interaction: Multiply two or more features to model combined effects
- Log Transform: Apply logarithm to compress large values and handle skewed distributions
- Binning: Divide continuous range into equal-width or equal-frequency buckets
- Ratio: Divide one feature by another to create relative measures (add epsilon to avoid division by zero)
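One conventional way to write the transformations listed above is shown here. The notation ($x_i$ for feature $i$, $k$ for the number of bins, $\varepsilon$ for a small constant) is chosen for this sketch and is not taken from the original text; the binning formula uses left-closed intervals, which is only one of several conventions.

```latex
\begin{align*}
\text{Polynomial (degree 2, two features):}\quad
  & \phi(x_1, x_2) = (x_1,\; x_2,\; x_1^2,\; x_1 x_2,\; x_2^2) \\
\text{Number of terms ($n$ features, degree $d$, no bias):}\quad
  & \binom{n + d}{d} - 1 \\
\text{Interaction:}\quad
  & z_{ij} = x_i \, x_j \\
\text{Log transform:}\quad
  & x' = \log(1 + x), \qquad x \ge 0 \\
\text{Equal-width binning ($k$ bins):}\quad
  & b(x) = \min\!\left( \left\lfloor k \, \frac{x - x_{\min}}{x_{\max} - x_{\min}} \right\rfloor,\; k - 1 \right) \\
\text{Ratio:}\quad
  & r_{ij} = \frac{x_i}{x_j + \varepsilon}
\end{align*}
```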
Real-World Use Cases
Healthcare: BMI (weight/height²) is an engineered feature that is often more predictive than the raw measurements. Age bins (child, adult, elderly) capture different risk profiles. Polynomial terms model non-linear drug response curves. An interaction between smoking and age helps model cancer risk.
Finance: The debt-to-income ratio is often more predictive than raw debt. Moving averages and volatility (standard deviation) are derived from price history. Credit scores are binned into rating tiers. Polynomial features model non-linear interest rate impacts. Time-based features are extracted from transaction timestamps.
Retail: Price per unit and discount percentage. Customer recency, frequency, and monetary (RFM) features. Seasonality indicators derived from purchase dates. An interaction between category and season for demand prediction.
Manufacturing: Efficiency ratios (output/input). Temperature-pressure interactions for process optimization. Binned vibration levels for maintenance alerts. Rate-of-change features from sensor time series.
Web and product analytics: User engagement ratios (clicks/views, time/pages). TF-IDF features from text. Session features (duration, depth, bounce). An interaction between device type and feature usage for churn prediction.
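As a concrete instance of the RFM features mentioned above, the following sketch aggregates a small transaction log per customer. The column names, the transactions, and the snapshot date are hypothetical.

```python
import pandas as pd

# Hypothetical transaction log (values invented for this sketch)
transactions = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b", "c"],
    "date": pd.to_datetime([
        "2024-01-05", "2024-03-01",
        "2024-02-10", "2024-02-20", "2024-03-10",
        "2023-12-15",
    ]),
    "amount": [50.0, 70.0, 20.0, 35.0, 45.0, 200.0],
})

# Reference date from which recency is measured (an assumption)
snapshot = pd.Timestamp("2024-03-31")

rfm = transactions.groupby("customer_id").agg(
    last_purchase=("date", "max"),   # most recent purchase per customer
    frequency=("date", "count"),     # number of purchases
    monetary=("amount", "sum"),      # total spend
)
rfm["recency_days"] = (snapshot - rfm["last_purchase"]).dt.days
rfm = rfm.drop(columns="last_purchase")

print(rfm)
```

The snapshot date should be fixed before any target period begins, otherwise recency can encode information from the future.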
Implementation
Manual Implementation (No Libraries)
import numpy as np
import pandas as pd
# Create sample data
np.random.seed(42)
data = {
'length': [10, 15, 8, 20, 12, 18, 9, 22, 14, 11],
'width': [5, 8, 4, 10, 6, 9, 5, 11, 7, 6],
'height': [3, 4, 2, 5, 3, 5, 3, 6, 4, 3],
'price': [100, 200, 80, 300, 150, 250, 90, 350, 180, 120],
'age': [25, 35, 28, 45, 32, 40, 27, 50, 38, 30]
}
df = pd.DataFrame(data)
print("Original Dataset:")
print(df)
# 1. POLYNOMIAL FEATURES (Manual)
def polynomial_features_manual(df, columns, degree=2):
    """
    Manual polynomial feature generation.
    Creates x², x³, ... up to `degree` and pairwise interaction terms x_i * x_j.
    Note: for degree > 2, higher-order cross-terms (e.g. x_i² * x_j) are not generated.
    """
    result = df.copy()
    # Single feature powers
    for col in columns:
        for d in range(2, degree + 1):
            result[f'{col}_pow{d}'] = result[col] ** d
    # Interaction terms (pairwise products)
    for i, col1 in enumerate(columns):
        for col2 in columns[i+1:]:
            result[f'{col1}_x_{col2}'] = result[col1] * result[col2]
    return result
print("\n=== 1. POLYNOMIAL FEATURES (Manual, degree=2) ===")
df_poly = polynomial_features_manual(df, ['length', 'width'], degree=2)
new_poly_cols = [c for c in df_poly.columns if c not in df.columns]
print(f"New features created: {new_poly_cols}")
print("\nPolynomial features:")
print(df_poly[['length', 'width'] + new_poly_cols])
# 2. INTERACTION FEATURES (Manual)
def interaction_features_manual(df, interaction_pairs):
    """
    Create interaction features from specified pairs.
    Can also do sum, difference, ratio.
    """
    result = df.copy()
    for col1, col2 in interaction_pairs:
        # Product
        result[f'{col1}_mul_{col2}'] = result[col1] * result[col2]
        # Sum
        result[f'{col1}_add_{col2}'] = result[col1] + result[col2]
        # Difference
        result[f'{col1}_sub_{col2}'] = result[col1] - result[col2]
        # Ratio (with epsilon)
        result[f'{col1}_div_{col2}'] = result[col1] / (result[col2] + 1e-8)
    return result
print("\n=== 2. INTERACTION FEATURES (Manual) ===")
df_interact = interaction_features_manual(df, [('length', 'width'), ('price', 'age')])
print(f"New interaction features:")
interact_cols = [c for c in df_interact.columns if any(op in c for op in ['_mul_', '_add_', '_sub_', '_div_'])]
print(df_interact[interact_cols].head())
# 3. MATHEMATICAL TRANSFORMATIONS (Manual)
def mathematical_transforms_manual(series, transforms=('log', 'sqrt', 'reciprocal', 'exp')):
    """
    Apply various mathematical transformations.
    """
    result = pd.DataFrame(index=series.index)
    name = series.name
    if 'log' in transforms:
        # log(1 + x) so that zeros map to 0 (requires x > -1)
        result[f'{name}_log'] = np.log1p(series)
    if 'sqrt' in transforms:
        # Requires non-negative values
        result[f'{name}_sqrt'] = np.sqrt(series)
    if 'reciprocal' in transforms:
        result[f'{name}_inv'] = 1 / (series + 1e-8)
    if 'exp' in transforms:
        # Standardize first to avoid overflow
        normalized = (series - series.mean()) / series.std()
        result[f'{name}_exp'] = np.exp(normalized)
    if 'square' in transforms:
        result[f'{name}_sq'] = series ** 2
    if 'cube' in transforms:
        result[f'{name}_cube'] = series ** 3
    return result
print("\n=== 3. MATHEMATICAL TRANSFORMATIONS (Manual) ===")
df_math = df.copy()
transforms = mathematical_transforms_manual(df['price'], ['log', 'sqrt', 'square'])
df_math = pd.concat([df_math, transforms], axis=1)
print("Price transformations:")
print(df_math[['price', 'price_log', 'price_sqrt', 'price_sq']].head())
# 4. BINNING / DISCRETIZATION (Manual)
def equal_width_binning(series, n_bins=4):
    """
    Manual equal-width binning.
    """
    min_val, max_val = series.min(), series.max()
    bin_width = (max_val - min_val) / n_bins
    # Create bin edges
    bins = [min_val + i * bin_width for i in range(n_bins + 1)]
    bins[-1] = max_val + 1e-8  # Include max value
    # Assign bins
    binned = pd.cut(series, bins=bins, include_lowest=True, labels=range(n_bins))
    return binned, bins
def equal_freq_binning(series, n_bins=4):
    """
    Manual equal-frequency (quantile) binning.
    """
    quantiles = np.linspace(0, 1, n_bins + 1)
    # Copy the edges so they can safely be modified in place
    bin_edges = series.quantile(quantiles).to_numpy(copy=True)
    bin_edges[0] -= 1e-8  # Ensure min is included
    bin_edges[-1] += 1e-8  # Ensure max is included
    binned = pd.cut(series, bins=bin_edges, labels=range(n_bins))
    return binned, bin_edges
print("\n=== 4. BINNING / DISCRETIZATION (Manual) ===")
df_bin = df.copy()
# Equal width
width_binned, width_bins = equal_width_binning(df['age'], n_bins=4)
df_bin['age_bin_width'] = width_binned
print(f"\nEqual-width bins for age: {['%.0f' % b for b in width_bins]}")
print("Value to bin mapping:")
print(df_bin[['age', 'age_bin_width']].head(8))
# Equal frequency
freq_binned, freq_bins = equal_freq_binning(df['price'], n_bins=4)
df_bin['price_bin_freq'] = freq_binned
print(f"\nEqual-frequency bins for price: {['%.0f' % b for b in freq_bins]}")
print("Bin counts:")
print(df_bin['price_bin_freq'].value_counts().sort_index())
# 5. DOMAIN-SPECIFIC FEATURES (Manual)
def domain_features_manual(df):
    """
    Create domain-specific features based on problem knowledge.
    Example: Product/box dimensions → volume, surface area, aspect ratios
    """
    result = df.copy()
    # Volume (for box-like objects)
    result['volume'] = result['length'] * result['width'] * result['height']
    # Surface area
    result['surface_area'] = 2 * (
        result['length'] * result['width'] +
        result['width'] * result['height'] +
        result['length'] * result['height']
    )
    # Aspect ratios
    result['aspect_ratio_lw'] = result['length'] / (result['width'] + 1e-8)
    result['aspect_ratio_lh'] = result['length'] / (result['height'] + 1e-8)
    # Price per unit volume (density proxy)
    result['price_per_volume'] = result['price'] / (result['volume'] + 1e-8)
    # Price category (binned)
    result['price_tier'] = pd.cut(
        result['price'],
        bins=[0, 100, 200, 500],
        labels=['budget', 'mid', 'premium']
    )
    # Age groups
    result['age_group'] = pd.cut(
        result['age'],
        bins=[0, 30, 40, 100],
        labels=['young', 'middle', 'senior']
    )
    # Is cubic (all dimensions similar)
    max_dim = result[['length', 'width', 'height']].max(axis=1)
    min_dim = result[['length', 'width', 'height']].min(axis=1)
    result['is_cubic_like'] = (max_dim / (min_dim + 1e-8)) < 1.5
    return result
print("\n=== 5. DOMAIN-SPECIFIC FEATURES (Manual) ===")
df_domain = domain_features_manual(df)
print("Domain features created:")
domain_cols = ['volume', 'surface_area', 'aspect_ratio_lw', 'price_per_volume', 'price_tier', 'age_group', 'is_cubic_like']
print(df_domain[domain_cols])
# 6. AGGREGATION FEATURES (Manual)
def aggregation_features_manual(df, group_col, agg_cols):
    """
    Create aggregation features based on group statistics.
    """
    result = df.copy()
    for col in agg_cols:
        # Group statistics, broadcast back to every row of the group
        grouped = result.groupby(group_col)[col]
        group_mean = grouped.transform('mean')
        group_std = grouped.transform('std')
        group_min = grouped.transform('min')
        group_max = grouped.transform('max')
        # Features relative to group
        result[f'{col}_group_mean'] = group_mean
        result[f'{col}_group_std'] = group_std
        result[f'{col}_vs_group_mean'] = result[col] - group_mean
        result[f'{col}_group_zscore'] = (result[col] - group_mean) / (group_std + 1e-8)
        # Min-max position of the value within its group (0 = group min, 1 = group max)
        result[f'{col}_group_pct'] = (result[col] - group_min) / (group_max - group_min + 1e-8)
    return result
print("\n=== 6. AGGREGATION FEATURES (Manual) ===")
# Add a category for grouping
df['category'] = ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
df_agg = aggregation_features_manual(df, 'category', ['price'])
print("Aggregation features:")
agg_cols = [c for c in df_agg.columns if 'group' in c or 'vs_group' in c or '_zscore' in c or '_pct' in c]
print(df_agg[['category', 'price'] + agg_cols].head(8))
# 7. ROLLING/LAG FEATURES (Manual)
def rolling_features_manual(series, windows=(2, 3)):
    """
    Create rolling window statistics (for time series).
    Assumes the rows are already sorted in time order.
    """
    result = pd.DataFrame(index=series.index)
    for window in windows:
        result[f'{series.name}_roll_mean_{window}'] = series.rolling(window=window, min_periods=1).mean()
        # Standard deviation is NaN where the window holds a single observation
        result[f'{series.name}_roll_std_{window}'] = series.rolling(window=window, min_periods=1).std()
        result[f'{series.name}_roll_min_{window}'] = series.rolling(window=window, min_periods=1).min()
        result[f'{series.name}_roll_max_{window}'] = series.rolling(window=window, min_periods=1).max()
        result[f'{series.name}_diff_{window}'] = series.diff(window)
    return result
print("\n=== 7. ROLLING/LAG FEATURES (Manual) ===")
rolling_feats = rolling_features_manual(df['price'], windows=[2, 3])
df_rolling = pd.concat([df[['price']], rolling_feats], axis=1)
print("Rolling features for price:")
print(df_rolling.head())
# Feature count summary
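print("\n=== FEATURE COUNT SUMMARY ===")
# 'category' was added to df in step 6, after the other frames were built, so exclude it from the baseline count
print(f"Original features: {df.drop(columns=['category']).shape[1]}")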
print(f"After polynomial: {len(df_poly.columns)}")
print(f"After interactions: {len(df_interact.columns)}")
print(f"After domain features: {len(df_domain.columns)}")
Using Libraries (scikit-learn and category_encoders)
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, KBinsDiscretizer, FunctionTransformer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')
# Create sample data
np.random.seed(42)
data = {
'length': np.random.uniform(5, 25, 100),
'width': np.random.uniform(3, 12, 100),
'height': np.random.uniform(2, 8, 100),
'price': np.random.uniform(50, 500, 100),
'age': np.random.uniform(20, 60, 100),
'category': np.random.choice(['A', 'B', 'C'], 100)
}
df = pd.DataFrame(data)
print("Original Dataset:")
print(df.head())
print(f"\nShape: {df.shape}")
# 1. POLYNOMIAL FEATURES (sklearn)
print("\n" + "="*60)
print("1. POLYNOMIAL FEATURES (sklearn)")
print("="*60)
# Simple polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
df_numeric = df[['length', 'width', 'height']]
poly_features = poly.fit_transform(df_numeric)
feature_names = poly.get_feature_names_out(['length', 'width', 'height'])
df_poly = pd.DataFrame(poly_features, columns=feature_names)
print(f"Original numeric features: {df_numeric.shape[1]}")
print(f"Polynomial features (degree 2): {df_poly.shape[1]}")
print(f"\nFeature names: {list(feature_names)}")
print("\nFirst 3 rows:")
print(df_poly.head(3))
# Interaction only
poly_interact = PolynomialFeatures(degree=2, include_bias=False, interaction_only=True)
interact_features = poly_interact.fit_transform(df_numeric)
interact_names = poly_interact.get_feature_names_out(['length', 'width', 'height'])
print(f"\nInteraction-only features: {len(interact_names)}")
print(f"Names: {list(interact_names)}")
# 2. KBINS DISCRETIZER
print("\n" + "="*60)
print("2. KBINS DISCRETIZER")
print("="*60)
# Equal width binning
kbin_uniform = KBinsDiscretizer(n_bins=4, encode='ordinal', strategy='uniform')
age_binned = kbin_uniform.fit_transform(df[['age']])
print("Equal-width binning (uniform):")
print(f" Bin edges: {kbin_uniform.bin_edges_[0].round(2)}")
print(f" Sample bins: {age_binned[:10].flatten()}")
# Quantile binning (equal frequency)
kbin_quantile = KBinsDiscretizer(n_bins=4, encode='ordinal', strategy='quantile')
price_binned = kbin_quantile.fit_transform(df[['price']])
print("\nQuantile binning (equal frequency):")
print(f" Bin edges: {kbin_quantile.bin_edges_[0].round(2)}")
print(f" Bin counts: {pd.Series(price_binned.flatten()).value_counts().sort_index().values}")
# One-hot encoded bins
kbin_onehot = KBinsDiscretizer(n_bins=3, encode='onehot-dense', strategy='kmeans')
price_onehot = kbin_onehot.fit_transform(df[['price']])
print(f"\nOne-hot encoded bins: {price_onehot.shape[1]} columns")
# 3. FUNCTION TRANSFORMER (Custom transforms)
print("\n" + "="*60)
print("3. FUNCTION TRANSFORMER (Custom)")
print("="*60)
# Log transform
log_transformer = FunctionTransformer(
    func=np.log1p,  # log(1 + x) handles zeros
    inverse_func=np.expm1,
    validate=True
)
price_log = log_transformer.fit_transform(df[['price']])
print("Log transform applied to price")
print(f" Original range: [{df['price'].min():.1f}, {df['price'].max():.1f}]")
print(f" Log range: [{price_log.min():.2f}, {price_log.max():.2f}]")
# Reciprocal transform (defined for illustration; not applied below)
recip_transformer = FunctionTransformer(
    func=lambda x: 1 / (x + 1e-8),
    validate=True
)
# Box-Cox (requires positive values)
from sklearn.preprocessing import PowerTransformer
pt = PowerTransformer(method='box-cox', standardize=True)
price_positive = df[['price']] - df[['price']].min() + 1
price_boxcox = pt.fit_transform(price_positive)
print(f"\nBox-Cox transform lambda: {pt.lambdas_[0]:.4f}")
# 4. COLUMN TRANSFORMER (Different transforms per column)
print("\n" + "="*60)
print("4. COLUMN TRANSFORMER (Pipeline)")
print("="*60)
# Define different transformations for different columns
preprocessor = ColumnTransformer(
    transformers=[
        ('poly', PolynomialFeatures(degree=2, include_bias=False), ['length', 'width']),
        ('bins', KBinsDiscretizer(n_bins=3, encode='onehot-dense'), ['age']),
        ('log', FunctionTransformer(np.log1p), ['price']),
        ('pass', 'passthrough', ['height'])
    ],
    remainder='drop'
)
processed = preprocessor.fit_transform(df)
print(f"Input shape: {df.shape}")
print(f"Output shape: {processed.shape}")
# Get feature names
poly_features = preprocessor.named_transformers_['poly'].get_feature_names_out(['length', 'width'])
bin_features = [f'age_bin_{i}' for i in range(3)]
all_features = list(poly_features) + bin_features + ['price_log', 'height']
print(f"Output features ({len(all_features)}): {all_features}")
# 5. PIPELINE INTEGRATION
print("\n" + "="*60)
print("5. COMPLETE PIPELINE WITH FEATURE ENGINEERING")
print("="*60)
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Create target (synthetic)
df['target'] = (
df['length'] * df['width'] * df['height'] * 0.5 +
df['price'] * 0.3 +
np.random.normal(0, 10, 100)
)
# Split
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Pipeline 1: No feature engineering
pipeline_simple = Pipeline([
    ('model', RandomForestRegressor(n_estimators=50, random_state=42))
])
# Pipeline 2: With polynomial features
pipeline_poly = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('model', RandomForestRegressor(n_estimators=50, random_state=42))
])
# Pipeline 3: With custom features
def add_custom_features(X_df):
    X = X_df.copy()
    X['volume'] = X['length'] * X['width'] * X['height']
    X['aspect_ratio'] = X['length'] / (X['width'] + 1e-8)
    X['price_per_volume'] = X['price'] / (X['volume'] + 1e-8)
    return X
custom_transformer = FunctionTransformer(add_custom_features)
pipeline_custom = Pipeline([
    ('features', custom_transformer),
    ('model', RandomForestRegressor(n_estimators=50, random_state=42))
])
# Evaluate
for name, pipeline in [('Simple', pipeline_simple), ('Polynomial', pipeline_poly), ('Custom', pipeline_custom)]:
    pipeline.fit(X_train.select_dtypes(include=[np.number]), y_train)
    pred = pipeline.predict(X_test.select_dtypes(include=[np.number]))
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name}: RMSE = {rmse:.2f}")
# 6. ADVANCED: TARGET ENCODING WITH SMOOTHING
print("\n" + "="*60)
print("6. TARGET-AWARE FEATURE ENGINEERING")
print("="*60)
from category_encoders import TargetEncoder  # third-party package: pip install category_encoders
# Target encoding for categorical
te = TargetEncoder(smoothing=1.0)
X_train_encoded = X_train.copy()
X_test_encoded = X_test.copy()
X_train_encoded['category_encoded'] = te.fit_transform(X_train['category'], y_train)
X_test_encoded['category_encoded'] = te.transform(X_test['category'])
print("Category target encoding:")
for cat in X_train['category'].unique():
    mask = X_train['category'] == cat
    mean_target = y_train[mask].mean()
    encoded_val = X_train_encoded.loc[mask, 'category_encoded'].iloc[0]
    print(f" {cat}: mean_target={mean_target:.2f}, encoded={encoded_val:.2f}")
# 7. FEATURE IMPORTANCE ANALYSIS
print("\n" + "="*60)
print("7. FEATURE IMPORTANCE ANALYSIS")
print("="*60)
# Fit model with engineered features
X_engineered = add_custom_features(X.select_dtypes(include=[np.number]))
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_engineered, y)
# Get feature importance
importance = pd.DataFrame({
'feature': X_engineered.columns,
'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)
print("Feature importance (top 10):")
print(importance.head(10).to_string(index=False))
# 8. AUTOMATED FEATURE ENGINEERING (Featuretools-style)
print("\n" + "="*60)
print("8. AUTOMATED FEATURE GENERATION")
print("="*60)
def auto_generate_features(df, numeric_cols, max_degree=2, include_ratios=True, include_bins=True):
    """
    Simple automated feature generation, loosely in the spirit of tools such as Featuretools/AutoFeat.
    """
    result = df.copy()
    new_features = []
    # Powers, pairwise products and (optionally) ratios
    for i, col1 in enumerate(numeric_cols):
        for d in range(2, max_degree + 1):
            result[f'{col1}_pow{d}'] = result[col1] ** d
            new_features.append(f'{col1}_pow{d}')
        for col2 in numeric_cols[i+1:]:
            result[f'{col1}_mul_{col2}'] = result[col1] * result[col2]
            new_features.append(f'{col1}_mul_{col2}')
            if include_ratios:
                result[f'{col1}_div_{col2}'] = result[col1] / (result[col2] + 1e-8)
                new_features.append(f'{col1}_div_{col2}')
    # Log transforms (only for non-negative columns)
    for col in numeric_cols:
        if result[col].min() >= 0:
            result[f'{col}_log'] = np.log1p(result[col])
            new_features.append(f'{col}_log')
    # Quartile bins
    if include_bins:
        for col in numeric_cols:
            result[f'{col}_bin'] = pd.qcut(result[col], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
            new_features.append(f'{col}_bin')
    return result, new_features
df_auto, auto_features = auto_generate_features(
df[['length', 'width', 'price']],
['length', 'width', 'price'],
max_degree=2
)
print(f"Auto-generated {len(auto_features)} new features:")
print(auto_features[:10], "...")
print(f"\nTotal features: {len(df_auto.columns)}")
# 9. DIMENSIONALITY WARNING
print("\n" + "="*60)
print("9. DIMENSIONALITY CONSIDERATIONS")
print("="*60)
original_features = 5
poly_degree_2 = sum(1 for i in range(original_features) for j in range(i, original_features)) + original_features
poly_degree_3 = len(PolynomialFeatures(degree=3, include_bias=False).fit(np.zeros((1, original_features))).get_feature_names_out())
print(f"Original features: {original_features}")
print(f"Polynomial degree 2: {poly_degree_2} features")
print(f"Polynomial degree 3: {poly_degree_3} features")
print("\nWARNING: Feature explosion can cause overfitting!")
print("Use feature selection after engineering:")
print(" - Remove highly correlated features")
print(" - Use feature importance from tree models")
print(" - Apply L1 regularization for automatic selection")
print(" - Use PCA for dimensionality reduction")
# 10. BEST PRACTICES SUMMARY
print("\n" + "="*60)
print("10. BEST PRACTICES FOR FEATURE ENGINEERING")
print("="*60)
best_practices = {
'Practice': [
'Domain Knowledge',
'Start Simple',
'Validate Each Feature',
'Avoid Data Leakage',
'Handle Outliers',
'Scale After Engineering',
'Document Transformations',
'Cross-Validate'
],
'Description': [
'Consult experts for meaningful feature creation',
'Begin with basic interactions before complex transforms',
'Check correlation with target and feature importance',
'Compute target-based statistics on training data only',
'Winsorize or use robust scaling for extreme values',
'Apply scaling after polynomial/log transforms',
'Record all transformations for production',
'Use CV to assess true impact of new features'
]
}
print(pd.DataFrame(best_practices).to_string(index=False))
When to Use
✅ Appropriate Use Cases:
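- Polynomial features: When non-linear relationships are suspected and the model is linear in its inputs (e.g., polynomial regression), or as an explicit alternative to an SVM with a polynomial kernel
- Interaction features: When combined effects are expected (e.g., price × quality), especially for linear models, which cannot learn interactions on their own the way tree models can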
- Binning: When non-linear threshold effects exist, for handling outliers, creating interpretable ranges
- Log transforms: For right-skewed distributions, multiplicative relationships, heteroscedastic data
- Domain features: Always use expert knowledge—often the highest value features
- Rolling features: For time series, capturing trends and seasonality patterns
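To make the log-transform guideline concrete, the sketch below compares sample skewness before and after `log1p` on synthetic right-skewed data. The log-normal sample is an assumption standing in for a real skewed column such as income or price.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed values (log-normal), standing in for a real skewed column
values = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000), name="value")

skew_before = values.skew()           # sample skewness of the raw values
skew_after = np.log1p(values).skew()  # sample skewness after log(1 + x)

print(f"Skewness before log1p: {skew_before:.2f}")
print(f"Skewness after log1p:  {skew_after:.2f}")
```

A skewness closer to zero after the transform suggests the transform is a reasonable candidate; whether it actually helps the model still has to be checked with validation.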
❌ Avoid When:
- Avoid high-degree polynomials with limited data—causes overfitting and numerical instability
- Don't create interactions for every pair—explodes dimensionality and creates multicollinearity
- Avoid binning when precise continuous values matter—loss of information
- Don't engineer features using test set information—causes data leakage
- Avoid complex features when simple ones perform equally—parsimony principle
- Don't transform features without understanding the distribution—wrong transforms hurt performance
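The warning about high-degree polynomials can be quantified. A full polynomial expansion of n features up to degree d has C(n + d, d) - 1 terms when the bias column is excluded; the helper name below is hypothetical and the feature counts are only examples.

```python
from math import comb


def n_polynomial_features(n_features: int, degree: int) -> int:
    """Count monomials of total degree 1..degree in n_features variables (no bias term)."""
    return comb(n_features + degree, degree) - 1


for n, d in [(5, 2), (5, 3), (50, 3)]:
    print(f"{n} features, degree {d}: {n_polynomial_features(n, d)} terms")
```

Five features already grow to 20 terms at degree 2 and 55 at degree 3, and fifty features at degree 3 produce 23425 terms, which is why expansion is usually paired with regularization or feature selection.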
Common Pitfalls
- Data leakage through target-based features—always compute statistics on training data only
- Feature explosion causing overfitting—limit polynomial degree, use regularization
- Multicollinearity from redundant features—check VIF, remove highly correlated features
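- Not handling division by zero: guard denominators, for example by adding a small epsilon (1e-8); note that a zero denominator then produces a very large value, so mapping such cases to NaN or a sentinel may be safer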
- Forgetting inverse transforms for predictions—needed when target was transformed
- Not validating feature importance—some engineered features add noise, not signal
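The first pitfall, leakage through target-based features, can be illustrated with a minimal sketch in plain pandas. The tiny train and test frames are made up; the point is only that the per-category means and the fallback value come from the training rows alone.

```python
import pandas as pd

# Hypothetical training data with a target, and test data without one
train = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "target": [10.0, 20.0, 30.0, 40.0, 50.0],
})
test = pd.DataFrame({"category": ["A", "B", "C"]})

# Statistics are computed on the training split only
global_mean = train["target"].mean()
category_means = train.groupby("category")["target"].mean()

# Apply the learned mapping to the test split; unseen categories fall back to the global mean
test["category_te"] = test["category"].map(category_means).fillna(global_mean)

print(test)
```

For the training rows themselves, an out-of-fold scheme (each row encoded with means computed from the other folds) is a common way to avoid a row seeing its own target value.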