Preprocessing

Master data cleaning, encoding, scaling, feature engineering, and handling imbalanced data.

9 Topics

Datetime Feature Engineering: Extraction, Encoding, and Cyclical Features

Intermediate

Datetime feature engineering transforms temporal data into numerical representations that machine learning models can process. Raw timestamps contain rich…

1 prereq 3 related ~13 min read

Encoding Categorical Variables: Label, One-Hot, Target, and Ordinal Encoding

Beginner

Categorical variable encoding is the process of converting qualitative data (categories, labels, or discrete values) into numerical representations that…

3 related ~11 min read

Feature Engineering: Polynomials, Interactions, Binning, and Domain Features

Advanced

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to predictive models, resulting in…

2 prereqs 3 related ~13 min read

Feature Scaling: Standardization, Normalization, and Robust Scaling

Beginner

Feature scaling is the process of transforming numeric features to a common scale without distorting differences in the ranges of values. Machine learning…

1 prereq 3 related ~11 min read

Imbalanced Data: SMOTE, ADASYN, and Class Weights

Advanced

Class imbalance occurs when the distribution of target classes in a classification dataset is significantly skewed, with some classes having many more samples…

2 prereqs 3 related ~13 min read

Imputation Strategies: From Simple to Advanced Techniques

Intermediate

Imputation is the process of replacing missing data with substituted values. Unlike deletion methods that discard incomplete observations, imputation preserves…

1 prereq 3 related ~10 min read

Missing Values: Detection, Patterns, and Handling Strategies

Beginner

Missing values are data points that are absent, unknown, or unrecorded in a dataset. They appear as NULL, NaN (Not a Number), empty strings, or special codes…

3 related ~7 min read

Outlier Handling: Detection Methods and Treatment Strategies

Intermediate

Outliers are data points that deviate significantly from other observations in a dataset. They can arise from measurement errors, data entry mistakes, natural…

2 prereqs 3 related ~12 min read

Text Preprocessing: Cleaning, Tokenization, and Normalization

Intermediate

Text preprocessing transforms unstructured natural language into structured, machine-readable formats suitable for analysis and modeling. Raw text contains…

1 prereq 3 related ~14 min read