Machine Learning - ML Guide

Clustering: K-means, Hierarchical, and DBSCAN Comparison

Intermediate

Clustering is a fundamental task in unsupervised learning: grouping data points into clusters whose members are more similar to each other than to those in other…

2 prereqs 3 related ~8 min read

Decision Trees: Splitting Criteria, Pruning, and Interpretability

Beginner

Decision Trees are non-parametric supervised learning algorithms that model decisions through a hierarchical structure of internal nodes (feature tests),…

1 prereq 3 related ~6 min read

Hyperparameter Tuning: Grid Search, Random Search, and Bayesian Optimization

Advanced

Hyperparameter tuning is the process of optimizing the configuration parameters that control how a model learns but are not themselves learned from data. Unlike model…

3 prereqs 3 related ~9 min read

K-Nearest Neighbors: Distance Metrics, Choosing K, and Curse of Dimensionality

Beginner

K-Nearest Neighbors (KNN) is a non-parametric, instance-based learning algorithm that classifies or predicts based on the k closest training examples in…

1 prereq 3 related ~7 min read

Linear Regression: Theory, Assumptions, and Diagnostics

Beginner

Linear Regression is a foundational algorithm of statistical modeling and machine learning, establishing a linear relationship between a dependent variable…

1 prereq 3 related ~5 min read

Logistic Regression: Classification, Sigmoid, and Odds Ratios

Beginner

Logistic Regression is a foundational classification algorithm in machine learning, even though its name suggests regression. It models the probability that a…

2 prereqs 3 related ~6 min read

Machine Learning Overview: From Supervised to Unsupervised Learning

Beginner

Machine Learning (ML) is a subset of Artificial Intelligence that enables systems to learn patterns from data and improve performance on specific tasks through…

4 related ~4 min read

Model Evaluation: Metrics, Cross-Validation, and ROC-AUC

Intermediate

Model evaluation is the cornerstone of machine learning practice, providing rigorous methods to assess how well models generalize to unseen data. The…

2 prereqs 3 related ~8 min read

Naive Bayes: Bayes Theorem, Conditional Independence, and Text Classification

Beginner

Naive Bayes is a family of probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between features. Despite…

1 prereq 3 related ~7 min read

Principal Component Analysis: Dimensionality Reduction and Variance Explained

Intermediate

Principal Component Analysis (PCA) is a fundamental unsupervised learning technique for dimensionality reduction that transforms high-dimensional data into a…

2 prereqs 3 related ~6 min read

Random Forest: Ensembling, Bagging, and Feature Importance

Intermediate

Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the mode of their predictions (classification)…

2 prereqs 3 related ~6 min read

Support Vector Machines: Margin Maximization and Kernel Methods

Intermediate

Support Vector Machines (SVMs) are supervised learning models that find the optimal hyperplane to separate classes by maximizing the margin, the distance…

2 prereqs 3 related ~6 min read

XGBoost: Gradient Boosting, Parameters, and When to Use

Advanced

XGBoost (eXtreme Gradient Boosting) is an optimized, distributed gradient boosting library that has dominated structured-data machine learning competitions…

3 prereqs 3 related ~8 min read