Deep Learning - ML Guide

Backpropagation: The Chain Rule in Neural Networks

Advanced

Backpropagation is the cornerstone algorithm for training neural networks, efficiently computing gradients of the loss function with respect to all weights in…

3 prereqs 3 related ~6 min read

CNN Design Patterns: From VGG to ResNet

Advanced

CNN design patterns are architectural principles and proven strategies for constructing effective convolutional neural networks. These patterns have evolved…

3 prereqs 3 related ~7 min read

Chain-of-Thought: Reasoning in Large Language Models

Advanced

Chain-of-Thought (CoT) prompting is a technique that enables large language models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning…

3 prereqs 3 related ~11 min read

Convolutional Neural Networks: Architecture and Components

Intermediate

Convolutional Neural Networks (CNNs) are specialized deep learning architectures designed to automatically learn spatial hierarchies of features from grid-like…

3 prereqs 3 related ~7 min read

HiRoPE: Hierarchical Rotary Position Embedding

Advanced

Hierarchical Rotary Position Embedding (HiRoPE) is an extension of RoPE designed specifically for modeling extremely long sequences beyond the capabilities of…

3 prereqs 3 related ~10 min read

Neural Networks: From Perceptron to Deep Networks

Intermediate

Neural networks are computational models inspired by biological neural systems, consisting of interconnected nodes (neurons) organized in layers. The…

3 prereqs 3 related ~5 min read

RMSNorm: Root Mean Square Layer Normalization

Advanced

Root Mean Square Layer Normalization (RMSNorm), introduced by Zhang and Sennrich in 2019, is a simplification of Layer Normalization that removes the…

3 prereqs 3 related ~8 min read

RoPE: Rotary Position Embedding

Advanced

Rotary Position Embedding (RoPE), introduced by Su et al. in 2021, is a novel position encoding method for Transformers that encodes absolute position through…

3 prereqs 3 related ~9 min read

Sequential Modeling: RNN, LSTM, and GRU

Advanced

Recurrent Neural Networks (RNNs) are neural architectures designed to process sequential data by maintaining a hidden state that captures information from…

3 prereqs 3 related ~9 min read

SwiGLU: Swish-Gated Linear Unit

Advanced

SwiGLU (Swish-Gated Linear Unit) is a modern activation function introduced by Shazeer in the GLU Variants paper (2020) and popularized by the PaLM and Llama…

3 prereqs 3 related ~11 min read

Training Stability in Deep Neural Networks

Advanced

Training stability in deep neural networks refers to the ability to train models without numerical instabilities, divergence, or degradation as depth and scale…

4 prereqs 4 related ~10 min read

Transformer Architecture: Attention Is All You Need

Advanced

The Transformer architecture, introduced by Vaswani et al. in the seminal 2017 paper 'Attention Is All You Need', revolutionized deep learning by replacing…

3 prereqs 4 related ~8 min read