Deep Learning
Neural networks, CNNs, RNNs, transformers, and modern architectural components such as RoPE and SwiGLU.
Backpropagation: The Chain Rule in Neural Networks
Advanced: Backpropagation is the cornerstone algorithm for training neural networks, efficiently computing gradients of the loss function with respect to all weights in…
CNN Design Patterns: From VGG to ResNet
Advanced: CNN design patterns are architectural principles and proven strategies for constructing effective convolutional neural networks. These patterns have evolved…
Chain-of-Thought: Reasoning in Large Language Models
Advanced: Chain-of-Thought (CoT) prompting is a technique that enables Large Language Models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning…
Convolutional Neural Networks: Architecture and Components
Intermediate: Convolutional Neural Networks (CNNs) are specialized deep learning architectures designed to automatically learn spatial hierarchies of features from grid-like…
HiRoPE: Hierarchical Rotary Position Embedding
Advanced: Hierarchical Rotary Position Embedding (HiRoPE) is an extension of RoPE designed specifically for modeling extremely long sequences beyond the capabilities of…
Neural Networks: From Perceptron to Deep Networks
Intermediate: Neural networks are computational models inspired by biological neural systems, consisting of interconnected nodes (neurons) organized in layers. The…
RMSNorm: Root Mean Square Layer Normalization
Advanced: Root Mean Square Layer Normalization (RMSNorm), introduced by Zhang and Sennrich in 2019, is a simplification of Layer Normalization that removes the…
RoPE: Rotary Position Embedding
Advanced: Rotary Position Embedding (RoPE), introduced by Su et al. in 2021, is a novel position encoding method for Transformers that encodes absolute position through…
Sequential Modeling: RNN, LSTM, and GRU
Advanced: Recurrent Neural Networks (RNNs) are neural architectures designed to process sequential data by maintaining a hidden state that captures information from…
SwiGLU: Swish-Gated Linear Unit
Advanced: SwiGLU (Swish-Gated Linear Unit) is a modern activation function introduced by Shazeer in the GLU Variants paper (2020) and popularized by the PaLM and Llama…
Training Stability in Deep Neural Networks
Advanced: Training stability in deep neural networks refers to the ability to train models without numerical instabilities, divergence, or degradation as depth and scale…
Transformer Architecture: Attention Is All You Need
Advanced: The Transformer architecture, introduced by Vaswani et al. in the seminal 2017 paper "Attention Is All You Need", revolutionized deep learning by replacing…