[Verse 1]
Sequential whispers through the neural maze
Vanilla networks stumble, memories fade
Gradients vanish like morning mist
Each timestep loses what came before this
Simple recurrence can't hold the thread
When context stretches too far ahead

[Chorus]
Gates and memories, LSTM's key
Long Short-Term Memory sets sequences free
Forget gate filters, input gate decides
Output gate controls what memory provides
GRU streamlines with update and reset
Sequential patterns we'll never forget

[Verse 2]
Hidden states carry forward what they've learned
But vanilla RNNs get their signals burned
Sigmoid squashing shrinks the gradient small
Backprop through time hits a concrete wall
Cell states rescue long dependencies
Gated architecture holds the keys

[Chorus]
Gates and memories, LSTM's key
Long Short-Term Memory sets sequences free
Forget gate filters, input gate decides
Output gate controls what memory provides
GRU streamlines with update and reset
Sequential patterns we'll never forget

[Bridge]
Encoder-decoder bridges the gap
Sequence to sequence, no context trap
Bahdanau aligns with additive score
Luong attention gives us so much more
When context matters across the span
Attention mechanisms understand

[Verse 3]
Mamba emerges with structured state
Linear scaling at a faster rate
Transformers parallel but memory-bound
Sequential models still hold their ground
Time series, speech, and language flows
Choose your weapon as the sequence grows

[Outro]
From RNN's simple recursive call
To attention spanning sequences all
Model the temporal, capture the trend
Sequential learning has no end
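The chorus names LSTM's three gates (forget, input, output) and the cell state that carries long-range dependencies. Below is a minimal NumPy sketch of a single LSTM timestep for illustration only; the function name `lstm_step`, the stacked parameters `W`, `U`, `b`, and the toy sizes are assumptions, not anything specified by the lyrics.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM timestep: forget, input, and output gates regulate the cell state.

    W, U, b stack the parameters for the four internal transforms
    (forget gate, input gate, candidate cell, output gate), each of width `hidden`.
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # stacked pre-activations, shape (4 * hidden,)
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: filters the old cell state
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: decides what new info to write
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values to add to the cell
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: controls what memory provides
    c_t = f * c_prev + i * g               # cell state carries the long dependencies
    h_t = o * np.tanh(c_t)                 # hidden state passed on to the next timestep
    return h_t, c_t

# Toy usage: random parameters, one timestep of an 8-dim input with a 16-dim hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```

A GRU follows the same pattern but, as the chorus notes, streamlines it: update and reset gates replace the three LSTM gates, and the cell state is folded into the hidden state.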