[Verse 1]
In the cortex where neurons dance and weave
Query vectors search through memory's maze
Keys unlock the patterns we perceive
Values hold the knowledge in their gaze
Three matrices transform the signal flow
Attention maps what matters most below
[Chorus]
Query finds the key that fits just right
Values weighted by their relevance
Soft attention spreads across the night
Hard attention makes one choice intense
Set to set transformations rearrange
Self-attention orchestrates the change
[Verse 2]
Soft attention blurs the boundaries wide
Every element gets a gentle vote
Weights that sum to one from side to side
Like a choir where every voice can float
Hard attention cuts through with a blade
Winner takes all in the choice parade
[Verse 3]
Self-attention transforms every row
Input set becomes output through design
No external guidance needs to show
Internal structure learns to realign
Each position talks to every other
Contextual meaning they discover
[Chorus]
Query finds the key that fits just right
Values weighted by their relevance
Soft attention spreads across the night
Hard attention makes one choice intense
Set to set transformations rearrange
Self-attention orchestrates the change
[Bridge]
Kernel methods hide beneath the hood
Distance functions wrapped in fancy dress
Similarity in the neighborhood
Gaussian weights express what we assess
Dictionary lookup made differentiable
Gradient flows make learning feasible
[Verse 4]
Keys are addresses in memory space
Queries search through organized terrain
Values stored with computational grace
Backpropagation flows through every vein
Attention weights learn what to emphasize
Shaped by the loss that training optimizes
[Verse 5]
Multi-head attention divides to conquer all
Different subspaces for different tasks
Parallel processing answers the call
Representation learning never asks
Concatenate and project back to one
Diverse perspectives merge when it's done
[Chorus]
Query finds the key that fits just right
Values weighted by their relevance
Soft attention spreads across the night
Hard attention makes one choice intense
Set to set transformations rearrange
Self-attention orchestrates the change
[Outro]
Cortical columns compute in parallel streams
Attention mechanisms fulfill our dreams