Exercises

sitar delta blues, dirty south balkan brass band, ambient dub techno, liquid drum and bass bluegrass · 4:31


Lyrics

[Verse 1]
Two neural architectures on my bench tonight
Same dataset flowing through their hidden layers
Linear probe connecting what they learned inside
Testing if one network predicts the other's flavors
Architecture alpha builds its feature space
While beta constructs representations differently
Can I map between them with a linear trace?
Or do they encode knowledge independently?
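The question this verse poses, whether one network's activations can be predicted from the other's by a linear map, can be tried on toy data. The sketch below is only an illustration under stated assumptions: the activation matrices are synthetic stand-ins (rows are inputs, columns are units), the helper name `linear_map_r2` is hypothetical, and the score is an in-sample R² rather than a held-out one.

```python
import numpy as np

def linear_map_r2(A, B):
    """Fit B ~ A @ W by least squares; return the in-sample fraction of B's variance explained."""
    A_c = A - A.mean(axis=0)          # center each feature
    B_c = B - B.mean(axis=0)
    W, *_ = np.linalg.lstsq(A_c, B_c, rcond=None)
    resid = B_c - A_c @ W
    return 1.0 - (resid ** 2).sum() / (B_c ** 2).sum()

rng = np.random.default_rng(0)

# Synthetic "activations" for 500 inputs (assumed, not from real networks).
acts_alpha = rng.normal(size=(500, 16))

# Case 1: beta is an exact linear function of alpha, so a linear map should succeed.
acts_beta_linear = acts_alpha @ rng.normal(size=(16, 8))

# Case 2: beta is statistically independent of alpha, so a linear map should fail.
acts_beta_indep = rng.normal(size=(500, 8))

r2_linear = linear_map_r2(acts_alpha, acts_beta_linear)
r2_indep = linear_map_r2(acts_alpha, acts_beta_indep)
print(f"linear relation: R^2 = {r2_linear:.3f}")
print(f"independent:     R^2 = {r2_indep:.3f}")
```

With real networks the fit would normally be scored on held-out inputs, since an in-sample R² rises with the number of source units even when nothing is shared.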

[Chorus]
Linear probing, CKA measuring
Representational similarity
Networks learning, patterns turning
But can they speak the same language?
Alignment searching, kernels working
Centered correlation's the key
Some representations can't be bridged
That's the mathematics we need to see

[Verse 2]
Centered Kernel Alignment takes the stage
Computing similarities across the divide
Normalize the features, center every page
Then calculate how representations coincide
Gram matrices dancing in the kernel space
Frobenius inner products tell the tale
High CKA means networks share their grace
Low scores reveal where alignment starts to fail
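The recipe in this verse (center the features, form Gram matrices, take a normalized Frobenius inner product) corresponds to linear CKA. The sketch below uses the equivalent feature-space form, which avoids building the n-by-n Gram matrices. The function name `linear_cka` and the random test data are assumptions made for illustration.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n x p) and Y (n x q) over the same n inputs."""
    X = X - X.mean(axis=0)            # center every feature column
    Y = Y - Y.mean(axis=0)
    # <K, L>_F for linear Gram matrices K = X X^T, L = Y Y^T equals ||Y^T X||_F^2.
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    # ||X X^T||_F equals ||X^T X||_F, and likewise for Y.
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# A rotated and rescaled copy of X: linear CKA is invariant to both.
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
cka_rotated = linear_cka(X, 3.0 * X @ Q)

# An independent representation: the score should be close to zero.
cka_indep = linear_cka(X, rng.normal(size=(200, 10)))

print(f"rotated copy: CKA = {cka_rotated:.3f}")
print(f"independent:  CKA = {cka_indep:.3f}")
```

Because the score ignores orthogonal transforms and isotropic scaling, a low value points to a difference that goes beyond a change of basis.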

[Chorus]
Linear probing, CKA measuring
Representational similarity
Networks learning, patterns turning
But can they speak the same language?
Alignment searching, kernels working
Centered correlation's the key
Some representations can't be bridged
That's the mathematics we need to see

[Bridge]
Now construct two networks that solve the same task
But their representations cannot be aligned
What conditions make this paradox last?
Different dimensional manifolds intertwined
Nonlinear transformations twist the space
Orthogonal subspaces living separate lives
When linear maps cannot find their place
That's when unalignment truly thrives

[Verse 3]
Provably unalignable, what must be true?
The feature spaces live in different worlds
Maybe one network learned a nonlinear view
While the other's representations are unfurled
Task performance identical on the surface
But internal geometry tells another story
Linear probes will fail to bridge the purpose
Different paths can lead to the same glory

[Verse 4]
Random initialization sets the course
Stochastic gradients carve their unique paths
Each network follows its own driving force
Creating representations that never match
Weight decay and dropout add their noise
Different architectures bend the solution space
Every hyperparameter makes its choice
Leading networks to their separate place

[Chorus]
Linear probing, CKA measuring
Representational similarity
Networks learning, patterns turning
But can they speak the same language?
Alignment searching, kernels working
Centered correlation's the key
Some representations can't be bridged
That's the mathematics we need to see

[Outro]
Train your networks, test the bridges
Mathematics reveals the hidden truth
Some connections live on distant ridges
That's computational learning proof
