[Verse 1]
First we clean the messy text that comes our way
Tokenize to split the words, that's how we play
Strip the punctuation marks and make it neat
Stemming cuts the endings off, lemmatization's sweet
Stop words like "the" and "and" just get in our way
Filter them out to let the meaning stay

[Chorus]
From tokens to vectors, that's the NLP way
Preprocess and represent what the data wants to say
Classical foundations before the neural age
TF-IDF and Word2Vec set the stage
Bag of words and n-grams pave the road
To understanding how language gets decoded

[Verse 2]
Bag of Words just counts them up, no order here
Term frequency shows what matters crystal clear
Inverse document frequency weighs the rare
TF-IDF combines them with mathematical care
N-grams capture sequences, two or three in line
Context starts emerging from this simple design

[Chorus]
From tokens to vectors, that's the NLP way
Preprocess and represent what the data wants to say
Classical foundations before the neural age
TF-IDF and Word2Vec set the stage
Bag of words and n-grams pave the road
To understanding how language gets decoded

[Verse 3]
Word2Vec learns embeddings in dimensional space
Similar words cluster close in the same place
GloVe takes global statistics, FastText handles parts
Each algorithm capturing semantic arts
Named entities get tagged, parts of speech assigned
Dependency parsing shows how words are intertwined

[Bridge]
Classical methods hit the wall
Sparse vectors don't scale at all
But they teach us what we need to know
Before the neural networks grow
BLEU and ROUGE measure what we've done
Perplexity shows when models come undone

[Chorus]
From tokens to vectors, that's the NLP way
Preprocess and represent what the data wants to say
Classical foundations before the neural age
TF-IDF and Word2Vec set the stage
Bag of words and n-grams pave the road
To understanding how language gets decoded

[Outro]
Pipeline built from start to end
Classical NLP, our faithful friend
Sets the stage for what's ahead
Neural networks soon instead