[Verse 1]
Sarah's app has two new buttons, red and blue
Half her users see the red, the other half the blue
Blue button gets more purchases, but wait before you choose
Sample size is everything, don't rush to conclusions

[Chorus]
A equals this, B equals that
Split your traffic, watch the stats
P-value under point-oh-five
Significance comes alive
Random groups, equal chance
Statistical confidence dance
Test one change, hold the rest
A/B testing at its best

[Verse 2]
Control group sees the old design, treatment gets the new
Random assignment is the key, no bias bleeding through
Calculate your sample size before you start the race
Power analysis tells you how many users you'll need in place

[Chorus]
A equals this, B equals that
Split your traffic, watch the stats
P-value under point-oh-five
Significance comes alive
Random groups, equal chance
Statistical confidence dance
Test one change, hold the rest
A/B testing at its best

[Bridge]
Conversion rates and confidence intervals
False positives grow with every extra trial
Chi-squared tests and t-test calculations
Guard against hasty declarations

[Verse 3]
Week one shows blue is winning, but don't celebrate too soon
Weekday swings and holidays can pull the data out of tune
Let it run until you reach the sample size, then make the call
When there's no real difference, one test in twenty fools us all

[Chorus]
A equals this, B equals that
Split your traffic, watch the stats
P-value under point-oh-five
Significance comes alive
Random groups, equal chance
Statistical confidence dance
Test one change, hold the rest
A/B testing at its best

[Outro]
Hypothesis before the test
Pick your metric, control the rest
When the final p-value's below
Point-oh-five, you've got a result to show
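For readers who want to see the math behind the song, here is a minimal sketch of two steps it sings about: sizing the test before it starts (power analysis) and checking the conversion-rate difference against the 0.05 threshold once the planned sample is in. It assumes binary conversions, uses the normal approximation for two proportions, and defaults to 80% power, which is a common convention and not something the song specifies. The function names and all the numbers (Sarah's red button converting at 10%, blue at 12%, the conversion counts) are hypothetical, chosen only for illustration. The pooled two-proportion z-test shown here gives the same p-value as a 2x2 chi-squared test without continuity correction.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: users needed in EACH group to detect
    p_control -> p_treatment with a two-sided test (normal approximation)."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to size a test")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2           # average rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Hypothetical helper: two-sided p-value of a pooled two-proportion z-test.
    Without continuity correction, z**2 equals the 2x2 chi-squared statistic."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all-or-nothing conversions: the test is undefined")
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical plan: red converts at 10%, and we want to detect blue at 12%.
n = sample_size_per_group(0.10, 0.12)
print(n)  # users needed per button before looking at the result

# Hypothetical results after the planned run: 400/4000 on red, 480/4000 on blue.
p = two_proportion_p_value(400, 4000, 480, 4000)
print(p < 0.05)  # True
```

Note the order: the sample size is fixed first and the p-value is computed once at the end. Checking the p-value repeatedly while the test runs and stopping at the first dip below 0.05 inflates the false-positive rate, which is exactly the "don't celebrate too soon" warning in the third verse. For continuous metrics such as revenue per user, a t-test would replace the proportion test.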