VIII · Statistics & inference
Power & sample size
What it is
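Power = 1 − β: the probability of correctly rejecting the null hypothesis when the alternative is true, where β is the Type II (false-negative) error rate. Detecting a small effect requires a large sample: the required n grows roughly as 1/δ² in the effect size δ, so halving the effect you want to detect roughly quadruples the sample.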
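A minimal sketch of what power means in practice. It assumes a two-sided two-proportion z-test (for example, conversion rates in two arms), the normal approximation, and unpooled variance. `power_two_proportions` is a hypothetical helper, not a library function; it relies only on the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_proportions(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided two-proportion z-test.

    Assumes n users per arm, the normal approximation, and unpooled variance.
    Ignores the negligible chance of rejecting in the wrong tail.
    """
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)  # std. error of p2 - p1
    z_crit = Z.inv_cdf(1 - alpha / 2)                # about 1.96 for alpha = 0.05
    return Z.cdf(abs(p2 - p1) / se - z_crit)         # P(reject H0 | H1 true)


# Same 10% -> 11% effect, growing sample per arm.
for n in (1_000, 4_000, 16_000):
    print(n, round(power_two_proportions(0.10, 0.11, n), 2))
```

Under these assumptions, power for a 10% → 11% lift climbs from roughly 0.1 at 1,000 users per arm to just above 0.8 at 16,000.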
Where it lives
How long does an A/B test need to run? How many requests does a load test need to bound p99?
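Both questions reduce to small calculations. This is a sketch under stated assumptions. The hypothetical `days_to_run` assumes traffic is split evenly across arms and is steady from day to day. The hypothetical `requests_to_bound_quantile` assumes latencies are i.i.d. draws from a continuous distribution, and it uses the simplest distribution-free bound: the sample maximum. Each request falls below the true p99 with probability 0.99, so all n requests fall below it with probability 0.99ⁿ. The maximum is therefore an upper confidence bound on p99 once 1 − 0.99ⁿ reaches the chosen confidence level.

```python
import math


def days_to_run(n_per_arm: int, arms: int, daily_users: int) -> int:
    """Days until every arm reaches n_per_arm users (assumes an even, steady split)."""
    return math.ceil(n_per_arm * arms / daily_users)


def requests_to_bound_quantile(q: float = 0.99, confidence: float = 0.95) -> int:
    """Smallest n such that the max of n i.i.d. continuous samples exceeds the
    true q-quantile with probability >= confidence.

    P(all n samples < q-quantile) = q**n, so we need 1 - q**n >= confidence,
    i.e. n >= log(1 - confidence) / log(q).
    """
    return math.ceil(math.log(1 - confidence) / math.log(q))


print(days_to_run(14_748, arms=2, daily_users=5_000))  # 6 (traffic figure is made up)
print(requests_to_bound_quantile(0.99, 0.95))          # 299
```

With these assumptions, 299 requests make the slowest observed request a 95% upper bound on p99. Collecting more requests lets you use a lower order statistic, such as the 2nd or 3rd slowest, for a tighter bound.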
The key insight
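For an absolute lift of 1 percentage point on a 10% baseline (10% → 11%), at α = 0.05 (two-sided) and power = 0.8, you need roughly 15,000 samples per arm. If "1%" means a relative lift (10% → 10.1%), the requirement rises to about 1.4 million per arm. Many A/B tests are stopped well before reaching their required sample size. That leaves them underpowered, so they often miss real effects of the size they were designed to detect.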
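The per-arm figure comes from a common sample-size formula for comparing two proportions: n = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)². Below is a minimal Python sketch assuming the normal approximation with unpooled variance. `n_per_arm` is a hypothetical name. Dedicated tools, such as the power classes in statsmodels, may use slightly different variance formulas and so give slightly different answers.

```python
import math
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per arm for a two-sided two-proportion z-test.

    Normal approximation, unpooled variance, equal allocation to both arms.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for power = 0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


print(n_per_arm(0.10, 0.11))   # absolute lift: 10% -> 11%
print(n_per_arm(0.10, 0.101))  # relative 1% lift: 10% -> 10.1%
```

The absolute case comes out at about 14,750 per arm, in line with the rounded figure quoted above. The relative case comes out at about 1.4 million, which shows how sharply the 1/δ² scaling bites.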