VIII · Statistics & inference

Power & sample size

What it is

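Power = 1 − β = the probability of correctly rejecting the null when the alternative is true, where β is the Type II (false-negative) error rate. Detecting a small effect requires a large sample: the required n grows roughly as 1/δ², so halving the detectable effect δ quadruples the sample.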

Where it lives

How long does an A/B test need to run? How many requests does a load test need to bound p99?

The key insight

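For a 1-percentage-point absolute lift (10% to 11%) at α = 0.05 (two-sided) and power = 0.8, the normal approximation gives roughly 15,000 samples per arm. A 1% relative lift (10% to 10.1%) needs about 100 times as many. Most A/B tests are stopped too early.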
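
The arithmetic behind that figure can be sketched as below. This is a minimal sketch and not a definitive calculator. It assumes a two-sided two-proportion z-test with the unpooled normal approximation, n ≈ (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / δ², and it reads the effect as an absolute lift. The function name `samples_per_arm` and its parameters are illustrative, not from any library.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_base, lift_abs, alpha=0.05, power=0.80):
    """Approximate per-arm n for a two-sided two-proportion z-test.

    Assumption: unpooled normal approximation, no continuity correction.
    p_base   -- baseline conversion rate (e.g. 0.10)
    lift_abs -- absolute effect to detect (e.g. 0.01 for 10% -> 11%)
    """
    p_treat = p_base + lift_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 at power = 0.80
    variance_sum = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / lift_abs ** 2)


n_1pp = samples_per_arm(0.10, 0.01)      # detect 10% -> 11%
n_half_pp = samples_per_arm(0.10, 0.005)  # detect 10% -> 10.5%
print(n_1pp, n_half_pp, round(n_half_pp / n_1pp, 2))
```

The first result is of the same order as the per-arm figure quoted above, and the second is about four times larger, which is the 1/δ² scaling at work. Dividing the per-arm n by the daily traffic per arm gives a rough lower bound on how long the test has to run.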