I · Information theory
KL divergence
What it is
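D_KL(P‖Q) = Σ_x p(x) log(p(x)/q(x)). With log base 2, it is the expected number of "extra bits" per symbol paid for encoding samples from P with a code optimized for Q instead of one optimized for P.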
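A minimal sketch of the formula, assuming discrete distributions given as equal-length Python lists of probabilities over the same outcomes, and base-2 logs so the result is in bits. The function name kl_divergence is illustrative, not taken from any library.

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) in bits for discrete distributions p and q (lists of probabilities).

    Terms with p(x) = 0 contribute 0 (convention 0 * log 0 = 0).
    If q(x) = 0 where p(x) > 0, the divergence is infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # outcome never occurs under P, so it costs nothing
        if qx == 0:
            return math.inf  # Q's code has no codeword for an outcome P produces
        total += px * math.log2(px / qx)
    return total

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(kl_divergence(p, q))  # → 0.25
```

Here encoding samples from P with a code built for Q costs a quarter of a bit per symbol more, on average, than P's own optimal code.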
Where it lives
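Cross-entropy loss in ML (H(P, Q) = H(P) + D_KL(P‖Q), and H(P) does not depend on Q, so minimizing cross-entropy over Q minimizes the KL divergence), variational inference, t-SNE.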
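The cross-entropy link can be checked numerically. This is a sketch under stated assumptions: discrete distributions as Python lists, base-2 logs (ML libraries usually use natural logs, but the identity holds in any base), and q(x) > 0 wherever p(x) > 0. The helper names entropy, cross_entropy, and kl_divergence are hypothetical, chosen for illustration.

```python
import math

# Assumption: q(x) > 0 wherever p(x) > 0, so every log below is finite.

def entropy(p):
    # H(P) = -Σ p(x) log2 p(x); zero-probability outcomes contribute nothing
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    # H(P, Q) = -Σ p(x) log2 q(x): average code length for samples from P under a code built for Q
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    # D_KL(P‖Q) = Σ p(x) log2(p(x) / q(x))
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))  # → 1.75 1.75

# With a one-hot target (the usual classification setup), H(P) = 0,
# so the cross-entropy loss equals the KL divergence itself.
label = [0.0, 1.0, 0.0]
pred = [0.25, 0.5, 0.25]
print(cross_entropy(label, pred), kl_divergence(label, pred))  # → 1.0 1.0
```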
The key insight
Asymmetric: in general D_KL(P‖Q) ≠ D_KL(Q‖P), so it is not a true distance. Always ≥ 0 (Gibbs' inequality), and zero only when P = Q.
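A small numerical illustration of the asymmetry, assuming the same list-of-probabilities representation and base-2 logs; kl_divergence is again a hypothetical helper, repeated so the snippet runs on its own.

```python
import math

def kl_divergence(p, q):
    # D_KL(P‖Q) in bits; infinite if Q assigns zero mass where P does not
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(round(kl_divergence(p, q), 3))  # → 0.737
print(round(kl_divergence(q, p), 3))  # → 0.531

# A support mismatch makes the asymmetry extreme: R gives zero mass to an outcome P produces.
r = [1.0, 0.0]
print(kl_divergence(p, r))  # → inf
print(kl_divergence(r, p))  # → 1.0
```

The direction matters in practice: D_KL(P‖Q) heavily penalizes Q for missing mass where P has it, while D_KL(Q‖P) penalizes Q for placing mass where P has little.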