I · Information theory

KL divergence

What it is

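D_KL(P‖Q) = Σ_x p(x) log(p(x)/q(x)), summed over x with p(x) > 0. With log base 2, it is the expected number of extra bits needed to encode samples from P using a code optimized for Q instead of P.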
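
A minimal sketch of the formula for discrete distributions, assuming P and Q are given as equal-length lists of probabilities. The function name kl_divergence and the example values are illustrative, not from any particular library.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D_KL(P || Q) for discrete distributions given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p(x) = 0 contribute nothing, by convention
        if qi == 0.0:
            return math.inf  # Q assigns zero probability to an outcome P can produce
        total += pi * math.log(pi / qi, base)
    return total

# Illustrative values: a fair coin encoded with a code built for a biased coin.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # roughly 0.737 extra bits per symbol
```

Passing base=math.e gives the result in nats instead of bits.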

Where it lives

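Cross-entropy loss in ML (H(P, Q) = H(P) + D_KL(P‖Q), so for fixed P, minimizing cross-entropy minimizes the KL term), variational inference (maximizing the ELBO minimizes the KL divergence from the approximation to the true posterior), t-SNE (minimizes the KL divergence between pairwise similarity distributions in the high- and low-dimensional spaces).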
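
The link to cross-entropy loss can be checked numerically. The sketch below assumes discrete distributions with q(x) > 0 wherever p(x) > 0. The helper names and the example values are hypothetical.

```python
import math

def entropy(p):
    """H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bits(p, q):
    """D_KL(P || Q) in bits; same assumption on q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # illustrative "true" distribution
q = [0.5, 0.3, 0.2]  # illustrative model distribution

# Cross-entropy exceeds entropy by exactly the KL divergence.
gap = cross_entropy(p, q) - entropy(p)
print(math.isclose(gap, kl_bits(p, q)))
```

Since H(P) does not depend on the model, a training loop that lowers cross-entropy against fixed targets lowers D_KL(P‖Q) by the same amount.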

The key insight

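Asymmetric: in general D_KL(P‖Q) ≠ D_KL(Q‖P), so it is not a distance metric. Always ≥ 0, and zero only when P = Q. Infinite if q(x) = 0 for some x with p(x) > 0.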
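
The asymmetry is easy to see on a two-outcome example. This is a sketch with illustrative values. kl_bits is a hypothetical helper that assumes q(x) > 0 wherever p(x) > 0.

```python
import math

def kl_bits(p, q):
    """D_KL(P || Q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl_bits(p, q)  # roughly 0.737 bits
reverse = kl_bits(q, p)  # roughly 0.531 bits
same = kl_bits(p, p)     # 0.0, since every log term is log2(1)

print(forward, reverse, same)
```

Swapping the arguments changes which distribution weights the log ratio, which is why the two directions penalize mismatches differently.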