
I · Information theory

KL divergence

What it is

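D_KL(P‖Q) = Σ p(x) log(p(x)/q(x)), summed over every outcome x with p(x) > 0. This is the "extra bits" cost of using Q to encode P: with log base 2, it is the expected number of additional bits per symbol paid when samples from P are encoded with a code optimized for Q. Equivalently, it is the cross-entropy H(P, Q) minus the entropy H(P).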
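To make the "extra bits" reading concrete, here is a minimal sketch in plain Python. It assumes discrete distributions given as equal-length lists of probabilities over the same outcomes. The helper names (`kl_divergence`, `entropy`, `cross_entropy`) are hypothetical illustrations, not a library API. Base-2 logs are used, so every result is in bits, and the KL value equals cross-entropy minus entropy.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D_KL(P‖Q) for discrete distributions over the same outcomes."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue              # 0 · log(0/q) is taken as 0
        if qx == 0:
            return math.inf       # P puts mass where Q has none
        total += px * math.log(px / qx, base)
    return total

def entropy(p, base=2.0):
    """H(P): average code length (bits) of an optimal code for P."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

def cross_entropy(p, q, base=2.0):
    """H(P, Q): average code length when P's samples use a code built for Q."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total -= px * math.log(qx, base)
    return total

P = [0.5, 0.25, 0.25]   # true source
Q = [1/3, 1/3, 1/3]     # code assumes a uniform source

print(f"{entropy(P):.3f}")           # 1.500 bits with a code built for P
print(f"{cross_entropy(P, Q):.3f}")  # 1.585 bits with a code built for Q
print(f"{kl_divergence(P, Q):.3f}")  # 0.085 extra bits per symbol
```

The gap 1.585 − 1.500 ≈ 0.085 bits is exactly D_KL(P‖Q).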

Where it lives

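Cross-entropy loss in ML (H(P, Q) = H(P) + D_KL(P‖Q), and H(P) is fixed by the data, so minimizing one minimizes the other), variational inference (maximizing the ELBO minimizes D_KL between the approximate and true posterior), and t-SNE (the embedding is fit by minimizing D_KL between high- and low-dimensional neighbor distributions).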
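The cross-entropy connection can be checked numerically. This is a sketch under stated assumptions: the logits and both label vectors are made-up values, the helpers are hypothetical (not a framework's loss API), and natural logs (nats) are used, as is conventional in ML. It assumes every q(x) > 0, which a softmax guarantees barring floating-point underflow. With a one-hot label H(P) = 0, so the loss is D_KL(P‖Q) = −log q(label) exactly. With a soft label, H(P) adds a constant that does not depend on the model.

```python
import math

def softmax(logits):
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p, q):
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def entropy(p):
    return -sum(px * math.log(px) for px in p if px > 0)

def kl(p, q):
    # Assumes q(x) > 0 wherever p(x) > 0
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

logits = [2.0, 0.5, -1.0]      # hypothetical model outputs for 3 classes
q = softmax(logits)

p_onehot = [1.0, 0.0, 0.0]     # hard label: H(P) = 0
p_soft = [0.8, 0.1, 0.1]       # soft label (e.g. label smoothing): H(P) > 0

for p in (p_onehot, p_soft):
    ce = cross_entropy(p, q)
    decomposed = entropy(p) + kl(p, q)
    print(f"CE = {ce:.4f}   H(P) + KL = {decomposed:.4f}")
```

Both lines print matching pairs. Because the H(P) term carries no model parameters, gradients of the cross-entropy loss and of D_KL(P‖Q) with respect to the logits are identical.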

The key insight

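Asymmetric: in general D_KL(P‖Q) ≠ D_KL(Q‖P), so it is not a distance metric. Always ≥ 0 (Gibbs' inequality), and zero only when P = Q. It is infinite whenever q(x) = 0 for some x with p(x) > 0.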
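A small sketch of these properties for two-outcome distributions. The distributions are arbitrary illustrative choices, and `kl_divergence` is a hypothetical helper that computes the sum directly in bits. Swapping the arguments changes the value. A distribution compared with itself gives 0. A support mismatch pushes the asymmetry to the extreme: one direction stays finite while the other is infinite.

```python
import math

def kl_divergence(p, q):
    """D_KL(P‖Q) in bits for discrete distributions over the same outcomes."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue              # terms with p(x) = 0 contribute nothing
        if qx == 0:
            return math.inf       # P has mass where Q has none
        total += px * math.log2(px / qx)
    return total

P = [0.9, 0.1]
Q = [0.5, 0.5]
print(f"{kl_divergence(P, Q):.3f}")  # 0.531
print(f"{kl_divergence(Q, P):.3f}")  # 0.737
print(kl_divergence(P, P))           # 0.0

# Support mismatch: R puts no mass on the second outcome
R = [1.0, 0.0]
print(kl_divergence(R, Q))  # 1.0 (Q covers everything R can produce)
print(kl_divergence(Q, R))  # inf (Q produces an outcome R calls impossible)
```

This is why the direction matters in practice. Minimizing D_KL(P‖Q) over Q forces Q to cover all of P's mass, while minimizing D_KL(Q‖P) lets Q ignore regions of P but punishes it for placing mass where P has none.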

More in Information theory

Shannon entropy: H(X) = −Σ p(x) log₂ p(x). The expected number of bits needed to encode…
Mutual information: I(X;Y) = H(X) − H(X∣Y). How much knowing Y reduces uncertainty about X…
Channel capacity (Shannon-Hartley): C = B · log₂(1 + S/N). Bits per second for a noisy channel of bandwidt…

Across the foundations

II · Bayes theorem · Probability & randomness
III · Shortest paths · Graph theory
III · Min-cut / Max-flow · Graph theory
III · Spanning trees · Graph theory
III · Connectivity & cycles · Graph theory
IV · Little's law · Queueing theory
IV · M/M/1 queue · Queueing theory
IV · Universal scalability law · Queueing theory
