I · Information theory

Shannon entropy

What it is

H(X) = −Σ p(x) log₂ p(x), summed over every outcome x of X. The minimum expected number of bits per sample that any lossless code needs to encode draws from X.
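
A minimal sketch of the formula in Python, assuming the distribution is supplied as a list of probabilities that sum to 1. The name `shannon_entropy` is illustrative and does not come from any library. Outcomes with p(x) = 0 are skipped, following the usual convention that 0 · log 0 counts as 0.

```python
import math

def shannon_entropy(probs):
    """Return H(X) in bits for a discrete distribution given as probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # -p * log2(p) is written as p * log2(1 / p); zero-probability
    # outcomes are skipped because they contribute nothing.
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Eight equally likely outcomes need log2(8) = 3 bits.
print(shannon_entropy([1 / 8] * 8))
```

A uniform distribution over n outcomes gives log₂ n bits, the largest value possible for n outcomes; any skew lowers it.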

Where it lives

Compression bounds, password strength estimates, ML loss functions (cross-entropy), Huffman coding.

The key insight

Higher entropy = more uncertainty = more bits to encode. A fair coin: 1 bit per flip. A biased coin (90/10): ~0.47 bits per flip, because the likely outcome carries little surprise.
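
The coin figures can be checked with a short sketch. The helper `coin_entropy` is a hypothetical name for this example; it applies the entropy formula to the two outcomes of a coin that lands heads with a given probability.

```python
import math

def coin_entropy(p_heads):
    """Entropy in bits of a coin that lands heads with probability p_heads."""
    # Sum -p * log2(p) over both outcomes; an impossible outcome is skipped.
    return sum(-p * math.log2(p) for p in (p_heads, 1 - p_heads) if p > 0)

for p in (0.5, 0.9):
    print(f"P(heads) = {p}: {coin_entropy(p):.3f} bits")
```

The fair coin comes out at exactly 1 bit and the 90/10 coin at roughly 0.469 bits, so a long run of biased flips can in principle be stored in under half a bit per flip on average.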