I · Information theory
Shannon entropy
What it is
H(X) = −Σ p(x) log₂ p(x), summed over every outcome x with p(x) > 0. It is a lower bound on the average number of bits per sample that any lossless code for X needs.
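As a rough illustration of the formula, here is a minimal Python sketch. The function name `shannon_entropy` and the choice to pass the distribution as a plain list of probabilities are assumptions made for this example, not part of any particular library.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: p * log2(p) tends to 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0
print(shannon_entropy([1 / 8] * 8))  # fair 8-sided die: 3.0
```

A uniform distribution over 2^k outcomes gives exactly k bits, which is a quick sanity check for any implementation.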
Where it lives
Compression bounds, password-strength estimates, ML loss functions such as cross-entropy, Huffman coding.
The key insight
Higher entropy = more uncertainty = more bits to encode. A fair coin: 1 bit per flip. A biased coin (90/10): ~0.47 bits per flip, because its outcome is more predictable.
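The two coin figures follow directly from the definition. A worked version of the arithmetic, with intermediate values rounded to four decimal places:

```latex
\begin{align*}
H_{\text{fair}}   &= -\left(0.5 \log_2 0.5 + 0.5 \log_2 0.5\right) = -\log_2 0.5 = 1 \text{ bit} \\
H_{\text{biased}} &= -\left(0.9 \log_2 0.9 + 0.1 \log_2 0.1\right) \\
                  &\approx 0.9 \times 0.1520 + 0.1 \times 3.3219 \\
                  &\approx 0.1368 + 0.3322 \approx 0.469 \text{ bits}
\end{align*}
```

Most of the biased coin's entropy comes from the rare outcome: it costs about 3.32 bits when it occurs, but it occurs only 10% of the time.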