
I · Information theory

Mutual information

What it is

I(X;Y) = H(X) − H(X∣Y). How much knowing Y reduces uncertainty about X.
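As a rough illustration of the definition, the sketch below computes I(X;Y) = H(X) − H(X∣Y) from a small joint probability table. The function names and the example table are assumptions made for illustration; they do not come from any particular library.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector (zero entries skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), where joint[x][y] = P(X=x, Y=y)."""
    p_x = [sum(row) for row in joint]        # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    h_x = entropy(p_x)
    # H(X|Y) = sum over y of P(y) * H(X | Y=y)
    h_x_given_y = 0.0
    for j, py in enumerate(p_y):
        if py > 0:
            conditional = [row[j] / py for row in joint]
            h_x_given_y += py * entropy(conditional)
    return h_x - h_x_given_y

# Hypothetical joint distribution: X and Y agree 80% of the time.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
mi = mutual_information(joint)
print(f"I(X;Y) = {mi:.4f} bits")
```

Here H(X) is 1 bit and knowing Y leaves about 0.72 bits of uncertainty, so Y carries roughly 0.28 bits about X.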

Where it lives

Decision-tree splits (information gain), feature selection, channel capacity.
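For the decision-tree case, information gain is the mutual information between the class label and the split feature, estimated from sample counts. The following is a minimal sketch under that reading; the helper names and the toy data are hypothetical.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Empirical entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """H(labels) minus the size-weighted entropy of labels within each feature value."""
    n = len(labels)
    groups = {}
    for y, f in zip(labels, feature_values):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * label_entropy(g) for g in groups.values())
    return label_entropy(labels) - remainder

# Toy data (made up): does the label depend on the weather feature?
labels  = ["yes", "yes", "no",   "no",   "yes", "no"]
outlook = ["sun", "sun", "rain", "rain", "sun", "sun"]
gain = information_gain(labels, outlook)
print(f"information gain = {gain:.4f} bits")
```

A tree learner would typically evaluate this quantity for each candidate feature and split on the one with the largest gain.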

The key insight

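A perfectly informative split determines X completely, so H(X∣Y) = 0 and I = H(X). A useless feature is independent of X, so I = 0 exactly; estimated from finite samples it comes out as I ≈ 0.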
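The two extremes can be checked numerically. This sketch uses the equivalent sum form I(X;Y) = Σ p(x,y) log₂( p(x,y) / (p(x) p(y)) ), a different form from the entropy difference in the definition; both joint tables are invented for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits via sum of p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0:  # zero cells contribute nothing
                total += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return total

# Y determines X exactly: all mass on the diagonal.
perfect = mutual_information([[0.5, 0.0],
                              [0.0, 0.5]])

# X and Y independent: the joint is the product of the marginals.
useless = mutual_information([[0.25, 0.25],
                              [0.25, 0.25]])

print(perfect, useless)
```

With a fair binary X, H(X) is 1 bit, so the first case attains the maximum I = H(X) and the second gives zero.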

More in Information theory

Shannon entropy: H(X) = −Σ p(x) log₂ p(x). The expected number of bits needed to encode…

Channel capacity (Shannon-Hartley): C = B · log₂(1 + S/N). Bits per second for a noisy channel of bandwidt…

KL divergence: D_KL(P‖Q) = Σ p(x) log(p(x)/q(x)). The "extra bits" cost of using Q to…
