I · Information theory

Mutual information

What it is

I(X;Y) = H(X) − H(X∣Y): how much knowing Y reduces uncertainty about X, on average. It is symmetric, I(X;Y) = I(Y;X), and never negative.
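
A minimal sketch of the definition, assuming a small discrete joint distribution given as a table. The helper names `entropy` and `mutual_information` and the example joint table are illustrative assumptions, not part of the card; entropies are in bits (log base 2).

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) in bits, where joint[x][y] = P(X=x, Y=y)."""
    p_x = [sum(row) for row in joint]        # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    h_x = entropy(p_x)
    # H(X|Y) = sum over y of P(y) * H(X | Y=y)
    h_x_given_y = 0.0
    for j, py in enumerate(p_y):
        if py > 0:
            conditional = [row[j] / py for row in joint]
            h_x_given_y += py * entropy(conditional)
    return h_x - h_x_given_y

# Hypothetical joint: X and Y are fair binary variables that agree 90% of the time.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
mi = mutual_information(joint)
print(f"I(X;Y) = {mi:.3f} bits")
```

Here H(X) is 1 bit and knowing Y leaves about 0.47 bits of uncertainty, so Y removes roughly 0.53 bits.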

Where it lives

Decision-tree splits (information gain), feature selection, channel capacity.
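
For the decision-tree case, information gain can be sketched as the mutual information between the class label and the split feature, estimated from counts. The function names and the toy `labels` / `outlook` data below are made up for illustration; real tree learners add details (thresholds for numeric features, stopping rules) that are omitted here.

```python
from collections import Counter
import math

def label_entropy(labels):
    """Empirical entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature):
    """H(labels) minus the size-weighted entropy of labels within each feature value."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * label_entropy(g) for g in groups.values())
    return label_entropy(labels) - remainder

# Hypothetical toy data: six examples, one categorical feature.
labels  = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sun", "sun", "rain", "rain", "sun", "sun"]
gain = information_gain(labels, outlook)
print(f"information gain = {gain:.3f} bits")
```

Splitting on `outlook` leaves the "rain" branch pure and the "sun" branch mixed, so the gain is positive but below the full 1 bit of label entropy.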

The key insight

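A perfectly informative split (Y determines X) has I(X;Y) = H(X). A useless feature (Y independent of X) has I(X;Y) = 0; estimates from finite samples typically come out slightly above 0.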
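
Both extremes follow directly from I(X;Y) = H(X) − H(X∣Y). A short worked derivation, where f is an arbitrary function introduced only for illustration:

```latex
\begin{align*}
X = f(Y)
  &\;\Rightarrow\; H(X \mid Y) = 0
  &&\Rightarrow\; I(X;Y) = H(X) - 0 = H(X), \\
X,\, Y \text{ independent}
  &\;\Rightarrow\; H(X \mid Y) = H(X)
  &&\Rightarrow\; I(X;Y) = H(X) - H(X) = 0.
\end{align*}
```

Every other feature falls between these two ends, since 0 ≤ I(X;Y) ≤ H(X).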