I · Information theory
Mutual information
What it is
I(X;Y) = H(X) − H(X∣Y), where H is Shannon entropy. It measures how much knowing Y reduces uncertainty about X.
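A minimal sketch of the definition, assuming the joint distribution is given as a small table of probabilities and that entropy is measured in bits. The names `entropy`, `mutual_information`, and `joint` are illustrative only, not part of any library.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) in bits, where joint[x][y] = P(X=x, Y=y) (assumed to sum to 1)."""
    px = [sum(row) for row in joint]          # marginal P(X)
    py = [sum(col) for col in zip(*joint)]    # marginal P(Y)
    h_x = entropy(px)
    # H(X|Y) = sum over y of P(y) * H(X | Y=y)
    h_x_given_y = 0.0
    for j, p_y in enumerate(py):
        if p_y > 0:
            conditional = [row[j] / p_y for row in joint]
            h_x_given_y += p_y * entropy(conditional)
    return h_x - h_x_given_y


# Y determines X completely: all of H(X) = 1 bit is recovered.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
# X and Y independent: knowing Y removes no uncertainty.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```

Working from an explicit joint table keeps the arithmetic visible; with real data the probabilities would have to be estimated from counts first.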
Where it lives
Decision-tree splits (information gain), feature selection, channel capacity.
The key insight
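A perfectly informative split has I = H(X), because H(X∣Y) = 0 once Y determines X. A useless feature, one independent of X, has I = 0 exactly; estimates from finite samples land at I ≈ 0.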
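The two extremes can be checked on a toy dataset. This is a sketch under stated assumptions: probabilities are estimated by plain counting, entropy is in bits, and the labels, feature values, and the helper name `information_gain` are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical Shannon entropy in bits of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature):
    """Empirical I(X;Y) = H(X) - H(X|Y), with X = labels and Y = feature values."""
    n = len(labels)
    groups = {}
    for x, y in zip(labels, feature):
        groups.setdefault(y, []).append(x)   # labels that fall in each branch
    # H(X|Y): entropy of each branch, weighted by the branch's share of the data
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - h_cond


labels = ["spam", "spam", "ham", "ham"]      # H(X) = 1 bit
perfect = ["a", "a", "b", "b"]               # each branch is pure
useless = ["a", "b", "a", "b"]               # each branch mirrors the full label mix

print(information_gain(labels, perfect))     # 1.0, equal to H(X)
print(information_gain(labels, useless))     # 0.0
```

On larger, noisier samples a feature that is truly independent of the labels will typically show a small positive gain rather than exactly zero, which is why the useless case is usually read as I ≈ 0.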