IV · Probabilistic structures

Count-Min sketch

Approximate frequencies, in tiny memory.

The story

Bloom filter's cousin for counting: how many times has key X been seen? Cormode and Muthukrishnan gave a small grid of counters — query the minimum across rows. Overestimates only; never undercounts. Tunable error ε with probability 1−δ in O((1/ε) log(1/δ)) space.

How it works

A 2D array, d × w. Each row uses a different hash function; each cell is a counter. Insert: for each row, hash the key, increment that cell. Query: take the minimum count across all rows.

Where it lives

Network anomaly detection (top talker IPs). Twitter's trending topics. Spark's frequency aggregations. Distributed top-K analytics.

The key insight

Count-Min sketch is used wherever exact counts are unaffordable. The "elephant flow" detection in carrier networks — finding heavy hitters out of billions of packets — runs on Count-Min.