ELI5 · Databases & storage

Bloom filters.

A bouncer’s “definitely not on the list” memory: tiny, fast, and never wrongly says no.

A bouncer with a thousand-name guest list cannot carry the whole book. So imagine a tiny grid of light switches instead. Each name, run through a few stamps, flips a fixed set of switches on. To check a name, you stamp it and look: if any of its switches is off, that name was definitely never added.

A Bloom filter is exactly that trick. It answers “is this maybe in the set?” using a fraction of the memory a real list would need. It can occasionally say “maybe” for something that was never added, but it will never say “no” for something that was.

All of it? No way.

1
A full guest list is huge — too big to carry or check quickly.
2
Keep a tiny grid of on/off switches instead — a few bits, not the whole book.
3
To add a name, run it through a few stamps and flip those switches on.
Not on the list.

4
To check: stamp the name. Any switch off? It was definitely never added.
5
All switches on? It is probably in — but switches can collide, so “maybe.”
6
Tiny and fast, never a false “no” — perfect for skipping expensive lookups.

A few bits and a few stamps stand in for a list too big to carry.

No false negatives, some false positives

The guarantee is one-directional. Because adding a name only ever turns switches on, a name that was truly added will always find all its switches on — so “no” is always trustworthy. But two different names can happen to share switches, so an item that was never added might find all of its switches already on by coincidence. That is a false positive: a “maybe” that turns out wrong.

Why that is still useful

Most real questions are “is this worth a slow lookup?” A database can keep a Bloom filter in memory and check it first: if it says “definitely not here,” the database skips a costly disk read entirely. The occasional false “maybe” just means a wasted check now and then — a tiny price for avoiding the expensive lookup the vast majority of the time.

The real version Bloom filter simulator →

← All explainers