ELI5 · Databases & storage

Replication.

Keeping copies of your data on several machines, so a failure or a far-away reader is no big deal.

If your only copy of the data lives on one machine, that machine is a single point of failure: when it dies, so does everything. Replication keeps copies of the same data on several machines, so losing one is survivable, and readers can be served by a copy close to them.

Think of it as photocopying an important ledger and keeping copies in different offices. Everyone can read their nearest copy quickly, and if one office burns down, the others still hold the record. The hard part is keeping all the copies in step when the ledger changes.

  1. the only ledger it dies all gone
    1

    One ledger on one machine is a single point of failure — lose it and everything is gone.

  2. a copy in every branch
    2

    So you photocopy it, keeping a copy in every branch office.

  3. Record it here.
    write leader
    3

    Writes go to one leader copy that owns the latest truth.

  4. leader change log followers
    4

    The leader streams its change log out so every follower stays in sync.

  5. read any copy
    5

    Reads can come from any copy, spreading the load and serving nearby readers fast.

  6. #9leader #8follower a little behind
    6

    The catch is lag: a follower is always a little behind, so it can briefly show stale data.

Keep synced photocopies of the ledger in every branch; writes lead, reads spread out.

Leader-follower, and the lag it brings

The most common setup has one leader take all the writes and stream its changes to follower copies, which serve reads. This spreads read load and gives you spare copies to promote if the leader fails. The catch is replication lag: a follower is always a little behind, so a reader might briefly see stale data — for example, you post a comment but a follower has not received it yet. That is eventual consistency in practice.

When one leader is not enough

A single leader is simple but limits write throughput and means a brief gap when it fails over. Multi-leader and leaderless designs let several copies accept writes — better for write scale and for users spread across the globe — but now two copies can be changed at once and disagree, so the system needs a way to detect and resolve conflicts. More availability and write capacity, more complexity to keep the copies sane.

The real version How replication works →
Found this useful?