14 / 20
Topics / 14

CRDTs

A conflict-free replicated data type is a data structure designed so that concurrent updates on different replicas always merge into the same result, no matter what order the updates arrive in or how many times they are redelivered. The trick is algebraic, not procedural: if the merge operation is associative, commutative, and idempotent, convergence is a theorem, not a hope. CRDTs trade away the ability to enforce global invariants in exchange for never needing a round of coordination to resolve a conflict.


The algebra: a join-semilattice

A state-based CRDT (a CvRDT) is built on a join-semilattice: a partial order with a least-upper-bound (the join, written ⊔) defined for every pair of states. The merge between two replicas is that join. Because a semilattice join is:

  • Associative(a ⊔ b) ⊔ c = a ⊔ (b ⊔ c), so grouping does not matter.
  • Commutativea ⊔ b = b ⊔ a, so order does not matter.
  • Idempotenta ⊔ a = a, so redelivering the same state is harmless.

…it does not matter which order replicas exchange states, how many times a message is duplicated, or how it is batched. As long as every update eventually reaches every replica, all replicas converge to the same value. This property is called strong eventual consistency: replicas that have seen the same set of updates have identical state, with no conflict-resolution round.

Two flavours: state-based and op-based

KindWhat shipsNetwork requirement
State-based (CvRDT)The full (or delta) state; merged with join on receiptOnly needs eventual delivery; duplicates and reordering are fine
Op-based (CmRDT)Each operation, applied at every replicaNeeds exactly-once, causally-ordered delivery of ops

State-based CRDTs are robust on lossy networks because the merge absorbs duplicates, but shipping whole states is expensive — delta-CRDTs send just the changed part to fix that. Op-based CRDTs send small operations but push the hard delivery guarantees onto the messaging layer. The two are mathematically equivalent in expressiveness; the choice is an engineering trade-off about what your transport can promise.

Counters

The simplest CRDTs. A G-Counter (grow-only) is a vector of per-replica counts; each replica only increments its own entry. The value is the sum; the merge takes the element-wise maximum. Because each replica owns its slot, two concurrent increments never collide.

A PN-Counter supports decrements by pairing two G-Counters — one for increments, one for decrements — and reporting their difference. This is how you build a distributed "like count" or inventory tally that never loses an increment to a concurrent update.

Sets — and why LWW is not enough

A grow-only set (G-Set) merges by union and can never remove an element. Removal is where naive designs break. A 2P-Set adds a tombstone set, but once removed an element can never be re-added.

The workhorse is the OR-Set (observed-remove set). Each add tags the element with a unique token; a remove only deletes the tokens it has actually observed. So an add concurrent with a remove on the same element wins — the add's fresh token was never observed by the remove. This matches user intuition: if you add an item to a shopping cart at the same moment a teammate clears it, your item survives.

This is why last-writer-wins is dangerous. LWW resolves concurrent writes by timestamp, silently discarding the loser. For a set or a cart that loses data. A CRDT encodes the merge rule into the type itself, so concurrent updates combine rather than one overwriting the other.

Sequences — collaborative text

Editing shared text is the hardest CRDT problem: concurrent inserts at the same position must converge to one agreed order without a coordinator. Sequence CRDTs (RGA, Treedoc, LSEQ, Logoot, and Yjs's YATA) give every character a unique, densely-orderable identifier so that any two inserts have a deterministic relative order on every replica.

This is the technology under collaborative editors. Automerge and Yjs are the popular open-source libraries; products like Figma, Linear, and various local-first apps use CRDT or CRDT-like merge to let many people edit offline and reconcile cleanly when they reconnect.

Limits — what CRDTs cannot do

  • No global invariants. CRDTs cannot enforce "balance ≥ 0" or "at most one booking for this seat" across replicas, because that needs coordination — exactly what CRDTs avoid. Use consensus for invariants; use CRDTs for mergeable state.
  • Metadata and tombstones grow. OR-Set tokens and removed-element tombstones accumulate. Garbage-collecting them safely (without resurrecting deleted data) is the recurring practical headache.
  • Merge semantics may surprise users. "Add wins over concurrent remove" is a deliberate choice; a different app might want remove-wins. The type encodes a policy, and you must pick the one your product actually needs.

Further reading

Found this useful?