ELI5 · Distributed systems

The circuit breaker.

A fuse that trips so a failing service can’t drag the rest down with it.

When something in your house shorts out, the breaker in the fuse box trips and cuts the power to that line. It looks drastic, but it stops one fault from overheating the wiring and burning the place down.

A circuit breaker does the same for services. When a dependency starts failing, the breaker “trips” and your service stops calling it for a while — failing fast instead of piling up doomed requests. That gives the broken service room to recover and keeps the failure from cascading.

  1. Caller failing. One service keeps calling a dependency that has started failing: try again, again, again.

  2. Watching failures. A breaker sits in the middle, watching the failure rate.

  3. Open: stop calling. Too many failures and the breaker trips: calls stop, and they fail instantly instead.

  4. Room to recover. Failing fast spares both sides and gives the sick service room to recover.

  5. Half-open: one probe. After a pause, the breaker lets a single test call through to check.

  6. Closed: flowing again. Healthy again? The breaker closes, and normal traffic resumes.

Trip the fuse, fail fast, let the sick service heal, then resume.
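
The six steps above can be sketched as a small state machine. This is a minimal illustration rather than a production implementation: the names `CircuitBreaker`, `failure_threshold`, `recovery_timeout` and `clock` are made up for the example, it counts consecutive failures instead of tracking a true failure rate, and it is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal: calls flow through
    OPEN = "open"            # tripped: calls fail instantly
    HALF_OPEN = "half_open"  # probing: one test call is let through


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.recovery_timeout = recovery_timeout    # seconds to stay open before probing
        self.clock = clock                          # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                # Still resting: fail fast without touching the dependency.
                raise CircuitOpenError("circuit open: failing fast")
            # The pause is over: let this one call through as a probe.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a successful probe) closes the breaker.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed probe re-opens immediately; otherwise trip at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Wrapping a call then looks like `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands in for whatever function talks to the dependency. Real libraries usually add a sliding failure-rate window and limit how many probes run at once.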

Why failing fast is kinder

When a dependency is struggling, hammering it with retries makes things worse and ties up your own threads waiting on calls that will time out anyway. By tripping, the breaker turns slow, hopeful failures into instant ones. Your service stays responsive — it can return a cached value or a clear error — and the overloaded dependency gets a break to recover.
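
A rough back-of-the-envelope calculation shows why this matters for your own threads. The sketch below uses Little's law (average calls in flight = arrival rate × time per call); every number in it is an illustrative assumption, not a measurement.

```python
def threads_tied_up(requests_per_second, seconds_per_call):
    """Little's law: average calls in flight = arrival rate * time per call."""
    return requests_per_second * seconds_per_call


# Illustrative numbers only (assumptions, not measurements).
rate = 50          # calls per second aimed at the struggling dependency
timeout = 5.0      # seconds a caller waits before a doomed call times out
fail_fast = 0.001  # seconds to reject a call while the breaker is open

waiting = threads_tied_up(rate, timeout)      # 250.0 threads stuck waiting
rejecting = threads_tied_up(rate, fail_fast)  # roughly 0.05 threads busy
```

Under these assumed numbers, waiting out timeouts holds about 250 threads at once, while failing fast keeps a fraction of a single thread busy.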

Stopping the cascade

In a web of services, one slow component can stall its callers, which stall theirs, until the whole system locks up: a cascading failure. The breaker is a firewall against that. It contains the damage to the one failing edge instead of letting blocked callers pile up across the system. The half-open probe then restores service automatically, without anyone having to be paged.
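
Containing the damage to one edge usually means keeping a separate breaker per dependency. The sketch below assumes a deliberately stripped-down breaker (it trips after consecutive failures and has no half-open state) and two hypothetical dependencies, `payments` and `search`, to show that tripping one edge leaves the other alone.

```python
class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the dependency."""


class SimpleBreaker:
    """Stripped-down breaker: trips after N consecutive failures, never resets."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.failure_threshold

    def call(self, func):
        if self.is_open:
            raise CircuitOpenError("failing fast")
        try:
            result = func()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0
        return result


# One breaker per outgoing edge, keyed by dependency name (names are hypothetical).
breakers = {"payments": SimpleBreaker(), "search": SimpleBreaker()}


def call_dependency(name, func):
    return breakers[name].call(func)


def payments_down():
    raise TimeoutError("payments timed out")


def search_ok():
    return ["result"]


# Three failed calls trip the payments edge only.
for _ in range(3):
    try:
        call_dependency("payments", payments_down)
    except TimeoutError:
        pass

payments_open = breakers["payments"].is_open          # True
search_result = call_dependency("search", search_ok)  # ["result"]
```

With a single shared breaker, the payments outage would have cut off search as well; keying breakers by edge keeps the blast radius small.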
