ELI5 · Networking & the web

Load balancers.

The host at a busy restaurant, seating each new guest at the emptiest table.

One server can only handle so much traffic before it slows to a crawl. So you run several identical copies and put a load balancer in front of them.

Like a good restaurant host, it greets every arriving request and sends it to whichever server has the most room, so no single one gets mobbed while others sit idle.
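As a rough sketch of the host's decision, here is one way to pick the server with the most room, assuming the balancer keeps a count of the requests each server is currently handling. The `Server` class, the `route` helper, and the station names are made up for illustration. Real balancers offer other strategies too, such as simply taking turns (round robin).

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active: int = 0  # requests this server is handling right now


def pick_least_busy(servers: list[Server]) -> Server:
    """Return the server with the fewest in-flight requests."""
    return min(servers, key=lambda s: s.active)


def route(servers: list[Server]) -> Server:
    """Send one request to the least-busy server and count it."""
    chosen = pick_least_busy(servers)
    chosen.active += 1
    return chosen


# Hypothetical starting load: station-2 is the emptiest table.
servers = [Server("station-1", 4), Server("station-2", 1), Server("station-3", 2)]
first = route(servers)   # goes to station-2, which had only 1 active request
second = route(servers)  # station-2 and station-3 now tie; min() keeps the first one listed
```

When a request finishes, the balancer would lower that server's count again, which is left out here to keep the sketch short.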

  1. Ding!
     All traffic arrives at one address: the load balancer.

  2. Station 3 has room, right this way.
     Like a restaurant host, it sends each request to a server with room to spare.

  3. Everyone alive back there?
     It quietly health-checks each server.

  4. Skip station 2.
     A server that stops answering is pulled from rotation, so nobody gets seated there.

  5. +1
     Swamped? Add more identical servers and the load spreads across them all.

  6. I’m back!
     Scale out, route around failures: diners never notice the drama in the kitchen.

One front door, many kitchens — and nobody waits on a sick one.
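The health-check steps above can be sketched in a few lines. This is a minimal illustration, not how any particular balancer does it: the `status` table stands in for real probes (which would be small network requests with a timeout), and `healthy_servers` is a hypothetical helper name.

```python
from typing import Callable


def healthy_servers(servers: list[str], is_alive: Callable[[str], bool]) -> list[str]:
    """Keep only the servers whose health check passes."""
    return [s for s in servers if is_alive(s)]


# Made-up probe results: station-2 has stopped answering.
status = {"station-1": True, "station-2": False, "station-3": True}

rotation = healthy_servers(list(status), lambda name: status[name])
# station-2 is left out, so no request gets seated there.

status["station-2"] = True  # the server recovers ("I'm back!")
rotation_after = healthy_servers(list(status), lambda name: status[name])
# station-2 rejoins the rotation on the next check.
```

Running the check over and over on a short timer is what makes a crash show up quickly and a recovery get noticed without anyone stepping in.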

Scaling out instead of up

Without a balancer, your main option when traffic grows is a bigger, more expensive server (scaling up), and there is always a ceiling on how big one machine can get. With one, you just add more ordinary servers behind it (scaling out), which is usually cheaper and has a much higher ceiling.

This is the backbone of how large sites survive a sudden spike: spin up more copies, and the balancer spreads the load across all of them.
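To put rough numbers on a spike, here is a back-of-the-envelope sketch. It assumes every server is identical and the balancer spreads requests evenly. The traffic figures and the `servers_needed` helper are invented for illustration, and real sizing would leave spare headroom on top.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float) -> int:
    """Smallest number of identical servers that can absorb the peak."""
    return math.ceil(peak_rps / per_server_rps)


# Assume each server comfortably handles 500 requests per second.
normal_day = servers_needed(peak_rps=900, per_server_rps=500)   # 900 / 500 = 1.8, rounds up to 2
big_spike = servers_needed(peak_rps=4200, per_server_rps=500)   # 4200 / 500 = 8.4, rounds up to 9
```

Going from 2 servers to 9 is just a matter of starting more copies, whereas one machine nine times as powerful may not exist at any price.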

It also makes failures boring

Because the balancer constantly checks health, a crashed server becomes a non-event: it is simply skipped until it recovers. The same mechanism lets you take servers down on purpose to deploy new code without downtime, by draining traffic from one server at a time while the others keep serving.
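A one-at-a-time deploy can be sketched like this. It is a simplified illustration under stated assumptions: `rolling_deploy` is a hypothetical helper, the version strings are made up, and a real drain would also wait for requests already in progress to finish before touching the server.

```python
def rolling_deploy(servers: list[str], new_version: str, versions: dict[str, str]) -> list[list[str]]:
    """Update one server at a time; return who was serving traffic at each step."""
    in_rotation = list(servers)
    serving_log = []
    for server in servers:
        in_rotation.remove(server)             # drain: stop sending it new requests
        serving_log.append(list(in_rotation))  # the others keep serving meanwhile
        versions[server] = new_version         # stand-in for the actual deploy
        in_rotation.append(server)             # back in rotation once updated
    return serving_log


versions = {"station-1": "v1", "station-2": "v1", "station-3": "v1"}
log = rolling_deploy(list(versions), "v2", versions)
# At every step two of the three servers stayed in rotation,
# and all three end up on "v2".
```

The trade-off is that capacity dips a little during each step, so this works best when the remaining servers have room to cover for the one that is out.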
