Load balancers.
The host at a busy restaurant, seating each new guest at the emptiest table.
One server can only handle so much traffic before it slows to a crawl. So you run several identical copies and put a load balancer in front of them.
Like a good restaurant host, it greets every arriving request and sends it to whichever server has the most room, so no single one gets mobbed while others sit idle.
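One common way to measure "most room" is to count how many requests each server is handling right now and pick the smallest. This is usually called least connections, and real balancers offer it alongside other strategies such as round robin. Below is a minimal Python sketch of that choice. The `Server` record and `pick_least_loaded` helper are hypothetical names for illustration, and the request counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active: int = 0  # requests this server is handling right now


def pick_least_loaded(servers: list[Server]) -> Server:
    """Return the server with the fewest active requests; ties go to the first listed."""
    return min(servers, key=lambda s: s.active)


# Hypothetical pool: the numbers are current request counts, not real data.
pool = [Server("station-1", 4), Server("station-2", 1), Server("station-3", 0)]
chosen = pick_least_loaded(pool)
chosen.active += 1  # the new request is now "seated" at this server
print(chosen.name)  # station-3
```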
1. All traffic arrives at one address: the load balancer.
2. "Station 3 has room, right this way." Like a restaurant host, it sends each request to a server with room to spare.
3. "Everyone alive back there?" It quietly health-checks each server.
4. "Skip station 2." A server that stops answering is pulled from rotation, so nobody gets seated there.
5. Swamped? Add more identical servers and the load spreads across them all.
6. "I’m back!" Scale out, route around failures, and diners never notice the drama in the kitchen.
Scaling out instead of up
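Without a balancer, the usual answer when traffic grows is a bigger, more expensive server (scaling up), and even the biggest machine has a ceiling. With one, you add more ordinary servers behind it (scaling out). That is usually cheaper and raises the ceiling much higher, because the limit moves to the balancer itself and to anything the servers share, such as a database.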
This is the backbone of how large sites survive a sudden spike: spin up more copies, and the balancer spreads the load across all of them.
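How many copies does a spike call for? A rough estimate divides the expected peak load by what one server can comfortably handle. This Python sketch uses a hypothetical `servers_needed` helper. The per-server capacity of 500 requests per second and the 70% headroom target (keeping each server at most 70% busy) are illustrative assumptions, not measurements.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, headroom: float = 0.7) -> int:
    """Identical servers needed so each runs at no more than `headroom` of its capacity."""
    return math.ceil(peak_rps / (per_server_rps * headroom))


# Hypothetical numbers: each copy handles 500 requests/second before slowing down.
normal = servers_needed(2_000, 500)  # an ordinary day
spike = servers_needed(9_000, 500)   # a sudden spike
print(normal, spike)  # 6 26
```

Because the servers are identical, the balancer can absorb the spike simply by having the extra copies added to its pool. Nothing about the individual servers has to change.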
It also makes failures boring
Because the balancer checks health on a regular schedule, a crashed server becomes a non-event: it is simply skipped until it recovers. The same mechanism lets you take servers down on purpose to deploy new code without downtime. You drain one server at a time: stop sending it new requests, let the requests it is already handling finish, update it, and then return it to rotation.
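The one-at-a-time deploy (often called a rolling deploy, with the waiting step known as connection draining) can be sketched in Python as follows. The `Server` fields, `rolling_deploy`, `finish_one`, and `install_v2` are hypothetical names. Finishing a request is simulated by decrementing a counter, and installing code is simulated by changing a version string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Server:
    name: str
    in_flight: int = 0       # requests currently being served
    draining: bool = False   # True means: send it no new requests
    version: str = "v1"


def eligible(servers: list[Server]) -> list[Server]:
    """Servers the balancer may hand new requests to."""
    return [s for s in servers if not s.draining]


def rolling_deploy(servers: list[Server],
                   finish_one: Callable[[Server], None],
                   install: Callable[[Server], None]) -> list[list[str]]:
    """Update servers one at a time; returns who was still serving during each step."""
    snapshots = []
    for s in servers:
        s.draining = True                      # 1. stop seating new guests here
        snapshots.append([x.name for x in eligible(servers)])
        while s.in_flight > 0:                 # 2. wait for current requests to finish
            finish_one(s)
        install(s)                             # 3. deploy the new code
        s.draining = False                     # 4. return it to rotation
    return snapshots


def finish_one(s: Server) -> None:
    s.in_flight -= 1  # stand-in for a real request completing


def install_v2(s: Server) -> None:
    s.version = "v2"  # stand-in for shipping new code


pool = [Server("a", 2), Server("b", 3), Server("c", 1)]
steps = rolling_deploy(pool, finish_one, install_v2)
print(steps)                      # [['b', 'c'], ['a', 'c'], ['a', 'b']]
print([s.version for s in pool])  # ['v2', 'v2', 'v2']
```

At every step, two of the three servers keep taking requests, which is why users see no downtime. A real rollout would also cap the wait in step 2 with a timeout, so that one stuck request cannot stall the whole deploy.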