Should you use a message queue?
Yes, for any work that can finish after the user has their answer — emails, image processing, syncing other systems. No, if the caller needs the result to keep going; then a queue only adds latency and new ways to fail.
Before "Kafka or RabbitMQ", answer one question
Almost every message-queue debate skips the only question that settles it: does the caller need the answer right now, or can the work happen later? Get that straight and the rest mostly falls out. A queue is how a system says, "I’ve got this — I’ll do it shortly, you don’t have to wait."
When someone hits checkout and you have to charge their card before showing a success page, that’s synchronous: the result is the response, so there’s no queue in sight. But the receipt email, the analytics update, the warehouse pick — none of that has to hold up the spinner. That’s queue work.
So sort your tasks into two piles: the user is waiting on this, and this can run after. If the second pile is basically empty, you don’t need a broker yet. You need a function call.
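Here is a minimal sketch of the two piles, using Python's standard-library `queue.Queue` as a stand-in for a real broker. The names `charge_card`, `checkout`, and the job names are hypothetical, invented for illustration only.

```python
import queue

# Stand-in for a real broker; an in-process queue is enough to show the split.
background_jobs = queue.Queue()

def charge_card(order_id: str, amount_cents: int) -> bool:
    """Hypothetical payment call; the user is waiting on this result."""
    return amount_cents > 0

def checkout(order_id: str, amount_cents: int) -> str:
    # Pile one: the response depends on this, so call it directly.
    if not charge_card(order_id, amount_cents):
        return "payment_failed"
    # Pile two: nothing here changes the response, so hand it off.
    for job in ("send_receipt_email", "update_analytics", "notify_warehouse"):
        background_jobs.put((job, order_id))
    return "success"
```

If the loop at the bottom would be empty for your system, that is the sign you don't need a broker yet.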
When you should use a message queue
Two payoffs do most of the convincing: decoupling and a shock absorber. Decoupling means the producer doesn’t care whether the consumer is up, fast, or even deployed. It drops a message and moves on. Take the email service down for maintenance and checkouts keep flowing — the emails just queue up and drain when it comes back.
The shock absorber earns its keep under spikes. A marketing blast brings ten times the traffic for fifteen minutes; without a buffer, every request slams your image resizer at once and it tips over. With a queue, the flood becomes a backlog your consumers work off at a sane pace — a few hundred jobs a second — and the worst anyone notices is thumbnails arriving a little late.
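To make the shock absorber concrete, here is a toy simulation. The numbers and the fixed per-tick capacity are assumptions chosen for illustration, not measurements from any real system.

```python
from collections import deque

def drain_burst(arrivals_per_tick, capacity_per_tick):
    """Return the backlog size after each tick.

    arrivals_per_tick: jobs arriving in each tick (the spike lives here).
    capacity_per_tick: jobs the consumers can finish per tick (assumed fixed).
    """
    backlog = deque()
    history = []
    for arrivals in arrivals_per_tick:
        backlog.extend(range(arrivals))  # the flood lands in the queue
        for _ in range(min(capacity_per_tick, len(backlog))):
            backlog.popleft()            # consumers work at their own pace
        history.append(len(backlog))
    return history

# Three ticks at ten times normal traffic, then normal ticks while the backlog drains.
print(drain_burst([1000, 1000, 1000, 100, 100, 100, 100], capacity_per_tick=600))
# prints [400, 800, 1200, 700, 200, 0, 0]
```

The consumers never see more than 600 jobs in a tick. The spike shows up as a backlog that peaks and then drains, which is the "thumbnails arriving a little late" outcome.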
Fan-out comes nearly free, too. One "order placed" event can feed email, analytics, and loyalty points, each consuming on its own. Adding a fourth later is a new subscriber, not a change to the producer.
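A rough sketch of fan-out, assuming each subscriber gets its own queue (real brokers typically do this with topics or exchanges). The `subscribe` and `publish` helpers are hypothetical, not any broker's API.

```python
import queue

subscriber_queues = {}  # subscriber name -> that subscriber's own queue

def subscribe(name: str) -> queue.Queue:
    """Register a consumer and give it a private queue to read from."""
    subscriber_queues[name] = queue.Queue()
    return subscriber_queues[name]

def publish(event: dict) -> None:
    # The producer does not know or care who is listening.
    for q in subscriber_queues.values():
        q.put(event)

email = subscribe("email")
analytics = subscribe("analytics")
loyalty = subscribe("loyalty_points")
publish({"type": "order_placed", "order_id": "A1"})

# Adding a fourth consumer later touches only the subscriber side.
fraud = subscribe("fraud_check")
publish({"type": "order_placed", "order_id": "A2"})
```

Note that `publish` never changed when the fourth subscriber arrived.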
When to skip it and just make the call
If the caller genuinely needs the result to continue, a queue is the wrong tool — you’ve added a round trip and a waiting game to get an answer you needed immediately. Make the direct call and move on.
Low volume is the other case. One producer, one consumer, a handful of jobs a minute? A database table you poll, or a plain function call, is simpler and easier to debug than a broker. And if you’re adding a queue to look scalable before you have a load problem, that’s architecture theatre: three new failure modes in exchange for nothing.
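For the low-volume case, a polled table might look roughly like this. It is a sketch using SQLite from the standard library and assumes a single consumer; the table layout and function names are made up for illustration, and several workers sharing the table would need proper row locking.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs ("
    "id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'pending')"
)

def enqueue(payload: str) -> None:
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def claim_next():
    """Mark the oldest pending job as running and return (id, payload), or None."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        return row

def complete(job_id: int) -> None:
    with db:
        db.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

A worker loop calls `claim_next()` every few seconds, does the work, then calls `complete()`. When something goes wrong, the whole history is one `SELECT` away.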
What a message queue actually costs
A queue turns one direct call into three moving parts (producer, broker, consumer) plus the wire between them. A request that used to be a stack trace you read top to bottom is now a message that left here and should turn up there. When it doesn’t, you’re correlating logs across services and squinting at the broker. Put a trace ID and a correlation ID on every message before your first incident, not after.
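One way to carry a correlation ID is a small envelope around every payload. The field names below are an assumption for illustration; use whatever your logging and tracing setup already expects.

```python
import json
import uuid

def make_message(event_type, body, correlation_id=None):
    """Wrap a payload in an envelope that carries a correlation ID."""
    return json.dumps({
        "message_id": str(uuid.uuid4()),
        # Reuse the caller's ID when there is one, so logs line up end to end.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "type": event_type,
        "body": body,
    })

def handle(raw_message, log):
    message = json.loads(raw_message)
    # The consumer logs the same ID the producer logged.
    log(f"[{message['correlation_id']}] handling {message['type']}")
    return message
```

With the same ID in the producer's and the consumer's logs, "where did this message go?" becomes a single search instead of a guessing game.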
Delivery is where people get burned. Most brokers promise at-least-once, which politely means you’ll occasionally get the same message twice. Every consumer has to be idempotent — handling a repeat must be a no-op, not a second charge. That’s real work, usually a dedup key or an "already done" check, and skipping it is how a customer gets the same email four times.
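A minimal sketch of an idempotent consumer built on a dedup key. The in-memory set stands in for a durable store (for example, a table with a unique constraint), and the `dedup_key` field is a hypothetical name.

```python
processed_keys = set()  # stand-in for a durable store with a unique key
emails_sent = []        # stand-in for the real side effect

def handle_receipt(message: dict) -> bool:
    """Send the receipt once per dedup key; a redelivery is a no-op."""
    key = message["dedup_key"]
    if key in processed_keys:
        return False  # already done, so acknowledge and move on
    emails_sent.append(message["to"])
    processed_keys.add(key)
    return True

message = {"dedup_key": "receipt:order-12345", "to": "pat@example.com"}
handle_receipt(message)  # first delivery sends the email
handle_receipt(message)  # redelivery does nothing
```

In a real system the check and the record have to survive a crash between them, which is why the key usually lives in the same database transaction as the work itself.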
Then there’s the poison message: one job that can never succeed — malformed, points at a deleted row — that a naive consumer retries forever while everything behind it stalls. You need a dead-letter queue to park those, an alert on its depth, and a human who looks. None of it is hard; all of it is work that didn’t exist before the queue.
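The retry-then-park pattern can be sketched like this. `MAX_ATTEMPTS`, the message shape, and the helper names are assumptions for illustration; most brokers offer their own dead-letter configuration that you would use instead of hand-rolling it.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry budget before a message is parked

def consume(work: queue.Queue, dead_letters: queue.Queue, handler) -> None:
    """Drain the work queue, parking messages that keep failing."""
    while True:
        try:
            message = work.get_nowait()
        except queue.Empty:
            return
        try:
            handler(message["body"])
        except Exception as exc:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                message["error"] = repr(exc)
                dead_letters.put(message)  # park it for a human to inspect
            else:
                work.put(message)          # retry later, behind the other messages

def dlq_needs_attention(dead_letters: queue.Queue, threshold: int = 1) -> bool:
    """Depth check you would wire to an alert."""
    return dead_letters.qsize() >= threshold
```

The failing message goes to the back of the line on each retry, so the jobs behind it keep moving, and after three attempts it leaves the pipeline entirely.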
The mistakes that bite
The big one is treating the queue as a database. Messages persist, so someone starts stashing state they mean to query later — "what’s the status of order 12345?" A queue is a pipe, not a table; you can’t ask it questions. The moment you need to look something up, that data belongs in a store you can query, and the queue should carry only the signal that something happened.
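A small sketch of the split between state and signal. The dict stands in for a real queryable store and `queue.Queue` for the broker; all of the names are hypothetical.

```python
import queue

order_status = {}       # stand-in for a queryable store, such as a database table
events = queue.Queue()  # stand-in for the broker

def place_order(order_id: str) -> None:
    order_status[order_id] = "placed"  # state goes in the store
    # The queue carries only the signal that something happened.
    events.put({"type": "order_placed", "order_id": order_id})

def ship_next() -> None:
    event = events.get_nowait()
    order_status[event["order_id"]] = "shipped"  # consumers update the store too

def status_of(order_id: str) -> str:
    # Status questions go to the store, never to the queue.
    return order_status.get(order_id, "unknown")
```

"What's the status of order 12345?" is answered by `status_of("12345")` whether or not the event has been consumed yet.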
The quieter one is reaching for it by reflex. A queue in front of a workload that sees a few requests a minute adds latency and failure modes to solve a problem you don’t have. If a direct call is fast enough and you’ve measured it, the direct call is the senior choice.
Message queue vs a direct API call
Same job, different deal. A direct call is simple and hands back the answer immediately, but both sides have to be up at once, and a slow consumer drags the caller down with it. A queue breaks that link and soaks up bursts — at the price of latency, eventual consistency, and the operational tax above.
Rule of thumb: if the caller needs the result to proceed, call directly. If the work can finish later, or several things should react to one event, or you need to ride out spikes and partial outages, put a queue between them.
How to adopt one without regret
Start narrow. Take the single most obvious async job — email is the classic — and move just that behind a queue. Get idempotency, a dead-letter queue, and a depth alert right for that one path, then live with it for a few weeks.
Once the pattern feels routine and the tooling exists, the second and third use cases are cheap. The expensive path is declaring everything event-driven on day one and waking up inside a distributed system you can’t debug.
When it fits, when it doesn’t
Reach for it when
- A task can run after the user gets their response — sending email, resizing an image, generating a report.
- Traffic is spiky and you want a buffer so a flood piles up instead of crushing the consumer.
- Two services should not depend on each other being up at the same instant.
- You need to fan one event out to several independent consumers.
Skip it when
- The caller genuinely needs the result before it can continue — a queue only adds latency and complexity.
- You have one producer and one consumer and almost no volume; a database table or a direct call is simpler.
- You are reaching for it to "look scalable" before you have a load problem to solve.
Common mistakes
- Treating the queue as a database — long-lived state belongs in a store you can query, not a backlog.
- Forgetting that consumers must be idempotent, because at-least-once delivery means messages get redelivered.
- No dead-letter queue, so a single poison message silently jams the whole pipeline.