
Pub/Sub

Most messaging systems make you think about brokers: how many, in which region, how big the disks are, what happens when one dies. Pub/Sub asks you to think about none of that. There is one global service, one endpoint, and two nouns that matter: topics you publish to and subscriptions you read from. This page covers what that simplicity costs and what it buys — the ack-deadline lease that drives at-least-once delivery, ordering keys, exactly-once mode, dead letters, filtering, replay, and the honest answer to "why not just run Kafka."


One global service, and why that is the headline

The first thing to understand about Pub/Sub is what is missing. There are no brokers to provision, no clusters to size, no partitions to count, no regions to pick at create time. A topic is a name in a project. Publishers anywhere in the world send to pubsub.googleapis.com and Google routes the publish to the nearest region; subscribers anywhere in the world pull from the same endpoint and the service moves messages between regions when a subscriber is far from where a message was published. Storage, replication, and scaling are entirely Google's problem. You never run out of partitions because there are none to run out of, and you never rebalance a consumer group at 3 a.m. because there is no consumer group protocol to rebalance.

That single design choice is the real differentiator against the alternatives. SQS and SNS on AWS are regional services; a multi-region setup means wiring up cross-region replication yourself or accepting that each region is its own island (the AWS messaging page covers that model in detail). Self-run Kafka is even more hands-on: brokers, ZooKeeper or KRaft, partition counts chosen up front, MirrorMaker for cross-region. Pub/Sub gives you a planet-wide bus with the operational surface of an API. A service in Tokyo publishes, a worker in Iowa consumes, and neither side knows or cares where the other lives.

Two caveats keep this honest. First, "global" describes the endpoint and routing, not free bandwidth: a message published in one region and consumed in another crosses the network and you pay egress for it, so colocating heavy consumers with their publishers still saves money. Second, by default a message can be stored in any region; if you have data-residency requirements, message storage policies let you pin a topic's persistence to a set of allowed regions, and the global endpoint keeps working in front of it.

Topics and subscriptions: fan-out is the default

Pub/Sub's data model is two resources. A topic is where publishers send messages. A subscription is a named, durable attachment to a topic, and it is the thing consumers actually read from. The rule that defines the whole system: every subscription receives every message published to the topic (unless a filter says otherwise, more on that later). Subscriptions are independent copies of the stream, each with its own backlog, its own acknowledgement state, and its own pace. One slow consumer never delays another, because they are not sharing a queue; they are sharing a topic and holding separate cursors into it.

Figure: a publisher sends m1, m2, m3 to the topic orders (global, no partitions). Subscriptions billing-worker and analytics-export each hold their own backlog of m1 to m3: billing-worker pulls and acks each message, while analytics-export pulls and lags behind. Two subscriptions, two full copies of the stream, two independent cursors.
Fan-out is built in. Attach a second subscription and it gets every message from that point on, with no change to the publisher.

If you have used AWS, the cleanest mapping is: a Pub/Sub topic behaves like an SNS topic, and each subscription behaves like an SQS queue subscribed to it, except the pair comes as one service with one API and no glue. Each subscription gives you SQS-style competing consumers: run ten instances of a worker against one subscription and the service spreads messages across them, with each message going to one of the ten. Want a second, unrelated system to see the same events? Add a second subscription. The publisher's code does not change, which is the property that makes topics the natural backbone for event-driven designs: producers declare what happened, and the set of interested parties is configuration, not code.
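To make the fan-out and competing-consumer rules concrete, here is a toy in-memory model. It is not the Pub/Sub API: Topic, Subscription and distribute are hypothetical names, and the round-robin split across workers is a simplification, since the service does not promise any particular split.

```python
from collections import deque
from itertools import cycle


class Subscription:
    """Toy subscription: its own backlog, independent of every other one."""

    def __init__(self, name):
        self.name = name
        self.backlog = deque()


class Topic:
    """Toy topic: publishing copies the message into every attached subscription."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = []

    def attach(self, sub):
        # Only messages published after this point reach the new subscription.
        self.subscriptions.append(sub)

    def publish(self, msg):
        for sub in self.subscriptions:  # fan-out: one copy per subscription
            sub.backlog.append(msg)


def distribute(sub, workers):
    """Competing consumers on ONE subscription: each message goes to one worker."""
    assignment = {w: [] for w in workers}
    for worker in cycle(workers):  # round-robin: a simplification, not the service's rule
        if not sub.backlog:
            break
        assignment[worker].append(sub.backlog.popleft())
    return assignment


orders = Topic("orders")
billing, analytics = Subscription("billing-worker"), Subscription("analytics-export")
orders.attach(billing)
orders.publish("m0")  # analytics-export does not exist yet, so it never sees m0
orders.attach(analytics)
for m in ("m1", "m2", "m3"):
    orders.publish(m)

print(distribute(billing, ["w1", "w2"]))  # {'w1': ['m0', 'm2'], 'w2': ['m1', 'm3']}
print(list(analytics.backlog))            # ['m1', 'm2', 'm3'] - untouched by billing's workers
```

Adding a consumer system means adding a Subscription; the publishing code never changes.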

Each subscription keeps its unacked messages for 7 days by default, which is also the maximum; you can shorten it to as little as 10 minutes. Acked messages are normally discarded, but if you want them kept for replay, a subscription can retain them within that same window, or the topic can retain everything published to it for up to 31 days. A new subscription receives nothing retroactively: messages published before it existed are not delivered to it. Create subscriptions first, then publish, which is also the order the lab at the bottom uses.

Push vs pull delivery

Each subscription delivers in one of two modes. With pull, your subscriber calls the service (in practice, the client library holds open streaming pulls) and receives batches of messages. Pull is the default and the right choice for almost all backend workers: you control the flow, you can pull as fast as you can process, and flow control in the client library lets you cap outstanding messages and bytes so a burst does not flatten your process. Throughput is also best in pull mode, since the streaming protocol keeps a pipe full rather than waiting on request-response round trips.
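The flow-control rule is simple enough to sketch. OutstandingLimiter is a hypothetical name for a toy version of what the client libraries do (the Python client, for instance, takes caps named max_messages and max_bytes): a message reaches your handler only while both the outstanding-message count and the outstanding-byte total stay under their caps, and finishing a message frees room.

```python
from dataclasses import dataclass


@dataclass
class OutstandingLimiter:
    """Toy flow control: caps on messages handed to the handler but not yet acked."""
    max_messages: int
    max_bytes: int
    outstanding_messages: int = 0
    outstanding_bytes: int = 0

    def try_admit(self, size: int) -> bool:
        """Hand a message to the handler only if both caps would still hold."""
        if (self.outstanding_messages + 1 > self.max_messages
                or self.outstanding_bytes + size > self.max_bytes):
            return False  # in this toy the caller retries later; nothing is dropped
        self.outstanding_messages += 1
        self.outstanding_bytes += size
        return True

    def release(self, size: int) -> None:
        """Called once the handler acks or nacks: frees room for the next message."""
        self.outstanding_messages -= 1
        self.outstanding_bytes -= size


limiter = OutstandingLimiter(max_messages=2, max_bytes=1_000)
burst = [400, 400, 100, 300]
print([limiter.try_admit(size) for size in burst])  # [True, True, False, False]
limiter.release(400)                                # one handler finishes
print(limiter.try_admit(100))                       # True
```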

With push, Pub/Sub makes an HTTPS POST to an endpoint you register, one message per request, and interprets your HTTP response as the ack: a 2xx acks the message, anything else (or a timeout) means redelivery later. Push exists for environments that cannot hold a connection open or do not want to manage a polling loop — Cloud Run and Cloud Functions are the canonical targets, where a push subscription plus an autoscaling service gives you a worker fleet that scales to zero. The service ramps push traffic up and down using a slow-start style algorithm based on your endpoint's success rate, so a struggling endpoint automatically gets less traffic. The trade-offs: your endpoint must be reachable from Google, authenticated (push supports signed OIDC tokens so your endpoint can verify the caller), and able to finish work within the request timeout. A reasonable rule: pull for steady backend processing, push for serverless and for low-traffic integrations where running a poller would be silly.
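The push contract fits in one function: decode the JSON envelope Pub/Sub POSTs, run your logic, and turn the outcome into a status code. This sketch assumes the standard push body shape (a message object carrying base64 data, attributes and messageId) and leaves out the HTTP server and the OIDC token check you would want in production; handle_push, process and the project and subscription names are hypothetical.

```python
import base64
import json


def handle_push(body: bytes, process) -> int:
    """Core of a push endpoint: returns the HTTP status code to reply with.

    A success status (204 here) acks the message; anything else, or no reply
    before the timeout, leaves it to be redelivered later.
    """
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        data = base64.b64decode(message.get("data", ""))  # payload arrives base64-encoded
        process(message["messageId"], data, message.get("attributes", {}))
    except Exception:
        return 500  # not acked: Pub/Sub retries (and dead-letters, if configured)
    return 204


body = json.dumps({
    "message": {
        "data": base64.b64encode(b'{"order_id": 1}').decode(),
        "attributes": {"region": "eu"},
        "messageId": "123",
    },
    "subscription": "projects/my-project/subscriptions/billing-push",  # hypothetical
}).encode()

received = []
print(handle_push(body, lambda *args: received.append(args)))  # 204
print(handle_push(b"not json", lambda *args: None))            # 500
```

Because a non-2xx reply is the only way to say "not yet", let real exceptions propagate to that return path instead of swallowing them inside process.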

There is a third option worth knowing exists: export subscriptions write directly to BigQuery or Cloud Storage with no consumer code at all. For the very common "archive every event to the warehouse" subscription, that removes an entire service from your architecture.

At-least-once delivery and the ack-deadline lease

Pub/Sub's default contract is at-least-once: every message is delivered until something acks it, and duplicates are possible. The mechanism behind that contract is a lease. When a message is delivered to a subscriber, the subscription starts a timer — the ack deadline, 10 seconds by default, configurable up to 600. Until the deadline expires, the message is outstanding: leased to that delivery, not handed to anyone else. Three things can happen next. The subscriber acks, and the message is done. The subscriber nacks (or the deadline expires with no response), and the message becomes eligible for redelivery, to the same subscriber or a different one. Or the subscriber needs more time and extends the deadline with a modifyAckDeadline call, which the client libraries do automatically in the background for as long as your handler is running, a behaviour usually called lease management.
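The lease is small enough to model directly. This toy LeaseTable (a hypothetical name, driven by simulated clock values rather than real time) tracks one deadline per message: pull starts a lease, ack removes the message, nack makes it eligible at once, modify_ack_deadline pushes the deadline out, and any lease whose deadline has passed goes back into play on the next pull.

```python
from typing import Dict, List, Optional


class LeaseTable:
    """Toy model of one subscription's ack-deadline leases (times in seconds)."""

    def __init__(self, ack_deadline: float = 10.0):
        self.ack_deadline = ack_deadline
        self.deadline: Dict[str, Optional[float]] = {}  # None = eligible for delivery
        self.deliveries: Dict[str, int] = {}

    def publish(self, msg_id: str) -> None:
        self.deadline[msg_id] = None
        self.deliveries[msg_id] = 0

    def pull(self, now: float) -> List[str]:
        """Deliver every eligible message and start a lease on each."""
        for msg_id, d in self.deadline.items():
            if d is not None and d <= now:
                self.deadline[msg_id] = None  # lease lapsed: back in play
        eligible = [m for m, d in self.deadline.items() if d is None]
        for msg_id in eligible:
            self.deadline[msg_id] = now + self.ack_deadline
            self.deliveries[msg_id] += 1
        return eligible

    def ack(self, msg_id: str) -> None:
        self.deadline.pop(msg_id, None)  # done: removed from the backlog

    def nack(self, msg_id: str) -> None:
        # Same effect as setting the deadline to 0 seconds: available again at once.
        if msg_id in self.deadline:
            self.deadline[msg_id] = None

    def modify_ack_deadline(self, msg_id: str, now: float, seconds: float) -> None:
        if self.deadline.get(msg_id) is not None:
            self.deadline[msg_id] = now + seconds  # lease extended


sub = LeaseTable(ack_deadline=10)
sub.publish("a")
sub.publish("b")
print(sub.pull(now=0))                           # ['a', 'b']
sub.ack("a")
sub.modify_ack_deadline("b", now=8, seconds=10)  # still working: lease now ends at 18
print(sub.pull(now=15))                          # []
print(sub.pull(now=20))                          # ['b'] - lease lapsed, redelivered
print(sub.deliveries["b"])                       # 2
```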

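Figure: a delivered message starts a lease and is outstanding (ack deadline 10 s to 600 s). From there it is acked (done, removed), extended with modAckDeadline while the handler is still working, or, once the deadline passes or the subscriber nacks, eligible again for redelivery, possibly to a different subscriber.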
The ack deadline is a lease, not a guarantee. Extending it keeps the lease alive; letting it lapse puts the message back in play.

The duplicate cases follow directly from the lease. A subscriber that crashes mid-handler never acks, so the message comes back: good, that is the system saving your data. A subscriber that is merely slow lets the deadline lapse while still working, so a second copy goes out and the work happens twice: that is the case lease extension exists to prevent. And an ack can be lost in transit even though your ack call appeared to succeed, in which case the service redelivers anyway, because an ack in base Pub/Sub is best-effort. The conclusion every team eventually writes in their runbook: handlers must be idempotent. Key your side effects on the message ID or a business key, make the second application of a message a no-op, and at-least-once becomes a non-event. The general theory of why this is the right place to spend effort is covered in delivery semantics.
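A minimal idempotent handler, assuming a hypothetical order-charging side effect. It dedupes on a business key rather than the message ID, because a publisher retry can put the same event on the topic twice under two different IDs. The in-memory set stands in for a durable store; with several handler instances, the check and the write must be atomic in that store (a unique constraint, for example).

```python
class IdempotentHandler:
    """Applies each event's side effect at most once, keyed on a business key."""

    def __init__(self):
        self.applied = set()  # stand-in for a durable store with a unique constraint
        self.balance = 0      # the hypothetical side effect: charging orders

    def handle(self, message_id: str, order_id: int, amount: int) -> bool:
        # message_id is not the dedup key: a publisher retry can re-send the
        # same event under a new ID, but the order ID stays the same.
        key = ("charge", order_id)
        if key in self.applied:
            return False      # duplicate: do nothing, but still ack it
        self.balance += amount
        self.applied.add(key)
        return True


handler = IdempotentHandler()
deliveries = [("m1", 1, 42), ("m1", 1, 42), ("m9", 1, 42), ("m2", 2, 7)]
print([handler.handle(*d) for d in deliveries])  # [True, False, False, True]
print(handler.balance)                           # 49
```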

Tuning note. Set the ack deadline near your real processing time, not at the 600-second maximum. A huge deadline means a crashed subscriber silently holds messages for ten minutes before anyone else can have them; a sane deadline plus client-side lease extension gives you fast failover and long-running handlers at the same time.

Ordering keys: per-key order without a totally ordered log

By default Pub/Sub does not promise order, and the global, partition-free architecture is exactly why: messages take different paths and independent subscribers ack at their own pace. When you need order, you almost never need total order; you need order per entity — all events for account 42 in sequence, with no opinion about how account 42 interleaves with account 7. That is what ordering keys give you. Publishers set a key on each message, the topic's messages with the same key are delivered to a subscriber in the order they were published, and different keys flow in parallel with no coordination between them.

Figure: three ordering-key lanes, acct-42 (created, updated, paid), acct-7 (created, closed) and acct-99 (created, updated, closed), each delivered in order. Lanes never wait for each other; sequence is preserved only within a lane.
Ordering keys: each key is its own lane. No global sequence exists, and none of the lanes block the others.

The mechanics have sharp edges worth knowing. Ordering must be enabled on the subscription at creation time (--enable-message-ordering), and ordered publishing only holds per key, per region, per publisher client. Throughput per ordering key is capped (on the order of 1 MB/s per key), so a key must be an entity, not a constant — set every message's key to "global" and you have rebuilt a single-partition queue with extra steps. And ordering interacts with failure: if a message with a key fails and is redelivered, the messages behind it on the same key wait, because delivering them early would break the promise. Per-key head-of-line blocking is the price of per-key order; it is the same deal a Kafka partition gives you, just at finer granularity.
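A toy model of per-key ordering, with hypothetical names: messages are split into one lane per ordering key, lanes advance independently, and a failure at the head of a lane stops everything behind it on that key while other keys keep moving. In the real service the failed message is redelivered rather than parked; this sketch simply stops the lane so the head-of-line blocking is visible.

```python
from collections import deque


def deliver_by_key(messages, handle):
    """Toy per-key ordered delivery with head-of-line blocking.

    messages: (ordering_key, payload) pairs in publish order.
    handle:   returns True to ack, False on failure.
    Returns (processed, blocked): what each lane finished, and what is stuck.
    """
    lanes = {}
    for key, payload in messages:
        lanes.setdefault(key, deque()).append(payload)

    processed = {key: [] for key in lanes}
    blocked = {}
    while lanes:
        # Lanes take turns: keys interleave freely, order holds only within a key.
        for key in list(lanes):
            lane = lanes[key]
            if handle(key, lane[0]):
                processed[key].append(lane.popleft())
                if not lane:
                    del lanes[key]
            else:
                # Nothing behind the failed message on this key may overtake it.
                blocked[key] = list(lane)
                del lanes[key]
    return processed, blocked


def fails_on_update(key, payload):
    return not (key == "acct-42" and payload == "updated")


events = [("acct-42", "created"), ("acct-7", "created"), ("acct-42", "updated"),
          ("acct-7", "closed"), ("acct-42", "paid")]
done, stuck = deliver_by_key(events, fails_on_update)
print(done)   # {'acct-42': ['created'], 'acct-7': ['created', 'closed']}
print(stuck)  # {'acct-42': ['updated', 'paid']}
```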

Exactly-once delivery, and what it actually promises

Subscriptions can opt into exactly-once delivery, which changes the contract in a specific, narrow way: while a message is outstanding to one delivery, it will not be delivered again, and acks become first-class — the ack call returns a result you can check, and a successful ack is honoured rather than best-effort. In base Pub/Sub, a redelivery can race your in-flight processing; with exactly-once enabled, it cannot. The feature works within a single region and only on pull subscriptions, and it costs some throughput and latency, because the service now has to track delivery state with much stronger consistency.
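What exactly-once mode changes at the ack call can be sketched with a toy model (ExactlyOnceLease, AckResult and the ack-ID format are hypothetical, and the clock is simulated). Each message has at most one live ack ID; it is not handed out again while that lease lives, and an ack returns a result instead of being fire-and-forget, so a late ack reports failure and the handler knows the message may come back.

```python
from enum import Enum


class AckResult(Enum):
    SUCCESS = "success"
    LEASE_EXPIRED = "lease expired"  # the message may be delivered again


class ExactlyOnceLease:
    """Toy model: one live ack ID per message, and acks that report a result."""

    def __init__(self, ack_deadline: float = 10.0):
        self.ack_deadline = ack_deadline
        self.live = {}   # msg_id -> (ack_id, deadline)
        self.owner = {}  # ack_id -> msg_id
        self.acked = set()
        self._next = 0

    def deliver(self, msg_id: str, now: float):
        """Return a fresh ack ID, or None if the message must not be handed out."""
        if msg_id in self.acked:
            return None
        lease = self.live.get(msg_id)
        if lease is not None and lease[1] > now:
            return None  # still outstanding: no second copy while the lease lives
        self._next += 1
        ack_id = f"ack-{self._next}"
        self.live[msg_id] = (ack_id, now + self.ack_deadline)
        self.owner[ack_id] = msg_id
        return ack_id

    def ack(self, ack_id: str, now: float) -> AckResult:
        msg_id = self.owner.get(ack_id)
        lease = self.live.get(msg_id)
        if lease is not None and lease[0] == ack_id and lease[1] > now:
            del self.live[msg_id]
            self.acked.add(msg_id)
            return AckResult.SUCCESS
        return AckResult.LEASE_EXPIRED


eo = ExactlyOnceLease(ack_deadline=10)
first = eo.deliver("m1", now=0)
print(eo.deliver("m1", now=5))   # None: outstanding, so no duplicate delivery
print(eo.ack(first, now=12))     # AckResult.LEASE_EXPIRED: too late, do not assume success
second = eo.deliver("m1", now=12)
print(eo.ack(second, now=13))    # AckResult.SUCCESS
print(eo.deliver("m1", now=30))  # None: acked for good
```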

Be precise about what this is not. It is exactly-once delivery by the service, not exactly-once processing by your system. If your handler writes to a database, then crashes after the write but before the ack, the message comes back and your side effect can still happen twice. The end-to-end version of the guarantee still requires idempotent handlers or a transactional outbox on your side; what exactly-once mode removes is the class of duplicates created by the delivery system itself (deadline races, ack loss). For many teams the honest cost-benefit is: write the idempotent handler anyway, and then exactly-once mode becomes optional. For workloads where duplicate suppression is genuinely hard, like triggering a non-idempotent third-party API, it is a real and welcome upgrade.
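One way to close the crash-between-write-and-ack gap is to write a dedup record in the same database transaction as the side effect. This sketch uses Python's built-in sqlite3 with hypothetical table and function names; it is a consumer-side dedup ("inbox") table rather than a transactional outbox, but it targets exactly the duplicate described above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (event_key TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE ledger (order_id INTEGER, amount INTEGER)")
    return db


def apply_once(db: sqlite3.Connection, event_key: str, order_id: int, amount: int) -> bool:
    """Write the dedup marker and the side effect in ONE transaction.

    Crash before commit: neither is written, and the redelivery applies it.
    Crash after commit but before the ack: both are written, so the redelivered
    copy trips the PRIMARY KEY and is skipped.
    """
    try:
        with db:  # commits on success, rolls back if anything inside raises
            db.execute("INSERT INTO processed VALUES (?)", (event_key,))
            db.execute("INSERT INTO ledger VALUES (?, ?)", (order_id, amount))
    except sqlite3.IntegrityError:
        return False  # already applied: just ack the duplicate
    return True


db = make_db()
print(apply_once(db, "order-1", 1, 42))  # True
# ...the process dies here, before acking; Pub/Sub redelivers the message...
print(apply_once(db, "order-1", 1, 42))  # False
print(db.execute("SELECT SUM(amount) FROM ledger").fetchone()[0])  # 42
```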

Retry policies and dead-letter topics

What happens to a message that keeps failing? Two subscription-level settings decide. The retry policy controls pacing: immediate redelivery (the default) or exponential backoff between a minimum and maximum delay, which you want on any subscription whose handler calls something that can have a bad hour — hammering a failing downstream with instant retries helps nobody. The dead-letter policy controls the exit: after a configured number of delivery attempts (5 to 100), the service stops retrying and forwards the message to a dead-letter topic, which is just another topic. You attach a subscription to it, alert on its backlog, and build whatever inspection or replay tooling you need; the dead-lettered message carries the original data plus attributes recording where it came from.
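The two policies reduce to two small decisions, sketched here with hypothetical function names. The backoff curve is an assumption, not Pub/Sub's exact formula: it only shows the shape (doubling from the minimum, then plateauing at the maximum), and the 10 s and 600 s defaults are placeholders for whatever bounds you configure. The dead-letter check mirrors the 5 to 100 attempt range.

```python
def retry_delays(min_delay: float = 10.0, max_delay: float = 600.0, attempts: int = 8):
    """Illustrative exponential backoff: doubles from min_delay, capped at max_delay.

    Assumption: this shows the shape of the policy, not Pub/Sub's exact formula.
    """
    return [min(max_delay, min_delay * 2 ** i) for i in range(attempts)]


def after_failed_delivery(delivery_attempt: int, max_delivery_attempts: int = 5) -> str:
    """Decide a failed message's fate under a dead-letter policy."""
    if not 5 <= max_delivery_attempts <= 100:
        raise ValueError("max delivery attempts must be between 5 and 100")
    if delivery_attempt >= max_delivery_attempts:
        return "forward to dead-letter topic"
    return "redeliver after backoff"


print(retry_delays())            # [10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 600.0, 600.0]
print(after_failed_delivery(3))  # redeliver after backoff
print(after_failed_delivery(5))  # forward to dead-letter topic
```

Since the real attempt counter is approximate, treat the threshold in after_failed_delivery as "around N" too.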

Without a dead-letter topic, a poison message — one that crashes the handler every time — circulates for the full retention period, burning retries and, if ordering keys are on, blocking everything behind it on the same key. With one, the poison drains out after N attempts and the lane unblocks. Two operational details bite people: the Pub/Sub service account needs publish permission on the dead-letter topic and subscribe permission on the source subscription (the lab below grants these), and the delivery-attempt counter is approximate, so treat max-delivery-attempts as "around N," not exactly N. The pattern is the same one queues have used forever — the message queues guide walks the general version — Pub/Sub just makes the dead-letter destination a topic, so even your failures can fan out.

Filtering: subscriptions that take a subset

"Every subscription gets every message" has one sanctioned exception. A subscription can carry a filter, an expression over message attributes (not the payload) set at creation time: attributes.region = "eu", hasPrefix(attributes.type, "order."), combined with AND, OR, and NOT. Messages that fail the filter are dropped by the service and acked automatically; your subscriber never sees them and you are not billed message delivery for them, though the publisher was billed to publish.

Filters change architecture more than they first appear to. Instead of one topic per event type — and a publisher that must know about all of them — you publish one stream of well-attributed events and let each consumer declare the slice it wants. The constraints: filters work on attributes only, so anything you want to filter by must be lifted out of the payload into an attribute at publish time; the filter is fixed at subscription creation (changing it means a new subscription); and expressions have a size limit. Plan attribute schemas early. Adding an attribute later is easy; needing one you never set is a backfill problem.
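A sketch of both halves of attribute filtering. make_message (hypothetical) lifts the filterable fields out of the JSON payload into string attributes, and two Python predicates stand in for filter expressions that the service itself would evaluate; Python does not run the filter language. deliver models what a filtered subscription hands you.

```python
import json


def make_message(event: dict):
    """Publisher side: JSON payload, plus filterable fields copied into attributes.

    Filters never look inside the payload, so anything a subscriber may want
    to filter on has to be lifted into a (string-valued) attribute here.
    """
    data = json.dumps(event).encode()
    attributes = {"region": str(event["region"]), "type": str(event["type"])}
    return data, attributes


# Python stand-ins for two filter expressions the service would evaluate:
#   attributes.region = "eu"
#   hasPrefix(attributes.type, "order.") AND attributes.region = "eu"
def eu_only(attrs: dict) -> bool:
    return attrs.get("region") == "eu"


def eu_orders(attrs: dict) -> bool:
    return attrs.get("type", "").startswith("order.") and eu_only(attrs)


def deliver(messages, subscription_filter):
    """What a filtered subscription hands you; non-matching messages are auto-acked."""
    return [data for data, attrs in messages if subscription_filter(attrs)]


stream = [make_message(e) for e in (
    {"order_id": 1, "region": "eu", "type": "order.created"},
    {"order_id": 2, "region": "us", "type": "order.created"},
    {"order_id": 3, "region": "eu", "type": "refund.issued"},
)]
print(len(deliver(stream, eu_only)))    # 2
print(len(deliver(stream, eu_orders)))  # 1
```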

Seek and replay: moving the cursor by hand

Acked messages are normally gone from a subscription's point of view. Seek reopens that. If acked messages are retained (on the subscription, within its retention window of up to 7 days, or on the topic, for up to 31 days), you can seek a subscription to a timestamp, which marks everything published before that instant as acked and everything after it as unacked: rewind to before the bad deploy and the subscription redelivers from there. Seeking forward works too: seek to now and you have purged the backlog, the fastest way to discard a flood of messages you no longer want.

Timestamps are blunt; snapshots are precise. A snapshot captures a subscription's exact ack state at a moment, and seeking to the snapshot later restores that state, including messages that were in flight. The deployment ritual writes itself: snapshot the subscription, roll out the new consumer, and if it misbehaves, roll back and seek to the snapshot. Every message the broken version acked comes back, while messages the previous version had already acked before the snapshot stay acked; only those in flight at snapshot time are repeated. Two cautions: replay is at-least-once with duplicates by design, so the idempotency you built earlier is what makes seek safe to use; and a seek moves the cursor for every consumer on that subscription, so replay-for-one-team means giving that team its own subscription. This is the closest Pub/Sub gets to Kafka's "consumers own their offsets" superpower; less flexible, but enough for the recovery cases that matter.
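Seeking to a timestamp and seeking to a snapshot are both rewrites of one subscription's ack state over retained messages, which a toy model shows directly. ReplayableSubscription is a hypothetical name and publish times are plain numbers; a message published exactly at the seek time counts as unacked here, a boundary detail the sketch does not claim matches the service.

```python
class ReplayableSubscription:
    """Toy model: retained messages plus one subscription's ack state."""

    def __init__(self):
        self.retained = []  # (msg_id, publish_time), kept so they can be replayed
        self.acked = set()

    def publish(self, msg_id, publish_time):
        self.retained.append((msg_id, publish_time))

    def ack(self, msg_id):
        self.acked.add(msg_id)

    def backlog(self):
        return [m for m, _ in self.retained if m not in self.acked]

    def seek_to_time(self, t):
        """Published before t: acked. Published at or after t: unacked again."""
        self.acked = {m for m, published in self.retained if published < t}

    def snapshot(self):
        """Capture the exact ack state: what existed, and what was still unacked."""
        return {"known": {m for m, _ in self.retained}, "unacked": set(self.backlog())}

    def seek_to_snapshot(self, snap):
        """Unacked at snapshot time, or published since: unacked again."""
        self.acked = {m for m, _ in self.retained
                      if m in snap["known"] and m not in snap["unacked"]}


sub = ReplayableSubscription()
for i, t in enumerate([100, 200, 300]):
    sub.publish(f"m{i}", t)
sub.ack("m0")
snap = sub.snapshot()         # m1 and m2 are unacked at this moment
sub.publish("m3", 400)
for m in ("m1", "m2", "m3"):  # the broken consumer acks everything
    sub.ack(m)
sub.seek_to_snapshot(snap)
print(sub.backlog())          # ['m1', 'm2', 'm3']
sub.seek_to_time(250)
print(sub.backlog())          # ['m2', 'm3']
sub.seek_to_time(10_000)      # seek to "now": purge the backlog
print(sub.backlog())          # []
```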

Pub/Sub Lite, briefly, and why it is on the way out

You will see Pub/Sub Lite in older docs and designs: a cheaper sibling where you provision capacity yourself — zonal or regional topics, explicit partitions, reserved throughput — trading away the global elastic model for a lower bill, essentially a managed single-cluster log with a Pub/Sub-shaped API. Google deprecated it, and it shuts down in March 2026; the recommended migrations are standard Pub/Sub or Managed Service for Apache Kafka. The reason it failed is instructive: once you are choosing partition counts and capacity again, you have given up the thing that made Pub/Sub worth choosing, without getting Kafka's ecosystem in return. For a new design, the decision is standard Pub/Sub vs Kafka, and Lite is a footnote.

Throughput, quotas, and what you pay

Standard Pub/Sub has no provisioned capacity. Publish throughput scales automatically and is governed by regional quotas — multiple GB/s per region by default, raisable by request — rather than by anything you sized. The per-message limits are the ones you design around: a message can be at most 10 MB, a publish batch likewise, attributes are limited (up to 100 per message, with size caps per key and value), and an ordering key serialises its own lane at roughly 1 MB/s. Subscriber throughput scales with the number of streaming pulls you hold open; the practical ceiling is almost always your handler, not the service.
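Client libraries batch publishes for you (the Python client, for example, has batch settings for message count, bytes and latency), but the packing rule is worth seeing. This greedy sketch, with a hypothetical function name, fills each publish request until one more message would break either the 10 MB cap or a 1,000-message cap; both defaults are assumptions to check against the current quotas, and per-request overhead such as attributes is ignored.

```python
def batch_for_publish(sizes, max_bytes=10_000_000, max_messages=1_000):
    """Greedily pack message sizes (bytes) into publish requests under both caps.

    Assumed limits: ~10 MB per request (and per message), 1,000 messages per request.
    """
    batches, current, current_bytes = [], [], 0
    for size in sizes:
        if size > max_bytes:
            raise ValueError(f"a {size}-byte message exceeds the per-message limit")
        if current and (len(current) == max_messages or current_bytes + size > max_bytes):
            batches.append(current)  # this request is full: start the next one
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


print([len(b) for b in batch_for_publish([200] * 2500)])       # [1000, 1000, 500]
print([len(b) for b in batch_for_publish([4_000_000] * 3)])    # [2, 1]
```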

Pricing is volume-based, with a minimum billable size of 1 KB per request: you pay per TiB for publish, per TiB for delivery on each subscription, plus storage for retained messages and network egress when messages cross regions or leave Google. The two corollaries that matter for design: fan-out multiplies delivery cost (one publish, five subscriptions, five deliveries billed), and a tiny message sent in a request of its own is billed as a full 1 KB, so batching small events into fewer publish requests saves real money at volume. There is a small free tier (10 GiB a month), which is plenty for the lab below and for most prototypes. Compare this with running a three-broker Kafka cluster that costs the same whether you push one message or one billion, and the shape of the trade is clear: Pub/Sub is cheap when traffic is spiky or small, and the costs converge as sustained volume grows.
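A toy calculator for the two design corollaries. It counts billable bytes rather than dollars, so no price is baked in (multiply by the current per-TiB rate). Its assumptions are labelled in the code: the 1 KB minimum is taken as 1,000 bytes per request, deliveries are batched like publishes, and nothing is redelivered.

```python
MIN_BILLABLE_BYTES = 1000  # assumption: the 1 KB per-request minimum, in bytes


def request_bytes(n_messages: int, message_bytes: int, batch_size: int) -> int:
    """Billable bytes for n_messages sent in requests of up to batch_size each."""
    full, rest = divmod(n_messages, batch_size)
    total = full * max(MIN_BILLABLE_BYTES, batch_size * message_bytes)
    if rest:
        total += max(MIN_BILLABLE_BYTES, rest * message_bytes)
    return total


def total_billable(n_messages: int, message_bytes: int, batch_size: int,
                   subscriptions: int) -> int:
    """One publish pass plus one delivery pass per subscription.

    Assumes no redeliveries and that deliveries are batched like publishes.
    """
    one_pass = request_bytes(n_messages, message_bytes, batch_size)
    return one_pass * (1 + subscriptions)


# One million 100-byte events, fanned out to three subscriptions:
print(total_billable(1_000_000, 100, batch_size=1, subscriptions=3))    # 4000000000
print(total_billable(1_000_000, 100, batch_size=100, subscriptions=3))  # 400000000
```

Both corollaries show up in the arithmetic: the fan-out factor multiplies everything, and unbatched 100-byte messages cost ten times as much as the same bytes sent in batches of 100.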

Pub/Sub or Kafka: the honest version

The two systems get compared constantly and the comparison is usually dishonest in one direction or the other. The structural difference: Kafka is a log you read at your own pace — messages live at offsets, consumers own their position, replay is free and constant, and the broker barely tracks consumers at all. Pub/Sub is a delivery service — the service tracks per-message ack state, pushes work toward you, and replay is a feature (seek) rather than the substrate. Read the Kafka guide next to this page and the contrast in worldview is plain.

| Question | Pub/Sub | Kafka |
| --- | --- | --- |
| Operations | None: no brokers, no partitions, no sizing | Yours (or a managed service's): brokers, partitions, rebalances |
| Topology | Global endpoint, cross-region built in | Per cluster; cross-region is replication you build |
| Ordering | Per ordering key, opt-in | Per partition, inherent |
| Replay | Seek to timestamp or snapshot, within retention | Re-read any offset, any time within retention; many readers at different positions for free |
| Stream processing | Pairs with Dataflow; no native processing layer | Kafka Streams, ksqlDB, exactly-once transactions, a large ecosystem |
| Cost shape | Per-message volume; near zero at low traffic | Cluster-shaped; flat floor, efficient at sustained high volume |

A fair rule of thumb. Choose Pub/Sub when the job is moving events between services — fan-out, work distribution, feeding pipelines and warehouses — especially on GCP, at spiky or modest scale, with a team that does not want to own messaging infrastructure. Choose Kafka when the log itself is the product: long-retention event sourcing, many independent readers at different positions, Kafka Streams-style processing with transactional guarantees, log compaction, or a hard multi-cloud requirement. And if the pull toward Kafka is mostly its API and ecosystem rather than self-hosting, note that GCP now sells Managed Service for Apache Kafka, which is the cleaner answer than bending either system into the other's shape.

CLI lab: topic, fan-out, ack, dead letters

Twenty minutes in a shell makes all of the above concrete. You need a GCP project with the Pub/Sub API enabled and gcloud authenticated; everything here fits in the free tier. The plan: create a topic with two subscriptions, watch fan-out happen, pull and ack by hand to feel the lease, wire up a dead-letter topic, and tear it all down.

1. Create a topic and two subscriptions. Subscriptions first, then publish — messages published before a subscription exists are never delivered to it.

gcloud pubsub topics create orders

gcloud pubsub subscriptions create billing-worker \
    --topic=orders --ack-deadline=20

gcloud pubsub subscriptions create analytics-export \
    --topic=orders --ack-deadline=20

2. Publish a few messages, with an attribute you could filter on later.

gcloud pubsub topics publish orders \
    --message='{"order_id": 1, "total": 42}' --attribute=region=eu
gcloud pubsub topics publish orders \
    --message='{"order_id": 2, "total": 7}' --attribute=region=us
gcloud pubsub topics publish orders \
    --message='{"order_id": 3, "total": 99}' --attribute=region=eu

3. Pull from both subscriptions. Each one has its own copy of all three messages: that is the fan-out from the first diagram. Pull without acking first and note the message IDs and ack IDs that come back. A single pull can return fewer messages than --limit, so pull again if one is missing.

gcloud pubsub subscriptions pull billing-worker --limit=3
gcloud pubsub subscriptions pull analytics-export --limit=3

# pull again on billing-worker after ~20s: the ack deadline
# lapsed, so the same messages come back. that is the lease.

gcloud pubsub subscriptions pull billing-worker --limit=3 --auto-ack

# now billing-worker is drained, but analytics-export still
# has its own backlog — acks on one never touch the other.
gcloud pubsub subscriptions pull analytics-export --limit=3 --auto-ack

4. Attach a dead-letter topic. Create a topic for the dead letters and a subscription to catch them, then point billing-worker at it with a low attempt limit. The two IAM bindings let the Pub/Sub service agent forward on your behalf; find your project number with gcloud projects describe.

gcloud pubsub topics create orders-dead-letter
gcloud pubsub subscriptions create dead-letter-monitor \
    --topic=orders-dead-letter

PROJECT_NUMBER=$(gcloud projects describe \
    $(gcloud config get-value project) --format='value(projectNumber)')
PUBSUB_SA="service-${PROJECT_NUMBER}@gcp-sa-pubsub.iam.gserviceaccount.com"

gcloud pubsub topics add-iam-policy-binding orders-dead-letter \
    --member="serviceAccount:$PUBSUB_SA" --role=roles/pubsub.publisher
gcloud pubsub subscriptions add-iam-policy-binding billing-worker \
    --member="serviceAccount:$PUBSUB_SA" --role=roles/pubsub.subscriber

gcloud pubsub subscriptions update billing-worker \
    --dead-letter-topic=orders-dead-letter --max-delivery-attempts=5

5. Poison a message. Publish one more, then repeatedly pull billing-worker without acking. After about five failed deliveries the message stops coming back — and appears on the monitor subscription instead, with attributes naming its origin.

gcloud pubsub topics publish orders --message='{"order_id": 666}'

# run this several times, waiting ~20s (the ack deadline)
# between pulls, never acking:
gcloud pubsub subscriptions pull billing-worker --limit=1

# then check the dead-letter monitor:
gcloud pubsub subscriptions pull dead-letter-monitor --limit=1 --auto-ack

6. Teardown. Subscriptions first, then topics.

gcloud pubsub subscriptions delete billing-worker analytics-export \
    dead-letter-monitor
gcloud pubsub topics delete orders orders-dead-letter

If you want to keep going, do it before the teardown: delete analytics-export and recreate it with --message-filter='attributes.region = "eu"', republish the three messages from step 2, and watch it receive only the two tagged region=eu. Or create a subscription with --enable-message-ordering, publish with --ordering-key, and watch the lanes from the third diagram play out in your terminal.
