Cloud Storage
GCS looks like S3 with the labels changed, and for the first hour it behaves that way too: buckets, objects, a flat namespace, an HTTP API. Then the differences start to matter. A bucket can span two regions, or a continent. The archive tier answers in milliseconds instead of asking you to file a restore request and wait. Every overwrite mints a new generation number you can use as a compare-and-swap token. This page works through the object model, location types, storage classes, consistency, access control, lifecycle, Autoclass, and the upload machinery, and closes with a gcloud lab you can run in ten minutes.
The object model: buckets, objects, generations
A bucket is a named container in a global namespace: the name has to be unique across
every Google Cloud customer, DNS-compatible, and between 3 and 63 characters. The bucket
carries the settings that matter operationally — location, default storage class, access
control mode, versioning, lifecycle rules, retention — and it belongs to a project, which
is where billing and IAM inheritance come from (the
resource hierarchy page covers that
chain). Objects inside it are byte blobs up to 5 TiB, addressed by a key of up to 1,024
bytes of UTF-8. The slashes in 2026/06/invoice.parquet are characters in one
flat key, not directories; the console draws folders because humans like them.
Objects are immutable. There is no append and no in-place edit; writing to an existing
key replaces the whole object, and the replacement gets a new generation
number. Metadata-only changes (content type, cache control, custom metadata) bump a
second counter, the metageneration, which resets to 1 whenever a new
generation is written. These are not trivia. Every mutating request accepts preconditions
like ifGenerationMatch, so you can say "overwrite this object only if it is
still the version I read" and get an atomic compare-and-swap from a storage system. Pass
ifGenerationMatch=0 and the write succeeds only if the object does not exist
yet, which is a perfectly serviceable distributed lock or leader-election primitive for
low-traffic coordination. S3 spent most of its life without conditional writes; in GCS
they have been there from the start, and a lot of GCP tooling quietly depends on them.
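The precondition semantics are easier to trust once you have watched them reject a write. What follows is a toy in-memory model, not the real client library: `ToyBucket`, `PreconditionFailed`, and the key `locks/leader` are all hypothetical names, and the small counter stands in for the large opaque integers GCS actually issues as generations.

```python
import itertools


class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 a failed precondition returns."""


class ToyBucket:
    """In-memory stand-in that mimics generation and metageneration rules."""

    def __init__(self):
        self._objects = {}  # key -> (generation, metageneration, data)
        self._next_generation = itertools.count(1)

    def generation(self, key):
        """Return the live generation, or 0 if the key does not exist."""
        entry = self._objects.get(key)
        return entry[0] if entry else 0

    def write(self, key, data, if_generation_match=None):
        """Replace the whole object, optionally only if the generation matches."""
        if if_generation_match is not None and self.generation(key) != if_generation_match:
            raise PreconditionFailed(key)
        gen = next(self._next_generation)
        self._objects[key] = (gen, 1, data)  # metageneration resets to 1
        return gen

    def patch_metadata(self, key):
        """Metadata-only change: same generation, metageneration plus one."""
        gen, metagen, data = self._objects[key]
        self._objects[key] = (gen, metagen + 1, data)
        return metagen + 1


bucket = ToyBucket()

# ifGenerationMatch=0: create only if absent, which is the lock primitive.
first = bucket.write("locks/leader", b"worker-a", if_generation_match=0)
try:
    bucket.write("locks/leader", b"worker-b", if_generation_match=0)
    second_won = True
except PreconditionFailed:
    second_won = False

# Compare-and-swap: overwrite only if the object is still the version we read.
seen = bucket.generation("locks/leader")
bucket.write("locks/leader", b"worker-a-renewed", if_generation_match=seen)
try:
    bucket.write("locks/leader", b"stale-writer", if_generation_match=seen)
    stale_won = True
except PreconditionFailed:
    stale_won = False

print(second_won, stale_won)
```

Both losing writers fail cleanly instead of clobbering the winner, which is the whole point: the loser learns it lost and can re-read before retrying.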
One more thing the model is not: a filesystem. No rename (copy then delete, two operations, not atomic), no partial overwrite, no native locking beyond the precondition trick. Where S3 workloads famously had to spread writes across key prefixes to dodge per-prefix request ceilings, GCS auto-shards its index by key range in the background — though a brand-new bucket still needs ramp-up time before it will absorb tens of thousands of requests per second, and sequential key patterns (timestamps, counters) can still concentrate load on one index range while the system rebalances.
Where a bucket lives: regional, dual-region, multi-region
This is the first real divergence from S3, where every bucket lives in exactly one region and anything wider is replication you assemble yourself. A GCS bucket is created with one of three location types, and the choice is permanent for the life of the bucket.
A regional bucket stores data redundantly across at least two zones in
one region, say us-central1. It is the cheapest option and the right one for
data that is processed by compute in the same region, which is most analytics and ML
work. A dual-region bucket replicates every object between two specific
regions, either a predefined pair such as nam4 (us-central1 plus us-east1)
or a custom pair you choose within the same continent. Both regions serve reads and
writes against the same bucket name, so a failover does not involve changing endpoints,
re-pointing replication, or reconciling two buckets — the property AWS users approximate
with cross-region replication plus failover logic in the client. A
multi-region bucket names only a continent (us,
eu, or asia), and Google decides which regions inside it hold
the replicas. It is the serving-tier choice: website assets, downloads, anything read
from everywhere and written from nowhere in particular.
Replication across regions is asynchronous by default. Most objects land in the second location within minutes; the design target is under an hour. If an entire region is lost before a recently written object has replicated, that object can be lost with it, which is why dual-region buckets offer turbo replication as a paid upgrade (more on that below). Pricing tracks the location type: dual-region costs the most per gigabyte, regional the least, and multi-region sits between. Reads served to compute in another region also incur network egress, which a regional bucket co-located with its consumers avoids.
Storage classes: cold data without the wait
GCS has four storage classes — Standard, Nearline, Coldline, and Archive — and the headline is what they share rather than how they differ. Every class sits behind the same API, the same bucket, the same latency profile: first byte in milliseconds, even from Archive. There is no restore step, no thaw queue, no "your data will be available in 12 hours" email. An object in Archive is exactly as readable as an object in Standard. Anyone coming from S3 should pause on that, because Glacier's restore model (hours for Flexible Retrieval, up to 48 for Deep Archive bulk) shapes entire architectures that GCS simply does not need.
What the cold classes trade away is price structure, not access. Storage gets cheaper as you descend the ladder; reads get more expensive. Nearline, Coldline, and Archive each charge a per-gigabyte retrieval fee on every read, on top of higher per-operation rates. Each also has a minimum storage duration — 30, 90, and 365 days — and deleting or rewriting an object earlier bills you for the remainder anyway. The classes are a pricing contract, not a hardware tier you can feel.
The decision rule falls out of the arithmetic: pick the class by how often the data is read. Roughly monthly, Nearline. Roughly quarterly, Coldline. Roughly yearly or never, Archive. Read Archive data weekly and the retrieval fees swamp the storage savings; keep hot data in Coldline and you pay the early-delete penalty every time a pipeline rewrites it. The class is per-object, the bucket default is just what new writes inherit, and a lifecycle rule or a rewrite can change it later — which is exactly the chore Autoclass exists to take off your plate.
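The arithmetic behind that rule fits in a few lines. The sketch below uses placeholder prices chosen only to have the right shape (storage cheaper and retrieval dearer as you descend); they are assumptions for illustration, not quoted rates, and the model ignores per-operation charges and minimum-duration penalties.

```python
# Placeholder USD prices per GB: illustrative assumptions, not current list prices.
STORAGE_PER_GB_MONTH = {
    "STANDARD": 0.020,
    "NEARLINE": 0.010,
    "COLDLINE": 0.004,
    "ARCHIVE": 0.0012,
}
RETRIEVAL_PER_GB = {
    "STANDARD": 0.0,
    "NEARLINE": 0.01,
    "COLDLINE": 0.02,
    "ARCHIVE": 0.05,
}


def monthly_cost(storage_class, size_gb, full_reads_per_month):
    """Storage plus retrieval cost for one month, ignoring operation fees."""
    storage = size_gb * STORAGE_PER_GB_MONTH[storage_class]
    retrieval = size_gb * full_reads_per_month * RETRIEVAL_PER_GB[storage_class]
    return storage + retrieval


def cheapest_class(size_gb, full_reads_per_month):
    """Pick the class with the lowest modelled monthly cost."""
    return min(
        STORAGE_PER_GB_MONTH,
        key=lambda c: monthly_cost(c, size_gb, full_reads_per_month),
    )


# Read frequency is expressed as full reads of the data set per month.
patterns = [("never", 0), ("quarterly", 1 / 3), ("every six weeks", 0.7), ("weekly", 52 / 12)]
for label, reads in patterns:
    print(f"{label:>16}: {cheapest_class(1000, reads)}")
```

With real prices the crossover points move, but the ordering does not: the more often the bytes are read, the higher up the ladder the cheapest class sits.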
Strong consistency, everywhere
GCS has been strongly consistent since launch, and globally so. After a write completes — a new object, an overwrite, a delete — every subsequent read, metadata fetch, or list operation reflects it, from any client anywhere on earth. That includes the cases engineers learned to distrust on other systems: list-after-write shows the new object immediately, read-after-delete returns 404 immediately, and this holds for dual-region and multi-region buckets too, because the metadata layer coordinates across regions before acknowledging the write. S3 only retired its eventual-consistency caveats in December 2020, and a decade of workaround patterns (manifest files, list-then-verify loops, "wait and retry" wrappers) still circulates in codebases and folklore. On GCS those patterns were never needed.
Two honest footnotes. First, IAM permission changes are the exception: granting or
revoking access can take up to a few minutes to propagate, so do not build a security
invariant on a revocation taking effect instantly. Second, strong consistency does not
mean transactions. There is no atomic multi-object commit; if you need two objects to
change together, you write a manifest object last and treat its generation as the commit
point, using ifGenerationMatch to fence concurrent writers.
Uniform bucket-level access, and why ACLs are legacy
GCS has two permission systems with overlapping history. The old one is per-object ACLs,
a list of grantees on every object, interoperable with the S3 ACL model and just as easy
to get wrong. The modern one is IAM on the bucket: roles like
roles/storage.objectViewer granted to principals, inherited down the
project and folder hierarchy, with
IAM Conditions available for prefix-scoped grants. The switch that resolves the overlap
is uniform bucket-level access: enable it and ACLs stop being evaluated
entirely, leaving IAM as the single source of truth. Audit tooling, org policy, and
Google's own guidance all assume it; new buckets should always have it on, and an org
policy constraint can make that mandatory. You get a 90-day window to change your mind,
after which the setting is permanent for the bucket.
The companion setting is public access prevention, which makes
allUsers grants impossible no matter what someone clicks. The shape of the
whole arrangement is worth noticing: one boolean turns off the legacy system instead of
asking you to police it. Per-object authorization does not disappear, though — it moves
to signed URLs, which is where it belonged anyway.
Signed URLs and signed policy documents
A signed URL grants time-limited access to one object to whoever holds the URL, with no Google credentials involved. Your backend constructs a canonical description of the request — verb, resource, expiry, headers — and signs it with a service account's RSA private key (or an HMAC key for the S3-compatible XML API). The signature and the signing account's identity ride along as query parameters. When the browser presents the URL, GCS recomputes the signature, checks the expiry, and evaluates the request as if the service account itself had made it. Nothing about the URL is registered with GCS beforehand; the signature is self-contained, which is why signing is a pure local computation that costs no API call.
The V4 scheme caps expiry at seven days, and a signature made from short-lived
credentials dies with the credentials, whichever comes first. In environments without a
key file — Cloud Run, GKE with workload identity — the clean pattern is to call the IAM
signBlob API to have Google perform the signature on behalf of a service
account, which trades the local computation for one API call and zero key management.
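To make "a pure local computation" concrete, here is a sketch of the unsigned half of V4 signing: the canonical request and the string-to-sign. It follows the V4 layout as publicly documented to the best of my understanding, so check the official signing reference before relying on any detail. The function name, bucket, key, and signer address are hypothetical, and the RSA step itself (key file or signBlob) is deliberately left out.

```python
import hashlib
from datetime import datetime, timezone
from urllib.parse import quote

MAX_EXPIRES_SECONDS = 7 * 24 * 3600  # the seven-day V4 cap


def v4_string_to_sign(bucket, key, signer_email, expires_seconds, now, method="GET"):
    """Build the V4 string-to-sign; the caller still has to RSA-sign the result."""
    if not 0 < expires_seconds <= MAX_EXPIRES_SECONDS:
        raise ValueError("V4 signed URLs must expire within seven days")

    timestamp = now.strftime("%Y%m%dT%H%M%SZ")
    scope = f"{now.strftime('%Y%m%d')}/auto/storage/goog4_request"
    host = "storage.googleapis.com"

    # These query parameters travel in the URL and are covered by the signature.
    query = {
        "X-Goog-Algorithm": "GOOG4-RSA-SHA256",
        "X-Goog-Credential": f"{signer_email}/{scope}",
        "X-Goog-Date": timestamp,
        "X-Goog-Expires": str(expires_seconds),
        "X-Goog-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(query.items())
    )
    canonical_request = "\n".join([
        method,
        f"/{bucket}/{quote(key, safe='/~')}",
        canonical_query,
        f"host:{host}\n",  # canonical headers, each terminated by a newline
        "host",            # names of the signed headers
        "UNSIGNED-PAYLOAD",
    ])
    digest = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["GOOG4-RSA-SHA256", timestamp, scope, digest])


string_to_sign = v4_string_to_sign(
    "example-bucket",
    "lab/hello.txt",
    "signer@example-project.iam.gserviceaccount.com",
    expires_seconds=600,
    now=datetime(2026, 6, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(string_to_sign)
```

Everything here is string handling and one SHA-256, with no network call. The signature over this string, hex-encoded, would become the final `X-Goog-Signature` query parameter.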
Signed URLs handle GET and simple PUT well, but a bare PUT URL lets the holder upload
anything of any size. For browser uploads GCS has a second instrument, the
signed policy document: a JSON policy your backend signs declaring what
an HTML form POST may contain — key prefix, content-type, and most usefully a
content-length-range cap. The browser posts the form straight to the bucket,
and GCS enforces the policy at the door. It is the difference between "here is a door
key" and "here is a door key that only accepts JPEGs under 10 MB into
uploads/user-42/", and it is the right tool for user-generated content.
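The policy for that door key might look like the sketch below. It builds only the restrictions named above and base64-encodes them, which is the form the backend would then sign; a real V4 policy also has to carry the bucket and the `x-goog-*` signing fields, omitted here. The helper name is hypothetical, and reading "10 MB" as 10 MiB is an assumption.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_upload_policy(prefix, content_type, max_bytes, valid_for, now):
    """Return (policy dict, base64 policy string) for an HTML form POST."""
    policy = {
        "expiration": (now + valid_for).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            ["starts-with", "$key", prefix],          # key must sit under the prefix
            {"Content-Type": content_type},            # exact-match form field
            ["content-length-range", 0, max_bytes],    # size cap, enforced by GCS
        ],
    }
    encoded = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    return policy, encoded


policy, encoded_policy = build_upload_policy(
    prefix="uploads/user-42/",
    content_type="image/jpeg",
    max_bytes=10 * 1024 * 1024,  # assumption: "10 MB" taken as 10 MiB
    valid_for=timedelta(minutes=15),
    now=datetime(2026, 6, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps(policy, indent=2))
```

The browser never sees a credential, only the encoded policy and its signature as hidden form fields, and an upload that breaks any condition is refused before a byte is stored.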
Lifecycle rules: policy you write once
A lifecycle configuration is a list of rules on the bucket, each pairing one action with
a set of conditions that must all hold. Actions: Delete,
SetStorageClass, or AbortIncompleteMultipartUpload. Conditions:
object age, a creation-date cutoff, name prefix and suffix matches, current storage
class, and the versioning-aware ones — isLive, numNewerVersions,
and daysSinceNoncurrentTime. A typical production bucket carries three or
four rules: demote logs to Coldline at 60 days, delete them at 400, purge noncurrent
versions 14 days after they stop being live, and abort stale multipart uploads after a
week so half-finished uploads stop costing money.
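Written out, that typical configuration is a short JSON document. The sketch below generates it from Python so the structure is easy to check; the `logs/` prefix is a hypothetical name, and the file is meant for a bucket without Autoclass, since it contains a `SetStorageClass` rule.

```python
import json

# One action per rule; every condition inside a rule must hold at once.
lifecycle = {
    "rule": [
        {   # demote logs to Coldline at 60 days
            "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
            "condition": {"age": 60, "matchesPrefix": ["logs/"]},
        },
        {   # delete logs at 400 days
            "action": {"type": "Delete"},
            "condition": {"age": 400, "matchesPrefix": ["logs/"]},
        },
        {   # purge noncurrent versions 14 days after they stop being live
            "action": {"type": "Delete"},
            "condition": {"daysSinceNoncurrentTime": 14},
        },
        {   # abort multipart uploads left unfinished for a week
            "action": {"type": "AbortIncompleteMultipartUpload"},
            "condition": {"age": 7},
        },
    ]
}

lifecycle_json = json.dumps(lifecycle, indent=2)
print(lifecycle_json)
```

Saved to a file, the output can be applied with `gcloud storage buckets update --lifecycle-file`, the same command the lab uses.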
Two operational truths. The evaluator is a background batch process: rules apply within
roughly a day of their conditions becoming true, not at the stroke of midnight, so never
build a compliance deadline on lifecycle timing — for hard guarantees use bucket
retention policies, which can be locked irreversibly. And downward
SetStorageClass transitions interact with minimum durations: demote an
object to Coldline and delete it three weeks later, and the 90-day clock still bills the
difference.
Versioning and soft delete
Generations make versioning almost free conceptually. With object
versioning enabled on a bucket, an overwrite or delete keeps the previous
generation around as a noncurrent version instead of discarding it. Every
noncurrent version is addressable by its generation number
(hello.txt#1748812040112864), billable at its own storage class, and
manageable by the versioning-aware lifecycle conditions above — keep the last three
versions, expire the rest after 30 days, that sort of thing. Without a pruning rule a
busy bucket quietly becomes a museum of every byte it has ever held, so versioning and
lifecycle are effectively one feature used together.
Soft delete is the newer, blunter safety net, and it is on by default:
any deleted or overwritten object is retained in a recoverable state for a retention
window, seven days unless you change it (anywhere from off to 90 days). Unlike
versioning, it applies whether or not you asked for it, it covers
bucket-deletion accidents too, and restoring is a one-line gcloud storage
restore. The two overlap but answer different questions: versioning is a data
model you design around ("what did this object look like last Tuesday?"), soft delete is
an undo button for mistakes and bad deploys. You pay storage for soft-deleted bytes
during the window, which on a high-churn bucket is real money — turning the window down,
or off for scratch buckets, is a legitimate cost lever.
Autoclass: the class ladder, automated
Choosing classes by hand assumes you can predict access patterns, and mostly nobody can. Autoclass moves the decision into the bucket: every object starts in Standard, and each object that goes unread descends the ladder on the same schedule the minimum durations suggest — Nearline after 30 days without access, then (if you opt the bucket into the full ladder) Coldline after 90 and Archive after a year. The moment an object is read, it is promoted back to Standard and the clock restarts. Each object rides the ladder independently, so one bucket can hold last night's hot build artifacts and five-year-old compliance dumps, each priced about right.
The pricing arrangement is what makes it safe to adopt: while Autoclass is enabled, the
retrieval fees and early-delete charges of the cold classes are waived. The failure mode
of manual tiering — someone bulk-reads the Archive prefix and the bill spikes — cannot
happen. In exchange you pay a small per-object management fee, which is why the feature
suits buckets of moderately sized, unpredictably accessed objects and is a poor fit for
billions of tiny ones (objects under 128 KiB are left in Standard and skip the fee).
One sharp edge for the lab below: Autoclass owns class transitions, so it cannot be
combined with SetStorageClass lifecycle rules. Delete rules are fine. S3's
counterpart, Intelligent-Tiering, is the same idea with a similar fee — but its instant
tiers stop at "infrequent access", and its deepest tiers reintroduce restore latency,
which Autoclass never does.
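The schedule described in this section reduces to a small function of idle time, object size, and the bucket's terminal class. This is a simplified model of the rules as stated here, not of Autoclass's actual implementation; the function name is hypothetical and the exact boundary handling (treating "after 30 days" as 30 or more) is an assumption.

```python
KIB = 1024
ORDER = ["STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE"]
# (days without access, class reached), coldest first.
LADDER = [(365, "ARCHIVE"), (90, "COLDLINE"), (30, "NEARLINE")]


def autoclass_class(days_since_access, size_bytes, terminal_class="NEARLINE"):
    """Model which class an object sits in under Autoclass."""
    if size_bytes < 128 * KIB:
        return "STANDARD"  # small objects stay put and skip the management fee

    target = "STANDARD"
    for threshold, storage_class in LADDER:
        if days_since_access >= threshold:
            target = storage_class
            break

    # Never descend below the bucket's terminal class.
    if ORDER.index(target) > ORDER.index(terminal_class):
        target = terminal_class
    return target


MIB = 1024 * KIB
print(autoclass_class(10, 5 * MIB))                             # recently read
print(autoclass_class(400, 5 * MIB))                            # default ladder stops early
print(autoclass_class(400, 5 * MIB, terminal_class="ARCHIVE"))  # full ladder
print(autoclass_class(0, 5 * MIB, terminal_class="ARCHIVE"))    # just read: clock restarted
```

A read is modelled simply as `days_since_access` dropping back to zero, which is why the last call lands in Standard no matter how cold the object was the day before.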
Composition and resumable uploads
GCS has a server-side primitive S3 lacks: compose, which concatenates up
to 32 existing objects into a new one without the bytes leaving the service. Composite
objects can themselves be composed, so a few rounds of fan-in assemble thousands of parts
into one object. That enables a genuinely parallel upload strategy — split a large file
locally, upload the chunks concurrently as independent objects, compose, delete the
parts — which is how gsutil's parallel composite uploads worked and what the
gcloud storage CLI does for large files when conditions allow. It is also a
tidy fit for log aggregation: many small flushed segments, composed hourly into one
query-friendly object. The trade-off is checksums: composite objects carry a CRC32C but
no MD5, which breaks integrity checks in older tooling that insists on MD5.
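How few rounds "a few rounds of fan-in" means is quick to work out. The sketch below only plans the rounds, counting how many objects remain after each pass of 32-way composes; it makes no API calls, and the function name is hypothetical.

```python
MAX_SOURCES_PER_COMPOSE = 32  # per-call limit on source objects


def compose_rounds(part_count):
    """Return how many objects remain after each round of 32-way fan-in."""
    if part_count < 1:
        raise ValueError("need at least one part")
    remaining = part_count
    rounds = []
    while remaining > 1:
        # Ceiling division: a short final group still costs one compose call.
        remaining = -(-remaining // MAX_SOURCES_PER_COMPOSE)
        rounds.append(remaining)
    return rounds


for parts in (32, 1000, 5000):
    print(parts, compose_rounds(parts))
```

Each entry is also the number of compose calls issued in that round, and the calls within a round are independent, so they can run concurrently.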
The other upload primitive is the resumable upload. One POST initiates a session and returns a session URI; you then PUT bytes against that URI in chunks (any multiple of 256 KiB), and after an interruption a simple status query tells you the last byte the server persisted so you continue from there. Sessions live for a week, chunks need no per-request signing, and the session URI itself can be handed to an untrusted client as a single-purpose upload credential. Since 2021 GCS also speaks the S3-style multipart upload protocol through its XML API, which matters mostly for compatibility: tools built for S3, including most data engineering software, run against GCS with an endpoint and HMAC-key swap. The details of S3's own approach are in S3 internals.
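The chunking and resume logic of a resumable upload is mostly bookkeeping over byte offsets. As a sketch under the assumption that each chunk PUT carries a `Content-Range: bytes start-end/total` header, the generator below lists the ranges still to send, given the offset to resume from; the function name is hypothetical and no HTTP is performed.

```python
CHUNK_QUANTUM = 256 * 1024  # chunks must be a multiple of 256 KiB, except the last


def content_ranges(total_size, chunk_size, resume_from=0):
    """Yield Content-Range header values for the bytes not yet persisted."""
    if chunk_size <= 0 or chunk_size % CHUNK_QUANTUM:
        raise ValueError("chunk size must be a positive multiple of 256 KiB")
    if not 0 <= resume_from <= total_size:
        raise ValueError("resume offset out of range")
    start = resume_from
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive last byte
        yield f"bytes {start}-{end}/{total_size}"
        start = end + 1


total = 600_000
chunk = 256 * 1024

fresh = list(content_ranges(total, chunk))
print(fresh)

# After an interruption the status query reports the last persisted byte,
# here assumed to be 262143, so the upload continues from the next offset.
resumed = list(content_ranges(total, chunk, resume_from=262143 + 1))
print(resumed)
```

Only the final chunk may be shorter than the quantum, which is why the size check applies to `chunk_size` and not to the tail.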
Requester pays and turbo replication
Two bucket-level switches round out the feature set. Requester pays flips the network and operation charges from the bucket owner to the caller: anyone reading must attach a billing project to the request, and that project picks up the egress and per-operation costs. It exists for exactly one scenario, sharing large datasets publicly without volunteering to fund the world's downloads, and it is how the big public genomics and satellite-imagery archives are published.
Turbo replication applies only to dual-region buckets and tightens the
asynchronous replication window to a 15-minute recovery point objective, with an SLA
behind it rather than a design target. The default best-effort replication is fine until
the day a region goes dark with your last hour of writes unreplicated; turbo replication
is the premium you pay so that the worst case is bounded and contractual. Pair it with
the rpo metadata on the bucket and the replication-lag metrics in
monitoring, because an RPO you do not measure is a wish, not a number.
How it compares to S3
Both are flat-namespace blob stores with eleven nines of durability, and code that treats them as such ports easily. The differences cluster where each provider's history shows.
| Dimension | Cloud Storage | S3 |
|---|---|---|
| Bucket scope | Regional, dual-region, or multi-region — one bucket name across regions | Regional only; wider scope means cross-region replication between distinct buckets |
| Cold-tier access | All four classes answer in milliseconds; cold reads cost retrieval fees | Glacier Flexible/Deep Archive need a restore request and minutes-to-hours wait; only Instant Retrieval is immediate |
| Consistency | Strong and global since launch, including listings | Strong since December 2020, regional |
| Conditional writes | Generation preconditions on every mutation since launch; usable as compare-and-swap | If-Match / If-None-Match conditional writes arrived in 2024 |
| Auto-tiering | Autoclass: per-object, retrieval fees waived, never adds latency | Intelligent-Tiering: similar fee model, deep tiers reintroduce restore waits |
| Server-side concatenation | Compose, up to 32 objects per call, chainable | None; multipart upload only assembles parts you uploaded |
| Access control default | Uniform bucket-level access: one switch retires ACLs in favour of IAM | Bucket-owner-enforced object ownership (default since 2023) plays the same role |
| Interop | XML API speaks the S3 protocol with HMAC keys | The de facto protocol everyone else implements |
The honest summary: S3 has the larger ecosystem and the longer feature tail (Object Lambda, Access Points, S3 Tables), while GCS holds the cleaner core model — location types, flat-latency classes, generations, and consistency that never needed an asterisk. Where each sits in the wider taxonomy of blob, block, and file storage is the subject of the cloud storage concepts page.
Lab: bucket, Autoclass, signed URL, lifecycle, teardown
Everything above, exercised in about ten minutes with the gcloud storage
CLI. You need a project with billing enabled and gcloud auth login done; the
whole run costs a few cents at most if you tear down at the end.
1. Create a bucket with Autoclass and uniform bucket-level access.
   Bucket names are global, so suffix with a timestamp.

       BUCKET=gcs-lab-$(date +%s)
       gcloud storage buckets create gs://$BUCKET \
         --location=us-central1 \
         --enable-autoclass \
         --uniform-bucket-level-access

2. Inspect what you made. Note locationType: region, the Autoclass block, and the
   default soft-delete policy of 604800 seconds (seven days).

       gcloud storage buckets describe gs://$BUCKET \
         --format="yaml(location,locationType,autoclass,iamConfiguration,softDeletePolicy)"

3. Upload an object and meet its generation. Overwrite it and watch the
   generation change while the key stays put.

       echo "hello gcs" > /tmp/hello.txt
       gcloud storage cp /tmp/hello.txt gs://$BUCKET/lab/hello.txt
       gcloud storage objects describe gs://$BUCKET/lab/hello.txt \
         --format="yaml(generation,metageneration,storageClass,timeCreated)"
       echo "hello again" > /tmp/hello.txt
       gcloud storage cp /tmp/hello.txt gs://$BUCKET/lab/hello.txt
       gcloud storage ls --all-versions gs://$BUCKET/lab/

4. Generate a signed URL and fetch it with no credentials. Signing needs
   a key: either a service-account key file, or impersonation of a service account you
   hold roles/iam.serviceAccountTokenCreator on (no key file touches disk).

       # option A: key file
       gcloud storage sign-url gs://$BUCKET/lab/hello.txt \
         --duration=10m --private-key-file=sa-key.json

       # option B: impersonation, keyless
       gcloud storage sign-url gs://$BUCKET/lab/hello.txt \
         --duration=10m \
         --impersonate-service-account=signer@$(gcloud config get-value project).iam.gserviceaccount.com

       # paste the signed_url it prints:
       curl -s "<signed_url>"
       # → hello again

5. Attach a lifecycle rule. Autoclass owns class transitions, so
   SetStorageClass rules are rejected here, but expiry and multipart hygiene rules are fine.

       cat > /tmp/lifecycle.json <<'EOF'
       {
         "rule": [
           { "action": { "type": "Delete" },
             "condition": { "age": 365, "matchesPrefix": ["lab/"] } },
           { "action": { "type": "AbortIncompleteMultipartUpload" },
             "condition": { "age": 7 } }
         ]
       }
       EOF
       gcloud storage buckets update gs://$BUCKET --lifecycle-file=/tmp/lifecycle.json
       gcloud storage buckets describe gs://$BUCKET --format="yaml(lifecycle)"

6. Delete an object, then bring it back. Soft delete keeps the bytes
   recoverable for the retention window even though versioning was never enabled.

       gcloud storage rm gs://$BUCKET/lab/hello.txt
       gcloud storage ls --soft-deleted gs://$BUCKET/lab/
       gcloud storage restore gs://$BUCKET/lab/hello.txt
       gcloud storage cat gs://$BUCKET/lab/hello.txt
       # → hello again

7. Tear it down. Clear the soft-delete window first so deleted bytes do
   not linger (and bill) for a week after the bucket is gone.

       gcloud storage buckets update gs://$BUCKET --clear-soft-delete
       gcloud storage rm --recursive gs://$BUCKET   # removes objects and the bucket

Worth trying afterwards: re-run step 3 with a multi-gigabyte file and watch the CLI
switch to parallel uploads on its own, or create a nam4 dual-region bucket
in step 1 and compare the describe output. The bill for curiosity here is pleasantly
small.
Further reading
- GCS — consistency documentation — the precise list of what is strongly consistent (almost everything) and what is not (IAM propagation).
- GCS — storage classes — current prices, minimum durations, and retrieval fees; the numbers in this page's diagram, kept fresh.
- GCS — Autoclass — transition schedules, the fee model, and the lifecycle-rule interactions.
- GCS — signed URLs and policy documents — the V4 signing process in detail, including the keyless signBlob path.
- Semicolony — S3 internals — the other side of the comparison table, down to the partitioner and ShardStore.