Blob Storage
Azure's object store looks like S3 from a distance and behaves quite differently up close. The unit you design around is not a bucket but a storage account: a named slice of the global namespace that fixes your redundancy, your performance tier, and your throughput ceiling before a single byte lands. Inside it live three different kinds of blob, four access tiers, and an alphabet of replication acronyms that interviewers love and docs explain badly. This page decodes all of it, then builds one from the command line.
The storage account is the unit, and that changes everything
S3 and GCS are bucket-first systems: you create a bucket, the bucket has a globally unique
name, and most of the interesting settings hang off the bucket. Azure inverts this. The
globally unique name belongs to the storage account, a resource that lives
in a resource group and a region like everything else in Azure (the
foundations page covers that
hierarchy). The account name becomes a DNS prefix:
contoso.blob.core.windows.net. Every blob URL in your estate starts with it,
which is why account names are lowercase, 3 to 24 characters, letters and digits only, and
contested the same way bucket names are everywhere else.
The decisions you make at account creation are the ones that matter most, because several of them are fixed or expensive to change later. The kind (almost always StorageV2 today, or a premium block-blob account for SSD-backed low latency) and the redundancy SKU (the LRS-to-RA-GZRS alphabet, decoded below) are set on the account, not on a container or a blob. The same is true of the performance tier: a standard account stores blobs on HDD-backed infrastructure with per-GB pricing that rewards capacity, while a premium account stores them on SSDs with single-digit-millisecond latency and pricing that rewards small, hot objects. You cannot mix the two inside one account. In S3 you would express that difference per object with a storage class; in Azure you express it by creating two accounts.
The account is also the throughput boundary. A standard account in a large region has a default ceiling of around 5 PiB of capacity and tens of gigabits per second of ingress and egress, shared by everything inside it. When a team puts the CDN origin, the analytics landing zone, and the backup target into one account because it was convenient, they have coupled three workloads to one rate limit and one blast radius. The practical rule: one account per workload class, separated by performance needs, redundancy needs, and the question "who should be able to hold keys to this." Accounts are free; the limits are not.
Containers, and the shape of the namespace
Inside the account sit containers, which are closer to S3 prefixes with
access policies than to buckets. A container is a flat namespace of blobs; its name appears
as the first path segment of every blob URL
(https://contoso.blob.core.windows.net/images/cat.png). Containers are where
public-access settings, stored access policies, and immutability scopes attach. Blob names
can contain slashes, and every SDK and the portal will render reports/2026/q1.pdf
as folders, but by default that hierarchy is a naming convention, not a real one. There is
no rename: "moving" a blob is a server-side copy plus a delete, exactly as in S3. (The
exception is Data Lake Gen2, later on this page, which makes the hierarchy real.)
Three blob types, and why each exists
S3 has one kind of object. Blob Storage has three, and the type is fixed at creation: you cannot convert one to another without rewriting it. Each type exists because a real workload was a poor fit for the others.
Block blobs are the ordinary objects, and the type you will use for
everything that looks like a file: images, parquet, video, backups, build artifacts. A block
blob is assembled from up to 50,000 blocks of up to 4,000 MiB each, which puts the ceiling
around 190.7 TiB per blob. The block structure is exposed in the API and is the equivalent
of S3's multipart upload, with one nice difference: you stage blocks with
Put Block in any order, retry any of them independently, and nothing is visible
until you send Put Block List naming the blocks that make up the final blob.
The commit is atomic, and you can even commit a mix of newly staged blocks and blocks from
the existing version of the blob, which makes partial updates of huge objects cheap.
Append blobs exist because logs broke the object model. With a block blob,
ten writers appending lines would each have to read-modify-write or coordinate block lists.
An append blob supports exactly one mutation: Append Block, which atomically
adds data to the end. Blocks here are at most 4 MiB and there can be 50,000 of them, so an
append blob tops out around 195 GiB. Multiple writers can append concurrently without
clobbering each other; what you give up is the ability to modify or delete anything already
written. Diagnostic logs, audit trails, and anything event-shaped lands here.
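The two ceilings quoted above are just block count times block size. As a quick check, here is a sketch in shell arithmetic; the variable names are illustrative, and awk is used only for the fractional division.

```shell
# Block blob: 50,000 blocks x 4,000 MiB per block.
block_blob_mib=$((50000 * 4000))   # 200,000,000 MiB
# Append blob: 50,000 blocks x 4 MiB per block.
append_blob_mib=$((50000 * 4))     # 200,000 MiB

# Convert MiB to TiB (divide by 1024 twice) and MiB to GiB (divide by 1024).
block_blob_tib=$(awk -v m="$block_blob_mib" 'BEGIN { printf "%.1f", m / 1024 / 1024 }')
append_blob_gib=$(awk -v m="$append_blob_mib" 'BEGIN { printf "%.1f", m / 1024 }')

echo "block blob ceiling:  ${block_blob_tib} TiB"
echo "append blob ceiling: ${append_blob_gib} GiB"
```

The results round to 190.7 TiB and 195.3 GiB, matching the figures in the text.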
Page blobs exist because virtual machine disks are not files you replace, they are address spaces you mutate. A page blob is a sparse array of 512-byte pages, up to 8 TiB, where any aligned range can be written or read in place. Empty ranges cost nothing and read as zeros. This is the substrate that unmanaged VM disks were built on, and the lineage survives inside managed disks today. Unless you are building something that needs random-access writes into a large fixed-size blob (a database file, a disk image), you will never create one by hand, but knowing they exist explains why the Blob API has page-range operations that make no sense for objects.
Access tiers: Hot, Cool, Cold, and the Archive trap
Block blobs carry an access tier that trades storage price against access price. Hot is the default: the highest per-GB rate, the lowest transaction and read costs. Cool roughly halves the storage rate in exchange for pricier operations, a per-GB retrieval fee, and a 30-day early-deletion charge. Cold extends the same trade: cheaper still at rest, more expensive to touch, 90-day minimum. All three are online tiers; a read returns in milliseconds regardless. The account carries a default tier (Hot or Cool) for new blobs, and individual blobs can be set explicitly or moved by lifecycle rules.
Archive is different in kind, not just in price. An archived blob is offline: its metadata is still listable, but the content cannot be read at all until you rehydrate it back to an online tier, and rehydration is measured in hours. Standard priority can take up to 15 hours; high priority usually completes in under one hour for blobs under 10 GB, at a much higher per-GB rate. This is the single biggest cross-cloud gotcha in this space: GCS's Archive class is still online with millisecond reads, and even S3 Glacier now has an Instant Retrieval class. Azure Archive has no instant variant. If a compliance officer might ever ask for a file "today," Archive is the wrong tier, and your restore-time SLA must be written with a 15-hour worst case in mind. Archive also carries a 180-day early-deletion window and is set per blob, never as an account default.
Rehydration itself has two shapes: change the blob's tier in place with
Set Blob Tier and wait, or copy the archived blob to a new online blob, which
leaves the cheap archived original where it is. The copy approach is the right one when you
need the data once but want to keep paying archive rates for the canonical copy.
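The two rehydration shapes might look like this from the az CLI. This is a sketch under assumptions: the function names and positional arguments are hypothetical, Hot is chosen arbitrarily as the target tier, and the copy is assumed to stay within one account with Entra auth.

```shell
# Shape 1: change the tier in place. The blob stays offline until rehydration completes.
rehydrate_in_place() {
  # $1 = account, $2 = container, $3 = blob name
  az storage blob set-tier \
    --account-name "$1" --container-name "$2" --name "$3" \
    --tier Hot --rehydrate-priority Standard \
    --auth-mode login
}

# Shape 2: copy to a new online blob. The archived original is left where it is.
rehydrate_by_copy() {
  # $1 = account, $2 = container, $3 = archived blob, $4 = name for the online copy
  az storage blob copy start \
    --account-name "$1" \
    --source-container "$2" --source-blob "$3" \
    --destination-container "$2" --destination-blob "$4" \
    --tier Hot --rehydrate-priority Standard \
    --auth-mode login
}
```

For example, `rehydrate_by_copy "$ACCT" docs ledger-2019.parquet ledger-2019-online.parquet` would start the copy; either way, the data is readable only after the hours-long rehydration finishes. Swapping Standard for High requests the faster, pricier priority.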
The redundancy alphabet, decoded
Redundancy is chosen per account, and the SKU names compose from three ideas: where the three synchronous copies live (L for one datacenter, Z for three availability zones), whether the whole set is asynchronously replicated to the paired region (G), and whether you can read from that secondary without a failover (RA). Spell it out once and the acronyms stop being trivia.
| SKU | Copies | Survives | Does not survive |
|---|---|---|---|
| LRS | 3, one datacenter | disk, node, and rack failures | a datacenter or zone outage |
| ZRS | 3, across 3 zones | a full availability-zone outage, with no failover step | a region-wide disaster |
| GRS | 3 + 3 in paired region | a regional disaster, after failover | a zone outage in the primary without downtime (primary is LRS) |
| GZRS | 3 zonal + 3 in pair | a zone outage transparently and a regional disaster after failover | simultaneous loss of both paired regions |
| RA-GRS | as GRS | as GRS, plus the secondary is readable at any time | writes to the secondary; reads there lag |
| RA-GZRS | as GZRS | as GZRS, plus read access to the secondary | same caveat: the secondary is read-only and eventually consistent |
Two details separate people who have read the marketing from people who have run this. First, geo-replication is asynchronous, so a regional failover can lose recent writes; the account exposes a last sync time that tells you the point up to which the secondary is guaranteed complete, and your recovery story should quote it. Second, failover is not automatic. For a customer-initiated account failover you flip the secondary to primary yourself, accept the data loss after the last sync time, and your account comes out the other side as LRS in the new region: geo-redundancy is gone until you re-enable it and a full re-replication completes. Durability numbers (eleven nines for LRS, twelve for ZRS, sixteen for the geo options) describe surviving hardware loss, not the operational work of a failover. ZRS is the quiet workhorse here: zone loss is handled with no action and no consistency caveats, which is why it is the default choice for anything that matters in a zonal region.
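The last sync time can be read from the CLI before you decide on a failover. A minimal sketch, assuming a geo-redundant account; the function name is hypothetical.

```shell
# Print the last sync time of a geo-redundant account.
# Writes made after this timestamp may be missing from the secondary.
last_sync_time() {
  # $1 = account, $2 = resource group
  az storage account show \
    --name "$1" --resource-group "$2" \
    --expand geoReplicationStats \
    --query geoReplicationStats.lastSyncTime \
    --output tsv
}
```

Calling `last_sync_time "$ACCT" blobdemo-rg` on a GRS-family account prints a UTC timestamp. On an LRS or ZRS account there are no geo-replication statistics to report.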
Getting in: SAS tokens versus Entra
Every request to Blob Storage is authorized one of three ways: with one of the two account
keys, with a shared access signature (SAS) derived from a key or a delegated credential, or
with an Entra ID token checked against RBAC roles. The account keys are root: full data and
most management rights, no identity attached, no audit trail of who used them. The
whole modern security story is about not letting those keys touch application code, and the
account even has an allowSharedKeyAccess switch to outlaw them entirely.
A SAS is a signed URL query string granting specific permissions on a specific scope for a specific window. There are three kinds, and they are not equally safe. An account SAS is signed with an account key and can span services and operations broadly; it is the closest to handing out the key itself. A service SAS is also key-signed but scoped to one container or blob, and it can reference a stored access policy on the container, which matters because that gives you a revocation handle: delete the policy and every SAS minted against it dies. A user-delegation SAS is the one to prefer: it is signed with a user-delegation key obtained through Entra, so it inherits the RBAC permissions of the identity that created it, shows up in audit logs attached to that identity, is capped at seven days, and can be revoked by revoking the delegation keys without rotating anything.
The gotchas are where interviews go. A SAS is a bearer credential living in a URL, which means it leaks everywhere URLs leak: proxy logs, browser history, referrer headers, chat messages. A key-signed SAS with a generous expiry cannot be revoked except by rotating the signing key, which invalidates every SAS signed with it, usually at 2 a.m. during an incident. Clock skew bites too: a token whose start time is "now" can fail for clients a few seconds ahead, so issue with a start a few minutes in the past. The defensive defaults: user-delegation SAS, short expiry, HTTPS-only flag set, IP range pinned when the consumer is known, and never an account SAS in anything customer-facing.
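The defensive defaults above can be sketched as a small helper that backdates the start time and keeps the window short. The helper names, the five-minute backdate, and the fifteen-minute expiry are illustrative choices, not recommendations from the service.

```shell
# UTC timestamp offset from now by a signed number of minutes, e.g. -5 or +15.
# Tries GNU date first, then falls back to BSD/macOS date.
utc_offset_minutes() {
  date -u -d "$1 minutes" +%Y-%m-%dT%H:%MZ 2>/dev/null \
    || date -u -v"$1"M +%Y-%m-%dT%H:%MZ
}

# Mint a read-only, HTTPS-only user-delegation SAS for one blob.
mint_read_sas() {
  # $1 = account, $2 = container, $3 = blob name
  az storage blob generate-sas \
    --account-name "$1" --container-name "$2" --name "$3" \
    --permissions r \
    --start "$(utc_offset_minutes -5)" \
    --expiry "$(utc_offset_minutes +15)" \
    --https-only --auth-mode login --as-user --output tsv
}
```

The offset must carry an explicit sign, because the BSD fallback treats an unsigned value as an absolute minute. When the consumer's address range is known, the same command also accepts an `--ip` restriction.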
For service-to-service access inside Azure, skip SAS entirely: give the workload a managed identity and a data-plane role such as Storage Blob Data Reader or Storage Blob Data Contributor. One trap there: the control-plane Owner role does not grant data access. You can own the account and still get 403s on blobs until you grant yourself a data role, which is the single most common first-day confusion with Entra-based storage auth. An Azure Function reading a container through its managed identity needs that data role, not a connection string.
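A sketch of both halves of that advice: turning off key-based authorization on the account, and granting a managed identity a data-plane role. The function names are hypothetical, and the principal ID is assumed to be the object ID of the workload's managed identity.

```shell
# Reject Shared Key authorization, including key-signed SAS, on the account.
disable_shared_key() {
  # $1 = account, $2 = resource group
  az storage account update \
    --name "$1" --resource-group "$2" \
    --allow-shared-key-access false
}

# Give a managed identity read access to blob data at the given scope.
grant_blob_reader() {
  # $1 = object ID of the managed identity, $2 = scope (account or container resource ID)
  az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Reader" \
    --scope "$2"
}
```

Scoping the assignment to a single container rather than the whole account keeps the grant as narrow as the workload allows.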
Lifecycle management: the tier moves you do not make by hand
Tiers only save money if blobs actually move through them, and nobody moves a billion blobs
by hand. A lifecycle management policy is a JSON document on the account: a
set of rules, each with a filter (blob type, name prefixes, blob index tags) and actions
keyed on age, such as tierToCool after 30 days since modification,
tierToArchive after 90, delete after 2,555 for a seven-year
retention. Rules can also act on previous versions and snapshots separately from the base
blob, which is how you keep current data Hot while old versions drain to Archive. Two
operational facts to know: the policy engine runs roughly once a day, and a new or changed
policy can take up to 48 hours to complete its first pass, so lifecycle is an economic tool
rather than something with timing guarantees. There is also a
last-access-time option (it must be enabled on the account first) that lets rules key on
reads instead of writes, which is the closer cousin to S3's Intelligent-Tiering, though here
it is still rules you write rather than a class that decides for you.
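A policy combining the actions described above might look like the following sketch. The rule name, the records/ prefix, and the file name are hypothetical; the day counts are the ones from the text, with the retention period computed so the seven-year figure is visible.

```shell
# Seven years expressed in days, ignoring leap days.
RETENTION_DAYS=$((7 * 365))   # 2555

# Base blobs: Cool at 30 days, Archive at 90, delete at seven years.
# Previous versions: straight to Archive 90 days after they were created.
cat > retention-policy.json << EOF
{ "rules": [ {
  "enabled": true,
  "name": "seven-year-retention",
  "type": "Lifecycle",
  "definition": {
    "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["records/"] },
    "actions": {
      "baseBlob": {
        "tierToCool":    { "daysAfterModificationGreaterThan": 30 },
        "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
        "delete":        { "daysAfterModificationGreaterThan": ${RETENTION_DAYS} }
      },
      "version": {
        "tierToArchive": { "daysAfterCreationGreaterThan": 90 }
      }
    }
  }
} ] }
EOF
```

Note that base-blob actions key on modification time while version actions key on creation time, which is why the two sections use different condition names.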
Immutability: WORM for regulators
For data that must be provably unmodifiable (financial records under SEC 17a-4, health records, evidence) Blob Storage offers immutability policies: write once, read many, enforced by the service rather than by promises. A time-based retention policy on a container or an individual blob version makes every blob unmodifiable and undeletable for N days from creation; while the policy is unlocked you can test and adjust it, and once locked nobody, including the subscription owner and Microsoft support, can shorten it. A legal hold is the open-ended variant: tagged data is frozen until every hold is explicitly cleared, independent of any clock. Combined with versioning, version-level WORM lets a blob name keep accepting new writes while each written version becomes individually immutable, which is usually what an audit-trail design actually wants. The operational warning is the obvious one inverted: a locked policy is a commitment, and storage under it cannot be deleted to save money, ever, until the clock runs out.
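As a rough sketch, a container-level time-based retention policy can be created from the CLI as below. The function name is hypothetical. The policy starts unlocked; locking it is a separate, irreversible step that is deliberately not shown.

```shell
# Create an unlocked time-based retention policy on a container.
set_retention_policy() {
  # $1 = account, $2 = resource group, $3 = container, $4 = retention period in days
  az storage container immutability-policy create \
    --account-name "$1" --resource-group "$2" \
    --container-name "$3" --period "$4"
}
```

Trying this with a short period in a throwaway container, and confirming that deletes are refused, is a cheap way to see the behavior before committing to a locked policy.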
Soft delete, versioning, and the undo stack
Three features form the account's undo stack, and they compose. Soft delete for
blobs keeps deleted blobs (and the prior state of overwritten ones, held as soft-deleted
snapshots) recoverable for a retention window of 1 to 365 days; a delete becomes a soft
delete, and Undelete Blob brings it back. Container soft delete does the same one level up,
because the classic catastrophe is not deleting a blob but deleting the container holding
ten million of them. Blob versioning is the stronger tool: every overwrite or
delete automatically captures the previous state as an immutable prior version, so
"restore to yesterday 14:02" is a copy of a version ID rather than an archaeology project.
Versions are billed as stored data, which is why versioning without a lifecycle rule that
tiers or deletes old versions is a slow cost leak. Turn on all three for anything
production-shaped; blob soft delete and versioning, together with the change feed, are
also the substrate for point-in-time restore on block-blob data.
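All three switches live on the account's blob service properties. A minimal sketch, assuming one retention window is reused for both blob and container soft delete; the function names are hypothetical.

```shell
# Enable blob soft delete, container soft delete, and versioning in one call.
enable_undo_stack() {
  # $1 = account, $2 = resource group, $3 = retention window in days (1 to 365)
  az storage account blob-service-properties update \
    --account-name "$1" --resource-group "$2" \
    --enable-delete-retention true --delete-retention-days "$3" \
    --enable-container-delete-retention true --container-delete-retention-days "$3" \
    --enable-versioning true
}

# Bring back a soft-deleted blob while it is still inside the retention window.
undelete_blob() {
  # $1 = account, $2 = container, $3 = blob name
  az storage blob undelete \
    --account-name "$1" --container-name "$2" --name "$3" \
    --auth-mode login
}
```

For example, `enable_undo_stack "$ACCT" blobdemo-rg 14` would give a two-week window; pair it with a lifecycle rule for old versions so the safety net does not turn into the cost leak described above.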
Static websites, and where they stop
Flip on static website hosting and the account gains a special $web container
plus a separate web endpoint (contoso.z13.web.core.windows.net style) that
serves its contents over HTTP with an index document and a custom 404. It is the cheapest
possible hosting for a built SPA or documentation site. Its limits arrive quickly: no
server-side redirects, no header control, and custom domains with HTTPS need a CDN or Azure
Front Door in front of the endpoint. In practice the pattern is "$web as origin, Front Door
for TLS, caching, and rewrites," and once you need real routing rules Azure points you at
Static Web Apps instead. For an account that is also doing other work, note that the web
endpoint bypasses container ACL thinking entirely: anything in $web is public
by design.
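Enabling the feature and publishing a build could be sketched as follows. The function names and the index.html and 404.html document names are assumptions, and the signed-in identity is assumed to have rights to change blob service properties and write to the container.

```shell
# Turn on static website hosting with an index document and a custom 404 page.
enable_static_site() {
  # $1 = account
  az storage blob service-properties update \
    --account-name "$1" --static-website \
    --index-document index.html --404-document 404.html \
    --auth-mode login
}

# Upload a local build directory into the special $web container.
publish_site() {
  # $1 = account, $2 = local build directory
  az storage blob upload-batch \
    --account-name "$1" --destination '$web' --source "$2" \
    --auth-mode login
}

# Print the account's web endpoint URL.
web_endpoint() {
  # $1 = account, $2 = resource group
  az storage account show \
    --name "$1" --resource-group "$2" \
    --query primaryEndpoints.web --output tsv
}
```

The single quotes around `$web` matter: without them the shell expands it as an empty variable and the upload targets the wrong place.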
Data Lake Storage Gen2: when the folders become real
One checkbox at account creation, hierarchical namespace, turns a blob
account into Azure Data Lake Storage Gen2. The flat key space becomes a real filesystem:
directories are first-class objects, renames and moves of a directory are single atomic
metadata operations instead of a copy-every-blob-then-delete loop, and each file and
directory carries POSIX-style ACLs alongside RBAC. Analytics engines speak to it through the
abfss:// driver rather than the blob endpoint, though the blob API keeps
working against the same data.
Why this matters is concrete: a Spark job that commits results by writing to
_temporary/ and renaming into place does O(1) metadata work on a hierarchical
namespace and O(n) copy work on a flat one. Directory-level ACLs map cleanly onto
"team X may read /curated/finance" requirements that object-level RBAC handles
awkwardly. The trade-offs: HNS must effectively be chosen up front (migration of an existing
account exists but is restrictive), a handful of blob features arrive late or never on HNS
accounts, and the per-operation pricing model differs. The decision rule is simple. Big-data
analytics with engines that think in directories: enable it. General object storage for an
application: leave it off. Do not enable it "just in case," because you are buying filesystem
semantics you will pay for in feature lag.
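To make the rename difference concrete, here is a sketch of creating an account with the hierarchical namespace and moving a directory inside it. The function names are hypothetical, and the ZRS SKU is just carried over from the rest of this page.

```shell
# Create a StorageV2 account with the hierarchical namespace enabled.
create_datalake_account() {
  # $1 = account, $2 = resource group, $3 = location
  az storage account create \
    --name "$1" --resource-group "$2" --location "$3" \
    --kind StorageV2 --sku Standard_ZRS \
    --enable-hierarchical-namespace true
}

# Rename a directory as a single metadata operation, with no per-blob copy.
rename_directory() {
  # $1 = account, $2 = file system (container), $3 = current path, $4 = new path
  az storage fs directory move \
    --account-name "$1" --file-system "$2" \
    --name "$3" --new-directory "$2/$4" \
    --auth-mode login
}
```

A commit step such as `rename_directory "$ACCT" lake _temporary/job42 curated/job42` is the O(1) operation the Spark example relies on; the destination is given as file system plus path.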
Consistency, briefly
Blob Storage is strongly consistent for everything within the primary region: a committed
write, overwrite, or delete is visible to all subsequent reads and listings immediately.
There was never an eventual-consistency era to design around, unlike early
S3. Concurrency control comes from
HTTP itself: every blob carries an ETag, writes accept If-Match for optimistic
checks, and leases give you pessimistic exclusive-write locks on a blob or container when
you need an election or a single-writer guarantee. The one consistency asterisk is the one
already flagged: the RA secondary endpoint lags by design, and code reading from it must
tolerate stale data bounded by the last sync time.
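The optimistic check with If-Match might be sketched like this from the CLI. The function name is hypothetical, and the exact form in which the CLI returns and accepts the ETag string is an assumption worth verifying against your CLI version.

```shell
# Overwrite a blob only if it has not changed since its ETag was read.
conditional_overwrite() {
  # $1 = account, $2 = container, $3 = blob name, $4 = local file to upload
  local etag
  etag=$(az storage blob show \
    --account-name "$1" --container-name "$2" --name "$3" \
    --auth-mode login --query properties.etag --output tsv) || return 1

  # ...read, modify, and prepare "$4" here...

  # The service rejects this upload if another writer changed the blob
  # after the ETag above was read.
  az storage blob upload \
    --account-name "$1" --container-name "$2" --name "$3" \
    --file "$4" --overwrite --if-match "$etag" --auth-mode login
}
```

On a rejected upload the usual pattern is to re-read the blob, reapply the change, and retry. When retries would thrash, a lease is the pessimistic alternative.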
What you actually pay for
The bill has more lines than the per-GB headline:
- Capacity per tier, with versions and snapshots counted.
- Transactions, priced per 10,000 and more expensive in cooler tiers; a workload that lists and stats millions of small blobs can owe more for operations than for storage.
- Retrieval per GB on Cool, Cold, and Archive reads, on top of the transaction cost.
- Early deletion charges if a blob leaves Cool before 30 days, Cold before 90, or Archive before 180.
- Tier changes themselves: moving data colder is billed as write operations at the destination tier, and moving it warmer is billed as reads plus retrieval at the source tier, so a lifecycle policy that bounces marginal data between tiers can cost more than leaving it alone.
- Egress out of the region, as everywhere in the cloud storage world, plus a geo-replication bandwidth charge on GRS-family accounts.
The classic self-inflicted wounds: archiving millions of tiny blobs (per-blob operation costs dwarf the capacity savings), turning on versioning without lifecycle rules for old versions, and putting log data that gets re-read by a dashboard into Cool, where every refresh pays retrieval.
CLI lab: account to teardown in ten commands
Theory done; build one. This lab creates a ZRS account, uploads a blob using Entra auth
instead of keys, mints a user-delegation SAS, attaches a lifecycle rule, and tears
everything down. You need the az CLI logged in (az login) on a
subscription where you can create resources. Total cost if you tear down promptly: effectively
zero.
1. Resource group and account. Account names are global, so suffix with some randomness. Note the SKU choosing the redundancy rung and the explicit refusal of public blob access.
```shell
az group create --name blobdemo-rg --location westeurope
ACCT="blobdemo$RANDOM"
az storage account create \
  --name "$ACCT" \
  --resource-group blobdemo-rg \
  --location westeurope \
  --kind StorageV2 \
  --sku Standard_ZRS \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false
```
2. Grant yourself a data-plane role. Owning the account does not let you read blobs with Entra auth. Assign a data role scoped to the account; propagation can take a minute or two, so if step 3 throws 403, wait and retry.
```shell
SCOPE=$(az storage account show --name "$ACCT" \
  --resource-group blobdemo-rg --query id --output tsv)
az role assignment create \
  --assignee "$(az ad signed-in-user show --query id --output tsv)" \
  --role "Storage Blob Data Contributor" \
  --scope "$SCOPE"
```
3. Container and upload. --auth-mode login sends your Entra token instead of fishing for an account key; this is the habit worth building.
```shell
az storage container create --account-name "$ACCT" \
  --name docs --auth-mode login
echo "hello from the lab" > hello.txt
az storage blob upload --account-name "$ACCT" \
  --container-name docs --name hello.txt \
  --file hello.txt --auth-mode login
az storage blob list --account-name "$ACCT" \
  --container-name docs --auth-mode login \
  --query "[].{name:name, tier:properties.blobTier}" --output table
```
4. A user-delegation SAS. The --as-user flag is what makes this a user-delegation SAS signed via Entra rather than an account key. Read-only, one hour, and the resulting URL works in a plain browser.
```shell
# GNU date first, BSD/macOS date as the fallback.
EXPIRY=$(date -u -d "+1 hour" +%Y-%m-%dT%H:%MZ 2>/dev/null \
  || date -u -v+1H +%Y-%m-%dT%H:%MZ)
SAS=$(az storage blob generate-sas --account-name "$ACCT" \
  --container-name docs --name hello.txt \
  --permissions r --expiry "$EXPIRY" \
  --auth-mode login --as-user --https-only --output tsv)
curl "https://$ACCT.blob.core.windows.net/docs/hello.txt?$SAS"
```
5. A lifecycle rule. Tier every block blob in the docs container to Cool 30 days after last modification (a prefixMatch value begins with the container name, so docs/ covers the whole container). Remember the engine runs about daily; this is policy, not a trigger.
```shell
cat > policy.json << 'EOF'
{ "rules": [ {
  "enabled": true,
  "name": "docs-to-cool",
  "type": "Lifecycle",
  "definition": {
    "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["docs/"] },
    "actions": { "baseBlob": {
      "tierToCool": { "daysAfterModificationGreaterThan": 30 }
    } }
  }
} ] }
EOF
az storage account management-policy create \
  --account-name "$ACCT" --resource-group blobdemo-rg \
  --policy @policy.json
```
6. Teardown. Deleting the resource group removes the account, the container, the blob, the role assignment scoped to it, and the policy in one motion.
```shell
az group delete --name blobdemo-rg --yes --no-wait
```
Worth trying before you tear down: re-run the upload with --tier Cool (plus --overwrite,
since the blob already exists) and list again to watch the tier column change, or run
az storage blob set-tier to Archive and then attempt the curl to see what an offline blob
looks like to a client (a 409 with BlobArchived in the body, which is the error your retry
logic needs to treat as "hours," not "transient").
Further reading
- Microsoft Learn — Azure Storage redundancy — the canonical decoding of the alphabet, with the durability math and failover semantics.
- Microsoft Learn — Hot, Cool, Cold, and Archive access tiers — tier behaviour, early-deletion windows, and the rehydration rules in full.
- Microsoft Learn — Shared access signatures — the three SAS kinds, stored access policies, and the official best-practice list.
- Calder et al. (SOSP 2011) — Windows Azure Storage — the paper behind all of this: the stream layer, the partition layer, and how strong consistency and high availability coexist.
- Semicolony — S3 internals — the bucket-first contrast: prefixes, request-rate sharding, and Glacier's restore model.