
Blob Storage

Azure's object store looks like S3 from a distance and behaves quite differently up close. The unit you design around is not a bucket but a storage account: a named slice of the global namespace that fixes your redundancy, your performance tier, and your throughput ceiling before a single byte lands. Inside it live three different kinds of blob, four access tiers, and an alphabet of replication acronyms that interviewers love and docs explain badly. This page decodes all of it, then builds one from the command line.


The storage account is the unit, and that changes everything

S3 and GCS are bucket-first systems: you create a bucket, the bucket has a globally unique name, and most of the interesting settings hang off the bucket. Azure inverts this. The globally unique name belongs to the storage account, a resource that lives in a resource group and a region like everything else in Azure (the foundations page covers that hierarchy). The account name becomes a DNS prefix: contoso.blob.core.windows.net. Every blob URL in your estate starts with it, which is why account names are lowercase, 3 to 24 characters, letters and digits only, and contested the same way bucket names are everywhere else.
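The naming rules above are easy to check locally before a create call bounces. A minimal sketch: valid_account_name is a hypothetical helper, and it checks only the syntax, not whether someone else already holds the name.

```shell
# Hypothetical helper: checks only the syntax rules stated above
# (3 to 24 characters, lowercase letters and digits). It cannot tell
# whether the name is already taken elsewhere in Azure.
valid_account_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

for name in contoso Contoso my-account ab; do
  if valid_account_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected"
  fi
done
```

For the availability half of the question, az storage account check-name --name asks the service directly.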

The decisions you make at account creation are the ones that matter most, because several of them are fixed or expensive to change later. The kind (almost always StorageV2 today, or a premium block-blob account for SSD-backed low latency) and the redundancy SKU (the LRS-to-RA-GZRS alphabet, decoded below) are set on the account, not on a container or a blob. The same is true of the performance tier: a standard account stores blobs on HDD-backed infrastructure with per-GB pricing that rewards capacity, while a premium account stores them on SSDs with single-digit-millisecond latency and pricing that rewards small, hot objects. You cannot mix the two inside one account. In S3 you would express that difference per object with a storage class; in Azure you express it by creating two accounts.

The account is also the throughput boundary. A standard account in a large region has a default ceiling of around 5 PiB of capacity and tens of gigabits per second of ingress and egress, shared by everything inside it. When a team puts the CDN origin, the analytics landing zone, and the backup target into one account because it was convenient, they have coupled three workloads to one rate limit and one blast radius. The practical rule: one account per workload class, separated by performance needs, redundancy needs, and the question "who should be able to hold keys to this." Accounts are free; the limits are not.

Containers, and the shape of the namespace

Inside the account sit containers, which are closer to S3 prefixes with access policies than to buckets. A container is a flat namespace of blobs; its name appears as the first path segment of every blob URL (https://contoso.blob.core.windows.net/images/cat.png). Containers are where public-access settings, stored access policies, and immutability scopes attach. Blob names can contain slashes, and every SDK and the portal will render reports/2026/q1.pdf as folders, but by default that hierarchy is a naming convention, not a real one. There is no rename: "moving" a blob is a server-side copy plus a delete, exactly as in S3. (The exception is Data Lake Gen2, later on this page, which makes the hierarchy real.)
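The "copy plus delete" move can be sketched with the az CLI. This is an illustration under assumptions, not a hardened tool: move_blob is a hypothetical helper, it assumes a same-account copy has already finished by the time the status is read, and whether Entra auth alone covers the source read may depend on your CLI version and role assignments.

```shell
# Hypothetical helper: "rename" a blob inside one container on a flat namespace.
# $1 account, $2 container, $3 current blob name, $4 new blob name
move_blob() {
  # Server-side copy to the new name.
  az storage blob copy start --account-name "$1" --auth-mode login \
    --source-container "$2" --source-blob "$3" \
    --destination-container "$2" --destination-blob "$4" >/dev/null || return 1

  # Only delete the source once the copy reports success.
  status=$(az storage blob show --account-name "$1" --auth-mode login \
    --container-name "$2" --name "$4" \
    --query properties.copy.status --output tsv)
  [ "$status" = "success" ] || { echo "copy not finished: $status" >&2; return 1; }

  az storage blob delete --account-name "$1" --auth-mode login \
    --container-name "$2" --name "$3"
}

# Example (needs a real account and a data-plane role):
# move_blob "$ACCT" reports 2026/q1-draft.pdf 2026/q1.pdf
```

Two requests and a window in which both names exist: that is the cost a flat namespace charges for every rename.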

[Figure: storage account mediaprod (kind StorageV2, SKU Standard_ZRS, default tier Hot, mediaprod.blob.core.windows.net) holding three containers. images: block blobs (objects such as images, parquet, backups; up to ~190 TiB). logs: append blobs (append-only writes such as logs and audit trails; ~195 GiB max). vhds: page blobs (random access in 512-byte pages, VHDs; up to 8 TiB). Redundancy, performance, and the throughput ceiling all attach to the outer box.]
The account-first model: containers partition the namespace, but the settings that matter belong to the account.

Three blob types, and why each exists

S3 has one kind of object. Blob Storage has three, and the type is fixed at creation: you cannot convert one to another without rewriting it. Each type exists because a real workload was a poor fit for the others.

Block blobs are the ordinary objects, and the type you will use for everything that looks like a file: images, parquet, video, backups, build artifacts. A block blob is assembled from up to 50,000 blocks of up to 4,000 MiB each, which puts the ceiling around 190.7 TiB per blob. The block structure is exposed in the API and is the equivalent of S3's multipart upload, with one nice difference: you stage blocks with Put Block in any order, retry any of them independently, and nothing is visible until you send Put Block List naming the blocks that make up the final blob. The commit is atomic, and you can even commit a mix of newly staged blocks and blocks from the existing version of the blob, which makes partial updates of huge objects cheap.

Append blobs exist because logs broke the object model. With a block blob, ten writers appending lines would each have to read-modify-write or coordinate block lists. An append blob supports exactly one mutation: Append Block, which atomically adds data to the end. Blocks here are at most 4 MiB and there can be 50,000 of them, so an append blob tops out around 195 GiB. Multiple writers can append concurrently without clobbering each other; what you give up is the ability to modify or delete anything already written. Diagnostic logs, audit trails, and anything event-shaped lands here.
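The two size ceilings quoted above are plain multiplication, and it is worth doing once so the numbers stop looking arbitrary. A small sketch using awk for the floating-point division:

```shell
# Block blob: 50,000 blocks x 4,000 MiB, expressed in TiB (MiB / 1024 / 1024).
awk 'BEGIN { printf "block blob ceiling:  %.1f TiB\n", 50000 * 4000 / 1024 / 1024 }'

# Append blob: 50,000 blocks x 4 MiB, expressed in GiB (MiB / 1024).
awk 'BEGIN { printf "append blob ceiling: %.1f GiB\n", 50000 * 4 / 1024 }'
```

The block count is the same in both cases; only the per-block limit differs, which is why one ceiling is about a thousand times the other.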

Page blobs exist because virtual machine disks are not files you replace, they are address spaces you mutate. A page blob is a sparse array of 512-byte pages, up to 8 TiB, where any aligned range can be written or read in place. Empty ranges cost nothing and read as zeros. This is the substrate that unmanaged VM disks were built on, and the lineage survives inside managed disks today. Unless you are building something that needs random-access writes into a large fixed-size blob (a database file, a disk image), you will never create one by hand, but knowing they exist explains why the Blob API has page-range operations that make no sense for objects.

Access tiers: Hot, Cool, Cold, and the Archive trap

Block blobs carry an access tier that trades storage price against access price. Hot is the default: the highest per-GB rate, the lowest transaction and read costs. Cool roughly halves the storage rate in exchange for pricier operations, a per-GB retrieval fee, and a 30-day early-deletion charge. Cold extends the same trade: cheaper still at rest, more expensive to touch, 90-day minimum. All three are online tiers; a read returns in milliseconds regardless. The account carries a default tier (Hot or Cool) for new blobs, and individual blobs can be set explicitly or moved by lifecycle rules.

Archive is different in kind, not just in price. An archived blob is offline: its metadata is still listable, but the content cannot be read at all until you rehydrate it back to an online tier, and rehydration is measured in hours. Standard priority can take up to 15 hours; high priority usually completes in under one hour for blobs under 10 GB, at a much higher per-GB rate. This is the single biggest cross-cloud gotcha in this space: GCS's Archive class is still online with millisecond reads, and even S3 Glacier now has an Instant Retrieval class. Azure Archive has no instant variant. If a compliance officer might ever ask for a file "today," Archive is the wrong tier, and your restore-time SLA must be written with a 15-hour worst case in mind. Archive also carries a 180-day early-deletion window and is set per blob, never as an account default.

[Figure: time to first byte, log scale. Hot: ~ms, online. Cool: ~ms, online, retrieval fee per GB. Cold: ~ms, online, higher retrieval fee. Archive (high priority): typically under 1 h for blobs under 10 GB. Archive (standard): up to 15 h. The gap between Cold and Archive is not a price step, it is an availability cliff.]
Hot, Cool, and Cold are online tiers. Archive is offline, and rehydration is hours, not milliseconds.
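The early-deletion windows (30, 90, and 180 days) are prorated: leave a tier early and you are billed for the days you did not stay. A minimal sketch of that arithmetic; min_days and early_deletion_days are hypothetical helper names, and the result is days, not money, because prices vary by region.

```shell
# Minimum stay per tier in days, as quoted in the text.
min_days() {
  case "$1" in
    Cool)    echo 30 ;;
    Cold)    echo 90 ;;
    Archive) echo 180 ;;
    *)       echo 0 ;;   # Hot has no early-deletion window
  esac
}

# Hypothetical helper: days still billable if a blob leaves its tier now.
# $1 tier, $2 days the blob has spent in that tier
early_deletion_days() {
  remaining=$(( $(min_days "$1") - $2 ))
  [ "$remaining" -gt 0 ] || remaining=0
  echo "$remaining"
}

early_deletion_days Cool 10       # prints 20
early_deletion_days Archive 200   # prints 0
```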

Rehydration itself has two shapes: change the blob's tier in place with Set Blob Tier and wait, or copy the archived blob to a new online blob, which leaves the cheap archived original where it is. The copy approach is the right one when you need the data once but want to keep paying archive rates for the canonical copy.
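Both shapes map onto az CLI commands. A sketch, assuming the caller holds a data-plane role; the function names are hypothetical, and the target tier and priority are choices made here for illustration, not recommendations.

```shell
# In place: the blob keeps its name and becomes readable once rehydration ends.
# $1 account, $2 container, $3 blob
rehydrate_in_place() {
  az storage blob set-tier --account-name "$1" --auth-mode login \
    --container-name "$2" --name "$3" \
    --tier Hot --rehydrate-priority Standard
}

# By copy: the archived original stays archived; a new online blob appears.
# $1 account, $2 container, $3 archived blob, $4 name for the online copy
rehydrate_by_copy() {
  az storage blob copy start --account-name "$1" --auth-mode login \
    --source-container "$2" --source-blob "$3" \
    --destination-container "$2" --destination-blob "$4" \
    --tier Hot --rehydrate-priority High
}

# Example (needs a real account):
# rehydrate_by_copy "$ACCT" backups 2019/ledger.bak restore/ledger.bak
```

Either way the command returns long before the data is readable; the client still has to wait out the hours.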

The redundancy alphabet, decoded

Redundancy is chosen per account, and the SKU names compose from three ideas: where the three synchronous copies live (L for one datacenter, Z for three availability zones), whether the whole set is asynchronously replicated to the paired region (G), and whether you can read from that secondary without a failover (RA). Spell it out once and the acronyms stop being trivia.

SKU | Copies | Survives | Does not survive
LRS | 3, one datacenter | disk, node, and rack failures | a datacenter or zone outage
ZRS | 3, across 3 zones | a full availability-zone outage, with no failover step | a region-wide disaster
GRS | 3 + 3 in paired region | a regional disaster, after failover | a zone outage in the primary without downtime (primary is LRS)
GZRS | 3 zonal + 3 in pair | a zone outage transparently and a regional disaster after failover | simultaneous loss of both paired regions
RA-GRS | as GRS | as GRS, plus the secondary is readable at any time | writes to the secondary; reads there lag
RA-GZRS | as GZRS | as GZRS, plus read access to the secondary | same caveat: the secondary is read-only and eventually consistent
[Figure: the redundancy ladder. LRS: one datacenter, survives hardware failure. ZRS: three zones, survives a zone outage with no failover. GRS: LRS plus an asynchronously replicated paired region, survives regional disaster. GZRS: ZRS plus a paired region, survives zone loss and region loss. RA-…: adds a -secondary read endpoint (contoso-secondary.blob.core.windows.net), readable without failing over. L = one building, Z = three zones, G = a second region, RA = you may read it.]
The ladder. Each rung adds a failure domain: rack, zone, region, then read access to the far copy.
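The composition rule is mechanical enough to write down. A sketch that takes a SKU name as the CLI spells it (Standard_RAGZRS and friends) and peels off the three ideas in order; decode_sku is a hypothetical helper, not anything the CLI ships.

```shell
# Hypothetical helper: split a SKU name into its three ideas.
decode_sku() {
  sku=${1#*_}                      # drop the performance prefix, e.g. "Standard_"
  case "$sku" in
    RA*) ra="readable secondary"; sku=${sku#RA} ;;
    *)   ra="no secondary reads" ;;
  esac
  case "$sku" in
    G*)  geo="geo-replicated"; sku=${sku#G} ;;
    *)   geo="single region" ;;
  esac
  case "$sku" in
    ZRS) local_copies="three zones" ;;
    *)   local_copies="one datacenter" ;;   # LRS, or the implicit L in GRS
  esac
  echo "$1: $local_copies, $geo, $ra"
}

for s in Standard_LRS Standard_ZRS Standard_GRS Standard_GZRS \
         Standard_RAGRS Standard_RAGZRS; do
  decode_sku "$s"
done
```

The one irregularity is GRS: there is no "GLRS", so a missing Z after the G means the primary is locally redundant.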

Two details separate people who have read the marketing from people who have run this. First, geo-replication is asynchronous, so a regional failover can lose recent writes; the account exposes a last sync time that tells you the point up to which the secondary is guaranteed complete, and your recovery story should quote it. Second, failover is not automatic. For a customer-initiated account failover you flip the secondary to primary yourself, accept the data loss after the last sync time, and your account comes out the other side as LRS in the new region: geo-redundancy is gone until you re-enable it and a full re-replication completes. Durability numbers (eleven nines for LRS, twelve for ZRS, sixteen for the geo options) describe surviving hardware loss, not the operational work of a failover. ZRS is the quiet workhorse here: zone loss is handled with no action and no consistency caveats, which is why it is the default choice for anything that matters in a zonal region.
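The last sync time is one query away, which leaves little excuse for a runbook that does not quote it. A sketch, assuming a GRS-family account; last_sync_time is a hypothetical wrapper around az storage account show.

```shell
# Hypothetical wrapper: print the point up to which the secondary is complete.
# $1 account, $2 resource group
last_sync_time() {
  az storage account show --name "$1" --resource-group "$2" \
    --expand geoReplicationStats \
    --query geoReplicationStats.lastSyncTime --output tsv
}

# Example (needs a real geo-redundant account):
# echo "writes after $(last_sync_time "$ACCT" "$RG") may be lost on failover"
#
# The failover itself is a deliberate, destructive step, shown commented out:
# az storage account failover --name "$ACCT" --resource-group "$RG"
```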

Getting in: SAS tokens versus Entra

Every request to Blob Storage is authorized one of three ways: with one of the two account keys, with a shared access signature (SAS) derived from a key or a delegated credential, or with an Entra ID token checked against RBAC roles. The account keys are root: full data and most management rights, no identity attached, no audit trail of who used them. The whole modern security story is about not letting those keys touch application code, and the account even has an allowSharedKeyAccess switch to outlaw them entirely.

A SAS is a signed URL query string granting specific permissions on a specific scope for a specific window. There are three kinds, and they are not equally safe. An account SAS is signed with an account key and can span services and operations broadly; it is the closest to handing out the key itself. A service SAS is also key-signed but scoped to one container or blob, and it can reference a stored access policy on the container, which matters because that gives you a revocation handle: delete the policy and every SAS minted against it dies. A user-delegation SAS is the one to prefer: it is signed with a user-delegation key obtained through Entra, so it inherits the RBAC permissions of the identity that created it, shows up in audit logs attached to that identity, is capped at seven days, and can be revoked by revoking the delegation keys without rotating anything.

The gotchas are where interviews go. A SAS is a bearer credential living in a URL, which means it leaks everywhere URLs leak: proxy logs, browser history, referrer headers, chat messages. A key-signed SAS with a generous expiry cannot be revoked except by rotating the signing key, which invalidates every SAS signed with it, usually at 2 a.m. during an incident. Clock skew bites too: a token whose start time is "now" can fail for clients a few seconds ahead, so issue with a start a few minutes in the past. The defensive defaults: user-delegation SAS, short expiry, HTTPS-only flag set, IP range pinned when the consumer is known, and never an account SAS in anything customer-facing.
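Those defensive defaults fit in one function. A sketch under assumptions: mint_sas is a hypothetical helper, the five-minute backdate and fifteen-minute lifetime are illustrative numbers, and the date fallback covers GNU and BSD only.

```shell
# Hypothetical helper: read-only user-delegation SAS with the defensive defaults.
# $1 account, $2 container, $3 blob, $4 consumer IP address or range
mint_sas() {
  # Backdate the start to absorb clock skew; keep the lifetime short.
  start=$(date -u -d "-5 minutes" +%Y-%m-%dT%H:%MZ 2>/dev/null \
    || date -u -v-5M +%Y-%m-%dT%H:%MZ)
  expiry=$(date -u -d "+15 minutes" +%Y-%m-%dT%H:%MZ 2>/dev/null \
    || date -u -v+15M +%Y-%m-%dT%H:%MZ)

  az storage blob generate-sas --account-name "$1" \
    --container-name "$2" --name "$3" \
    --permissions r --start "$start" --expiry "$expiry" \
    --ip "$4" --https-only \
    --auth-mode login --as-user --output tsv
}

# Example (needs a real account and a data-plane role):
# SAS=$(mint_sas "$ACCT" docs hello.txt 203.0.113.7)
```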

For service-to-service access inside Azure, skip SAS entirely: give the workload a managed identity and a data-plane role such as Storage Blob Data Reader or Storage Blob Data Contributor. One trap there: the control-plane Owner role does not grant data access. You can own the account and still get 403s on blobs until you grant yourself a data role, which is the single most common first-day confusion with Entra-based storage auth. An Azure Function reading a container through its managed identity needs that data role, not a connection string.

Rule of thumb. Managed identity + RBAC for anything that runs in Azure. User-delegation SAS for anything that does not, minted fresh and short-lived. Account keys disabled unless a legacy tool forces your hand.
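In CLI terms the rule of thumb is two commands. A sketch with hypothetical function names; the principal ID is assumed to come from wherever the workload's managed identity was created, and the scope is the account's resource ID.

```shell
# Grant a managed identity read access to blob data.
# $1 principal (object) ID of the managed identity, $2 scope (account resource ID)
grant_blob_reader() {
  az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Reader" \
    --scope "$2"
}

# Outlaw account keys (and every SAS signed with them) on the account.
# $1 account, $2 resource group
disable_account_keys() {
  az storage account update --name "$1" --resource-group "$2" \
    --allow-shared-key-access false
}
```

Run the second one only after confirming nothing still depends on a connection string; it breaks those clients immediately.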

Lifecycle management: the tier moves you do not make by hand

Tiers only save money if blobs actually move through them, and nobody moves a billion blobs by hand. A lifecycle management policy is a JSON document on the account: a set of rules, each with a filter (blob type, name prefixes, blob index tags) and actions keyed on age, such as tierToCool after 30 days since modification, tierToArchive after 90, delete after 2,555 for a seven-year retention. Rules can also act on previous versions and snapshots separately from the base blob, which is how you keep current data Hot while old versions drain to Archive. Two operational facts to know: the policy engine runs roughly once a day, and a new or changed policy can take up to 48 hours to complete its first pass, so lifecycle is an economic tool rather than something with timing guarantees. There is also a last-access-time option (it must be enabled on the account first) that lets rules key on reads instead of writes, which is the closer cousin to S3's Intelligent-Tiering, though here it is still rules you write rather than a class that decides for you.
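The rule described above, written out as a policy file. A sketch: the container name records and the rule name are hypothetical, the day counts are the ones from the text, and the version block shows old versions draining to Archive on their own clock.

```shell
# Seven years expressed in days, as used in the delete actions below.
echo "seven years = $((7 * 365)) days"

cat > policy-retention.json << 'EOF'
{ "rules": [ {
  "enabled": true,
  "name": "retain-seven-years",
  "type": "Lifecycle",
  "definition": {
    "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["records/"] },
    "actions": {
      "baseBlob": {
        "tierToCool":    { "daysAfterModificationGreaterThan": 30 },
        "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
        "delete":        { "daysAfterModificationGreaterThan": 2555 }
      },
      "version": {
        "tierToArchive": { "daysAfterCreationGreaterThan": 90 },
        "delete":        { "daysAfterCreationGreaterThan": 2555 }
      }
    }
  }
} ] }
EOF

# Apply it to an account: $1 account, $2 resource group
apply_policy() {
  az storage account management-policy create \
    --account-name "$1" --resource-group "$2" \
    --policy @policy-retention.json
}
```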

Immutability: WORM for regulators

For data that must be provably unmodifiable (financial records under SEC 17a-4, health records, evidence) Blob Storage offers immutability policies: write once, read many, enforced by the service rather than by promises. A time-based retention policy on a container or an individual blob version makes every blob unmodifiable and undeletable for N days from creation; while the policy is unlocked you can test and adjust it, and once locked nobody, including the subscription owner and Microsoft support, can shorten it. A legal hold is the open-ended variant: tagged data is frozen until every hold is explicitly cleared, independent of any clock. Combined with versioning, version-level WORM lets a blob name keep accepting new writes while each written version becomes individually immutable, which is usually what an audit-trail design actually wants. The operational warning is the obvious one inverted: a locked policy is a commitment, and storage under it cannot be deleted to save money, ever, until the clock runs out.
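A container-level sketch of both mechanisms with the az CLI. The function names are hypothetical and the flags are written from memory of the az storage container immutability-policy and legal-hold command groups, so check them against az --help before relying on this. The policy is created unlocked; locking is deliberately left out because it cannot be undone.

```shell
# Time-based retention on a container, created in the unlocked (testable) state.
# $1 account, $2 resource group, $3 container, $4 retention in days
set_retention() {
  az storage container immutability-policy create \
    --account-name "$1" --resource-group "$2" \
    --container-name "$3" --period "$4"
}

# Open-ended freeze: data stays immutable until every tag is cleared.
# $1 account, $2 resource group, $3 container, $4 hold tag
set_legal_hold() {
  az storage container legal-hold set \
    --account-name "$1" --resource-group "$2" \
    --container-name "$3" --tags "$4"
}

# Example (needs a real account):
# set_retention "$ACCT" "$RG" ledgers 2555
# set_legal_hold "$ACCT" "$RG" ledgers case2026a
```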

Soft delete, versioning, and the undo stack

Three features form the account's undo stack, and they compose. Soft delete for blobs keeps deleted blobs (and overwritten snapshots) recoverable for a retention window of 1 to 365 days; a delete becomes a soft delete, and Undelete Blob brings it back. Container soft delete does the same one level up, because the classic catastrophe is not deleting a blob but deleting the container holding ten million of them. Blob versioning is the stronger tool: every overwrite or delete automatically captures the previous state as an immutable prior version, so "restore to yesterday 14:02" is a copy of a version ID rather than an archaeology project. Versions are billed as stored data, which is why versioning without a lifecycle rule that tiers or deletes old versions is a slow cost leak. Turn on all three for anything production-shaped; the combination is also the substrate for point-in-time restore on block-blob data.
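All three switches live on the account's blob service properties, so one command turns on the whole stack. A sketch with hypothetical function names; using the same retention window for blobs and containers is a simplification made here, not a requirement.

```shell
# Enable blob soft delete, container soft delete, and versioning together.
# $1 account, $2 resource group, $3 retention window in days (1 to 365)
enable_undo_stack() {
  az storage account blob-service-properties update \
    --account-name "$1" --resource-group "$2" \
    --enable-delete-retention true --delete-retention-days "$3" \
    --enable-container-delete-retention true \
    --container-delete-retention-days "$3" \
    --enable-versioning true
}

# Bring back a soft-deleted blob while it is still inside the window.
# $1 account, $2 container, $3 blob
undelete_blob() {
  az storage blob undelete --account-name "$1" --auth-mode login \
    --container-name "$2" --name "$3"
}
```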

Static websites, and where they stop

Flip on static website hosting and the account gains a special $web container plus a separate web endpoint (contoso.z13.web.core.windows.net style) that serves its contents over HTTP with an index document and a custom 404. It is the cheapest possible hosting for a built SPA or documentation site. Its limits arrive quickly: no server-side redirects, no header control, and custom domains with HTTPS need a CDN or Azure Front Door in front of the endpoint. In practice the pattern is "$web as origin, Front Door for TLS, caching, and rewrites," and once you need real routing rules Azure points you at Static Web Apps instead. For an account that is also doing other work, note that the web endpoint bypasses container ACL thinking entirely: anything in $web is public by design.
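The "flip on" step and the deploy are both short. A sketch, assuming a built site in a local directory and a caller with a data-plane role; the function names and the index.html and 404.html file names are illustrative.

```shell
# Enable static website hosting and publish a build directory into $web.
# $1 account, $2 local build directory
enable_static_site() {
  az storage blob service-properties update --account-name "$1" \
    --auth-mode login --static-website \
    --index-document index.html --404-document 404.html || return 1

  # Single quotes keep the shell from expanding $web as a variable.
  az storage blob upload-batch --account-name "$1" --auth-mode login \
    --destination '$web' --source "$2"
}

# Print the web endpoint the account gained.
# $1 account, $2 resource group
site_url() {
  az storage account show --name "$1" --resource-group "$2" \
    --query primaryEndpoints.web --output tsv
}
```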

Data Lake Storage Gen2: when the folders become real

One checkbox at account creation, hierarchical namespace, turns a blob account into Azure Data Lake Storage Gen2. The flat key space becomes a real filesystem: directories are first-class objects, renames and moves of a directory are single atomic metadata operations instead of a copy-every-blob-then-delete loop, and each file and directory carries POSIX-style ACLs alongside RBAC. Analytics engines speak to it through the abfss:// driver rather than the blob endpoint, though the blob API keeps working against the same data.

Why this matters is concrete: a Spark job that commits results by writing to _temporary/ and renaming into place does O(1) metadata work on a hierarchical namespace and O(n) copy work on a flat one. Directory-level ACLs map cleanly onto "team X may read /curated/finance" requirements that object-level RBAC handles awkwardly. The trade-offs: HNS must effectively be chosen up front (migration of an existing account exists but is restrictive), a handful of blob features arrive late or never on HNS accounts, and the per-operation pricing model differs. The decision rule is simple. Big-data analytics with engines that think in directories: enable it. General object storage for an application: leave it off. Do not enable it "just in case," because you are buying filesystem semantics you will pay for in feature lag.
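The checkbox and the atomic rename look like this from the CLI. A sketch with hypothetical function names; the SKU is an arbitrary choice, and the rename goes through the az storage fs command group, which talks to the Data Lake endpoint rather than the blob one.

```shell
# Create an account with hierarchical namespace enabled.
# $1 account, $2 resource group
create_lake_account() {
  az storage account create --name "$1" --resource-group "$2" \
    --kind StorageV2 --sku Standard_ZRS \
    --enable-hierarchical-namespace true
}

# Rename a directory as one metadata operation, however many files it holds.
# $1 account, $2 filesystem (container), $3 current directory, $4 new directory
rename_directory() {
  az storage fs directory move --account-name "$1" --auth-mode login \
    --file-system "$2" --name "$3" --new-directory "$2/$4"
}

# Example (needs a real HNS account):
# rename_directory "$ACCT" curated _temporary/job42 finance/2026-q1
```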

Consistency, briefly

Blob Storage is strongly consistent for everything within the primary region: a committed write, overwrite, or delete is visible to all subsequent reads and listings immediately. There was never an eventual-consistency era to design around, unlike early S3. Concurrency control comes from HTTP itself: every blob carries an ETag, writes accept If-Match for optimistic checks, and leases give you pessimistic exclusive-write locks on a blob or container when you need an election or a single-writer guarantee. The one consistency asterisk is the one already flagged: the RA secondary endpoint lags by design, and code reading from it must tolerate stale data bounded by the last sync time.
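Optimistic concurrency with an ETag, sketched through the CLI. The helper name is hypothetical, and the properties.etag query path is an assumption about the shape of az storage blob show output; adjust it if your CLI version nests the field differently.

```shell
# Overwrite a blob only if nobody else changed it since we read it.
# $1 account, $2 container, $3 blob, $4 local file with the new content
update_if_unchanged() {
  # Remember the version we based our edit on.
  etag=$(az storage blob show --account-name "$1" --auth-mode login \
    --container-name "$2" --name "$3" \
    --query properties.etag --output tsv) || return 1

  # ... the caller's read-modify step on "$4" would happen here ...

  # The service rejects the write with 412 if the ETag no longer matches.
  az storage blob upload --account-name "$1" --auth-mode login \
    --container-name "$2" --name "$3" --file "$4" \
    --overwrite --if-match "$etag"
}
```

On a 412 the right move is to re-read, re-apply the change, and try again, not to retry the same upload.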

What you actually pay for

The bill has more lines than the per-GB headline. Capacity per tier, with versions and snapshots counted. Transactions, priced per 10,000 and more expensive in cooler tiers; a workload that lists and stats millions of small blobs can owe more for operations than for storage. Retrieval per GB on Cool, Cold, and Archive reads, on top of the transaction cost. Early deletion charges if a blob leaves Cool before 30 days, Cold before 90, or Archive before 180. Tier changes themselves: moving data colder is billed as write operations at the destination tier, and moving it warmer is billed as reads plus retrieval at the source tier, so a lifecycle policy that bounces marginal data between tiers can cost more than leaving it alone. Egress out of the region, as everywhere in the cloud storage world, plus a geo-replication bandwidth charge on GRS-family accounts. The classic self-inflicted wounds: archiving millions of tiny blobs (per-blob operation costs dwarf the capacity savings), turning on versioning without lifecycle rules for old versions, and putting log data that gets re-read by a dashboard into Cool, where every refresh pays retrieval.

CLI lab: account to teardown in ten commands

Theory done; build one. This lab creates a ZRS account, uploads a blob using Entra auth instead of keys, mints a user-delegation SAS, attaches a lifecycle rule, and tears everything down. You need the az CLI logged in (az login) on a subscription where you can create resources. Total cost if you tear down promptly: effectively zero.

1. Resource group and account. Account names are global, so suffix with some randomness. Note the SKU choosing the redundancy rung and the explicit refusal of public blob access.

az group create --name blobdemo-rg --location westeurope

ACCT="blobdemo$RANDOM"
az storage account create \
  --name "$ACCT" \
  --resource-group blobdemo-rg \
  --location westeurope \
  --kind StorageV2 \
  --sku Standard_ZRS \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

2. Grant yourself a data-plane role. Owning the account does not let you read blobs with Entra auth. Assign a data role scoped to the account; propagation can take a minute or two, so if step 3 throws 403, wait and retry.

SCOPE=$(az storage account show --name "$ACCT" \
  --resource-group blobdemo-rg --query id --output tsv)

az role assignment create \
  --assignee "$(az ad signed-in-user show --query id --output tsv)" \
  --role "Storage Blob Data Contributor" \
  --scope "$SCOPE"

3. Container and upload. --auth-mode login sends your Entra token instead of fishing for an account key; this is the habit worth building.

az storage container create --account-name "$ACCT" \
  --name docs --auth-mode login

echo "hello from the lab" > hello.txt
az storage blob upload --account-name "$ACCT" \
  --container-name docs --name hello.txt \
  --file hello.txt --auth-mode login

az storage blob list --account-name "$ACCT" \
  --container-name docs --auth-mode login \
  --query "[].{name:name, tier:properties.blobTier}" --output table

4. A user-delegation SAS. The --as-user flag is what makes this a user-delegation SAS signed via Entra rather than an account key. Read-only, one hour, and the resulting URL works in a plain browser.

EXPIRY=$(date -u -d "+1 hour" +%Y-%m-%dT%H:%MZ 2>/dev/null \
  || date -u -v+1H +%Y-%m-%dT%H:%MZ)

SAS=$(az storage blob generate-sas --account-name "$ACCT" \
  --container-name docs --name hello.txt \
  --permissions r --expiry "$EXPIRY" \
  --auth-mode login --as-user --https-only --output tsv)

curl "https://$ACCT.blob.core.windows.net/docs/hello.txt?$SAS"

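5. A lifecycle rule. Tier every block blob in the docs container to Cool 30 days after last modification. A prefixMatch value starts with the container name, so docs/ here means the container from step 3, not a folder inside one. Remember the engine runs about daily; this is policy, not a trigger.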

cat > policy.json << 'EOF'
{ "rules": [ {
  "enabled": true,
  "name": "docs-to-cool",
  "type": "Lifecycle",
  "definition": {
    "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["docs/"] },
    "actions": { "baseBlob": {
      "tierToCool": { "daysAfterModificationGreaterThan": 30 }
    } }
  }
} ] }
EOF

az storage account management-policy create \
  --account-name "$ACCT" --resource-group blobdemo-rg \
  --policy @policy.json

6. Teardown. Deleting the resource group removes the account, the container, the blob, the role assignment scoped to it, and the policy in one motion.

az group delete --name blobdemo-rg --yes --no-wait

Worth trying before you tear down: re-run the upload with --tier Cool (plus --overwrite, since the blob already exists) and list again to watch the tier column change, or run az storage blob set-tier to Archive and then attempt the curl to see what an offline blob looks like to a client (a 409 with BlobArchived in the body, which is the error your retry logic needs to treat as "hours," not "transient").
