Virtual machines

Everything else in Azure is, in some sense, a managed wrapper around this. A VM is where you still see the raw materials: a size name that compresses the hardware into a short string, a disk catalogue with real performance cliffs, and an availability ladder that runs from a single machine with a modest SLA up to zone-spread scale sets. This page teaches you to read the catalogue properly — decode Standard_D4s_v5 on sight, pick a disk tier for a reason, and know which rung of the ladder a workload actually needs.

What an Azure VM actually is

An Azure VM is not one resource. It is a small assembly of them: the compute resource itself (which carries the size), an OS disk and any data disks (each a standalone managed disk resource), a network interface plugged into a subnet of a virtual network, and usually a public IP and a network security group. They live together in a resource group but have independent lifecycles — delete the VM and by default the disks and NIC survive, billing quietly. This is the same decomposition AWS makes with EC2, EBS, and ENIs, just with different names, and the same general anatomy covered in cloud compute. If the Azure resource model itself is still hazy — subscriptions, resource groups, ARM — read the foundations page first.

Two states matter for billing and behaviour. A stopped VM (shut down from inside the guest) still holds its host allocation and still bills for compute. A deallocated VM (stopped through Azure) releases the host, stops compute billing, keeps the disks, and may come back on different physical hardware with a different temporary disk. The distinction sounds pedantic until the first surprise invoice, or until a deallocated VM in a constrained region fails to restart because there is no capacity to reallocate it into.
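The distinction is easy to check from the outside. A minimal sketch, assuming the display strings that `az vm show -d --query powerState` prints (`VM running`, `VM stopped`, `VM deallocated`); the helper name `compute_billing_for` is hypothetical:

```shell
# Map an Azure power-state display string to its compute-billing consequence.
# The strings are assumed to match `az vm show -d --query powerState --output tsv`.
compute_billing_for() {
  case "$1" in
    "VM running")     echo "billed: running" ;;
    "VM stopped")     echo "billed: host still allocated" ;;
    "VM deallocated") echo "not billed: host released, disks still billed" ;;
    *)                echo "unknown state: $1" ;;
  esac
}

# Example wiring (not run here; needs the az CLI and a real VM):
#   state=$(az vm show -d --resource-group vmlab-rg --name vmlab-01 \
#     --query powerState --output tsv)
#   compute_billing_for "$state"
#   az vm deallocate --resource-group vmlab-rg --name vmlab-01
```

The last commented line is the stop that actually ends compute billing; a shutdown from inside the guest leaves the VM in the stopped state.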

The rest of this page works through the decisions you make when you create one: which size, which disks, which availability arrangement, and whether you should be making single VMs at all or letting a scale set make them for you.

Reading the size name

Azure VM sizes look like line noise until you learn the grammar, and then they read like a spec sheet. The pattern is: family letter, optional sub-family, vCPU count, optional constrained-vCPU count, additive feature letters, optional accelerator type, and a version. Take the workhorse Standard_D4s_v5: the D says general-purpose family with roughly 4 GiB of memory per vCPU, the 4 says four vCPUs (so 16 GiB), the s says the VM can attach premium storage, and v5 is the hardware generation. Nothing in the name is decorative.

The size decoder. Family letter, vCPU count, feature letters, generation — every character means something.

The feature letters compose, and reading them in combination is where the skill pays off. Standard_D4ads_v5 is the same general-purpose shape on an AMD chip (a) with a local temp disk (d) and premium storage support (s). Standard_D4pls_v5 is ARM64 (p) with reduced memory (l, here 1:2 instead of 1:4) — meaningfully cheaper if your workload runs on ARM and does not need the RAM. Standard_E8bds_v5 is a memory-optimised size with boosted disk throughput (b), a temp disk, and premium storage. Constrained-vCPU sizes like Standard_M8-2ms exist for per-core licensed software: you pay for the memory and I/O of the eight-core machine but only two vCPUs are exposed, which keeps the database licence bill down.

Two letters deserve special attention. First, s: a size without it cannot attach Premium SSD disks at all, and almost everything you deploy in production should have it. Second, d: it means the VM has a local, physically-attached temp disk on the host. In generations up to v3 every size had one; from v4 onward the plain sizes (D4s_v4, D4s_v5) ship with no temp disk, and you opt back in with d (D4ds_v5). Scripts that assumed a scratch disk at /dev/sdb or D: broke quietly on the diskless sizes, which is worth knowing before the migration rather than after.

The version suffix is not a minor detail either. A v5 machine is a different CPU generation from a v3 at a similar or lower price, and new feature letters and disk capabilities tend to land in new versions first. When a size is unavailable in your region or zone, the adjacent version of the same shape is usually the first substitution to try.
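The grammar is regular enough to parse mechanically. A rough sketch in POSIX sh, assuming the pattern described above; the function name `decode_vm_size` is hypothetical, the family and sub-family letters are kept together in one field, and only the feature letters this page defines are annotated:

```shell
# Decode an Azure VM size name into its parts (POSIX sh, needs sed -E).
decode_vm_size() {
  # Capture groups: 1 family (+ sub-family), 2 vCPUs, 4 constrained vCPUs,
  # 5 feature letters, 7 accelerator, 9 version. Missing parts come back empty.
  parsed=$(printf '%s\n' "$1" | LC_ALL=C sed -nE \
    's/^Standard_([A-Z]+)([0-9]+)(-([0-9]+))?([a-z]*)(_([A-Z][A-Z0-9]*))?(_v([0-9]+))?$/\1|\2|\4|\5|\7|\9/p')
  if [ -z "$parsed" ]; then
    echo "unrecognised size name: $1" >&2
    return 1
  fi
  IFS='|' read -r family vcpus exposed features accel version <<EOF
$parsed
EOF
  echo "family=$family"
  echo "vcpus=$vcpus"
  if [ -n "$exposed" ]; then echo "exposed_vcpus=$exposed"; fi
  if [ -n "$accel" ]; then echo "accelerator=$accel"; fi
  if [ -n "$version" ]; then echo "version=$version"; fi
  # Walk the feature letters one character at a time.
  rest=$features
  while [ -n "$rest" ]; do
    letter=${rest%"${rest#?}"}
    rest=${rest#?}
    case "$letter" in
      a) meaning="AMD processor" ;;
      b) meaning="boosted disk throughput" ;;
      d) meaning="local temp disk" ;;
      l) meaning="reduced memory" ;;
      p) meaning="ARM64 processor" ;;
      s) meaning="premium storage capable" ;;
      *) meaning="not covered on this page" ;;
    esac
    echo "feature $letter: $meaning"
  done
}
```

Calling `decode_vm_size Standard_M8-2ms` reports eight vCPUs with two exposed and no version line; letters the page does not define are flagged rather than guessed.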

The families

The family letter is the first and biggest decision because it sets the ratio of CPU to memory and the character of the local hardware. Six families cover nearly everything:

Family	Shape	When you reach for it
B — burstable	Baseline CPU plus banked credits	Dev boxes, small services, anything mostly idle. Cheap, but throttled to baseline when credits run out.
D — general	~1 vCPU : 4 GiB	The default. Web servers, app tiers, small databases, almost everything that has no strong opinion.
E — memory	~1 vCPU : 8 GiB	Caches, in-memory stores, JVM heaps, relational databases that page too much on D.
F — compute	~1 vCPU : 2 GiB	CPU-bound work: batch encoding, gateways, build agents, game servers.
L — storage	Large local NVMe	Scylla, Cassandra, Elasticsearch hot tiers — anything that wants fast local disk and handles its own replication.
N — GPU	NVIDIA accelerators	NC for CUDA compute, ND for training at scale, NV for visualisation and virtual workstations.

The B family deserves a closer look because its failure mode is sneaky. A B2s accrues CPU credits while it runs below its baseline and spends them when it bursts above. Under sustained load the credit bank drains, and the VM is then pinned to a fraction of a core — which looks, from the outside, like the application suddenly got slow for no reason. Burstable sizes are excellent for the workloads they were designed for and a trap for anything with steady traffic. If you find yourself monitoring credit balance graphs to keep a production service alive, you wanted a D.

Beyond these six, the M family goes up to multiple terabytes of memory for SAP HANA-class workloads, and the H family covers RDMA-connected HPC. You will know if you need them. Within any family, the AMD (a) and ARM (p) variants are the easy cost lever: same shape, lower price, and for most Linux workloads the ARM jump is a recompile rather than a rewrite.

Managed disks

Every disk attached to a modern Azure VM is a managed disk: a top-level resource, replicated three times within its placement (locally redundant, or zone-redundant for some types), and snapshot-able independently of the VM. The catalogue has five tiers, and the differences are about performance model, not just speed:

Type	Performance model	Use
Standard HDD	Spinning disk, low IOPS	Backups, rarely-touched data. Not for OS disks you care about.
Standard SSD	SSD latency, modest IOPS tied to size	Light web servers, dev/test, anything tolerant of jitter.
Premium SSD	IOPS and throughput fixed per size tier (P10, P30…)	The production default. Required (or Ultra) for the 99.9% single-VM SLA.
Premium SSD v2	Capacity, IOPS, and throughput provisioned independently	Databases where the size-tier coupling wastes money. Data disks only, zonal only.
Ultra Disk	Dial IOPS and throughput live, sub-millisecond latency	The extreme end: top-tier OLTP, SAP. Fewer features, limited regions.

The thing to internalise about Premium SSD is that performance is welded to capacity. A P10 (128 GiB) gives you 500 IOPS; a P30 (1 TiB) gives you 5,000 IOPS and 200 MBps. Teams routinely provision a 1 TiB disk for 50 GiB of data purely to buy the IOPS — a known and slightly absurd idiom that Premium SSD v2 exists to fix, since v2 lets you buy 50 GiB of capacity and 5,000 IOPS separately. The catch with v2: it is zonal, it does not support host caching, and it cannot be an OS disk, so it is a data-disk play for databases rather than a general replacement.
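The over-provisioning idiom falls straight out of a lookup. A small sketch, assuming a partial tier table: the P10 and P30 rows are the figures quoted above, while the other rows are recalled from Azure's published table and should be re-checked before anyone relies on them. The function name `pick_premium_tier` is hypothetical.

```shell
# Pick the smallest Premium SSD tier that satisfies both a capacity need and an IOPS need.
# Table rows are "name GiB IOPS"; rows other than P10 and P30 are assumptions to verify.
# Usage: pick_premium_tier <needed_gib> <needed_iops>
pick_premium_tier() {
  need_gib=$1
  need_iops=$2
  while read -r tier gib iops; do
    if [ "$gib" -ge "$need_gib" ] && [ "$iops" -ge "$need_iops" ]; then
      echo "$tier ${gib}GiB ${iops}IOPS"
      return 0
    fi
  done <<EOF
P10 128 500
P15 256 1100
P20 512 2300
P30 1024 5000
P40 2048 7500
P50 4096 7500
EOF
  echo "no tier in this table fits" >&2
  return 1
}
```

Asking for 50 GiB and 5,000 IOPS lands on the 1 TiB P30, which is the absurd idiom in one line: the IOPS requirement, not the data, chose the disk.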

Bursting papers over some of the cliff. Premium SSDs of P20 and smaller burst on a credit system, up to 3,500 IOPS and 170 MBps for up to half an hour — enough to make boot storms and batch spikes invisible. P30 and larger can enable on-demand bursting for an extra fee. As with B-series VMs, bursting is a comfort for spiky loads and a mask over under-provisioning for steady ones; if a disk lives in its burst budget, it is the wrong tier.

Host caching is the other half of disk performance. Each disk attaches with a cache mode: ReadWrite (default for OS disks), ReadOnly (the right answer for read-heavy data disks), or None (write-heavy logs, anything where the cache only adds a hop). The cache lives on the host's local SSD, and here is the subtlety: VM sizes have separate cached and uncached throughput limits, and the VM-level limit can throttle you before any disk's own limit does. Stripe four P30s across a small VM and you will hit the VM's uncached IOPS ceiling long before the disks break a sweat. Sizing storage on Azure means reading two spec sheets — the disk's and the VM's — and EBS-trained intuition (the AWS page covers that side) transfers only partially, because EBS pushes the equivalent ceiling into a single per-instance throughput number.
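The two-spec-sheet rule reduces to a minimum. A toy calculation, assuming uncached I/O only and ignoring throughput, caching, and bursting; the function name `effective_iops` is hypothetical, and any VM cap you pass in should come from the size's own spec sheet:

```shell
# Effective IOPS of a striped volume: the sum of the disks, capped by the VM's own limit.
# Usage: effective_iops <vm_uncached_iops_cap> <disk_iops>...
effective_iops() {
  cap=$1
  shift
  total=0
  for disk in "$@"; do
    total=$((total + disk))
  done
  if [ "$total" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$total"
  fi
}
```

With four P30s at 5,000 IOPS each and an assumed VM cap of 3,750, `effective_iops 3750 5000 5000 5000 5000` returns the cap: the disks could deliver 20,000 and the VM will never let them.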

Ephemeral OS disks

There is one more option that changes the economics of stateless fleets: the ephemeral OS disk. Instead of backing the OS disk with remote managed storage, Azure carves it out of the host's local storage — the cache, the temp disk, or an NVMe placement depending on the size. It costs nothing, reads and writes at local-SSD latency, and reimages back to a clean state in seconds because there is no remote disk to rehydrate. The trade is total amnesia: deallocate the VM, or have the host fail, and the OS disk's contents are gone. For scale set instances, AKS node pools, build agents, and anything else that treats the OS disk as disposable, that is not a trade at all — it is free money. The only real constraints are that the image must fit in the chosen placement (cache sizes are small on small VMs) and that you give up stop-deallocate as a pause button.

The availability ladder

Azure publishes a different SLA for each way you arrange your VMs, and the arrangement is a real architectural decision, not a checkbox. The ladder has four rungs.

Rung one: a single VM. One VM, with all its disks on Premium SSD or Ultra, carries a 99.9% SLA, which is about 43 minutes of allowed downtime in a 30-day month. That premium-disk requirement is easy to miss: on Standard SSD the published figure drops to 99.5%, and on Standard HDD to 95%. Fine for dev, for batch jobs that can rerun, and for nothing customer-facing.

Rung two: an availability set. This is the older, single-datacenter model. Members of a set are spread across fault domains — racks with independent power and network, so one hardware failure takes out at most one slice of your VMs — and across update domains, the batches Azure walks through one at a time during planned host maintenance. Sets give you two or three fault domains and up to twenty update domains (five by default), and lift the SLA to 99.95%. What they cannot do is survive the building: every member sits in the same datacenter.

Rung three: availability zones. Zones are physically separate datacenters within a region — their own power, cooling, and network, close enough for single-digit millisecond latency between them. Pin one VM to zone 1 and its replica to zone 2 and no rack, no host, and no single building failure can take both. Two or more VMs spread across zones carry the 99.99% SLA. Zones replaced sets as the default answer; sets remain relevant mainly in the shrinking list of regions without zones and in legacy estates.

Fault and update domains separate racks and maintenance batches within one building. Zones separate the buildings.

Rung four: a scale set spread across zones. Once you have more than a couple of identical VMs, hand-placing them stops scaling. A Virtual Machine Scale Set distributes instances across zones (and fault domains within each zone) for you, replaces failed ones, and adds elasticity on top. For any fleet — web tiers, workers, node pools — this is where you should land, and it gets its own section next. The honest decision rule for the whole ladder: single VM for the disposable, zones for the stateful pair, a zonal VMSS for the fleet, and availability sets only where zones do not exist.
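The SLA percentages on the ladder are easier to compare as minutes. A one-line conversion, assuming a 30-day month; the function name `sla_downtime_minutes` is hypothetical:

```shell
# Allowed downtime in minutes for an SLA percentage over a 30-day month.
# LC_ALL=C keeps the decimal point a dot regardless of the caller's locale.
sla_downtime_minutes() {
  LC_ALL=C awk -v sla="$1" 'BEGIN { printf "%.2f\n", (100 - sla) / 100 * 30 * 24 * 60 }'
}
```

Run against 99.9, 99.95, and 99.99, it gives roughly 43, 22, and 4 minutes a month, which is the real size of the step between rungs.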

Scale sets

A Virtual Machine Scale Set is a template plus a desired count plus the machinery to keep reality matching it. It comes in two orchestration modes, and the choice is one of the first flags you set. Uniform is the original model: every instance is stamped from the scale set's model, instances are managed through the set rather than as individual VM resources, and you get a few uniform-only conveniences such as overprovisioning, where Azure briefly spins up extra instances during scale-out and deletes the slowest, at no charge, so the requested count arrives faster. Flexible is the newer mode: each instance is a full VM resource you can inspect and manage directly, and the set can mix things uniform cannot — different sizes, Spot and regular instances side by side in one pool. Flexible is the recommended default for new work; uniform still earns its keep for very large homogeneous fleets and for features that have not yet crossed over.

Autoscale is a separate Azure Monitor resource pointed at the scale set, made of profiles and rules. A rule pairs a condition with an action: average CPU above 70% over a ten-minute window, add two instances; below 30%, remove one. Three details separate a calm autoscaler from a flapping one. Give scale-in a wider margin than scale-out (the gap between 70 and 30 is hysteresis, and shrinking it invites oscillation). Respect the cooldown, which blocks repeat actions while the previous one settles. And remember the window is an average — a ten-minute average ignores a two-minute spike, which is usually what you want and occasionally exactly what you do not. Schedule-based profiles layer on top for predictable cycles: more minimum capacity during business hours, less at night. A scale-in policy decides which instance dies first — default (zone-balanced), newest, or oldest.

Scaling out: the autoscale rule fires, the set creates instances across zones, the load balancer picks them up via health probes.
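The rule pair described above (add two above 70% average CPU, remove one below 30%) can be modelled as a single evaluation step. This is a simplified sketch, not how Azure Monitor implements it: it ignores the cooldown and the averaging window, takes an integer CPU percentage, and the function name `next_instance_count` is hypothetical.

```shell
# One evaluation of the two rules: +2 above 70% average CPU, -1 below 30%,
# clamped to the profile's min and max. Between the thresholds nothing happens.
# Usage: next_instance_count <avg_cpu_percent> <current> <min> <max>
next_instance_count() {
  cpu=$1
  current=$2
  min=$3
  max=$4
  next=$current
  if [ "$cpu" -gt 70 ]; then
    next=$((current + 2))
  elif [ "$cpu" -lt 30 ]; then
    next=$((current - 1))
  fi
  if [ "$next" -gt "$max" ]; then next=$max; fi
  if [ "$next" -lt "$min" ]; then next=$min; fi
  echo "$next"
}
```

The band between 30 and 70 where the count does not move is the hysteresis; narrow it and consecutive evaluations start undoing each other.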

Upgrades are the other thing a scale set automates. When you change the model — a new image version, a new extension — the upgrade policy decides what happens to existing instances: manual (nothing, until you trigger it), automatic (all at once, which is honest about being an outage), or rolling, which walks the fleet in batches of a configurable percentage, checks health between batches through a load balancer probe or the application health extension, and halts if a batch comes up sick. Paired with automatic OS image upgrades, a rolling policy gives you patched fleets with no human in the loop — provided your health signal actually reflects application health, because the rolling upgrade trusts it completely.

Spot VMs

Spot is Azure selling its spare capacity at a steep discount — frequently 60 to 90% off — with the right to take it back. Eviction happens for two reasons: the capacity is needed for full-price customers, or the floating Spot price rises above the maximum you set. Set your max price to -1 and you opt out of price evictions entirely: you pay up to the regular pay-as-you-go rate and are evicted only for capacity. For most uses that is the sensible setting, since the discount is what the market gives you anyway.

What happens at eviction is your second policy choice. Deallocate stops the VM but keeps its disks; you can try to restart it later, and you keep paying for the storage meanwhile. Delete removes the whole thing, disks included, which is usually what a fleet wants. Either way you get a warning through Scheduled Events — about thirty seconds, enough to drain a worker, checkpoint progress, and deregister from a queue, and not enough for anything heroic. There is no SLA, B-series sizes are excluded, and Spot capacity has its own separate quota. The patterns that work: batch and CI fleets, stateless workers behind a queue, and — the neat trick flexible orchestration allows — a single scale set running a guaranteed floor of regular instances with Spot instances layered on top for the cheap elastic middle.

Images and the Compute Gallery

Marketplace images get you started, but any serious estate ends up baking its own — hardened base, agents installed, dependencies pre-pulled, so instances boot ready instead of spending ten minutes configuring themselves. The Azure Compute Gallery is where those images live. The structure has three levels: the gallery, an image definition (the identity and metadata — OS, generation, whether it is generalized or specialized), and numbered image versions underneath it, which is what VMs and scale sets actually reference. Generalized images have been stripped of machine identity (waagent -deprovision on Linux, sysprep on Windows) and are re-provisioned at deploy time; specialized images are byte-for-byte clones, identity and all, useful for lift-and-shift and quick clones but not for fleets.

Two gallery features matter operationally. Versions replicate to the regions you list, with a replica count per region — and that count is a throughput knob, not redundancy trivia: each replica supports a limited number of simultaneous deployments (a common rule of thumb is one replica per twenty concurrent VM creations), so a scale-out of two hundred instances against a single replica will crawl. And sharing has options beyond RBAC — direct sharing to other subscriptions and tenants, or community galleries for public images. Build the versions with Packer or the Azure Image Builder service and you have a proper image pipeline: source image in, scripted customisation, versioned artifact out, replicated to wherever the fleets are.
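The replica rule of thumb is a ceiling division. A small sketch, assuming the one-per-twenty figure quoted above as the default; the function name `replicas_needed` is hypothetical:

```shell
# Replicas to request per region for a burst of concurrent VM creations,
# using the one-replica-per-twenty rule of thumb (rounded up, never below one).
# Usage: replicas_needed <concurrent_creations> [creations_per_replica]
replicas_needed() {
  concurrent=$1
  per_replica=${2:-20}
  n=$(( (concurrent + per_replica - 1) / per_replica ))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}
```

For the two-hundred-instance scale-out this gives ten replicas in that region. The result is the kind of number you would pass as the replica count when publishing an image version.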

Extensions, cloud-init, and the metadata endpoint

There are two distinct ways to run configuration inside a VM, and they fire at different times. cloud-init runs once, at first boot, during provisioning — you pass a YAML or shell payload via --custom-data at create time, and the supported Linux images execute it before the VM reports ready. It is the right tool for first-boot identity: packages, users, mounts, joining the cluster. VM extensions run through the Azure guest agent and can be pushed at any point in the VM's life from outside it — the Custom Script extension to run arbitrary commands, monitoring agents, domain join, disk encryption. Extensions are also how a scale set rolls configuration to existing instances, and az vm run-command rides the same channel for ad-hoc operations on a machine you cannot SSH into. A reasonable division of labour: cloud-init for what an instance is, extensions for what the platform does to it later.

Inside every VM there is also a service worth knowing cold: the Instance Metadata Service, at the link-local address 169.254.169.254. It answers only from inside the VM, never over the network, and requires a Metadata: true header so that a naive SSRF hole cannot trivially read it. From it a process can learn the VM's size, zone, tags, and network layout; fetch tokens for the VM's managed identity (this is how code on a VM calls Azure APIs with no stored credentials); request signed attested data proving it really runs in Azure; and poll Scheduled Events, the endpoint that announces upcoming reboots, redeploys, and — most usefully — Spot evictions, with enough notice to drain gracefully. Any workload that cares about its own lifecycle should be watching that endpoint.
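Watching that endpoint can be a very small loop. A sketch under stated assumptions: the `scheduledevents` path, the `api-version` value, and the `Preempt` event type are what I understand the service to use and should be verified against current IMDS documentation; `drain_worker` is a hypothetical hook you would supply; and the JSON check is a plain text match rather than a real parser.

```shell
# Poll the Scheduled Events endpoint and react to a Spot eviction notice.
# URL details and the "Preempt" event type are assumptions to verify.
EVENTS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

# Succeeds if the JSON on stdin contains an event of type Preempt.
has_preempt_event() {
  grep -Eq '"EventType"[[:space:]]*:[[:space:]]*"Preempt"'
}

# Loop until an eviction notice appears; only meaningful from inside an Azure VM.
watch_for_eviction() {
  while true; do
    if curl -s -H "Metadata: true" "$EVENTS_URL" | has_preempt_event; then
      echo "eviction notice received, draining"
      drain_worker   # hypothetical: stop taking work, checkpoint, deregister
      return 0
    fi
    sleep 1
  done
}
```

Polling once a second is deliberate: with a notice window of around thirty seconds, a slow poll spends most of the warning before anyone has seen it.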

Quotas, the quiet capacity limit

Every subscription carries vCPU quotas per region: a total regional cap plus a separate cap for each VM family — so you can have headroom in Dv5 and be flat out of NC at the same time, in the same region. Spot capacity has its own independent quota. New subscriptions start small, sometimes ten or twenty vCPUs, which is exactly enough to make your first scale-out test fail in a confusing way. Check before you need to:

az vm list-usage --location westeurope --output table

Quota increases are a support request — usually quick, not always, and never instant at 2 a.m. during an incident. Plan them like capacity, ahead of demand: if the autoscaler's max count times vCPUs per instance exceeds the family quota, the autoscaler will hit a wall that no amount of configuration fixes. And note that quota is permission, not a reservation — a zone can be short of a specific size even when your quota has room, which surfaces as an allocation failure. For capacity you cannot live without, on-demand capacity reservations are the actual guarantee.
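The autoscaler check in that paragraph is plain arithmetic and worth scripting into a pre-flight. A minimal sketch; the function name `fits_quota` is hypothetical, and the limit and current-usage figures are whatever the usage listing shows for the family in question:

```shell
# Does the autoscaler's worst case fit inside the family quota?
# Usage: fits_quota <max_instances> <vcpus_per_instance> <quota_limit> <current_usage>
fits_quota() {
  needed=$(( $1 * $2 ))
  free=$(( $3 - $4 ))
  if [ "$needed" -le "$free" ]; then
    echo "ok: need $needed vCPUs, $free free"
  else
    echo "short: need $needed vCPUs, only $free free"
    return 1
  fi
}
```

A ceiling of six instances of a two-vCPU size needs twelve vCPUs; on a fresh subscription with a ten-vCPU limit, `fits_quota 6 2 10 2` fails before the autoscaler ever gets the chance to.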

CLI lab — a zonal VM, a snapshot, and an autoscaling scale set

Everything above, condensed into one runnable session. You need the az CLI and a subscription you are allowed to spend a little money in; the whole lab costs pennies if you tear it down at the end, and the teardown is one command because everything lives in one resource group.

1 · Create the resource group.

az group create --name vmlab-rg --location westeurope

2 · Create a zonal VM with a Premium SSD OS disk. Note the three decisions from this page in the flags: an s-capable size, Premium_LRS for the OS disk (which is what earns the 99.9% single-VM SLA), and a zone pin.

az vm create \
  --resource-group vmlab-rg \
  --name vmlab-01 \
  --image Ubuntu2204 \
  --size Standard_D2s_v5 \
  --zone 1 \
  --storage-sku Premium_LRS \
  --admin-username azureuser \
  --generate-ssh-keys

3 · Attach a data disk. A new 64 GiB Premium SSD, created and attached in one step. It lands in zone 1 automatically, because a zonal VM can only attach zonal disks from its own zone — disks do not cross zones.

az vm disk attach \
  --resource-group vmlab-rg \
  --vm-name vmlab-01 \
  --name vmlab-data01 \
  --new --size-gb 64 --sku Premium_LRS

4 · Snapshot the OS disk. Grab the disk's resource ID, then take an incremental snapshot — incremental snapshots bill only for changed data and are what backup tooling builds on.

OSDISK_ID=$(az vm show --resource-group vmlab-rg --name vmlab-01 \
  --query storageProfile.osDisk.managedDisk.id --output tsv)

az snapshot create \
  --resource-group vmlab-rg \
  --name vmlab-os-snap \
  --source "$OSDISK_ID" \
  --incremental true

5 · Build a two-instance flexible scale set across zones. Flexible orchestration, two instances, spread over zones 1 and 2. The CLI creates the load balancer plumbing for you.

az vmss create \
  --resource-group vmlab-rg \
  --name vmlab-ss \
  --image Ubuntu2204 \
  --vm-sku Standard_D2s_v5 \
  --orchestration-mode Flexible \
  --instance-count 2 \
  --zones 1 2 \
  --admin-username azureuser \
  --generate-ssh-keys

6 · Add autoscale. A profile with a floor of two and a ceiling of six, then the two rules — note the deliberate hysteresis gap between the scale-out threshold and the scale-in one, and the asymmetric step sizes.

az monitor autoscale create \
  --resource-group vmlab-rg \
  --resource vmlab-ss \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name vmlab-autoscale \
  --min-count 2 --max-count 6 --count 2

az monitor autoscale rule create \
  --resource-group vmlab-rg \
  --autoscale-name vmlab-autoscale \
  --condition "Percentage CPU > 70 avg 10m" \
  --scale out 2

az monitor autoscale rule create \
  --resource-group vmlab-rg \
  --autoscale-name vmlab-autoscale \
  --condition "Percentage CPU < 30 avg 10m" \
  --scale in 1

7 · Poke around. List the scale set's instances (in flexible mode they are real VM resources, which you can verify by listing VMs in the group), and if you SSH into vmlab-01, query IMDS from inside:

az vm list --resource-group vmlab-rg --output table

# from inside a VM:
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | python3 -m json.tool

8 · Tear it down. One group, one delete. This removes the VM, both disks, the snapshot, the scale set, the autoscale settings, and all the network plumbing.

az group delete --name vmlab-rg --yes --no-wait

Leftover check. The group delete catches everything here because the lab kept everything in one group. In real estates, deleted VMs routinely orphan disks and public IPs in other groups — az disk list --query "[?diskState=='Unattached']" --output table is a cheap habit that pays for itself.

