Compute Engine

Compute Engine is GCP's virtual machine service, and it is where Google's habits as an infrastructure operator show most clearly. You can dial in the exact number of vCPUs and the exact amount of memory instead of picking from a menu. Your VM survives host maintenance without a reboot because Google moves it to another machine while it runs. Discounts apply themselves. This page covers the machine families, disks, images, instance groups, and pricing mechanics an engineer actually needs, and ends with a lab you can run with nothing but gcloud.

What a Compute Engine instance actually is

An instance is a KVM-based virtual machine running on one physical host in one zone, attached to a network interface in a VPC and to one or more persistent disks. Each of those pieces has its own lifecycle, which is the first thing worth internalising: the VM, its disks, and its IP addresses are separate resources that happen to be connected, not one bundle. You can delete the VM and keep the disk, move a disk between VMs, or promote an ephemeral IP to a static one that outlives everything. The scoping rules from the foundations page apply directly here: instances and disks are zonal resources, instance templates and snapshots are global, and a few things like static external IPs are regional.

A VM is defined by a handful of choices. The machine type sets vCPU and memory. The boot disk comes from an image. The network interface lands in a subnet of some VPC, with firewall rules deciding what can reach it. A service account decides what the VM itself is allowed to call. Metadata carries configuration, including the startup script that runs on first boot. Everything else, from GPUs to confidential computing, hangs off those basics. If you know EC2, the mapping is mostly one-to-one, and the EC2, EBS, and AMI page makes a good side-by-side read. The differences that matter are the ones this page spends its time on: custom shapes, live migration, and pricing that does not require a spreadsheet.

Machine families: picking the silicon

Machine types are grouped into families, and the family letter tells you what the hardware is tuned for. The naming is consistent: a family letter, a generation number, then the type and size, so n2-standard-8 is a second-generation general-purpose machine with 8 vCPUs. On most families a vCPU is one hardware thread, not one physical core; the exceptions are the few series that ship with simultaneous multithreading disabled, where a vCPU maps to a full core.

Family	What it is	Reach for it when
`E2`	Cost-optimised general purpose; runs on whatever CPU platform Google has spare, with some oversubscription	dev environments, small services, anything where price beats consistency
`N2 / N2D / N4`	Balanced general purpose on Intel (N2, N4) or AMD (N2D); the workhorse family, supports custom shapes	most production services, databases of ordinary size
`C3 / C4`	Compute-optimised, latest CPUs, highest sustained per-core performance, no SMT sharing surprises	game servers, scientific compute, latency-sensitive single-threaded work
`M1–M4`	Memory-optimised, up to many terabytes of RAM	SAP HANA, very large in-memory databases and caches
`A2 / A3 / G2`	Accelerator-optimised with attached NVIDIA GPUs (A100, H100, L4)	ML training and inference, video transcoding

Two practical notes. First, E2 is cheap but you give up some predictability: Google schedules E2 VMs across mixed CPU platforms and reserves the right to oversubscribe slightly, so do not benchmark on E2 and assume the numbers carry over. Second, family availability varies by zone. The newest compute-optimised and accelerator families exist only in a subset of zones, so check gcloud compute machine-types list for your zone before you design a deployment around a specific family. This is the same trap as AWS instance type availability, and it bites at exactly the same moment, usually mid-migration.
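A minimal sketch of that availability check follows. The helper name family_in_zone is hypothetical; it only wraps gcloud compute machine-types list with a zone filter and a name-prefix filter, and it assumes gcloud is installed and authenticated.

```shell
# List the machine types of one family that a given zone actually offers.
# family_in_zone is a hypothetical helper name, not a gcloud command.
# usage: family_in_zone FAMILY ZONE     e.g. family_in_zone c3 us-central1-a
family_in_zone() {
  family=$1
  zone=$2
  gcloud compute machine-types list \
    --filter="zone:( ${zone} ) AND name~^${family}-" \
    --format="value(name)"
}
```

An empty result for your target zone is the signal to pick another zone or another family before the design hardens.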

Custom machine types: pay for the shape you need

Here is the first real differentiator. On AWS you pick from a catalogue of fixed shapes, and the catalogue marches in powers of two: 2 vCPUs with 8 GB, 4 with 16, 8 with 32. If your service needs 6 vCPUs and 14 GB, you buy the 8/32 machine and pay for capacity you never touch. Compute Engine lets you specify the exact shape instead. On the N and E families you choose any even number of vCPUs (plus 1 as a special case) and any memory amount between 0.5 GB and 8 GB per vCPU, in 256 MB steps. The resulting machine type is named for what you picked: n2-custom-6-14336 is 6 vCPUs and 14336 MB of RAM, exactly what the workload wanted.

Fixed shapes force rounding up; a custom machine type bills per vCPU and per GB for exactly what you configure.

Billing follows the shape: custom types are priced per vCPU and per GB of memory, at a small premium (around 5 percent) over the equivalent predefined type. The premium is usually far smaller than the waste it eliminates. A fleet of memory-light web servers that would each waste 18 GB on a fixed 8/32 shape recovers that cost many times over. If you need more than 8 GB per vCPU there is an extended memory flag that lifts the ceiling for another premium, which is occasionally the difference between fitting a JVM heap and buying a whole extra size class.
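The shape rules above are easy to encode. The sketch below checks a requested shape against the limits quoted in this section (1 or an even number of vCPUs, 0.5 to 8 GB per vCPU, 256 MB steps) and prints the machine type name. custom_type_name is a hypothetical helper, the limits are the ones stated here and not read from the API, and extended memory is ignored.

```shell
# Validate a custom shape and print the derived machine type name.
# custom_type_name is a hypothetical helper, not part of gcloud.
# usage: custom_type_name FAMILY VCPUS MEMORY_MB
custom_type_name() {
  family=$1
  vcpus=$2
  mem_mb=$3

  # vCPU count must be 1 or an even number.
  if [ "$vcpus" -ne 1 ] && [ $(( vcpus % 2 )) -ne 0 ]; then
    echo "vCPUs must be 1 or an even number" >&2
    return 1
  fi
  # Memory comes in 256 MB steps.
  if [ $(( mem_mb % 256 )) -ne 0 ]; then
    echo "memory must be a multiple of 256 MB" >&2
    return 1
  fi
  # Memory per vCPU must sit between 0.5 GB (512 MB) and 8 GB (8192 MB).
  if [ "$mem_mb" -lt $(( vcpus * 512 )) ] || [ "$mem_mb" -gt $(( vcpus * 8192 )) ]; then
    echo "memory must be 0.5 to 8 GB per vCPU" >&2
    return 1
  fi
  echo "${family}-custom-${vcpus}-${mem_mb}"
}
```

Calling custom_type_name n2 6 14336 prints n2-custom-6-14336, while an odd count such as 5 vCPUs or a 14000 MB request is rejected.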

When should you bother? Rightsizing established workloads is the obvious case: run the predefined shape for a few weeks, read the rightsizing recommendations the console produces for you, then cut a custom type that matches reality. The less obvious case is migration, where a machine from on-prem or another cloud has an awkward shape and you want to reproduce it instead of re-tuning the application. Either way, treat custom types as a tuning tool rather than a default. Predefined shapes are simpler to reason about, easier to cover with committed-use discounts, and what most documentation and benchmarks assume.

Persistent disks live longer than your VM

Persistent disks are network-attached block storage, GCP's answer to EBS, and the network part matters: a persistent disk is not a physical drive in the host but a slice of a replicated storage service that the hypervisor presents as a block device. Detach it from one VM, attach it to another, and the data follows. Delete the VM and the disk persists unless you asked otherwise, which is why the boot disk's auto-delete flag deserves a glance before you delete anything you might want back.

Disks come in performance tiers. pd-standard is backed by spinning disks and is only sensible for cold data and batch scratch space. pd-balanced is the SSD default for almost everything, with a good price for the IOPS it gives. pd-ssd raises the IOPS and throughput ceilings for databases that need them, and pd-extreme lets you provision IOPS explicitly for the rare workload that needs six figures of them. Performance scales with disk size and with the VM's vCPU count, a detail that surprises people: a tiny VM cannot drive a big disk at full speed, so disk benchmarks need to be run from the machine shape you will actually use. There is also a newer Hyperdisk line that decouples size from provisioned performance entirely, and local SSDs that are physically attached NVMe, blisteringly fast and gone forever when the VM stops.

Two features here are quietly excellent. The first is live resizing: you can grow a disk while the VM is running and using it, with no detach and no downtime, then grow the filesystem with resize2fs or its xfs equivalent. Shrinking is not supported, so grow in modest steps. The second is regional persistent disks, which synchronously replicate every write to two zones in a region. If the zone holding your VM fails, you force-attach the disk to a replacement VM in the other zone and carry on with zero data loss. It is a one-flag high-availability story for stateful workloads that would otherwise need application-level replication.
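The in-guest half of a live resize is one command, and which one depends on the filesystem. The sketch below only prints the command to run; grow_fs_command is a hypothetical helper, and the device and mount point you pass are assumptions about your VM. It also assumes the filesystem sits directly on the disk with no partition table; a partitioned disk needs its partition grown first.

```shell
# Print the in-guest command that grows a filesystem to fill a resized disk.
# grow_fs_command is a hypothetical helper; it prints the command, it does not run it.
# usage: grow_fs_command FSTYPE DEVICE MOUNTPOINT
grow_fs_command() {
  case "$1" in
    ext4) echo "sudo resize2fs $2" ;;     # ext4 is grown by naming the device
    xfs)  echo "sudo xfs_growfs $3" ;;    # XFS is grown by naming the mount point
    *)    echo "unsupported filesystem: $1" >&2; return 1 ;;
  esac
}
```

For example, grow_fs_command xfs /dev/sdb /mnt/data prints sudo xfs_growfs /mnt/data.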

Snapshots round out the story. A snapshot is a point-in-time copy of a disk, and two properties make them more useful than you might expect. They are incremental: the first snapshot copies everything, and each one after stores only the blocks that changed, while deletion quietly rewrites references so you can delete any snapshot in the chain without breaking the others. And they are global resources: a snapshot taken from a zonal disk in Tokyo can seed a new disk in Belgium, which makes snapshots the simplest tool for moving data between regions, cloning environments, and seeding test databases from production. Schedule them with a resource policy rather than cron; the platform will handle retention.
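A snapshot schedule is two commands: create the resource policy, then attach it to the disk. The wrapper below is a sketch under stated assumptions; schedule_daily_snapshots is a hypothetical name, and the 03:00 UTC start time is an arbitrary choice.

```shell
# Create a daily snapshot schedule and attach it to an existing zonal disk.
# schedule_daily_snapshots is a hypothetical wrapper around two gcloud commands.
# usage: schedule_daily_snapshots POLICY DISK ZONE REGION RETENTION_DAYS
schedule_daily_snapshots() {
  policy=$1
  disk=$2
  zone=$3
  region=$4
  keep_days=$5

  # The policy is regional and must live in the same region as the disk's zone.
  gcloud compute resource-policies create snapshot-schedule "$policy" \
    --region="$region" \
    --daily-schedule \
    --start-time=03:00 \
    --max-retention-days="$keep_days" &&
  # Attaching the policy is what actually starts the schedule for this disk.
  gcloud compute disks add-resource-policies "$disk" \
    --zone="$zone" \
    --resource-policies="$policy"
}
```

For the lab disk later on this page, that would be schedule_daily_snapshots lab-sched lab-data us-central1-a us-central1 14.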

Live migration: the maintenance you never see

This is the flagship. Physical hosts need maintenance: kernel and hypervisor upgrades, firmware patches, failing hardware swaps. On most clouds that means your VM gets a scheduled maintenance notice and a reboot, and you build your operations around tolerating it. Compute Engine instead moves the running VM to a different host while it keeps serving traffic. No reboot, no dropped TCP connections, no changed IP. The guest OS does not participate and mostly cannot tell, beyond a brief performance dip. It is the default behaviour for nearly all machine types, and Google performs these migrations constantly across the fleet without anyone filing a ticket.

The mechanism is a staged memory copy. Once a target host is picked, the source enters a brownout phase: the VM keeps running while its memory pages are copied across the network, and the hypervisor tracks which pages the guest dirties during the copy, much like the dirty tracking described in the virtual memory page. Dirtied pages are re-sent in repeated passes, each pass smaller than the last as the set of changing pages shrinks. When the remainder is small enough, the VM is paused for a blackout typically well under a second, the final dirty pages, CPU state, and device state move over, and execution resumes on the target. The network fabric redirects the VM's addresses to the new host, and a short post-migration brownout on the target covers the remaining cleanup while the source forwards any stragglers.
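The shrinking passes are easy to see with a toy model. This is not Google's implementation, only an illustration of iterative pre-copy under simple assumptions: pages are copied at a constant rate, the guest dirties pages at a constant, slower rate, and the blackout starts once the remainder is at or below a threshold. All names and numbers are hypothetical.

```shell
# Toy model of iterative pre-copy. Each pass copies the pages still outstanding;
# while it runs, the guest dirties a fraction of them (dirty_rate / copy_rate),
# and that fraction becomes the work for the next pass.
# usage: precopy_passes PAGES COPY_RATE DIRTY_RATE BLACKOUT_THRESHOLD
# prints: "<passes> <pages left for the blackout>"
precopy_passes() {
  remaining=$1
  copy_rate=$2
  dirty_rate=$3
  threshold=$4

  # If the guest dirties memory as fast as it can be copied, passes never shrink.
  if [ "$dirty_rate" -ge "$copy_rate" ]; then
    echo "dirty rate must be below copy rate to converge" >&2
    return 1
  fi

  passes=0
  while [ "$remaining" -gt "$threshold" ]; do
    remaining=$(( remaining * dirty_rate / copy_rate ))
    passes=$(( passes + 1 ))
  done
  echo "$passes $remaining"
}
```

With 1048576 pages, a copy rate four times the dirty rate, and a 1024-page threshold, precopy_passes 1048576 4 1 1024 prints 5 1024: five brownout passes, then a blackout that only has to move 1024 pages.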

Live migration in one picture: repeated memory pre-copy passes while the VM serves, then a sub-second blackout at cutover.

What this buys you operationally is hard to overstate. The maintenance-event runbook most teams maintain on other clouds, the one about draining instances ahead of scheduled reboots, largely does not exist on GCP. The exceptions prove the rule: VMs with GPUs attached cannot live-migrate because device state cannot be moved, so they get a terminate-and-restart policy instead, and Spot VMs are simply preempted rather than moved. You choose the behaviour with the --maintenance-policy flag, where MIGRATE is the default and TERMINATE is for the cases above. For latency-critical systems it is still worth knowing migrations happen, because the brownout phases steal some memory bandwidth and the blackout is a real pause; if you run something like a low-latency trading workload you will notice the blip even though your uptime counter does not.

Images and image families

An image is a bootable disk template, the thing your boot disk is stamped from. Google maintains public images for the usual operating systems, and you build custom images on top of them: boot a VM from a public image, install your runtime and agents, then create an image from its disk. Like snapshots, images are global resources, so one golden image serves every region, and you can build new images from existing disks, from snapshots, or from other images.

The detail worth stealing for your own pipelines is the image family. A family is a named pointer to the newest non-deprecated image in a series. Instance templates and scripts reference the family, for example debian-12 in project debian-cloud, or your own api-server family, and automatically pick up the newest image at creation time. Roll out a bad build and you mark that image deprecated, at which point the family pointer falls back to the previous good one. That gives you versioned, rollback-friendly golden images with no extra machinery, which is exactly what you want feeding the instance templates in the next section. Bake what is slow and common into the image, leave what is fast and environment-specific to startup scripts, and your instances boot in seconds instead of minutes.
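The publish, roll back, and resolve steps map onto three gcloud commands. The wrappers below are a sketch; the function names are hypothetical, and the image, family, and disk names you pass are your own.

```shell
# Publish a new image into a family from an existing disk.
# usage: publish_image IMAGE FAMILY SOURCE_DISK ZONE
publish_image() {
  gcloud compute images create "$1" \
    --family="$2" --source-disk="$3" --source-disk-zone="$4"
}

# Roll back: deprecate the bad image so the family resolves to the previous one.
# usage: rollback_image IMAGE
rollback_image() {
  gcloud compute images deprecate "$1" --state=DEPRECATED
}

# Show which image the family points at right now.
# usage: current_image FAMILY
current_image() {
  gcloud compute images describe-from-family "$1" --format="value(name)"
}
```

Running current_image api-server before and after rollback_image is a quick way to confirm the pointer moved.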

Managed instance groups: the fleet abstraction

A single VM is a pet. A managed instance group, universally shortened to MIG, is how Compute Engine does cattle: you write an instance template (machine type, image or image family, disks, network, metadata, the full VM spec, immutable once created), tell the MIG how many copies you want, and the control plane keeps reality matched to that number. Delete an instance and the MIG replaces it. The MIG is also the unit that load balancers target, the unit autoscaling operates on, and the unit rolling updates flow through, which makes it the centre of gravity for any serious Compute Engine deployment.

Three behaviours earn their keep. Autoscaling adds and removes instances against a target signal: average CPU load, load balancer serving capacity, a Cloud Monitoring metric, or a schedule. You set a minimum, a maximum, and a target, and a scale-in control can slow down how aggressively it removes capacity after a spike. Autohealing goes beyond replacing deleted VMs: you attach an application health check, and if an instance fails it some number of consecutive times, the MIG deletes and recreates it from the template. This catches the wedged-but-running states a liveness signal exists to catch, such as a deadlocked process or a full disk, that plain VM-level monitoring misses. The initial delay setting matters here; set it longer than your slowest cold boot or the MIG will shoot instances that were merely still starting. Rolling updates swap the template under a running group, replacing instances in controlled waves with knobs for surge and unavailability, plus canary support so a new template can take a fraction of the group while you watch the dashboards.

A regional MIG spreads the fleet across zones; autohealing recreates the instance that keeps failing its health check.

MIGs come in zonal and regional flavours, and for production you almost always want regional. A regional MIG spreads instances evenly across zones in a region (three by default, selectable), so a zone outage costs you a third of capacity rather than all of it. Pair that with the over-provisioning rule of thumb: if you need N instances to carry peak load, run enough that losing one zone still leaves N. The combination of a regional MIG, an autohealing health check, and a load balancer in front is the standard pattern, and it is worth noting how little configuration it takes compared to assembling the same thing from an AWS auto scaling group, lifecycle hooks, and target group health checks. Stateful MIGs exist too, preserving instance names and disks across recreation, but if you find yourself reaching for one, first ask whether the workload belongs in a database or on GKE.
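The over-provisioning rule is a one-line ceiling division. The sketch below assumes instances are spread evenly and that you want to survive the loss of exactly one zone; regional_mig_size is a hypothetical helper.

```shell
# Size a regional MIG so that losing one zone still leaves NEEDED instances.
# usage: regional_mig_size NEEDED ZONES
regional_mig_size() {
  needed=$1
  zones=$2
  if [ "$zones" -lt 2 ]; then
    echo "need at least 2 zones to survive a zone loss" >&2
    return 1
  fi
  survivors=$(( zones - 1 ))
  # Each surviving zone must hold ceil(needed / survivors) instances.
  per_zone=$(( (needed + survivors - 1) / survivors ))
  echo $(( per_zone * zones ))
}
```

For a service that needs 6 instances at peak across three zones, regional_mig_size 6 3 prints 9; a need of 7 rounds up to 12.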

Spot VMs: deep discounts, 30 seconds notice

Spot VMs are spare capacity sold at a 60 to 91 percent discount, with the catch that Compute Engine can take them back whenever it needs the capacity. When that happens the VM gets a preemption notice through the metadata server and the ACPI shutdown signal, and has roughly 30 seconds to checkpoint and exit cleanly. Spot replaced the older preemptible VMs, and the differences are friendly: preemptible instances had a hard 24-hour maximum runtime and a fixed discount, while Spot VMs have no maximum runtime and dynamic pricing that in practice moves slowly, with a ceiling guaranteeing it never exceeds the on-demand price. Unlike the AWS spot market there is no bidding; you simply ask for Spot provisioning and pay the current rate.

The fit is any workload that tolerates interruption: batch processing, CI runners, rendering, fault-tolerant data pipelines, and stateless services where a MIG quietly replaces preempted instances. The trap is anything that holds unreplicated state or takes longer to checkpoint than the notice window. A pattern that works well is a mixed fleet: a baseline of standard VMs sized for the minimum acceptable capacity, plus a Spot MIG that adds cheap capacity when it is available. Write the preemption handler first, test it by running gcloud compute instances simulate-maintenance-event, and treat any Spot instance as something that might vanish mid-request, because eventually one will.
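A preemption handler usually lives in the shutdown script and starts by asking the metadata server whether this shutdown is a preemption. The sketch below assumes it runs inside the VM; checkpoint_work is a hypothetical placeholder for your own flush logic and must finish well inside the notice window.

```shell
# Placeholder for application-specific checkpoint logic (hypothetical).
checkpoint_work() {
  echo "checkpoint: replace with your own flush logic"
}

# Return success if the metadata server reports that this VM was preempted.
is_preempted() {
  state=$(curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted")
  [ "$state" = "TRUE" ]
}

# Body of a shutdown script: checkpoint only when the shutdown is a preemption.
on_shutdown() {
  if is_preempted; then
    echo "preempted: checkpointing"
    checkpoint_work
  else
    echo "normal shutdown"
  fi
}
```

In a real shutdown script the last line would simply call on_shutdown.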

Sole-tenant nodes: a host of your own

Sometimes the problem is not the VM but who else is on the physical machine. Sole-tenant nodes rent you an entire physical server, and only your project's VMs get scheduled onto it. The usual reasons are compliance regimes that demand physical isolation, noisy-neighbour elimination for the most latency-sensitive systems, and, most commonly in practice, bring-your-own-license software whose terms are written per physical core or socket. Windows Server and some Oracle licensing only pencil out when you can point at specific hardware, and sole tenancy plus CPU-overcommit controls and host affinity rules exists for exactly that audience. You still create normal VMs; they just land on your node group, and live migration can be configured to keep them within it. Expect to pay for the whole node whether you fill it or not, so pack it deliberately.

Discounts that apply themselves

GCP's compute pricing has one idea AWS still has not copied: the platform discounts you automatically. Sustained-use discounts kick in when a vCPU and memory footprint runs for more than a quarter of the month, scaling up to roughly 30 percent off on the older N1 series, and about 20 percent on N2-class series, for resources that run the entire month. You do nothing. There is no instance to tag, no commitment to sign, and the calculation even combines partial-month usage across different VMs into equivalent full-time footprints, so tearing down one VM and starting a similar one keeps accumulating the discount. It applies to a specific list of general-purpose, compute-optimised, and memory-optimised series rather than to every family (notably not E2, whose pricing is already lowered to compensate), so check that your series qualifies before you count on it.
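The arithmetic behind the 30 percent figure is worth seeing once. The sketch below assumes the four-tier schedule Google has published for the N1 series, where each successive quarter of the month is billed at 100, 80, 60, and 40 percent of the base rate; other series use different tiers and a smaller maximum. sud_billed_units is a hypothetical helper.

```shell
# Sustained-use arithmetic under the assumed N1 tier schedule.
# usage: sud_billed_units USAGE_PERCENT   (share of the month the resource ran, 0..100)
# prints the usage you are billed for, as a percent of a full month at base rate
# (integer arithmetic, rounded down).
sud_billed_units() {
  usage=$1
  total=0
  tier_start=0
  for rate in 100 80 60 40; do
    # Portion of the usage that falls inside this 25-point tier.
    in_tier=$(( usage - tier_start ))
    if [ "$in_tier" -lt 0 ]; then in_tier=0; fi
    if [ "$in_tier" -gt 25 ]; then in_tier=25; fi
    total=$(( total + in_tier * rate ))
    tier_start=$(( tier_start + 25 ))
  done
  echo $(( total / 100 ))
}
```

A full month gives sud_billed_units 100, which prints 70, the 30 percent discount. Half a month prints 45, so 50 units of usage are billed as 45, and anything up to a quarter of the month is billed in full.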

Committed-use discounts are the deliberate version: commit to a quantity of vCPU and memory in a region for one or three years and the price drops up to 57 percent for general-purpose resources, deeper for memory-optimised. The contrast with AWS Reserved Instances is the level of abstraction. A classic RI is a bet on an instance type in a location, and an entire cottage industry exists to manage, resell, and re-balance those bets; AWS Savings Plans closed much of the gap but still ask for a dollars-per-hour commitment you must model. A GCP commitment is just resources: any mix of VMs in the region, predefined or custom shapes, can consume it, and resource reshuffling within the family does not strand your discount. Spend-based commitments exist too for the Savings-Plan-style approach. The practical upshot: start with nothing and collect sustained-use discounts for free, then buy commitments once a quarter of steady-state usage is obvious from the bill, and keep Spot in the mix for the interruptible remainder.

The metadata server and startup scripts

Every instance can reach a metadata server at metadata.google.internal (169.254.169.254), a link-local HTTP service that answers questions about the VM itself: its name, zone, network interfaces, service account tokens, and any custom key-value pairs attached to the instance or inherited from the project. Requests must carry the header Metadata-Flavor: Google, a small but deliberate defence: a browser or a naive SSRF proxy will not add that header, so casual request-forgery attacks against the token endpoint fail. This is the same idea as the IMDSv2 hardening on EC2, built in from the start. The token endpoint is how workloads authenticate to Google APIs without any key file on disk; the client libraries query it automatically, which is why a correctly configured VM needs no credentials anywhere in its filesystem.
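From inside a VM, a metadata lookup is a single curl with the required header. The helper name md is hypothetical; the paths in the usage note are standard metadata keys.

```shell
# Fetch one key from the metadata server. Only works from inside a VM.
# usage: md PATH     e.g. md instance/zone
md() {
  curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/$1"
}
```

Typical calls are md instance/zone, md instance/attributes/startup-script, and md instance/service-accounts/default/token, the last of which returns the short-lived access token the client libraries fetch for you. Drop the header and the server refuses the request.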

Startup scripts ride on metadata. Put a script in the startup-script key (or point startup-script-url at a Cloud Storage object) and the guest agent runs it as root on every boot. It is the standard place for last-mile configuration: pulling environment-specific settings, registering with service discovery, starting the application. Shutdown scripts are the mirror image, with a bounded window (around 90 seconds on standard VMs, the 30-second notice on Spot), and are where graceful drain logic lives. Keep startup scripts short and idempotent. They run on every boot, not just the first, and every minute of work in one is a minute added to autoscaling response time, which is the practical argument for baking dependencies into a custom image instead.
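Idempotence in a startup script can be as simple as a marker file per expensive step. A minimal sketch, assuming the marker paths you choose live on the boot disk and survive reboots; run_once is a hypothetical helper.

```shell
# Run a command only if its marker file does not exist yet, and create the
# marker when the command succeeds, so later boots skip the step.
# usage: run_once MARKER COMMAND [ARGS...]
run_once() {
  marker=$1
  shift
  if [ -e "$marker" ]; then
    return 0
  fi
  "$@" && touch "$marker"
}
```

In a startup script that might read run_once /var/lib/nginx-installed apt-get install -y nginx, with the marker path chosen to suit your image. A failed command leaves no marker, so the next boot retries it.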

OS Login: SSH keys meet IAM

The default SSH story on Compute Engine is metadata-based keys, where public keys are stored in instance or project metadata and the guest agent writes them into user accounts. It works, but it makes key hygiene a manual chore and disconnects SSH access from your identity system. OS Login replaces it: enable one metadata flag (enable-oslogin=TRUE) and the instance hands account management to IAM. Users upload an SSH key to their Google identity once, and whether they can reach a VM is decided by IAM roles, roles/compute.osLogin for a normal user or roles/compute.osAdminLogin for sudo. Linux usernames, UIDs, and home directories derive from the identity, consistent across every VM in the fleet.

The payoff is in lifecycle and audit. When someone leaves the team, removing their IAM role removes their access to every instance at once, with no metadata to scrub, and access can be made conditional or time-bound the same way as any other IAM grant. Two-factor enforcement and Cloud Audit Logs integration come along for free. For interview purposes and for real designs the rule is simple: OS Login on, project-wide, everywhere, with metadata keys blocked, and pair it with Identity-Aware Proxy TCP forwarding so instances need no public IPs at all. SSH then becomes one more thing governed by the IAM policy you already review, rather than a parallel access system nobody audits.
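Turning that rule into commands takes three calls. The wrappers below are a sketch; the function names are hypothetical, and the project ID and user email are whatever your environment uses.

```shell
# Enable OS Login project-wide and grant one user non-sudo SSH access.
# usage: enable_os_login PROJECT_ID USER_EMAIL
enable_os_login() {
  gcloud compute project-info add-metadata \
    --project="$1" --metadata=enable-oslogin=TRUE &&
  gcloud projects add-iam-policy-binding "$1" \
    --member="user:$2" --role=roles/compute.osLogin
}

# SSH to a VM that has no public IP by tunnelling through Identity-Aware Proxy.
# usage: ssh_via_iap VM ZONE
ssh_via_iap() {
  gcloud compute ssh "$1" --zone="$2" --tunnel-through-iap
}
```

For the tunnel to work the user also needs the roles/iap.tunnelResourceAccessor role, and a firewall rule must allow TCP 22 from the IAP range 35.235.240.0/20.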

Lab: build the whole story with gcloud

Everything above fits in a fifteen-minute terminal session on a fresh project. The lab creates a VM with a custom shape, attaches and grows a data disk, snapshots it, then builds a regional MIG with autohealing, and tears everything down. Costs are pennies if you delete as you go. Set your project first; every command below names its zone or region explicitly.

1. A VM with a custom machine type. Six vCPUs, 14 GB, the shape no fixed catalogue offers. Note the machine type name the API derives for you.

gcloud config set project YOUR_PROJECT_ID

gcloud compute instances create lab-vm \
  --zone=us-central1-a \
  --custom-vm-type=n2 --custom-cpu=6 --custom-memory=14GB \
  --image-family=debian-12 --image-project=debian-cloud

gcloud compute instances describe lab-vm --zone=us-central1-a \
  --format="value(machineType)"   # ...machineTypes/n2-custom-6-14336

2. Attach a disk, then grow it live. The resize happens while the VM runs; only the filesystem step would remain inside the guest.

gcloud compute disks create lab-data \
  --zone=us-central1-a --size=100GB --type=pd-balanced

gcloud compute instances attach-disk lab-vm \
  --zone=us-central1-a --disk=lab-data

gcloud compute disks resize lab-data \
  --zone=us-central1-a --size=150GB --quiet

3. Snapshot it. The snapshot is global; list it and note there is no zone in its identity.

gcloud compute disks snapshot lab-data \
  --zone=us-central1-a --snapshot-names=lab-data-snap1

gcloud compute snapshots list --filter="name=lab-data-snap1"

4. A regional MIG with autohealing. Template, health check, a firewall rule that lets Google's health-check probes reach port 80, then the group. The initial delay gives new instances two minutes to boot before the health check can condemn them. Once it is up, delete an instance and watch the MIG replace it.

gcloud compute instance-templates create lab-tmpl \
  --machine-type=e2-small \
  --image-family=debian-12 --image-project=debian-cloud \
  --metadata=startup-script='#!/bin/bash
apt-get update && apt-get install -y nginx'

gcloud compute health-checks create http lab-hc \
  --port=80 --check-interval=10s --unhealthy-threshold=3

# health-check probes come from these Google ranges; without this rule every
# instance looks unhealthy and autohealing recreates it in a loop
# (assumes the template's instances land on the default network)
gcloud compute firewall-rules create lab-allow-hc \
  --network=default --allow=tcp:80 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16

gcloud compute instance-groups managed create lab-mig \
  --region=us-central1 --size=3 --template=lab-tmpl \
  --health-check=lab-hc --initial-delay=120

gcloud compute instance-groups managed set-autoscaling lab-mig \
  --region=us-central1 --min-num-replicas=3 --max-num-replicas=6 \
  --target-cpu-utilization=0.65

# delete one instance and watch the group return to three
gcloud compute instance-groups managed list-instances lab-mig --region=us-central1
gcloud compute instance-groups managed delete-instances lab-mig \
  --region=us-central1 --instances=INSTANCE_NAME_FROM_LIST
gcloud compute instance-groups managed list-instances lab-mig --region=us-central1

5. Teardown. Order matters only a little: the MIG before its template, the disk after it is detached.

gcloud compute instance-groups managed delete lab-mig --region=us-central1 --quiet
gcloud compute health-checks delete lab-hc --quiet
gcloud compute firewall-rules delete lab-allow-hc --quiet
gcloud compute instance-templates delete lab-tmpl --quiet
gcloud compute snapshots delete lab-data-snap1 --quiet
gcloud compute instances detach-disk lab-vm --zone=us-central1-a --disk=lab-data
gcloud compute instances delete lab-vm --zone=us-central1-a --quiet
gcloud compute disks delete lab-data --zone=us-central1-a --quiet

One extra experiment. Before deleting lab-vm, run gcloud compute instances simulate-maintenance-event lab-vm --zone=us-central1-a and keep a ping running against it from another terminal. You are watching a live migration happen, and the point is how little you see.
