Virtual networks

Everything in Azure that has a private IP address lives in a VNet, and most production incidents that look like application failures turn out to be a route, a security rule, or a DNS record inside one. This page covers the model end to end: regional VNets and their subnets, the NSG rule engine and how it evaluates twice, peering and the hub-and-spoke pattern, Private Endpoints versus service endpoints, user-defined routes, the four load balancers and how to pick between them, and the DNS machinery that quietly holds it all together. It closes with a CLI lab you can run and delete in twenty minutes.

A VNet is a regional network you fully control

A virtual network is a private, isolated slice of the Azure network in one region. You give it an address space in CIDR notation, usually from the RFC 1918 private ranges, and Azure gives you a software-defined network where every VM, container instance, and private endpoint you place inside it gets an IP from that space. Nothing outside the VNet can reach those addresses unless you explicitly connect it: peering, a VPN, an ExpressRoute circuit, or a public IP you attach on purpose. The default posture is isolation.

The word regional carries weight. A VNet exists in exactly one region, the same model AWS uses for a VPC, and the opposite of GCP, where a VPC is a global object with regional subnets. If you have workloads in West Europe and East US, that is two VNets, and making them talk is a deliberate act of peering or gateway plumbing. People arriving from GCP get surprised by this; people arriving from AWS feel at home, and the VPC deep dive maps almost one-to-one onto this page. If Azure itself is new to you, the foundations page covers the resource group and subscription scaffolding this all hangs from.

A VNet can hold more than one address range, and you can add ranges later without rebuilding anything, which is the standard escape hatch when a network planned at /20 turns out to need more room. The constraint to respect from day one is overlap: two networks with overlapping address space can never be peered or connected through a gateway. Address planning is dull, and it is also the one decision on this page that is painful to reverse, because re-IPing a production VNet means rebuilding the things inside it. Keep a spreadsheet, or better, an IPAM tool, and hand each VNet a unique block from the start.
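
The overlap rule is easy to check mechanically before a block is handed out. What follows is a minimal sketch in Python using only the standard ipaddress module. The VNet names and prefixes in the plan are invented for illustration, and it models the planning check only, not anything Azure itself runs.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: VNet name -> list of CIDR ranges.
plan = {
    "vnet-app": ["10.10.0.0/16"],
    "vnet-data": ["10.20.0.0/16"],
    "vnet-legacy": ["10.10.128.0/20"],  # sits inside vnet-app's range
}

def overlapping_pairs(plan):
    """Return sorted (vnet_a, vnet_b) pairs whose address spaces overlap."""
    clashes = set()
    for (name_a, ranges_a), (name_b, ranges_b) in combinations(plan.items(), 2):
        for a in ranges_a:
            for b in ranges_b:
                if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                    clashes.add(tuple(sorted((name_a, name_b))))
    return sorted(clashes)

print(overlapping_pairs(plan))
```

An empty list means every pair in the plan could still be peered; any pair that shows up needs a new block before it is deployed.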

Subnets, and the five addresses you never get

Subnets carve the address space into segments, and they are the unit at which most policy attaches: NSGs, route tables, service endpoints, and delegations all bind to subnets. A sensible layout separates tiers — a subnet for web VMs, one for application services, one for databases or private endpoints — not because subnets provide isolation by themselves (they do not; by default everything in a VNet can reach everything else) but because they give you clean boundaries to hang security rules and routes on.

Azure reserves five addresses in every subnet: the network address, the broadcast address, and three more — x.x.x.1 for the default gateway and x.x.x.2 and x.x.x.3 for Azure DNS. A /24 therefore yields 251 usable addresses, not 254, and a /29, the smallest subnet Azure allows, yields exactly three. This matters more than it sounds when you size subnets for services that consume many IPs, such as AKS with the Azure CNI, where every pod takes an address from the subnet.
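
The arithmetic is worth having as a helper when sizing subnets. This is a small Python sketch under the assumptions stated in the paragraph above (five reserved addresses, with the first four at the start of the range); the function names are made up for illustration.

```python
import ipaddress

# Network address, gateway (.1), two for Azure DNS (.2, .3), and broadcast.
AZURE_RESERVED = 5

def usable_addresses(cidr):
    """Addresses left for your own resources in an Azure subnet of this size."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED

def first_assignable(cidr):
    """First address Azure can hand out: the one after the four reserved at the start."""
    return ipaddress.ip_network(cidr).network_address + 4

for cidr in ("10.10.1.0/24", "10.10.3.0/27", "10.10.4.0/29"):
    print(cidr, usable_addresses(cidr), first_assignable(cidr))
```

The first-assignable rule is why the first VM placed in an empty 10.10.2.0/24 subnet normally lands on 10.10.2.4.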

Some subnets are special by name. A VPN or ExpressRoute gateway will only deploy into a subnet called GatewaySubnet; Azure Firewall demands AzureFirewallSubnet; Bastion wants AzureBastionSubnet. These named subnets cannot carry NSGs in the usual way and should hold nothing else. Separately, subnet delegation hands a subnet to a PaaS service, such as App Service VNet integration or Azure Container Instances, letting that service inject and manage its own resources there. A delegated subnet is effectively owned by the service, so plan for it as consumed space.

Network security groups: the rule engine

An NSG is an ordered list of allow and deny rules evaluated against the classic five-tuple: source, source port, destination, destination port, and protocol. Each rule has a priority from 100 to 4096, lower numbers run first, and evaluation stops at the first match. The engine is stateful, so if an inbound rule admits a TCP connection, the replies flow out without needing a matching outbound rule. Sources and destinations can be IP ranges, service tags — Azure-maintained labels such as Internet, VirtualNetwork, or Storage.WestEurope that expand to the right prefixes automatically — or application security groups, which we get to next.

Every NSG ships with default rules at priorities 65000 and up, which you cannot delete but can override with anything numbered lower. Inbound, the defaults allow traffic from within the VNet and from the Azure load balancer's health-probe address, then deny everything else. Outbound, they allow VNet traffic and internet traffic, then deny the rest. Two consequences are worth tattooing somewhere visible: by default, anything in a VNet can reach anything else in it, including across peerings, because the VirtualNetwork tag covers peered space; and by default, every VM can reach the internet outbound. Locking down either is your job, not Azure's.
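
First-match evaluation with the defaults at the bottom can be modelled in a few lines. This Python sketch is a simplification, not Azure's implementation: it looks only at source address and destination port for inbound traffic, it stands in for the VirtualNetwork tag with one assumed prefix (10.10.0.0/16), and the custom rule uses a documentation address as a hypothetical admin IP.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    priority: int
    source: str            # CIDR prefix, or "*" for any source
    port: Optional[int]    # destination port, or None for any port
    access: str            # "Allow" or "Deny"

# Simplified stand-ins for the built-in inbound defaults.
DEFAULT_INBOUND = [
    Rule(65000, "10.10.0.0/16", None, "Allow"),      # AllowVnetInBound (assumed VNet prefix)
    Rule(65001, "168.63.129.16/32", None, "Allow"),  # AllowAzureLoadBalancerInBound
    Rule(65500, "*", None, "Deny"),                  # DenyAllInBound
]

def evaluate(rules, source_ip, dest_port):
    """Return (access, priority) of the first matching rule, lowest number first."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        source_ok = rule.source == "*" or src in ipaddress.ip_network(rule.source)
        port_ok = rule.port is None or rule.port == dest_port
        if source_ok and port_ok:
            return rule.access, rule.priority
    raise LookupError("no rule matched; a real NSG always ends in a deny-all")

# Hypothetical custom rule: SSH from a single admin address.
nsg = [Rule(100, "203.0.113.7/32", 22, "Allow")] + DEFAULT_INBOUND

print(evaluate(nsg, "203.0.113.7", 22))    # the admin, admitted by the custom rule
print(evaluate(nsg, "198.51.100.9", 22))   # a stranger, caught by the default deny
print(evaluate(nsg, "10.10.2.4", 5432))    # a neighbour in the VNet, allowed by default
```

The third call is the consequence the paragraph warns about: with no custom rule in the way, anything inside the VNet is admitted by the default at 65000.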

An NSG can attach to a subnet, to a NIC, or both, and this is where people get bitten. When both exist, inbound traffic must pass the subnet NSG first and then the NIC NSG; outbound traffic is checked at the NIC first and then the subnet. Both must allow the traffic. There is no merging of rules, no most-specific-wins across the two — they are two independent gates in series. The classic failure is an engineer adding a NIC-level allow rule and waiting for a connection that the subnet NSG silently drops one layer earlier.

Two independent gates in series. Traffic must pass both NSGs; a deny at either one drops the packet, and the order flips for outbound.
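
The series behaviour, including the order flip, fits in a short Python model. This is an illustrative sketch only: rules are reduced to a priority, a predicate on a packet dictionary, and an access value, and both example NSGs are invented.

```python
def passes(nsg, packet):
    """First-match evaluation over (priority, predicate, access) rules."""
    for _priority, matches, access in sorted(nsg, key=lambda rule: rule[0]):
        if matches(packet):
            return access == "Allow"
    return False  # nothing matched: treat as denied

def blocked_at(direction, subnet_nsg, nic_nsg, packet):
    """Name the first gate that drops the packet, or None if both allow it."""
    gates = [("subnet", subnet_nsg), ("nic", nic_nsg)]  # inbound order
    if direction == "outbound":
        gates.reverse()                                  # NIC first, then subnet
    for name, nsg in gates:
        if not passes(nsg, packet):
            return name
    return None

# Hypothetical rule sets: the NIC admits 8080, the subnet does not.
subnet_nsg = [
    (100, lambda p: p["port"] == 443, "Allow"),
    (4096, lambda p: True, "Deny"),
]
nic_nsg = [
    (100, lambda p: p["port"] in (443, 8080), "Allow"),
    (4096, lambda p: True, "Deny"),
]

print(blocked_at("inbound", subnet_nsg, nic_nsg, {"port": 443}))
print(blocked_at("inbound", subnet_nsg, nic_nsg, {"port": 8080}))
print(blocked_at("outbound", subnet_nsg, nic_nsg, {"port": 22}))
```

The 8080 case is the classic failure in miniature: the NIC-level allow exists, and the packet never gets far enough to meet it.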

Practical advice. Pick one attachment point and stay consistent. Most teams standardise on subnet-level NSGs, which keeps the rule count manageable and matches how tiers are laid out, and reserve NIC-level NSGs for genuine exceptions. Running both everywhere doubles your debugging surface for very little gain.

Application security groups: rules that say what you mean

IP-based rules rot. The rule that allows 10.0.1.0/24 to reach 10.0.2.4 on port 5432 was written when the web tier lived in that subnet and the database had that address; six months and one re-deployment later, nobody remembers what it protects. Application security groups fix this by letting you write rules in terms of workload identity instead of addresses. An ASG is a label. You attach it to the NICs of VMs that play a role — asg-web, asg-db — and then write NSG rules whose source and destination are ASGs: allow asg-web to reach asg-db on 5432, deny everything else to asg-db.

The payoff is operational. When a new web VM comes up, its NIC joins asg-web and inherits every rule that mentions the group; nothing about the NSG changes. The rules read as intent, which makes security review humane, and membership changes do not require touching the rule set at all. The limits: ASGs only apply to NICs, so they describe IaaS workloads rather than PaaS endpoints, and all NICs referenced in one rule must live in the same VNet. Within those bounds, using ASGs for anything beyond trivial single-purpose networks is just the correct default.
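
To see why membership changes leave the rule set alone, it helps to model an ASG as nothing more than a label on a NIC. The Python sketch below is a toy, not the Azure engine: the NIC names are hypothetical, and the final fallback stands in for the default allow-within-VNet rule.

```python
# NIC name -> set of ASG labels attached to it (hypothetical NICs).
memberships = {
    "nic-web-1": {"asg-web"},
    "nic-db-1": {"asg-db"},
    "nic-batch-1": set(),
}

# (priority, source ASG or "*", destination ASG, port or None for any, access)
rules = [
    (100, "asg-web", "asg-db", 5432, "Allow"),
    (200, "*", "asg-db", None, "Deny"),
]

def allowed(src_nic, dst_nic, port):
    """First matching rule decides; rules never mention an IP address."""
    for _prio, src_asg, dst_asg, rule_port, access in sorted(rules, key=lambda r: r[0]):
        src_ok = src_asg == "*" or src_asg in memberships[src_nic]
        dst_ok = dst_asg in memberships[dst_nic]
        port_ok = rule_port is None or rule_port == port
        if src_ok and dst_ok and port_ok:
            return access == "Allow"
    return True  # stand-in for the default allow-within-VNet rule

print(allowed("nic-web-1", "nic-db-1", 5432))
print(allowed("nic-batch-1", "nic-db-1", 5432))

# A new web VM appears: only the membership table changes, not the rules.
memberships["nic-web-2"] = {"asg-web"}
print(allowed("nic-web-2", "nic-db-1", 5432))
```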

Peering, and why hub-and-spoke exists

VNet peering joins two VNets so resources in each can reach the other by private IP, over the Microsoft backbone, with no gateway, no public internet, and near-zero added latency. Peering within a region and global peering across regions work the same way; both are configured as a pair of one-directional links and only carry traffic when both sides agree. Address spaces must not overlap, and the data path costs per gigabyte, more for global than regional.

The property that shapes every real Azure network is that peering is non-transitive. If A peers with B and B peers with C, A still cannot reach C. Traffic will not flow through an intermediate VNet by default, full stop. With a handful of VNets you could mesh them all, but the pair count grows quadratically and so does the bookkeeping. The standard answer is hub-and-spoke: one hub VNet holds the shared, expensive plumbing — the VPN or ExpressRoute gateway, Azure Firewall, Bastion, shared private DNS — and every workload VNet peers only with the hub.

Hub-and-spoke. Spokes peer only with the hub; peering is non-transitive, so spoke-to-spoke traffic exists only because UDRs steer it through the firewall.
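
Both properties, non-transitivity and the quadratic cost of a mesh, are simple enough to model. This Python sketch treats each peering as a pair of one-directional links, as described above; the VNet names are placeholders.

```python
from math import comb

# Each peering is two one-directional links; both must exist to carry traffic.
links = {
    ("hub", "spoke-a"), ("spoke-a", "hub"),
    ("hub", "spoke-b"), ("spoke-b", "hub"),
}

def can_reach(a, b):
    """Peering alone connects only directly peered VNets; there is no transit."""
    return (a, b) in links and (b, a) in links

def peerings_needed(vnet_count, topology):
    """Number of peerings (link pairs) for a full mesh versus hub-and-spoke."""
    if topology == "mesh":
        return comb(vnet_count, 2)
    if topology == "hub-and-spoke":
        return vnet_count - 1
    raise ValueError(topology)

print(can_reach("spoke-a", "hub"))
print(can_reach("spoke-a", "spoke-b"))
print(peerings_needed(10, "mesh"), peerings_needed(10, "hub-and-spoke"))
```

With ten VNets the mesh needs 45 peerings against 9 for hub-and-spoke, and the gap widens with every VNet added.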

Two flags make the pattern work. Gateway transit lets spokes use the hub's VPN or ExpressRoute gateway as their path to on-premises — the hub allows it, each spoke sets "use remote gateways" — so you buy one gateway instead of one per VNet. And because peering alone will never carry spoke-to-spoke traffic, each spoke subnet gets a user-defined route sending 0.0.0.0/0, or at least the other spokes' prefixes, to the firewall's private IP, with "allow forwarded traffic" enabled on the peerings. The firewall then becomes the single inspection and logging point for east-west traffic, which auditors tend to appreciate. At larger scale, Azure Virtual WAN packages this whole arrangement as a managed service, but the topology you are paying for is the same one in the diagram.

Private Endpoints versus service endpoints

Both features exist to answer the same question — how do VMs in my VNet reach Azure PaaS services like Storage or SQL without traversing the public internet — and they answer it so differently that picking the wrong one shows up in security reviews. The difference matters; learn it once.

A service endpoint is a subnet-level switch. Enable it for, say, Microsoft.Storage, and traffic from that subnet to Storage flows over the Azure backbone with the subnet's identity and the VM's private source IP attached, letting you write a firewall rule on the storage account that says "only accept traffic from this subnet". It costs nothing and takes a minute. But the storage account keeps its public IP: you are still connecting to a public endpoint, just over a better path with a source-based ACL. On-premises machines coming in over VPN cannot use it, and a VM in the subnet can still reach anyone's storage account, which makes data exfiltration a one-liner.

A Private Endpoint, built on Private Link, goes further: it injects a NIC into your subnet that carries a private IP from your address space and maps to one specific resource — this storage account, this SQL server, not the service in general. The resource becomes reachable at something like 10.0.2.5, you can disable its public endpoint entirely, and the private address works from peered VNets and from on-premises over VPN or ExpressRoute, since it is just an IP in your network. The cost is real money per endpoint plus per-gigabyte processing, and a DNS obligation we will meet below: the service's hostname must resolve to the private IP inside your network, which is what the privatelink.* DNS zones are for.

Rule of thumb. Service endpoints are a quick win for locking a PaaS firewall to your subnets when public exposure is acceptable. Private Endpoints are the answer when the requirement is "no public endpoint at all", when on-premises clients need the path, or when exfiltration to attacker-controlled accounts is in your threat model. Most regulated environments end up all-in on Private Endpoints.
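
Written as code, the rule of thumb is a three-input decision. The Python function below is only a restatement of the paragraph above, with hypothetical parameter names, and is no substitute for an actual threat model.

```python
def choose_endpoint(public_endpoint_acceptable, onprem_clients, exfiltration_in_threat_model):
    """Restates the rule of thumb: any hard requirement tips it to Private Endpoint."""
    if (not public_endpoint_acceptable) or onprem_clients or exfiltration_in_threat_model:
        return "Private Endpoint"
    return "service endpoint"

print(choose_endpoint(True, False, False))
print(choose_endpoint(True, True, False))
```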

User-defined routes and forced tunnelling

Every subnet starts with system routes Azure maintains for you: the VNet's own space is reachable directly, peered VNets via peering, 0.0.0.0/0 heads to the internet. You override them with a route table — a set of user-defined routes attached to a subnet, where the most specific prefix wins and a UDR beats a system route of equal specificity. Each route names a next hop type: VirtualAppliance with an IP (almost always a firewall), VirtualNetworkGateway, Internet, or None, which blackholes the traffic.

Nearly all UDR usage is one idea: steering traffic through an inspection point. The spoke-to-firewall route in the hub-and-spoke diagram is a UDR. So is forced tunnelling, where 0.0.0.0/0 points at the VPN or ExpressRoute gateway so that even internet-bound traffic from Azure VMs hairpins through on-premises security appliances before reaching the world. Some compliance regimes demand it; everyone else mostly suffers the latency. Two operational notes: routes advertised over BGP from on-premises mix into the same routing decision, and the effective routes view on a NIC (az network nic show-effective-route-table) is the single most useful debugging command in Azure networking — it shows what the packet will actually do, after all sources are merged, rather than what you hoped.
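
The merged decision that the effective routes view reports can be approximated as longest prefix first, then route source as the tie-break. The Python sketch below assumes the tie-break order of user-defined, then BGP, then system. The route table is invented, and the next-hop strings are illustrative labels, not exact CLI output.

```python
import ipaddress

# Tie-break between routes for the same prefix (lower rank wins).
SOURCE_RANK = {"user": 0, "bgp": 1, "system": 2}

# Hypothetical merged route table: (prefix, source, next hop).
routes = [
    ("10.10.0.0/16",   "system", "VnetLocal"),
    ("10.20.0.0/16",   "system", "VNetPeering"),
    ("0.0.0.0/0",      "system", "Internet"),
    ("0.0.0.0/0",      "user",   "VirtualAppliance 10.0.0.4"),
    ("192.168.0.0/16", "bgp",    "VirtualNetworkGateway"),
    ("10.30.0.0/16",   "user",   "None"),  # blackhole
]

def next_hop(destination):
    """Pick the most specific matching route; break ties by route source."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), source, hop)
        for prefix, source, hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    _network, _source, hop = max(
        candidates, key=lambda c: (c[0].prefixlen, -SOURCE_RANK[c[1]])
    )
    return hop

for ip in ("10.10.2.4", "10.20.1.4", "8.8.8.8", "192.168.5.1", "10.30.0.9"):
    print(ip, "->", next_hop(ip))
```

Note that the user-defined default route wins for 8.8.8.8 but not for 10.20.1.4, where the more specific peering route still applies. That is why the hub-and-spoke pattern needs the other spokes' prefixes covered explicitly whenever a more specific system route exists.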

The four load balancers, and how to choose

Azure ships four distinct load-balancing services, and interviewers love the question because the names do not help: one is called Load Balancer as if the others were not, and two of the four are global. The grid that untangles them has two axes — does the service operate at layer 4 (TCP/UDP) or layer 7 (HTTP), and is it regional or global? If the mechanics of L4 versus L7 balancing are fuzzy, the load balancing guide builds them up from scratch.

The decision ladder: pick a row by protocol awareness you need, a column by geography. The two can stack — global in front of regional.

Azure Load Balancer is the plain L4 workhorse: it spreads TCP and UDP flows across VMs in one region using a five-tuple hash, runs health probes, and adds essentially zero latency because it rewrites packets rather than proxying connections. It has no idea what HTTP is. Use it inside a VNet in front of a database cluster or a pool of stateless services, or as a public entry point when you terminate TLS yourself. Application Gateway is the regional L7 proxy: it terminates TLS, routes by URL path and host header, handles cookie-based session affinity, and offers a web application firewall that blocks the OWASP-style attacks before they reach your code.

Front Door takes the L7 job global. Clients connect to the nearest Microsoft edge location via anycast, TLS terminates there, static content can be cached there, and requests ride the backbone to your nearest healthy origin, with failover between regions in seconds because Front Door sees every request. Traffic Manager is the odd one out: it is a DNS server with opinions. A client resolves your hostname, Traffic Manager answers with the IP of the best endpoint by priority, weight, geography, or measured latency, and then steps out of the way entirely — the actual traffic never touches it. That makes it protocol-agnostic and very cheap, and also means failover is hostage to DNS TTLs and clients that ignore them.

Service	Layer	Scope	Reach for it when
Azure Load Balancer	L4 (TCP/UDP)	Regional	Spreading raw connections across VMs in one region; internal tiers; non-HTTP protocols; lowest latency
Application Gateway	L7 (HTTP/S)	Regional	TLS termination, path and host routing, WAF in front of regional web apps
Front Door	L7 (HTTP/S)	Global	Multi-region web apps and APIs: edge TLS, caching, WAF, fast regional failover
Traffic Manager	DNS	Global	Steering clients across endpoints for any protocol, including non-Azure ones; failover bounded by DNS TTL

These compose. The canonical multi-region web architecture is Front Door at the edge, an Application Gateway (or the load balancer) in each region behind it, and the regional service spreading load across instances. The canonical mistake is putting Traffic Manager in front of an HTTP app that needs fast failover and discovering during an outage that thousands of clients cached the dead region's IP.
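
The two-axis grid reduces to a lookup. This Python sketch encodes the table above and nothing more; the parameter names are made up, and real choices also weigh cost, failover speed, and whether services are stacked.

```python
def pick_load_balancer(needs_http_features, multi_region):
    """Map the two axes (L4 vs L7, regional vs global) to one of the four services."""
    grid = {
        (False, False): "Azure Load Balancer",  # L4, regional
        (True, False): "Application Gateway",   # L7, regional
        (True, True): "Front Door",             # L7, global
        (False, True): "Traffic Manager",       # DNS-based, global, any protocol
    }
    return grid[(needs_http_features, multi_region)]

print(pick_load_balancer(needs_http_features=True, multi_region=True))
print(pick_load_balancer(needs_http_features=False, multi_region=False))
```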

VPN Gateway and ExpressRoute at a glance

Both connect your VNets to networks outside Azure, both live in the GatewaySubnet, and they solve different problems. VPN Gateway builds IPsec tunnels over the public internet: site-to-site tunnels to your office firewall, point-to-site connections for individual laptops. It is quick to stand up and cheap to run, and its throughput and latency are at the mercy of the internet path between you and the Azure edge — fine for management traffic and modest workloads, painful for chatty database replication.

ExpressRoute is a private circuit into Microsoft's network, provisioned through a connectivity provider at 50 Mbps to 10 Gbps, or as ExpressRoute Direct ports of 10 or 100 Gbps connected straight to Microsoft's edge, with an SLA, predictable latency, and no public internet in the path. Two details surprise people: ExpressRoute traffic is not encrypted by default, since it is a private path rather than a tunnel, so regulated workloads often layer IPsec or MACsec on top; and a circuit takes weeks of provider lead time, not minutes. The common enterprise pattern runs both, with ExpressRoute as the primary path and a VPN tunnel as warm standby, terminating in the hub VNet, with gateway transit handing connectivity to every spoke. Routes from on-premises arrive over BGP either way and merge into the effective-route calculation each NIC sees.

DNS inside a VNet

By default, VMs in a VNet resolve names through Azure's wire-served resolver at the magic address 168.63.129.16, which answers for public names and for Azure's internal ones. The moment you want your own names — db.internal.contoso.com resolving to a private IP — you create a private DNS zone and link it to your VNets. A link can enable auto-registration, in which case every VM in the linked VNet registers its hostname in the zone automatically and the record follows the VM through reallocation, which beats maintaining a spreadsheet of IPs by a comfortable margin.

Private DNS zones are also the other half of Private Link. When you create a Private Endpoint for a storage account, clients still address it as myaccount.blob.core.windows.net — nothing in your code changes. Resolution inside the VNet has to return the private IP, though, and the mechanism is a private zone named privatelink.blob.core.windows.net holding the record, linked to every VNet that needs the private path. Public resolvers keep returning the public IP, or a pointer to it, so the same hostname lands in different places depending on where you ask from. Forgetting the zone link is the classic Private Link failure: the endpoint exists, the NIC has its IP, and every client sails straight past it to the public endpoint, or to a firewall that now rejects them. If on-premises machines must resolve these names too, Azure DNS Private Resolver gives your corporate DNS servers something inside the VNet to forward queries to.
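
The split-horizon behaviour is easier to reason about as a lookup chain. The Python sketch below is a toy resolver, not real DNS: it assumes the public name is a CNAME to its privatelink counterpart, uses a documentation-range address as a stand-in for the public IP, and reuses 10.0.2.5 as the example private endpoint address.

```python
# What public DNS would answer (stand-in values).
PUBLIC_DNS = {
    "myaccount.blob.core.windows.net": ("CNAME", "myaccount.privatelink.blob.core.windows.net"),
    "myaccount.privatelink.blob.core.windows.net": ("A", "203.0.113.20"),
}

# Records in the private zone privatelink.blob.core.windows.net.
PRIVATE_ZONE = {"myaccount.privatelink.blob.core.windows.net": "10.0.2.5"}

def resolve(name, zone_linked):
    """Follow the CNAME chain; a linked private zone overrides the public answer."""
    for _ in range(8):  # guard against a runaway CNAME chain
        if zone_linked and name in PRIVATE_ZONE:
            return PRIVATE_ZONE[name]
        record_type, value = PUBLIC_DNS[name]
        if record_type == "A":
            return value
        name = value
    raise RuntimeError("CNAME chain too long")

host = "myaccount.blob.core.windows.net"
print(resolve(host, zone_linked=True))    # VNet with the zone linked
print(resolve(host, zone_linked=False))   # VNet where the link was forgotten
```

The second call is the failure described above: same hostname, same code, and the client lands on the public address because the zone link is missing.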

CLI lab: build it, peer it, prove it, delete it

Theory done. This lab builds a VNet with two subnets, locks SSH down to your own IP with an NSG, stands up two VMs, peers to a second VNet, proves the private path works, and deletes everything. It costs a few cents if you finish within the hour. You need the az CLI logged in and a subscription where you can create resource groups.

1. A resource group and the first VNet with two subnets.

az group create --name rg-vnet-lab --location westeurope

az network vnet create \
  --resource-group rg-vnet-lab --name vnet-app \
  --address-prefix 10.10.0.0/16 \
  --subnet-name snet-web --subnet-prefixes 10.10.1.0/24

az network vnet subnet create \
  --resource-group rg-vnet-lab --vnet-name vnet-app \
  --name snet-db --address-prefixes 10.10.2.0/24

2. An NSG that allows SSH from your IP only. The rule at priority 100 admits you; the built-in DenyAllInbound at 65500 handles everyone else. Attaching the NSG at the subnet level covers every NIC that will ever land in snet-web.

MYIP=$(curl -4 -s ifconfig.me)   # -4 forces IPv4 so the /32 suffix below is valid

az network nsg create --resource-group rg-vnet-lab --name nsg-web

az network nsg rule create \
  --resource-group rg-vnet-lab --nsg-name nsg-web \
  --name allow-ssh-from-me --priority 100 \
  --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes "$MYIP/32" \
  --destination-port-ranges 22

az network vnet subnet update \
  --resource-group rg-vnet-lab --vnet-name vnet-app \
  --name snet-web --network-security-group nsg-web

3. Two small VMs. One in each subnet. The web VM gets a public IP so you can reach it; the db VM gets none, which is the point — its only callers should be inside the network.

az vm create \
  --resource-group rg-vnet-lab --name vm-web \
  --image Ubuntu2204 --size Standard_B1s \
  --vnet-name vnet-app --subnet snet-web \
  --admin-username azureuser --generate-ssh-keys

az vm create \
  --resource-group rg-vnet-lab --name vm-db \
  --image Ubuntu2204 --size Standard_B1s \
  --vnet-name vnet-app --subnet snet-db \
  --public-ip-address "" \
  --admin-username azureuser --generate-ssh-keys

4. A second VNet, peered both ways. Note the non-overlapping address space, and that peering takes two commands because each direction is its own link.

az network vnet create \
  --resource-group rg-vnet-lab --name vnet-data \
  --address-prefix 10.20.0.0/16 \
  --subnet-name snet-tools --subnet-prefixes 10.20.1.0/24

az vm create \
  --resource-group rg-vnet-lab --name vm-tools \
  --image Ubuntu2204 --size Standard_B1s \
  --vnet-name vnet-data --subnet snet-tools \
  --public-ip-address "" \
  --admin-username azureuser --generate-ssh-keys

az network vnet peering create \
  --resource-group rg-vnet-lab --name app-to-data \
  --vnet-name vnet-app --remote-vnet vnet-data \
  --allow-vnet-access

az network vnet peering create \
  --resource-group rg-vnet-lab --name data-to-app \
  --vnet-name vnet-data --remote-vnet vnet-app \
  --allow-vnet-access

5. Verify. SSH to the web VM from your machine (the NSG admits only your IP; try from a phone hotspot if you want to watch it time out), then hop to the private VMs from inside. The peered VM at 10.20.1.4 is reachable only because of step 4; delete either peering link and the ping dies.

WEB_IP=$(az vm show -d --resource-group rg-vnet-lab \
  --name vm-web --query publicIps -o tsv)

ssh -A azureuser@$WEB_IP

# from inside vm-web:
ping -c 3 10.10.2.4        # vm-db, same VNet, different subnet
ping -c 3 10.20.1.4        # vm-tools, across the peering
ssh azureuser@10.20.1.4    # full session over the peered link

While you are connected, run the effective-route check from your own machine in a second terminal and find the VNetPeering route that makes the second ping possible.

az network nic show-effective-route-table \
  --resource-group rg-vnet-lab --name vm-webVMNic -o table

6. Tear it down. One command, because everything lives in one resource group. This is the habit worth keeping from the whole lab.

az group delete --name rg-vnet-lab --yes --no-wait

Worthwhile extensions once the basics feel boring: replace the IP-based NSG rule with two ASGs and an intent rule between them; add a route table that blackholes internet traffic from snet-db with a None next hop; or stand up a private DNS zone with auto-registration and watch the VM records appear.
