ip

You are on a box you have never seen before and a service on it cannot reach something it should. Before any packet capture, before any firewall theory, two questions come first: what addresses does this machine actually have, and which way will its packets leave? The ip command answers both, and it replaced four older tools to do it. This page covers the five invocations worth knowing cold, reads the output line by line, walks three production incidents, and ends with a drill that touches nothing.

The question it answers

Every network problem on a Linux box eventually reduces to a chain: does this machine have an address, does it have a route to the destination, can it resolve the next hop's hardware address, and is the interface actually up and passing frames. The ip command is the front door to all four. It reads and edits the kernel's view of network identity: the interfaces, the addresses bound to them, the routing table that decides where each packet goes, and the neighbour table that maps next-hop IPs to MAC addresses.

Historically that view was scattered across four tools. ifconfig showed interfaces and addresses, route (and netstat -r) showed the routing table, and arp showed the neighbour cache. All of them came from the net-tools package, all of them speak an old kernel interface, and all of them were replaced by the iproute2 suite, of which ip is the main binary. This is not a cosmetic renaming. The old tools predate features the kernel has had for decades: multiple addresses on one interface shown properly, multiple routing tables, address scopes, modern interface types. ifconfig on a current machine can show you an incomplete picture and look perfectly confident doing it, which is worse than showing nothing. The pitfalls section has the details; the short version is that on any box you care about, ip is the tool and the others are history.

What ip does not do is also worth a sentence. It does not show you sockets or which process owns a connection; that is ss. It does not capture traffic; that is tcpdump. It tells you what the box would do with a packet, not what it did do with one. In the standard incident sequence it comes first: establish identity and routing with ip, then move up to sockets and captures once you know the ground floor is sane. The wider decision tree for "is it the network at all" lives in the page "is it the network?".

The five invocations that matter

Like most of iproute2, ip has an enormous surface. It can build tunnels, manage policy rules, and configure things you will never touch outside a network team. For diagnosis, five invocations cover the daily work. All of them are read-only as written here, so none of them needs root and none of them can break anything.

Invocation	What it shows	When you reach for it
`ip addr`	Every interface with its addresses, CIDR masks, scopes, and state	First. Does this box have the address you think it has, on the subnet you think it is on?
`ip route`	The routing table: prefixes, next hops, interfaces, metrics	Second. Where do packets leave, and is there a default route at all?
`ip route get 8.8.8.8`	The exact route, interface, and source address the kernel picks for one destination	The killer move. Stop inferring from the table; ask the kernel directly.
`ip link` / `ip -s link`	Layer-2 state: up or down, MTU, MAC, and with `-s` the packet and error counters	When addresses and routes look right but traffic still dies, or large payloads fail.
`ip neigh`	The neighbour table (the ARP cache): next-hop IP to MAC mappings and their state	When the next hop itself is in doubt: FAILED entries mean nobody answered for that address.

One more belongs on the list as a habit rather than a diagnostic: ip -br addr (and its sibling ip -br link). The -br flag means brief, and it collapses the full output into one line per interface: name, state, addresses. It is the view you want ninety percent of the time, and it is worth an alias.

$ ip -br addr
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             10.0.4.12/24 fe80::858:aff:fe00:40c/64
eth1             UP             192.168.50.7/24
docker0          DOWN           172.17.0.1/16

Four lines, and you already know a lot: this box has a loopback, a primary interface on the 10.0.4.0/24 subnet, a second interface on a 192.168.50.0/24 network, and a Docker bridge that is down. Note the state column: UNKNOWN on loopback is normal (loopback has no carrier concept), UP means the link is administratively up and has carrier, and DOWN on docker0 just means no container is attached. If a physical interface you depend on says DOWN, you can stop reading routing tables and go look at the cable, the switch port, or the hypervisor.
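If you want the same reading done by a script, the brief format is easy to split on whitespace. The following is a minimal sketch, assuming the three-column layout shown above (name, state, then zero or more addresses); the helper names parse_brief and down_interfaces are made up for illustration, and the sample text is copied from the output above rather than read from a live machine.

```python
# Sample text copied from the `ip -br addr` output above.
SAMPLE = """\
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             10.0.4.12/24 fe80::858:aff:fe00:40c/64
eth1             UP             192.168.50.7/24
docker0          DOWN           172.17.0.1/16
"""


def parse_brief(text):
    """Return {name: (state, [addresses])} from `ip -br addr` text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, state, *addrs = fields
        table[name] = (state, addrs)
    return table


def down_interfaces(table):
    """Names of interfaces whose state column reads DOWN."""
    return [name for name, (state, _) in table.items() if state == "DOWN"]


interfaces = parse_brief(SAMPLE)
print(down_interfaces(interfaces))
```

On a real box you would feed it the live output instead of SAMPLE. Whether a DOWN entry matters still depends on the interface: docker0 being down is harmless, the primary NIC being down is the incident.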

Worth aliasing today. alias ipb='ip -br -c addr' gives you the brief view with colour. On a strange box during an incident, ip -br addr then ip route is a ten-second ritual that answers half the questions you came with.

Reading the output

ip addr, line by line

The full ip addr output looks dense the first time, but every field is there for a reason and most of them matter during a real incident. Here is a typical two-interface box, trimmed to loopback and the primary NIC.

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 0a:58:0a:00:04:0c brd ff:ff:ff:ff:ff:ff
    inet 10.0.4.12/24 brd 10.0.4.255 scope global dynamic eth0
       valid_lft 2856sec preferred_lft 2856sec
    inet6 fe80::858:aff:fe00:40c/64 scope link
       valid_lft forever preferred_lft forever

Take eth0 from the top. The angle-bracket flags are the link's properties: UP means an administrator (or the boot config) enabled the interface, and LOWER_UP means the physical layer below it has carrier, a live cable or a happy virtual NIC. You want both. An interface that is UP without LOWER_UP is configured but disconnected, like a lamp that is switched on with no bulb. The mtu 1500 is the largest payload this link will carry in one frame, which becomes the whole story in the third scenario below. state UP repeats the operational verdict, and qlen 1000 is the transmit queue length, the buffer of packets waiting for the hardware. You rarely act on qlen, but knowing it is a queue explains why an overloaded link adds latency before it drops anything.
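The UP versus LOWER_UP distinction can be written down as a three-way check. This is a sketch only: link_flags and link_verdict are hypothetical helper names, the first header line is the eth0 line from the output above, and the second (eth2) is an illustrative example of an enabled interface with no carrier, not something taken from this box.

```python
import re


def link_flags(header_line):
    """Extract the set of flags between < and > on an interface header line."""
    match = re.search(r"<([^>]*)>", header_line)
    return set(match.group(1).split(",")) if match else set()


def link_verdict(flags):
    """Apply the rule from the text: you want both UP and LOWER_UP."""
    if "UP" not in flags:
        return "administratively down"
    if "LOWER_UP" not in flags:
        return "enabled but no carrier"
    return "up with carrier"


# Header line from the ip addr output above.
header = ("2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 "
          "qdisc fq_codel state UP group default qlen 1000")
# Illustrative header for an enabled but unplugged NIC (assumed, not from above).
unplugged = ("3: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 "
             "qdisc fq_codel state DOWN group default qlen 1000")

print(link_verdict(link_flags(header)))
print(link_verdict(link_flags(unplugged)))
```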

The inet line is where identity lives. 10.0.4.12/24 is the address and the subnet mask in one token: this box is 10.0.4.12, and it considers everything in 10.0.4.0 through 10.0.4.255 to be directly on its own network, reachable without a router. Read the suffix every time. A box configured as /24 when the network is really a /25 will try to talk directly to neighbours that are actually behind a router, and the failure looks like random unreachability rather than a typo. scope global means the address is valid for talking to the wider world. Loopback's address has scope host: valid only inside this machine. The IPv6 fe80:: address has scope link: valid only on this physical segment, never routed. The word dynamic tells you DHCP assigned this address, and the valid_lft 2856sec below it is the lease countdown. A static address says nothing there and lives forever.
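Reading the suffix is mechanical enough to check with Python's standard ipaddress module. The sketch below derives the subnet from an address-plus-mask token and shows the /24 versus /25 mistake described above; describe and on_link are illustrative names, and 10.0.4.130 is an arbitrary example peer.

```python
import ipaddress


def describe(cidr):
    """Return (network, first address, last address, size) for an ADDR/LEN token."""
    net = ipaddress.ip_interface(cidr).network
    return str(net), str(net[0]), str(net[-1]), net.num_addresses


def on_link(local_cidr, peer):
    """True if the box would treat `peer` as directly reachable, no router."""
    return ipaddress.ip_address(peer) in ipaddress.ip_interface(local_cidr).network


print(describe("10.0.4.12/24"))

# Same address, two masks: with /24 the box thinks 10.0.4.130 is a direct
# neighbour; with /25 it knows that host sits behind a router.
print(on_link("10.0.4.12/24", "10.0.4.130"))
print(on_link("10.0.4.12/25", "10.0.4.130"))
```

If the network really is a /25 and the box is configured as /24, the first answer is the one the kernel acts on, and it is the wrong one.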

ip route, and the one rule that explains it

$ ip route
default via 10.0.4.1 dev eth0 proto dhcp metric 100
10.0.4.0/24 dev eth0 proto kernel scope link src 10.0.4.12 metric 100
10.8.0.0/16 via 10.0.4.200 dev eth0 proto static
192.168.50.0/24 dev eth1 proto kernel scope link src 192.168.50.7

Each line is a rule: packets for this prefix go out this interface, either directly or via a next hop. The default line is the catch-all, equivalent to 0.0.0.0/0: anything not matched by a more specific line goes to the router at 10.0.4.1. The 10.0.4.0/24 ... scope link line says the local subnet is reached directly, no router involved, and src 10.0.4.12 records which source address the kernel will stamp on packets using this route. The 10.8.0.0/16 line is a static route someone added, pointing a private range at a different gateway on the same segment. The proto field is provenance, not behaviour: kernel means the route appeared automatically when the address was configured, dhcp and static mean what they say. metric breaks ties when two routes cover the same prefix; lower wins.

The rule that makes the whole table make sense is longest prefix match: for each packet, the kernel picks the matching route with the most specific prefix, and specificity beats everything else. A /24 beats a /16 beats the default, no matter what order the lines appear in. That single sentence is most of routing.
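Longest prefix match fits in a few lines. The sketch below models the four-line table above as plain tuples (with default written as 0.0.0.0/0) and picks the matching route with the longest prefix. It is a teaching model, not the kernel's implementation: it ignores metrics, policy rules, and additional tables, and the names ROUTES and lookup are made up for this example.

```python
import ipaddress

# (prefix, next hop or None if directly connected, interface),
# transcribed from the ip route output above.
ROUTES = [
    ("0.0.0.0/0", "10.0.4.1", "eth0"),        # the "default" line
    ("10.0.4.0/24", None, "eth0"),
    ("10.8.0.0/16", "10.0.4.200", "eth0"),
    ("192.168.50.0/24", None, "eth1"),
]


def lookup(dest, routes=ROUTES):
    """Return the matching route with the longest prefix, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


for dest in ("8.8.8.8", "10.8.3.4", "10.0.4.33", "192.168.50.40"):
    print(dest, "->", lookup(dest))
```

Reversing the order of ROUTES changes nothing, which is the point: specificity decides, not position in the listing.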

Figure: the routing decision for one packet. Every matching prefix is a candidate; the longest one wins; the winner names the interface, the next hop (if any), and the source address.

ip route get: ask, don't infer

You can run that match in your head, and on a four-line table you will get it right. On a real machine, with VPN routes and container bridges and a cloud agent injecting things, you will eventually get it wrong. So don't do it in your head. ip route get hands a hypothetical destination to the kernel and prints the decision it would make, using exactly the lookup the real packet will get.

$ ip route get 8.8.8.8
8.8.8.8 via 10.0.4.1 dev eth0 src 10.0.4.12 uid 1000
    cache
$ ip route get 192.168.50.40
192.168.50.40 dev eth1 src 192.168.50.7 uid 1000
    cache

Three answers per line, and each one can be the bug. The interface (dev): is traffic leaving the way you assumed? The next hop (via, absent when the destination is on a local subnet): is it the router you expected? And the source address (src): is the kernel stamping the address the far end will accept? That last one bites multi-homed boxes constantly, because firewalls and allow-lists are written in terms of source addresses, and the kernel chooses the source from the route, not from your intentions. ip route get is the single highest-value command on this page. It turns "I think it should go out eth0" into a fact, in one line, with no packets sent.
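To compare those three answers against what you expected, it helps to pull them out of the line. A minimal sketch, assuming the first token is the destination as in the examples above (some outputs begin with a route type such as local, which this does not handle); parse_route_get is an illustrative name, not part of any tool.

```python
def parse_route_get(line):
    """Pull dest, via, dev and src out of the first line of `ip route get`."""
    tokens = line.split()
    result = {"dest": tokens[0], "via": None, "dev": None, "src": None}
    # Walk consecutive token pairs: a keyword is followed by its value.
    for key, value in zip(tokens[1:], tokens[2:]):
        if key in ("via", "dev", "src"):
            result[key] = value
    return result


remote = parse_route_get("8.8.8.8 via 10.0.4.1 dev eth0 src 10.0.4.12 uid 1000")
local = parse_route_get("192.168.50.40 dev eth1 src 192.168.50.7 uid 1000")
print(remote)
print(local)
```

A via of None is the on-link case: the kernel will deliver directly and no gateway is involved.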

Three production scenarios

"The service is unreachable"

A deploy goes out, health checks fail, and the load balancer reports the backend unreachable. The temptation is to start at the top: application logs, TLS, DNS. Start at the bottom instead, because the bottom takes thirty seconds to clear. SSH in (if you can SSH in, layer 3 works at least for your path, which is itself information) and run ip -br addr. Is the address the load balancer is targeting actually bound on this box? Cloud reprovisioning, DHCP lease changes, and copy-pasted netplan files all produce machines whose real address is not the one in the service registry. Then read the mask: an address of 10.0.4.12/16 on a network that is really carved into /24s means this box believes the entire 10.0.x.x space is local and will ARP for addresses it should be routing to. The symptom is maddeningly partial: same-subnet neighbours work, and so does anything outside 10.0.0.0/16 (that traffic still goes to the gateway), but hosts on the other 10.0.x.x subnets time out.

Next, ip route. Is there a default route at all? A box that boots with a misconfigured gateway serves local traffic perfectly and drops everything else, which on a segmented network can look like "the service is up but flaky" for hours. Then ip neigh for the gateway's entry: REACHABLE or STALE are both fine (stale just means not recently confirmed), but FAILED means the box asked who has the gateway's address and nobody answered, and now you are debugging layer 2, not your service.

$ ip neigh
10.0.4.1 dev eth0 lladdr 0a:58:0a:00:04:01 REACHABLE
10.0.4.33 dev eth0 lladdr 0a:58:0a:00:04:21 STALE
10.0.4.99 dev eth0 FAILED
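The triage rule for that listing can be sketched in code. This assumes the state is the last token on each line, as in the sample above, and it only names the three states this page discusses; anything else is reported as-is. parse_neigh and gateway_verdict are hypothetical helper names.

```python
# Sample text copied from the ip neigh output above.
SAMPLE = """\
10.0.4.1 dev eth0 lladdr 0a:58:0a:00:04:01 REACHABLE
10.0.4.33 dev eth0 lladdr 0a:58:0a:00:04:21 STALE
10.0.4.99 dev eth0 FAILED
"""


def parse_neigh(text):
    """Return a list of {ip, dev, lladdr, state} dicts from `ip neigh` text."""
    entries = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank lines
        entry = {"ip": tokens[0], "dev": None, "lladdr": None, "state": tokens[-1]}
        for key, value in zip(tokens[1:], tokens[2:]):
            if key in ("dev", "lladdr"):
                entry[key] = value
        entries.append(entry)
    return entries


def gateway_verdict(entries, gateway):
    """Apply the rule from the text to the gateway's neighbour entry."""
    for entry in entries:
        if entry["ip"] == gateway:
            if entry["state"] in ("REACHABLE", "STALE"):
                return "fine"
            if entry["state"] == "FAILED":
                return "nobody answered: debug layer 2"
            return "other state: " + entry["state"]
    return "no entry yet"


entries = parse_neigh(SAMPLE)
print(gateway_verdict(entries, "10.0.4.1"))
print(gateway_verdict(entries, "10.0.4.99"))
```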

Traffic leaving the wrong interface

A database box has two NICs: eth0 on the application network with the default route, eth1 on a dedicated replication network. Replication is slow, and the network team says replication traffic is showing up on the application network where it has no business being. Why would the kernel do that? Because routing is per-destination, not per-intention. If the replica's address is on 192.168.50.0/24 and the route for that prefix exists on eth1, fine. But if someone re-addressed the replica, or the eth1 route was never added on this box, the only matching route is the default, and the default goes out eth0. The kernel is not wrong; the table is.

Figure: two interfaces, one table. The replica is reached via the /24 on eth1 only because that route exists; without it, replication falls through to the default and crosses the wrong network.

The diagnosis is one command, run on the actual box, with the actual replica address: ip route get 192.168.50.40. If the answer says dev eth1, routing is innocent and you look elsewhere. If it says via 10.0.4.1 dev eth0, you have your answer, and the src field gives you the second half of the story: traffic arriving at the replica from 10.0.4.12 instead of 192.168.50.7, which the replica's firewall may simply drop. This same shape produces asymmetric routing, where the request leaves one interface and the reply tries to come back another, and stateful firewalls in the middle drop the half they never saw. Whenever a multi-homed box behaves strangely, run ip route get for the destination in both directions before forming a theory.

Small requests work, large ones hang

The strangest one in the family. Health checks pass, small API calls succeed, and then a file upload or a chunky JSON response hangs forever. SSH works but printing a large file freezes the session. The pattern to recognise is size-dependent failure, and the cause is usually MTU: somewhere on the path, a link carries less than the 1500 bytes everyone assumes, often because a VPN or an overlay network (WireGuard, IPsec, VXLAN in a container cluster) spends 50 to 80 bytes of every frame on its own encapsulation header. Packets small enough to fit pass; packets that do not are dropped, and if the router's "too big" error messages are firewalled off somewhere (they often are), the sender never learns why. The connection does not reset. It just stops.

ip link shows each interface's MTU directly, and an overlay interface advertising 1420 or 1450 next to a physical NIC at 1500 tells you encapsulation is in play. The counters from -s add the second clue:

$ ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 0a:58:0a:00:04:0c brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
    48217553211 41273882      0    1204       0       0
    TX:  bytes packets errors dropped carrier collsns
    9385527140 22094471      0       0       0       0

Errors mean corrupted frames, a bad cable or NIC. Drops mean the kernel received frames and threw them away, usually buffer pressure. A small, static drop count is life; a count that climbs while you watch is a finding. For the MTU question specifically, the confirmation test is a ping with fragmentation forbidden: ping -M do -s 1472 host sends a packet that needs the full 1500 (1472 of payload plus 28 of headers). If that fails while -s 1392 succeeds, something on the path tops out near 1420, and you have found your overlay. The fix is to lower the MTU on the relevant interface or fix the path; the diagnosis is the part ip gives you.
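The arithmetic behind those ping sizes is worth having as a function, since the 28-byte offset is easy to get backwards under pressure. A small sketch, assuming IPv4 with no IP options (a 20-byte header) plus the 8-byte ICMP echo header; the function names are invented for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_size_for_mtu(mtu):
    """Value for ping -s that yields an IPv4 packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def packet_size(ping_size):
    """Total IPv4 packet size produced by ping -s `ping_size`."""
    return ping_size + IPV4_HEADER + ICMP_HEADER


def path_mtu_range(results):
    """results maps a ping -s value to True if a reply came back.

    Returns (largest packet that passed, smallest packet that failed);
    the path MTU lies at or above the first and below the second.
    """
    passed = [packet_size(s) for s, ok in results.items() if ok]
    failed = [packet_size(s) for s, ok in results.items() if not ok]
    return (max(passed) if passed else None, min(failed) if failed else None)


print(ping_size_for_mtu(1500))                      # 1472
print(ping_size_for_mtu(1420))                      # 1392
print(path_mtu_range({1472: False, 1392: True}))    # (1420, 1500)
```

Two probes only bracket the answer; adding sizes in between narrows the range until the two numbers are adjacent.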

What's underneath

Everything ip prints is a view over three kernel structures, and the chain between them is the chain every outgoing packet walks. First, the interface list: each NIC, bridge, tunnel, or loopback is a kernel object with a state, an MTU, a hardware address, and a set of IP addresses bound to it. ip link reads the object, ip addr reads the object plus its addresses.

Second, the routing table. When a process sends to a destination, the kernel runs the longest-prefix-match lookup you saw in the diagram and gets back a verdict: which interface, whether a gateway is involved, and which source address to use. This happens for every destination, on every box, including ones you do not think of as routers; "routing" is not something that only happens in the network core. How addresses and prefixes work is covered in the IP page, and how routers stitch these per-hop decisions into a path across the internet is covered in the routing page.

Third, the neighbour table. A route names the next hop by IP address, but an Ethernet frame needs a MAC address to leave the building. ARP (and its IPv6 successor, neighbour discovery) resolves one to the other, and the kernel caches the answers in the table that ip neigh prints, complete with a freshness state per entry. When a destination is on the local subnet the kernel ARPs for the destination itself; when it is remote, it ARPs for the gateway and never for the destination, which is why a box can talk to the whole internet while holding MAC addresses for exactly one device. The full descent from IP packet to electrical signal lives in the "bytes on the wire" page.
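The "destination itself or the gateway" rule is a one-line decision once the subnet is known. The sketch below assumes the simplest case, one interface and one default gateway, using the eth0 values from this page; arp_target is an illustrative name, and a real kernel makes this choice from the full route lookup rather than from a single subnet check.

```python
import ipaddress


def arp_target(dest, local_cidr, gateway):
    """Return the IP address whose MAC the box must resolve to send to `dest`."""
    local_net = ipaddress.ip_interface(local_cidr).network
    if ipaddress.ip_address(dest) in local_net:
        return dest      # on-link: resolve the destination itself
    return gateway       # remote: resolve the gateway, never the destination


print(arp_target("10.0.4.33", "10.0.4.12/24", "10.0.4.1"))
print(arp_target("8.8.8.8", "10.0.4.12/24", "10.0.4.1"))
```

Every remote destination collapses to the same answer, which is why the neighbour table on a busy server can stay so short.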

One boundary worth marking: this entire page is about packets, not connections. The routing lookup neither knows nor cares whether a packet belongs to a TCP stream; connections are a socket-layer idea, built on top of this machinery and visible through different tools. Where sockets pick up the story is covered in the sockets page, and the tool for inspecting them is ss.

Pitfalls

Trusting ifconfig on a modern box. The old tool does not just look dated; it can mislead. ifconfig historically shows one IPv4 address per interface, so a box with secondary addresses (common with virtual IPs and failover setups) shows you the first and hides the rest. It knows nothing about modern interface types or multiple routing tables, and on many distributions it is not installed at all, so the muscle memory fails exactly when you are on an unfamiliar machine under pressure. If a teammate's ifconfig output disagrees with your ip addr output, believe ip. They are reading different interfaces to the same kernel, and only one of them speaks the current one.

Assuming there is only one routing table. There is not. Linux supports many routing tables plus a rule layer that picks between them based on source address, firewall marks, and more; this is policy routing, and VPN clients, container runtimes, and cloud multi-NIC setups all use it. The practical consequence: ip route shows only the main table, so a VPN can be steering all your traffic while the main table looks untouched. You do not need to operate policy routing from this page, just to know the trapdoor exists: ip rule lists the rules, and ip route show table all shows everything. ip route get, helpfully, runs its lookup through the rules as well as the tables, which is one more reason it beats reading tables by hand. One caveat: a bare query is evaluated as locally generated traffic with no source address or mark set, so when a rule keys on those, add from ADDR or mark N to ask about the traffic you actually care about.

Fighting the network manager. On most modern systems a daemon owns network configuration: NetworkManager on desktops and many servers, systemd-networkd or netplan elsewhere, cloud-init on first boot. If you change an address or route by hand with ip while one of these is active, the daemon may put things back on its next renewal or reconfiguration pass, minutes or hours later. The result is a fix that silently un-fixes itself, which is far more confusing than no fix at all. Find out who owns the config before editing live state, and make the change in that system's terms.

Forgetting that ip changes do not survive a reboot. Related but distinct: everything ip sets is kernel state, and kernel state dies with the kernel. An address added with ip addr add or a route added with ip route add is gone after a reboot unless it is also written into the persistent config (netplan, NetworkManager profiles, networkd units). The classic failure is the emergency static route added during an incident that evaporates during a maintenance reboot three months later, reopening the incident for whoever is on call that night. If you change live state, write it down and persist it the same day.

A drill you can run right now

Everything below is read-only. It changes nothing, needs no root, and works on any Linux machine, including production. Ten minutes, and the address-route-neighbour chain stops being a diagram and becomes something you have walked on a real box.

Step 1, identity. Run ip -br addr. Count the interfaces and account for each one: loopback, the primary NIC, and whatever else lives there (a Docker bridge, a VPN tunnel, a virtual NIC). For the primary interface, read the address and say the subnet out loud from the mask: a /24 means the last octet varies, a /20 means the box considers a 4096-address block local. If you cannot derive the subnet from the suffix, that is the gap to close first, because every routing question depends on it.

Step 2, the table. Run ip route. Find the default line and name the gateway. Find the scope link line that matches your primary address and notice the src on it. For every other line, try to say in one sentence why it exists; on a laptop with Docker and a VPN, the answers ("that is the container bridge", "that is the corporate range through the tunnel") are a tour of everything networking on your machine.

Step 3, two predictions. Before running anything, predict the route for two destinations: one on your local subnet and one out on the internet. Then check both:

$ ip route get 8.8.8.8
8.8.8.8 via 10.0.4.1 dev eth0 src 10.0.4.12 uid 1000
    cache
$ ip route get 10.0.4.1
10.0.4.1 dev eth0 src 10.0.4.12 uid 1000
    cache

Read the difference: the internet destination has a via, the local one does not, because the kernel reaches the local subnet directly. If either answer surprises you (a destination you thought was local goes via the gateway, or traffic picks an interface you did not expect), you have just learned something true about your network that you did not know, which is the entire point of the exercise. Follow up with ip neigh and find your gateway's entry with its MAC address: that is the one device your box actually talks to for everything beyond the local subnet.

Step 4, the link layer. Run ip -s link show for your primary interface. Read the MTU and check it is what you expect (1500 on plain Ethernet, lower on tunnels and overlays). Then read the RX and TX counters: bytes, packets, errors, drops. Run it again a minute later and compare. On a healthy box the byte counters climb and the error counters do not; if you ever need to argue "the NIC is fine" or "the NIC is not fine" during an incident, this pair of snapshots is how the argument ends.
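Comparing two snapshots by eye works, but subtracting them removes the argument. The sketch below parses the RX and TX blocks of ip -s link show text by pairing each header line with the value line under it, then diffs two snapshots. BEFORE is the eth0 sample from earlier on this page; AFTER uses made-up numbers purely for illustration. parse_counters and counter_delta are hypothetical names.

```python
# First snapshot: the RX/TX block from the ip -s link output on this page.
BEFORE = """\
    RX:  bytes packets errors dropped  missed   mcast
    48217553211 41273882      0    1204       0       0
    TX:  bytes packets errors dropped carrier collsns
    9385527140 22094471      0       0       0       0
"""
# Second snapshot: illustrative numbers, not real measurements.
AFTER = """\
    RX:  bytes packets errors dropped  missed   mcast
    48217999211 41274382      0    1260       0       0
    TX:  bytes packets errors dropped carrier collsns
    9385627140 22094671      0       0       0       0
"""


def parse_counters(text):
    """Return {'RX': {name: value}, 'TX': {...}} from `ip -s link show` text."""
    lines = text.splitlines()
    counters = {}
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] in ("RX:", "TX:") and i + 1 < len(lines):
            names = fields[1:]
            values = [int(v) for v in lines[i + 1].split()]
            counters[fields[0].rstrip(":")] = dict(zip(names, values))
    return counters


def counter_delta(before, after):
    """Per-direction, per-counter difference between two parsed snapshots."""
    return {d: {n: after[d][n] - before[d][n] for n in before[d]} for d in before}


delta = counter_delta(parse_counters(BEFORE), parse_counters(AFTER))
print(delta["RX"])
print(delta["TX"])
```

In this invented pair the RX drop counter grew between snapshots while errors stayed at zero, which is the "climbs while you watch" pattern described earlier rather than a cabling problem.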

If you remember one line. ip -br addr for who this box is, ip route for where its packets go, and ip route get DEST when you need the kernel's answer for one destination instead of your own guess.
