tcpdump

The client swears it sent the request. Your server's access log swears nothing arrived. The load balancer's metrics say something else again. When the logs disagree, you need a witness that cannot be wrong, and there is exactly one: the packets themselves. tcpdump shows you what actually crossed the wire — every SYN, every RST, every retransmission, timestamped to the microsecond. This page covers the five flags worth memorising, decodes a TCP handshake field by field, walks three production incidents, and ends with a drill you can run against a local server without touching anything that matters.

The question it answers

Every layer of a networked system keeps its own account of what happened, and every account is an interpretation. The application logs what it believes it received, after the runtime, the TLS library, and the HTTP parser have each had a turn. The load balancer logs what it believes it forwarded. The client logs what it believes it sent. Most days these stories agree and nobody checks. On the bad days they diverge: the client reports timeouts, the server reports an idle Tuesday, and the metrics dashboard reports a third reality that matches neither. Now you have a question none of the logs can settle, because every log is a party to the dispute.

tcpdump is the court of final appeal. It does not ask the application what it saw; it copies the packets themselves as they pass through the kernel, before or after everyone else's opinions get attached. If a SYN reached the host, it is in the capture. If the server answered with a RST, the capture shows which direction the RST travelled and what was in it. If the request never arrived at all, the capture is silent, and silence from a packet capture is evidence in a way that silence from a log never is — a log can be misconfigured, rotated, sampled, or simply looking at the wrong thing. The capture is the traffic.

It helps to be precise about what the tool is not. It is not a connection-state viewer; ss tells you which sockets exist and what state they are in, while tcpdump tells you what travelled between them, and the two views complement each other in nearly every network incident. It is not a protocol analyser in the full sense; it prints one line per packet and leaves deep dissection to Wireshark, which is exactly the division of labour you want. And it is not free to run: capturing everything on a busy interface costs CPU and disk, which is why half the flags below exist to narrow what you ask for. What it gives you in exchange is ground truth, and during the worst incidents ground truth is the only currency that spends.

The five flags that matter

Like every venerable Unix tool, tcpdump has accumulated decades of options. Five of them carry the daily work, and one belongs in your fingers as a reflex.

Flag	What it does	When you reach for it
`-i any` / `-i eth0`	Which interface to capture on	`any` when you are not sure which path the traffic takes (including loopback); a named interface once you know, for less noise
`-nn`	No reverse DNS, no port-to-service-name lookup	Always. Every invocation. See below.
`'host 10.0.9.55 and port 8080'`	A capture filter — BPF — that drops non-matching packets in the kernel	Any box with real traffic. The filter is the difference between a usable capture and a firehose.
`-w /tmp/cap.pcap`	Write raw packets to a file instead of printing	Capture during the incident, analyse in Wireshark afterwards — the right split almost every time
`-c 500` / `-s 128`	Stop after 500 packets / keep only the first 128 bytes of each	Guard rails so a capture cannot melt the box or fill the disk

The capture filter deserves a closer look because it is its own small language. The primitives are host, net, port, and portrange, optionally prefixed with src or dst to pin a direction, and they combine with and, or, and not. So 'host 10.0.9.55 and port 8080' is the conversation between you and one client on one port; 'tcp port 443 and not host 10.0.0.7' is all TLS traffic except the health checker that would otherwise drown the capture; 'src host 10.0.9.55 and tcp[tcpflags] & tcp-syn != 0' is only the SYNs arriving from one machine. Quote the whole expression so the shell does not eat the operators. These filters compile to BPF bytecode and run inside the kernel, which matters for reasons covered further down — the short version is that a filtered capture discards unwanted packets before they ever cross into userspace, and an unfiltered one does not.
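As a rough illustration of how those primitives compose, here is a small Python sketch that assembles a filter expression and a shell-safe command line. The helper names (`build_filter`, `tcpdump_command`) and their parameters are invented for this example; only the filter syntax itself comes from the text above.

```python
import shlex

def build_filter(host=None, port=None, proto=None, exclude_hosts=()):
    """Join BPF primitives with 'and'. Hypothetical helper, not part of tcpdump."""
    parts = []
    if host:
        parts.append(f"host {host}")
    if port is not None:
        # "tcp port 443" pins the protocol; a bare "port 8080" matches TCP and UDP.
        parts.append(f"{proto} port {port}" if proto else f"port {port}")
    elif proto:
        parts.append(proto)
    for excluded in exclude_hosts:
        parts.append(f"not host {excluded}")
    return " and ".join(parts)

def tcpdump_command(expr, interface="any", count=200):
    """Render a command line with the filter quoted as one shell argument."""
    argv = ["tcpdump", "-nn", "-i", interface, "-c", str(count), expr]
    return " ".join(shlex.quote(arg) for arg in argv)

print(tcpdump_command(build_filter(host="10.0.9.55", port=8080)))
print(build_filter(port=443, proto="tcp", exclude_hosts=["10.0.0.7"]))
```

`shlex.quote` wraps the expression in single quotes, which is the same protection the advice about quoting gives you at an interactive prompt.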

Why -nn is non-negotiable. Without it, tcpdump tries to resolve every IP address to a hostname and every port number to a service name as it prints. On a box with a slow resolver that turns a live capture into a stuttering mess, and even when DNS is healthy, the lookups generate their own DNS packets, which land in any capture whose filter does not exclude them, polluting the evidence with traffic the investigation itself created. -nn prints raw numbers immediately and adds nothing to the wire. Raw numbers are what you want during an incident anyway.

The guard rails earn a sentence each. -c puts a hard ceiling on how many packets the capture collects, so it stops on its own once matching traffic reaches that count and a forgotten capture cannot grow without bound. -s (snaplen) limits how many bytes of each packet are kept; headers live in the first hundred bytes or so, so -s 128 keeps everything you need for a connection-level investigation while cutting the disk and CPU cost of a payload-heavy capture by roughly an order of magnitude. When you genuinely need full payloads, -s 0 means "the whole packet"; just know that you asked for it.
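The order-of-magnitude claim can be checked with a little arithmetic. The sketch below assumes the classic pcap file layout (a 24-byte file header plus a 16-byte record header per packet) and a made-up workload of 500 full-size 1514-byte Ethernet frames; the function name is illustrative.

```python
PCAP_GLOBAL_HEADER = 24   # bytes, written once per file (classic pcap format)
PCAP_RECORD_HEADER = 16   # bytes, written once per captured packet

def pcap_bytes(packet_sizes, snaplen):
    """Estimate the size of a pcap file for the given frame sizes and snaplen."""
    stored = sum(min(size, snaplen) + PCAP_RECORD_HEADER for size in packet_sizes)
    return PCAP_GLOBAL_HEADER + stored

frames = [1514] * 500                        # hypothetical payload-heavy traffic
full = pcap_bytes(frames, snaplen=262144)    # effectively "the whole packet"
trimmed = pcap_bytes(frames, snaplen=128)    # headers only
print(full, trimmed, round(full / trimmed, 1))  # 765024 72024 10.6
```

With smaller packets the saving shrinks, since a 60-byte ACK is stored whole either way.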

Reading the output

Here is the opening of a healthy connection: a client at 10.0.9.55 connecting to a service on 10.0.4.12:8080. Three packets, one line each. This is the TCP three-way handshake, live, and learning to read these three lines fluently is most of the value of this page.

$ sudo tcpdump -nn -i eth0 'host 10.0.9.55 and port 8080'
15:42:01.118332 IP 10.0.9.55.49210 > 10.0.4.12.8080: Flags [S], seq 3920841302, win 64240, options [mss 1460,sackOK,TS val 318411 ecr 0,nop,wscale 7], length 0
15:42:01.118401 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [S.], seq 1882093411, ack 3920841303, win 65160, options [mss 1460,sackOK,TS val 902117 ecr 318411,nop,wscale 7], length 0
15:42:01.119012 IP 10.0.9.55.49210 > 10.0.4.12.8080: Flags [.], ack 1, win 502, length 0

Take the first line apart field by field. 15:42:01.118332 is the kernel's timestamp, microsecond precision, and the deltas between timestamps are often the entire finding — they are how you measure latency that no log records. IP names the network-layer protocol. Then the addressing: 10.0.9.55.49210 > 10.0.4.12.8080 reads as source address and port, an arrow, destination address and port. The arrow is the direction of travel, and watching it flip between lines is how you follow a conversation.

Flags [S] is the TCP flag field, compressed into bracketed letters: S is SYN, F is FIN, R is RST, P is PSH, and — the one everyone trips on — the dot . is ACK. So [S] is a bare SYN, [S.] is SYN plus ACK, [.] is a pure acknowledgement, [P.] is data being pushed with an ACK riding along, and [F.] and [R.] are FIN and RST with their ACKs. Once the dot stops looking like punctuation and starts looking like a flag, the output stops looking like noise.
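If the letter code has not sunk in yet, a tiny decoder can help. This sketch covers only the five letters named above; `decode_flags` is a hypothetical helper, and anything it does not recognise is passed through with a question mark rather than guessed at.

```python
FLAG_NAMES = {"S": "SYN", "F": "FIN", "R": "RST", "P": "PSH", ".": "ACK"}

def decode_flags(field):
    """Turn a tcpdump flag field such as '[S.]' into a list of flag names."""
    letters = field.strip("[]")
    if letters == "none":          # assumed spelling for a segment with no flags set
        return []
    return [FLAG_NAMES.get(ch, "?" + ch) for ch in letters]

for sample in ("[S]", "[S.]", "[.]", "[P.]", "[F.]", "[R.]"):
    print(sample, decode_flags(sample))
```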

The rest of the line: seq 3920841302 is the client's initial sequence number, the random starting point for its byte numbering. win 64240 is the receive window: how much the sender of this packet is currently willing to accept, which is the number to watch when you suspect a slow consumer. The options list is the capability negotiation: mss 1460 caps segment size, wscale 7 tells the peer to multiply this sender's later window values by 2^7 = 128 (the window in the SYN itself is not scaled), and sackOK enables selective acknowledgement. And length 0 is the payload length: zero, because handshake packets carry no data. Line two answers with [S.] and ack 3920841303, exactly the client's sequence number plus one, the proof the SYN arrived. Line three completes the handshake, and notice it says ack 1, not ack 1882093412: after the handshake, tcpdump switches to relative sequence numbers so you can read byte counts directly. The mechanics behind all of this (why sequence numbers are randomised, what the window really regulates) live in the TCP page, and you can step through the exchange packet by packet in the handshake simulator.
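Two bits of arithmetic in those three lines are worth doing once by hand. The sketch below uses the numbers from the capture above; the function names are illustrative. It checks the "sequence number plus one" rule for the SYN-ACK and shows why the third packet's win 502 is not a collapse from 64240: once the handshake is over, the window field is scaled by the sender's wscale.

```python
def expected_ack(initial_seq):
    """A SYN consumes one sequence number; sequence space wraps at 2**32."""
    return (initial_seq + 1) % 2**32

def scaled_window(win_field, wscale):
    """After the handshake, the real window is the field shifted left by wscale."""
    return win_field << wscale

print(expected_ack(3920841302))   # 3920841303, the ack in the [S.] line
print(expected_ack(1882093411))   # 1882093412, shown by tcpdump as relative "ack 1"
print(scaled_window(502, 7))      # 64256 bytes, close to the 64240 offered in the SYN
```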

The handshake as tcpdump sees it. The server is committed after sending [S.]; the client after sending the final [.]. The timestamps on the left are the same ones in the capture above.

A refusal, on the wire

Now the unhappy paths, because those are what you are usually capturing for. Here is what "connection refused" looks like as packets — a client trying port 9090, where nothing is listening:

15:43:10.220154 IP 10.0.9.55.49444 > 10.0.4.12.9090: Flags [S], seq 199174521, win 64240, options [mss 1460,sackOK,TS val 318999 ecr 0,nop,wscale 7], length 0
15:43:10.220199 IP 10.0.4.12.9090 > 10.0.9.55.49444: Flags [R.], seq 0, ack 199174522, win 0, length 0

The SYN arrives, and 45 microseconds later the server's kernel answers with [R.] — RST plus ACK. The ack number proves it saw the SYN; the RST says no socket wants it. That near-zero turnaround is itself a clue: a RST this fast comes from a kernel with no listener on the port, not from an application deciding anything. The client's connect() fails with ECONNREFUSED, and now you know exactly why and exactly who said so. Compare that with a SYN that gets no answer at all — no [S.], no [R.], just the client retrying — which means a firewall dropping packets silently, or the SYN never arriving. Refused and dropped look identical in most application logs ("could not connect"); on the wire they look nothing alike.
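Timestamp deltas like that 45 microseconds are easy to compute rather than eyeball. A minimal sketch, assuming both timestamps fall on the same day (tcpdump's default format carries no date); `delta_us` is a name made up for this example.

```python
from datetime import datetime, timedelta

def delta_us(earlier, later):
    """Microseconds between two tcpdump timestamps of the form HH:MM:SS.ffffff."""
    fmt = "%H:%M:%S.%f"
    gap = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return gap // timedelta(microseconds=1)

print(delta_us("15:43:10.220154", "15:43:10.220199"))  # 45: SYN to RST in the refusal
print(delta_us("15:42:01.118332", "15:42:01.118401"))  # 69: SYN to SYN-ACK in the healthy handshake
```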

A retransmission

The third pattern worth recognising on sight is the retransmission. TCP resends anything that is not acknowledged in time, doubling the wait between attempts:

15:44:02.101533 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448
15:44:02.309214 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448
15:44:02.725611 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448

Same direction, same seq 1:1449, same length, and the gaps between attempts roughly doubling — 208 ms, then 416 ms. That repeated sequence range is the signature: the sender is reoffering the same bytes because no ACK came back. One retransmission in a capture is weather. A steady drizzle of them is packet loss, and the direction of the retransmitted packets tells you which leg of the path is losing — data retransmitted toward the client means the loss is somewhere between you and them, on either the data's path out or the ACKs' path back.
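Spotting a repeated sequence range is mechanical enough to script. The sketch below is one possible approach, not a replacement for Wireshark's analysis: it parses the text format shown above with a regular expression written for these sample lines (other tcpdump output shapes, such as `-i any` captures, would need a different pattern) and reports any source, destination, and seq range seen more than once, with the gaps in microseconds.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Pattern fitted to the sample lines in this section; an assumption, not a full parser.
LINE = re.compile(
    r"^(?P<ts>\d\d:\d\d:\d\d\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+:\d+)"
)

def find_retransmissions(lines):
    """Map (src, dst, seq range) to the gaps in microseconds between repeats."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if m:
            stamp = datetime.strptime(m["ts"], "%H:%M:%S.%f")
            seen[(m["src"], m["dst"], m["seq"])].append(stamp)
    return {
        key: [(b - a) // timedelta(microseconds=1) for a, b in zip(stamps, stamps[1:])]
        for key, stamps in seen.items()
        if len(stamps) > 1
    }

capture = [
    "15:44:02.101533 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448",
    "15:44:02.309214 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448",
    "15:44:02.725611 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448",
]
print(find_retransmissions(capture))
```

On the three lines above it reports gaps of 207681 and 416397 microseconds, the doubling described in the text.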

Three production scenarios

"Is the request even reaching us?"

A partner team reports that calls to your API time out. Your access log has no trace of them, but your access log only records requests that completed enough to be logged. The question to settle first, before anyone theorises, is whether their packets reach your host at all. Start a narrow capture on the server and have them retry:

$ sudo tcpdump -nn -i any -c 50 'host 203.0.113.40 and tcp port 443'
tcpdump: listening on any, link-type LINUX_SLL2
16:02:11.402113 eth0  In  IP 203.0.113.40.51820 > 10.0.4.12.443: Flags [S], seq 884220197, win 64240, length 0
16:02:11.402167 eth0  Out IP 10.0.4.12.443 > 203.0.113.40.51820: Flags [S.], seq 71022381, ack 884220198, win 65160, length 0
16:02:12.431808 eth0  In  IP 203.0.113.40.51820 > 10.0.4.12.443: Flags [S], seq 884220197, win 64240, length 0
16:02:12.431861 eth0  Out IP 10.0.4.12.443 > 203.0.113.40.51820: Flags [S.], seq 71022381, ack 884220198, win 65160, length 0

This capture is worth a thousand meetings. The SYN arrives — so the path inbound works. Your host answers with [S.] — so your side is healthy and willing. But then the client sends the same SYN again, one second later, same sequence number: it never received your reply. The handshake's third packet never comes. Conclusion, with evidence: the problem is on the return path — asymmetric routing, a NAT device dropping the reply, a stateful firewall that saw only half the conversation. Without the capture, this incident is two teams pointing at each other's logs. With it, the fault domain shrinks to "between my NIC and yours, in one direction," and the network team has something concrete to chase. Each outcome of this capture means something different: SYNs arriving and being answered (look at the return path), SYNs arriving and met with RSTs (look at your listener), no SYNs at all (the traffic dies before your host — DNS, routing, an upstream firewall, or it was never sent).
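The decision table in that last sentence can be written down as code, which makes a handy checklist. This is a sketch under simplifying assumptions: the input is a hand-built list of (direction, flags) pairs as seen on the server, the labels are invented for this example, and the "SYN seen but the host stayed silent" case borrows the local-firewall explanation given later on this page.

```python
def classify(events):
    """events: (direction, flags) pairs from a server-side capture, e.g. ("In", "S")."""
    syn_in = any(d == "In" and f == "S" for d, f in events)
    synack_out = any(d == "Out" and f == "S." for d, f in events)
    rst_out = any(d == "Out" and f.startswith("R") for d, f in events)
    ack_in = any(d == "In" and f == "." for d, f in events)

    if not syn_in:
        return "upstream"        # traffic dies before this host, or was never sent
    if rst_out:
        return "listener"        # the host refused: nothing is listening on the port
    if synack_out and ack_in:
        return "healthy"         # handshake completed; look higher up the stack
    if synack_out:
        return "return-path"     # replies leave, but the client never hears them
    return "local-firewall"      # SYN arrived, host said nothing at all

observed = [("In", "S"), ("Out", "S."), ("In", "S"), ("Out", "S.")]
print(classify(observed))  # return-path
```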

Who sent the RST?

Connections through a particular path keep dying mid-flight with "connection reset by peer." Both endpoints deny sending resets, and both can be telling the truth, because middleboxes — stateful firewalls, NAT gateways, load balancers — inject RSTs of their own, typically when an idle connection outlives their state-table timeout. The way to settle it is to capture at both ends at once and compare:

on the server:  16:30:44.118 IP 10.0.9.55.50112 > 10.0.4.12.8080: Flags [R], seq 8841, win 0, length 0
on the client:  (no outbound RST at 16:30:44 — the client never sent it)

The server received a RST that claims to come from the client, but the client's own capture shows it never sent one. Something between them forged it, stamping the client's address on a packet the client never made. That is the fingerprint of a middlebox clearing out a connection it considers dead, common with long-lived, mostly-idle connections like database pools and message-queue consumers sitting behind a firewall with a five-minute idle timeout. Details can give the forger away too: a bare [R] with no ACK where the endpoint's own stack would normally send [R.] when aborting an established connection, a TTL that does not match the client's other packets, sequence numbers that sit oddly against the conversation. None of these is proof alone, since real stacks also send bare resets in some situations, which is why the two-ended comparison is the decisive evidence. The fix is usually TCP keepalives set below the middlebox timeout, or a longer timeout on the device. But the diagnosis has to come first, and only captures from both ends can make it.
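For the keepalive fix, here is a minimal sketch of what "set below the middlebox timeout" can look like on a client socket. The values are assumptions chosen to sit under the five-minute timeout mentioned above: first probe after 240 seconds idle, then every 30 seconds, giving up after 4 unanswered probes. The three tuning options are Linux socket options, so the sketch skips them where Python does not expose them.

```python
import socket

def enable_keepalive(sock, idle=240, interval=30, probes=4):
    """Turn on TCP keepalives; idle/interval are in seconds (illustrative defaults)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning is platform-specific; these names exist on Linux.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock

conn = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(conn.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
conn.close()
```

Many database drivers and message-queue clients expose their own keepalive settings, and those are usually the better place to configure this than raw socket options.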

Small responses fine, large responses hang

The strangest-looking one, and a classic. Health checks pass. Small API responses return instantly. Any response larger than a couple of kilobytes hangs forever. The application team finds nothing, because there is nothing to find at their layer — this is a path MTU blackhole, and it lives in the IP layer.

16:51:08.114210 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [P.], seq 1:1429, ack 517, win 501, length 1428
16:51:08.114391 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [.], seq 1429:2877, ack 517, win 501, length 1448
16:51:08.339102 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [.], seq 1429:2877, ack 517, win 501, length 1448
16:51:08.787442 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [.], seq 1429:2877, ack 517, win 501, length 1448

Read the pattern: the 1428-byte segment goes out once and is never retried, so it evidently got through. The full-size 1448-byte segment is retransmitted over and over and never acknowledged. Somewhere on the path sits a link with an MTU smaller than the packet: a VPN tunnel, an overlay network, anything that adds encapsulation headers. With path MTU discovery on, which is the Linux default, TCP sets the don't-fragment bit on its segments, so the router at the narrow link cannot fragment the packet; it is supposed to drop it and send back an ICMP "fragmentation needed" message telling the sender to use smaller packets. When a firewall somewhere blocks that ICMP (and firewalls that block all ICMP are depressingly common) the sender never learns. It just retransmits the same too-big packet into the same hole until the connection times out. Big packets die, small ones sail through, and every layer above sees only an unexplained hang. The capture turns "it hangs sometimes, we cannot reproduce it" into "full-size 1448-byte segments are lost on this path while 1428-byte ones get through," which is a sentence a network engineer can act on the same hour.
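The payload lengths in that capture translate directly into IP packet sizes, which is what the narrow link actually cares about. The arithmetic below assumes IPv4 with no IP options and a TCP header carrying only the timestamp option (12 bytes with padding), which is what a connection negotiated like the handshake earlier on this page would use; the function names are illustrative.

```python
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12   # 10 bytes of option plus 2 bytes of padding

def ip_packet_size(payload, tcp_options=TCP_TIMESTAMP_OPTION):
    """Size of the IP packet that carries a TCP payload of the given length."""
    return IPV4_HEADER + TCP_HEADER + tcp_options + payload

def largest_payload(path_mtu, tcp_options=TCP_TIMESTAMP_OPTION):
    """Biggest TCP payload that fits through a link with the given MTU."""
    return path_mtu - IPV4_HEADER - TCP_HEADER - tcp_options

print(ip_packet_size(1448))   # 1500: the segment that keeps vanishing
print(ip_packet_size(1428))   # 1480: the segment that got through
print(largest_payload(1400))  # 1348: what would fit through a hypothetical 1400-byte tunnel
```

Under those assumptions the capture brackets the path MTU: at least 1480 bytes, and less than 1500.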

Where the tap actually sits

Knowing where tcpdump gets its copy of each packet keeps you from misreading captures. On Linux the tool opens an AF_PACKET socket, which asks the kernel for a copy of frames at the point where the device driver hands traffic to the network stack — just after the NIC and driver on the way in, just before them on the way out.

The receive path. tcpdump's copy is taken right after the driver, so a packet visible in the capture but invisible to the application points at something in between — usually local firewall rules.

Two consequences fall out of that placement. First, on the inbound side the tap fires before the local firewall: a packet that iptables or nftables later drops still shows up in your capture. "I can see the SYN in tcpdump but the application never gets a connection" therefore does not clear the host — check the local firewall rules before blaming the network. (Outbound, it is roughly the mirror image: the tap sees packets after the local rules have had their say.) Second, the tap hands over copies of frames just as the driver presents them, which is why captures sometimes show "impossible" 64 KB TCP packets: with offload features enabled, the NIC and driver merge segments before the tap sees them inbound and split them after it outbound, so you are watching what the kernel handled, not literally the bytes on the wire. For connection-level work this rarely matters; for byte-precise analysis it does, and turning offloads off with ethtool is the standard move.

The second piece of machinery is BPF, and it is the reason the capture filter is not just a convenience. The filter expression you pass compiles into a small bytecode program that the kernel runs against every candidate packet at the tap. Matching packets get copied to tcpdump; everything else is discarded right there, cheaply, without crossing into userspace at all. A filtered capture on a 10 Gb interface is routine. An unfiltered one asks the kernel to copy every frame on that interface into a userspace process, which is how captures end up dropping the very packets you needed. The "packets dropped by kernel" count that tcpdump prints at exit is the receipt. This is the same BPF that, in its extended form, runs much of modern Linux observability; the capture filter is its original job.

Which leads to the last habit worth forming: for anything beyond a quick look, capture with -w and analyse afterwards. A pcap file keeps the raw packets, so Wireshark can reassemble streams, graph round-trip times, flag every retransmission and zero-window stall, and let you re-ask questions you did not think of during the incident. Squinting at a scrolling terminal commits you to noticing everything in real time, once. The honest division of labour: tcpdump is the capture tool you can rely on finding on a bare production box, and Wireshark is the analysis tool on your laptop. tcpdump -r file.pcap reads a capture back when the laptop is not an option, and you can even apply a new filter on the way: tcpdump -nn -r file.pcap 'tcp port 443'.
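A pcap file is also simple enough to sanity-check without any capture tooling at all. The sketch below reads only the 24-byte file header of a classic microsecond-resolution pcap, which is what tcpdump -w writes by default, and reports the snaplen and link type it was captured with. `read_pcap_header` is a hypothetical helper; nanosecond-resolution and pcapng files are deliberately rejected rather than handled.

```python
import struct

def read_pcap_header(path):
    """Parse the file header of a classic pcap; raises ValueError on anything else."""
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24:
        raise ValueError("file too short to be a pcap")
    magic = raw[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # 0xa1b2c3d4 written little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # same magic written big-endian
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap")
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", raw[4:]
    )
    return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}

# Usage, assuming a capture was written earlier with -w /tmp/cap.pcap:
# print(read_pcap_header("/tmp/cap.pcap"))
```

A link type of 1 is Ethernet, the EN10MB that tcpdump prints when it opens the file.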

Pitfalls

Capturing without a filter on a busy box. The most expensive mistake. Unfiltered, every packet on the interface gets copied to userspace, the kernel starts dropping capture copies under load, and with -w and no -c the pcap grows as fast as the traffic arrives: a saturated 10 Gb/s link carries about 1.25 GB every second, so a gigabyte of capture takes less than a second. Always give it a filter, and on anything production-shaped, a -c ceiling or a snaplen too. Check the drop counters the tool prints when it exits; a capture with significant kernel drops is evidence with pages missing.

Forgetting -nn. Covered above, but it bites differently here than with other tools: the DNS lookups tcpdump performs show up as new packets on the wire, so an unfiltered, name-resolving capture literally contaminates its own crime scene. Make -nn a reflex.

Expecting to read TLS payloads. Most traffic worth debugging is encrypted, and tcpdump does not change that: the payload bytes in a TLS connection are ciphertext, full stop. What you still get is everything around the payload, and that is more than it sounds: connection establishment and teardown, the TLS handshake itself (including the server name in the ClientHello on most deployments), timing of every packet, sizes, retransmissions, resets, window stalls. Whole classes of incidents — all three scenarios above, in fact — are diagnosable without reading one byte of plaintext. For the payload, go to the endpoints: application logs, or curl with -v reproducing the request from a box you control.

Forgetting it needs root. Opening a packet socket requires CAP_NET_RAW, so unprivileged tcpdump fails immediately with a permissions error. Unlike some tools, at least it fails loudly rather than showing you a misleading subset. Inside a container, the same rule applies to the container's capabilities — and remember that a container capture sees the container's network namespace, which on an overlay network may not be the traffic the host's NIC sees.

Treating captures casually. A pcap is a recording of other people's traffic. Even with TLS, it holds IPs, ports, hostnames from DNS and SNI, timing, and volume — and any unencrypted protocol's full payload. That can fall under the same handling rules as production data: capture the minimum (filters and snaplen again, doing compliance work this time), store pcaps like secrets, delete them when the incident closes, and know your organisation's policy before shipping one to a vendor or attaching it to a ticket. The habit of narrow, short, deliberately scoped captures keeps you fast and keeps you out of awkward conversations.

A drill you can run right now

Everything below stays on your own machine: a throwaway web server on loopback, your own requests, one pcap file in /tmp. Ten minutes, and the handshake stops being a diagram and becomes something you have watched happen.

Step 1 — start a server and a capture. In one terminal, start a local web server; in another, point a capture at loopback with a port filter and a packet ceiling:

terminal 1 $ python3 -m http.server 8000
terminal 2 $ sudo tcpdump -nn -i lo -c 20 'tcp port 8000'
tcpdump: listening on lo, link-type EN10MB

Step 2 — make one request and read the handshake. In a third terminal, run curl -s http://127.0.0.1:8000/ > /dev/null and watch the capture terminal. The first three lines are the handshake from the diagram above: find the [S], the [S.] with its ack equal to the client's seq plus one, and the closing [.]. Then comes the request as a [P.] with a nonzero length — and since this is plain HTTP on loopback, that length is your GET request, visible because nothing encrypted it. Watch the teardown too: [F.] from each side, each acknowledged. One curl, one complete TCP lifetime, maybe ten packets end to end.

Step 3 — see a refusal. Re-run the capture with 'tcp port 8001', then run curl http://127.0.0.1:8001/ (port 8001, where nothing listens; leave off -s here, because it silences curl's error message as well as its progress meter). One [S] out, one [R.] straight back, and curl reports "connection refused." You have now seen ECONNREFUSED as packets, which is the version of it that settles arguments.
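The same refusal can be provoked from code, which is useful when you want a capture running while a script does the knocking. A minimal sketch using Python's standard socket module; `probe` and its return labels are invented for this example, and port 8001 is assumed to be unused on your machine, as in the drill.

```python
import socket

def probe(host, port, timeout=2.0):
    """Attempt one TCP connection and report how the attempt ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # handshake completed: [S], [S.], [.]
    except ConnectionRefusedError:
        return "refused"         # the kernel answered the SYN with [R.]
    except socket.timeout:
        return "no answer"       # SYN retries went unanswered: dropped, not refused

if __name__ == "__main__":
    print(probe("127.0.0.1", 8001))
```

The three outcomes correspond to the three wire patterns described earlier: a completed handshake, an immediate reset, and silence.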

Step 4 — write a pcap and read it back. The capture-now-analyse-later workflow, on training wheels:

$ sudo tcpdump -nn -i lo -c 20 -w /tmp/drill.pcap 'tcp port 8000'
(run the curl in another terminal, twice if needed, until tcpdump prints "20 packets captured"; Ctrl-C also ends the capture cleanly)
$ tcpdump -nn -r /tmp/drill.pcap | head -3
reading from file /tmp/drill.pcap, link-type EN10MB
17:20:43.551208 IP 127.0.0.1.52114 > 127.0.0.1.8000: Flags [S], seq 1014522871, win 65495, length 0
17:20:43.551219 IP 127.0.0.1.8000 > 127.0.0.1.52114: Flags [S.], seq 3300114532, ack 1014522872, win 65483, length 0
17:20:43.551231 IP 127.0.0.1.52114 > 127.0.0.1.8000: Flags [.], ack 1, win 512, length 0
$ tcpdump -nn -r /tmp/drill.pcap 'tcp[tcpflags] & tcp-syn != 0'

Notice that reading the file back needs no sudo (the privilege was for the tap, not the file) and that the last command applies a fresh filter to an existing capture, pulling out just the packets with the SYN bit set: the [S] and the [S.] of each handshake. Open the same file in Wireshark if you have it nearby and click through the packets you just read as text; that round trip, terminal to GUI and back, is the whole workflow you will use on a real incident. Clean up by stopping the server with Ctrl-C in terminal 1 and deleting the pcap.

If you remember one line. sudo tcpdump -nn -i any -c 200 'host CLIENT_IP and port PORT' — names off, a ceiling on, and a filter narrowing the world to one conversation. Add -w /tmp/case.pcap when you want Wireshark to do the reading.
