tcpdump
The client swears it sent the request. Your server's access log swears nothing arrived.
The load balancer's metrics say something else again. When the logs disagree, you need a
witness that cannot be wrong, and there is exactly one: the packets themselves.
tcpdump shows you what actually crossed the wire — every SYN, every RST,
every retransmission, timestamped to the microsecond. This page covers the five flags
worth memorising, decodes a TCP handshake field by field, walks three production
incidents, and ends with a drill you can run against a local server without touching
anything that matters.
The question it answers
Every layer of a networked system keeps its own account of what happened, and every account is an interpretation. The application logs what it believes it received, after the runtime, the TLS library, and the HTTP parser have each had a turn. The load balancer logs what it believes it forwarded. The client logs what it believes it sent. Most days these stories agree and nobody checks. On the bad days they diverge: the client reports timeouts, the server reports an idle Tuesday, and the metrics dashboard reports a third reality that matches neither. Now you have a question none of the logs can settle, because every log is a party to the dispute.
tcpdump is the court of final appeal. It does not ask the application what it
saw; it copies the packets themselves as they pass through the kernel, beneath every layer
that keeps its own account. If a SYN reached the host, it is in the capture. If
the server answered with a RST, the capture shows which direction the RST travelled and
what was in it. If the request never arrived at all, the capture is silent, and silence
from a packet capture is evidence in a way that silence from a log never is — a log can be
misconfigured, rotated, sampled, or simply looking at the wrong thing. The capture is the
traffic.
It helps to be precise about what the tool is not. It is not a connection-state viewer;
ss tells you which sockets exist and what state
they are in, while tcpdump tells you what travelled between them, and the two
views complement each other in nearly every network incident. It is not a protocol
analyser in the full sense; it prints one line per packet and leaves deep dissection to
Wireshark, which is exactly the division of labour you want. And it is not free to run:
capturing everything on a busy interface costs CPU and disk, which is why half the flags
below exist to narrow what you ask for. What it gives you in exchange is ground truth, and
during the worst incidents ground truth is the only currency that spends.
The five flags that matter
Like every venerable Unix tool, tcpdump has accumulated decades of options.
Five of them carry the daily work, and one belongs in your fingers as a reflex.
| Flag | What it does | When you reach for it |
|---|---|---|
| -i any / -i eth0 | Which interface to capture on | any when you are not sure which path the traffic takes (including loopback); a named interface once you know, for less noise |
| -nn | No reverse DNS, no port-to-service-name lookup | Always. Every invocation. See below. |
| 'host 10.0.9.55 and port 8080' | A capture filter (BPF) that drops non-matching packets in the kernel | Any box with real traffic. The filter is the difference between a usable capture and a firehose. |
| -w /tmp/cap.pcap | Write raw packets to a file instead of printing | Capture during the incident, analyse in Wireshark afterwards: the right split almost every time |
| -c 500 / -s 128 | Stop after 500 packets / keep only the first 128 bytes of each | Guard rails so a capture cannot melt the box or fill the disk |
The capture filter deserves a closer look because it is its own small language. The
primitives are host, net, port, and
portrange, optionally prefixed with src or dst to
pin a direction, and they combine with and, or, and
not. So 'host 10.0.9.55 and port 8080' is the conversation
between you and one client on one port; 'tcp port 443 and not host 10.0.0.7'
is all TLS traffic except the health checker that would otherwise drown the capture;
'src host 10.0.9.55 and tcp[tcpflags] & tcp-syn != 0' is every packet from one
machine with the SYN bit set: its connection attempts, plus any SYN-ACKs it sends as a
server. Quote the whole expression so the shell does not eat the
operators. These filters compile to BPF bytecode and run inside the kernel, which matters
for reasons covered further down — the short version is that a filtered capture discards
unwanted packets before they ever cross into userspace, and an unfiltered one does not.
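Quoting gets harder when a script or runbook tool launches the capture instead of a person. The sketch below assumes you build the expression from parts in Python. build_filter and tcpdump_argv are hypothetical helpers, not part of tcpdump, and they cover only the host, port, and not primitives described above.

```python
import shlex

def build_filter(host=None, port=None, proto="tcp", exclude_hosts=()):
    """Assemble a capture-filter expression from optional parts (hypothetical helper)."""
    terms = []
    if host:
        terms.append(f"host {host}")
    if port is not None:
        # 'tcp port 443' matches TCP only; a bare 'port 443' also matches UDP.
        terms.append(f"{proto} port {port}" if proto else f"port {port}")
    for excluded in exclude_hosts:
        terms.append(f"not host {excluded}")
    return " and ".join(terms)

def tcpdump_argv(expr, interface="any", count=500):
    """An argv list with the guard rails on; the whole filter travels as ONE argument."""
    return ["tcpdump", "-nn", "-i", interface, "-c", str(count), expr]

expr = build_filter(port=443, exclude_hosts=["10.0.0.7"])
print(expr)                              # tcp port 443 and not host 10.0.0.7
print(shlex.join(tcpdump_argv(expr)))    # quoted once, ready to paste into a shell
```

You can pass the argv list straight to subprocess.run. That never involves a shell, so nothing can eat the operators. shlex.join is only needed when a human wants a line to paste.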
By default, tcpdump tries to resolve every IP address to a hostname and every port number
to a service name as it prints. On a box with a slow resolver that turns a live capture
into a stuttering mess. Even when DNS is healthy, the lookups generate their own DNS
packets, and those packets appear in any capture not filtered to exclude them, polluting
the evidence with traffic the investigation itself created. -nn prints raw numbers
immediately and adds nothing to the wire. Raw numbers are what you want during an incident anyway.
The guard rails earn a sentence each. -c caps the number of packets captured, so tcpdump
exits on its own once it has enough, and you can start a capture without worrying about
forgetting it. -s (snaplen) limits how many bytes of each packet are kept. Headers live in
the first hundred bytes or so, so -s 128 keeps everything you need for a
connection-level investigation while cutting the disk and CPU cost of a payload-heavy
capture by an order of magnitude. When you genuinely need full payloads, -s 0
means "the whole packet". Modern tcpdump already keeps whole packets by default, which is
exactly why the snaplen guard rail has to be asked for.
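The arithmetic behind "the first hundred bytes or so" is quick to check. This sketch assumes Ethernet framing and IPv4; a -i any capture on Linux uses a larger "cooked" link header, and IPv6's fixed header is 40 bytes rather than 20. The option sizes are the standard TCP option lengths. The sketch also exposes the one caveat: only a packet with both option areas completely full, which is rare, would spill past 128 bytes.

```python
ETHERNET = 14   # Ethernet II link header (assumption; other link types differ)

def header_bytes(ip_options=0, tcp_options=0, link=ETHERNET):
    """Bytes of link + IPv4 + TCP header in one packet."""
    ipv4 = 20 + ip_options   # 20 fixed bytes, up to 40 bytes of options
    tcp = 20 + tcp_options   # 20 fixed bytes, up to 40 bytes of options
    return link + ipv4 + tcp

cases = {
    # mss 4 + sackOK 2 + TS 10 + nop 1 + wscale 3, as in the SYN decoded below
    "SYN": header_bytes(tcp_options=20),
    # established segments carry only the timestamp option, padded to 12
    "data segment": header_bytes(tcp_options=12),
    # both option areas full: rare, but possible
    "worst case": header_bytes(ip_options=40, tcp_options=40),
}
for name, size in cases.items():
    print(f"{name:>12}: {size:3d} header bytes, fits in -s 128: {size <= 128}")

# A full 1514-byte frame versus what -s 128 keeps: roughly 12x less to copy.
print(f"reduction: {1514 / 128:.1f}x")
```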
Reading the output
Here is the opening of a healthy connection: a client at 10.0.9.55 connecting
to a service on 10.0.4.12:8080. Three packets, one line each. This is the TCP
three-way handshake, live, and learning to read these three lines fluently is most of the
value of this page.
$ sudo tcpdump -nn -i eth0 'host 10.0.9.55 and port 8080'
15:42:01.118332 IP 10.0.9.55.49210 > 10.0.4.12.8080: Flags [S], seq 3920841302, win 64240, options [mss 1460,sackOK,TS val 318411 ecr 0,nop,wscale 7], length 0
15:42:01.118401 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [S.], seq 1882093411, ack 3920841303, win 65160, options [mss 1460,sackOK,TS val 902117 ecr 318411,nop,wscale 7], length 0
15:42:01.119012 IP 10.0.9.55.49210 > 10.0.4.12.8080: Flags [.], ack 1, win 502, length 0
Take the first line apart field by field. 15:42:01.118332 is the kernel's
timestamp, microsecond precision, and the deltas between timestamps are often the entire
finding — they are how you measure latency that no log records. IP names the
network-layer protocol. Then the addressing: 10.0.9.55.49210 > 10.0.4.12.8080
reads as source address and port, an arrow, destination address and port. The arrow is the
direction of travel, and watching it flip between lines is how you follow a conversation.
Flags [S] is the TCP flag field, compressed into bracketed letters:
S is SYN, F is FIN, R is RST, P is
PSH, and — the one everyone trips on — the dot . is ACK. So [S]
is a bare SYN, [S.] is SYN plus ACK, [.] is a pure
acknowledgement, [P.] is data being pushed with an ACK riding along, and
[F.] and [R.] are FIN and RST with their ACKs. Once the dot
stops looking like punctuation and starts looking like a flag, the output stops looking
like noise.
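The flag field is a lookup table in disguise. tcpdump prints a few more letters for the rarer flags, such as URG. This Python sketch deliberately covers only the five above and fails loudly (KeyError) on anything else.

```python
# tcpdump's single-character TCP flag codes, as listed above.
FLAG_CODES = {"S": "SYN", "F": "FIN", "R": "RST", "P": "PSH", ".": "ACK"}

def decode_flags(field):
    """'[S.]' -> ['SYN', 'ACK']. Unknown letters raise KeyError on purpose."""
    return [FLAG_CODES[ch] for ch in field.strip("[]")]

for field in ["[S]", "[S.]", "[.]", "[P.]", "[F.]", "[R.]"]:
    print(f"{field:5} = {' + '.join(decode_flags(field))}")
```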
The rest of the line: seq 3920841302 is the client's initial sequence number,
the random starting point for its byte numbering. win 64240 is the receive
window — how much the sender of this packet is currently willing to accept, which is the
number to watch when you suspect a slow consumer. The options list is the
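capability negotiation: mss 1460 is the largest segment this side is willing to receive,
wscale 7 means this side's later window values are multiplied by 2^7 = 128 (so the
win 502 on line three is really 64,256 bytes), and sackOK enables selective acknowledgement. And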
length 0 is the payload length: zero, because handshake packets carry no
data. Line two answers with [S.] and ack 3920841303 — exactly
the client's sequence number plus one, the proof the SYN arrived. Line three completes the
handshake, and notice it says ack 1, not ack 1882093412: after
the handshake, tcpdump switches to relative sequence numbers (-S turns this off) so you can read
byte counts directly. The mechanics behind all of this — why sequence numbers are
randomised, what the window really regulates — live in
the TCP page, and you can step
through the exchange packet by packet in the
handshake simulator.
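Reading one line by eye is the skill; reading thousands is a job for a script. Below is a minimal Python parser for the default one-line format shown above. It assumes IPv4, -nn output, and no -v, and the regex was written for these lines rather than as a complete grammar of tcpdump's output. The script checks the ack-equals-seq-plus-one proof, converts the relative ack back to the absolute value on the wire, and measures the gaps between the three packets. Suppose the capture was taken on the server, which the text does not say. Then the 611 µs between SYN-ACK and ACK is roughly one network round trip to the client, and the 69 µs before it is the server's own turnaround.

```python
import re

# One tcpdump -nn line for an IPv4 TCP packet, default (non-verbose) format.
LINE = re.compile(
    r"(?P<ts>\d+:\d+:\d+\.\d+) IP "
    r"(?P<src>[\d.]+)\.(?P<sport>\d+) > (?P<dst>[\d.]+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]*)\]"
    r"(?:, seq (?P<seq>\d+)(?::(?P<seq_end>\d+))?)?"
    r"(?:, ack (?P<ack>\d+))?"
    r"(?:, win (?P<win>\d+))?"
    r"(?:.*, length (?P<length>\d+))?"
)

def to_micros(ts):
    """'15:42:01.118332' -> integer microseconds since midnight (no float rounding)."""
    hms, frac = ts.split(".")
    h, m, s = (int(part) for part in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1_000_000 + int(frac.ljust(6, "0")[:6])

def parse(line):
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    pkt = match.groupdict()
    for key in ("sport", "dport", "seq", "seq_end", "ack", "win", "length"):
        if pkt[key] is not None:
            pkt[key] = int(pkt[key])
    pkt["us"] = to_micros(pkt["ts"])
    return pkt

handshake = [
    "15:42:01.118332 IP 10.0.9.55.49210 > 10.0.4.12.8080: Flags [S], seq 3920841302, "
    "win 64240, options [mss 1460,sackOK,TS val 318411 ecr 0,nop,wscale 7], length 0",
    "15:42:01.118401 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [S.], seq 1882093411, "
    "ack 3920841303, win 65160, options [mss 1460,sackOK,TS val 902117 ecr 318411,"
    "nop,wscale 7], length 0",
    "15:42:01.119012 IP 10.0.9.55.49210 > 10.0.4.12.8080: Flags [.], ack 1, win 502, length 0",
]
syn, synack, ack = (parse(line) for line in handshake)

assert synack["ack"] == syn["seq"] + 1       # the SYN-ACK acknowledges the client's ISN + 1
print("SYN -> SYN-ACK:", synack["us"] - syn["us"], "us")    # 69 us
print("SYN-ACK -> ACK:", ack["us"] - synack["us"], "us")    # 611 us
# "ack 1" is relative to the server's ISN; add the ISN back to get the wire value.
print("absolute ack:", synack["seq"] + ack["ack"])          # 1882093412
print("client window:", ack["win"] * 2**7, "bytes (win 502, wscale 7)")
```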
A refusal, on the wire
Now the unhappy paths, because those are what you are usually capturing for. Here is what "connection refused" looks like as packets — a client trying port 9090, where nothing is listening:
15:43:10.220154 IP 10.0.9.55.49444 > 10.0.4.12.9090: Flags [S], seq 199174521, win 64240, options [mss 1460,sackOK,TS val 318999 ecr 0,nop,wscale 7], length 0
15:43:10.220199 IP 10.0.4.12.9090 > 10.0.9.55.49444: Flags [R.], seq 0, ack 199174522, win 0, length 0
The SYN arrives, and 45 microseconds later the server's kernel answers with
[R.] — RST plus ACK. The ack number proves it saw the SYN; the RST says no
socket wants it. That near-zero turnaround is itself a clue: a RST this fast comes from a
kernel with no listener on the port, not from an application deciding anything. The
client's connect() fails with ECONNREFUSED, and now you know exactly why and
exactly who said so. Compare that with a SYN that gets no answer at all — no
[S.], no [R.], just the client retrying — which means a firewall
dropping packets silently, or the SYN never arriving. Refused and dropped look identical
in most application logs ("could not connect"); on the wire they look nothing alike.
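The distinction between refused and dropped is mechanical enough to script. Below is a Python sketch with classify_attempt as a hypothetical helper; it works on packets already reduced to (time, direction, flags) from a client-side capture. The first call uses the refusal above. The second uses a made-up silently dropped attempt in which the client retries its SYN.

```python
def classify_attempt(packets):
    """Classify one connection attempt from a client-side capture (hypothetical helper).

    packets: (time_us, direction, flags) tuples in capture order, where direction
    is "out" (client to server) or "in" (server to client) and flags is tcpdump's
    flag field without brackets, e.g. "S", "S." or "R.".
    Returns (verdict, microseconds from first SYN to the answer, or None).
    """
    syn_times = [t for t, d, f in packets if d == "out" and f == "S"]
    if not syn_times:
        return "no SYN in capture", None
    for t, d, f in packets:
        if d == "in" and "R" in f:
            return "refused", t - syn_times[0]
        if d == "in" and f == "S.":
            return "accepted", t - syn_times[0]
    # Nothing came back: dropped by a firewall, or the SYN never arrived.
    return f"no answer to {len(syn_times)} SYN(s)", None

# The refusal above (times in us past 15:43:10), then a hypothetical silent drop.
print(classify_attempt([(220_154, "out", "S"), (220_199, "in", "R.")]))
# ('refused', 45)
print(classify_attempt([(0, "out", "S"), (1_000_000, "out", "S"), (3_000_000, "out", "S")]))
# ('no answer to 3 SYN(s)', None)
```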
A retransmission
The third pattern worth recognising on sight is the retransmission. TCP resends anything that is not acknowledged in time, doubling the wait between attempts:
15:44:02.101533 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448
15:44:02.309214 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448
15:44:02.725611 IP 10.0.4.12.8080 > 10.0.9.55.49210: Flags [P.], seq 1:1449, ack 738, win 501, length 1448
Same direction, same seq 1:1449, same length, and the gaps between attempts
roughly doubling — 208 ms, then 416 ms. That repeated sequence range is the signature: the
sender is reoffering the same bytes because no ACK came back. One retransmission in a
capture is weather. A steady drizzle of them is packet loss, and the direction of the
retransmitted packets tells you which leg of the path is losing — data retransmitted
toward the client means the loss is somewhere between you and them, on either the data's
path out or the ACKs' path back.
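The signature described here is "same direction, same sequence range, sent again". Grouping on exactly those fields finds retransmissions. The Python sketch below assumes the segments have already been reduced to (time, source, destination, sequence range) tuples. find_retransmissions is a hypothetical helper, and the input is the three lines above.

```python
from collections import defaultdict

def find_retransmissions(segments):
    """Group data segments by direction and sequence range; repeats are retransmissions.

    segments: (time_us, src, dst, seq_range) tuples, with seq_range exactly as tcpdump
    prints it (e.g. "1:1449"). Returns {(src, dst, seq_range): [gap_ms, ...]} for every
    range that was sent more than once.
    """
    sent = defaultdict(list)
    for t, src, dst, seq_range in segments:
        sent[(src, dst, seq_range)].append(t)
    return {
        key: [(later - earlier) / 1000 for earlier, later in zip(times, times[1:])]
        for key, times in sent.items()
        if len(times) > 1
    }

# The three lines above, reduced to the fields that matter (us past 15:44:02).
flow = ("10.0.4.12.8080", "10.0.9.55.49210")
segments = [(101_533, *flow, "1:1449"), (309_214, *flow, "1:1449"), (725_611, *flow, "1:1449")]
for (src, dst, seq_range), gaps in find_retransmissions(segments).items():
    print(f"{src} > {dst} seq {seq_range}: {len(gaps)} retransmissions, gaps {gaps} ms")
```

The gaps come out as 207.681 ms and 416.397 ms. Their ratio is almost exactly 2, which is the doubling backoff the text describes.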
Three production scenarios
"Is the request even reaching us?"
A partner team reports that calls to your API time out. Your access log has no trace of them, but your access log only records requests that completed enough to be logged. The question to settle first, before anyone theorises, is whether their packets reach your host at all. Start a narrow capture on the server and have them retry:
$ sudo tcpdump -nn -i any -c 50 'host 203.0.113.40 and tcp port 443'
tcpdump: listening on any, link-type LINUX_SLL2
16:02:11.402113 eth0 In IP 203.0.113.40.51820 > 10.0.4.12.443: Flags [S], seq 884220197, win 64240, length 0
16:02:11.402167 eth0 Out IP 10.0.4.12.443 > 203.0.113.40.51820: Flags [S.], seq 71022381, ack 884220198, win 65160, length 0
16:02:12.431808 eth0 In IP 203.0.113.40.51820 > 10.0.4.12.443: Flags [S], seq 884220197, win 64240, length 0
16:02:12.431861 eth0 Out IP 10.0.4.12.443 > 203.0.113.40.51820: Flags [S.], seq 71022381, ack 884220198, win 65160, length 0
This capture is worth a thousand meetings. The SYN arrives — so the path inbound works.
Your host answers with [S.] — so your side is healthy and willing. But then
the client sends the same SYN again, one second later, same sequence number: it
never received your reply. The handshake's third packet never comes. Conclusion, with
evidence: the problem is on the return path — asymmetric routing, a NAT device dropping
the reply, a stateful firewall that saw only half the conversation. Without the capture,
this incident is two teams pointing at each other's logs. With it, the fault domain shrinks
to "between my NIC and yours, in one direction," and the network team has something
concrete to chase. Each outcome of this capture means something different: SYNs arriving
and being answered (look at the return path), SYNs arriving and met with RSTs (look at
your listener), no SYNs at all (the traffic dies before your host — DNS, routing, an
upstream firewall, or it was never sent).
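The three outcomes can be written down as a small decision function. triage is a hypothetical Python helper that encodes the reasoning above. It adds two branches the paragraph does not spell out: a completed handshake, and a SYN that arrived but went unanswered. The second relies on the point made under "Where the tap actually sits": the tap sees inbound packets before the local firewall does.

```python
def triage(syn_in, synack_out, rst_out, final_ack_in):
    """Where to look next, given what a server-side capture showed (hypothetical helper).

    Each argument is a bool: did the capture contain that packet for the attempt?
    """
    if not syn_in:
        return "before your host: DNS, routing, an upstream firewall, or never sent"
    if rst_out:
        return "your listener: the kernel is refusing the port"
    if synack_out and not final_ack_in:
        return "the return path: the client never received the SYN-ACK"
    if synack_out:
        return "above TCP: the handshake completed"
    return "this host: the SYN arrived but went unanswered (check the local firewall)"

# The capture above: SYNs in, SYN-ACKs out, no RST, no third packet.
print(triage(syn_in=True, synack_out=True, rst_out=False, final_ack_in=False))
# the return path: the client never received the SYN-ACK
```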
Who sent the RST?
Connections through a particular path keep dying mid-flight with "connection reset by peer." Both endpoints deny sending resets, and both can be telling the truth, because middleboxes — stateful firewalls, NAT gateways, load balancers — inject RSTs of their own, typically when an idle connection outlives their state-table timeout. The way to settle it is to capture at both ends at once and compare:
on the server:
16:30:44.118 IP 10.0.9.55.50112 > 10.0.4.12.8080: Flags [R], seq 8841, win 0, length 0
on the client:
(no outbound RST at 16:30:44; the client never sent it)
The server received a RST that claims to come from the client, but the client's own
capture shows it never sent one. Something between them forged it, stamping the client's
address on a packet the client never made. That is the fingerprint of a middlebox clearing
out a connection it considers dead — common with long-lived, mostly-idle connections like
database pools and message-queue consumers sitting behind a firewall with a five-minute
idle timeout. Details give the forger away too: a suspiciously bare [R] with
no ACK where the real stack would send [R.], a TTL that does not match the
client's other packets, sequence numbers that sit oddly against the conversation. The fix
is usually TCP keepalives set below the middlebox timeout, or a longer timeout on the
device. But the diagnosis has to come first, and only captures from both ends can make it.
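Here is a minimal Python sketch of the keepalive fix. It assumes a Linux client and, for illustration, a five-minute (300 s) middlebox idle timeout like the one described above. Enabling SO_KEEPALIVE alone is rarely enough, because Linux waits two hours before the first probe (net.ipv4.tcp_keepalive_time defaults to 7200 s); the per-socket idle timer has to come down as well. The values 240/30/4 are illustrative, not recommendations.

```python
import socket

MIDDLEBOX_IDLE_TIMEOUT = 300   # seconds; assumed, e.g. a five-minute firewall timeout

def enable_keepalive(sock, idle=240, interval=30, probes=4):
    """Send TCP keepalive probes after `idle` seconds of silence on this socket."""
    if idle >= MIDDLEBOX_IDLE_TIMEOUT:
        raise ValueError("the first probe must go out before the middlebox forgets the flow")
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timers are platform-specific; these three names exist on Linux.
    for name, value in (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print("SO_KEEPALIVE on:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
```

Each probe is real traffic on the flow, so it resets the middlebox's idle timer before that timer can expire.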
Small responses fine, large responses hang
The strangest-looking one, and a classic. Health checks pass. Small API responses return instantly. Any response larger than a couple of kilobytes hangs forever. The application team finds nothing, because there is nothing to find at their layer — this is a path MTU blackhole, and it lives in the IP layer.
16:51:08.114210 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [P.], seq 1:1429, ack 517, win 501, length 1428
16:51:08.114391 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [.], seq 1429:2877, ack 517, win 501, length 1448
16:51:08.339102 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [.], seq 1429:2877, ack 517, win 501, length 1448
16:51:08.787442 IP 10.0.4.12.443 > 198.51.100.23.55818: Flags [.], seq 1429:2877, ack 517, win 501, length 1448
Read the pattern: the 1428-byte segment goes out once and is never retried, so it was acknowledged. The full-size 1448-byte segment is retransmitted over and over and never acknowledged. Somewhere on the path sits a link with an MTU smaller than the packet — a VPN tunnel, an overlay network, anything that adds encapsulation headers. TCP sets the don't-fragment bit on its segments, so the router at the narrow link cannot fragment the packet; it is supposed to drop it and send back an ICMP "fragmentation needed" message telling the sender to use smaller packets. When a firewall somewhere blocks that ICMP — and firewalls that block all ICMP are depressingly common — the sender never learns. It just retransmits the same too-big packet into the same hole, forever. Big packets die, small ones sail through, and every layer above sees only an unexplained hang. The capture turns "it hangs sometimes, we cannot reproduce it" into "every segment over 1428 bytes is lost on this path," which is a sentence a network engineer can act on the same hour.
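The sizes in this capture are not arbitrary, and a few lines of arithmetic show what they imply. The Python sketch below assumes IPv4 with no IP options and the 12-byte timestamp option on every segment (the handshake negotiated TS). Under those assumptions, the segment that got through rode in a 1480-byte packet, and the one that keeps dying is a full 1500-byte packet. The narrowest link therefore sits somewhere between the two.

```python
IPV4_HEADER = 20       # assumption: IPv4 without IP options
TCP_HEADER = 20
TIMESTAMP_OPTION = 12  # TS option padded to 12 bytes, present on every segment here

def packet_size(payload):
    """IPv4 packet size for a TCP segment carrying `payload` bytes of data."""
    return IPV4_HEADER + TCP_HEADER + TIMESTAMP_OPTION + payload

def max_payload(mtu):
    """Largest TCP payload that fits one unfragmented IPv4 packet at this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - TIMESTAMP_OPTION

delivered, lost = packet_size(1428), packet_size(1448)
print(delivered, lost)        # 1480 1500
print(max_payload(1500))      # 1448: mss 1460 from the handshake, minus 12 for TS
print(f"narrowest link MTU somewhere in {delivered}..{lost - 1}")
```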
Where the tap actually sits
Knowing where tcpdump gets its copy of each packet keeps you from
misreading captures. On Linux the tool opens an AF_PACKET socket, which asks
the kernel for a copy of frames at the point where the device driver hands traffic to the
network stack — just after the NIC and driver on the way in, just before them on the way
out.
Two consequences fall out of that placement. First, on the inbound side the tap fires
before the local firewall: a packet that iptables or nftables later drops still shows up
in your capture. "I can see the SYN in tcpdump but the application never gets a
connection" therefore does not clear the host — check the local firewall rules before
blaming the network. (Outbound, it is roughly the mirror image: the tap sees packets
after the local rules have had their say.) Second, the tap hands over copies of frames
just as the driver presents them, which is why captures sometimes show "impossible"
64 KB TCP packets: with offload features enabled, the NIC and driver merge segments
before the tap sees them inbound and split them after it outbound, so you are watching
what the kernel handled, not literally the
bytes on the wire.
For connection-level work this rarely matters; for byte-precise analysis it does, and
turning offloads off with ethtool is the standard move.
The second piece of machinery is BPF, and it is the reason the capture filter is not just
a convenience. The filter expression you pass compiles into a small bytecode program that
the kernel runs against every candidate packet at the tap. Matching packets get
copied to tcpdump; everything else is discarded right there, cheaply, without
crossing into userspace at all. A filtered capture on a 10 Gb interface is routine. An
unfiltered one asks the kernel to copy every frame on that interface into a userspace
process, which is how captures end up dropping the very packets you needed — the
packets dropped by kernel count tcpdump prints at exit is the receipt. This is
the same BPF that, in its extended form, runs much of modern Linux observability; the
capture filter is its original job.
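The per-packet test a filter compiles into is plain byte arithmetic. In pcap-filter(7), tcpflags names the flags byte of the TCP header, at offset 13, and tcp-syn names that byte's SYN bit, 0x02. So 'tcp[tcpflags] & tcp-syn != 0' reads "byte 13 of the TCP header has the SYN bit set". The Python sketch below runs the same check on hand-packed 20-byte headers built from the handshake above. It mirrors the arithmetic only, not how the kernel executes BPF bytecode.

```python
import struct

TCPFLAGS = 13   # byte offset of the flags field in the TCP header
TCP_SYN = 0x02
TCP_ACK = 0x10

def tcp_header(sport, dport, seq, ack, flags, window):
    """Pack a bare 20-byte TCP header (no options, checksum left at zero)."""
    data_offset = 5 << 4   # header length in 32-bit words, in the high nibble
    return struct.pack("!HHIIBBHHH", sport, dport, seq, ack,
                       data_offset, flags, window, 0, 0)

def matches_syn_filter(tcp):
    """The same test as the capture filter 'tcp[tcpflags] & tcp-syn != 0'."""
    return (tcp[TCPFLAGS] & TCP_SYN) != 0

syn = tcp_header(49210, 8080, 3920841302, 0, TCP_SYN, 64240)
synack = tcp_header(8080, 49210, 1882093411, 3920841303, TCP_SYN | TCP_ACK, 65160)
ack = tcp_header(49210, 8080, 3920841303, 1882093412, TCP_ACK, 502)
print([matches_syn_filter(h) for h in (syn, synack, ack)])   # [True, True, False]
```

The SYN-ACK matches too, because it carries the SYN bit. If you want bare SYNs only, 'tcp[tcpflags] == tcp-syn' compares the whole byte instead of one bit.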
Which leads to the last habit worth forming: for anything beyond a quick look, capture
with -w and analyse afterwards. A pcap file keeps the raw packets, so
Wireshark can reassemble streams, graph round-trip times, flag every retransmission and
zero-window stall, and let you re-ask questions you did not think of during the incident.
Squinting at a scrolling terminal commits you to noticing everything in real time, once.
The honest division of labour: tcpdump is the capture tool you can rely on
finding on a bare production box, and Wireshark is the analysis tool on your laptop.
tcpdump -r file.pcap reads a capture back when the laptop is not an option,
and you can even apply a new filter on the way: tcpdump -nn -r file.pcap 'tcp port 443'.
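Sometimes neither Wireshark nor a pcap library is available, and reading the file format yourself is the only option. Here is a Python sketch of a reader. It assumes the file uses the classic libpcap format, which is what tcpdump -w writes; the newer pcapng format is a different, block-based layout and is not handled. read_pcap is a hypothetical helper. It reports, per packet, how many bytes were kept versus how long the packet really was, which shows whether a snaplen clipped the payload.

```python
import struct

# Magic numbers of the classic libpcap format, read little-endian.
MAGIC = {
    0xA1B2C3D4: ("<", False), 0xD4C3B2A1: (">", False),   # microsecond timestamps
    0xA1B23C4D: ("<", True),  0x4D3CB2A1: (">", True),    # nanosecond timestamps
}

def read_pcap(data):
    """Parse a classic pcap file held in memory.

    Returns (linktype, snaplen, packets) where each packet is
    (timestamp_us, captured_len, original_len, frame_bytes). pcapng is not handled.
    """
    magic = struct.unpack("<I", data[:4])[0]
    if magic not in MAGIC:
        raise ValueError("not a classic pcap file")
    endian, nanos = MAGIC[magic]
    snaplen, linktype = struct.unpack(endian + "II", data[16:24])
    packets, offset = [], 24
    while offset + 16 <= len(data):
        sec, frac, incl, orig = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        stamp = sec * 1_000_000 + (frac // 1000 if nanos else frac)
        packets.append((stamp, incl, orig, data[offset:offset + incl]))
        offset += incl
    return linktype, snaplen, packets

# A tiny capture built in memory: Ethernet link type (1), snaplen 128, and one
# 1514-byte frame of which only the first 128 bytes were kept.
fake = (struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 128, 1)
        + struct.pack("<IIII", 1_700_000_000, 551_208, 128, 1514) + bytes(128))
linktype, snaplen, packets = read_pcap(fake)
for stamp, incl, orig, frame in packets:
    print(f"kept {incl} of {orig} bytes; truncated by snaplen: {incl < orig}")
```

On a real file, pass the bytes in: with open("/tmp/drill.pcap", "rb") as f: read_pcap(f.read()).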
Pitfalls
Capturing without a filter on a busy box. The most expensive mistake.
Unfiltered, every packet on the interface gets copied to userspace, the kernel starts
dropping capture copies under load, and with -w and no -c the
pcap grows at line rate: a saturated 10 Gb interface moves about 1.25 GB every second, so a
gigabyte of capture can pile up in under a second. Always give it a filter, and on anything production-shaped, a -c
ceiling or a snaplen too. Check the drop counters the tool prints when it exits; a capture
with significant kernel drops is evidence with pages missing.
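The arithmetic is worth doing once. The rough Python sketch below assumes a 10 Gb link saturated with full-size 1514-byte frames, plus the 16-byte record header the classic pcap format adds to every packet. Real traffic mixes packet sizes, so treat the output as an order of magnitude, not a forecast.

```python
def capture_rate(link_gbps, snaplen=None, frame=1514, record_header=16):
    """Rough pcap bytes written per second if the link runs flat out with full-size frames.

    Ignores Ethernet preamble and inter-frame gap, so it slightly overstates the rate.
    Every pcap record carries a 16-byte header on top of the bytes it keeps.
    """
    frames_per_second = link_gbps * 1e9 / 8 / frame
    kept = frame if snaplen is None else min(snaplen, frame)
    return frames_per_second * (kept + record_header)

print(f"whole packets: {capture_rate(10) / 1e9:.2f} GB/s")              # 1.26 GB/s
print(f"-s 128:        {capture_rate(10, snaplen=128) / 1e6:.0f} MB/s")  # 119 MB/s
```

Even trimmed to 128 bytes, an unfiltered capture of a busy link still writes on the order of 100 MB every second. That is why the filter, not the snaplen, is the first guard rail.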
Forgetting -nn. Covered above, but it bites differently here than with
other tools: the DNS lookups tcpdump performs show up as new packets on the wire, so an
unfiltered, name-resolving capture literally contaminates its own crime scene. Make
-nn a reflex.
Expecting to read TLS payloads. Most traffic worth debugging is
encrypted, and tcpdump does not change that: the payload bytes in a TLS
connection are ciphertext, full stop. What you still get is everything around the
payload, and that is more than it sounds: connection establishment and teardown, the TLS
handshake itself (including the server name in the ClientHello on most deployments),
timing of every packet, sizes, retransmissions, resets, window stalls. Whole classes of
incidents — all three scenarios above, in fact — are diagnosable without reading one byte
of plaintext. For the payload, go to the endpoints: application logs, or
curl with -v reproducing the request from a
box you control.
Forgetting it needs root. Opening a packet socket requires
CAP_NET_RAW, so unprivileged tcpdump fails immediately with a
permissions error. Unlike some tools, at least it fails loudly rather than showing you a
misleading subset. Inside a container, the same rule applies to the container's
capabilities — and remember that a container capture sees the container's network
namespace, which on an overlay network may not be the traffic the host's NIC sees.
Treating captures casually. A pcap is a recording of other people's traffic. Even with TLS, it holds IPs, ports, hostnames from DNS and SNI, timing, and volume — and any unencrypted protocol's full payload. That can fall under the same handling rules as production data: capture the minimum (filters and snaplen again, doing compliance work this time), store pcaps like secrets, delete them when the incident closes, and know your organisation's policy before shipping one to a vendor or attaching it to a ticket. The habit of narrow, short, deliberately scoped captures keeps you fast and keeps you out of awkward conversations.
A drill you can run right now
Everything below stays on your own machine: a throwaway web server on loopback, your own
requests, one pcap file in /tmp. Ten minutes, and the handshake stops being a
diagram and becomes something you have watched happen.
Step 1 — start a server and a capture. In one terminal, start a local web server; in another, point a capture at loopback with a port filter and a packet ceiling:
terminal 1
$ python3 -m http.server 8000
terminal 2
$ sudo tcpdump -nn -i lo -c 20 'tcp port 8000'
tcpdump: listening on lo, link-type EN10MB
Step 2 — make one request and read the handshake. In a third terminal,
run curl -s http://127.0.0.1:8000/ > /dev/null and watch the capture
terminal. The first three lines are the handshake decoded above: find the
[S], the [S.] with its ack equal to the client's
seq plus one, and the closing [.]. Then comes the request as a
[P.] with a nonzero length — and since this is plain HTTP on
loopback, that length is your GET request, visible because nothing encrypted it. Watch the
teardown too: [F.] from each side, each acknowledged. One curl, one complete
TCP lifetime, maybe ten packets end to end.
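Step 3 — see a refusal. Stop the first capture with Ctrl-C (one curl will not have filled
its 20-packet ceiling) and restart it with 'tcp port 8001' in place of 'tcp port 8000'.
Then run curl -sS http://127.0.0.1:8001/ against port 8001, where nothing listens; -S keeps
the error message that -s alone would swallow. One [S] out, one [R.] straight back, and
curl fails with its error 7 ("Failed to connect ..."). You have now seen ECONNREFUSED as packets, which is the version of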
it that settles arguments.
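You can also watch the refusal from the application side by calling connect() directly instead of going through curl. This Python sketch uses only the standard socket module and assumes nothing listens on 127.0.0.1:8001, the drill's port. With the step 3 capture running, it should produce the same [S] and [R.] pair. The timeout branch is what a silent drop, the other case from earlier, looks like to an application.

```python
import errno
import socket

def try_connect(port, host="127.0.0.1", timeout=2.0):
    """Attempt one TCP connection and report what the kernel answered."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError as exc:      # an RST came back: [S] then [R.]
        return errno.errorcode[exc.errno]      # the symbolic name, ECONNREFUSED
    except socket.timeout:                     # nothing came back at all
        return "timed out"

print(try_connect(8001))
```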
Step 4 — write a pcap and read it back. The capture-now-analyse-later workflow, on training wheels:
$ sudo tcpdump -nn -i lo -c 20 -w /tmp/drill.pcap 'tcp port 8000'
(run the step 2 curl in another terminal a few times, until tcpdump exits with "20 packets captured")
$ tcpdump -nn -r /tmp/drill.pcap | head -3
reading from file /tmp/drill.pcap, link-type EN10MB
17:20:43.551208 IP 127.0.0.1.52114 > 127.0.0.1.8000: Flags [S], seq 1014522871, win 65495, length 0
17:20:43.551219 IP 127.0.0.1.8000 > 127.0.0.1.52114: Flags [S.], seq 3300114532, ack 1014522872, win 65483, length 0
17:20:43.551231 IP 127.0.0.1.52114 > 127.0.0.1.8000: Flags [.], ack 1, win 512, length 0
$ tcpdump -nn -r /tmp/drill.pcap 'tcp[tcpflags] & tcp-syn != 0'
Notice that reading the file back needs no sudo — the privilege was for the
tap, not the file — and that the last command applies a fresh filter to an existing
capture, pulling out just the packets with the SYN bit set, which here means each SYN and its SYN-ACK. Open the same file in Wireshark if you have it nearby
and click through the packets you just read as text; that round trip, terminal to GUI and
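back, is the whole workflow you will use on a real incident. Clean up with
Ctrl-C in the server's terminal and sudo rm /tmp/drill.pcap (depending on how tcpdump was
built, the file may belong to root or to a dedicated tcpdump user rather than to you).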
If you keep one command from this page, keep this one:
sudo tcpdump -nn -i any -c 200 'host CLIENT_IP and port PORT'. Names off, a
ceiling on, and a filter narrowing the world to one conversation. Add
-w /tmp/case.pcap when you want Wireshark to do the reading.
Further reading
- tcpdump(1), the manual page: the OUTPUT FORMAT section is the official decoder ring for everything this page read by hand.
- pcap-filter(7): the full grammar of capture filters, including byte-offset tricks like tcp[tcpflags].
- Julia Evans, "tcpdump is amazing": the best short argument for keeping this tool in your daily rotation.
- Semicolony, TCP: sequence numbers, windows, and retransmission timers, the machinery behind every line decoded above.