nc & mtr
The client times out. Somewhere between it and the server, something is wrong, and "something"
covers the app, the host firewall, a security group, a load balancer, and every router in
between. Two small tools split that haystack in half. nc answers the endpoint
question: can a TCP connection to that exact port be made at all, and what answers when it can?
mtr answers the path question: which hop along the way is losing or delaying
packets, and is the loss even real? This page covers both — the three nc outcomes
and what each one proves, the single most misread number in networking, and a drill you can run
without touching anything that matters.
Two tools, two halves of the question
"The network is broken" is never the actual problem. The actual problem is one of two much smaller things. Either the far endpoint cannot be reached on the port you need — a firewall is dropping the connection, nothing is listening, the service bound to the wrong interface — or the endpoint is fine and the path to it is degraded, with some router along the way losing packets or adding latency. The first is a yes-or-no question about one host and one port. The second is a measurement question about a dozen hops you do not own. They need different tools.
nc, netcat, tests the endpoint. It tries to open a real TCP connection to a real
port and tells you exactly how the attempt ended, and the way it ends is the diagnosis. It also
runs the other direction: it can be a listener, which means you can test a network
path before the application that will use it even exists. mtr tests the path. It
is traceroute that never stops: it probes every router between you and the destination over and
over, accumulating loss and latency statistics per hop, so a problem that flickers for two
seconds out of every sixty still shows up in the numbers.
Together they bracket the diagnosis. If nc connects instantly and
mtr shows a clean path, the network is innocent and the bug is in the application.
If nc times out, the path or a firewall is eating your packets, and
mtr tells you roughly where. This page is the reference for both tools; the
incident-shaped walkthrough that decides which one to reach for first lives in
"Is it the network?"
netcat: the universal socket tool
nc does one thing: it opens a socket and connects it to your terminal. Point it at
a host and port and it dials out; give it -l and it listens. Whatever you type
goes down the wire, whatever arrives comes back up, and when either side closes, it exits.
There is no protocol knowledge in it at all, and that absence is the feature. curl
speaks HTTP, psql speaks the Postgres protocol, and both of them blur the line
between "the network rejected me" and "the server disliked my request." nc speaks
nothing, so when it fails, the failure is purely about reachability — TCP either completed its
handshake or it did not, and no application-layer noise can hide which.
Five uses cover nearly everything you will ever do with it.
| Invocation | What it does | When you reach for it |
|---|---|---|
| nc -vz host 443 | Attempts one connection, reports the outcome, sends no data | "Can I reach that port?": the everyday reachability test |
| nc -l 9000 | Listens on a port and prints whatever arrives | The instant test server: prove a path works before the app exists |
| nc -v host 25 | Connects and stays open, showing whatever the server says first | Banner grabbing: "what is actually answering on this port?" |
| nc -l 9000 > f / nc host 9000 < f | Pipes a file across a raw TCP connection | Moving a file between boxes when scp is not an option |
| nc -vz host 8000-8010 | Tries each port in the range, one connection each | The poor-man's port scan: which of these ports answer at all? |
The flags in the first row are the ones to memorise. -z means zero-I/O: connect,
then immediately close, sending nothing — exactly right for a reachability check, because you
want the handshake's verdict and nothing else. -v makes it say what happened
instead of exiting silently with a status code. Add -w 3 to cap the wait at three
seconds; without it, a filtered port can leave you staring at a silent terminal for a couple of
minutes while the kernel retries its SYNs on a backoff schedule.
The listener deserves a moment, because it is the half of nc people forget exists.
nc -l 9000 turns any box you have a shell on into a server, instantly, with no
code and no config. Connect to it from anywhere and the two terminals become a chat session:
lines typed on one side appear on the other. That sounds like a toy until you realise what it
proves — every firewall, security group, route table, and NAT between the two shells passed
your traffic. The banner grab is the same idea in reverse. Many protocols speak first: SSH
sends its version string, SMTP says 220 and its hostname, Redis answers a typed
PING with +PONG. Connect with plain nc -v and you see
who is really on the port, which matters when the answer is not what the port number implies.
    $ nc -v mail.internal 25
    Connection to mail.internal (10.0.6.30) 25 port [tcp/smtp] succeeded!
    220 mail.internal ESMTP Postfix    <- the server speaks first: that is the banner
    QUIT
    221 2.0.0 Bye
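The banner grab can be sketched in a few lines of Python, which makes it clear that nothing protocol-specific is involved: connect, send nothing, read whatever arrives first. The function name `grab_banner` and its defaults are illustrative assumptions, not part of nc.

```python
import socket


def grab_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Connect, send nothing, and return whatever the server says first."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        try:
            # Blocks until the server speaks or the timeout fires.
            data = conn.recv(1024)
        except (socket.timeout, TimeoutError):
            # Connected, but this protocol waits for the client to speak first.
            return ""
    return data.decode("ascii", errors="replace").strip()
```

An empty string here is not a failure: it only means the handshake completed and the server stayed quiet, which is what a client-speaks-first protocol such as HTTP does.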
The file transfer and the port-range scan are conveniences built from the same two halves.
Receiver listens and redirects to a file; sender connects with the file on stdin; TCP does the
rest. It is unencrypted and unauthenticated, so it belongs on trusted internal networks and
lab machines, not across the internet — but on an airgapped box where scp has
nothing to authenticate against, it has saved more than one migration. The range scan,
nc -vz host 8000-8010, just runs the reachability test once per port and prints a
verdict line for each. It is slow and sequential and that is fine: when you want to know which
of four candidate ports a service actually came up on, you do not need nmap, and on a hardened
production box nmap is usually not installed anyway.
Reading the verdict: three outcomes, three diagnoses
Everything nc -vz can tell you fits in one line of output, and there are only
three lines it will ever print. Learn to read all three, because they are not three shades of
"broken" — each one rules out a different set of suspects, and the difference between the
second and third is the difference between fixing the server and filing a firewall ticket.
    $ nc -vz api.internal 443
    Connection to api.internal (10.0.4.12) 443 port [tcp/https] succeeded!
        handshake completed: route works, firewalls pass, a process is listening

    $ nc -vz api.internal 9000
    nc: connect to api.internal (10.0.4.12) port 9000 (tcp) failed: Connection refused
        the host answered with a RST: it is reachable, but nothing is listening there

    $ nc -vz -w 3 db.internal 5432
    nc: connect to db.internal (10.0.8.20) port 5432 (tcp) failed: Operation timed out
        nothing came back at all: something is silently dropping your packets
Succeeded means the TCP three-way handshake completed. That is a strong statement: a SYN left your machine, crossed every router and firewall in between, found a process in a listening state on that exact port, and the reply made it all the way back. If the application still misbehaves after this, the problem lives above TCP — TLS, auth, the application itself — not in the network.
Connection refused is the one people misread as "the network is blocking me,"
and it means nearly the opposite. Refused means your SYN arrived and the destination
host actively answered "nothing here" with a RST packet. The route works. The firewalls passed
your traffic in both directions. The host is up and its kernel is responding. What is missing
is a listener on that port: the service crashed, came up on a different port, or — the classic
— bound itself to 127.0.0.1 instead of 0.0.0.0, so it exists but
only loopback connections can see it. The next move is on the server, with
ss: ss -ltnp shows what is listening
where, and the mismatch is usually obvious in one glance.
Timed out means nothing came back. No SYN-ACK, no RST, no ICMP error — silence. Your packets are being dropped somewhere, and dropping silently is exactly what firewalls and cloud security groups are configured to do. It can also mean the host is down, or the route is wrong and packets are sailing into a void, but in practice, inside any environment with security groups, "timeout to a host I believe is up" means a filter rule is missing nine times out of ten. One caveat keeps the picture honest: some firewalls are configured to reject rather than drop, sending back a RST or an ICMP unreachable on the host's behalf. That shows up as refused even though the packet never reached the host. It is the less common configuration, but when refused does not make sense — the service is definitely listening, you checked — remember that a middlebox can forge the refusal.
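The same three verdicts can be reproduced from any language that exposes sockets, which is handy on a box with Python but no netcat. The sketch below is a rough equivalent of nc -vz -w 3, not a reimplementation of it; the name `probe` and the returned strings are assumptions chosen to match the wording above.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection, send no data, and name how the attempt ended."""
    try:
        # create_connection completes the three-way handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "succeeded"
    except ConnectionRefusedError:
        # A RST came back: the host (or a rejecting middlebox) answered "nothing here".
        return "refused"
    except (socket.timeout, TimeoutError):
        # Silence until the deadline: the SYN or its reply was dropped somewhere.
        return "timed out"
    except OSError as exc:
        # Anything else, such as a name that does not resolve or an unreachable network.
        return f"error: {exc}"
```

The fourth branch exists because real runs hit failures that are not one of the three TCP verdicts, and folding them into "timed out" would blur the diagnosis.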
Three places nc earns its keep
"Is it the app or the firewall?"
A client cannot reach a service and the argument starts: the app team says the network is
blocking it, the network team says the app is down. nc from both sides settles it
in two minutes. On the server itself, test the service locally:
nc -vz 127.0.0.1 9000, then again against the host's own non-loopback address,
nc -vz 10.0.4.12 9000. From the client, test across the network:
nc -vz 10.0.4.12 9000. Three results, and the pattern reads like a truth table.
All three succeed: nothing is wrong at this layer, look above TCP. Local succeeds but remote
times out: the service is fine and something between the hosts is dropping traffic — firewall
ticket, with evidence attached. Loopback succeeds but the host's own address is refused: the
service bound to 127.0.0.1 only, and no firewall change will ever fix it. All
three refused: the service is not running, and the network was never the problem.
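Written out as code, the truth table is only four rules. This is a sketch of the reasoning in the paragraph above, with each argument assumed to be one of "succeeded", "refused", or "timed out"; the function name and the diagnosis strings are illustrative.

```python
def diagnose(loopback: str, own_address: str, remote: str) -> str:
    """Map three nc verdicts onto the diagnosis they point to.

    loopback    -- result of testing 127.0.0.1 on the server itself
    own_address -- result of testing the server's non-loopback address, from the server
    remote      -- result of testing that same address from the client
    """
    results = (loopback, own_address, remote)
    if all(r == "succeeded" for r in results):
        return "TCP is fine: look above it (TLS, auth, the application)"
    if all(r == "refused" for r in results):
        return "service is not running: the network was never the problem"
    if loopback == "succeeded" and own_address == "refused":
        return "service is bound to 127.0.0.1 only: fix the bind address"
    if loopback == "succeeded" and own_address == "succeeded" and remote == "timed out":
        return "service is fine: something between the hosts is dropping traffic"
    return "pattern not covered by the four cases: investigate further"
```

The fallback line matters in practice: mixed results such as a remote refusal with local success usually point at a rejecting firewall, which the four simple rules do not cover.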
Testing the path before the app exists
New environment, new security groups, new routing — and the service that will run there ships
next week. You do not have to wait for the app to find out whether the network is right. On
the future server, open a throwaway listener: nc -l 9000. From the client side,
nc -vz future-server 9000. If it succeeds, every rule and route between the two
is proven before a single line of application code is deployed; if it times out, you get to
fix the security group this week instead of during the launch. This works because
nc -l is indistinguishable, at the TCP layer, from the real service: a listener
is a listener. When the test is done, Ctrl-C and the listener is gone — nothing to clean up,
nothing left running.
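"A listener is a listener" is easy to verify: the stand-in does not have to be nc at all. Here is a minimal Python sketch of a throwaway listener that accepts one connection and returns what it received. The helper names `open_listener` and `receive_one` are assumptions made for this example.

```python
import socket


def open_listener(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind and listen on a TCP port; from the network's view this is 'the service'."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def receive_one(srv: socket.socket) -> bytes:
    """Accept a single connection and collect everything it sends until it closes."""
    conn, _peer = srv.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # the peer closed its side
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `receive_one(open_listener(9000))` on the future server and running nc -vz from the client exercises the same rules and routes the real service will depend on; a bare reachability test simply yields empty bytes.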
What does the load balancer actually forward?
A load balancer is a black box that claims to pass your traffic through. nc lets
you look at what really comes out the back. Run nc -l 9000 on a backend (or a
stand-in box registered with the LB), send one request through the front door, and read the
raw bytes that arrive: every header the proxy injected, the X-Forwarded-For you
were promised, the PROXY protocol preamble you forgot was enabled and which is why your app's
parser chokes on the first line. You will also see the health checks arriving on their own
schedule, which answers "why does my access log show a request every five seconds" before
anyone asks it.
    $ nc -l 9000
    PROXY TCP4 203.0.113.50 10.0.4.12 49812 9000    <- so THAT is why the parser breaks
    GET /health HTTP/1.1
    Host: 10.0.4.12:9000
    User-Agent: ELB-HealthChecker/2.0
    X-Forwarded-For: 203.0.113.50
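Once the preamble is visible, separating it from the request is mechanical. The sketch below splits a PROXY protocol version 1 line (the human-readable form shown in the capture) from the bytes that follow. It handles only the TCP4 and TCP6 forms, and the names `ProxyHeader` and `split_proxy_v1` are assumptions for this example, not part of any library.

```python
from typing import NamedTuple, Optional, Tuple


class ProxyHeader(NamedTuple):
    family: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def split_proxy_v1(raw: bytes) -> Tuple[Optional[ProxyHeader], bytes]:
    """Return (header, remaining bytes); header is None if no v1 preamble is present."""
    if not raw.startswith(b"PROXY "):
        return None, raw
    line, sep, rest = raw.partition(b"\r\n")
    if not sep:
        return None, raw  # the preamble line is not complete yet
    parts = line.decode("ascii", errors="replace").split(" ")
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        return None, raw  # e.g. "PROXY UNKNOWN": out of scope for this sketch
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    try:
        header = ProxyHeader(family, src_ip, dst_ip, int(src_port), int(dst_port))
    except ValueError:
        return None, raw  # ports were not numeric
    return header, rest
```

Fed the capture above, it yields the client address 203.0.113.50 as the source and leaves the bytes starting at GET for the HTTP parser, which is the split the application was failing to make.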
mtr: the traceroute that keeps measuring
Traceroute shows you the path once: one probe per hop, one snapshot, and a problem that comes
and goes will dodge it more often than not. mtr runs the same trace in a loop.
Every cycle it probes each hop again, and for every router along the way it accumulates a
running scoreboard: what fraction of probes went unanswered (Loss%), how many
were sent (Snt), and the latency distribution — Last,
Avg, Best, Wrst, and StDev. Run it bare
(mtr host) and you get a live, continuously updating screen, which is the right
mode for watching an intermittent problem happen. Run it as mtr --report -c 100
host and it sends a fixed hundred cycles, prints a static table, and exits — the right
mode for evidence, because a report you can paste into a ticket is worth ten screenshots of a
flickering terminal.
Here is a report from a box whose users are complaining, with the two patterns that matter
planted in it. Read the Loss% column from top to bottom before reading anything
else.
    $ mtr --report -c 100 api.example.com
    HOST: build-runner-3             Loss%   Snt   Last    Avg   Best   Wrst  StDev
      1.|-- 10.0.0.1                  0.0%   100    0.4    0.5    0.3    2.1    0.2
      2.|-- 100.64.12.1               0.0%   100    1.2    1.4    1.0    8.9    0.9
      3.|-- 198.51.100.41             0.0%   100    2.0    2.3    1.8   11.2    1.1
      4.|-- core-7.transit.example   60.0%   100    9.8   10.1    9.0   14.7    1.0   <- scary, and fake
      5.|-- 203.0.113.9               0.0%   100   10.4   10.6    9.8   19.3    1.3   <- the proof it is fake
      6.|-- peer-2.example.net        0.0%   100   11.0   11.5   10.2   24.8    2.0
      7.|-- edge-1.dest.example      12.0%   100   18.9   19.4   17.8   88.2    9.4   <- real loss starts here
      8.|-- 192.0.2.66               11.0%   100   19.2   19.9   18.0   91.5    9.8
      9.|-- api.example.com          12.0%   100   19.5   20.1   18.3   90.7    9.6   <- the only row that measures end to end
Hop 4 reports 60% loss. Hop 5, immediately after it, reports zero. If hop 4 were truly dropping six out of every ten packets, those packets could never have reached hop 5 — loss at a hop has to show up at every hop beyond it, because everything downstream is reached through it. So hop 4's number cannot be describing your traffic. What it describes is hop 4's enthusiasm for answering probes, which is a different thing entirely.
The mechanics: a router forwards transit packets in dedicated hardware, at line rate, without the router's CPU ever seeing them. But answering a traceroute probe means generating an ICMP time-exceeded reply, and that work happens on the router's control-plane CPU — the same modest processor that runs its routing protocols. Every sane router rate-limits and de-prioritises that work, because a router that lets strangers' probes compete with BGP for CPU time is a router waiting to be taken down. So under load, or just by policy, hop 4 ignores most of your probes while forwarding your actual packets flawlessly. The result is the single most misread output in networking: a terrifying loss number at a middle hop, followed by clean hops, reported as an outage in good faith by someone who read the column top to bottom and stopped at the first big number.
Real loss has a different shape. From hop 7 onward, every row shows roughly 12%, all the way down to the destination. That is what genuine packet loss looks like: it begins at the hop where the problem lives and it propagates, because every probe beyond hop 7 has to survive hop 7's drops to get anywhere. The destination row is the anchor for the whole reading. It is the only row that measures the complete round trip your application traffic experiences, so its loss number is the one that is "real" by definition — and the first earlier hop where that same loss level begins tells you where the damage is being done. A one-line rule covers ninety percent of mtr literacy: ignore any loss that does not persist to the final hop.
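The rule is mechanical enough to automate. The sketch below parses the text of an mtr --report run and applies it: take the destination row as the real figure, walk upward while loss persists to find where it begins, and label any earlier lossy hop as probe rationing. The regular expression assumes the report layout shown above, and all names are illustrative.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Matches rows such as " 4.|-- core-7.transit.example 60.0% 100 ..."
ROW = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%")


class Hop(NamedTuple):
    number: int
    host: str
    loss: float


def parse_report(text: str) -> List[Hop]:
    """Extract hop number, host, and Loss% from each row of a report."""
    hops = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            hops.append(Hop(int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def read_loss(hops: List[Hop]) -> Tuple[float, Optional[Hop], List[Hop]]:
    """Return (end-to-end loss, hop where it begins, hops that are only rationing replies)."""
    if not hops:
        raise ValueError("no hop rows found in report")
    end_to_end = hops[-1].loss
    onset = None
    if end_to_end > 0:
        # Walk up from the destination for as long as the loss persists.
        for hop in reversed(hops):
            if hop.loss <= 0:
                break
            onset = hop
    first_real = onset.number if onset else hops[-1].number + 1
    islands = [h for h in hops if h.loss > 0 and h.number < first_real]
    return end_to_end, onset, islands
```

On the report above this gives 12% end to end, an onset at hop 7, and hop 4 as the only island. One limitation of so simple a walk: a rate-limited hop sitting directly above the real onset would be swept into the run, so the result is a pointer for a human reader, not a verdict.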
Three places mtr earns its keep
Blaming the right network segment
Once you can spot where real loss begins, the report becomes an accountability map. The first hop or two belong to you: your host, your switch, your office or VPC gateway. Loss that starts there is your problem and your fix. The next few hops belong to your ISP or cloud provider; persistent loss beginning there goes into a support ticket, with the report pasted in, and the hop names — transit routers usually carry their operator's domain — tell you exactly whom to page. Loss that begins at the final hops sits with the destination: their edge, their load balancer, their problem. One honest caveat belongs in every such ticket: mtr measures the round trip, and the reply packets may come home along a different route than your probes went out. Loss that appears in your report can live on the return path, which your trace never shows. When the destination is a box you control, run mtr from both ends before declaring which direction is broken.
The latency spike that only happens sometimes
"Every few minutes the API gets slow for a second" is the kind of complaint a single
traceroute will never catch. Leave interactive mtr running in a spare terminal
while the complaint reproduces and read the Wrst and StDev columns
instead of Avg. A hop whose average is 11 ms but whose worst case is
480 ms with a fat standard deviation is a hop that queues badly under bursts, and if the
following hops inherit those worst-case numbers, your traffic is sitting in that queue too.
The same island rule applies as with loss: a worst-case spike at one middle hop that the later
hops do not echo is just the router answering probes lazily; a spike that propagates
downstream is a congested link with your name on it.
When ICMP probes are not believable
By default mtr probes with ICMP echo packets, and some networks treat ICMP as a second-class
citizen — shaped, filtered, or routed differently from real traffic. When the report looks
implausible, change what the probes are made of: mtr -u host sends UDP probes the
way classic traceroute does, and mtr -T -P 443 host sends TCP SYNs to port 443,
which makes your probes nearly indistinguishable from genuine HTTPS traffic. The TCP mode is
the one to reach for when a path "looks fine in mtr" but the application still suffers:
routers doing per-flow load balancing may steer ICMP and TCP down different parallel links,
and probing with TCP on the real port is how you measure the path your bytes actually take.
What is underneath: a handshake and an expiring counter
Neither tool is doing anything exotic, and knowing the two mechanisms makes the output
trustworthy. nc rides the TCP three-way handshake. Its connect()
call sends a SYN; a listener answers SYN-ACK and nc completes with an ACK — that
is "succeeded." A reachable kernel with no listener on the port answers RST — "refused." And
a filter that drops the SYN produces silence, which the kernel retries with growing patience
until nc gives up — "timed out." The three verdicts in the decision diagram are
just the three possible fates of one SYN packet. The full state machine, including what
happens after the handshake, lives in TCP, and you can step through the exchange packet by
packet in the handshake simulator.
mtr rides an accident of IP's design. Every IP packet carries a TTL — time to
live — a counter that each router decrements before forwarding. When the counter hits zero,
the router discards the packet and sends back an ICMP time-exceeded message from its own
address. That return address is the whole trick: send a probe with TTL 1 and the first
router identifies itself, TTL 2 and the second does, and so on until a probe survives all the
way and the destination answers directly. The TTL exists to stop routing loops from
circulating packets forever; traceroute and mtr simply discovered that a safety mechanism
doubles as a map-maker. It also explains every quirk in the output: hops appear only if they
bother to send the time-exceeded reply, which is why rate-limited routers fake loss and
silent ones show as ???. If you want to see both mechanisms with your own eyes,
run tcpdump beside either tool: the SYN, the
RST, and the parade of time-exceeded replies are all right there in the capture.
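Sending real TTL-limited probes needs raw sockets, so the sketch below is a toy simulation instead: no packets leave the machine. It walks a hypothetical list of routers, decrements a counter at each one, and records who would answer each probe, including routers that stay silent. Every name in it is an assumption made for illustration.

```python
from typing import FrozenSet, List, Tuple


def simulate_trace(
    path: List[str],
    silent: FrozenSet[str] = frozenset(),
    max_ttl: int = 30,
) -> List[Tuple[int, str, str]]:
    """Simulate probes with TTL 1, 2, 3... along `path`; the last entry is the destination.

    Returns one (ttl, responder, reply type) row per probe.
    """
    if not path:
        raise ValueError("path needs at least a destination")
    routers, destination = path[:-1], path[-1]
    rows = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        expired_at = None
        for router in routers:
            remaining -= 1  # each router decrements the counter before forwarding
            if remaining == 0:
                expired_at = router  # counter hit zero: this router discards the probe
                break
        if expired_at is None:
            rows.append((ttl, destination, "reply"))  # the probe survived the whole path
            break
        if expired_at in silent:
            rows.append((ttl, "???", "no answer"))  # discarded, but no time-exceeded sent
        else:
            rows.append((ttl, expired_at, "time-exceeded"))
    return rows
```

Marking a router as silent changes only its own row: the probes with larger TTLs still pass through it and reach the hops beyond, which is the island behaviour seen in real reports.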
Pitfalls
There are at least four netcats. "nc" on a given box might be OpenBSD netcat
(Debian and Ubuntu's default), traditional netcat or its GNU rewrite, nmap's ncat, or the
minimal applet inside BusyBox, and their flags drift. The listener is the classic trap:
OpenBSD nc listens with nc -l 9000, while traditional netcat wants
nc -l -p 9000 and treats the OpenBSD form as an error (or worse, on some old
builds, silently does something else). Some variants lack -z entirely; BusyBox
nc supports only a handful of flags. The habit that saves you: on an unfamiliar box, run
nc -h first and spend five seconds reading which dialect you have. If you can
choose, ncat behaves the same everywhere nmap is installed, and on machines with
no netcat of any kind, bash's /dev/tcp trick
(echo > /dev/tcp/host/443) does a crude reachability test with no extra binary.
mtr needs raw sockets. Building ICMP probes by hand is privileged work. Most
distributions ship mtr with a setuid helper (mtr-packet) or grant it
cap_net_raw, so it works for ordinary users — but a copy installed by hand, or
running inside a stripped container image, will fail with an error about raw sockets or
simply show no hops. The fix is to run it with sudo, or restore the capability
with setcap cap_net_raw+ep on the helper. If mtr prints nothing useful when run as your own
user but works under sudo, this is why; it is permissions, not the network.
Some hops never answer, and that is fine. A row of ??? or 100%
loss at a middle hop, with clean hops after it, is a router (or a whole network) that filters
ICMP entirely — common inside cloud provider backbones and MPLS cores. The island rule
handles it: if later hops and the destination are clean, the silent hop is forwarding
perfectly and merely declining to introduce itself. The reading only turns bad when the
destination row is silent too. Then you know nothing end to end from this probe
type, and it is time for -T -P <port>, or for falling back to
nc -vz, which needs no cooperation from anything except the final host.
Load-balanced paths smear the picture. Routers commonly split traffic across
parallel links per flow. Probes in different cycles can take different parallel paths, so a
single mtr row may interleave two routers' worth of latency, and a hop can appear to flap
between two addresses. Newer mtr builds vary fields deliberately, and Paris traceroute
exists for exactly this: it keeps the header fields that routers hash on constant, so the
probes of one trace stay on one path. When one hop's numbers look bimodal (a tight cluster
at 10 ms, another at 40 ms), suspect two physical paths before suspecting one sick router.
nc and UDP do not mix the way you hope. nc -uvz host 53 looks
like a UDP port test, but UDP has no handshake, so there is usually nothing to confirm
delivery. Unless the host sends back an ICMP port-unreachable (often filtered) or the service
happens to reply, nc reports success simply because nothing said no. Treat a UDP
"open" verdict as "open or filtered or silently dropped" — for UDP services, the only honest
test is speaking enough of the real protocol to provoke a reply.
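The ambiguity is easy to see from code. The sketch below sends one datagram and reports which of the three things happened: a reply, an ICMP port-unreachable (which the kernel surfaces as a refused connection on a connected UDP socket), or silence. The function name, the default payload, and the verdict strings are assumptions; a real test would send a valid request for the protocol in question.

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"\x00", timeout: float = 2.0) -> str:
    """Send one datagram and classify the response, or the lack of one."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on a UDP socket sends nothing; it lets the kernel report ICMP errors.
        sock.connect((host, port))
        try:
            sock.send(payload)
            sock.recv(4096)
            return "replied"
        except ConnectionRefusedError:
            return "closed"  # an ICMP port-unreachable made it back
        except (socket.timeout, TimeoutError):
            return "open or filtered"  # silence proves nothing either way
```

Only "replied" is evidence that a service is there, and it depends on the payload being something the service is willing to answer.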
A drill you can run right now
Everything below is safe on any machine with a network connection: one connection attempt to a public HTTPS host, a listener on your own loopback, and a twenty-cycle trace to a public resolver. Nothing is scanned, nothing is left running, nothing needs cleanup.
Step 1 — one real handshake. Test a port you know is open on a host built to receive the whole internet:
    $ nc -vz -w 3 www.google.com 443
    Connection to www.google.com (142.250.187.36) 443 port [tcp/https] succeeded!
Read the line back against the decision diagram: a SYN crossed your LAN, your router, your ISP, and some unknowable number of backbone links, found a listener, and the SYN-ACK made it home — all in the time the line took to print. Now aim the same command at a port that host does not serve, say 8443, and watch the three-second wait end in a timeout. Same host, same path, different verdict: that is a filter dropping you silently, and now you have seen what "filtered" feels like compared to "open."
Step 2 — both ends of the wire. Open two terminals. In the first, listen; in the second, test the port twice — once before the listener exists, once while it runs:
    terminal B$ nc -vz 127.0.0.1 9000
    nc: connect to 127.0.0.1 port 9000 (tcp) failed: Connection refused
    terminal A$ nc -l 9000                 <- (or nc -l -p 9000, depending on your netcat)
    terminal B$ nc -vz 127.0.0.1 9000
    Connection to 127.0.0.1 9000 port [tcp/*] succeeded!
    terminal A$ nc -l 9000                 <- start it again: a plain listener exits after its first connection closes
    terminal B$ nc 127.0.0.1 9000          <- now connect for real and type something
    hello from B                           <- it appears in terminal A; type back and it appears here
That refused-then-succeeded pair is the heart of the page performed on your own machine. Same host, same port, no firewall involved: the only variable that changed between the two verdicts is whether a listener existed. When you later see refused in production, your hands will already know what it means. Ctrl-C both sides when you are done and the server you built is gone.
Step 3 — trace a real path. Run a short report to a public resolver and read it with the island rule:
$ mtr --report -c 20 one.one.one.one (sudo mtr ... if raw sockets are refused)
Twenty cycles take about twenty seconds. Then read the Loss% column bottom-up:
start at the destination row, note its number (almost certainly 0% to a host like this), and
treat any louder number above it as a router rationing its replies. Find your own gateway in
hop 1, find the hop where the names change from your ISP's domain to someone else's — that is
a network boundary — and check whether any ??? rows are followed by hops that
answer fine. Ten minutes ago a report like this looked like a wall of numbers; now it reads as
a story with exactly one question: does the loss persist to the bottom?
For nc: refused means the host answered and nothing is listening; timeout means nothing
answered at all. They are different diagnoses, not different severities. For mtr: loss is
only real if it persists to the final hop; everything else is a router declining to chat.
Further reading
- nc(1) — the OpenBSD manual page — the dialect most Linux distributions ship as default; the EXAMPLES section alone is a decent course in the tool.
- mtr — the project page — documentation and source for the tool itself, including the probe modes and report options.
- Richard Steenbergen — A Practical Guide to (Correctly) Troubleshooting with Traceroute (NANOG) — the canonical treatment of ICMP rate limiting, asymmetric return paths, and every other way hop-by-hop output lies to you.
- Semicolony — Is it the network? — the incident walkthrough that decides when these two tools come out of the bag, and in which order.