nc & mtr

The client times out. Somewhere between it and the server, something is wrong, and "something" covers the app, the host firewall, a security group, a load balancer, and every router in between. Two small tools split that haystack in half. nc answers the endpoint question: can a TCP connection to that exact port be made at all, and what answers when it can? mtr answers the path question: which hop along the way is losing or delaying packets, and is the loss even real? This page covers both — the three nc outcomes and what each one proves, the single most misread number in networking, and a drill you can run without touching anything that matters.

Two tools, two halves of the question

"The network is broken" is never the actual problem. The actual problem is one of two much smaller things. Either the far endpoint cannot be reached on the port you need — a firewall is dropping the connection, nothing is listening, the service bound to the wrong interface — or the endpoint is fine and the path to it is degraded, with some router along the way losing packets or adding latency. The first is a yes-or-no question about one host and one port. The second is a measurement question about a dozen hops you do not own. They need different tools.

nc, netcat, tests the endpoint. It tries to open a real TCP connection to a real port and tells you exactly how the attempt ended, and the way it ends is the diagnosis. It also runs the other direction: it can be a listener, which means you can test a network path before the application that will use it even exists. mtr tests the path. It is traceroute that never stops: it probes every router between you and the destination over and over, accumulating loss and latency statistics per hop, so a problem that flickers for two seconds out of every sixty still shows up in the numbers.

Together they bracket the diagnosis. If nc connects instantly and mtr shows a clean path, the network is innocent and the bug is in the application. If nc times out, the path or a firewall is eating your packets, and mtr tells you roughly where. This page is the reference for both tools; the incident-shaped walkthrough that decides which one to reach for first lives in "Is it the network?"

netcat: the universal socket tool

nc does one thing: it opens a socket and connects it to your terminal. Point it at a host and port and it dials out; give it -l and it listens. Whatever you type goes down the wire, whatever arrives comes back up, and when either side closes, it exits. There is no protocol knowledge in it at all, and that absence is the feature. curl speaks HTTP, psql speaks the Postgres protocol, and both of them blur the line between "the network rejected me" and "the server disliked my request." nc speaks nothing, so when it fails, the failure is purely about reachability — TCP either completed its handshake or it did not, and no application-layer noise can hide which.

Five uses cover nearly everything you will ever do with it.

Invocation	What it does	When you reach for it
`nc -vz host 443`	Attempts one connection, reports the outcome, sends no data	"Can I reach that port?" — the everyday reachability test
`nc -l 9000`	Listens on a port and prints whatever arrives	The instant test server: prove a path works before the app exists
`nc -v host 25`	Connects and stays open, showing whatever the server says first	Banner grabbing: "what is actually answering on this port?"
`nc -l 9000 > f` / `nc host 9000 < f`	Pipes a file across a raw TCP connection	Moving a file between boxes when scp is not an option
`nc -vz host 8000-8010`	Tries each port in the range, one connection each	The poor-man's port scan: which of these ports answer at all?

The flags in the first row are the ones to memorise. -z means zero-I/O: connect, then immediately close, sending nothing — exactly right for a reachability check, because you want the handshake's verdict and nothing else. -v makes it say what happened instead of exiting silently with a status code. Add -w 3 to cap the wait at three seconds; without it, a filtered port can leave you staring at a silent terminal for a couple of minutes while the kernel retries its SYNs on a backoff schedule.
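The "couple of minutes" figure can be checked with arithmetic. The sketch below assumes common Linux defaults: an initial retransmission timeout of one second that doubles after each attempt, and six SYN retries (the net.ipv4.tcp_syn_retries sysctl). Both values are assumptions about the host, and the function name is hypothetical.

```python
def syn_wait_seconds(initial_rto: float = 1.0, retries: int = 6) -> float:
    """Total time connect() waits when every SYN is silently dropped.

    The first SYN plus `retries` retransmissions each wait one timeout,
    and the timeout doubles after every attempt.
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(retries + 1))


print(syn_wait_seconds())  # 127.0
```

Under those assumptions an uncapped test against a filtered port takes a little over two minutes to fail, which is the wait that -w 3 cuts short.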

The listener deserves a moment, because it is the half of nc people forget exists. nc -l 9000 turns any box you have a shell on into a server, instantly, with no code and no config. Connect to it from anywhere and the two terminals become a chat session: lines typed on one side appear on the other. That sounds like a toy until you realise what it proves: every firewall, security group, route table, and NAT between the two shells passed your traffic. The banner grab is the same idea in reverse. Many protocols speak first: SSH sends its version string, and SMTP says 220 and its hostname. Others answer as soon as you type at them: Redis replies to a typed PING with +PONG. Connect with plain nc -v and you see who is really on the port, which matters when the answer is not what the port number implies.

$ nc -v mail.internal 25
Connection to mail.internal (10.0.6.30) 25 port [tcp/smtp] succeeded!
220 mail.internal ESMTP Postfix          <- the server speaks first: that is the banner
QUIT
221 2.0.0 Bye
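The same banner grab can be sketched in a few lines of Python, which is handy on a box with an interpreter but no netcat. This is a minimal sketch, not a replacement for nc: the function name grab_banner is hypothetical, and it assumes the server speaks first and sends its greeting in one small chunk.

```python
import socket


def grab_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Connect, send nothing, and return the first line the server volunteers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return ""  # connected, but the server is waiting for the client to speak
    lines = data.decode("latin-1").splitlines()
    return lines[0] if lines else ""
```

An empty string means the handshake worked but nothing was volunteered, which is what a client-speaks-first protocol such as HTTP looks like from here.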

The file transfer and the port-range scan are conveniences built from the same two halves. Receiver listens and redirects to a file; sender connects with the file on stdin; TCP does the rest. It is unencrypted and unauthenticated, so it belongs on trusted internal networks and lab machines, not across the internet — but on an airgapped box where scp has nothing to authenticate against, it has saved more than one migration. The range scan, nc -vz host 8000-8010, just runs the reachability test once per port and prints a verdict line for each. It is slow and sequential and that is fine: when you want to know which of four candidate ports a service actually came up on, you do not need nmap, and on a hardened production box nmap is usually not installed anyway.
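To make "TCP does the rest" concrete, here is a rough Python equivalent of the two halves of the file transfer. It is a sketch under the same caveats as the nc version (no encryption, no authentication); the function names are hypothetical, and the receiver is handed a socket that is already listening.

```python
import socket


def receive_file(listener: socket.socket, dest_path: str) -> int:
    """Accept one connection and write everything it sends to dest_path."""
    conn, _peer = listener.accept()
    total = 0
    with conn, open(dest_path, "wb") as out:
        while chunk := conn.recv(65536):   # an empty read means the sender closed
            out.write(chunk)
            total += len(chunk)
    return total


def send_file(host: str, port: int, src_path: str) -> None:
    """Connect and stream src_path; closing the socket marks end of file."""
    with socket.create_connection((host, port), timeout=10) as sock, \
            open(src_path, "rb") as src:
        while chunk := src.read(65536):
            sock.sendall(chunk)
```

There is no length header and no checksum: the only end-of-file signal is the connection closing, so comparing a hash on both sides afterwards is a sensible habit.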

Reading the verdict: three outcomes, three diagnoses

Everything nc -vz can tell you fits in one line of output, and there are only three lines it will ever print. Learn to read all three, because they are not three shades of "broken" — each one rules out a different set of suspects, and the difference between the second and third is the difference between fixing the server and filing a firewall ticket.

$ nc -vz api.internal 443
Connection to api.internal (10.0.4.12) 443 port [tcp/https] succeeded!
    handshake completed: route works, firewalls pass, a process is listening

$ nc -vz api.internal 9000
nc: connect to api.internal (10.0.4.12) port 9000 (tcp) failed: Connection refused
    the host answered with a RST: it is reachable, but nothing is listening there

$ nc -vz -w 3 db.internal 5432
nc: connect to db.internal (10.0.8.20) port 5432 (tcp) failed: Operation timed out
    nothing came back at all: something is silently dropping your packets

Succeeded means the TCP three-way handshake completed. That is a strong statement: a SYN left your machine, crossed every router and firewall in between, found a process in a listening state on that exact port, and the reply made it all the way back. If the application still misbehaves after this, the problem lives above TCP — TLS, auth, the application itself — not in the network.

Connection refused is the one people misread as "the network is blocking me," and it means nearly the opposite. Refused means your SYN arrived and the destination host actively answered "nothing here" with a RST packet. The route works. The firewalls passed your traffic in both directions. The host is up and its kernel is responding. What is missing is a listener on that port: the service crashed, came up on a different port, or — the classic — bound itself to 127.0.0.1 instead of 0.0.0.0, so it exists but only loopback connections can see it. The next move is on the server, with ss: ss -ltnp shows what is listening where, and the mismatch is usually obvious in one glance.

Timed out means nothing came back. No SYN-ACK, no RST, no ICMP error — silence. Your packets are being dropped somewhere, and dropping silently is exactly what firewalls and cloud security groups are configured to do. It can also mean the host is down, or the route is wrong and packets are sailing into a void, but in practice, inside any environment with security groups, "timeout to a host I believe is up" means a filter rule is missing nine times out of ten. One caveat keeps the picture honest: some firewalls are configured to reject rather than drop, sending back a RST or an ICMP unreachable on the host's behalf. That shows up as refused even though the packet never reached the host. It is the less common configuration, but when refused does not make sense — the service is definitely listening, you checked — remember that a middlebox can forge the refusal.

The three outcomes of a connection attempt. Refused proves the host is reachable; timeout proves nothing except that something, somewhere, ate your packets.
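The three verdicts map directly onto the ways a socket connect can end, so the same classification can be sketched in Python. The function name probe and its return strings are hypothetical; the exceptions are the standard ones the socket module raises.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP handshake, send no data, and name how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "succeeded"      # SYN, SYN-ACK, ACK all completed
    except ConnectionRefusedError:
        return "refused"            # a RST came back: reachable, no listener
    except (socket.timeout, TimeoutError):
        return "timed out"          # silence: dropped somewhere on the way
    except OSError as exc:
        return f"error: {exc}"      # DNS failure, no route, and similar
```

Keeping "refused" and "timed out" as separate return values, rather than a single False, preserves the distinction the section is about.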

Three places nc earns its keep

"Is it the app or the firewall?"

A client cannot reach a service and the argument starts: the app team says the network is blocking it, the network team says the app is down. nc from both sides settles it in two minutes. On the server itself, test the service locally: nc -vz 127.0.0.1 9000, then again against the host's own non-loopback address, nc -vz 10.0.4.12 9000. From the client, test across the network: nc -vz 10.0.4.12 9000. Three results, and the pattern reads like a truth table. All three succeed: nothing is wrong at this layer, look above TCP. Local succeeds but remote times out: the service is fine and something between the hosts is dropping traffic — firewall ticket, with evidence attached. Loopback succeeds but the host's own address is refused: the service bound to 127.0.0.1 only, and no firewall change will ever fix it. All three refused: the service is not running, and the network was never the problem.
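The truth table in that paragraph is small enough to write down as code. This sketch only encodes the four patterns named above; the function name and the wording of the results are hypothetical, and any other combination is deliberately left as "investigate".

```python
def diagnose(loopback: str, own_address: str, remote: str) -> str:
    """Map three probe verdicts to the reading given in the text.

    Each argument is "succeeded", "refused", or "timed out".
    """
    results = (loopback, own_address, remote)
    if all(r == "succeeded" for r in results):
        return "nothing wrong at this layer: look above TCP"
    if all(r == "refused" for r in results):
        return "service is not running"
    if loopback == "succeeded" and own_address == "refused":
        return "service bound to 127.0.0.1 only"
    if loopback == "succeeded" and own_address == "succeeded" and remote == "timed out":
        return "something between the hosts is dropping traffic"
    return "pattern not covered here: investigate further"
```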

Testing the path before the app exists

New environment, new security groups, new routing — and the service that will run there ships next week. You do not have to wait for the app to find out whether the network is right. On the future server, open a throwaway listener: nc -l 9000. From the client side, nc -vz future-server 9000. If it succeeds, every rule and route between the two is proven before a single line of application code is deployed; if it times out, you get to fix the security group this week instead of during the launch. This works because nc -l is indistinguishable, at the TCP layer, from the real service: a listener is a listener. When the test is done, Ctrl-C and the listener is gone — nothing to clean up, nothing left running.

What does the load balancer actually forward?

A load balancer is a black box that claims to pass your traffic through. nc lets you look at what really comes out the back. Run nc -l 9000 on a backend (or a stand-in box registered with the LB), send one request through the front door, and read the raw bytes that arrive: every header the proxy injected, the X-Forwarded-For you were promised, and the PROXY protocol preamble you forgot was enabled, which is why your app's parser chokes on the first line. You will also see the health checks arriving on their own schedule, which answers "why does my access log show a request every five seconds" before anyone asks it.

$ nc -l 9000
PROXY TCP4 203.0.113.50 10.0.4.12 49812 9000   <- so THAT is why the parser breaks
GET /health HTTP/1.1
Host: 10.0.4.12:9000
User-Agent: ELB-HealthChecker/2.0
X-Forwarded-For: 203.0.113.50
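The first line of that capture is what an HTTP parser trips over, so it helps to see how little it takes to recognise it. The sketch below assumes the human-readable version 1 form of the PROXY protocol, as shown in the capture; the names ProxyHeader and parse_proxy_v1 are hypothetical, and the "PROXY UNKNOWN" variant is deliberately not handled.

```python
from typing import NamedTuple, Optional


class ProxyHeader(NamedTuple):
    family: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def parse_proxy_v1(first_line: str) -> Optional[ProxyHeader]:
    """Return the parsed PROXY v1 preamble, or None if the line is not one."""
    parts = first_line.strip().split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        return None
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    try:
        return ProxyHeader(family, src_ip, dst_ip, int(src_port), int(dst_port))
    except ValueError:
        return None  # ports were not numeric, so treat the line as ordinary data
```

A backend that has not been told to expect the preamble will instead try to read this line as an HTTP request line, which is the failure the capture explains.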

mtr: the traceroute that keeps measuring

Traceroute shows you the path once: a handful of probes per hop (three by default), one snapshot, and a problem that comes and goes will dodge it more often than not. mtr runs the same trace in a loop. Every cycle it probes each hop again, and for every router along the way it accumulates a running scoreboard: what fraction of probes went unanswered (Loss%), how many were sent (Snt), and the latency distribution (Last, Avg, Best, Wrst, and StDev). Run it bare (mtr host) and you get a live, continuously updating screen, which is the right mode for watching an intermittent problem happen. Run it as mtr --report -c 100 host and it sends a fixed hundred cycles, prints a static table, and exits. That is the right mode for evidence, because a report you can paste into a ticket is worth ten screenshots of a flickering terminal.

Here is a report from a box whose users are complaining, with the two patterns that matter planted in it. Read the Loss% column from top to bottom before reading anything else.

$ mtr --report -c 100 api.example.com
HOST: build-runner-3             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.0.0.1                  0.0%   100    0.4   0.5   0.3   2.1   0.2
  2.|-- 100.64.12.1               0.0%   100    1.2   1.4   1.0   8.9   0.9
  3.|-- 198.51.100.41             0.0%   100    2.0   2.3   1.8  11.2   1.1
  4.|-- core-7.transit.example   60.0%  100    9.8  10.1   9.0  14.7   1.0   <- scary, and fake
  5.|-- 203.0.113.9               0.0%   100   10.4  10.6   9.8  19.3   1.3   <- the proof it is fake
  6.|-- peer-2.example.net        0.0%   100   11.0  11.5  10.2  24.8   2.0
  7.|-- edge-1.dest.example      12.0%  100   18.9  19.4  17.8  88.2   9.4   <- real loss starts here
  8.|-- 192.0.2.66               11.0%  100   19.2  19.9  18.0  91.5   9.8
  9.|-- api.example.com          12.0%  100   19.5  20.1  18.3  90.7   9.6   <- the only row that measures end to end

Hop 4 reports 60% loss. Hop 5, immediately after it, reports zero. If hop 4 were truly dropping six out of every ten packets, those packets could never have reached hop 5 — loss at a hop has to show up at every hop beyond it, because everything downstream is reached through it. So hop 4's number cannot be describing your traffic. What it describes is hop 4's enthusiasm for answering probes, which is a different thing entirely.

The mechanics: a router forwards transit packets in dedicated hardware, at line rate, without the router's CPU ever seeing them. But answering a traceroute probe means generating an ICMP time-exceeded reply, and that work happens on the router's control-plane CPU — the same modest processor that runs its routing protocols. Every sane router rate-limits and de-prioritises that work, because a router that lets strangers' probes compete with BGP for CPU time is a router waiting to be taken down. So under load, or just by policy, hop 4 ignores most of your probes while forwarding your actual packets flawlessly. The result is the single most misread output in networking: a terrifying loss number at a middle hop, followed by clean hops, reported as an outage in good faith by someone who read the column top to bottom and stopped at the first big number.

Real loss has a different shape. From hop 7 onward, every row shows roughly 12%, all the way down to the destination. That is what genuine packet loss looks like: it begins at the hop where the problem lives and it propagates, because every probe beyond hop 7 has to survive hop 7's drops to get anywhere. The destination row is the anchor for the whole reading. It is the only row that measures the complete round trip your application traffic experiences, so its loss number is the one that is "real" by definition, and the first earlier hop where that same loss level begins tells you where the damage is being done. A one-line rule covers ninety percent of mtr literacy: ignore any loss that does not persist to the final hop. Call it the island rule: a lossy hop with clean hops after it is an island, and islands are noise.

The two loss patterns side by side. Hop 4's 60% is the router declining to answer probes; hop 7's 12% repeats on every later row and is the genuine article.
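The bottom-up reading can be sketched as a small parser over the report text. This is an illustration under stated assumptions, not a feature of mtr: the function names are hypothetical, the regular expression assumes the default --report layout shown above, and "persists" is approximated as "at least half the destination's loss", which is an arbitrary tolerance.

```python
import re
from typing import List, Optional, Tuple

# Matches report rows such as "  4.|-- core-7.transit.example   60.0%  100 ..."
ROW = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+\d+")

Hop = Tuple[int, str, float]  # (hop number, host, loss percent)


def parse_loss(report: str) -> List[Hop]:
    """Extract hop number, host, and Loss% from `mtr --report` text."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return rows


def first_real_loss(rows: List[Hop], tolerance: float = 0.5) -> Optional[int]:
    """Hop where loss begins and then persists to the final hop, else None."""
    if not rows or rows[-1][2] == 0.0:
        return None                      # clean destination: any loss above is noise
    floor = rows[-1][2] * tolerance
    start = rows[-1][0]
    for hop, _host, loss in reversed(rows):
        if loss < floor:
            break                        # the run of persistent loss ends here
        start = hop
    return start


REPORT = """\
HOST: build-runner-3             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.0.0.1                  0.0%   100    0.4   0.5   0.3   2.1   0.2
  2.|-- 100.64.12.1               0.0%   100    1.2   1.4   1.0   8.9   0.9
  3.|-- 198.51.100.41             0.0%   100    2.0   2.3   1.8  11.2   1.1
  4.|-- core-7.transit.example   60.0%  100    9.8  10.1   9.0  14.7   1.0
  5.|-- 203.0.113.9               0.0%   100   10.4  10.6   9.8  19.3   1.3
  6.|-- peer-2.example.net        0.0%   100   11.0  11.5  10.2  24.8   2.0
  7.|-- edge-1.dest.example      12.0%  100   18.9  19.4  17.8  88.2   9.4
  8.|-- 192.0.2.66               11.0%  100   19.2  19.9  18.0  91.5   9.8
  9.|-- api.example.com          12.0%  100   19.5  20.1  18.3  90.7   9.6
"""

print(first_real_loss(parse_loss(REPORT)))  # 7
```

Walking upward from the destination means hop 4's 60% is never even considered: the scan stops at hop 6, the first clean row above the run of persistent loss.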

Three places mtr earns its keep

Blaming the right network segment

Once you can spot where real loss begins, the report becomes an accountability map. The first hop or two belong to you: your host, your switch, your office or VPC gateway. Loss that starts there is your problem and your fix. The next few hops belong to your ISP or cloud provider; persistent loss beginning there goes into a support ticket, with the report pasted in, and the hop names — transit routers usually carry their operator's domain — tell you exactly whom to page. Loss that begins at the final hops sits with the destination: their edge, their load balancer, their problem. One honest caveat belongs in every such ticket: mtr measures the round trip, and the reply packets may come home along a different route than your probes went out. Loss that appears in your report can live on the return path, which your trace never shows. When the destination is a box you control, run mtr from both ends before declaring which direction is broken.

The latency spike that only happens sometimes

"Every few minutes the API gets slow for a second" is the kind of complaint a single traceroute will never catch. Leave interactive mtr running in a spare terminal while the complaint reproduces and read the Wrst and StDev columns instead of Avg. A hop whose average is 11 ms but whose worst case is 480 ms with a fat standard deviation is a hop that queues badly under bursts, and if the following hops inherit those worst-case numbers, your traffic is sitting in that queue too. The same island rule applies as with loss: a worst-case spike at one middle hop that the later hops do not echo is just the router answering probes lazily; a spike that propagates downstream is a congested link with your name on it.

When ICMP probes are not believable

By default mtr probes with ICMP echo packets, and some networks treat ICMP as a second-class citizen — shaped, filtered, or routed differently from real traffic. When the report looks implausible, change what the probes are made of: mtr -u host sends UDP probes the way classic traceroute does, and mtr -T -P 443 host sends TCP SYNs to port 443, which makes your probes nearly indistinguishable from genuine HTTPS traffic. The TCP mode is the one to reach for when a path "looks fine in mtr" but the application still suffers: routers doing per-flow load balancing may steer ICMP and TCP down different parallel links, and probing with TCP on the real port is how you measure the path your bytes actually take.

What is underneath: a handshake and an expiring counter

Neither tool is doing anything exotic, and knowing the two mechanisms makes the output trustworthy. nc rides the TCP three-way handshake. Its connect() call sends a SYN; a listener answers SYN-ACK and nc completes with an ACK — that is "succeeded." A reachable kernel with no listener on the port answers RST — "refused." And a filter that drops the SYN produces silence, which the kernel retries with growing patience until nc gives up — "timed out." The three verdicts in the decision diagram are just the three possible fates of one SYN packet. The full state machine, including what happens after the handshake, lives in TCP, and you can step through the exchange packet by packet in the handshake simulator.

mtr rides an accident of IP's design. Every IP packet carries a TTL — time to live — a counter that each router decrements before forwarding. When the counter hits zero, the router discards the packet and sends back an ICMP time-exceeded message from its own address. That return address is the whole trick: send a probe with TTL 1 and the first router identifies itself, TTL 2 and the second does, and so on until a probe survives all the way and the destination answers directly. The TTL exists to stop routing loops from circulating packets forever; traceroute and mtr simply discovered that a safety mechanism doubles as a map-maker. It also explains every quirk in the output: hops appear only if they bother to send the time-exceeded reply, which is why rate-limited routers fake loss and silent ones show as ???. If you want to see both mechanisms with your own eyes, run tcpdump beside either tool: the SYN, the RST, and the parade of time-exceeded replies are all right there in the capture.

nc's verdicts are the three fates of a SYN; mtr's hop list is routers naming themselves as they discard probes whose TTL ran out.
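The TTL trick is easy to see in a toy simulation. Nothing below touches the network: the path is a plain list of names, and the function names are hypothetical. It only models the counter described above, with each router decrementing the TTL and answering when it reaches zero.

```python
from typing import List


def send_probe(path: List[str], ttl: int) -> str:
    """Simulate one probe and return the name of whoever answers it.

    `path` lists the routers in order and ends with the destination.
    """
    for router in path[:-1]:
        ttl -= 1                 # each router decrements before forwarding
        if ttl == 0:
            return router        # discarded here: time-exceeded comes from this router
    return path[-1]              # survived every router: the destination replies


def trace(path: List[str], max_ttl: int = 30) -> List[str]:
    """Raise the TTL one step at a time until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        answer = send_probe(path, ttl)
        hops.append(answer)
        if answer == path[-1]:
            break
    return hops


print(trace(["gateway", "isp-core", "peer-edge", "api.example.com"]))
# ['gateway', 'isp-core', 'peer-edge', 'api.example.com']
```

A router that declines to send the time-exceeded reply would simply leave a gap at its position in the list, which is what a ??? row is.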

Pitfalls

There are at least four netcats. "nc" on a given box might be OpenBSD netcat (Debian and Ubuntu's default), traditional netcat or its GNU rewrite, nmap's ncat, or the minimal applet inside BusyBox, and their flags drift. The listener is the classic trap: OpenBSD nc listens with nc -l 9000, while traditional netcat wants nc -l -p 9000 and treats the OpenBSD form as an error (or worse, on some old builds, silently does something else). Some variants lack -z entirely; BusyBox nc supports only a handful of flags. The habit that saves you: on an unfamiliar box, run nc -h first and spend five seconds reading which dialect you have. If you can choose, ncat behaves the same everywhere nmap is installed, and on machines with no netcat at all, bash's /dev/tcp trick (echo > /dev/tcp/host/443) does a crude reachability test with no extra binary.

mtr needs raw sockets. Building ICMP probes by hand is privileged work. Most distributions ship mtr with a setuid helper (mtr-packet) or grant it cap_net_raw, so it works for ordinary users — but a copy installed by hand, or running inside a stripped container image, will fail with an error about raw sockets or simply show no hops. The fix is to run it with sudo, or restore the capability with setcap cap_net_raw+ep on the helper. If mtr prints nothing useful as you and works under sudo, this is why; it is permissions, not the network.

Some hops never answer, and that is fine. A row of ??? or 100% loss at a middle hop, with clean hops after it, is a router (or a whole network) that filters ICMP entirely — common inside cloud provider backbones and MPLS cores. The island rule handles it: if later hops and the destination are clean, the silent hop is forwarding perfectly and merely declining to introduce itself. The reading only turns bad when the destination row is silent too. Then you know nothing end to end from this probe type, and it is time for -T -P <port>, or for falling back to nc -vz, which needs no cooperation from anything except the final host.

Load-balanced paths smear the picture. Routers commonly split traffic across parallel links per flow. Probes in different cycles can take different parallel paths, so a single mtr row may interleave two routers' worth of latency, and a hop can appear to flap between two addresses. Paris traceroute exists for exactly this: it holds the flow-identifying header fields constant so that every probe follows the same path. When one hop's numbers look bimodal (a tight cluster at 10 ms, another at 40 ms), suspect two physical paths before suspecting one sick router.

nc and UDP do not mix the way you hope. nc -uvz host 53 looks like a UDP port test, but UDP has no handshake, so there is usually nothing to confirm delivery. Unless the host sends back an ICMP port-unreachable (often filtered) or the service happens to reply, nc reports success simply because nothing said no. Treat a UDP "open" verdict as "open or filtered or silently dropped" — for UDP services, the only honest test is speaking enough of the real protocol to provoke a reply.
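The ambiguity is easier to believe once the three possible results are written out. This Python sketch sends a single datagram and reports only what UDP allows you to know; the function name, the one-byte payload, and the IPv4-only socket are all assumptions for illustration, and a real test would send a valid request for the protocol in question.

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"\x00",
              timeout: float = 1.0) -> str:
    """Send one datagram and report the only things UDP lets you know."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))      # lets the kernel hand ICMP errors back to us
        try:
            sock.send(payload)
            sock.recv(2048)
            return "open"               # the service actually replied
        except ConnectionRefusedError:
            return "closed"             # ICMP port-unreachable came back
        except socket.timeout:
            return "open or filtered"   # silence proves nothing
```

Only the first result is positive evidence. The third is the one nc -uvz reports as success.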

A drill you can run right now

Everything below is safe on any machine with a network connection: one connection attempt to a public HTTPS host, a listener on your own loopback, and a twenty-cycle trace to a public resolver. Nothing is scanned, nothing is left running, nothing needs cleanup.

Step 1 — one real handshake. Test a port you know is open on a host built to receive the whole internet:

$ nc -vz -w 3 www.google.com 443
Connection to www.google.com (142.250.187.36) 443 port [tcp/https] succeeded!

Read the line back against the decision diagram: a SYN crossed your LAN, your router, your ISP, and some unknowable number of backbone links, found a listener, and the SYN-ACK made it home — all in the time the line took to print. Now aim the same command at a port that host does not serve, say 8443, and watch the three-second wait end in a timeout. Same host, same path, different verdict: that is a filter dropping you silently, and now you have seen what "filtered" feels like compared to "open."

Step 2 — both ends of the wire. Open two terminals. In the first, listen; in the second, test the port twice — once before the listener exists, once while it runs:

terminal B$ nc -vz 127.0.0.1 9000
nc: connect to 127.0.0.1 port 9000 (tcp) failed: Connection refused
terminal A$ nc -l 9000          <- (or nc -l -p 9000, depending on your netcat)
terminal B$ nc -vz 127.0.0.1 9000
Connection to 127.0.0.1 9000 port [tcp/*] succeeded!
terminal B$ nc 127.0.0.1 9000     <- now connect for real and type something
hello from B                       <- it appears in terminal A; type back and it appears here

That refused-then-succeeded pair is the heart of the page performed on your own machine. Same host, same port, no firewall involved: the only variable that changed between the two verdicts is whether a listener existed. When you later see refused in production, your hands will already know what it means. Ctrl-C both sides when you are done and the server you built is gone.

Step 3 — trace a real path. Run a short report to a public resolver and read it with the island rule:

$ mtr --report -c 20 one.one.one.one
(sudo mtr ... if raw sockets are refused)

Twenty cycles takes about twenty seconds. Then read the Loss% column bottom-up: start at the destination row, note its number (almost certainly 0% to a host like this), and treat any louder number above it as a router rationing its replies. Find your own gateway in hop 1, find the hop where the names change from your ISP's domain to someone else's — that is a network boundary — and check whether any ??? rows are followed by hops that answer fine. Ten minutes ago a report like this looked like a wall of numbers; now it reads as a story with exactly one question: does the loss persist to the bottom?

If you remember one rule per tool. For nc: refused means the host answered and nothing is listening; timeout means nothing answered at all — they are different diagnoses, not different severities. For mtr: loss is only real if it persists to the final hop; everything else is a router declining to chat.
