What's holding this port?
The deploy fails. The log says bind: address already in use and the new
release refuses to start. This error has exactly two shapes: either something is
listening on the port, or the old process is gone but its sockets are not. The
whole investigation fits in a handful of commands, and the hard part is never running them —
it is reading the answers and deciding what is safe to do next. This page walks the
investigation the way you would at 3am: one command, one decision, repeat until the port is
yours.
Two shapes of the same error
Here is the failure as it usually arrives — a service that started, tried to bind, and died:
    $ journalctl -u app -n 3 --no-pager
    Jun 08 14:02:11 web-3 app[52110]: starting on 0.0.0.0:8080
    Jun 08 14:02:11 web-3 app[52110]: Error: listen EADDRINUSE: address already in use 0.0.0.0:8080
    Jun 08 14:02:11 web-3 systemd[1]: app.service: Main process exited, code=exited, status=1/FAILURE
EADDRINUSE comes from the kernel's bind() call, and the kernel
refuses for one of two reasons. Either another socket is already in the LISTEN state on
that address and port — a live process holds it — or no process holds it at all, but
sockets from a previous life of the service are still draining through TCP's shutdown
states and the kernel will not hand the address out again yet. The first shape has a
culprit with a PID. The second shape has no culprit; killing things will not help, because
there is nothing to kill.
Everything that follows is about telling those two shapes apart fast, and then making the right move for whichever one you have. The shortcut version, if you have seen all this before, is three commands.
- sudo ss -tlnp 'sport = :8080': is anything listening, and what is its PID?
- If a listener shows up: ps -fp PID and decide whether it should die and how.
- If nothing is listening: ss -tan 'sport = :8080'. TIME-WAIT rows mean you need SO_REUSEADDR in the app or sixty seconds of patience, not a kill.
Step one: ask ss who is listening
Start with ss, because it asks the kernel's socket tables directly and answers in milliseconds even on a loaded box. You want listening TCP sockets, numeric output, with the owning process, filtered to the one port you care about:
    $ sudo ss -tlnp 'sport = :8080'
    State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
    LISTEN  0       4096    0.0.0.0:8080        0.0.0.0:*          users:(("java",pid=38104,fd=89))
Each letter of -tlnp earns its place: -t for TCP, -l
for sockets in the listening state only, -n for raw numbers instead of slow
name lookups, -p for the process column on the right. The quoted
'sport = :8080' filter narrows the answer to the source port you care about —
quote it, because = and : mean things to your shell. If you can
never remember the filter syntax, sudo ss -tlnp | grep 8080 gets you the same
rows with less elegance, and during an incident nobody is grading elegance.
Run it with sudo. Without root, ss -p can only name processes you
own; sockets held by other accounts still show up as rows, but with an empty process column,
which is exactly the kind of half-answer that sends people down wrong paths.
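When the next commands need the PID rather than the whole row, it can be pulled out of the process column mechanically. The helper below is a small sketch, not part of ss itself: the name listener_pids is made up for illustration, and it assumes the process column has the users:(("java",pid=38104,fd=89)) shape shown above.

```shell
# listener_pids (hypothetical helper): read `ss -tlnp` output on stdin and
# print each owning PID once, lowest first.
# Assumes the process column looks like users:(("java",pid=38104,fd=89)).
listener_pids() {
  grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -un
}

# Typical use:
#   sudo ss -tlnp 'sport = :8080' | listener_pids
```

If ss printed a LISTEN row but this prints nothing, the process column was empty, which usually points back at the missing sudo.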
Now read what came back. One row means one listener, and the process column hands you everything: the command name, the PID, and the file descriptor holding the socket. Before doing anything else, classify the listener, because each kind has a different right move:
| What the process column says | What it usually means | The move |
|---|---|---|
| Your own service, an older PID | The previous release never died, or an orphaned worker survived the restart | Identify how it was started, stop it that way — see below |
| Your own service, a PID seconds old | A supervisor already restarted it; you are racing systemd or docker | Stop fighting the supervisor; stop the unit, not the PID |
| A different service entirely | A port collision: two things configured onto 8080 | Nobody needs killing; one of them needs a config change |
| docker-proxy or kube-proxy | The port is published from a container or a Service | The fix lives in the container layer (see the proxy section) |
| Empty process column (with sudo) | Rare: a kernel-space holder or a socket in another namespace | Cross-check with lsof, then suspect containers |
One subtlety about the address column before moving on. A listener on 0.0.0.0:8080
(or *:8080 for IPv6) claims the port on every interface, so anything else
binding 8080 anywhere will collide with it. But a listener on 127.0.0.1:8080
only claims the loopback side — a new process binding 10.0.4.12:8080 would
succeed alongside it. When the row's local address is specific and your bind is the
wildcard, the wildcard loses. That is occasionally the entire incident: someone bound a
debug instance to localhost, and the real service wants the wildcard. The mechanics of
addresses, binds, and what a listening socket actually is are in
sockets.
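The collision rule for addresses can be written down as a tiny predicate. This is a deliberately simplified sketch: the function names are invented here, every wildcard spelling (0.0.0.0, *, [::]) is treated as the same thing, and IPv6-only sockets and SO_REUSEPORT are not modelled at all.

```shell
# is_wildcard ADDR: succeed if ADDR is a bind-to-every-interface address.
# Simplification: all wildcard spellings are treated as equivalent.
is_wildcard() {
  case "$1" in
    0.0.0.0|'*'|'[::]'|::) return 0 ;;
    *) return 1 ;;
  esac
}

# binds_conflict HELD WANTED: succeed (exit 0) if a listener on address HELD
# would block a new bind to address WANTED on the same port.
binds_conflict() {
  is_wildcard "$1" && return 0   # the holder claims every interface
  is_wildcard "$2" && return 0   # the newcomer wants every interface
  [ "$1" = "$2" ]                # two specific addresses clash only if equal
}

# Typical use:
#   binds_conflict 127.0.0.1 0.0.0.0 && echo "the wildcard bind will fail"
```

Under these assumptions, 127.0.0.1 against 10.0.4.12 is the only combination above that does not collide, which matches the debug-instance story: the localhost listener is harmless until something asks for the wildcard.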
Step two: cross-check with lsof
If ss answered cleanly, you can skip ahead. But there are two cases where a
second opinion from lsof earns its keep. The
first is the empty process column: ss shows a LISTEN row but cannot say who
owns it, usually because you forgot sudo, occasionally because the socket lives somewhere
strange. The second is plain paranoia before a destructive action — you are about to kill
something, and two tools agreeing is cheap insurance.
    $ sudo lsof -nP -iTCP:8080 -sTCP:LISTEN
    COMMAND  PID    USER    FD   TYPE  DEVICE  SIZE/OFF  NODE  NAME
    java     38104  deploy  89u  IPv6  798112  0t0       TCP   *:8080 (LISTEN)
Same answer, different angle: lsof walks /proc per process rather
than reading the socket tables, so it also tells you the USER the process runs as and the
descriptor number, both useful when you write the incident notes later. The
-sTCP:LISTEN filter matters; without it you also get every established
connection touching 8080, and right now those are noise. The flags, the column decoder, and
why -nP is non-negotiable are all on the
lsof page.
If both tools agree something is listening, jump to the "is it safe to kill" section. If both agree nothing is listening and the bind still fails, you have the second shape of the error, and it is the more interesting one.
Nothing is listening, and bind still fails
This is the shape that confuses people, because the mental model "a process owns the port"
has run out. No process owns it. The kernel does. Drop the -l flag and look at
all TCP sockets on that source port, not just listeners:
    $ ss -tan 'sport = :8080'
    State      Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    TIME-WAIT  0       0       10.0.4.12:8080      10.0.9.55:49210
    TIME-WAIT  0       0       10.0.4.12:8080      10.0.9.61:50112
    TIME-WAIT  0       0       10.0.4.12:8080      10.0.9.55:49377
There they are: connections from the old instance, finished but not yet forgotten. When a
TCP connection closes, the side that sent the first FIN — the active closer — must hold the
connection's address pair in a state called TIME-WAIT for long enough that any packets
still wandering the network can arrive and die harmlessly, instead of being mistaken for
data on a brand-new connection that reused the same addresses. On Linux that hold is sixty
seconds, a constant compiled into the kernel. (People often try to tune it with
tcp_fin_timeout; that sysctl controls a different state, FIN-WAIT-2, and does
not touch TIME-WAIT at all.) The full state machine, and why the wait is two times the
maximum segment lifetime, is in TCP.
So why does this block a fresh bind()? By default, the kernel refuses to bind a
listening socket to a local port while any socket — even a dying one — still occupies that
port. The escape hatch is the socket option SO_REUSEADDR, set before the bind,
which tells the kernel: I accept the (tiny, mostly theoretical) risk, let me bind even
though TIME-WAIT remnants exist. It is worth being precise about what this option does and
does not do, because folklore has inflated it:
| Claim | True? |
|---|---|
| SO_REUSEADDR lets you bind while old connections sit in TIME-WAIT | Yes. This is its actual job, and why every serious server sets it. |
| It lets you bind over a port someone else is LISTENING on | No. A live listener still wins; you still get EADDRINUSE. |
| It lets two processes listen on the same port at once | No. That is SO_REUSEPORT, a different option with a different purpose — see the prevention section. |
| It is dangerous and lets old data leak into new connections | Mostly no. TCP's sequence-number checks make confusion vanishingly unlikely; the option is standard practice in essentially all server software. |
This re-frames the fix. If your investigation lands here — no listener, TIME-WAIT rows
present — there is nothing to kill, and anything you do kill is collateral damage. The
honest fixes are: wait sixty seconds (fine for a one-off, miserable as a habit), or fix the
application so it sets SO_REUSEADDR before binding. Most frameworks do this
for you; the ones that bite are hand-rolled servers and older runtimes where the option is
off unless asked for. If your service hits this on every restart, the bug is in its socket
setup, and the deploy script that "fixes" it with sleep 90 is a confession,
not a cure.
One more case lives on this branch of the tree: no listener and no TIME-WAIT rows,
yet the bind still fails. Before doubting the kernel, doubt your vantage point. Are you on
the right machine? Is the application running inside a container with its own network
namespace, so that the conflict exists in a namespace your host-level ss
cannot see? ss run on the host shows the host's namespace only; for a
container's view you need nsenter -t PID -n ss -tlnp or an exec into the
container. The error message is generated wherever the bind ran. Look there.
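The three branches so far (a live listener, TIME-WAIT remnants, or nothing visible from where you stand) can be told apart from a single ss -tan listing. The function below is a sketch of that triage; the name classify_port and the three output words are invented for illustration, and it only looks at the State column.

```shell
# classify_port (hypothetical helper): read `ss -tan 'sport = :PORT'` output
# on stdin and print one word:
#   listener   a socket is in LISTEN: find its PID, then its manager
#   time-wait  no listener, only TIME-WAIT remnants: wait, or set SO_REUSEADDR
#   nothing    neither is visible here: check the machine and the namespace
classify_port() {
  awk '
    $1 == "LISTEN"    { listen++ }
    $1 == "TIME-WAIT" { tw++ }
    END {
      if (listen)  print "listener"
      else if (tw) print "time-wait"
      else         print "nothing"
    }'
}

# Typical use:
#   ss -tan 'sport = :8080' | classify_port
```

A listener takes priority over TIME-WAIT rows on purpose: while something is listening, the remnants are not what is blocking the bind.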
You found the holder. Is it safe to kill?
Back to the first shape: a live listener with a PID. The reflex is
kill -9 and redeploy, and the reflex is wrong often enough to spend ninety
seconds checking. Three questions, in order: what is this process, who started it, and what
is the polite way to stop it?
    $ ps -o pid,ppid,user,lstart,cmd -p 38104
      PID  PPID  USER    STARTED                   CMD
    38104     1  deploy  Tue Jun  2 09:14:33 2026  java -jar /opt/app/app-2.4.1.jar
Read this line slowly, because every field is a clue. The command string with the version number tells you whether this is the previous release (expected, safe to retire) or something else entirely (stop and ask around). The start time tells you whether this process predates the deploy — an old instance — or appeared two seconds ago, which means a supervisor is restarting it and you are about to play whack-a-mole. And the parent PID of 1 means its original parent is gone: either it was started by the init system directly, or it was orphaned and re-parented, which is exactly what happens when a worker survives its parent's death and keeps the inherited listening socket alive.
Then find out who is responsible for it, because the right way to stop a process is to ask its manager, not the process itself:
    $ systemctl status 38104
    ● app.service - Payments API
         Loaded: loaded (/etc/systemd/system/app.service; enabled)
         Active: active (running) since Tue 2026-06-02 09:14:33 UTC
       Main PID: 38104 (java)
         CGroup: /system.slice/app.service
                 └─38104 java -jar /opt/app/app-2.4.1.jar
systemctl status accepts a bare PID and tells you which unit, if any, owns it.
Three outcomes, three moves. If it belongs to a systemd unit:
systemctl stop app.service — never kill, because systemd will
either restart what you killed or mark the unit failed in a way that confuses the next
deploy. If the cgroup line says docker or the command is
containerd-shim-adjacent: docker stop CONTAINER, which delivers a
TERM, waits, then escalates — the same etiquette you would follow by hand. And if it
belongs to nothing — no unit, no container, started from someone's terminal three weeks ago
— then and only then is a manual signal the right tool: kill 38104 sends TERM,
the process gets a chance to drain connections and exit cleanly, and you escalate to
kill -9 only after TERM has had a few seconds and visibly failed. The
difference between those signals, and why 9 should be the last resort instead of the
reflex, is the whole subject of
kill & signals.
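For the unmanaged case only, the TERM-then-wait-then-KILL etiquette can be scripted so that nobody shortens the wait at 3am. This is a minimal sketch: stop_politely is a made-up name, the ten-second default grace period is an arbitrary assumption, and it expects to run as a user who is allowed to signal the process.

```shell
# stop_politely PID [GRACE] (hypothetical helper): send TERM, wait up to
# GRACE seconds (default 10, an arbitrary choice) for the process to exit,
# and only then fall back to KILL.
# Caveat: `kill -0` also fails when you lack permission to signal the PID,
# so run this as a user who may signal it.
stop_politely() {
  local pid=$1 grace=${2:-10} waited=0
  kill -0 "$pid" 2>/dev/null || { echo "pid $pid: not running"; return 0; }
  kill -TERM "$pid" || return 1
  while [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
    kill -0 "$pid" 2>/dev/null || { echo "pid $pid: exited after TERM"; return 0; }
  done
  echo "pid $pid: still alive after ${grace}s, sending KILL" >&2
  kill -KILL "$pid"
}

# Typical use, only for a process that belongs to no unit and no container:
#   sudo sh -c '. ./stop_politely.sh; stop_politely 38104 15'
```

The file name in the usage comment is illustrative. For anything owned by systemd or a container runtime, the manager's own stop command remains the right tool.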
After the stop, verify before you redeploy. The polite shutdown you just triggered closes
connections, and closing connections creates — you guessed it — TIME-WAIT entries. The
listener row should vanish from ss -tlnp within a second or two; if your app
lacks SO_REUSEADDR, the TIME-WAIT remnants may still block it for a minute.
Knowing both shapes of the error means this does not surprise you twice in one incident.
The supervisor trap: you kill it and it comes back
A classic mid-incident detour. You found the listener, killed it, ran ss again
— and there it is, same command, new PID, listening smugly on 8080. You did not
fail to kill it. Something is paid to bring it back.
    $ sudo ss -tlnp 'sport = :8080'
    LISTEN  0  4096  0.0.0.0:8080  0.0.0.0:*  users:(("java",pid=39882,fd=89))
    # a moment ago this was pid 38104. someone restarted it.
The restarter is almost always one of three things: a systemd unit with
Restart=always or Restart=on-failure, a container runtime with a
restart policy, or a process-level supervisor (supervisord, pm2, a shell script in a while
loop that someone wrote in 2019 and everyone forgot). The fastest way to find out which is
the cgroup, because on a systemd machine every process's cgroup path spells out its
ancestry:
    $ cat /proc/39882/cgroup
    0::/system.slice/app.service
    # or, for a container:
    0::/system.slice/docker-9f81c2e4a77b…scope
A system.slice/something.service path means systemd owns it; stop the unit,
and if you need it to stay stopped while you work, remember that
systemctl stop already suppresses the restart logic — Restart=
policies fire on process death, not on deliberate stops. A docker-…scope path
means a container; docker ps plus docker stop from there, and if
the container has --restart=always, docker stop likewise wins
where kill loses, because stopping through the engine records intent while a
raw kill looks like a crash to be repaired. And if the cgroup is just your user session,
the supervisor is a script or a tmux pane somewhere: ps -o ppid= -p PID and
walk up parents until you find the loop.
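The cgroup reading above is regular enough to automate. The sketch below maps a /proc/PID/cgroup line to the manager that should be asked to stop the process; cgroup_owner is a hypothetical name, it assumes the single-line "0::" layout shown here, and any other layout simply falls through to "unknown".

```shell
# cgroup_owner (hypothetical helper): read /proc/PID/cgroup on stdin and say
# who manages the process. Assumes the single "0::/path" line shown above;
# anything else is reported as unknown rather than guessed at.
cgroup_owner() {
  local path
  path=$(sed -n 's/^0:://p' | head -n 1)
  case "$path" in
    */docker-*.scope) echo "container: use docker stop" ;;
    *.service)        echo "systemd unit ${path##*/}: use systemctl stop" ;;
    /user.slice/*)    echo "login session: find the parent, then signal by hand" ;;
    *)                echo "unknown: $path" ;;
  esac
}

# Typical use:
#   cgroup_owner < /proc/39882/cgroup
```

The order of the patterns matters: the container check comes first, and the login-session case is only reached when no service or container path matched.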
The general law underneath the trap: kill processes through their managers. A supervisor's whole purpose is to disbelieve in spontaneous process death. Going around it does not defeat it; it recruits it against you.
When the holder is a proxy: containers and kubernetes
Sometimes the process column answers with a name that is technically correct and practically useless:
    $ sudo ss -tlnp 'sport = :8080'
    LISTEN  0  4096  0.0.0.0:8080  0.0.0.0:*  users:(("docker-proxy",pid=2219,fd=4))
docker-proxy is not your application. When you publish a container port with
-p 8080:80, Docker sets up address translation in the kernel's netfilter
tables and also starts this small userland helper, which holds the host port for
the cases the kernel rules cannot cover (traffic from the host itself to its own published
ports, mostly). Killing it changes nothing durable and breaks loopback access to the
container in the meantime. The actual owner of the port is whichever container published
it, and the question routes there:
    $ docker ps --filter publish=8080
    CONTAINER ID  IMAGE      PORTS                 NAMES
    9f81c2e4a77b  app:2.3.9  0.0.0.0:8080->80/tcp  app-old
    $ docker stop app-old
Kubernetes adds one more layer of indirection in the same spirit. On a node, a Service's
NodePort traffic is usually handled by netfilter rules that kube-proxy
programs, so ss may show kube-proxy holding the port (it binds
NodePorts to reserve them and to fail fast on conflicts) or, in some configurations, show
nothing listening at all while the port still answers — the redirect happens in the kernel
before any socket would. Either way the lesson is identical: a proxy's name in the process
column is a forwarding address, not a culprit. Find the workload behind it —
docker ps, kubectl get svc -A | grep 8080 — and resolve the
conflict at that layer: stop the container, change the published port, or pick a different
NodePort. Reaching for kill at the proxy layer is treating the receptionist as
the CEO.
A worked example, end to end
Here is the investigation run once, whole, the way it actually plays out. The setting: a
deploy of app-2.4.2 to a box called web-3 fails with EADDRINUSE on 8080.
    $ sudo ss -tlnp 'sport = :8080'
    LISTEN  0  4096  0.0.0.0:8080  0.0.0.0:*  users:(("java",pid=31550,fd=89))
    # a listener. who is pid 31550?
    $ ps -o pid,ppid,user,lstart,cmd -p 31550
      PID  PPID  USER    STARTED                   CMD
    31550     1  deploy  Mon Jun  1 22:40:09 2026  java -jar /opt/app/app-2.4.0.jar
    # version 2.4.0: two releases old, parent is pid 1. orphan? unit?
    $ systemctl status 31550
    Failed to get unit for PID 31550: PID 31550 does not belong to any loaded unit.
    $ cat /proc/31550/cgroup
    0::/user.slice/user-1004.slice/session-91.scope
    # a login session, not a service: someone started 2.4.0 by hand a week ago
    # and it has squatted on the port through two deploys. TERM it politely:
    $ sudo kill 31550
    $ sudo ss -tlnp 'sport = :8080'
    # (no output: the listener is gone)
    $ ss -tan 'sport = :8080' | head -3
    State      Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    TIME-WAIT  0       0       10.0.4.12:8080      10.0.9.55:49210
    TIME-WAIT  0       0       10.0.4.12:8080      10.0.9.61:50112
    # its dying connections went TIME-WAIT, as expected. our app sets
    # SO_REUSEADDR, so this won't block the bind. deploy:
    $ sudo systemctl start app && sleep 2 && curl -s localhost:8080/healthz
    ok
Notice what the walkthrough never needed: kill -9, a reboot, or a guess. Each
command's output chose the next command. The single judgement call — is the 2.4.0 process
safe to terminate — was answered by evidence (wrong version, no unit, a human's login
session) rather than hope. And the final ss -tan check meant the TIME-WAIT
rows were an expected footnote instead of a second mystery.
Endings, and how to stop meeting this error
Every run of this investigation ends in one of a small set of places. The old instance was
never stopped: fix the deploy script so stopping the old release is a step with its own
verification, not a fire-and-forget. Two services were configured onto one port: move one,
and consider making port assignments a reviewed file instead of tribal knowledge. The app
lacked SO_REUSEADDR: set it; this is a one-line fix that deletes a whole class
of restart pain. A human's hand-started process was squatting: that one is culture, not
code.
Two pieces of prevention are worth building once and keeping. First, make the deploy script wait for the port to actually be free instead of binding into a race:
    # after stopping the old release, before starting the new one:
    for i in $(seq 1 30); do
      ss -tlnH 'sport = :8080' | grep -q . || break
      sleep 1
    done
    # -H drops the header, so "no output" cleanly means "port free"
Thirty seconds of patience, checked once a second, ends the race between the old listener's shutdown and the new bind. The same loop inverted — wait until the port answers — belongs after the start, as the health check that gates the deploy's success. Start, wait for listen, curl the health endpoint, only then call it done.
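Both directions of that wait, until the port is free and until the service answers, fit one small polling helper. What follows is a sketch under assumptions: wait_until and port_is_free are invented names, the thirty-second budget and the /healthz endpoint are taken from the examples on this page, and ss and curl are assumed to be installed. Unlike a bare loop, it reports a timeout instead of silently carrying on.

```shell
# wait_until TIMEOUT CMD [ARGS...] (hypothetical helper): run CMD once a
# second until it succeeds. Returns 0 as soon as CMD succeeds, 1 if TIMEOUT
# seconds pass without success.
wait_until() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# port_is_free PORT: succeed when ss shows no TCP listener on PORT.
# Assumes ss is installed; a missing ss would look like a free port.
port_is_free() {
  ! ss -tlnH "sport = :$1" | grep -q .
}

# Typical use in a deploy script:
#   systemctl stop app
#   wait_until 30 port_is_free 8080 || { echo "8080 still held" >&2; exit 1; }
#   systemctl start app
#   wait_until 30 curl -fs -o /dev/null localhost:8080/healthz || exit 1
```

Passing the check as a command keeps the polling logic in one place, so the pre-start gate and the post-start health check cannot drift apart.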
Second, for services that cannot afford the gap at all, SO_REUSEPORT exists:
it lets several sockets listen on the same port simultaneously, with the kernel spreading
incoming connections across them. The zero-downtime pattern is to start the new release
alongside the old one — both listening, both serving — then tell the old one to stop
accepting and drain. No gap, no race, no EADDRINUSE, because the port is never released at
all. It costs real engineering (both versions serve traffic at once, so they must be
compatible), which is why it is a deliberate choice for the services that need it rather
than a default. The option's mechanics live in
sockets.
For the incident notes, paste in: the ss line showing the holder (command,
PID, fd), the ps line with the start time and full command, what
systemctl status PID or the cgroup said about ownership, the action you took
and the signal you used, and whether TIME-WAIT showed up afterwards. Five lines of pasted
terminal output turn "the port was stuck so I killed some stuff" into a record the next
engineer can learn the method from, and they make the pattern visible when the same
squatter shows up three deploys in a row.
Further reading
- socket(7): the manual page where SO_REUSEADDR and SO_REUSEPORT are actually specified, two paragraphs that settle most folklore arguments.
- Vincent Bernat, Coping with the TCP TIME-WAIT state on busy Linux servers: the definitive treatment of what TIME-WAIT costs, what it protects, and which tuning advice to ignore.
- ss(8): the filter language (sport, dport, state matching) rewards ten minutes of reading once.
- Semicolony, ss: the full tour of the tool this investigation leans on hardest.