lsof
Something is squatting on port 8080. The disk is full but du can only account
for half of it. A service keeps dying with "too many open files." All three are the same
question wearing different clothes: who has this thing open right now? That is the
one question lsof answers, and because Linux treats sockets, pipes, and devices
as files too, it answers it for all of them. This page covers the five flags worth memorising,
decodes every column of the output, walks three production incidents, and ends with a drill
you can run on any machine without breaking anything.
The question it answers
The name is literal: lsof lists open files. That sounds narrow until you remember
what "file" means on Linux. A TCP socket is a file. A unix domain socket is a file. A pipe
between two processes is a file. The terminal you are typing into, the shared libraries a
process mapped into memory, the directory it is sitting in, the device node for your disk —
files, all of them. Every one of these is reached through a file descriptor, and
lsof is the tool that walks every process on the machine and reports every
descriptor each one holds.
That single capability turns out to be the answer to a whole family of operational questions.
Who is listening on port 8080? That is a process holding a socket open. Why can I not unmount
this volume? Some process has a file or a working directory on it. Where did 80 GB of disk
go that du cannot find? A process is holding a deleted file open, and the kernel
will not free the blocks until it lets go. Why does this service crash with "too many open
files"? Its descriptor count crept up to the limit, one leaked socket at a time. Different
symptoms, same diagnostic: list the open files and look.
It helps to know what lsof is not. It is not a network sniffer; it shows you which
process owns a connection, not what travels over it. It is not a snapshot of history; it shows
the state of the machine at the instant it ran, and a short-lived process can open and close a
file between two invocations without ever appearing. And it is not free: on a busy box it does
a surprising amount of work, which matters later. But when the question is "who has this open,
right now," nothing else gives you the same direct answer with the process name and PID
attached. The companion tools each cover a slice — ss
is faster for sockets, fuser is terser for a single file — but lsof
is the one that covers everything with one mental model.
The five flags that matter
The man page for lsof is enormous, and nearly all of it is ignorable. Five flags
cover the daily work, and one of them you should type by reflex every single time.
| Flag | What it selects | When you reach for it |
|---|---|---|
| -i :8080 | Network files, filtered by port, host, or protocol | Port conflicts, "what is listening," tracing a connection to its process |
| -p 41327 | Everything one process has open | Descriptor leaks, auditing what a service touches |
| -u deploy | Everything one user's processes hold | Shared boxes, runaway cron jobs, "what is this account doing" |
| +L1 | Files with a link count below one: deleted, but still open | Disk space that df sees and du cannot |
| -nP | Nothing; it skips DNS (-n) and port-name (-P) lookups | Always. Every invocation. See below. |
The -i selector takes a small grammar: -i :8080 matches the port on
any address, -i TCP:8080 narrows to one protocol, -i @10.0.4.12
matches a host, and -i TCP:8080 -sTCP:LISTEN narrows to sockets actually in the
listening state, which is usually what you mean when you ask who owns a port. One subtlety
worth knowing before it bites you: when you stack several selectors, lsof ORs
them together by default. lsof -u deploy -i :8080 means "deploy's files,
plus anything on port 8080," not the intersection. Add -a to switch the
logic to AND: lsof -a -u deploy -i :8080 is "deploy's files that are on port
8080." Almost everyone learns this by staring at output that is mysteriously too long.
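The OR-versus-AND behaviour is easy to pin down with two small wrappers. This is only a sketch: the function names are hypothetical, and the only parts that come from lsof itself are the flags already described above (-nP, -a, -u, -i).

```shell
# Hypothetical helpers; only the lsof flags themselves come from the tool.

# OR (the default): everything the user holds, plus anything on the port.
user_or_port() {
    lsof -nP -u "$1" -i ":$2"
}

# AND: -a restricts the result to the user's files that are on the port.
user_and_port() {
    lsof -nP -a -u "$1" -i ":$2"
}
```

Calling user_and_port deploy 8080 should print a subset of what user_or_port deploy 8080 prints. If the second list is much longer than expected, the missing -a is the likely reason.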
Without -n, lsof does a reverse DNS
lookup for every remote address it prints, and without -P it resolves every port
number to a service name. On a machine with hundreds of connections and a slow or absent DNS
resolver, those lookups serialise into what feels like a hang — the tool sits silent for thirty
seconds while you wonder if the box is dying. It is not dying. It is resolving hostnames you
did not ask for. lsof -nP prints raw numbers immediately, and raw numbers are what
you want during an incident anyway.
Reading the output
Here is a realistic answer to "who is on port 8080," taken from the kind of box where a Java
service sits behind an nginx proxy. Run it with sudo, because without root you
only see your own processes — more on that in the pitfalls.
    $ sudo lsof -nP -i :8080
    COMMAND   PID   USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
    java    41327 deploy 89u  IPv6 812644      0t0  TCP *:8080 (LISTEN)
    java    41327 deploy 92u  IPv6 815091      0t0  TCP 10.0.4.12:8080->10.0.9.55:49210 (ESTABLISHED)
    nginx    1290   root 12u  IPv4 433190      0t0  TCP 127.0.0.1:46214->127.0.0.1:8080 (ESTABLISHED)
Most of the columns explain themselves once you have seen them once. COMMAND is the process
name, truncated to nine characters by default (widen it with +c 0 if the
truncation hides what you need). PID and USER are the process id and the account the process
runs as — the account that owns the process, which is not necessarily the account that
owns the file. TYPE tells you what kind of file this is: REG for a regular file,
DIR for a directory, CHR for a character device, FIFO
for a pipe, unix for a unix domain socket, IPv4 and IPv6
for network sockets. DEVICE identifies the device or socket in kernel terms. SIZE/OFF is the
file's size or the descriptor's current offset; sockets show 0t0 because the
concept does not apply. NODE is the inode number for filesystem objects and the protocol name
for sockets. NAME is the payoff: the path for files, and for network sockets the full
local->remote address pair with the connection state in parentheses.
The FD column is the one nobody teaches, and it is where the real information lives. It is not
always a number. lsof also uses it to report things a process has open that are
not descriptors at all, and when it is a number, the letter glued to the end tells you the
access mode.
| FD entry | What it means |
|---|---|
| cwd | The process's current working directory. This alone can pin a filesystem and block an unmount. |
| rtd | The process's root directory (interesting for chrooted or containerised processes) |
| txt | Program text: the executable file itself |
| mem | A memory-mapped file, most often a shared library |
| 89r | Descriptor number 89, open for reading only |
| 89w | Descriptor 89, open for writing only; log files usually look like this |
| 89u | Descriptor 89, open for both read and write; sockets usually look like this |
Two practical reads fall out of this decoder. First, when you are counting descriptors for a
leak investigation, only the numeric rows are actual descriptors; cwd,
txt, and the mem rows are not, so piping lsof -p into
wc -l overcounts. Second, the mode letter is a clue about intent: a process
holding a log file with 4w is appending to it, and a process holding a socket
with 92u is talking on it. Occasionally you will also see a lock indicator after
the mode letter, such as 4wW for a held write lock — useful when two processes
are fighting over a lock file and you want to know who won.
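The overcount is simple to avoid by filtering on the FD column before counting. The sketch below assumes the default lsof column layout, where FD is the fourth whitespace-separated field (a COMMAND containing spaces would break that assumption). The function names and the sample rows are illustrative, not captured from a real host.

```shell
# Count only real descriptors (numeric FD column) in `lsof -p` output read from stdin.
# Assumes the default column layout, where FD is the fourth field.
count_descriptors() {
    awk 'NR > 1 && $4 ~ /^[0-9]+/ { n++ } END { print n + 0 }'
}

# Illustrative sample rows, not captured from a real host.
sample_rows() {
    cat <<'EOF'
COMMAND   PID   USER  FD  TYPE DEVICE SIZE/OFF   NODE NAME
java    41327 deploy cwd   DIR  259,1     4096 131073 /srv/app
java    41327 deploy txt   REG  259,1    14520 262145 /usr/bin/java
java    41327 deploy mem   REG  259,1  2029592 262200 /usr/lib/libc.so.6
java    41327 deploy  4w   REG  259,1    81920 524291 /var/log/app/server.log
java    41327 deploy 89u  IPv6 812644      0t0    TCP *:8080 (LISTEN)
EOF
}

sample_rows | count_descriptors   # prints 2: only 4w and 89u are descriptors
```

Against a live process the same filter would be used as sudo lsof -nP -p 41327 | count_descriptors.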
Three production scenarios
"Address already in use" on deploy
The deploy fails, the service will not start, and the log says
bind: address already in use. Something already owns the port. Maybe the old
instance never died, maybe a debug process from last week is still attached, maybe an orphaned
child survived a restart because children inherit their parent's descriptors across
fork(). You do not need to guess:
    $ sudo lsof -nP -iTCP:8080 -sTCP:LISTEN
    COMMAND   PID   USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
    java    38104 deploy 89u  IPv6 798112      0t0  TCP *:8080 (LISTEN)
One line, and you have the culprit's name, PID, and owner. From there it is judgement, not
tooling: is PID 38104 the previous release that the supervisor failed to reap, or something
that legitimately holds the port? Check with ps -fp 38104 before you reach for
kill. The narrower -sTCP:LISTEN filter matters here because without
it you also get every established connection touching port 8080, and during an incident the
extra rows are noise. The full decision tree for this incident, including the cases where
nothing appears to be listening and yet the bind still fails, lives in
what's holding this port? — and
if you only need the socket-side view on a box where every second counts,
ss gets the same answer faster.
Disk full, but du disagrees
df says the volume is at 96%. You run du on every directory and the
sum comes nowhere close. This is the classic deleted-but-open file: a process opened a log,
something (often logrotate, sometimes a tidy-minded human) deleted the file, and the process
kept writing to it. Deleting a file removes its name from the directory. The inode
and its data blocks stay allocated until the last open descriptor closes. du
walks names, so it cannot see the space; df asks the filesystem for allocated
blocks, so it can. The gap between them is your missing disk.
    $ sudo lsof -nP +L1
    COMMAND   PID   USER  FD  TYPE DEVICE    SIZE/OFF NLINK   NODE NAME
    java    41327 deploy  4w   REG  259,1 84817930240     0 524291 /var/log/app/server.log (deleted)
+L1 means "files with a link count less than one" — zero names left, but still
open. The NLINK column appears just for this query, the size column shows the 84 GB you
were hunting, and NAME ends with (deleted). The fix is rarely "kill the process."
You can truncate the file through the descriptor without restarting anything:
: > /proc/41327/fd/4 empties it in place and the space comes back immediately.
Then fix the rotation config so it signals the process instead of deleting files out from
under it. The wider investigation, including the other ways a disk fills invisibly, is the
subject of why is the disk full?
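When several processes hold deleted files, it can help to total the space before deciding what to do first. A rough sketch, assuming the +L1 column layout shown above (SIZE/OFF in field 7, NLINK in field 8); the function names are hypothetical, and the second sample row is made up for illustration.

```shell
# Sum the SIZE/OFF column for rows whose NLINK is 0 in `lsof +L1` output (stdin).
# Assumes the +L1 layout shown above: SIZE/OFF is field 7, NLINK is field 8.
deleted_bytes() {
    awk 'NR > 1 && $8 == 0 && $7 ~ /^[0-9]+$/ { total += $7 }
         END { printf "%.0f\n", total }'
}

# First row is the one from the scenario; the second is made up for illustration.
sample_deleted() {
    cat <<'EOF'
COMMAND   PID   USER  FD  TYPE DEVICE    SIZE/OFF NLINK   NODE NAME
java    41327 deploy  4w   REG  259,1 84817930240     0 524291 /var/log/app/server.log (deleted)
nginx    1290   root  7w   REG  259,1     1048576     0 524310 /var/log/nginx/access.log.1 (deleted)
EOF
}

sample_deleted | deleted_bytes   # prints 84818978816
```

On a real host the input would be sudo lsof -nP +L1 | deleted_bytes. If the total is close to the gap between df and du, the deleted files account for the missing space.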
The slow descriptor leak
A service that has run fine for weeks starts throwing EMFILE: too many open files.
Each process has a descriptor limit (ulimit -n, commonly 1024 or 65536), and
something in the code path opens sockets or files without closing them — an HTTP client that
never releases connections on the error path is the usual suspect. The diagnostic is to watch
the count grow:
    $ sudo lsof -nP -p 41327 | wc -l
    2741
    $ sleep 300; sudo lsof -nP -p 41327 | wc -l
    3088
A count that climbs steadily under constant load is a leak. Remember the overcount caveat from
the FD decoder: lsof -p includes mem and cwd rows that
are not descriptors, so for a precise number ask the kernel directly with
ls /proc/41327/fd | wc -l. Then look at what is leaking rather than how
much: ls -l /proc/41327/fd prints every descriptor as a symlink to its target,
and the pattern jumps out: hundreds of links of the form socket:[918432], each with its own
inode number, mean leaked connections; hundreds to the same file path mean a missing close() in a retry
loop. Group the lsof output by TYPE and NAME and the offending code path usually names itself.
Raising ulimit -n buys time; it does not fix a leak, it reschedules the outage.
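One possible way to do that grouping straight from the kernel's view is sketched here. The function name is hypothetical; it takes a directory of descriptor symlinks, which on Linux would normally be /proc/PID/fd.

```shell
# Group a process's descriptors by what they point at, most frequent first.
# Takes a directory of descriptor symlinks, normally /proc/PID/fd (Linux only).
fd_kinds() {
    local dir=$1 link target
    for link in "$dir"/*; do
        target=$(readlink "$link") || continue
        case $target in
            socket:*) echo "socket" ;;   # collapse socket:[inode] to one bucket
            pipe:*)   echo "pipe" ;;     # likewise for pipe:[inode]
            *)        echo "$target" ;;  # keep real paths so repeats stand out
        esac
    done | sort | uniq -c | sort -rn
}
```

Run as fd_kinds /proc/41327/fd from a root shell if the process belongs to another user. A top line dominated by socket points at leaked connections, while a single path repeated hundreds of times points at a missing close().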
What lsof actually reads
There is no magic in lsof, and knowing where its data comes from makes the output
easier to trust. Every process on Linux owns a file descriptor table: a per-process array,
indexed by small integers, where each slot points at an open file description in the kernel —
which in turn points at an inode, a socket, a pipe, or a device. Descriptor 0 is standard
input, 1 is standard output, 2 is standard error, and everything the process opens after that
takes the next free slot. When your code calls open() or socket(),
the integer it gets back is nothing more than an index into this table.
The kernel exposes that table through the /proc filesystem.
/proc/41327/fd/ is a directory containing one symlink per open descriptor, each
pointing at its target: a path for regular files, socket:[815091] for sockets,
pipe:[812001] for pipes. lsof is, to a first approximation, a
program that walks /proc/*/fd for every process, reads
/proc/PID/maps for the memory-mapped files, joins the socket inode numbers
against the tables in /proc/net/tcp and friends to recover addresses and states,
and formats the result. You can verify this yourself: ls -l /proc/$$/fd shows
your own shell's table, no tooling required, and during a bad incident when lsof
is not installed, raw /proc spelunking gets you most of the same answers.
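The socket join described above can be approximated by hand. This sketch decodes the hexadecimal addresses in /proc/net/tcp and prints each listening socket next to its inode. It assumes IPv4 and a little-endian host (x86 and most ARM systems), where the address bytes appear reversed; the function names are hypothetical.

```shell
# Decode one address from /proc/net/tcp, e.g. 0100007F:1F90 -> 127.0.0.1:8080.
# Assumes IPv4 on a little-endian host (x86, most ARM), where the bytes appear reversed.
decode_addr() {
    local ip=$((0x${1%%:*})) port=$((0x${1##*:}))
    printf '%d.%d.%d.%d:%d\n' \
        $((ip & 255)) $(((ip >> 8) & 255)) $(((ip >> 16) & 255)) $(((ip >> 24) & 255)) \
        "$port"
}

# Print "address socket:[inode]" for every listening socket (state 0A) in a tcp table.
# The file argument defaults to /proc/net/tcp; field 10 is the socket inode.
list_listeners() {
    local addr inode
    awk 'NR > 1 && $4 == "0A" { print $2, $10 }' "${1:-/proc/net/tcp}" |
    while read -r addr inode; do
        echo "$(decode_addr "$addr") socket:[$inode]"
    done
}

# decode_addr 0100007F:1F90 prints 127.0.0.1:8080
```

The socket:[inode] text printed by list_listeners is the same string that appears as a symlink target under /proc/PID/fd, so searching for it there identifies the owning process without lsof.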
This is also where descriptor inheritance comes from. fork() copies the parent's
descriptor table into the child, which is why a child process can hold a listening socket its
parent opened, and why "I killed the server but the port is still taken" usually means a
forked worker survived. The deeper anatomy of processes and their tables is covered in
processes, the
/proc filesystem gets its own page at
/proc, the inode-and-link-count machinery behind
the deleted-file trick lives in
file systems, and
what actually happens when a process reads or writes through one of these descriptors is the
subject of I/O.
Pitfalls
Forgetting -nP. Covered above, but it earns a second mention because it is the
most common way the tool wastes your time. If lsof appears to hang, it is almost
certainly resolving hostnames. Ctrl-C, add -nP, run it again.
Running it without root and trusting the silence. An unprivileged
lsof can only inspect your own processes, because reading another user's
/proc/PID/fd requires permission you do not have. The dangerous part is that the
output is not an error; it is a shorter list. You ask who is on port 8080, get nothing back,
and conclude the port is free while a root-owned process sits on it invisibly. If the question
involves any process you do not own — and during an incident it nearly always does — run it
under sudo, and treat an empty answer from an unprivileged run as "no answer," not
"no."
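One way to make the silence harder to trust by accident is a thin wrapper that always adds -nP and warns when it runs unprivileged. The wrapper name is hypothetical and this is a sketch, not a replacement for running under sudo.

```shell
# Hypothetical wrapper: always passes -nP, and says so when the answer may be partial.
lsof_checked() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "warning: not root; other users' processes may be missing from this output" >&2
    fi
    lsof -nP "$@"
}
```

Used as lsof_checked -i :8080, an empty result accompanied by the warning is a prompt to rerun with privileges rather than evidence that the port is free.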
Expecting it to be fast on a big box. A bare lsof with no
selectors enumerates every descriptor of every process: on a host running thousands of
processes with tens of thousands of descriptors each, that is real work and real time. Worse,
stat-ing files on a hung NFS mount can block the whole run. Narrow the query with selectors
(-p, -i, -u) so it reads only the slice of
/proc you care about, and reach for -b to avoid blocking kernel
calls if flaky network mounts are part of your life.
Forgetting fuser exists. For two narrow questions, fuser is
quicker to type and quicker to run: fuser -v /var/log/app/server.log lists the
PIDs holding one specific file, and fuser -vm /data lists everything keeping a
mount point busy, which is exactly what you want when umount says the target is
in use. It prints far less detail than lsof, and that is the point. Know both;
use the small one when the question is small.
Treating the output as a recording. lsof is a snapshot. A
process that opens, reads, and closes a file in fifty milliseconds will almost never be
caught by it. If you need to know who touches a file over time rather than who holds
it open right now, that is a tracing problem, not a listing problem.
A drill you can run right now
Everything below is safe on any Linux machine, including a shared one: it inspects state and
creates one throwaway file in /tmp. Ten minutes, and the three big ideas — the
port view, the descriptor table, and the deleted-but-open inode — stop being trivia and become
things you have seen.
Step 1 — the network view. List every network file your account can see, with
lookups off: lsof -nP -i. Pick one row and read it column by column against the
decoder above: who owns it, which descriptor, what state. If you have sudo, run it again with
sudo and notice how much longer the list gets — that difference is the
unprivileged-silence pitfall made visible.
Step 2 — your own shell's table. Run lsof -p $$ (the shell
expands $$ to its own PID). Find cwd (the directory you are sitting
in), txt (the shell binary itself), the mem rows (libc and friends),
and descriptors 0, 1, and 2 all pointing at your terminal device. Then cross-check against the
kernel directly with ls -l /proc/$$/fd and confirm the numeric rows match.
Step 3 — make a ghost file and catch it. Create a file, hold it open with
tail -f, delete it, and watch it live on:
    $ cd /tmp && echo "hold me" > demo.txt
    $ tail -f demo.txt &
    [1] 7012
    $ rm demo.txt
    $ lsof -nP +L1 | grep demo
    tail    7012 nilesh  3r  REG  259,1  8  0 524300 /tmp/demo.txt (deleted)
    $ cat /proc/7012/fd/3
    hold me
    $ kill %1
Walk through what just happened. rm removed the name, so the file vanished from
ls and from anything du would count. But tail still
holds descriptor 3 on the inode, so lsof +L1 finds it, NLINK reads zero, and NAME
says (deleted). Better still, cat /proc/7012/fd/3 reads the file's
contents back after deletion — the same trick that lets you recover a log someone
deleted from under a running service, and the same mechanism that hides 84 GB on a
production volume. When you kill %1, the last descriptor closes and the kernel
finally frees the inode. That is the entire deleted-but-open story, performed on a file eight
bytes long instead of a pager at 3am.
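The same story, plus the truncation trick from the disk-full scenario, can be replayed without tail or lsof. This is a sketch for Linux only, since it relies on /proc. It uses the shell's own descriptor 9 and /proc/self so it works wherever it is run; in an incident you would substitute /proc/PID/fd/N with the values lsof reported. The temporary file name is arbitrary.

```shell
#!/usr/bin/env bash
# Linux only: relies on /proc. Uses descriptor 9 of the current shell.
f=$(mktemp /tmp/ghost.XXXXXX)
exec 9>>"$f"                 # hold the file open for appending on descriptor 9
echo "hold me" >&9           # 8 bytes, newline included
rm "$f"                      # remove the only name; the inode stays allocated

link=$(readlink /proc/self/fd/9)    # the path followed by " (deleted)"
before=$(wc -c < /proc/self/fd/9)   # size of the nameless file
: > /proc/self/fd/9                 # truncate it in place through the descriptor
after=$(wc -c < /proc/self/fd/9)
exec 9>&-                           # close: the kernel can now free the inode

echo "link=$link"
echo "before=$before after=$after"  # prints before=8 after=0
```

The design point is that the space comes back at the truncation step, before the descriptor is closed, which is why the production fix does not require restarting the process.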
Three commands to remember: sudo lsof -nP -i :PORT for "who owns
this port," sudo lsof -nP +L1 for "where did the disk go," and
ls -l /proc/PID/fd when you want the kernel's answer with no tool in between.
Further reading
- lsof(8), the manual page: vast, but the OPTIONS section on -i selectors and the FD column description in OUTPUT are worth a careful read once.
- proc(5): the documentation for everything under /proc, including the fd, maps, and net entries lsof is built on.
- Julia Evans, "lsof, the comic": the whole tool on one page, which is roughly the right amount of ceremony for it.
- Semicolony, "What's holding this port?": the full incident walkthrough this page's first scenario comes from.