Why is the disk full?
The alert fires at 2:14: root volume at 98%. At 100%, writes start failing — logs stop, databases refuse transactions, deploys break, and on a bad day the thing that needs disk to recover is the thing that cannot get any. Every full disk is one of three stories: real files you can find, deleted files a process still holds open, or a filesystem that ran out of inodes while half its bytes sat free. This page is the investigation that tells the three apart — two cheap questions, then the right tool for whichever branch you land on, with the outputs you will actually see and the fixes that do not make things worse.
One number, three suspects
Start with the number itself, because everything that follows hangs off it. The box in this
walkthrough is a web host with a 200 GB root volume, and df reports this:
    $ df -h /
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/nvme0n1p1  197G  186G  1.9G  98% /
Two things before you type anything else. First, df is telling you what the
filesystem believes: it asks the superblock how many blocks are allocated, and the
superblock answers instantly and accurately. It does not know or care what the blocks are for.
Second, notice the arithmetic does not add up — 186 used plus 1.9 available is 188, not 197.
That missing 9 GB is real and explained near the end of this page; for now just register
that df's columns hide a reserve.
A disk reads full for one of three reasons, and the entire investigation is sorting out which
one you have. Real files: something wrote a lot of data with names attached —
logs that never rotate, docker images, core dumps, a cache that grew for a year. du
will find these, and the job is a tree walk. Deleted-but-open files: a process
holds a descriptor on a file whose name was removed, so the blocks stay allocated while every
name-walking tool reports nothing. du cannot see these at all; the gap between
du and df is the tell. Inode exhaustion: the
filesystem ran out of inodes rather than bytes, usually because something created millions of
tiny files, and writes fail with "No space left on device" while df -h shows
plenty free. Different cause, identical error message, designed to waste your time.
The good news is that two commands split the whole space. df -i answers "bytes or
inodes?" in one line. Then a du walk answers "can the names account for the
bytes?" If yes, you are hunting real files; if no, you are hunting a ghost. The sections that
follow walk that tree branch by branch.
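
The two-question split is small enough to write down as a shell function. Treat this as a sketch, not a tool: the name `triage`, the 95% inode threshold, and the rule that a gap above 10% of df's figure counts as unexplained are all assumptions picked for illustration, and the inputs are whole gigabytes typed in by hand from df and du.

```shell
#!/bin/sh
# triage IUSE_PCT DF_USED_GB DU_TOTAL_GB
# Hypothetical helper: names the branch of the investigation to follow.
# The thresholds (95% inodes, 10% gap) are illustrative assumptions.
triage() {
    iuse=$1; df_used=$2; du_total=$3
    if [ "$iuse" -ge 95 ]; then
        echo "inode exhaustion"
        return
    fi
    gap=$((df_used - du_total))
    if [ $((gap * 10)) -gt "$df_used" ]; then
        echo "deleted-but-open files (${gap}G unaccounted for)"
    else
        echo "real files"
    fi
}

triage 15 186 102    # the incident box: inodes at 15%, df 186G, du 102G
triage 100 59 59     # the /srv volume: the du figure is ignored on this branch
```

A box can sit on more than one branch at once, as the incident host does; the function only names the branch to chase first.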
Fork one: out of bytes, or out of inodes?
Every file on the filesystem needs an inode — the on-disk record that holds its metadata and
points at its data blocks. Most filesystems allocate a fixed pool of them when the volume is
created (ext4 does; XFS grows them dynamically, which is one reason this branch is mostly an
ext4 story). A million empty files consume a million inodes and almost no bytes. So a
filesystem has two ways to be full, and df has a flag for each:
    $ df -i /
    Filesystem       Inodes   IUsed    IFree IUse% Mounted on
    /dev/nvme0n1p1 13107200 1840312 11266888   15% /
On our incident box, inodes are at 15% — this branch is closed, and the bytes really are gone. But know what the other answer looks like, because when it happens it is deeply confusing. Here is a data volume that "ran out of space" with 35 GB free:
    $ df -h /srv; df -i /srv
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1        98G   59G   35G  63% /srv
    Filesystem       Inodes   IUsed IFree IUse% Mounted on
    /dev/sdb1       6553600 6553600     0  100% /srv
Every create() on that volume now fails with ENOSPC — the same errno,
the same "No space left on device" string, no hint that the resource that ran out was inodes.
The culprits are always the same shape: something that creates one small file per event and
never cleans up. PHP session files. A maildir that receives a message a second. A cache that
shards into one file per key. A queue spooling one file per job, with the consumer dead for a
month. The bytes barely move; the inode pool drains.
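
Because the error message gives no hint, it can help to have the inode check scripted. The sketch below is one way to do it, assuming the six-column layout that GNU `df -i` prints above and mount points without spaces; the function name `full_inodes` and the threshold of 90 are made up for the example.

```shell
#!/bin/sh
# full_inodes THRESHOLD
# Reads `df -i` style output on stdin and prints every mount point whose
# IUse% is at or above THRESHOLD. Hypothetical helper; filesystems that
# report "-" for inodes are skipped.
full_inodes() {
    awk -v limit="$1" '
        NR > 1 && $5 != "-" {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $6, $5
        }'
}

# Typical use on a live host:
#   df -i | full_inodes 90

# Replaying the two filesystems from this page:
printf '%s\n' \
    'Filesystem       Inodes   IUsed    IFree IUse% Mounted on' \
    '/dev/nvme0n1p1 13107200 1840312 11266888   15% /' \
    '/dev/sdb1       6553600 6553600        0  100% /srv' |
    full_inodes 90
```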
Finding the directory is the only interesting part, because du sorts by bytes and
bytes are exactly the wrong metric here. You want a file count per directory, and the
trick is to make find print each file's parent directory and let
sort | uniq -c do the counting:
    $ sudo find /srv -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -3
    6212482 /srv/app/var/cache/prod/pools
     201510 /srv/app/var/sessions
      88314 /srv/uploads/thumbs
Six million cache entries. A faster first pass, if you want one: directory files themselves
grow as they accumulate entries, so find /srv -xdev -type d -size +1M lists
directories whose own index has blown past a megabyte — a directory with millions of entries
cannot hide its bulk. Once you have the path, deleting six million files is its own small
project: a plain rm dir/* will fail because the shell expands the glob into an
argument list longer than the kernel accepts. Use find ... -delete or stream the
names through xargs instead — the mechanics, and why the argument-list limit
exists, are covered in find & xargs.
And the durable fix is never the deletion; it is whatever stops the directory refilling —
session garbage collection turned on, cache TTLs, a consumer for the queue.
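
If you would rather rehearse the count-then-delete steps somewhere harmless first, a scratch directory works. This is a sketch under assumptions: every path is a throwaway name under `mktemp`, the file counts are tiny stand-ins, and it relies on GNU find for `-printf` and `-delete`.

```shell
#!/bin/sh
# Scratch-directory rehearsal of the count-then-delete steps.
# All paths are throwaway names under mktemp, not the incident's paths.
demo=$(mktemp -d)
mkdir -p "$demo/cache/pools" "$demo/sessions"

# Stand-ins for the runaway cache and the session files.
i=0
while [ "$i" -lt 300 ]; do : > "$demo/cache/pools/key$i"; i=$((i + 1)); done
i=0
while [ "$i" -lt 20 ]; do : > "$demo/sessions/sess$i"; i=$((i + 1)); done

# Files per parent directory, busiest first.
busiest=$(find "$demo" -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -n 1)
echo "busiest: $busiest"

# Delete without building an argument list: find unlinks each match itself.
find "$demo/cache/pools" -xdev -type f -name 'key*' -delete
remaining=$(find "$demo/cache/pools" -type f | wc -l)
echo "left in pools: $remaining"

rm -r "$demo"
```

Restricting the delete with `-type f` and a `-name` pattern is a deliberate habit: it keeps the directory itself in place for the application and limits the blast radius if the path is wrong.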
Walking the tree
Back on the incident box: inodes fine, 186 GB allocated, so now the question is where the
bytes live. du is the opposite of df in every way that matters: it
takes no shortcuts, walks the directory tree, stats every file, and adds the sizes up. That
makes it slow, and it makes it honest about names — it can only count what it can reach by
path. The standard opening move is one level at a time, sorted so the biggest line is at the
bottom where your eye lands:
    $ sudo du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -6
    1.6G    /opt
    2.2G    /home
    3.1G    /usr
    94G     /var
    102G    /
Three deliberate choices in that one line. sudo, because an unprivileged
du silently skips directories it cannot read and hands you a number that is wrong
without saying so. 2>/dev/null, to drop the permission-denied noise from
/proc and friends. And -x, which is the flag people forget exactly
once. It tells du not to cross filesystem boundaries — without it, the walk of
/ happily descends into every mounted volume: the NFS share on
/data, the separate /home volume, every bind mount a container
runtime scattered around. You end up either staring at 600 GB of perfectly healthy network
storage that has nothing to do with your full root disk, or hanging on a dead NFS mount at the
worst possible moment. The question is "what is filling this filesystem," and
-x is what keeps the question honest.
From here it is recursion by hand: take the fattest directory, run the same command on it, repeat until you hit files.
    $ sudo du -xh --max-depth=1 /var 2>/dev/null | sort -h | tail -4
    2.1G    /var/cache
    22G     /var/log
    67G     /var/lib
    94G     /var
    $ sudo du -xh --max-depth=1 /var/lib 2>/dev/null | sort -h | tail -3
    3.8G    /var/lib/journal
    61G     /var/lib/docker
    67G     /var/lib
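
The hand recursion can be automated in a few lines. What follows is a sketch, not a replacement for looking at each level yourself: the function name `fattest` is invented, it assumes GNU du, and it is rehearsed on a scratch tree so nothing on the real disk is touched.

```shell
#!/bin/sh
# fattest DIR
# Prints the largest immediate subdirectory of DIR on the same filesystem,
# or an empty line if DIR has no subdirectories. Hypothetical helper wrapping
# the du | sort pipeline; sizes are in KiB (-k) so plain `sort -n` is enough.
fattest() {
    du -xk --max-depth=1 "$1" 2>/dev/null |
        sort -n |
        awk -F'\t' -v top="$1" '$2 != top { last = $2 } END { print last }'
}

# Scratch tree with one obviously heavy directory.
root=$(mktemp -d)
mkdir -p "$root/lib/docker" "$root/lib/misc" "$root/log"
head -c 4194304 /dev/urandom > "$root/lib/docker/layer.bin"   # 4 MiB
head -c 1048576 /dev/urandom > "$root/log/app.log"            # 1 MiB

# Descend into the fattest child until there is no child left.
dir=$root
while next=$(fattest "$dir") && [ -n "$next" ]; do
    dir=$next
done
found=${dir#"$root"}      # path relative to the scratch root
echo "$found"
rm -r "$root"
```

Following only the single fattest child is the weakness of automating this: on the incident box it would walk straight into docker and never mention the 22 GB of logs, which is why the level-by-level view is still worth reading.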
If ncdu is installed — and it is worth installing before the incident, not during
it — it does the same walk once and gives you an interactive browser over the result:
sudo ncdu -x /, arrow keys to descend, sorted by size, with delete behind a
confirmation. Same data as du, far fewer keystrokes. Either way, after a few
minutes you have a map: docker is the biggest tenant at 61 GB, logs hold 22, and the rest
is scattered small stuff.
But stop and check the totals before you start cleaning, because this is the moment most
investigations go wrong. du says everything reachable from / on this
filesystem adds up to 102 GB. df says
186 GB is allocated. Eighty-four gigabytes are on the disk and not under
any name. No amount of tree-walking will find them, because tree-walking is exactly what they
are hiding from.
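
The comparison is worth making a habit, and it fits in one function. This sketch assumes GNU df (`--output`) and GNU du (`-B1`); the name `unnamed_bytes` is invented for the example. Run it as root against the mount point itself, because every directory du cannot read inflates the result, and expect a small nonzero answer even on a healthy filesystem, since filesystem metadata is allocated without being reachable by name.

```shell
#!/bin/sh
# unnamed_bytes MOUNTPOINT
# df's allocated bytes minus the bytes du can reach by name.
# Hypothetical helper; only meaningful when MOUNTPOINT is the root of a filesystem.
unnamed_bytes() {
    allocated=$(df -B1 --output=used "$1" | tail -n 1 | tr -d ' ')
    reachable=$(du -sxB1 "$1" 2>/dev/null | cut -f1)
    echo $((allocated - reachable))
}

# Live use:  sudo sh -c '. ./this-file; unnamed_bytes /'
# With the figures from this incident (whole gigabytes, typed in by hand):
echo "$((186 - 102))G allocated but not reachable by any name"
```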
When du and df disagree
A big gap between df and du on the same filesystem almost always
means one thing: a deleted file that a process still has open. The mechanism is plain
filesystem bookkeeping. A file's name (the directory entry) and a file's substance (the inode
and its data blocks) are separate things, and deleting is the removal of a name. The
kernel frees the inode and blocks only when two counts both reach zero: the number of names
pointing at the inode, and the number of open descriptors holding it. Remove the last name
while a process still holds a descriptor, and you get a ghost — zero names, blocks still
allocated, a file that exists for exactly one process and for the block allocator and for
nobody else. The link-count machinery behind this is laid out in
file systems, and you
can perform the whole life cycle by hand — create, link, unlink, watch the counts — in the
filesystem simulator.
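
The same life cycle can be watched from a shell on any Linux box. This is a minimal sketch on a scratch file: descriptor 9 and the file names are arbitrary choices, `%h` is GNU stat's link count, and `/proc/self/fd/9` works inside each child command because children inherit the shell's descriptors.

```shell
#!/bin/sh
# The name/inode life cycle on a scratch file (Linux, GNU stat).
work=$(mktemp -d)
echo "hello" > "$work/a"
n_created=$(stat -c %h "$work/a")          # one name

ln "$work/a" "$work/b"
n_linked=$(stat -c %h "$work/a")           # two names, same inode

rm "$work/b"
n_unlinked=$(stat -c %h "$work/a")         # back to one

exec 9< "$work/a"                          # hold the file open on descriptor 9
rm "$work/a"                               # remove the last name
n_ghost=$(stat -L -c %h /proc/self/fd/9)   # no names left, inode still alive
still_readable=$(cat <&9)                  # the holder can still read the data

exec 9<&-                                  # last reference dropped: blocks freed
rmdir "$work"
echo "links: $n_created $n_linked $n_unlinked $n_ghost, contents: $still_readable"
```

The `rmdir` at the end succeeding is part of the point: the directory is empty as far as any name-walking tool can tell, even while the file's data was still readable through the descriptor a moment earlier.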
The classic way this happens in production: an application opens its log at startup and writes
to the same descriptor forever. Months later the file is enormous, and someone — a human doing
emergency cleanup, or a logrotate config that deletes instead of signalling — removes it.
The name disappears, ls looks clean, and the process keeps appending to a file
that no longer has a name, growing the invisible allocation with every request. The tool that
finds it is lsof +L1, which lists open files whose link count is below one:
    $ sudo lsof -nP +L1
    COMMAND   PID   USER  FD  TYPE DEVICE    SIZE/OFF NLINK   NODE NAME
    java    41327 deploy  4w   REG  259,1 90194313216     0 524291 /var/log/app/server.log (deleted)
There is the missing 84 GB, with the holder's name and PID attached: java, PID 41327,
descriptor 4, open for writing, link count zero, (deleted). Reading every column
of that output — and everything else lsof can tell you — is the subject of
lsof. What matters here is the fix, and the fix has
a hierarchy. Work down it; do not skip to the bottom.
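
When lsof returns dozens of lines instead of one, a quick total tells you whether the ghosts add up to the gap. The sketch below assumes the ten-column layout shown above; the function name `ghost_total` is invented, and it counts each DEVICE/NODE pair once because lsof repeats a file for every descriptor that holds it.

```shell
#!/bin/sh
# ghost_total
# Reads `lsof -nP +L1` output on stdin and prints the combined size of
# link-count-zero regular files in GiB. Hypothetical helper.
ghost_total() {
    awk '
        NR > 1 && $5 == "REG" && $8 == 0 && !seen[$6 ":" $9]++ { bytes += $7 }
        END { printf "%.1f GiB\n", bytes / (1024 * 1024 * 1024) }'
}

# Live use:  sudo lsof -nP +L1 | ghost_total
# Replaying the line from this incident:
printf '%s\n' \
    'COMMAND   PID   USER  FD  TYPE DEVICE    SIZE/OFF NLINK   NODE NAME' \
    'java    41327 deploy  4w   REG  259,1 90194313216     0 524291 /var/log/app/server.log (deleted)' |
    ghost_total
```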
First choice: make the process let go politely. Most logging daemons and many
servers reopen their log files when asked — that is exactly what a logrotate
postrotate script does when it sends SIGHUP or
SIGUSR1, and many services expose the same thing as
systemctl reload. The process closes descriptor 4, the kernel sees the last
reference drop, and 84 GB returns instantly. Check what signal this particular service
expects before sending anything: HUP means "reopen logs" to rsyslog, nginx reopens its logs
on USR1 and treats HUP as a configuration reload, and HUP means "die" to plenty of other
programs. Which signal does what, and how to find out, is covered in
kill & signals.
Second choice: truncate the file through the descriptor. If the process has no
reopen mechanism and you cannot restart it right now, /proc gives you a path to
the ghost: /proc/41327/fd/4 is a live handle on the deleted file, and truncating
it frees the blocks without touching the process.
    $ sudo truncate -s 0 /proc/41327/fd/4
    $ df -h /
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/nvme0n1p1  197G  102G   86G  55% /
The process keeps its descriptor and keeps writing; you have simply cut the file's length to
zero underneath it. (One wrinkle worth knowing: if the program opened the log without
O_APPEND, its next write lands at its old offset and the file becomes sparse —
harmless for disk space, occasionally surprising in the output. Truncating an
O_APPEND log is clean.) Restarting the service also works, of course — it is just
the bluntest version of "close the descriptor," and it costs you whatever a restart costs.
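
The O_APPEND wrinkle is easy to see for yourself before you need it in anger. This is a rehearsal sketch for Linux, with assumptions labelled: descriptors 8 and 9 and the file names are arbitrary, the shell itself plays the part of the long-running process, and `/proc/self/fd/N` stands in for the `/proc/<PID>/fd/<N>` path you would take from lsof.

```shell
#!/bin/sh
# Truncating a deleted-but-open file, with and without O_APPEND (Linux, GNU coreutils).
work=$(mktemp -d)
exec 8>> "$work/append.log"    # ">>" opens with O_APPEND
exec 9>  "$work/plain.log"     # ">" opens without it

head -c 1048576 /dev/urandom >&8     # 1 MiB through each descriptor
head -c 1048576 /dev/urandom >&9
rm "$work/append.log" "$work/plain.log"      # both are ghosts now

truncate -s 0 /proc/self/fd/8        # free the blocks, keep the descriptors
truncate -s 0 /proc/self/fd/9

printf 'x' >&8                       # the "process" keeps writing
printf 'x' >&9

append_size=$(stat -L -c %s /proc/self/fd/8)   # write landed at the new end of file
plain_size=$(stat -L -c %s /proc/self/fd/9)    # write landed at the old offset
plain_blocks=$(stat -L -c %b /proc/self/fd/9)  # allocated blocks behind that size
echo "append=$append_size plain=$plain_size plain_blocks=$plain_blocks"

exec 8>&- 9>&-
rmdir "$work"
```

The appending descriptor ends up with a one-byte file; the other reports a logical size just past a mebibyte while its block count stays small, which is the sparse file the paragraph above describes.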
What is never on the list: deleting more files. This is the moment in the
incident where the panicked move is to rm other big things you can see, and it is
worth being precise about why that is wrong twice over. Practically: if you rm
another file that some process has open, you have not freed its space either — you have minted
a second ghost and made the df-versus-du gap bigger. And operationally: files deleted in a
panic at 2 am have a way of turning out to be the database's write-ahead log. The
original sin in this incident was somebody deleting an open file instead of rotating it;
repeating the sin faster is not a fix. Truncate, signal, or rotate — deletion is for files
nothing holds open, identified calmly.
The usual suspects
The ghost explained 84 GB, but the named files on this box still hold 102, and the same handful of culprits show up on almost every machine. When the du walk lands you in one of these, here is what to check and what is safe to reclaim.
Logs that never rotate. Our box has 22 GB in /var/log, and
the tree walk will point at the heavy files directly. Two sub-cases. Application logs that
grew because nobody wrote a logrotate stanza for them: compress or truncate the big ones now,
add the stanza after the incident. And the systemd journal, which manages its own retention
and answers for itself:
    $ journalctl --disk-usage
    Archived and active journals take up 3.8G in the file system.
    $ sudo journalctl --vacuum-size=500M
    Vacuuming done, freed 3.3G of archived journals from /var/log/journal/…
Vacuuming archived journals is one of the safest reclaims available — it deletes only rotated journal segments, oldest first, through the tool that owns them. What the journal is, how its retention settings work, and how to read it properly is the territory of journalctl & dmesg.
Docker, the 61 GB tenant. Container hosts accumulate disk in four
separate pools, and docker system df itemises them:
    $ docker system df
    TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
    Images          38        6         41.2GB    33.9GB (82%)
    Containers      9         4         2.1GB     1.8GB (85%)
    Local Volumes   31        5         14.7GB    11.2GB (76%)
    Build Cache     214       0         8.4GB     8.4GB
Old image layers from every deploy ever, stopped containers nobody removed, build cache, and
volumes orphaned by deleted containers. docker image prune -a and
docker builder prune reclaim the first and last with little drama;
read before running docker volume prune, because volumes are where the
data lives, and an "unused" volume may be unused only because its container is temporarily
stopped. One more docker-shaped trap: container stdout logs default to unbounded json files
under /var/lib/docker/containers/, and a chatty container can write a 30 GB
log file that no logrotate config knows about. Set max-size in the logging
options and the problem stays solved.
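
For the json-file driver, the cap is usually set daemon-wide. The sketch below only writes an example configuration to a scratch file for review; the 50m and 3 values are illustrative rather than a recommendation, and on a real host the content would be merged into /etc/docker/daemon.json, followed by a daemon restart.

```shell
#!/bin/sh
# Writes an example daemon.json to a scratch file so it can be reviewed
# before anything on the host is changed. Values are illustrative.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
EOF
cat "$scratch"
```

As far as the Docker documentation describes it, the daemon-wide setting applies to containers created afterwards, so existing chatty containers need recreating before their logs are capped.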
Package debris and old kernels. /var/cache/apt holds every
package archive ever downloaded until you say otherwise — sudo apt-get clean is
free money, often a gigabyte or two. Old kernel versions accumulate in /boot and
as installed packages; apt autoremove --purge clears the ones the system no
longer needs. /boot is small and ignorable right up until it is a separate
100 MB partition, at which point three old kernels fill it completely and upgrades start
failing with a confusingly local version of this whole page.
Core dumps. A crashing process can leave a dump the size of its address
space — multi-gigabyte files written in seconds. Look for core.* files in
application working directories, check coredumpctl list on systemd boxes (dumps
live under /var/lib/systemd/coredump), and check /var/crash. A
service in a crash loop with core dumps enabled is one of the few things that can fill a disk
in minutes rather than months.
/tmp and /var/tmp. Crashed jobs leave their working sets behind: extracted
archives, sort spill files, half-written exports. Anything old in /tmp is usually
fair game, but check owners and timestamps before sweeping — and if a tmpfile is huge and
recent, the job that made it may still be running and holding it open, which puts you back in
the previous section. A final net for everything the categories miss: one sweep for individual
big files, sorted by size with each file's modification date alongside, so the large and
long-forgotten ones stand out:
    $ sudo find / -xdev -type f -size +1G -printf '%s\t%TY-%Tm-%Td\t%p\n' 2>/dev/null | sort -n | tail -5
    1438092871  2024-11-02  /home/deploy/heapdump-prod.hprof
    2147748332  2025-03-19  /opt/app/exports/full-2025-03-19.sql
    6442450944  2025-08-30  /var/tmp/dataset-staging.tar
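
The sweep above orders its output by size, since `sort -n` keys on the first column. If age is what you care about, one variant is to lead with the modification time instead. This is a sketch on a scratch directory, assuming GNU find and GNU touch; the file names and dates are invented, and the 1 MiB floor stands in for the `+1G` you would use on a real disk.

```shell
#!/bin/sh
# Age-ordered variant of the big-file sweep. %T@ is the modification time in
# seconds since the epoch, which sorts numerically; it is cut off again before
# printing so only the readable date remains.
scratch=$(mktemp -d)
head -c 2097152 /dev/urandom > "$scratch/heapdump.hprof"
head -c 3145728 /dev/urandom > "$scratch/export.sql"
head -c 1024    /dev/urandom > "$scratch/small.txt"
touch -d '2020-01-15' "$scratch/heapdump.hprof"    # pretend it is long forgotten
touch -d '2024-06-01' "$scratch/export.sql"

oldest=$(find "$scratch" -xdev -type f -size +1M \
        -printf '%T@\t%TY-%Tm-%Td\t%s\t%p\n' |
    sort -n | head -n 1 | cut -f2-)
echo "$oldest"
rm -r "$scratch"
```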
Sparse files and the missing five percent
Two footnotes to the investigation, both of which eventually bite everyone who does this work.
The first: not every big-looking file is big. Filesystems support sparse files, where
blocks that were never written simply are not allocated — the file has a logical size (what
ls -l reports) and a physical size (what du reports, in actual
blocks), and the two can differ by orders of magnitude:
    $ ls -lh /var/log/lastlog
    -rw-rw-r-- 1 root utmp 1.2G Jun  8 09:14 /var/log/lastlog
    $ du -h /var/log/lastlog
    16K     /var/log/lastlog
lastlog is indexed by UID, so one login from a high-UID account extends the file's
logical size out to gigabytes while allocating a few blocks. VM disk images and pre-sized
database files behave the same way. The practical rules: trust du over
ls when you are accounting for disk, do not "fix" a sparse file you do not
recognise, and be careful copying them — a copy tool that does not understand holes
(cp mostly does; some backup tools do not) writes the zeros for real, and the
copy genuinely consumes the full logical size on the destination.
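
A sparse file is easy to make on purpose, which is a good way to get a feel for the two sizes. The sketch assumes GNU coreutils (`truncate`, `stat -c`, `cp --sparse`) and a filesystem that supports holes, which most Linux filesystems do; the 100 MiB size is arbitrary.

```shell
#!/bin/sh
# A sparse file made on purpose: 100 MiB of logical size, little or nothing allocated.
# GNU stat: %s is the logical size in bytes, %b the number of allocated blocks,
# %B the size in bytes of each of those blocks.
f=$(mktemp)
truncate -s 100M "$f"

logical=$(stat -c %s "$f")
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))
echo "logical=$logical allocated=$allocated"

# cp can be told to preserve holes explicitly rather than relying on its heuristics.
cp --sparse=always "$f" "$f.copy"
copy_allocated=$(( $(stat -c %b "$f.copy") * $(stat -c %B "$f.copy") ))
echo "copy allocated=$copy_allocated"
rm -f "$f" "$f.copy"
```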
The second footnote answers the arithmetic from the very first df: 186 used plus
1.9 available is not 197. ext filesystems reserve a slice of blocks — 5% by default — that
only root can allocate:
    $ sudo tune2fs -l /dev/nvme0n1p1 | grep -i 'block count'
    Block count:              51642880
    Reserved block count:     2582144
The reserve exists so the machine stays operable when users fill it: root can still log in,
the system's own daemons can still write what they must, and the filesystem keeps a little
slack for its allocator to avoid pathological fragmentation. The consequences for your
incident reading: unprivileged processes start getting ENOSPC at roughly 95%
real occupancy, Use% is computed against the space available to non-root and can
read 100% with gigabytes technically free, and a root-owned writer (a log daemon, say) can
keep growing a file after every user process has started failing. On a data-only volume that
root never writes to, tune2fs -m 1 trims the reserve and hands back several
gigabytes; on the root filesystem, leave it alone — it is the slack that lets you fix the
next one of these incidents while logged in.
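
The two tune2fs numbers turn into a percentage and a size with a little arithmetic. One assumption to flag: the 4096-byte block size below is not in the output above. It is the common ext4 default and it is consistent with the 197G that df reports, but the real value comes from the "Block size" line of `tune2fs -l`.

```shell
#!/bin/sh
# Reserve arithmetic from the tune2fs output. block_size is an assumption;
# read the real value from the "Block size" line of `tune2fs -l`.
block_count=51642880
reserved_blocks=2582144
block_size=4096

pct=$(awk -v r="$reserved_blocks" -v t="$block_count" \
    'BEGIN { printf "%.1f", 100 * r / t }')
gib=$(awk -v r="$reserved_blocks" -v b="$block_size" \
    'BEGIN { printf "%.2f", r * b / (1024 * 1024 * 1024) }')
echo "reserved: ${pct}% of blocks, ${gib} GiB"
```

Under that assumption the reserve comes to a little under 10 GiB, the same order as the gap in the first df line once df's rounding is taken into account.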
Closing the incident
On our box the ledger now reconciles: 84 GB came back when the ghost log was truncated,
docker pruning and journal vacuuming reclaimed another 40, and df sits at 31%
with the arithmetic understood. Before you close the page, two pieces of discipline separate
an incident that is over from one that is merely paused.
Write the notes while the terminal scrollback still exists. Five lines is
enough, but they need to be the right five: the df line as found and as left; the
culprit, by path and by mechanism ("deleted-but-open /var/log/app/server.log,
84 GB, held by java PID 41327 since the May logrotate change"); exactly what you deleted
or truncated, with sizes; the fix applied and the signal or command used; and the follow-up
that prevents recurrence, with an owner. The mechanism line matters most. "Disk was full,
cleaned up logs" tells the next responder nothing; "logrotate was deleting a file the app
never reopens" tells them the config to fix and the trap to avoid. Paste the actual
lsof +L1 output — thirty seconds now, and the next 2 am responder gets your
whole investigation for free.
Then make the prevention real. Most full-disk pages are scheduled months in
advance, and the schedule is visible if anything is looking. Alert at 80%, not 95 — at 80% on
a slow-growth disk you have weeks to act during business hours; at 95% you have a pager and
adrenaline. Better, alert on the trend: "full in under four days at the current
growth rate" catches both the slow leak and the fast one. Audit logrotate after any incident
it caused: every config that rotates a live application's log must either signal the app to
reopen (postrotate) or use copytruncate — which copies then empties
the live file, at the cost of possibly losing the lines written between the two steps. And
give every growing dataset a retention job the day it is created: uploads, exports, build
artifacts, anything with a timestamp in its filename. Data with no deletion policy has a
deletion policy; it is called this incident.
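
The trend alert is nothing more than a straight line through two df readings. A minimal sketch of the arithmetic, with invented sample numbers and an invented function name; real monitoring systems fit more samples and do it better.

```shell
#!/bin/sh
# days_until_full AVAIL_GB USED_THEN_GB USED_NOW_GB DAYS_BETWEEN
# Straight-line extrapolation from two df samples. Hypothetical helper.
days_until_full() {
    awk -v avail="$1" -v then_gb="$2" -v now_gb="$3" -v span="$4" 'BEGIN {
        rate = (now_gb - then_gb) / span          # GB per day
        if (rate <= 0) { print "not growing"; exit }
        printf "%.1f\n", avail / rate
    }'
}

days_until_full 40 150 157 7     # slow leak: 1 GB/day with 40 GB left
days_until_full 12 160 184 2     # fast one: 12 GB/day with 12 GB left
```

Against a four-day threshold, the first disk stays quiet and the second pages someone, even though the first is the fuller of the two in absolute terms a week from now.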
df -h / to size the problem, df -i / to rule inodes in or out,
sudo du -xh --max-depth=1 / | sort -h repeated downward to find named bulk
(never forget the -x), and sudo lsof -nP +L1 when du's total falls
short of df's. Everything else on this page is what to do with their answers.
Further reading
- df(1) and du(1): short man pages for once; the df notes on how Use% is computed repay the two minutes.
- tune2fs(8): the -m and -l options behind the reserved-blocks story.
- logrotate(8): the create/postrotate/copytruncate section explains every way rotation goes wrong, including the one in this incident.
- Semicolony, lsof: the tool behind +L1, column by column, including the descriptor-table machinery the ghost-file trick rests on.