Why is the disk full?

The alert fires at 2:14: root volume at 98%. At 100%, writes start failing — logs stop, databases refuse transactions, deploys break, and on a bad day the thing that needs disk to recover is the thing that cannot get any. Every full disk is one of three stories: real files you can find, deleted files a process still holds open, or a filesystem that ran out of inodes while half its bytes sat free. This page is the investigation that tells the three apart — two cheap questions, then the right tool for whichever branch you land on, with the outputs you will actually see and the fixes that do not make things worse.

One number, three suspects

Start with the number itself, because everything that follows hangs off it. The box in this walkthrough is a web host with a 200 GB root volume, and df reports this:

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  197G  186G  1.9G  98% /

Two things before you type anything else. First, df is telling you what the filesystem believes: it asks the superblock how many blocks are allocated, and the superblock answers instantly and accurately. It does not know or care what the blocks are for. Second, notice the arithmetic does not add up — 186 used plus 1.9 available is 188, not 197. That missing 9 GB is real and explained near the end of this page; for now just register that df's columns hide a reserve.

A disk reads full for one of three reasons, and the entire investigation is sorting out which one you have. Real files: something wrote a lot of data with names attached — logs that never rotate, docker images, core dumps, a cache that grew for a year. du will find these, and the job is a tree walk. Deleted-but-open files: a process holds a descriptor on a file whose name was removed, so the blocks stay allocated while every name-walking tool reports nothing. du cannot see these at all; the gap between du and df is the tell. Inode exhaustion: the filesystem ran out of inodes rather than bytes, usually because something created millions of tiny files, and writes fail with "No space left on device" while df -h shows plenty free. Different cause, identical error message, designed to waste your time.

The good news is that two commands split the whole space. df -i answers "bytes or inodes?" in one line. Then a du walk answers "can the names account for the bytes?" — if yes, you are hunting real files; if no, you are hunting a ghost. Here is the whole tree before we walk it branch by branch.

The full-disk decision tree. df -i first because it costs one second; the du walk second because it costs minutes; lsof +L1 when the first two leave bytes unaccounted for.
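The two forks can be sketched as a small shell function. This is a rough illustration, not a standard tool: the name triage and both thresholds (inodes at 95% or more, an unexplained gap larger than a tenth of what df reports) are assumptions picked for the example, and the commented usage assumes GNU df and du.

```shell
# triage IUSE_PCT DF_USED_KB DU_TOTAL_KB
# Prints which branch of the investigation the three numbers point at.
# The 95% and 10% thresholds are illustrative assumptions, not fixed rules.
triage() {
    iuse=$1 df_used=$2 du_total=$3
    if [ "$iuse" -ge 95 ]; then
        echo "inodes"
        return
    fi
    gap=$((df_used - du_total))
    if [ $((gap * 10)) -gt "$df_used" ]; then
        echo "deleted-but-open"
    else
        echo "named-files"
    fi
}

# Feeding it live numbers (the du walk is the slow part):
#   iuse=$(df --output=ipcent / | tail -n 1 | tr -dc '0-9')
#   used=$(df -k --output=used / | tail -n 1 | tr -dc '0-9')
#   walked=$(sudo du -xsk / 2>/dev/null | cut -f1)
#   triage "$iuse" "$used" "$walked"
```

Fed the incident box's figures (15% inodes, 186 GiB allocated, 102 GiB reachable by name, all converted to KiB), it prints deleted-but-open.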

Fork one: out of bytes, or out of inodes?

Every file on the filesystem needs an inode — the on-disk record that holds its metadata and points at its data blocks. Most filesystems allocate a fixed pool of them when the volume is created (ext4 does; XFS grows them dynamically, which is one reason this branch is mostly an ext4 story). A million empty files consume a million inodes and almost no bytes. So a filesystem has two ways to be full, and df has a flag for each:

$ df -i /
Filesystem       Inodes   IUsed    IFree IUse% Mounted on
/dev/nvme0n1p1 13107200 1840312 11266888   15% /

On our incident box, inodes are at 15% — this branch is closed, and the bytes really are gone. But know what the other answer looks like, because when it happens it is deeply confusing. Here is a data volume that "ran out of space" with 35 GB free:

$ df -h /srv; df -i /srv
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        98G   59G   35G  63% /srv
Filesystem      Inodes   IUsed IFree IUse% Mounted on
/dev/sdb1      6553600 6553600     0  100% /srv

Every create() on that volume now fails with ENOSPC — the same errno, the same "No space left on device" string, no hint that the resource that ran out was inodes. The culprits are always the same shape: something that creates one small file per event and never cleans up. PHP session files. A maildir that receives a message a second. A cache that shards into one file per key. A queue spooling one file per job, with the consumer dead for a month. The bytes barely move; the inode pool drains.
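Because the error message is identical either way, it can be worth scripting the inode check so it runs on every mount at once. A minimal sketch, assuming df output in the column layout shown above; the function name full_inodes and the default threshold of 90 are hypothetical, and mount points containing spaces would confuse the field splitting.

```shell
# full_inodes [THRESHOLD]
# Reads `df -i` output on stdin and prints each mount point whose IUse% is at
# or above THRESHOLD (default 90, an arbitrary illustrative value).
full_inodes() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)      # "100%" becomes "100"; "-" stays non-numeric
            if (pct + 0 >= limit) print $6, $5
        }'
}

# Usage on a live system (-P keeps each filesystem on one line):
#   df -iP | full_inodes 90
```

Filesystems that do not report inode counts show "-" in the IUse% column, which the numeric comparison treats as zero, so they are never flagged.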

Finding the directory is the only interesting part, because du sorts by bytes and bytes are exactly the wrong metric here. You want a file count per directory, and the trick is to make find print each file's parent directory and let sort | uniq -c do the counting:

$ sudo find /srv -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -3
6212482 /srv/app/var/cache/prod/pools
 201510 /srv/app/var/sessions
  88314 /srv/uploads/thumbs

Six million cache entries. A faster first pass, if you want one: directory files themselves grow as they accumulate entries, so find /srv -xdev -type d -size +1M lists directories whose own index has blown past a megabyte — a directory with millions of entries cannot hide its bulk. Once you have the path, deleting six million files is its own small project: a plain rm dir/* will fail because the shell expands the glob into an argument list longer than the kernel accepts. Use find ... -delete or stream the names through xargs instead — the mechanics, and why the argument-list limit exists, are covered in find & xargs. And the durable fix is never the deletion; it is whatever stops the directory refilling — session garbage collection turned on, cache TTLs, a consumer for the queue.
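The count-then-delete sequence can be rehearsed safely on a throwaway tree before you point it at production. The sketch below builds its own scratch directory with mktemp, so every path and file name in it is invented for the demonstration; it assumes GNU find for -printf and -delete.

```shell
# Build a throwaway tree: one directory with many tiny files, one with few.
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/sessions"
i=0
while [ "$i" -lt 300 ]; do
    : > "$demo/cache/entry-$i"
    i=$((i + 1))
done
for n in 1 2 3 4 5; do : > "$demo/sessions/sess-$n"; done

# File count per parent directory, biggest first.
top=$(find "$demo" -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -n 1)
echo "$top"

# Delete without building an argument list: find unlinks each match itself.
find "$demo/cache" -xdev -type f -name 'entry-*' -delete
left=$(find "$demo/cache" -type f | wc -l)
echo "files left in cache: $left"
```

Matching on a name pattern rather than deleting everything under the directory is a deliberate habit: it keeps the command from removing files you did not mean to include.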

Walking the tree

Back on the incident box: inodes fine, 186 GB allocated, so now the question is where the bytes live. du is the opposite of df in every way that matters: it takes no shortcuts, walks the directory tree, stats every file, and adds the sizes up. That makes it slow, and it makes it honest about names — it can only count what it can reach by path. The standard opening move is one level at a time, sorted so the biggest line is at the bottom where your eye lands:

$ sudo du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -5
1.6G    /opt
2.2G    /home
3.1G    /usr
94G     /var
102G    /

Three deliberate choices in that one line. sudo, because an unprivileged du skips every directory it cannot read, and with stderr thrown away the resulting undercount is silent. 2>/dev/null, to drop the error noise: files that vanish mid-walk, and anything even root cannot read. And -x, which is the flag people forget exactly once. It tells du not to cross filesystem boundaries. Without it, the walk of / happily descends into every mounted volume: an NFS share on /data, a separate /home volume, every bind mount a container runtime scattered around, and /proc and friends. You end up either staring at 600 GB of perfectly healthy network storage that has nothing to do with your full root disk, or hanging on a dead NFS mount at the worst possible moment. The question is "what is filling this filesystem," and -x is what keeps the question honest.

From here it is recursion by hand: take the fattest directory, run the same command on it, repeat until you hit files.

$ sudo du -xh --max-depth=1 /var 2>/dev/null | sort -h | tail -4
2.1G    /var/cache
22G     /var/log
67G     /var/lib
94G     /var
$ sudo du -xh --max-depth=1 /var/lib 2>/dev/null | sort -h | tail -3
3.8G    /var/lib/journal
61G     /var/lib/docker
67G     /var/lib

If ncdu is installed — and it is worth installing before the incident, not during it — it does the same walk once and gives you an interactive browser over the result: sudo ncdu -x /, arrow keys to descend, sorted by size, with delete behind a confirmation. Same data as du, far fewer keystrokes. Either way, after a few minutes you have a map: docker is the biggest tenant at 61 GB, logs hold 22, and the rest is scattered small stuff.
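If ncdu is not available, the recursion by hand can be automated in a few lines. This is a sketch under assumptions: drill is a hypothetical helper name, it needs GNU du for --max-depth, it breaks on paths containing tabs or newlines, and it follows only the single largest subdirectory at each level, which is a simplification of what a human would do.

```shell
# drill DIR
# Follows the largest subdirectory downward, printing each step, until it
# reaches a directory with no subdirectories on the same filesystem.
drill() {
    dir=$1
    while [ -n "$dir" ]; do
        printf '%s\n' "$dir"
        # Largest immediate child: sort ascending, keep the last line that is
        # not the directory itself.
        dir=$(du -xk --max-depth=1 "$dir" 2>/dev/null |
            sort -n |
            awk -F '\t' -v self="$dir" '$2 != self { path = $2 } END { print path }')
    done
}

# Usage, from a root shell so du can read everything:
#   drill /
```

Going by the listings above, drill / on the incident box would print /, /var, /var/lib, /var/lib/docker, and then continue into whatever docker's largest subdirectory turns out to be.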

But stop and check the totals before you start cleaning, because this is the moment most investigations go wrong. du says everything reachable from / on this filesystem adds up to 102 GB. df says 186 GB is allocated. Eighty-four gigabytes are on the disk and not under any name. No amount of tree-walking will find them, because tree-walking is exactly what they are hiding from.

When du and df disagree

A big gap between df and du on the same filesystem almost always means one thing: a deleted file that a process still has open. The mechanism is plain filesystem bookkeeping. A file's name (the directory entry) and a file's substance (the inode and its data blocks) are separate things, and deleting is the removal of a name. The kernel frees the inode and blocks only when two counts both reach zero: the number of names pointing at the inode, and the number of open descriptors holding it. Remove the last name while a process still holds a descriptor, and you get a ghost — zero names, blocks still allocated, a file that exists for exactly one process and for the block allocator and for nobody else. The link-count machinery behind this is laid out in file systems, and you can perform the whole life cycle by hand — create, link, unlink, watch the counts — in the filesystem simulator.

The disagreement is structural. du walks the name tree on the left and cannot reach the inode in the middle; df reads the allocator on the right, which counts every block regardless of names.
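You can mint a ghost yourself in a scratch directory and watch the bookkeeping. The sketch below is Linux-only because it reads /proc, and the descriptor number 9 and the file name app.log are arbitrary choices for the demonstration.

```shell
# Reproduce a deleted-but-open file in a scratch directory (Linux, needs /proc).
scratch=$(mktemp -d)
exec 9>>"$scratch/app.log"           # this shell now holds descriptor 9 on the file
echo "written before the unlink" >&9
rm "$scratch/app.log"                # removes the name; the inode and blocks stay

ls -A "$scratch"                     # prints nothing: no name is left to walk
target=$(readlink "/proc/$$/fd/9")   # the kernel still tracks the open file
links=$(stat -L -c %h "/proc/$$/fd/9")
echo "$target (link count $links)"

echo "written after the unlink" >&9  # the holder can keep appending
content=$(cat "/proc/$$/fd/9")
echo "$content"

exec 9>&-                            # closing the last descriptor frees the blocks
rmdir "$scratch"
```

The readlink output ends in "(deleted)", the same marker lsof prints, and the link count is zero while the content stays fully readable through the descriptor.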

The classic way this happens in production: an application opens its log at startup and writes to the same descriptor forever. Months later the file is enormous, and someone — a human doing emergency cleanup, or a logrotate config that deletes instead of signalling — removes it. The name disappears, ls looks clean, and the process keeps appending to a file that no longer has a name, growing the invisible allocation with every request. The tool that finds it is lsof +L1, which lists open files whose link count is below one:

$ sudo lsof -nP +L1
COMMAND   PID   USER   FD   TYPE DEVICE    SIZE/OFF NLINK   NODE NAME
java    41327 deploy   4w   REG  259,1 90194313216     0 524291 /var/log/app/server.log (deleted)

There is the missing 84 GB, with the holder's name and PID attached: java, PID 41327, descriptor 4, open for writing, link count zero, (deleted). Reading every column of that output — and everything else lsof can tell you — is the subject of lsof. What matters here is the fix, and the fix has a hierarchy. Work down it; do not skip to the bottom.

First choice: make the process let go politely. Most logging daemons and many servers reopen their log files when asked. That is exactly what a logrotate postrotate script does when it sends SIGHUP or SIGUSR1, and many services expose the same thing as systemctl reload. The process closes descriptor 4, the kernel sees the last reference drop, and 84 GB returns instantly. Check what signal this particular service expects before sending anything: rsyslog reopens its files on HUP, nginx reopens its logs on USR1 and treats HUP as a configuration reload, and to plenty of other programs HUP simply means "die". Which signal does what, and how to find out, is covered in kill & signals.
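To see why reopening works, here is a toy stand-in for a well-behaved service. Everything in it is hypothetical: the worker function, the choice of USR1 as the reopen signal, and the file names. A real service defines its own signal, so treat this as an illustration of the mechanism only.

```shell
# A toy worker that appends to a log and reopens it by name on SIGUSR1.
worker() {
    log=$1
    exec 3>>"$log"
    trap 'exec 3>>"$log"' USR1       # reopen by name: picks up the fresh file
    while :; do
        echo "tick" >&3
        sleep 0.1
    done
}

dir=$(mktemp -d)
worker "$dir/app.log" &
pid=$!
sleep 1

mv "$dir/app.log" "$dir/app.log.1"   # rotate: the worker still writes to the old inode
kill -USR1 "$pid"                    # the job a postrotate script would do
sleep 1

kill "$pid"
wait "$pid" 2>/dev/null
ls "$dir"                            # both app.log and app.log.1 now exist
```

Without the signal, the worker would keep writing to app.log.1 forever, and deleting that file would leave its blocks allocated until the worker exited.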

Second choice: truncate the file through the descriptor. If the process has no reopen mechanism and you cannot restart it right now, /proc gives you a path to the ghost: /proc/41327/fd/4 is a live handle on the deleted file, and truncating it frees the blocks without touching the process.

$ sudo truncate -s 0 /proc/41327/fd/4
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  197G  102G   86G  55% /

The process keeps its descriptor and keeps writing; you have simply cut the file's length to zero underneath it. (One wrinkle worth knowing: if the program opened the log without O_APPEND, its next write lands at its old offset and the file becomes sparse — harmless for disk space, occasionally surprising in the output. Truncating an O_APPEND log is clean.) Restarting the service also works, of course — it is just the bluntest version of "close the descriptor," and it costs you whatever a restart costs.
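The O_APPEND wrinkle is easy to reproduce on scratch files. In this sketch the shell itself plays the part of the long-running process; descriptors 8 and 9 and the file names are arbitrary, and it is Linux-only because it goes through /proc.

```shell
# Truncate two deleted-but-open files through /proc and compare the results.
scratch=$(mktemp -d)

exec 8>"$scratch/plain.log"          # opened without O_APPEND
exec 9>>"$scratch/append.log"        # opened with O_APPEND
head -c 100000 /dev/zero >&8
head -c 100000 /dev/zero >&9
rm "$scratch/plain.log" "$scratch/append.log"

truncate -s 0 "/proc/$$/fd/8" "/proc/$$/fd/9"
echo "after" >&8                     # lands at the old offset, 100000
echo "after" >&9                     # lands at the new end of file, 0

plain_size=$(stat -L -c %s "/proc/$$/fd/8")
append_size=$(stat -L -c %s "/proc/$$/fd/9")
plain_blocks=$(stat -L -c %b "/proc/$$/fd/8")
echo "plain:  $plain_size bytes logical, $plain_blocks blocks allocated"
echo "append: $append_size bytes logical"

exec 8>&- 9>&-
rmdir "$scratch"
```

The plain file reports a logical size of 100006 bytes, the old offset plus the six bytes just written, while the append file reports 6. The plain file's block count shows how little of that logical size is actually allocated.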

What is never on the list: deleting more files. This is the moment in the incident where the panicked move is to rm other big things you can see, and it is worth being precise about why that is wrong twice over. Practically: if you rm another file that some process has open, you have not freed its space either — you have minted a second ghost and made the df-versus-du gap bigger. And operationally: files deleted in a panic at 2 am have a way of turning out to be the database's write-ahead log. The original sin in this incident was somebody deleting an open file instead of rotating it; repeating the sin faster is not a fix. Truncate, signal, or rotate — deletion is for files nothing holds open, identified calmly.

The usual suspects

The ghost explained 84 GB, but the named files on this box still hold 102, and the same handful of culprits show up on almost every machine. When the du walk lands you in one of these, here is what to check and what is safe to reclaim.

Logs that never rotate. Our box has 22 GB in /var/log, and the tree walk will point at the heavy files directly. Two sub-cases. Application logs that grew because nobody wrote a logrotate stanza for them: compress or truncate the big ones now, add the stanza after the incident. And the systemd journal, which manages its own retention and answers for itself:

$ journalctl --disk-usage
Archived and active journals take up 3.8G in the file system.
$ sudo journalctl --vacuum-size=500M
Vacuuming done, freed 3.3G of archived journals from /var/log/journal/…

Vacuuming archived journals is one of the safest reclaims available — it deletes only rotated journal segments, oldest first, through the tool that owns them. What the journal is, how its retention settings work, and how to read it properly is the territory of journalctl & dmesg.

Docker, the 61 GB tenant. Container hosts accumulate disk in four separate pools, and docker system df itemises them:

$ docker system df
TYPE            TOTAL   ACTIVE   SIZE      RECLAIMABLE
Images          38      6        41.2GB    33.9GB (82%)
Containers      9       4        2.1GB     1.8GB (85%)
Local Volumes   31      5        14.7GB    11.2GB (76%)
Build Cache     214     0        8.4GB     8.4GB

Old image layers from every deploy ever, stopped containers nobody removed, volumes orphaned by deleted containers, and build cache. docker image prune -a and docker builder prune reclaim the first and last with little drama; read before running docker volume prune, because volumes are where the data lives, and an "unused" volume may be unused only because its container is temporarily stopped. One more docker-shaped trap: container stdout logs default to unbounded json files under /var/lib/docker/containers/, and a chatty container can write a 30 GB log file that no logrotate config knows about. Set max-size in the logging options and the problem stays solved.
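Finding the chatty container is a one-line find. A small sketch, assuming the default json-file logging driver, which names each log after its container ID with a -json.log suffix; the helper name container_logs is made up for this example.

```shell
# container_logs ROOT
# Lists json-file container logs under ROOT, largest first, size in bytes.
# ROOT would normally be /var/lib/docker/containers, readable by root only.
container_logs() {
    find "$1" -xdev -type f -name '*-json.log*' -printf '%s\t%p\n' 2>/dev/null |
        sort -rn |
        head -n 5
}

# Usage, from a root shell:
#   container_logs /var/lib/docker/containers
```

The trailing wildcard in the name pattern also catches rotated files, which appear once a size limit is configured.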

Package debris and old kernels. /var/cache/apt holds every package archive ever downloaded until you say otherwise — sudo apt-get clean is free money, often a gigabyte or two. Old kernel versions accumulate in /boot and as installed packages; apt autoremove --purge clears the ones the system no longer needs. /boot is small and ignorable right up until it is a separate 100 MB partition, at which point three old kernels fill it completely and upgrades start failing with a confusingly local version of this whole page.

Core dumps. A crashing process can leave a dump the size of its address space — multi-gigabyte files written in seconds. Look for core.* files in application working directories, check coredumpctl list on systemd boxes (dumps live under /var/lib/systemd/coredump), and check /var/crash. A service in a crash loop with core dumps enabled is one of the few things that can fill a disk in minutes rather than months.

/tmp and /var/tmp. Crashed jobs leave their working sets behind: extracted archives, sort spill files, half-written exports. Anything old in /tmp is usually fair game, but check owners and timestamps before sweeping. If a tmpfile is huge and recent, the job that made it may still be running and holding it open, which puts you back in the previous section. A final net for everything the categories miss: one sweep for individual big files, sorted by size with each file's modification date alongside, so the long-forgotten ones stand out:

$ sudo find / -xdev -type f -size +1G -printf '%s\t%TY-%Tm-%Td\t%p\n' 2>/dev/null | sort -n | tail -5
1438092871  2024-11-02  /home/deploy/heapdump-prod.hprof
2147748332  2025-03-19  /opt/app/exports/full-2025-03-19.sql
6442450944  2025-08-30  /var/tmp/dataset-staging.tar
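The sweep above sorts on its first column, which is the size in bytes. If you would rather see the oldest files first, put the modification time in front and sort on that instead. A sketch assuming GNU find; oldest_big_files is a hypothetical helper name, and paths containing tabs or newlines would break the column handling.

```shell
# oldest_big_files DIR MIN_SIZE
# Lists files larger than MIN_SIZE (a find -size value such as 1G), oldest
# modification time first, as: date, size in bytes, path.
oldest_big_files() {
    find "$1" -xdev -type f -size "+$2" \
        -printf '%T@\t%TY-%Tm-%Td\t%s\t%p\n' 2>/dev/null |
        sort -n |
        cut -f 2-
}

# Usage, from a root shell so find can read everything:
#   oldest_big_files / 1G | head -n 5
```

The epoch timestamp from %T@ is only there to give sort a numeric key; cut drops it again before the output reaches you.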

Sparse files and the missing five percent

Two footnotes to the investigation, both of which eventually bite everyone who does this work. The first: not every big-looking file is big. Filesystems support sparse files, where blocks that were never written simply are not allocated — the file has a logical size (what ls -l reports) and a physical size (what du reports, in actual blocks), and the two can differ by orders of magnitude:

$ ls -lh /var/log/lastlog
-rw-rw-r-- 1 root utmp 1.2G Jun  8 09:14 /var/log/lastlog
$ du -h /var/log/lastlog
16K     /var/log/lastlog

lastlog is indexed by UID, so one login from a high-UID account extends the file's logical size out to gigabytes while allocating a few blocks. VM disk images and pre-sized database files behave the same way. The practical rules: trust du over ls when you are accounting for disk, do not "fix" a sparse file you do not recognise, and be careful copying them — a copy tool that does not understand holes (cp mostly does; some backup tools do not) writes the zeros for real, and the copy genuinely consumes the full logical size on the destination.
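A sparse file takes two commands to make, which is a convenient way to see the two sizes side by side. The sketch below works in a scratch directory; the file names are invented, and --sparse=always is a GNU cp option.

```shell
# Create a sparse file and compare its logical size with its allocated size.
dir=$(mktemp -d)
truncate -s 100M "$dir/sparse.img"                  # sets the length, writes no data

logical=$(stat -c %s "$dir/sparse.img")             # bytes, what ls -l reports
allocated_kb=$(du -k "$dir/sparse.img" | cut -f1)   # what du reports
echo "logical: $logical bytes, allocated: $allocated_kb KiB"

# GNU cp can be told to keep holes as holes when copying.
cp --sparse=always "$dir/sparse.img" "$dir/copy.img"
copy_kb=$(du -k "$dir/copy.img" | cut -f1)
echo "copy allocated: $copy_kb KiB"
```

The logical size is 104857600 bytes, while the allocation stays close to zero for both the original and the copy.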

The second footnote answers the arithmetic from the very first df: 186 used plus 1.9 available is not 197. ext filesystems reserve a slice of blocks — 5% by default — that only root can allocate:

$ sudo tune2fs -l /dev/nvme0n1p1 | grep -i 'block count'
Block count:              51642880
Reserved block count:     2582144

df's Size, Used, and Avail never add up on ext4 because Avail excludes the root reserve. Unprivileged writes start failing while df still shows headroom.

The reserve exists so the machine stays operable when users fill it: root can still log in, the system's own daemons can still write what they must, and the filesystem keeps a little slack for its allocator to avoid pathological fragmentation. The consequences for your incident reading: unprivileged processes start getting ENOSPC at roughly 95% real occupancy, Use% is computed against the space available to non-root and can read 100% with gigabytes technically free, and a root-owned writer (a log daemon, say) can keep growing a file after every user process has started failing. On a data-only volume that root never writes to, tune2fs -m 1 trims the reserve and hands back several gigabytes; on the root filesystem, leave it alone — it is the slack that lets you fix the next one of these incidents while logged in.
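The reserve is easy to turn into a number you can compare with df. The sketch below uses the two block counts from the tune2fs output above. The 4096-byte block size is an assumption, since the grep shown does not print it; tune2fs -l reports the real value on its own "Block size:" line.

```shell
# Reserve arithmetic from the tune2fs figures.
block_count=51642880
reserved_blocks=2582144
block_size=4096        # assumed; check the "Block size:" line of tune2fs -l

reserve_line=$(awk -v total="$block_count" -v reserved="$reserved_blocks" \
    -v bs="$block_size" 'BEGIN {
        printf "reserved: %.2f%% of blocks, %.2f GiB",
            100 * reserved / total, reserved * bs / 1073741824
    }')
echo "$reserve_line"
```

Under that assumption it prints "reserved: 5.00% of blocks, 9.85 GiB", which is in line with the roughly 9 GB that the first df left unexplained once df's own rounding is allowed for.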

Closing the incident

On our box the ledger now reconciles: 84 GB came back when the ghost log was truncated, docker pruning and journal vacuuming reclaimed another 40, and df sits at 31% with the arithmetic understood. Before you close the page, two pieces of discipline separate an incident that is over from one that is merely paused.

Write the notes while the terminal scrollback still exists. Five lines is enough, but they need to be the right five: the df line as found and as left; the culprit, by path and by mechanism ("deleted-but-open /var/log/app/server.log, 84 GB, held by java PID 41327 since the May logrotate change"); exactly what you deleted or truncated, with sizes; the fix applied and the signal or command used; and the follow-up that prevents recurrence, with an owner. The mechanism line matters most. "Disk was full, cleaned up logs" tells the next responder nothing; "logrotate was deleting a file the app never reopens" tells them the config to fix and the trap to avoid. Paste the actual lsof +L1 output — thirty seconds now, and the next 2 am responder gets your whole investigation for free.

Then make the prevention real. Most full-disk pages are scheduled months in advance, and the schedule is visible if anything is looking. Alert at 80%, not 95 — at 80% on a slow-growth disk you have weeks to act during business hours; at 95% you have a pager and adrenaline. Better, alert on the trend: "full in under four days at the current growth rate" catches both the slow leak and the fast one. Audit logrotate after any incident it caused: every config that rotates a live application's log must either signal the app to reopen (postrotate) or use copytruncate — which copies then empties the live file, at the cost of possibly losing the lines written between the two steps. And give every growing dataset a retention job the day it is created: uploads, exports, build artifacts, anything with a timestamp in its filename. Data with no deletion policy has a deletion policy; it is called this incident.
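The trend alert is simple arithmetic once you have two samples of used space. A minimal sketch of a linear extrapolation; days_until_full is a hypothetical name, the hours argument is assumed to be greater than zero, and a real monitoring system would fit more than two points.

```shell
# days_until_full USED_THEN_KB USED_NOW_KB HOURS_BETWEEN AVAIL_NOW_KB
# Linear extrapolation from two samples; prints "never" if usage is not growing.
days_until_full() {
    awk -v a="$1" -v b="$2" -v h="$3" -v avail="$4" 'BEGIN {
        growth_per_day = (b - a) * 24 / h
        if (growth_per_day <= 0) { print "never"; exit }
        printf "%.1f\n", avail / growth_per_day
    }'
}

# Sampling the inputs (GNU df):
#   df -k --output=used,avail / | tail -n 1
```

A volume that grew by 100 units over the last 24 hours with 400 units still available gives 4.0, which is exactly the "full in under four days" boundary mentioned above.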

The fast version. Four commands, in order, cover nearly every full disk: df -h / to size the problem, df -i / to rule inodes in or out, sudo du -xh --max-depth=1 / | sort -h repeated downward to find named bulk (never forget the -x), and sudo lsof -nP +L1 when du's total falls short of df's. Everything else on this page is what to do with their answers.
