df, du & ncdu
The pager says the disk is at 95% and climbing. The first command you type decides how the
next twenty minutes go. df asks the filesystem how many blocks are allocated;
du walks the directory tree and adds up what it can reach by name. Those are
different questions, and the gap between their answers is where every interesting disk
incident lives. This page covers the five invocations worth knowing, reads the output column
by column, walks three production scenarios, explains why the numbers diverge, and ends with
a drill that makes a directory's contents vanish and reappear without deleting a thing.
One question, two tools
"Where did the disk space go?" sounds like one question, but Linux gives you two ways to ask
it, and they are not interchangeable. df — disk free — asks the
filesystem. Every mounted filesystem keeps running counters of how many blocks it
has, how many are in use, and how many inodes remain. df reads those counters
and prints them. It is the accountant's view: fast, exact, and completely indifferent to
what the blocks are being used for.
du — disk usage — asks the directory tree. It starts at a path you
give it, walks every file and subdirectory it can reach by name, asks each one how many
blocks it occupies, and sums as it goes. It is the surveyor's view: slower, granular, and
limited to what has a name. A file that exists on disk but has no path the walk can reach
simply does not appear in du's total.
Most days the two agree to within a rounding error and you never notice the distinction. The
days you get paged are the days they disagree, and the disagreement is never a bug — it is
the two tools faithfully reporting two different truths. Blocks held by a deleted-but-open
file are real to df and invisible to du. Files buried under a
mount point are counted by df and unreachable by du. A filesystem
can refuse writes with plenty of blocks free, because the gauge that ran out was inodes,
which only df -i shows. Reading the gap is the skill this page teaches.
The third tool, ncdu, is du with a user interface: it does the
same tree walk once, holds the result in memory, and lets you drill into the biggest
directory with the arrow keys instead of re-running du at each level. For "a
directory got huge and I need to find which one," it turns a ten-command session into one.
It inherits every one of du's blind spots, though, because it sees the world
the same way: by name.
The five invocations that matter
Both man pages are long; the working set is small. These five cover nearly every disk investigation, and the order below is roughly the order you run them during one.
| Invocation | What it reports | When you reach for it |
|---|---|---|
| df -h | Per-filesystem block usage, human-readable sizes | First. Which filesystem is full, and how full, in two seconds |
| df -i | Per-filesystem inode usage | "No space left on device" while df -h shows free space |
| du -xh --max-depth=1 /path | Size of each immediate subdirectory, staying on one filesystem | Narrowing down which directory holds the bulk, level by level |
| ncdu -x /path | The same walk, interactive: sort, drill in, drill out | Triage. Finding the one runaway directory in minutes |
| du -sh --apparent-size f | Logical file length rather than allocated blocks | Sparse files, compressed filesystems, "why is this 10G file using 80K" |
Two of those flags deserve a sentence each before you ever type them. The -x on
du and ncdu means "stay on one filesystem": without it, a walk
that starts at / happily descends into /proc,
/sys, every NFS mount, and every container overlay it finds, and the total it
prints answers a question you did not ask. You are almost always investigating one full
filesystem, so you almost always want -x. Make it reflex the way
-nP is reflex for lsof.
The --max-depth=1 keeps du from printing a line for every
directory in the tree. du still visits everything — it has to, the totals roll
up from the leaves — but it only prints the first level, which is what you read. The classic
loop is: run it at /, see that /var is the heavy one, run it again
at /var, and repeat until you hit the directory that explains the number. Pipe
through sort -rh so the biggest entry is the first line you see. Or skip the
loop entirely and let ncdu do the descending for you.
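The descent loop described above can also be scripted. This is a minimal sketch rather than a standard tool: the function names heaviest_child and descend are invented for illustration, and it assumes GNU du (for -x and --max-depth).

```shell
# heaviest_child DIR: print the largest immediate subdirectory of DIR,
# staying on DIR's filesystem. Hypothetical helper; assumes GNU du.
heaviest_child() {
    hc_dir=${1%/}
    [ -n "$hc_dir" ] || hc_dir=/
    du -xk --max-depth=1 "$hc_dir" 2>/dev/null |
        awk -F'\t' -v self="$hc_dir" '$2 != self' |   # drop DIR's own total line
        sort -rn | head -n 1 | cut -f2-
}

# descend DIR [LEVELS]: follow the heaviest child LEVELS times (default 3),
# printing each directory it steps into.
descend() {
    d_dir=$1
    d_levels=${2:-3}
    while [ "$d_levels" -gt 0 ]; do
        d_next=$(heaviest_child "$d_dir")
        [ -n "$d_next" ] || break        # no subdirectories left
        echo "$d_next"
        d_dir=$d_next
        d_levels=$((d_levels - 1))
    done
}
```

Something like sudo sh -c '. ./descend.sh; descend /var 4' would print the chain of heaviest directories under /var (descend.sh is a placeholder file name). Every level re-runs the walk, which is exactly the cost ncdu avoids.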
sudo du -xh --max-depth=1 / | sort -rh | head: one filesystem, one level,
biggest first. During an incident this single line usually points at the guilty directory
before df's output has scrolled off your screen.
Reading the output
Here is df -h on a root filesystem that is about to ruin someone's evening.
Six columns, and two of them hide arithmetic worth understanding.
    $ df -h /
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/nvme0n1p2  916G  823G   47G  95% /
Filesystem is the block device or remote export backing the mount. Size is the filesystem's total capacity. Used is allocated blocks. Avail is what an ordinary process can still write. Mounted on is the path where this filesystem is attached.
Now do the arithmetic the column headers invite you to do: 823 used plus 47 available is 870, but Size says 916. Forty-six gigabytes are missing from the table. That is not rounding. On ext4, by default, 5% of the blocks are reserved for root: ordinary processes cannot touch them, so they appear in neither Used nor Avail. The reservation exists so that when users fill the disk, root can still log in, daemons can still write logs, and the system stays repairable rather than wedging solid.
The reservation also explains a number that startles people: Use% is computed against the
space available to non-root processes (Used plus Avail), not the raw size, so it reaches
100% while Size minus Used still leaves a few gigabytes. Those are root's, not yours,
and on some setups you may see it quoted above 100% when root has dipped into the reserve.
When your service gets ENOSPC while a dashboard that divides Used by the raw Size reads
"95%," this is usually why: 95% of the raw capacity plus the untouchable 5% is, for your
process, completely full. The reservation is tunable per filesystem (tune2fs -m), and on
huge data volumes that hold no system state, operators often shrink it: 5% of a 10 TB disk
is half a terabyte of insurance you may not need.
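The arithmetic in this section is easy to check by hand or in a script. The sketch below uses two hypothetical helper names, reserve_gap and use_percent, and assumes a df that accepts -P and -k; the rounding-up in use_percent mirrors how GNU df prints Use%, so other implementations may differ by a point.

```shell
# reserve_gap SIZE USED AVAIL: space that shows up in neither Used nor Avail.
reserve_gap() {
    echo $(( $1 - $2 - $3 ))
}

# use_percent USED AVAIL: used / (used + avail), rounded up.
use_percent() {
    up_total=$(( $1 + $2 ))
    [ "$up_total" -gt 0 ] || { echo 0; return; }
    echo $(( ($1 * 100 + up_total - 1) / up_total ))
}

# The figures from the table above, in gigabytes:
reserve_gap 916 823 47    # prints 46
use_percent 823 47        # prints 95

# The same two numbers for the live root filesystem, in KiB:
df -Pk / | awk 'NR==2 {print $2, $3, $4}' | {
    read -r size used avail
    echo "reserved: $(reserve_gap "$size" "$used" "$avail") KiB, Use%: $(use_percent "$used" "$avail")"
}
```

On filesystems without a root reservation the gap simply comes out as zero.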
Now the surveyor's view of the same disk, biggest first, staying on one filesystem:
    $ sudo du -xh --max-depth=1 / | sort -rh | head
    823G    /
    512G    /var
    198G    /home
    67G     /usr
    21G     /opt
    9.8G    /root
    2.1G    /srv
    640M    /etc
    24K     /tmp
    16K     /lost+found
The first line is the grand total for the walk; every line after it is one immediate child.
Two readings happen at once here. First, the drill-down read: /var holds 512 of
823 gigabytes, so the next command is the same one with /var as the argument,
and you keep descending until the number stops being interesting. Second, the reconciliation
read: compare du's 823G total against df's 823G Used. Here they
match, which tells you everything on this filesystem has a name and the investigation is a
plain "which directory grew" hunt. When they don't match — when df
reports tens of gigabytes more than du can find — stop descending, because the
space you are hunting has no name, and no amount of du will surface it. The
scenarios below cover both branches.
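The reconciliation read can be done in one step. The following is a rough sketch with a hypothetical function name, reconcile; it assumes the argument is a mount point and that you run it with enough privilege for du to read everything, since unreadable directories would inflate the gap artificially.

```shell
# reconcile PATH: print df's Used, du's one-filesystem total, and the gap, in KiB.
# Only meaningful when PATH is the mount point of the filesystem under suspicion.
reconcile() {
    rc_target=$1
    rc_df=$(df -Pk "$rc_target" | awk 'NR==2 {print $3}')
    rc_du=$(du -skx "$rc_target" 2>/dev/null | cut -f1)
    rc_du=${rc_du:-0}                      # du printed nothing at all
    echo "df_used_kib=$rc_df du_total_kib=$rc_du gap_kib=$((rc_df - rc_du))"
}
```

A gap of a few megabytes is filesystem metadata and noise. A gap of many gigabytes is the signal to stop descending and look for space that has no name.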
ncdu presents the same walk as a screen you can move around in:
    ncdu 1.19 ~ Use the arrow keys to move, press ? for help
    --- / (one filesystem) -------------------------------------------
      512.3 GiB [##########] /var
      198.0 GiB [###       ] /home
       67.4 GiB [#         ] /usr
       21.2 GiB [          ] /opt
        9.8 GiB [          ] /root
        2.1 GiB [          ] /srv

     Total disk usage: 823.1 GiB  Items: 4,182,664
Enter descends into a directory, the left arrow backs out, n and
s toggle sorting by name or size, and g switches the bar graph
between percentages and blocks. There is also a d key that deletes the selected
file or directory, which is exactly as dangerous during a 3am incident as it sounds; start
ncdu with -r for read-only mode when you are on a box that
matters, and the delete key stops existing. The Items count at the bottom is quietly useful
too — four million items is a normal root filesystem, while forty million tiny files is a
hint that your problem might be the inode gauge, which brings us to the scenarios.
Three production scenarios
df says 98%, du finds nothing
The volume is at 98% and rising. You run the du one-liner and the total comes
to barely half of what df reports. You run it again with sudo in
case permissions hid something. Same answer. Forty gigabytes are allocated on this
filesystem and not one of them has a name.
This is the deleted-but-open file, and it is the most common df-versus-du gap in production.
A process opened a log file, something deleted the file — logrotate misconfigured to
rm instead of signalling, or a human tidying up — and the process kept writing.
Deleting a file removes its name. The inode and its blocks stay allocated until the
last open descriptor closes. du walks names, so the file is gone from its
world; df reads the filesystem's block counters, so the space is still very
much there, and still growing.
    $ sudo lsof -nP +L1 | grep -v ' /dev'
    COMMAND   PID   USER  FD  TYPE DEVICE    SIZE/OFF NLINK   NODE NAME
    java    41327 deploy  4w   REG  259,2 42949672960     0 524291 /var/log/app/server.log (deleted)
There are your forty gigabytes: link count zero, still open for writing. The recovery — and
the trick of truncating the file through /proc/PID/fd without restarting
anything — is covered in detail on the
lsof page, and the full decision tree for a
filling disk, of which this is one branch, lives in
why is the disk full? The
lesson for this page is the diagnostic shape: when df and
du disagree by a lot, the disagreement itself is the finding. Do not keep
running du in more directories. Switch tools.
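To see the mechanism on a harmless scale, a shell can hold a deleted file open itself. This is an illustrative sketch for Linux only (it reads /proc); the function name demo_deleted_open and the choice of descriptor 9 are arbitrary.

```shell
# Reproduce a deleted-but-open file inside the current shell.
demo_deleted_open() {
    dd_file=$(mktemp)
    exec 9>"$dd_file"            # open the file for writing on descriptor 9
    echo "still writing" >&9
    rm "$dd_file"                # removes the name; the inode stays allocated
    readlink /proc/self/fd/9     # readlink inherits fd 9; target ends in " (deleted)"
    exec 9>&-                    # closing the last descriptor frees the blocks
}
```

Between the rm and the final exec, du cannot reach the file, while the filesystem still counts its blocks. That is the production gap in miniature.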
"No space left on device" with 60% free
A service starts failing every write with ENOSPC. You check df -h
and the filesystem is at 40%. Restart the service; it fails again immediately. The error
says the device is full and the block gauge says it is not even close. The gauge you have
not checked is the other one:
    $ df -h /data && df -i /data
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/nvme1n1    1.8T  720G  1.1T  40% /data
    Filesystem         Inodes     IUsed IFree IUse% Mounted on
    /dev/nvme1n1    117211136 117211136     0  100% /data
Every file needs an inode — the on-disk record that holds its metadata and points at its
blocks — and on ext4 the inode table is sized once, at mkfs time. Run out of
inodes and the filesystem cannot create a single new file, no matter how many free blocks
remain. The usual culprit is millions of tiny files: a session store writing one file per
login, a mail queue that stopped draining, a build cache exploding into node_modules
confetti, a cron job that creates a temp file per run and never deletes any. Each file might
be fifty bytes, which is why the block gauge barely moved while the inode gauge filled.
Finding the culprit is a counting problem rather than a sizing problem, so du's
byte totals will not point at it. Count files per directory instead — something like
find /data -xdev -type f | cut -d/ -f1-3 | sort | uniq -c | sort -rn | head —
or watch ncdu's item counts rather than its sizes (press c to
display child counts per directory). The fix is to delete or archive the file swarm; the
prevention is to notice that df -i exists and put it on the same dashboard as
df -h. The two gauges fill independently, and either one at 100% takes the
filesystem down for writes.
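The counting approach can be wrapped in a small helper. This is a sketch under a few assumptions: the name count_files_per_dir is made up, hidden top-level directories are skipped by the glob, and file names containing newlines would be miscounted.

```shell
# count_files_per_dir DIR: number of regular files under each immediate
# subdirectory of DIR, largest count first.
count_files_per_dir() {
    cf_root=${1%/}
    for cf_d in "$cf_root"/*/; do
        [ -d "$cf_d" ] || continue                    # glob matched nothing
        cf_n=$(find "$cf_d" -xdev -type f | wc -l)
        printf '%d\t%s\n' "$cf_n" "${cf_d%/}"
    done | sort -rn | head
}
```

Run it at the mount point, then again inside whichever directory tops the list, the same way you would descend with du.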
The runaway directory, found in minutes
The boring case is the common one: nothing is hidden, nothing is exhausted, a directory just
grew. A debug flag left on after an incident has the application logging at trace level. A
build cache nobody set a size limit on. A database's write-ahead logs piling up because the
thing that consumes them stopped. Here df and du agree, and the
job is pure descent: find the heaviest directory, then the heaviest directory inside it,
until you hit something with a name you recognise.
You can do that with the du | sort -rh loop, and on a machine where you cannot
install anything, you should — it is four or five invocations to get anywhere on a deep
tree. ncdu -x on the suspect filesystem does the same walk once and then makes
the descent free: enter, enter, enter, and twenty seconds after the scan finishes you are
looking at /var/lib/app/cache/render/ holding 400 GB of files last touched
eight months ago. The walk itself costs the same as du — it is the same work,
stat everything — but you only pay it once instead of once per level. On a multi-terabyte
filesystem with tens of millions of files, that walk can take real minutes either way, which
is an argument for running it in tmux and an argument for the export trick:
ncdu -x -o scan.json /data saves the scan to a file, and
ncdu -f scan.json reopens it instantly, on this machine or your laptop.
What you do once you find the 400 GB is judgement, not tooling — confirm nothing has it
open, archive rather than delete if there is any doubt, and fix whatever let it grow
unbounded. The triage pattern is the part to keep: df -h to pick the
filesystem, one reconciliation glance at du's total versus df's
Used, then ncdu to descend. Three tools, one minute each, and the question
"where did the disk space go" is answered before the next page fires.
Why the numbers diverge
Everything above falls out of two different system calls. df calls
statfs() on each mount point. The filesystem answers from counters it keeps in
its superblock: total blocks, free blocks, free inodes. No tree is walked; the cost is the
same whether the filesystem holds ten files or ten million, which is why df
returns instantly on any disk. The list of mounts it iterates comes from
/proc/self/mountinfo — the same place the mount command reads, and
one more entry in the long list of things exposed through
/proc.
du calls stat() on every file the walk reaches and sums the
st_blocks field — the number of 512-byte blocks actually allocated to the file.
It deduplicates hard links as it goes, so a file with three names on the same filesystem is
counted once. Its total is built name by name, and that is the whole story of its blind
spots: anything a name-walk cannot reach does not exist for du. Deleted-but-open
files are the famous case. The sneakier one is the mount shadow.
When you mount a filesystem on a directory, the directory's previous contents do not move
and are not deleted — they become unreachable. Imagine a service that writes to
/srv/data before its data volume is mounted: maybe the mount failed on one
boot, maybe an init-order race let the service start first. Eighty gigabytes land on the
root filesystem, in /srv/data. Then the volume mounts, the new filesystem
covers the directory, and those eighty gigabytes vanish from view. du walking
through /srv/data now descends into the mounted filesystem and never
sees the files underneath. df on the root filesystem still counts every block.
You get the classic gap, lsof +L1 comes back empty because nothing is deleted
and nothing is open, and the space sits in plain sight behind the mount. The way to look is
to bind-mount the parent somewhere else — mount --bind / /mnt/peek gives you a
view of the root filesystem with no child mounts attached, and
du -sh /mnt/peek/srv/data reads the shadowed bytes directly. The drill at the
end builds this exact situation in /tmp so you can watch it happen.
One more source of divergence lives inside individual files. A file has two sizes: its
apparent size (the length, st_size — how many bytes you would get
reading it end to end) and its disk usage (allocated blocks). A sparse file pushes
these wildly apart: write one byte at offset ten gigabytes and the filesystem stores one
block plus a note that the rest reads as zeros. ls -l shows ten gigabytes;
du shows a few kilobytes; both are correct. Disk images, core dumps, and
database preallocations are sparse all the time. du --apparent-size switches to
summing lengths, which is the number you want when asking "how big would this be if I
tar'd it up" and the wrong number when asking "what is filling this disk." Filesystems with
transparent compression pull the same two numbers apart in the same direction; more on that
in the pitfalls. The block-and-inode machinery underneath all of this is covered in
file systems, and
you can watch inodes, blocks, and directory entries get allocated by hand in the
filesystem simulator.
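Both sizes of a file are visible from the shell. The sketch below assumes GNU stat, where %b is the allocated block count, %B is the size of each of those blocks (normally 512), and %s is the length; file_sizes is a hypothetical helper name.

```shell
# file_sizes FILE: print "allocated_bytes apparent_bytes" for FILE.
file_sizes() {
    stat -c '%b %B %s' "$1" | {
        read -r fs_blocks fs_unit fs_length
        echo "$((fs_blocks * fs_unit)) $fs_length"
    }
}

# A 1 MiB sparse file: the length is 1048576, the allocation is far smaller.
fs_demo=$(mktemp)
truncate -s 1M "$fs_demo"
file_sizes "$fs_demo"
rm "$fs_demo"
```

The first number is what du sums by default; the second is what du --apparent-size and ls -l report.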
Pitfalls
Running du without -x and trusting the total. A du rooted at
/ without -x crosses into every mounted filesystem it meets, and
its grand total describes the union of all of them. You then compare that number against
df for one filesystem and conclude something impossible is happening. Worse,
the walk descends into places that are not disks at all and stats things that misbehave when
stat'd. If a number is going to drive a decision, it should describe one filesystem, and
-x is what makes that true.
Pointing du or ncdu at an NFS tree. The tree walk issues a metadata request
per file, and over NFS each one is a network round trip. On a big export that is millions of
round trips: slow for you and a genuine load spike for the file server, which everyone else
is using too. If the server has a hung peer or a flaky path, the walk can block indefinitely
on a single stat with no timeout you control. Measure usage of a network filesystem from the
server side where the walk is local, or with the server's own accounting (a quota report
answers in milliseconds what a client-side du answers in an hour).
Forgetting the root reservation cuts both ways. Use% at 100 with gigabytes still unallocated
confuses people in one direction; the other direction bites harder. Your monitoring
fires at 90%, someone "fixes" the alert by noting the disk has 5% nobody can use and
reclaiming it with tune2fs -m 0, and the next time the disk fills, it fills
completely — root included. Now the cleanup tooling cannot write a temp file, the
package manager cannot run, and a full disk has been upgraded to an unrecoverable shell.
Keep at least a small reservation on any filesystem the OS depends on.
Expecting honest numbers from btrfs and ZFS. Both report through
statfs() because they must, but the question barely fits. Snapshots share
extents, so "how much would deleting this free" has no single answer; compression means
bytes written and blocks allocated diverge per file; on ZFS, pool space is shared by every
dataset, so the same free blocks show up in several mounts' Avail at once, and on btrfs,
metadata is allocated in chunks that can exhaust separately from data — ENOSPC with
df showing free space, this time with inodes innocent too. Use the native
tools: btrfs filesystem usage and zfs list -o space answer the
questions df is mis-asking.
Reading du's number as "the size of the data." du reports
allocated blocks, which tracks what the disk loses, not what the bytes weigh. Sparse files
deflate it, compression deflates it, reflinked copies on modern filesystems are counted in
full for each copy even though they share blocks, and
small files inflate it (a 100-byte file occupies a full block). When you are estimating a
transfer, a backup, or an S3 bill, use --apparent-size; when you are freeing a
disk, use the default. Mixing them up produces estimates that are wrong by integer factors,
in whichever direction is most embarrassing.
A drill you can run right now
Everything below is safe on any Linux machine: it reads state, creates a small scratch
directory in /tmp, and mounts nothing over anything that matters. The
bind-mount step needs sudo; everything else does not. Fifteen minutes, and the block gauge,
the inode gauge, and the mount shadow stop being trivia.
Step 1 — both gauges, every filesystem. Run df -h and read it
properly for once: find your root filesystem, check whether Used plus Avail equals Size, and
compute the gap — that is your root reservation. Then run df -i and look at
IUse% for the same filesystem. Most people have never once looked at this column on a
machine they operate; note which of your filesystems is closest to inode exhaustion, because
it is probably not the one closest to block exhaustion.
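Ranking filesystems by inode use takes one pipeline. A minimal sketch, assuming df accepts -P together with -i (GNU df does) and that mount points contain no spaces; rank_inode_use is a made-up name.

```shell
# rank_inode_use: read `df -Pi` output on stdin and print "IUse% mountpoint",
# fullest inode table first. Filesystems that report no inode data ("-") are skipped.
rank_inode_use() {
    awk 'NR > 1 && $5 ~ /^[0-9]+%$/ { sub(/%/, "", $5); print $5, $6 }' |
        sort -rn
}

df -Pi | rank_inode_use | head -n 3
```

Compare the top entry with the top of df -h's Use% column to see whether the two gauges agree on which filesystem is in most danger.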
Step 2 — survey your own home. Run
du -xh --max-depth=1 ~ | sort -rh | head. Descend once or twice into the
biggest entry by hand, then do the same survey with ncdu -rx ~ (read-only, one
filesystem) and feel the difference: the walk happens once and the descent is free. While
you are in there, press c to show item counts and find your own file swarms —
caches and package directories with six-digit counts are normal and worth knowing about.
Step 3 — build a mount shadow and catch it. Create a file, hide it behind a
bind mount, and watch du lose it while the blocks stay allocated:
    $ mkdir -p /tmp/shadow/under /tmp/shadow/cover
    $ dd if=/dev/zero of=/tmp/shadow/under/hidden.bin bs=1M count=64 status=none
    $ du -sh /tmp/shadow
    65M     /tmp/shadow
    $ df -h /tmp | tail -1
    tmpfs            16G   66M   16G   1% /tmp
    $ sudo mount --bind /tmp/shadow/cover /tmp/shadow/under
    $ ls /tmp/shadow/under
    $ du -sh /tmp/shadow
    8.0K    /tmp/shadow
    $ df -h /tmp | tail -1
    tmpfs            16G   66M   16G   1% /tmp
    $ sudo umount /tmp/shadow/under
    $ ls /tmp/shadow/under
    hidden.bin
    $ rm -r /tmp/shadow
Walk through what just happened. Before the mount, du and df
agreed: 64 MB of file, 64-ish MB of blocks. The bind mount placed an empty
directory over under, and du's answer collapsed to nearly nothing
— the walk now descends into the covering directory and hidden.bin has no
reachable name. But df still reports the 66 MB, because the blocks never
stopped being allocated. Nothing was deleted, nothing holds the file open, so the
ghost-file hunt from the first scenario would come back empty; this gap has a different
cause and a different cure. Unmount, and the file is back, untouched. That is the entire
mount-shadow story, performed on 64 harmless megabytes in /tmp instead of 80
mystery gigabytes on a production root volume.
Step 4 — see a sparse file disagree with itself. No sudo needed:
truncate -s 10G /tmp/sparse.img creates a ten-gigabyte file in zero time. Now
ask its size both ways: du -h /tmp/sparse.img says 0, and
du -h --apparent-size /tmp/sparse.img says 10G. ls -lh agrees with
the second; df agrees with the first. Both are telling the truth about
different things, which by this point in the page should feel familiar. Delete it with
rm /tmp/sparse.img and you are done.
df -h to pick the filesystem,
df -i to rule out inodes, sudo du -xh --max-depth=1 MOUNT | sort -rh | head
to reconcile and descend. When df and du disagree, stop descending and ask what has no
name: a deleted-but-open file, or a mount shadow.
Further reading
- df(1) and du(1) — short as man pages go; the du page's notes on hard links and --apparent-size are the parts worth a careful read.
- statfs(2) — the system call behind df, including the f_bavail versus f_bfree distinction that is the root reservation in syscall form.
- ncdu — project page — documentation for the export/import workflow and the read-only flag, both worth knowing before you need them.
- Semicolony — Why is the disk full? — the full incident decision tree that this page's scenarios slot into.