top & htop
You SSH into a box because something is slow, and the first thirty seconds decide whether
the next hour is diagnosis or guesswork. top is the tool for those thirty
seconds. It answers one question: what is this machine doing right now, and who is
responsible? Most engineers run it, glance at the big numbers, and close it without
reading any of them. This page fixes that with the five keys worth knowing, a line-by-line
read of the header everyone skips, three production incidents, where the numbers come from
in /proc, and a drill that ends with you deliberately pinning a core and
then letting it go.
The question it answers
Every performance investigation starts from the same place: a machine that is misbehaving
and a person who does not yet know why. Before you can ask the good questions — which
resource is saturated, which process is at fault, whether the problem is CPU, memory,
disk, or none of those — you need a live picture of the whole box. That is the job of
top. It samples the kernel's accounting a few times a second, sorts every
process by how much CPU it is burning, and paints the result on your terminal in a loop.
It is the first command worth typing on an unfamiliar machine, which is why it is the
first page in this series.
htop is the same idea with better manners: colour, scrolling, mouse support,
per-core meters drawn as bars, and a tree view that shows which process spawned which.
Day to day, htop is the nicer place to live. But the two tools read the same
kernel counters and answer the same question, and top ships with effectively
every Linux system while htop often does not. Learn to read top
cold and htop becomes a comfort, not a dependency. The reverse leaves you
stranded the first time you land on a minimal container image at 3am.
It also helps to know what these tools are not. They are not profilers; they tell you
that a process is burning CPU, not which function inside it is responsible — for
that, the trail continues in
what's eating my CPU? They are
not historians; they show the state of the machine right now, and a spike that ended two
minutes ago has already left the screen. And they are not the whole picture: memory
pressure and disk traffic get one summary line each, and when those lines look suspicious
the follow-up tools are
free & vmstat. What
top gives you is the triage view, the same role the first checks play in the
USE method: utilisation and
saturation for the whole box, with names attached.
The five keys that matter
Both tools are interactive, and the keyboard is where the value is. The default view is a CPU leaderboard; one keystroke turns it into a memory leaderboard, a per-core view, or a process tree. These are the keys that earn their place in your fingers.
| Key / flag | What it does | When you reach for it |
|---|---|---|
| top -o %MEM | Starts top already sorted by resident memory | "What is eating the RAM?", the second most common question after CPU |
| P / M / T | Re-sorts the live view: by CPU, by memory, by cumulative CPU time (all shift+key) | Flipping between leaderboards mid-investigation; T finds the long-running grinder that is never on top of the instantaneous view |
| 1 | Expands the single %Cpu(s) summary into one line per core | Whenever the box "looks fine" but one thing is slow; averages hide pinned cores |
| c | Toggles full command lines in the COMMAND column | Ten identical java or python rows; the arguments tell them apart |
| e / E | Cycles memory units (KiB, MiB, GiB…) in the task list / the summary header | Reading 12782340 as 12.2 GiB without doing arithmetic under stress |
| htop: F5 | Tree view: processes nested under their parents | Working out who spawned the thing that is misbehaving, and what dies with it if you kill the parent |
Two habits worth forming early. First, sorting answers most questions before you ever
read a number: sorted by CPU, the culprit of a CPU problem is on line one; press
M and the culprit of a memory problem is on line one. Second, in
htop the tree view changes what a kill means. A worker that keeps coming
back from the dead usually has a supervisor respawning it, and F5 shows you
the supervisor sitting one level up. The mechanics of actually stopping things, and the
difference between asking and insisting, live in
kill & signals.
top -bn1 runs one iteration in
batch mode and exits; that is how you put top output into a script, a log, or a ticket.
Mind the caveat in the pitfalls section, though: the CPU percentages in the very first
iteration are averages since boot, not "right now."
Reading the header
Here is a realistic header from a 4-core web box that is having a bad afternoon. Most people's eyes slide straight past these five lines to the process list below. The header is the better half of the tool.

    $ top
    top - 14:32:07 up 41 days,  3:12,  2 users,  load average: 6.41, 5.87, 4.92
    Tasks: 213 total,   2 running, 210 sleeping,   0 stopped,   1 zombie
    %Cpu(s): 12.3 us,  4.1 sy,  0.0 ni, 71.2 id, 10.9 wa,  0.0 hi,  0.4 si,  1.1 st
    MiB Mem :  15842.3 total,    412.7 free,   9216.4 used,   6213.2 buff/cache
    MiB Swap:   2048.0 total,   2046.1 free,      1.9 used.   5904.8 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    41327 deploy    20   0   12.4g   2.1g  24512 S 186.7  13.6 412:11.07 java
      812 postgres  20   0  328940 121408  98244 D   0.7   0.7  88:14.92 postgres
     1290 root      20   0  142212  18044  11236 S   2.0   0.1   3:02.55 nginx

The load average, actually explained
Three numbers: averages over the last 1, 5, and 15 minutes. The folk definition — "how many processes wanted CPU" — is wrong on Linux in a way that matters. Linux counts two kinds of task into the load: tasks that are runnable (running on a CPU or queued waiting for one) and tasks in uninterruptible sleep, the D state, which almost always means blocked on disk or another piece of slow I/O. So the Linux load average is a demand number for the whole machine, not for the CPU alone. A load of 6 can mean six tasks fighting over the processors, or one task computing while five sit parked in D state waiting on a sick disk. Same number, opposite diagnoses, and the %Cpu(s) line below it is how you tell them apart.
The numbers only mean something relative to the core count. One runnable task per core
is a machine working at capacity with no queue; more than that and tasks are waiting.
On this 4-core box, a 1-minute load of 6.41 says that, on average, about 2.4 tasks' worth
of demand beyond what four cores can serve was waiting at any instant. The same 6.41 on a 32-core box is a
quiet Tuesday. Check the core count with nproc before you let any load
number alarm you. The three windows give you a slope as well as a level: 1-minute above
15-minute means the problem is arriving; 1-minute below 15-minute means it is leaving
and you may be looking at the aftermath rather than the cause.
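As a rough sketch of the two checks above, the helpers below divide a load figure by the core count and compare the 1-minute window against the 15-minute one. The names load_per_core and load_trend are invented for this page; only /proc/loadavg and nproc come from the system.

```shell
# load_per_core LOAD CORES -> load divided by the core count, two decimals.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# load_trend reads a /proc/loadavg-style line on stdin and reports the slope.
load_trend() {
  awk '{
    if ($1 > $3) { print "rising" }          # 1-minute above 15-minute
    else if ($1 < $3) { print "falling" }    # 1-minute below 15-minute
    else { print "flat" }
  }'
}

# On a live Linux machine:
#   load_per_core "$(cut -d " " -f1 /proc/loadavg)" "$(nproc)"
#   load_trend < /proc/loadavg
```

Fed the header example, load_per_core 6.41 4 prints 1.60 and the same load on 32 cores prints 0.20. A value above 1.00 means demand exceeds the core count, although it does not yet say whether that demand is for CPU or for I/O.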
The %Cpu(s) line
Eight numbers that say where every CPU cycle went during the last refresh interval.
us is user time: your programs running their own code. sy is
system time: the kernel working on behalf of those programs — syscalls, network stack,
filesystem work. A high sy relative to us means processes are
asking the kernel to do something over and over, which is its own clue.
ni is user time from processes running at a lowered priority, and
id is genuine idle. hi and si are hardware and
software interrupt handling, normally near zero and interesting when they are not.
The two that decide incidents are wa and st. wa
is iowait, and it is widely misread: it does not mean the CPU is busy doing I/O. It
means the CPU is idle, and at least one task on it is blocked waiting for I/O
to finish. It is idle time with an asterisk — the processor has nothing to do because
the disk has not answered yet. That is why "high load, idle CPU" is not a paradox; the
waiting tasks count toward load while contributing nothing but wa to this
line. st is steal time, and it only exists on virtual machines: the slice
of time your VM had a task ready to run but the hypervisor gave the physical CPU to
someone else. Inside the VM there is nothing to fix; the contention is on the host,
between you and tenants you cannot see.
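If wa and st are the numbers that decide incidents, it can help to pull them out of the summary line mechanically. The sketch below uses a made-up helper, cpu_field, and assumes the procps-style "%Cpu(s):" layout shown in the header example, with a period as the decimal separator; other top builds or locales may format the line differently.

```shell
# cpu_field NAME -> value of one category (us, sy, ni, id, wa, hi, si, st)
# from a "%Cpu(s):" or "%CpuN :" line read on stdin.
cpu_field() {
  awk -v want="$1" '/^%Cpu/ {
    sub(/^[^:]*:/, "")              # drop the label up to the colon
    n = split($0, parts, ",")       # one "value name" pair per comma
    for (i = 1; i <= n; i++) {
      split(parts[i], kv, " ")      # kv[1] is the value, kv[2] the name
      if (kv[2] == want) print kv[1]
    }
  }'
}

# On a live machine, reading the second batch frame:
#   top -bn2 | grep '^%Cpu' | tail -1 | cpu_field wa
```

Against the header example this yields 10.9 for wa and 1.1 for st, which is enough to feed a threshold check in a script or a note in a ticket.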
VIRT, RES, SHR — the decoder nobody teaches
Three memory columns per process, and the biggest one is the least meaningful.
VIRT is virtual size: the total address space the process has mapped. It
counts heap the allocator reserved but never touched, files mapped into memory whether
or not any page was read, anything swapped out, and shared libraries over again for
every process that maps them. It is a measure of promises, not of RAM. A JVM or a Go
service showing tens of gigabytes of VIRT on a 16 GB machine is normal and fine.
RES is the resident set: the pages actually sitting in physical RAM right
now. This is the column that means what people think VIRT means, and it is the number
behind %MEM. When you are hunting a memory hog, sort by %MEM
and read RES. SHR is the slice of RES that is shared with other processes —
mostly shared libraries and explicitly shared memory segments. It matters when you are
tempted to multiply: ten workers each showing 200 MB RES with 150 MB SHR are
not using 2 GB, because most of that SHR is the same physical pages counted ten
times. A process's private footprint is closer to RES minus SHR than to RES,
and nowhere near VIRT.
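The "RES minus SHR" estimate can be approximated straight from /proc. This is a sketch under two assumptions: the helper name private_rss_kb is invented here, and the kernel is recent enough (4.5 or later) to expose RssFile and RssShmem in /proc/PID/status, which together roughly correspond to what top reports as SHR.

```shell
# private_rss_kb reads /proc/PID/status text on stdin and prints
# VmRSS - (RssFile + RssShmem) in kB, roughly RES minus SHR.
private_rss_kb() {
  awk '$1 == "VmRSS:"    { rss = $2 }
       $1 == "RssFile:"  { file = $2 }
       $1 == "RssShmem:" { shmem = $2 }
       END { print rss - file - shmem }'
}

# On a live machine, for the current shell:
#   private_rss_kb < /proc/$$/status
```

For a worker with 204800 kB resident, of which 102400 kB is file-backed and 51200 kB is shared memory, the result is 51200 kB. Multiply that figure by the worker count, not RES.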
One more row in the sample output deserves a word: the S column, process
state. R is running or runnable, S is ordinary sleep,
Z is a zombie (exited, waiting for its parent to collect the exit status —
one or two are cosmetic, hundreds mean a buggy parent), and D is the
uninterruptible sleep from the load average discussion. The postgres row
above is in D state. Hold that thought.
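A quick census of process states gives the same information as the S column without scrolling. The helper name count_states below is made up for illustration; it assumes ps accepts the procps-style "-eo state=" form, which prints one state letter per process with no header.

```shell
# count_states reads one process state per line and prints a sorted tally.
count_states() {
  awk '{ seen[substr($1, 1, 1)]++ }
       END { for (s in seen) print s, seen[s] }' | sort
}

# On a live machine:
#   ps -eo state= | count_states
```

A handful of D entries that persist across repeated runs is the pattern to watch for; a single D that comes and goes is ordinary I/O.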
Three production scenarios
Load is 14 but the CPU is idle
An alert fires on load average. You log in and the header makes no sense at first read:
load 14 on a 4-core box, but id says the CPUs are 78% idle. The reconciling
number is sitting right there: wa at 19. The machine is not short of CPU;
it is full of tasks in D state, blocked on storage, each one counting toward load while
the processors twiddle. The disk is the bottleneck; the load average is just the queue
forming behind it.

    $ top -bn1 | head -3
    top - 02:14:31 up 12 days, 22:40,  1 user,  load average: 14.22, 11.04, 6.31
    Tasks: 188 total,   1 running, 175 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  1.8 us,  1.1 sy,  0.0 ni, 78.0 id, 19.1 wa,  0.0 hi,  0.0 si,  0.0 st

    $ ps -eo state,pid,comm | awk '$1=="D"'
    D   812 postgres
    D   815 postgres
    D   819 postgres
    D  2204 kworker/u8:3

The ps one-liner lists the D-state tasks by name, and a cluster of them
from the same service points the finger. From here the investigation belongs to the
disk: is it saturated, dying, or an NFS mount that stopped answering? The tools for
that next step, including watching the b column count blocked tasks over
time, are on the free & vmstat
page. The lesson that survives the incident: a load alert is not a CPU alert. Read
wa before you assume.
The cloud VM that lost a third of its CPU
A service on a cloud VM gets slower over a week with no deploy and no traffic change. CPU graphs from inside the box show usage well below 100%, yet latency keeps creeping. The header tells the story in one number most dashboards never plot:

    $ top -bn1 | grep Cpu
    %Cpu(s): 41.2 us,  8.3 sy,  0.0 ni, 18.1 id,  0.6 wa,  0.0 hi,  0.9 si, 30.9 st

Thirty-one percent steal. Nearly a third of the time this VM had work ready, the
hypervisor handed the physical core to another tenant. Two usual causes: a noisy
neighbour on an oversubscribed host, or a burstable instance type that has spent its
CPU credits and is being throttled by design — check which class of instance you are on
before blaming anyone. Either way, no amount of tuning inside the guest gets those
cycles back. The fixes are operational: resize to a non-burstable type, redeploy so the
scheduler places you on a different host, or pay for dedicated capacity. A few percent
of transient st is life on shared hardware; sustained double digits is a
capacity problem wearing a performance costume, and the cheapest thing you can do is
stop profiling your own code and look at this line first.
One core pinned while the box "looks fine"
An 8-core machine shows 13% total CPU, every dashboard is green, and yet one workload
is mysteriously slow. The summary line is an average over all cores, and an average is
exactly the right tool for hiding one bad core among seven idle ones. Press
1:

    $ top    (then press 1)
    %Cpu0  :  2.0 us,  1.0 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
    %Cpu1  :  1.7 us,  0.7 sy,  0.0 ni, 97.3 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu2  : 99.0 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  :  3.0 us,  1.3 sy,  0.0 ni, 95.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
    ...cores 4-7 similar...

Core 2 is saturated and has been for as long as the workload has been slow. The usual
suspects: a single-threaded program that simply cannot go faster than one core, a hot
thread inside a multi-threaded service (a garbage collector, a lone event loop, a
compression job), or interrupt handling pinned to one core by IRQ affinity so that all
network processing lands in one place. Identifying which thread it is and what it is
executing is the subject of
what's eating my CPU?, and
the reason one task stays glued to one core rather than spreading is the scheduler's
affinity logic, covered in
scheduling. The
habit to take away is cheap: any time a machine "looks fine" but is not, press
1 before you trust the average.
Where the numbers come from
Neither tool has privileged access to anything. Everything on the screen is read from
/proc, the kernel's window onto its own state, and you can read the same
files with cat. The load average is the file /proc/loadavg:
three damped moving averages the kernel maintains as part of its regular timekeeping,
decaying old samples exponentially so the 1-minute number reacts fast and the 15-minute
number smooths the noise. What gets counted into them is the part worth remembering —
tasks runnable plus tasks in uninterruptible sleep, which is the design decision that
makes Linux load a whole-machine demand signal rather than a CPU queue length.

    $ cat /proc/loadavg
    6.41 5.87 4.92 2/213 41330
    # the three averages, then runnable/total tasks, then the last PID handed out

    $ head -2 /proc/stat
    cpu  84321907 21340 19833502 933012890 8231201 0 412390 901230 0 0
    cpu0 21080476 5335 4958375 233253222 2057800 0 103097 225307 0 0
    # user nice system idle iowait hi si steal ...

The %Cpu(s) line comes from /proc/stat, which holds one counter per CPU
per category — user, system, idle, iowait, steal, and the rest — each ticking up
forever since boot. The counters are cumulative, so a single read is meaningless;
top reads the file, sleeps for the refresh interval, reads it again, and
the percentages you see are the deltas. Per-process numbers work the same way from
/proc/PID/stat (CPU time consumed) and /proc/PID/status
(memory: VIRT, RES, and SHR under their kernel names VmSize, VmRSS, and RssFile plus
RssShmem). top and htop are, to a first
approximation, loops that read these files, subtract, divide by the interval, and sort.
Knowing this buys you two things. During a bad incident on a stripped-down box with no
tools installed, cat /proc/loadavg and two reads of /proc/stat
get you the header by hand. And the numbers stop being oracle pronouncements: a
percentage in top is a sampled difference between two counters, subject to
sampling error and aliasing like any other measurement. The full tour of the filesystem
behind all of this is on the /proc page, and
if you want to watch run queues form and tasks migrate between cores instead of reading
about it, the scheduler simulator lets
you generate the load and see the queueing happen.
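The read, sleep, read, subtract loop described above can be reproduced by hand. The sketch below is not how top is implemented; it is a minimal illustration with an invented helper, cpu_delta, that takes two saved copies of /proc/stat and prints the header-style percentages for the interval between them. It sums the first eight counters (user through steal) and ignores the guest columns, which the kernel already includes in user time.

```shell
# cpu_delta BEFORE AFTER [LABEL]
# BEFORE and AFTER are files holding /proc/stat snapshots taken some time
# apart. LABEL is "cpu" (all cores, the default) or "cpu0", "cpu1", ...
cpu_delta() {
  awk -v label="${3:-cpu}" '
    $1 != label { next }                 # ignore every other line
    !seen {                              # first snapshot: remember counters
      for (i = 2; i <= 9; i++) first[i] = $i
      seen = 1
      next
    }
    {                                    # second snapshot: diff and scale
      for (i = 2; i <= 9; i++) { d[i] = $i - first[i]; total += d[i] }
      if (total <= 0) exit 1             # no ticks elapsed between reads
      printf "%.1f us, %.1f sy, %.1f ni, %.1f id, %.1f wa, %.1f hi, %.1f si, %.1f st\n",
        100 * d[2] / total, 100 * d[4] / total, 100 * d[3] / total,
        100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
        100 * d[8] / total, 100 * d[9] / total
    }' "$1" "$2"
}

# On a live Linux machine:
#   cat /proc/stat > /tmp/stat.a; sleep 1; cat /proc/stat > /tmp/stat.b
#   cpu_delta /tmp/stat.a /tmp/stat.b          # whole machine
#   cpu_delta /tmp/stat.a /tmp/stat.b cpu2     # a single core
```

Note the field order: /proc/stat lists user, nice, system, while the header prints us, sy, ni, so the sketch swaps the second and third counters when printing.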
Pitfalls
%CPU above 100 is not a bug. In top's default mode (Irix mode), a
process's %CPU is measured against a single core, so a process running four busy
threads on four cores shows 400%. The java row in the header example reads
186.7% for exactly this reason. Pressing shift+I toggles Solaris mode,
which divides by the core count so 100% means the whole machine. Neither is wrong; you
just need to know which one you are reading before you quote a number in an incident
channel. htop uses the per-core convention too.
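Converting between the two conventions is one division. A minimal sketch, with machine_share as a made-up helper name:

```shell
# machine_share PCT CORES -> a per-core (Irix mode) %CPU figure expressed
# as a share of the whole machine (what Solaris mode would show).
machine_share() {
  awk -v pct="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", pct / cores }'
}

machine_share 186.7 4    # the java row on the 4-core box: prints 46.7
```

So the java process in the header example is using a little under half of the machine, which is the number worth quoting when someone asks how much headroom is left.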
The VIRT panic. Someone sorts by VIRT, sees a 40 GB process on a
16 GB machine, and declares a leak. VIRT is address space, not memory, and modern
runtimes reserve it wholesale: the JVM maps its maximum heap up front, Go reserves a
large arena, anything using mmap on big files counts them all. The kernel
hands out address space optimistically and only commits physical pages on first touch.
Memory pressure is real when RES is large and growing, when swap usage climbs, or when
available memory shrinks toward zero — and those last two live on the
free & vmstat page. VIRT alone
has never been an emergency.
htop is not always there. Minimal server images, containers, and
rescue environments routinely ship without it, and during an incident is the wrong
moment to discover that your fingers only know F5 and mouse clicks. Every
skill on this page was written against plain top first for that reason.
Practice the vanilla keys until the fancy tool is a luxury rather than a requirement.
The first batch sample lies. top -bn1 prints percentages
computed from the since-boot counters, because there is no previous sample to diff
against. On a box that has been up for 41 days, that first frame is a 41-day average —
useless for "right now." Use top -bn2 and read the second frame; in the interactive
tool, trust the second refresh for the same reason.
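Reading the second frame by eye is fine at a terminal but awkward in a script. One possible approach, sketched here with an invented helper called last_frame, is to keep only the final frame of the batch output. It assumes each frame begins with a line starting "top - ", as in the examples on this page.

```shell
# last_frame reads top batch output on stdin and prints only the final frame.
last_frame() {
  awk '/^top - / { n = 0 }        # a new frame starts: forget the previous one
       { buf[++n] = $0 }
       END { for (i = 1; i <= n; i++) print buf[i] }'
}

# On a live machine, two frames one second apart, header of the second:
#   top -bn2 -d 1 | last_frame | head -5
```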
Watching the watcher. top itself costs a little CPU, more
with fast refresh intervals on busy machines, and it will happily appear in its own
leaderboard. If a screenshot of top shows top near the top, that is not the smoking gun
it appears to be.
A drill you can run right now
Everything below is safe on any Linux machine, including a shared one. The only thing
it creates is one deliberately busy process that you will kill at the end, and the only
thing that process does is write the letter y into a black hole.
Step 1 — establish the baseline. Run nproc and remember
the number, then open top and read the header against this page: load
average relative to core count, then the %Cpu(s) line left to right, then the Mem line.
Find the largest process by CPU, press M to re-sort by memory, press
P to flip back. Press c and watch the COMMAND column grow
arguments. Press 1 and count the cores you saw with nproc.
Step 2 — pin a core on purpose. In a second terminal, start the noisiest harmless process Unix offers, then watch it land in top:

    $ yes > /dev/null &
    [1] 50612
    $ top    (press 1, then P)
    %Cpu3  : 99.7 us,  0.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    50612 nilesh    20   0    8124    980    876 R  99.7   0.0   0:41.32 yes

yes prints y forever; redirected to /dev/null,
it becomes a pure CPU burner. With the per-core view open, watch one core sit at 100%
user time while the others idle — the pinned-core scenario, manufactured. Notice the
process state is R and its TIME+ climbs in real time. If you wait a minute
or two with cat /proc/loadavg, you can watch the 1-minute load drift up
toward 1.0 while the 15-minute number barely moves: the damped averages reacting at
their different speeds. Sometimes the scheduler migrates the burner between cores
mid-watch; that wandering is load balancing happening in front of you, and the
scheduler simulator shows the same
decision-making slowed down.
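To watch the averages drift without retyping cat, a small sampling loop is enough. The helper name sample_load and its arguments are assumptions made for this sketch; the optional third argument exists only so the function can be pointed at a saved file instead of /proc/loadavg.

```shell
# sample_load COUNT INTERVAL [FILE]
# Prints COUNT timestamped readings of the 1- and 15-minute averages,
# INTERVAL seconds apart. FILE defaults to /proc/loadavg (Linux only).
sample_load() {
  count=$1
  interval=$2
  src=${3:-/proc/loadavg}
  i=0
  while [ "$i" -lt "$count" ]; do
    read -r one five fifteen rest < "$src"
    printf '%s 1m=%s 15m=%s\n' "$(date +%T)" "$one" "$fifteen"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}

# While the yes process is running, for example:
#   sample_load 12 10      # two minutes of readings, one every ten seconds
```

The 1m column should climb steadily while the 15m column lags well behind it, and both fall back at the same unequal speeds once the burner is killed.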
Step 3 — clean up, two ways. Kill it from the shell that started it
with kill %1, or do it from inside the tool: in top press
k, give the PID, and accept the default signal; in htop
select the row and press F9. Run jobs to confirm nothing is
left. What those signals actually are, and when the default is the wrong one, is the
subject of kill & signals.
Step 4 — if htop is installed, take the tour. Open htop,
press F5, and find your shell: terminal emulator or sshd at the top, your
shell under it, htop itself as a child. Run the yes trick
again and watch it appear in the tree under your shell, then kill it from the tree.
Parentage is the thing top makes you reconstruct by hand and htop just shows you.
Read the load average against nproc and check wa
before blaming the CPU. Press 1 when the box looks fine but is not. And
judge memory by RES, never by VIRT.
Further reading
- top(1) — the manual page — long, but the SUMMARY Display and FIELDS sections explain every header line and column for the version you actually have.
- Brendan Gregg — Linux Load Averages: Solving the Mystery — the definitive archaeology of why Linux counts uninterruptible sleep into load, traced to the original 1993 patch.
- proc(5) — the documentation for /proc/loadavg, /proc/stat, and the per-process files top is built on.
- Semicolony — The USE method — the checklist that turns "stare at top" into a repeatable first pass over every resource.