watch, time & tmux
Three small tools, one theme: staying in control of a terminal over time. watch
turns any command into a live dashboard so you can see a queue drain instead of re-running
the same line forty times. time answers "is this slow because it computes, or
because it waits" in ten seconds, before you reach for a profiler. And tmux
keeps a six-hour migration alive when the VPN drops at hour four. None of them is glamorous.
All of them are the difference between a calm incident and a frantic one. This page covers
the flags worth knowing, the SIGHUP mechanics underneath, and ends with a drill you can run
anywhere.
Why these three travel together
Most pages in this codex cover one tool. This one covers three, because they solve three faces of the same problem: a terminal is a single moment in time, and real operational work is not. You run a command and get one snapshot; the system keeps moving. You start a script and it is slow; the prompt gives you no hint why. You start a migration over SSH and your connection is now a load-bearing part of the migration, which is a terrible place for a Wi-Fi link to be.
watch fixes the first: it re-runs a command on an interval and repaints the
screen, turning a one-shot snapshot into a dashboard. time fixes the second: it
wraps a command and reports where the wall-clock seconds went, which sorts "slow" into two
very different problems before you spend an hour on the wrong one. tmux fixes
the third: it moves your shell sessions into a server that does not care whether you are
connected, so the work and the connection stop sharing a fate. Together they are the
survival kit for any long session on a remote box, and unlike most survival kits, you will
use this one weekly.
watch: any command becomes a dashboard
The shape is simple: watch -n1 -d 'command' runs the command every second,
clears the screen, prints the fresh output, and highlights what changed since the previous
run. -n sets the interval in seconds (the default is 2), and -d
turns on the change highlighting. That second flag is the underrated one. Without it you are
staring at a wall of repainting text trying to spot movement; with it the moving parts light
up and the static parts fade into background. Watching a 40-row table for the one number
that ticks is exactly the kind of task humans are bad at and -d is built for.
    $ watch -n1 -d 'wc -l /var/spool/outbox/queue'
    Every 1.0s: wc -l /var/spool/outbox/queue        worker-3: Sat Jun  6 21:14:02 2026

    18204 /var/spool/outbox/queue
    ^^^^^ the changed digits are highlighted each refresh; if they stop changing,
          the queue stopped draining, and you saw it the second it happened
The pattern generalises to anything that prints state. A queue draining:
watch -n1 -d 'wc -l queue'. A file growing during a long export:
watch -n1 -d 'ls -lh /backups/dump.sql'. Connections settling after you pull a
host from a load balancer: watch -n1 -d 'ss -Ht state established | wc -l' (the -H drops
the header line, so the count can actually reach zero) and you can literally watch it walk
down before you take the box offline.
Replication catching up, a directory filling with processed files, the line count of an
error log — if a command can print it, watch can animate it. The skill is not
the tool; it is the habit of asking "what single command prints the number I care about"
and then putting watch -d in front of it.
Three smaller flags earn a mention. -t drops the header line when you want the
whole screen for output. -e stops on the first non-zero exit, which makes
watch -e a crude alarm: it freezes the moment your health-check command starts
failing. And -g exits as soon as the output changes, which turns watch
into a blocking wait: watch -g 'ls done.flag 2>/dev/null' sits quietly until
the file appears, then returns control to your script or your attention. The redirect sits
inside the quotes so that it applies to ls, not to watch itself.
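The blocking-wait idea can also be sketched in plain shell, assuming a box whose watch lacks -g (some minimal builds do). The helper name wait_for_change is hypothetical, and this is an illustration of the technique rather than a replacement for watch: it re-runs a command string through sh -c and returns once the output differs from the first run.

```shell
# wait_for_change INTERVAL COMMAND
# Hypothetical helper: re-run COMMAND every INTERVAL seconds and return as
# soon as its output differs from the first run (no screen repainting).
wait_for_change() {
    local interval="$1" cmd="$2"
    local first current
    first=$(sh -c "$cmd" 2>&1)
    while :; do
        sleep "$interval"
        current=$(sh -c "$cmd" 2>&1)
        if [ "$current" != "$first" ]; then
            printf '%s\n' "$current"   # show the output that broke the wait
            return 0
        fi
    done
}
```

A call such as wait_for_change 1 'ls done.flag 2>/dev/null' && echo "flag appeared" blocks until the file shows up. Like watch, it takes the command as one quoted string, so pipes and redirects inside it are evaluated on every check.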
watch ss -t | wc -l means "run watch ss -t, and pipe
watch's screen output into wc" — your shell parses the pipe before
watch ever sees it, and you get a meaningless number and no display. What you meant is
watch 'ss -t | wc -l': quote the whole pipeline so watch receives it as one
string and hands it to sh -c intact. Same story for >,
&&, and $VAR (single quotes defer expansion to each
refresh, double quotes expand once, before watch starts — both are occasionally what you
want, but only one of them is what you meant).
time: the three numbers, read properly
Prefix any command with time and you get three numbers back. Most people read
the first and ignore the other two, which throws away the diagnosis. Here is what each one
counts. real is wall-clock time: the seconds that passed in the world
between start and exit. user is CPU time spent executing the process's own
code in user space: your loops, your parsing, your compression. sys is CPU
time the kernel spent working on the process's behalf: reads, writes, memory mapping,
network calls — the cost of crossing into the kernel and doing privileged work there.
The numbers mean little alone and a great deal in combination. When user dominates, the command spent its life computing in your code: a CPU-bound workload, and the fix lives in the algorithm or the data volume. When sys is high relative to user, the process spent its life asking the kernel to do things: millions of tiny reads, a stat call per file in a huge tree, a chatty syscall pattern. That is the signature that says run strace next and see which call it is making ten thousand times. And when real far exceeds user + sys, the process was not running at all for most of its life. It was waiting: on disk, on the network, on a lock, on a database at the other end of a connection. No CPU profiler will find that time, because it was not spent on a CPU. That third pattern is arguably the most diagnostic single line in Linux: one cheap command, and you know whether you are hunting a computation or a wait.
Two contrasting runs make the patterns concrete. First a compression job, the classic CPU-bound case:
    $ time gzip -k big.log

    real    0m8.219s
    user    0m7.984s   <- almost all of real. the CPU never stopped chewing.
    sys     0m0.211s   <- a little kernel work to read and write the file
    # user ≈ real: CPU-bound. a faster algorithm or fewer bytes is the only fix.
Now a report fetched from a slow internal API. Same wall-clock ballpark, completely different story:
    $ time curl -s https://api.internal/report -o report.json

    real    0m9.412s
    user    0m0.054s   <- fifty milliseconds of actual work
    sys     0m0.082s   <- and barely any kernel time either
    # real ≫ user + sys: 9.3 of 9.4 seconds were spent waiting on the network.
    # nothing on this machine is slow. the problem lives on the other end.
Nine seconds either way, and the two commands need opposite responses. The gzip run wants a
look at what is being compressed and whether it must be; the curl run wants a look at the
API server, not at this box at all. Ten seconds of time just saved you from
profiling the wrong machine.
One wrinkle worth knowing on multi-core machines: user and sys count CPU time summed across
all cores, so a parallel job can report more CPU time than wall time.
make -j8 finishing in 30 seconds of real with 3 minutes of user is not a
measurement error; it is eight cores each contributing their share. The waiting diagnosis
still works the same way — it is real exceeding the sum that signals a wait, and a
sum exceeding real that signals parallelism.
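The reading rules above can be written down as a small triage function. This is a sketch, not a standard tool: classify_time is a hypothetical name, the 10% and 50% thresholds are illustrative assumptions, and the sys value in the third call is made up because the make -j8 example only gives real and user.

```shell
# classify_time REAL USER SYS   (all in seconds)
# Thresholds are illustrative assumptions, not standard cut-offs.
classify_time() {
    awk -v real="$1" -v user="$2" -v sys="$3" 'BEGIN {
        cpu = user + sys
        if (cpu > real * 1.1)      print "parallel"       # CPU time summed over cores exceeds wall time
        else if (cpu < real * 0.5) print "waiting"        # mostly off-CPU: disk, network, locks
        else if (sys > user)       print "syscall-heavy"  # the kernel did most of the work
        else                       print "cpu-bound"      # user-space computation dominates
    }'
}

classify_time 8.219 7.984 0.211   # the gzip run above
classify_time 9.412 0.054 0.082   # the curl run above
classify_time 30 180 4            # a make -j8 style run (sys value assumed)
```

The three calls print cpu-bound, waiting, and parallel in that order. The point is the order of the checks: rule out parallelism first, then waiting, and only then compare sys against user.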
There are also two different times on your system, and they answer different
questions. The word time at a bash or zsh prompt is a shell keyword, not a
program: it prints the three numbers and nothing else, and because it is built into the
shell's grammar it can time an entire pipeline. The binary at /usr/bin/time is
a separate program with separate talents, and its -v flag is the one that earns
its keep:
    $ /usr/bin/time -v ./import.py 2>&1 | grep -E 'Maximum resident|page faults|wall clock'
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:42.18
        Maximum resident set size (kbytes): 6291184     <- the script peaked at ~6 GB
        Major (requiring I/O) page faults: 48121        <- memory came back from disk:
        Minor (reclaiming a frame) page faults: 902214     the box was swapping or cold
Maximum resident set size is the headline: peak memory for the whole run, captured for free,
no instrumentation. When someone asks "how much RAM does the import need," this is how you
answer with a number instead of a shrug. Major page faults are the supporting witness: each
one is a memory access that had to wait for the disk, and tens of thousands of them usually
mean the process's working set did not fit in RAM (or the run started against a cold cache),
which quietly converts a CPU problem into an I/O problem.
For live, whole-system views of the same pressure,
top & htop are the companion tools; for
a single command after the fact, /usr/bin/time -v is hard to beat.
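To turn the headline number into something you can paste into a ticket, a small filter is enough. The sketch below assumes GNU time's -v report format, with the value in kilobytes as shown above; peak_rss_gib is a hypothetical helper name.

```shell
# peak_rss_gib: read `/usr/bin/time -v` output on stdin and print the peak
# resident set size in GiB. Assumes the GNU report format (value in kbytes).
peak_rss_gib() {
    awk -F': ' '/Maximum resident set size/ { printf "%.1f\n", $2 / 1024 / 1024 }'
}

# Fed here with the sample line from the report above. On a real run:
#   /usr/bin/time -v ./import.py 2>&1 | peak_rss_gib
printf '\tMaximum resident set size (kbytes): 6291184\n' | peak_rss_gib
```

With the sample line this prints 6.0. Because the report goes to stderr, the 2>&1 is needed before the pipe, which also mixes in the program's own output; the pattern match ignores everything except the one line.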
tmux: sessions that survive you
The mental model first, because it explains everything else. When you run tmux,
you start (or talk to) a server: a background process that owns your shell sessions
from then on. Your terminal becomes a client — a window onto sessions that live in
the server. Close the window, lose the SSH connection, put the laptop to sleep: the client
dies, the server does not notice, and everything running inside it keeps running. Attaching
later is just pointing a new window at the same session. The work and the connection no
longer share a fate, which is the entire point.
tmux has dozens of commands and as many key bindings. Five of them carry the daily load:
| Command | What it does | When |
|---|---|---|
| tmux new -s migration | Start a session with a name you will recognise later | Before anything long-running on a remote box |
| C-b d | Detach: leave the session running, return to your plain shell | End of your shift; the work continues without you |
| tmux attach -t migration | Reattach to a running session by name | Next morning, next coffee, after the VPN recovers |
| C-b % and C-b " | Split the window into panes, side by side or stacked | Command in one pane, watch dashboard in the other |
| tmux ls | List sessions on this machine's server | "What did I leave running here?" Ask it on every box you SSH into |
C-b is the prefix key: press Ctrl-b, release, then press the command key. It
is the doorbell that tells tmux "this keystroke is for you, not for the shell inside." The
incident workflow that justifies learning all this takes one paragraph. You SSH into the
box. First command: tmux new -s migration. Inside it you
start the schema migration that will take six hours, split a pane, and put
watch -n5 -d 'psql -c "select count(*) from new_table"' next to it so progress
is visible at a glance. At hour two you detach with C-b d and go home. On the
train, the VPN drops; nothing happens, because nothing of yours depended on it. Next morning
you run tmux attach -t migration and the session is exactly as you left it —
same panes, same scrollback, migration at 80%. The alternative timeline, where the migration
ran in a bare SSH session, ends at hour two with a dead connection, a half-applied
migration, and a very careful audit of which statements committed.
    $ tmux new -s migration
    # inside: start the long job, split panes, arrange the view
    # … hours pass … C-b d to detach …
    $ tmux ls
    migration: 2 windows (created Sat Jun  6 14:02:11 2026)
    $ tmux attach -t migration
    # and you are back, mid-scroll, as if you never left
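For jobs launched from a script rather than by hand, the same workflow can be wrapped in a function. This is a minimal sketch: start_long_job is a hypothetical name, and it relies only on tmux has-session and tmux new-session -d -s.

```shell
# start_long_job SESSION COMMAND
# Hypothetical helper: if SESSION already exists, say so and stop; otherwise
# start COMMAND in a new detached session so it never depends on this terminal.
start_long_job() {
    local session="$1" cmd="$2"
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "session '$session' exists; use: tmux attach -t $session" >&2
        return 1
    fi
    tmux new-session -d -s "$session" "$cmd"
}
```

Typical use would be start_long_job migration './migrate.sh' followed by tmux attach -t migration when you want to look. One design caveat: a window created this way closes when its command exits, taking the scrollback with it, so for a job whose final output matters it is safer to start a plain session and run the command inside it.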
One more trick that pays for itself the first time you use it: shared sessions. Two people
SSH into the same box as the same user and both run tmux attach -t migration.
They now see the same panes and the same cursor, live. For pair-debugging a production
issue, this beats screen-sharing over a video call by a comfortable margin — both people
can type, the latency is whatever SSH's latency is, and the session itself is the shared
artifact. It is also the dignified way to hand an incident over at shift change: the next
person attaches to your session and inherits your exact view of the problem, scrollback
included.
The lightweight alternative: nohup, &, disown
Not every long job needs a whole session. If you only need a command to survive your
departure — no reattaching, no interaction, just "keep running and write a log" — the
old tools are lighter. nohup ./backfill.sh & starts the job in the
background with SIGHUP ignored, so the disconnect that would normally kill it gets shrugged
off; output lands in nohup.out unless you redirect it. If the job is
already running in the background and you only now realise you need to leave, disown
rescues it after the fact: run disown -h %1 and the shell stops forwarding SIGHUP to that
job when it exits. (If the job still holds the foreground, press Ctrl-Z and run bg first.)
The decision rule is simple. Will you
ever want to look at this job's screen again, type into it, or show it to a colleague?
tmux. Is it fire-and-forget with a log file? nohup is enough, and one fewer
moving part. Either way the mechanism being defeated is the same signal, SIGHUP, which the
section on the controlling terminal further down takes apart.
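The fire-and-forget case can be sketched as a small wrapper that names its log explicitly instead of relying on nohup.out. The helper name run_detached is hypothetical; the pattern is nohup plus explicit redirects plus &.

```shell
# run_detached LOGFILE COMMAND [ARGS...]
# Hypothetical helper: start COMMAND immune to SIGHUP, with stdin closed off
# and all output in LOGFILE, then print its PID.
run_detached() {
    local log="$1"
    shift
    # With every stream redirected, nohup has nothing to send to nohup.out.
    nohup "$@" >"$log" 2>&1 </dev/null &
    echo "$!"
}
```

A call like pid=$(run_detached backfill.log ./backfill.sh) returns immediately with the PID, which you can later check with kill -0 "$pid" or watch with tail -f backfill.log. Nothing here restarts the job if it fails; it only survives the hang-up.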
Three production scenarios
Watching a deploy converge
A rolling deploy is out and you want to see it land, not poll it by hand. Two panes. In the
first, watch -n2 -d 'kubectl get pods -n payments' — the -d
highlighting makes each pod's status flicker as it walks from
ContainerCreating to Running, and a pod stuck in
CrashLoopBackOff stops flickering, which your eye catches immediately. In the
second, on the node being drained, watch -n1 -d 'ss -Ht state established | wc -l'
counts live connections (-H leaves the header line out of the count) walking down as the load balancer stops sending traffic. When the
number reaches the long-lived stragglers and stays flat, you know what is left and can
decide whether to wait them out or cut them off. Neither pane is clever. Both replace a
human re-running commands with a dashboard that cost ten seconds to build.
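When nobody will be looking at the pane, the "wait them out or cut them off" decision can be scripted with a bounded poll. The sketch below is an assumption-laden illustration: wait_for_drain is a hypothetical helper, and the command you pass must print a single integer.

```shell
# wait_for_drain TRIES INTERVAL THRESHOLD COMMAND
# Hypothetical helper: poll COMMAND (which must print one integer) until the
# value is at or below THRESHOLD, giving up after TRIES checks.
wait_for_drain() {
    local tries="$1" interval="$2" threshold="$3" cmd="$4"
    local i=0 count=""
    while [ "$i" -lt "$tries" ]; do
        # Strip whitespace: some wc implementations pad their output.
        count=$(sh -c "$cmd" | tr -d '[:space:]')
        case "$count" in
            ''|*[!0-9]*) echo "not a number: '$count'" >&2; return 2 ;;
        esac
        if [ "$count" -le "$threshold" ]; then
            echo "drained to $count"
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "still at $count after $tries checks" >&2
    return 1
}
```

For example, wait_for_drain 60 5 3 'ss -Ht state established | wc -l' || echo "stragglers remain" gives the node five minutes to get down to three connections. The try limit is the important part: an unbounded loop would wait forever on exactly the long-lived stragglers described above.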
CPU or waiting, before the profiler
A nightly report script used to take 4 minutes and now takes 19, and the first instinct is
to attach a profiler. Resist it for ten seconds: time ./report.sh first. If it
comes back real 19m, user 18m, fine — it really is computing, and a CPU profiler
will show you where. But if it comes back real 19m, user 0m40s, sys 0m12s, a
CPU profiler will show you a program that is nearly idle, because 18 of those 19 minutes
were spent waiting — on a database, a filesystem, an API, a lock. The follow-up tools differ
too: high sys says strace to see the syscall
storm, big waits say look at what it talks to. The triage costs one run of the script with
five extra keystrokes, and it routinely saves the hour you would have spent profiling the
wrong layer.
The six-hour migration and the sleeping laptop
The full kit at once. A data migration must copy 21 million rows tonight, from your laptop,
over a VPN, onto a database server you reach through a bastion. Bare SSH would make the
whole chain — laptop lid, Wi-Fi, VPN, bastion — a chain of single points of failure for a
six-hour job. Instead: SSH to the server, tmux new -s migration, start the
copy, split a pane for watch -n5 -d on the row count, detach, close the laptop.
Every link in the chain can now fail without consequence, because the job runs in the tmux
server on the database host, attached to nothing you carry. Reattach from home, from the
office, from your phone if the night goes badly. When it finishes, the session holds the
final output and the full scrollback of everything the migration printed, waiting for you
to read it — which beats reconstructing what happened from logs after a connection drop
took the evidence with it.
Underneath: the controlling terminal and SIGHUP
Why does a dropped connection kill your processes in the first place? Nothing about a running program inherently depends on your SSH session. The link is a kernel-level arrangement called the controlling terminal. When sshd accepts your connection, it allocates a pseudo-terminal (a pty) and starts your shell as a session leader attached to it. Every process you then start belongs to that session, and the foreground ones are wired to that terminal for input, output, and — this is the load-bearing part — signals.
When the connection dies, sshd closes its side of the pty. The kernel sees the terminal hang up — the name is literal, inherited from modems — and sends SIGHUP to the session leader, your shell. The shell, before it dies, forwards SIGHUP to its jobs. The default action for SIGHUP is termination, so your migration dies not because anything went wrong with it but because it was wired to a terminal that no longer exists. The full signal taxonomy lives in kill & signals, and the session/process-group machinery is part of the deeper anatomy in processes.
Every survival trick on this page is a different way of breaking one link in that chain.
nohup tells the process to ignore SIGHUP when it arrives. disown
tells the shell not to forward it. And tmux removes the dependency entirely: the server
detaches from your terminal at startup (the setsid() call, if you want the
name) and becomes its own session leader with its own ptys — one per pane, with the server
on the master side. Your shell-inside-tmux has a controlling terminal, but it is one the
tmux server owns, not the one your SSH connection owns. When your connection hangs up, the
SIGHUP goes to the tmux client, which dies, as clients are allowed to do. The
server, the shells, and the migration never receive anything. They were never wired to your
terminal in the first place.
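The difference between the default action and an ignored SIGHUP is easy to see without involving a terminal at all. The following is a small experiment, not a model of what sshd or tmux do internally: it starts two sleep processes, one with default handling and one that ignores the signal before exec (the same move nohup makes), and sends both a SIGHUP by hand.

```shell
# One child with default SIGHUP handling, one that ignores it.
sleep 30 &
victim=$!
( trap '' HUP; exec sleep 30 ) &     # ignore SIGHUP, then become sleep
survivor=$!

sleep 1                              # give the subshell time to set the trap
kill -HUP "$victim" "$survivor"

victim_status=0
wait "$victim" || victim_status=$?   # a status above 128 means "killed by a signal"
echo "default handling: exit status $victim_status"

if kill -0 "$survivor" 2>/dev/null; then
    survivor_state=alive
else
    survivor_state=dead
fi
echo "SIGHUP ignored:   $survivor_state"

kill "$survivor"                     # clean up with SIGTERM
wait "$survivor" 2>/dev/null || true
```

On bash and dash the first line typically reports 129, which is 128 plus SIGHUP's signal number 1, and the second reports alive. An ignored signal stays ignored across exec, which is why the second sleep never notices the hang-up.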
Pitfalls
watch intervals that hammer something. watch -n1 on a local
command is free. The same interval wrapped around curl against a rate-limited
API, an expensive database query, or a kubectl call against a busy control
plane is a tiny denial-of-service you run against yourself — and it scales with the number
of engineers who paste the same line during the same incident. Match the interval to the
cost: -n1 for local state, -n5 or -n10 for anything
that crosses the network or grinds a database, and ask whether the thing you are polling
has a cheaper read (a metrics endpoint, a local count) before you poll the expensive one.
The two times drift apart. Because time is a shell keyword,
time -v ./job does not do what it looks like: the keyword takes no
-v, and depending on your shell you get an error or a surprise. You have to
name the binary — /usr/bin/time -v or command time -v — to get the
verbose report. It cuts the other way too: the keyword can time a whole pipeline
(time grep x log | sort | uniq -c) where the binary times only the first
command. And the flags differ by system: -v is GNU; on macOS and the BSDs the
equivalent detail flag is -l. Scripts that assume one or the other break
quietly when they travel.
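A script that has to run on both families can pick the flag at run time. The sketch below rests on one assumption that is worth checking on your own systems: that GNU time identifies itself with the word "GNU" when asked for --version, and that other implementations do not. The helper name time_detail_flag is hypothetical.

```shell
# time_detail_flag [TIME_BIN]
# Print the detailed-report flag for the given time binary: -v if it
# identifies itself as GNU (assumption: via --version), -l otherwise.
time_detail_flag() {
    local bin="${1:-/usr/bin/time}"
    [ -x "$bin" ] || { echo "no time binary at $bin" >&2; return 1; }
    if "$bin" --version 2>&1 | grep -qi 'gnu'; then
        echo "-v"
    else
        echo "-l"
    fi
}
```

Usage would look like flag=$(time_detail_flag) && /usr/bin/time "$flag" ./job. Naming the binary by full path sidesteps the shell keyword entirely, and the explicit failure when the binary is missing avoids silently guessing a flag.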
tmux scrollback is not your terminal's scrollback. The first week of tmux
includes the moment you scroll your terminal up and see the wrong, garbled history. Output
inside tmux lives in tmux's own buffer per pane, and you read it with copy mode:
C-b [, arrows or PageUp to move, q to leave. Your terminal
emulator's scrollbar shows whatever leaked out around tmux, which is noise. Learn
C-b [ early and the confusion never starts.
tmux inside tmux. SSH from inside a tmux pane to a server and start tmux
there, and you now have two of them stacked, both listening for C-b. The outer
one always wins, so the inner one seems to ignore you. The escape hatch: press
C-b twice — the outer tmux swallows the first and passes a literal
C-b inward, so C-b C-b d detaches the inner session.
Better, notice the double status bar at the bottom of the screen before you start typing
commands at the wrong layer; that stacked pair of green bars is the tell.
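If you would rather have the shell warn you than rely on spotting the bars, two environment variables give a usable hint. tmux sets TMUX inside its sessions, and it sets TERM to a screen or tmux variant, which SSH passes along to the remote side. The function below is a heuristic sketch with a hypothetical name; TERM=screen can also mean GNU screen.

```shell
# tmux_nesting_hint: guess whether starting tmux here would nest it.
# Heuristic only: a screen* TERM can also come from GNU screen.
tmux_nesting_hint() {
    if [ -n "${TMUX:-}" ]; then
        echo "inside tmux on this host"
    else
        case "${TERM:-}" in
            screen*|tmux*) echo "probably inside tmux on another host" ;;
            *)             echo "no tmux detected" ;;
        esac
    fi
}
```

Calling it from a shell prompt hook or before tmux new in a login script is one way to use it. The TERM branch is the one that covers the SSH case, since TMUX itself is not normally forwarded to the remote shell.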
nohup is not a supervisor. Surviving SIGHUP is all nohup does.
If the job crashes, runs out of disk, or needs a restart, nothing notices and nothing
restarts it. For a one-off backfill that is fine; for anything that should stay
running across reboots and failures, the right tool is a real service manager, which is
where the next page in this series picks up.
A drill you can run right now
Everything below is safe on any machine: it watches a clock, times two harmless commands, and creates one tmux session you delete at the end. Ten minutes, and all three tools move from "read about once" to "have actually done."
Step 1 — see -d earn its keep. Run watch -n1 -d date. The
seconds digits light up every refresh; once a minute the minute digits join them; the rest
of the line stays quiet. That highlight is exactly what you will rely on when the output is
a 40-row table instead of one line — your eye goes where the change is. Press
Ctrl-C to leave (newer versions of watch also quit on q), then try watch -n1 -d 'ls -l /tmp | tail -5'
and touch a file in /tmp from another terminal to see a real change flash.
Step 2 — two times, two stories. Time a pure wait and a pure computation and read the difference:
    $ time sleep 2

    real    0m2.004s
    user    0m0.001s   <- two seconds passed; almost none were spent computing
    sys     0m0.002s

    $ head -c 200M /dev/urandom > /tmp/drill.bin && time gzip -k /tmp/drill.bin

    real    0m4.918s
    user    0m4.711s   <- nearly all of real: the CPU did this, end to end
    sys     0m0.198s

    $ rm /tmp/drill.bin /tmp/drill.bin.gz
sleep is the curl pattern in miniature — real full, user empty, a process that
waited for a living. gzip on random bytes (which barely compress, so it works
hard) is the opposite shape. If you have GNU time installed, run the gzip line again under
/usr/bin/time -v and find the maximum resident set size: you just measured a
program's peak memory with zero setup.
Step 3 — create, detach, kill the connection, reattach. The full survival loop, on your own machine:
    $ tmux new -s drill
    # inside the session:
    $ watch -n1 -d date
    # leave it running, then press C-b d to detach
    # back in your plain shell: now close this terminal window entirely. open a new one.
    $ tmux ls
    drill: 1 windows (created Mon Jun  8 10:14:02 2026)
    $ tmux attach -t drill
    # the clock is still ticking. it never stopped.
    # C-b % to split a pane, C-b arrow to move between them, then clean up:
    $ tmux kill-session -t drill
The moment that matters is the middle one: you closed the terminal — the same event as a
dropped SSH connection, as far as the kernel is concerned — and the session shrugged. The
watch command never stopped repainting; you simply were not there to see it. That shrug is
what you are buying every time you type tmux new -s before a long job, and
having watched it happen once on a throwaway clock, you will trust it at 2am with a
migration.
watch -n1 -d 'cmd' to make
any command a dashboard. time cmd and compare real against user + sys to learn
whether it computes or waits. tmux new -s name before anything long on a
remote box, C-b d to leave, tmux attach -t name to come back.
Further reading
- time(1), the GNU time manual page: the full list of what -v reports, including the resource fields this page only sampled.
- watch(1): short by man-page standards; the -g, -e, and precision-interval notes are worth the five minutes.
- The tmux wiki, Getting started: the upstream introduction, with the client/server model drawn properly and the default key table in full.
- credentials(7): sessions, process groups, and controlling terminals from the kernel's point of view, the machinery behind the SIGHUP cascade.