watch, time & tmux

Three small tools, one theme: staying in control of a terminal over time. watch turns any command into a live dashboard so you can see a queue drain instead of re-running the same line forty times. time answers "is this slow because it computes, or because it waits" in ten seconds, before you reach for a profiler. And tmux keeps a six-hour migration alive when the VPN drops at hour four. None of them is glamorous. All of them are the difference between a calm incident and a frantic one. This page covers the flags worth knowing and the SIGHUP mechanics underneath, then ends with a drill you can run anywhere.

Why these three travel together

Most pages in this codex cover one tool. This one covers three, because they solve three faces of the same problem: a terminal is a single moment in time, and real operational work is not. You run a command and get one snapshot; the system keeps moving. You start a script and it is slow; the prompt gives you no hint why. You start a migration over SSH and your connection is now a load-bearing part of the migration, which is a terrible place for a Wi-Fi link to be.

watch fixes the first: it re-runs a command on an interval and repaints the screen, turning a one-shot snapshot into a dashboard. time fixes the second: it wraps a command and reports where the wall-clock seconds went, which sorts "slow" into two very different problems before you spend an hour on the wrong one. tmux fixes the third: it moves your shell sessions into a server that does not care whether you are connected, so the work and the connection stop sharing a fate. Together they are the survival kit for any long session on a remote box, and unlike most survival kits, you will use this one weekly.

watch: any command becomes a dashboard

The shape is simple: watch -n1 -d 'command' runs the command every second, clears the screen, prints the fresh output, and highlights what changed since the previous run. -n sets the interval in seconds (the default is 2), and -d turns on the change highlighting. That second flag is the underrated one. Without it you are staring at a wall of repainting text trying to spot movement; with it the moving parts light up and the static parts fade into background. Watching a 40-row table for the one number that ticks is exactly the kind of task humans are bad at and -d is built for.

$ watch -n1 -d 'wc -l /var/spool/outbox/queue'
Every 1.0s: wc -l /var/spool/outbox/queue          worker-3: Sat Jun  6 21:14:02 2026

18204 /var/spool/outbox/queue
       ^^^^^ the changed digits are highlighted each refresh — if they stop
       changing, the queue stopped draining, and you saw it the second it happened

The pattern generalises to anything that prints state. A queue draining: watch -n1 -d 'wc -l queue'. A file growing during a long export: watch -n1 -d 'ls -lh /backups/dump.sql'. Connections settling after you pull a host from a load balancer: watch -n1 -d 'ss -Ht state established | wc -l' (the -H drops the ss header line, which wc would otherwise count as a connection) and you can literally watch the count walk down to zero before you take the box offline. Replication catching up, a directory filling with processed files, the line count of an error log: if a command can print it, watch can animate it. The skill is not the tool; it is the habit of asking "what single command prints the number I care about" and then putting watch -d in front of it.

Three smaller flags earn a mention. -t drops the header line when you want the whole screen for output. -e stops on the first non-zero exit, which makes watch -e a crude alarm: it freezes the moment your health-check command starts failing. And -g exits as soon as the output changes, which turns watch into a blocking wait: watch -g 'ls done.flag 2>/dev/null' sits quietly until a file appears, then returns control to your script or your attention. The redirect sits inside the quotes so that it applies to ls on every refresh rather than to watch itself.
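watch is built for an interactive terminal, so inside a non-interactive script the same "block until the output changes" idea is sometimes written as a plain polling loop. The sketch below is one such stand-in, not part of watch: wait_for_change is a hypothetical helper name, and the temporary directory and done.flag path exist only for the demonstration.

```shell
# wait_for_change INTERVAL 'COMMAND STRING'   (hypothetical helper, not a watch feature)
# Re-runs the command through sh -c every INTERVAL seconds and returns
# as soon as its combined stdout+stderr differs from the first run.
wait_for_change() {
  wfc_interval=$1
  wfc_cmd=$2
  wfc_before=$(sh -c "$wfc_cmd" 2>&1) || true
  while :; do
    sleep "$wfc_interval"
    wfc_after=$(sh -c "$wfc_cmd" 2>&1) || true
    if [ "$wfc_after" != "$wfc_before" ]; then
      return 0
    fi
  done
}

# Demonstration: a background subshell creates the flag after two seconds.
demo_dir=$(mktemp -d)
( sleep 2; touch "$demo_dir/done.flag" ) &
wait_for_change 1 "ls $demo_dir/done.flag 2>/dev/null"
echo "flag appeared"
```

Like watch, the helper takes the command as one quoted string and hands it to sh -c, so pipes and redirects inside the quotes behave the same way.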

The quoting pitfall. Pipes bind to the wrong command if you forget quotes. watch ss -t | wc -l means "run watch ss -t, and pipe watch's screen output into wc" — your shell parses the pipe before watch ever sees it, and you get a meaningless number and no display. What you meant is watch 'ss -t | wc -l': quote the whole pipeline so watch receives it as one string and hands it to sh -c intact. Same story for >, &&, and $VAR (single quotes defer expansion to each refresh, double quotes expand once, before watch starts — both are occasionally what you want, but only one of them is what you meant).
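Because watch passes its argument to sh -c, the single-versus-double quote difference can be checked with sh -c alone, no watch required. A minimal sketch: the QUEUE variable and the retry path are made up for illustration, and the variable is exported so the child shell can see it.

```shell
# watch hands its argument to `sh -c`, so plain sh -c shows the same behaviour.
QUEUE=/var/spool/outbox/queue      # value at the time the command line is written
export QUEUE                       # the child sh only sees exported variables

single='echo $QUEUE'               # stored literally; expanded by sh at each run
double="echo $QUEUE"               # expanded right here, once

QUEUE=/var/spool/outbox/retry      # hypothetical: the value changes later

sh -c "$single"    # prints /var/spool/outbox/retry
sh -c "$double"    # prints /var/spool/outbox/queue
```

The single-quoted form is the one that tracks a changing value on every refresh; the double-quoted form freezes whatever the variable held when you pressed Enter.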

time: the three numbers, read properly

Prefix any command with time and you get three numbers back. Most people read the first and ignore the other two, which throws away the diagnosis. Here is what each one counts. real is wall-clock time: the seconds that passed in the world between start and exit. user is CPU time spent executing the process's own code in user space: your loops, your parsing, your compression. sys is CPU time the kernel spent working on the process's behalf: reads, writes, memory mapping, network calls — the cost of crossing into the kernel and doing privileged work there.

The numbers mean little alone and a great deal in combination. When user dominates, the command spent its life computing in your code: a CPU-bound workload, and the fix lives in the algorithm or the data volume. When sys is high relative to user, the process spent its life asking the kernel to do things: millions of tiny reads, a stat call per file in a huge tree, a chatty syscall pattern. That is the signature that says run strace next and see which call it is making ten thousand times. And when real far exceeds user + sys, the process was not running at all for most of its life. It was waiting: on disk, on the network, on a lock, on a database at the other end of a connection. No CPU profiler will find that time, because it was not spent on a CPU. That third pattern is arguably the most diagnostic single line in Linux: one cheap command, and you know whether you are hunting a computation or a wait.

Two contrasting runs make the patterns concrete. First a compression job, the classic CPU-bound case:

$ time gzip -k big.log

real    0m8.219s
user    0m7.984s   <- almost all of real. the CPU never stopped chewing.
sys     0m0.211s   <- a little kernel work to read and write the file

# user ≈ real: CPU-bound. a faster algorithm or fewer bytes is the only fix.

Now a report fetched from a slow internal API. Same wall-clock ballpark, completely different story:

$ time curl -s https://api.internal/report -o report.json

real    0m9.412s
user    0m0.054s   <- fifty milliseconds of actual work
sys     0m0.082s   <- and barely any kernel time either

# real ≫ user + sys: 9.3 of 9.4 seconds were spent waiting on the network.
# nothing on this machine is slow. the problem lives on the other end.

Nine seconds either way, and the two commands need opposite responses. The gzip run wants a look at what is being compressed and whether it must be; the curl run wants a look at the API server, not at this box at all. Ten seconds of time just saved you from profiling the wrong machine.

The same wall-clock second has three different price tags. Read which segment fills the bar before deciding what kind of problem you have.

One wrinkle worth knowing on multi-core machines: user and sys count CPU time summed across all cores, so a parallel job can report more CPU time than wall time. make -j8 finishing in 30 seconds of real with 3 minutes of user is not a measurement error; it is eight cores each contributing their share. The waiting diagnosis still works the same way — it is real exceeding the sum that signals a wait, and a sum exceeding real that signals parallelism.
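The reading rules above can be written down as a few comparisons. The sketch below is only an illustration: classify is a hypothetical helper, and the cut-offs (1.2 times real for parallelism, half of real for waiting) are arbitrary assumptions chosen to make the three examples on this page fall out cleanly, not established thresholds.

```shell
# classify REAL USER SYS   (seconds, as reported by time)
# Hypothetical helper; the thresholds are illustrative assumptions.
classify() {
  awk -v real="$1" -v user="$2" -v sys="$3" 'BEGIN {
    cpu = user + sys
    if (cpu > real * 1.2)       print "parallel"        # CPU time summed over cores exceeds wall time
    else if (cpu < real * 0.5)  print "waiting"         # most of the wall clock was spent off-CPU
    else if (sys > user)        print "syscall-heavy"   # kernel work dominates: strace is the next step
    else                        print "cpu-bound"       # user time dominates: look at the algorithm
  }'
}

classify 8.219 7.984 0.211    # the gzip run: prints cpu-bound
classify 9.412 0.054 0.082    # the curl run: prints waiting
classify 30 180 5             # a make -j8 style run: prints parallel
```

The order of the checks matters: parallelism is tested first because a sum larger than real makes the other ratios meaningless.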

There are also two different times on your system, and they answer different questions. The word time at a bash or zsh prompt is a shell keyword, not a program: it prints the three numbers and nothing else, and because it is built into the shell's grammar it can time an entire pipeline. The binary at /usr/bin/time is a separate program with separate talents, and its -v flag is the one that earns its keep:

$ /usr/bin/time -v ./import.py 2>&1 | grep -E 'Maximum resident|page faults|wall clock'
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:42.18
        Maximum resident set size (kbytes): 6291184   <- the script peaked at ~6 GB
        Major (requiring I/O) page faults: 48121      <- memory came back from disk —
        Minor (reclaiming a frame) page faults: 902214     the box was swapping or cold

Maximum resident set size is the headline: peak memory for the whole run, captured for free, no instrumentation. When someone asks "how much RAM does the import need," this is how you answer with a number instead of a shrug. Major page faults are the supporting witness — each one is a memory access that had to wait for the disk, and tens of thousands of them mean the process's working set did not fit, which quietly converts a CPU problem into an I/O problem. For live, whole-system views of the same pressure, top & htop are the companion tools; for a single command after the fact, /usr/bin/time -v is hard to beat.
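To turn that headline line into a number you can paste into a ticket, a small filter is enough. This is a sketch assuming GNU time's -v format, which reports the value in kilobytes; peak_rss_gib is a hypothetical helper name.

```shell
# peak_rss_gib: read `/usr/bin/time -v` output on stdin, print peak memory in GiB.
# Hypothetical helper; assumes the GNU time report format shown above.
peak_rss_gib() {
  awk -F': ' '/Maximum resident set size/ { printf "%.1f\n", $2 / 1048576 }'
}

# Fed with the sample line from the report above:
echo '        Maximum resident set size (kbytes): 6291184' | peak_rss_gib   # prints 6.0
```

Against a real run it would be used as /usr/bin/time -v ./import.py 2>&1 >/dev/null | peak_rss_gib, which sends the report (written to stderr) into the pipe and discards the script's own stdout.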

tmux: sessions that survive you

The mental model first, because it explains everything else. When you run tmux, you start (or talk to) a server: a background process that owns your shell sessions from then on. Your terminal becomes a client — a window onto sessions that live in the server. Close the window, lose the SSH connection, put the laptop to sleep: the client dies, the server does not notice, and everything running inside it keeps running. Attaching later is just pointing a new window at the same session. The work and the connection no longer share a fate, which is the entire point.

tmux has a few hundred commands. Five of them carry the daily load:

Command	What it does	When
`tmux new -s migration`	Start a session with a name you will recognise later	Before anything long-running on a remote box
`C-b d`	Detach: leave the session running, return to your plain shell	End of your shift; the work continues without you
`tmux attach -t migration`	Reattach to a running session by name	Next morning, next coffee, after the VPN recovers
`C-b %` and `C-b "`	Split the window into panes, side by side or stacked	Command in one pane, watch dashboard in the other
`tmux ls`	List sessions on this machine's server	"What did I leave running here?" — ask it on every box you SSH into

C-b is the prefix key: press Ctrl-b, release, then press the command key. It is the doorbell that tells tmux "this keystroke is for you, not for the shell inside."

The incident workflow that justifies learning all this takes one paragraph. You SSH into the box. First command: tmux new -s migration. Inside it you start the schema migration that will take six hours, split a pane, and put watch -n5 -d 'psql -c "select count(*) from new_table"' next to it so progress is visible at a glance. At hour two you detach with C-b d and go home. On the train, the VPN drops; nothing happens, because nothing of yours depended on it. Next morning you run tmux attach -t migration and the session is exactly as you left it: same panes, same scrollback, migration at 80%. The alternative timeline, where the migration ran in a bare SSH session, ends at hour two with a dead connection, a half-applied migration, and a very careful audit of which statements committed.

$ tmux new -s migration          # inside: start the long job, split panes, arrange the view
# … hours pass … C-b d to detach …
$ tmux ls
migration: 1 windows (created Sat Jun  6 14:02:11 2026)
$ tmux attach -t migration     # and you are back, mid-scroll, as if you never left
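The same layout can also be built without touching the keyboard shortcuts, using tmux's command forms: new-session -d creates a detached session, split-window -h is the command behind C-b %, and send-keys types into a pane. The wrapper below is a sketch; start_dashboard is a hypothetical name, and the commands you pass in (a migration script, a watch line) are your own.

```shell
# start_dashboard SESSION 'JOB COMMAND' 'DASHBOARD COMMAND'
# Hypothetical wrapper: one window, two side-by-side panes, job on the left.
start_dashboard() {
  sd_session=$1
  sd_job=$2
  sd_dash=$3
  tmux new-session -d -s "$sd_session" || return 1   # -d: create without attaching
  tmux send-keys -t "$sd_session" "$sd_job" Enter    # typed into the first pane
  tmux split-window -h -t "$sd_session"              # new pane becomes the active one
  tmux send-keys -t "$sd_session" "$sd_dash" Enter   # typed into the new pane
}

# Example call (both commands are placeholders):
#   start_dashboard migration './migrate.sh' 'watch -n5 -d date'
#   tmux attach -t migration
```

Because the session is created detached, the wrapper can run from a script or a one-line ssh command, and you attach only when you want to look.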

One more trick that pays for itself the first time you use it: shared sessions. Two people SSH into the same box as the same user and both run tmux attach -t migration. They now see the same panes and the same cursor, live. For pair-debugging a production issue, this beats screen-sharing over a video call by a comfortable margin — both people can type, the latency is whatever SSH's latency is, and the session itself is the shared artifact. It is also the dignified way to hand an incident over at shift change: the next person attaches to your session and inherits your exact view of the problem, scrollback included.

The lightweight alternative: nohup, &, disown

Not every long job needs a whole session. If you only need a command to survive your departure (no reattaching, no interaction, just "keep running and write a log"), the old tools are lighter. nohup ./backfill.sh & starts the job in the background with SIGHUP ignored, so the disconnect that would normally kill it gets shrugged off; output lands in nohup.out unless you redirect it. If the job is already running in the background and you only now realise you need to leave, disown rescues it after the fact: in bash, run disown -h %1 and the shell stops forwarding SIGHUP to that job when it exits. If the job is holding the foreground, press Ctrl-Z and run bg first so the shell lists it as a background job. The decision rule is simple. Will you ever want to look at this job's screen again, type into it, or show it to a colleague? tmux. Is it fire-and-forget with a log file? nohup is enough, and one fewer moving part. Either way the mechanism being defeated is the same signal, which the section on the controlling terminal and SIGHUP explains further down.
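A fire-and-forget launch usually wants three things: an explicit log file instead of nohup.out, stdin detached, and the PID written down. The sketch below shows that shape with a stand-in job; the sh -c one-liner takes the place of ./backfill.sh, and the temporary log path is just for the demonstration.

```shell
log=$(mktemp)

# Stand-in for ./backfill.sh: sleeps briefly, then reports.
nohup sh -c 'sleep 1; echo "backfill finished"' >"$log" 2>&1 </dev/null &
job_pid=$!
echo "started job $job_pid, logging to $log"

# In real use you would log out here. The demo waits so the log can be shown.
wait "$job_pid"
cat "$log"
```

Redirecting all three streams means the job holds no reference to the terminal at all, so nothing about it depends on the session it was started from.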

Three production scenarios

Watching a deploy converge

A rolling deploy is out and you want to see it land, not poll it by hand. Two panes. In the first, watch -n2 -d 'kubectl get pods -n payments': the -d highlighting makes each pod's status flicker as it walks from ContainerCreating to Running, and a pod stuck in CrashLoopBackOff stops flickering, which your eye catches immediately. In the second, on the node being drained, watch -n1 -d 'ss -Ht state established | wc -l' counts live connections walking down as the load balancer stops sending traffic (-H suppresses the ss header so it is not counted as a connection). When the number reaches the long-lived stragglers and stays flat, you know what is left and can decide whether to wait them out or cut them off. Neither pane is clever. Both replace a human re-running commands with a dashboard that cost ten seconds to build.

CPU or waiting, before the profiler

A nightly report script used to take 4 minutes and now takes 19, and the first instinct is to attach a profiler. Resist it for ten seconds: time ./report.sh first. If it comes back real 19m, user 18m, fine — it really is computing, and a CPU profiler will show you where. But if it comes back real 19m, user 0m40s, sys 0m12s, a CPU profiler will show you a program that is nearly idle, because 18 of those 19 minutes were spent waiting — on a database, a filesystem, an API, a lock. The follow-up tools differ too: high sys says strace to see the syscall storm, big waits say look at what it talks to. The triage costs one run of the script with five extra keystrokes, and it routinely saves the hour you would have spent profiling the wrong layer.

The six-hour migration and the sleeping laptop

The full kit at once. A data migration must copy 21 million rows tonight, and you are driving it from your laptop, over a VPN, onto a database server you reach through a bastion. Bare SSH would turn every link in that chain (laptop lid, Wi-Fi, VPN, bastion) into a single point of failure for a six-hour job. Instead: SSH to the server, tmux new -s migration, start the copy, split a pane for watch -n5 -d on the row count, detach, close the laptop. Every link in the chain can now fail without consequence, because the job runs in the tmux server on the database host, attached to nothing you carry. Reattach from home, from the office, from your phone if the night goes badly. When it finishes, the session holds the final output and the scrollback of what the migration printed (up to tmux's history-limit, 2000 lines per pane by default), waiting for you to read it. That beats reconstructing what happened from logs after a connection drop took the evidence with it.

Underneath: the controlling terminal and SIGHUP

Why does a dropped connection kill your processes in the first place? Nothing about a running program inherently depends on your SSH session. The link is a kernel-level arrangement called the controlling terminal. When sshd accepts your connection, it allocates a pseudo-terminal (a pty) and starts your shell as a session leader attached to it. Every process you then start belongs to that session, and the foreground ones are wired to that terminal for input, output, and — this is the load-bearing part — signals.

When the connection dies, sshd closes its side of the pty. The kernel sees the terminal hang up — the name is literal, inherited from modems — and sends SIGHUP to the session leader, your shell. The shell, before it dies, forwards SIGHUP to its jobs. The default action for SIGHUP is termination, so your migration dies not because anything went wrong with it but because it was wired to a terminal that no longer exists. The full signal taxonomy lives in kill & signals, and the session/process-group machinery is part of the deeper anatomy in processes.

The cascade follows the controlling terminal. tmux survives because its server detached from yours: it sits in its own session with ptys it owns, so the hangup on your pty has no route to the work.

Every survival trick on this page is a different way of breaking one link in that chain. nohup tells the process to ignore SIGHUP when it arrives. disown tells the shell not to forward it. And tmux removes the dependency entirely: the server detaches from your terminal at startup (the setsid() call, if you want the name) and becomes its own session leader with its own ptys — one per pane, with the server on the master side. Your shell-inside-tmux has a controlling terminal, but it is one the tmux server owns, not the one your SSH connection owns. When your connection hangs up, the SIGHUP goes to the tmux client, which dies, as clients are allowed to do. The server, the shells, and the migration never receive anything. They were never wired to your terminal in the first place.
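The first link in that chain, the default action for SIGHUP versus a process that ignores it, can be observed directly by sending the signal by hand. A small sketch, assuming a shell that was not itself started with SIGHUP ignored; the two sleep commands are stand-ins for real jobs.

```shell
sleep 10 &                                     # default SIGHUP action: terminate
plain=$!
nohup sleep 10 >/dev/null 2>&1 </dev/null &    # nohup sets SIGHUP to ignored, then runs sleep
shielded=$!

sleep 1                                        # give nohup time to set up before signalling
kill -HUP "$plain" "$shielded"

plain_status=0
wait "$plain" || plain_status=$?               # 128 + 1 (SIGHUP) = 129

sleep 1
if kill -0 "$shielded" 2>/dev/null; then       # kill -0 only tests whether the PID exists
  shielded_state=alive
else
  shielded_state=dead
fi
kill "$shielded" 2>/dev/null || true           # clean up the survivor with SIGTERM

echo "plain sleep exit status: $plain_status"
echo "nohup sleep after SIGHUP: $shielded_state"
```

The exit status of 129 is the shell's way of saying "killed by signal 1", which is exactly what a migration in a bare SSH session reports when the connection drops.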

Pitfalls

watch intervals that hammer something. watch -n1 on a local command is free. The same interval wrapped around curl against a rate-limited API, an expensive database query, or a kubectl call against a busy control plane is a tiny denial-of-service you run against yourself — and it scales with the number of engineers who paste the same line during the same incident. Match the interval to the cost: -n1 for local state, -n5 or -n10 for anything that crosses the network or grinds a database, and ask whether the thing you are polling has a cheaper read (a metrics endpoint, a local count) before you poll the expensive one.

The two times drift apart. Because time is a shell keyword, time -v ./job does not do what it looks like: the keyword takes no -v, and depending on your shell you get an error or a surprise. You have to name the binary — /usr/bin/time -v or command time -v — to get the verbose report. It cuts the other way too: the keyword can time a whole pipeline (time grep x log | sort | uniq -c) where the binary times only the first command. And the flags differ by system: -v is GNU; on macOS and the BSDs the equivalent detail flag is -l. Scripts that assume one or the other break quietly when they travel.

tmux scrollback is not your terminal's scrollback. The first week of tmux includes the moment you scroll your terminal up and see the wrong, garbled history. Output inside tmux lives in tmux's own buffer per pane, and you read it with copy mode: C-b [, arrows or PageUp to move, q to leave. Your terminal emulator's scrollbar shows whatever leaked out around tmux, which is noise. Learn C-b [ early and the confusion never starts.

tmux inside tmux. SSH from inside a tmux pane to a server and start tmux there, and you now have two of them stacked, both listening for C-b. The outer one always wins, so the inner one seems to ignore you. The escape hatch: press C-b twice — the outer tmux swallows the first and passes a literal C-b inward, so C-b C-b d detaches the inner session. Better, notice the double status bar at the bottom of the screen before you start typing commands at the wrong layer; that stacked pair of green bars is the tell.
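A shell can often guess that it is already under tmux before you start a second one. tmux sets the TMUX variable in the shells it starts, but that variable does not normally cross an SSH hop; TERM usually does, and inside tmux it is typically a screen or tmux variant. The check below is a heuristic sketch built on those two observations: maybe_nested is a hypothetical name, and a customised TERM can fool it in either direction.

```shell
# maybe_nested: succeed if this shell looks like it is running under tmux,
# locally (TMUX is set) or across SSH (TERM inherited from a tmux pane).
# Heuristic only; hypothetical helper name.
maybe_nested() {
  if [ -n "${TMUX:-}" ]; then
    return 0
  fi
  case "${TERM:-}" in
    screen*|tmux*) return 0 ;;
  esac
  return 1
}

if maybe_nested; then
  echo "looks like tmux is already above this shell: expect to need C-b C-b"
else
  echo "no outer tmux detected"
fi
```

Dropped into a login script on servers you reach through tmux, a warning like this catches the stacked-prefix confusion before the first keystroke goes to the wrong layer.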

nohup is not a supervisor. Surviving SIGHUP is all nohup does. If the job crashes, runs out of disk, or needs a restart, nothing notices and nothing restarts it. For a one-off backfill that is fine; for anything that should stay running across reboots and failures, the right tool is a real service manager, which is where the next page in this series picks up.

A drill you can run right now

Everything below is safe on any machine: it watches a clock, times two harmless commands, and creates one tmux session you delete at the end. Ten minutes, and all three tools move from "read about once" to "have actually done."

Step 1: see -d earn its keep. Run watch -n1 -d date. The seconds digits light up every refresh; once a minute the minute digits join them; the rest of the line stays quiet. That highlight is exactly what you will rely on when the output is a 40-row table instead of one line: your eye goes where the change is. Press Ctrl-C to leave (recent procps-ng versions of watch also accept q), then try watch -n1 -d 'ls -l /tmp | tail -5' and touch a file in /tmp from another terminal to see a real change flash.

Step 2 — two times, two stories. Time a pure wait and a pure computation and read the difference:

$ time sleep 2
real    0m2.004s
user    0m0.001s     <- two seconds passed; almost none were spent computing
sys     0m0.002s

$ head -c 200M /dev/urandom > /tmp/drill.bin && time gzip -k /tmp/drill.bin
real    0m4.918s
user    0m4.711s     <- nearly all of real: the CPU did this, end to end
sys     0m0.198s

$ rm /tmp/drill.bin /tmp/drill.bin.gz

sleep is the curl pattern in miniature — real full, user empty, a process that waited for a living. gzip on random bytes (which barely compress, so it works hard) is the opposite shape. If you have GNU time installed, run the gzip line again under /usr/bin/time -v and find the maximum resident set size: you just measured a program's peak memory with zero setup.

Step 3 — create, detach, kill the connection, reattach. The full survival loop, on your own machine:

$ tmux new -s drill
# inside the session:
$ watch -n1 -d date          # leave it running, then press C-b d to detach
# back in your plain shell — now close this terminal window entirely. open a new one.
$ tmux ls
drill: 1 windows (created Mon Jun  8 10:14:02 2026)
$ tmux attach -t drill       # the clock is still ticking. it never stopped.
# C-b % to split a pane, C-b arrow to move between them, then clean up:
$ tmux kill-session -t drill

The moment that matters is the middle one: you closed the terminal — the same event as a dropped SSH connection, as far as the kernel is concerned — and the session shrugged. The watch command never stopped repainting; you simply were not there to see it. That shrug is what you are buying every time you type tmux new -s before a long job, and having watched it happen once on a throwaway clock, you will trust it at 2am with a migration.

If you remember one line of each. watch -n1 -d 'cmd' to make any command a dashboard. time cmd and compare real against user + sys to learn whether it computes or waits. tmux new -s name before anything long on a remote box, C-b d to leave, tmux attach -t name to come back.

Further reading

26 — systemctl