systemctl

A deploy went out and the service never came back. A daemon has been restarting itself every eleven seconds for a week and nobody noticed. You raised the open-files limit in limits.conf and the service still dies at 1024. All of these end at the same prompt: systemctl, the command that talks to systemd, the process that starts and supervises every service on a modern Linux box. This page covers the five invocations that do the daily work, reads a full status block line by line, walks three production incidents, looks at what a unit actually is on disk, and ends with a drill you can run anywhere without changing a thing.

The question it answers

The question is "how do I manage — and read the true state of — the services on this box?" Note the second half. ps can tell you a process exists. It cannot tell you whether that process is the supervised main process of a service, whether it has crashed and been resurrected forty times since lunch, whether it will come back after a reboot, or which config file it was actually started from. systemctl can, because it asks the one process that knows: systemd, running as PID 1, the ancestor of everything else on the machine and the supervisor of every service on it.

systemd's model fits in sixty seconds. Everything it manages is a unit, named by a suffix that says what kind of thing it is: app.service is a daemon, app.socket is a listening socket that can start the daemon on first connection, app.timer is a scheduled trigger, data.mount is a filesystem, multi-user.target is a grouping point. Each unit is described by a small ini-style text file. Units refer to each other with directives like Wants=, Requires=, and After=, and those references form a dependency graph. A target is a unit with no process of its own; it exists purely as a named place in that graph, a synchronisation point that other units attach themselves to. Booting the machine is nothing more exotic than systemd picking the default target and walking the graph until everything that target pulls in is up.

systemctl is the client side of all this. It does not start processes itself; it sends requests to PID 1 over the system bus and prints what PID 1 reports back. That distinction matters more than it sounds. When systemctl status says a service is running, that is not a guess assembled from a process listing — it is the supervisor's own ledger: the main PID it forked, the control group it placed the processes in, the timestamps of every state change, the result of the last start attempt. Reading that ledger properly is most of the skill, which is why a third of this page is one status block read line by line.

One piece of vocabulary before the commands: a unit has two mostly independent states. The active state says whether it is running right now. The enabled state says whether it is wired to start at boot. A service can be running but disabled, stopped but enabled, and every other combination. Half the confusion people have with systemctl dissolves once those two axes come apart, so we will keep returning to them.

The five invocations that matter

systemctl accepts dozens of verbs. Five invocations cover nearly all the daily work, and each one carries a distinction that people get wrong for years before someone spells it out.

Invocation	What it does	The distinction people miss
`systemctl status app`	The supervisor's ledger: load state, active state, main PID, the process tree, recent log lines	It is ten lines of dense signal that almost everyone skims. Decoded fully below.
`systemctl restart app` / `systemctl reload app`	Restart kills and re-launches; reload asks the running process to re-read config	Reload keeps connections alive but only works if the unit defines it; `reload-or-restart` picks for you.
`systemctl enable --now app`	Enable wires the unit into boot; start runs it now; `--now` does both	Enable does not start anything. Start does not survive a reboot. The eternal confusion.
`sudo systemctl edit app`	Opens a drop-in override file so you can change settings without touching the vendor unit	The vendor file is not yours to edit; drop-ins merge over it and survive package upgrades
`systemctl list-units --failed` / `systemctl cat app`	Every unit in the failed state; the full assembled text of one unit with all its drop-ins	The pair that starts every investigation: what is broken, and what config is it really running with.

Restart versus reload is a choice about disruption. restart stops the service and starts it again: systemd sends the main process SIGTERM, waits up to TimeoutStopSec, escalates to SIGKILL if it has to, then runs the start sequence fresh. Every connection the service held is gone. reload instead runs whatever the unit's ExecReload= line says, which for most daemons is a SIGHUP to the main process — nginx re-reads its config and re-opens its logs without dropping a single connection. The signal mechanics behind both, and why TERM-then-KILL is the polite order, are covered in kill & signals. The catch is that reload only exists if the unit defines it, so when you are scripting and do not know the unit intimately, systemctl reload-or-restart app does the right thing: reload if the unit supports it, restart if not.
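For a script that wants to know in advance which path reload-or-restart would take, the choice can be sketched as below. This is a minimal illustration, assuming the unit's CanReload property (as printed by systemctl show) reflects whether the unit defines a reload action; the function name pick_verb and the unit name app are made up for the example.

```shell
#!/bin/sh
# pick_verb: map the value of a unit's CanReload property to the gentler verb.
# Pass in "yes" or "no"; anything other than "yes" falls back to restart.
pick_verb() {
    case "$1" in
        yes) echo reload ;;
        *)   echo restart ;;
    esac
}

# Demo on fixed inputs, so the sketch runs without systemd present.
pick_verb yes
pick_verb no

# On a real box (hypothetical unit name "app"):
#   verb=$(pick_verb "$(systemctl show app -p CanReload --value)")
#   sudo systemctl "$verb" app
```

In practice systemctl reload-or-restart makes the same decision for you; the sketch is only useful when a script needs to log or gate on which of the two is about to happen.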

Enable versus start is the two-axes point from above made operational. start changes the active state: the service runs now, and a reboot forgets it ever did. enable changes the enabled state: it creates a symlink in a target's .wants/ directory so the unit gets pulled in at boot, and changes nothing about the present moment. The classic failure is enabling a freshly installed service, seeing no errors, and filing the ticket as done while the service sits there stopped — or the mirror image, starting a service during an incident and discovering three months later, after a kernel patch and a reboot, that nobody ever enabled it. enable --now exists precisely so you stop choosing.
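The two axes can be read separately with systemctl is-active and systemctl is-enabled. The sketch below only combines the two answers into one sentence; describe_axes is a hypothetical helper, and any is-enabled answer other than "enabled" (disabled, static, masked and so on) is simply reported as not enabled.

```shell
#!/bin/sh
# describe_axes: combine the two independent answers into one line.
#   $1 = output of `systemctl is-active UNIT`   (active, inactive, failed, ...)
#   $2 = output of `systemctl is-enabled UNIT`  (enabled, disabled, masked, ...)
describe_axes() {
    case "$1" in
        active) now="running now" ;;
        *)      now="not running now ($1)" ;;
    esac
    case "$2" in
        enabled) boot="enabled for boot" ;;
        *)       boot="not enabled for boot ($2)" ;;
    esac
    echo "$now; $boot"
}

# The two classic mismatches from the paragraph above, on fixed inputs.
describe_axes active disabled    # started during an incident, never enabled
describe_axes inactive enabled   # enabled after install, never started

# On a real box (hypothetical unit name "app"):
#   describe_axes "$(systemctl is-active app)" "$(systemctl is-enabled app)"
```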

systemctl edit deserves its own sentence because the alternative is so tempting. The vendor's unit file lives under /usr/lib/systemd/system/, and the package manager owns it: edit it directly and your change is silently overwritten at the next upgrade. systemctl edit app opens an editor on /etc/systemd/system/app.service.d/override.conf, a drop-in that merges over the vendor file setting by setting. You write only the lines you want to change, the vendor file stays pristine, the upgrade keeps working, and systemctl cat app shows the merged result with each fragment's path printed above it so the next engineer can see exactly what was overridden and where. As a bonus, edit runs the daemon-reload step for you, which manual editing does not — a pitfall with its own entry below.
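The path headers that systemctl cat prints are easy to pull out on their own when all you want is the list of layers. A minimal sketch, assuming each fragment is introduced by a comment line of the form "# /path"; fragment_paths is a made-up name, the canned override content is illustrative, and an ordinary comment inside a unit that happens to start with "# /" would be picked up too.

```shell
#!/bin/sh
# fragment_paths: read `systemctl cat UNIT` output on stdin and print only
# the path headers, i.e. the lines of the form "# /some/path".
fragment_paths() {
    sed -n 's|^# \(/.*\)$|\1|p'
}

# Demo on canned text shaped like systemctl cat output (contents invented).
fragment_paths <<'EOF'
# /usr/lib/systemd/system/nginx.service
[Unit]
Description=A high performance web server and a reverse proxy server

# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=65536
EOF

# On a real box:
#   systemctl cat nginx.service | fragment_paths
```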

Reading the status block

Here is the full output for an nginx that has been up for six days, with a drop-in applied. These are the ten-or-so lines everyone scrolls past on the way to the log tail, and nearly every line answers a question you would otherwise run another command for.

$ systemctl status nginx.service
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─override.conf
     Active: active (running) since Mon 2026-06-01 04:11:32 UTC; 6 days ago
       Docs: man:nginx(8)
    Process: 1287 ExecStartPre=/usr/sbin/nginx -t -q (code=exited, status=0/SUCCESS)
   Main PID: 1290 (nginx)
      Tasks: 3 (limit: 4582)
     Memory: 18.4M
        CPU: 2min 41.020s
     CGroup: /system.slice/nginx.service
             ├─1290 "nginx: master process /usr/sbin/nginx"
             ├─1291 "nginx: worker process"
             └─1292 "nginx: worker process"

Jun 01 04:11:32 web-1 systemd[1]: Starting A high performance web server...
Jun 01 04:11:32 web-1 systemd[1]: Started A high performance web server and a reverse proxy server.

The anatomy of systemctl status. Each region answers a question you would otherwise need a separate command for.

The dot is the fastest signal on the page: green for active, white for inactive, red for failed. Loaded tells you three things in one line. The path says which copy of the unit file systemd parsed — /usr/lib/... means the vendor's, /etc/... means an admin replaced it wholesale. enabled is the boot axis: this unit will start at boot. And preset: enabled is the distro's default policy for this unit, the answer to "should this be enabled on a fresh install" — when the two disagree, a human made a deliberate choice at some point, which is occasionally exactly the archaeology you need. Drop-In lists every override fragment merged over the vendor file; if behaviour does not match the unit file you are reading, the explanation is almost always in this list.

Active carries the state and, just as usefully, the timestamp. active (running) with "6 days ago" is a healthy daemon. active (running) with "8 seconds ago" on a service nobody touched is a restart loop wearing a green dot. activating means the start sequence has not finished — common for services with a slow ExecStartPre, and also what Type=notify services show while systemd waits for their ready signal. activating (auto-restart) is the brief hold between a crash and the next attempt. failed means the last attempt is over and systemd has given up; the reason is one journalctl away.

Main PID is the process whose exit decides the unit's fate. When it dies, the unit's state changes and any Restart= policy fires. The CGroup tree below it is the part worth slowing down for: it lists every process the unit owns, not just the main one. systemd places each service in its own control group at start, and children stay in that cgroup no matter how they fork, which is how the supervisor can account for them (the Tasks, Memory, and CPU lines are read straight from cgroup accounting) and kill the whole tree cleanly at stop. A worker that double-forked itself into the background twenty minutes ago still shows up here. How cgroups do that bookkeeping, and how the same mechanism turns into CPU and memory limits, is the subject of nice, ionice & cgroups.

Finally, the journal tail: the last few log lines for the unit, newest at the bottom. It is a teaser, not the record. The full view — every line the service and systemd ever wrote about it, filterable by boot and by time — is journalctl -u nginx, and reading it well is its own page: journalctl & dmesg.
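The status block is laid out for people. When a script needs the same ledger, systemctl show prints it as KEY=VALUE lines. The sketch below condenses four of those properties into one line; summarize_show is a hypothetical helper, and it assumes a systemd recent enough to expose NRestarts for services.

```shell
#!/bin/sh
# summarize_show: read KEY=VALUE lines (as printed by `systemctl show`) on
# stdin and print a one-line summary of state, main PID and restart count.
summarize_show() {
    awk -F= '
        $1 == "ActiveState" { active = $2 }
        $1 == "SubState"    { substate = $2 }
        $1 == "MainPID"     { pid = $2 }
        $1 == "NRestarts"   { restarts = $2 }
        END { printf "%s (%s) pid=%s restarts=%s\n", active, substate, pid, restarts }
    '
}

# Demo on canned values matching the nginx block above.
summarize_show <<'EOF'
ActiveState=active
SubState=running
MainPID=1290
NRestarts=0
EOF

# On a real box:
#   systemctl show nginx.service -p ActiveState -p SubState -p MainPID -p NRestarts | summarize_show
```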

Three production scenarios

The service that flaps

Alerts fire, then resolve, then fire again. systemctl status shows a green dot — and "active (running) since … 9s ago". The service is crashing, and Restart=always is hiding the body: systemd waits RestartSec (100 ms by default) after each death and starts it again, so any health check that samples between crashes sees a running process. The status timestamp is the tell, and the journal is the proof:

$ journalctl -u app.service --since "15 min ago" | tail -8
Jun 08 09:14:02 web-1 app[8841]: panic: connection pool exhausted
Jun 08 09:14:02 web-1 systemd[1]: app.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 08 09:14:02 web-1 systemd[1]: app.service: Failed with result 'exit-code'.
Jun 08 09:14:02 web-1 systemd[1]: app.service: Scheduled restart job, restart counter is at 47.
Jun 08 09:14:02 web-1 systemd[1]: Started app.service.
Jun 08 09:14:13 web-1 app[8854]: panic: connection pool exhausted
Jun 08 09:14:13 web-1 systemd[1]: app.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 08 09:14:13 web-1 systemd[1]: app.service: Scheduled restart job, restart counter is at 48.

The restart counter climbing in lockstep with the same panic is the whole diagnosis. The fix is whatever the panic says it is; Restart= only decides how loudly the failure announces itself. There is a rate limiter on top: by default a unit that is started more than five times within ten seconds (StartLimitBurst=5 inside StartLimitIntervalSec=10) trips the limit, lands in failed with start-limit-hit, and stays down until systemctl reset-failed app clears the counter. That limiter confuses people in the other direction too: a service that crash-looped fast enough trips it, and then even a correct fix appears not to work because systemctl start keeps refusing until you reset. If the restarts are spaced further apart, either by a longer RestartSec or because the process survives several seconds before each crash (eleven in the log above), the default burst window never fills and the loop can run for days. That is why "since 9 seconds ago" on the status line deserves a reflexive second look every single time.
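The counter in those journal lines can be pulled out mechanically, which is handy for a check that should alert on a loop even while the dot is green. A minimal sketch, assuming the message wording shown in the log excerpt above; last_restart_counter is a made-up name.

```shell
#!/bin/sh
# last_restart_counter: read journal text on stdin and print the most recent
# value from "restart counter is at N." lines. Prints nothing if none match.
last_restart_counter() {
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\..*/\1/p' | tail -n 1
}

# Demo on two lines copied from the excerpt above.
last_restart_counter <<'EOF'
Jun 08 09:14:02 web-1 systemd[1]: app.service: Scheduled restart job, restart counter is at 47.
Jun 08 09:14:13 web-1 systemd[1]: app.service: Scheduled restart job, restart counter is at 48.
EOF

# On a real box:
#   journalctl -u app.service --since "15 min ago" | last_restart_counter
```

The same number is also available without parsing logs, as systemctl show app.service -p NRestarts --value, on systemd versions that expose that property.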

"I changed limits.conf and nothing happened"

The service dies with "too many open files", so you raise nofile in /etc/security/limits.conf, restart, and it dies again at exactly the same count. Nothing is broken; you edited the wrong layer. limits.conf is applied by PAM during login sessions. A systemd service never logs in — PID 1 starts it directly, and PID 1 sets its limits from the unit's own directives. The fix is a drop-in:

$ sudo systemctl edit app.service
# the editor opens override.conf — you write only this:
[Service]
LimitNOFILE=65536

$ sudo systemctl restart app.service
$ systemctl show app.service -p LimitNOFILE
LimitNOFILE=65536
$ cat /proc/$(systemctl show app -p MainPID --value)/limits | grep files
Max open files            65536                65536                files

Two verification steps, on purpose. systemctl show -p LimitNOFILE confirms what systemd intends to apply; reading /proc/PID/limits confirms what the running process actually got, which catches the case where you forgot to restart after editing. The whole layered story of where limits come from — kernel defaults, PAM, the shell's ulimit, and systemd's directives, and which one wins for which kind of process — is laid out in ulimit & limits.
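The second verification step can be wrapped in a small function for scripted checks. A sketch, assuming the column layout of /proc/PID/limits shown above (soft limit in the fourth whitespace-separated field); soft_nofile is a hypothetical name.

```shell
#!/bin/sh
# soft_nofile: read a limits table in /proc/PID/limits format on stdin and
# print the soft "Max open files" value.
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# Demo on the line from the transcript above.
soft_nofile <<'EOF'
Limit                     Soft Limit           Hard Limit           Units
Max open files            65536                65536                files
EOF

# On a real box (hypothetical unit name "app"):
#   pid=$(systemctl show app -p MainPID --value)
#   soft_nofile < "/proc/$pid/limits"
```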

Works by hand, dies on boot

You can systemctl start app at any time of day and it comes up clean. Reboot the box and it is dead every time. There are two families of cause, and journalctl -b -u app (this boot only, this unit only) tells you which one you have.

Family one: ordering. At boot, systemd starts everything it can in parallel, and your service raced something it needs — the database, name resolution, a mounted volume — and lost. By the time you test by hand, the dependency has long been up, so the bug only exists in the first few seconds after boot. The fix is to declare the ordering: After=postgresql.service makes systemd sequence the start, and the subtlety is that After= alone does not pull the other unit in. It only sequences units that are starting anyway. You want the pair: Wants=postgresql.service plus After=postgresql.service — pull it in, and order behind it. The network variant has its own trap: network.target means roughly "network management has started", not "the box has an address". A service that binds a specific address at boot wants Wants=network-online.target and After=network-online.target instead.
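The Wants= plus After= pair described above fits in a four-line drop-in. The sketch below writes one into a scratch directory so nothing on the machine changes; the helper name write_ordering_dropin and the file name 10-ordering.conf are made up, and on a real box the same text would normally go in through sudo systemctl edit app.service.

```shell
#!/bin/sh
# write_ordering_dropin: write a drop-in for app.service under DIR that both
# pulls in and orders behind the unit named in $2. Prints the file's path.
write_ordering_dropin() {
    dir=$1/app.service.d
    mkdir -p "$dir" || return 1
    cat > "$dir/10-ordering.conf" <<EOF
[Unit]
# Dependency edge: starting app.service pulls $2 into the transaction.
Wants=$2
# Ordering edge: app.service's start waits until $2 has finished starting.
After=$2
EOF
    echo "$dir/10-ordering.conf"
}

# Demo in a throwaway directory.
scratch=$(mktemp -d)
cat "$(write_ordering_dropin "$scratch" postgresql.service)"
rm -rf "$scratch"
```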

Family two: environment. This one bites when "by hand" meant running the binary straight from your shell rather than through systemctl start. A process launched from a shell quietly inherits your world: your PATH with ~/bin or a version manager's directory in it, your HOME, the environment variables your dotfiles export, your current directory. A unit started by PID 1 gets almost none of that, whether at boot or through systemctl start: a fixed PATH of system directories, no login environment, and a working directory of / unless WorkingDirectory= says otherwise. A service that shells out to a binary under your home directory, or reads a config via a relative path, or expects AWS_PROFILE from somebody's bashrc, works from the shell and dies under systemd. The fix is to make the unit self-contained: absolute paths in Exec*= lines, WorkingDirectory= set explicitly, variables declared with Environment= or EnvironmentFile=. The discipline generalises: a unit file should describe everything the service needs, because the unit file is all it gets.
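A quick way to flush out environment assumptions before systemd does it for you is to run the command with a scrubbed environment. The sketch below is only an approximation of what a unit sees: run_like_unit and UNIT_PATH are made-up names, and the default PATH is an assumption that should be checked against your distro.

```shell
#!/bin/sh
# run_like_unit: run a command with an empty environment, a fixed system
# PATH, and / as the working directory. The subshell keeps the cd local.
# The PATH default is an assumption; override it by setting UNIT_PATH.
run_like_unit() (
    cd / || exit 1
    exec env -i PATH="${UNIT_PATH:-/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin}" "$@"
)

# Demo: a variable exported in this shell does not reach the command,
# and the working directory is / rather than wherever you happen to be.
FOO=from-my-shell; export FOO
run_like_unit sh -c 'pwd; echo "FOO=${FOO:-unset}"'
```

If the service's start command fails under run_like_unit but works when typed directly, the difference is something the unit file needs to declare.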

What is underneath

A unit is a text file, and where the file lives decides who wins. Vendor units installed by packages live in /usr/lib/systemd/system/. Runtime units generated on the fly live in /run/systemd/system/. Admin units live in /etc/systemd/system/, and /etc beats /run beats /usr/lib: a file with the same name higher in that order replaces the lower one entirely. Drop-ins are the gentler mechanism — fragments in app.service.d/*.conf merge over whichever full file won, one setting at a time. This is why systemctl cat app is the only honest way to read a unit's config: it prints the winning file plus every drop-in, each with its path, in the order systemd applied them. Reading the vendor file with a pager shows you what the package author wrote, which may share only a passing resemblance with what is running.
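The precedence rule can be written down as a three-step search. This is a simplified sketch that checks only the three directories named above (real systemd searches a longer list); winning_unit_file is a hypothetical helper, and the optional second argument lets it run against a scratch tree.

```shell
#!/bin/sh
# winning_unit_file: print the full unit file that wins for NAME, checking
# /etc, then /run, then /usr/lib, under an optional ROOT prefix.
winning_unit_file() {
    name=$1
    root=${2:-}
    for dir in /etc/systemd/system /run/systemd/system /usr/lib/systemd/system; do
        if [ -e "$root$dir/$name" ]; then
            echo "$root$dir/$name"
            return 0
        fi
    done
    return 1
}

# Demo in a throwaway tree: the vendor copy wins until an admin copy appears.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/etc/systemd/system" "$demo_root/usr/lib/systemd/system"
: > "$demo_root/usr/lib/systemd/system/app.service"
winning_unit_file app.service "$demo_root"
: > "$demo_root/etc/systemd/system/app.service"
winning_unit_file app.service "$demo_root"
rm -rf "$demo_root"
```

On a live system there is no need to guess: systemctl show app.service -p FragmentPath --value reports the file systemd actually loaded.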

The directives inside those files build the graph, and the graph has two kinds of edges that people persistently blur. Wants= and Requires= are dependency edges: starting this unit pulls that one into the transaction. (Wants= shrugs if the dependency fails; Requires= takes the dependent unit down with it, which is usually more drama than you want — prefer Wants= unless the service is genuinely meaningless without the other.) After= and Before= are ordering edges: they say nothing about what starts, only about sequence among things that are starting anyway. Dependency without ordering means both units start, in a race. Ordering without dependency means a clean sequence — around a unit that may never have been asked to start at all. Most real relationships want one edge of each kind, which is exactly what the boot-time scenario above came down to.
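The two kinds of edge can be checked for in a unit's text. The sketch below is a rough lint, not a model of systemd's resolver: edge_check is a made-up name, it reads only explicit Wants=, Requires= and After= lines from the text it is given, and it knows nothing about implicit dependencies, .wants/ symlinks, or Before= lines declared on the other unit.

```shell
#!/bin/sh
# edge_check: read unit text on stdin and report which kinds of edge it
# declares toward the unit named in $1.
edge_check() {
    awk -v dep="$1" '
        /^(Wants|Requires)=/ { if (lists(dep, $0)) pulls = 1 }
        /^After=/            { if (lists(dep, $0)) orders = 1 }
        # lists: true if unit appears as a whole word in the directive value.
        function lists(unit, line,    n, i, parts) {
            sub(/^[^=]*=/, "", line)
            n = split(line, parts, " ")
            for (i = 1; i <= n; i++) if (parts[i] == unit) return 1
            return 0
        }
        END {
            if (pulls && orders) print "dependency + ordering"
            else if (pulls)      print "dependency only: both start, in a race"
            else if (orders)     print "ordering only: may never be started"
            else                 print "no edge"
        }
    '
}

# Demo: After= without Wants= is the trap from the boot-time scenario.
printf '[Unit]\nAfter=postgresql.service\n' | edge_check postgresql.service

# On a real box:
#   systemctl cat app.service | edge_check postgresql.service
```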

A small web service in the graph. Solid edges pull units into the transaction; dashed edges only decide sequence. app.service carries both kinds toward its database and the network.

Targets are how the graph gets its shape. multi-user.target is the modern descendant of runlevel 3 (a fully booted, non-graphical server) and graphical.target of runlevel 5, but unlike runlevels they are ordinary units: you can define your own, group services under it, and bring whole slices of the system up or down together. Boot is just systemd computing the transaction for the default target and executing it with maximal parallelism — and everything before that moment, from firmware to the instant PID 1 first reads a unit file, is walkable step by step in the Linux boot simulator.

Two more unit types earn their keep in daily work. Timers are systemd's answer to cron: an app-backup.timer with an OnCalendar= schedule activates a matching app-backup.service. What you buy over a crontab line is everything around the job: its output lands in the journal instead of mail nobody reads, Persistent=true runs a missed schedule after the box was down, systemctl list-timers shows every schedule with its last and next run in one table, and the job inherits the service machinery — resource limits, dependencies, its own cgroup. Cron is still the right tool plenty of the time, and the syntax for both lives in the cron cheat sheet.
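A timer is always a pair of files with matching names. The sketch below writes an illustrative app-backup pair into a directory of your choosing; the schedule, the descriptions, and the /usr/local/bin/app-backup path are all invented for the example, and write_backup_timer is a hypothetical helper.

```shell
#!/bin/sh
# write_backup_timer: write app-backup.service and app-backup.timer into DIR.
write_backup_timer() {
    mkdir -p "$1" || return 1
    cat > "$1/app-backup.service" <<'EOF'
[Unit]
Description=Back up app data

[Service]
# oneshot: the job runs to completion instead of staying resident.
Type=oneshot
ExecStart=/usr/local/bin/app-backup
EOF
    cat > "$1/app-backup.timer" <<'EOF'
[Unit]
Description=Nightly app backup

[Timer]
# Every day at 02:30; Persistent=true catches up after downtime.
OnCalendar=*-*-* 02:30:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Demo in a throwaway directory.
scratch=$(mktemp -d)
write_backup_timer "$scratch"
cat "$scratch/app-backup.timer"
rm -rf "$scratch"
```

On a real box the pair would live in /etc/systemd/system/, followed by systemctl daemon-reload and systemctl enable --now app-backup.timer. Note that it is the timer, not the service, that gets enabled.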

And underneath all of it: systemd can supervise honestly because of cgroups. Classic init scripts tracked services by writing a PID to a file and hoping; a daemon that forked twice was an orphan no supervisor could account for. systemd puts every unit in its own control group, and the kernel puts every forked child in its parent's group, so no amount of forking or re-parenting lets an ordinary service process slip out. That single property is what makes the CGroup tree in status trustworthy, what makes systemctl stop able to kill an entire process tree without leaking workers, and what the Tasks, Memory, and CPU accounting lines are read from.
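The membership list that status draws as a tree is an ordinary file the kernel maintains. A minimal sketch of reading it directly, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup; cgroup_pids is a made-up name, and the optional second argument exists only so the demo can run against a scratch tree.

```shell
#!/bin/sh
# cgroup_pids: print the PIDs in a control group by reading its cgroup.procs
# file. $1 is the group path as systemd reports it (for example
# /system.slice/nginx.service); $2 optionally replaces the mount point.
cgroup_pids() {
    cat "${2:-/sys/fs/cgroup}$1/cgroup.procs"
}

# Demo against a fake tree holding the three PIDs from the status block.
scratch=$(mktemp -d)
mkdir -p "$scratch/system.slice/nginx.service"
printf '1290\n1291\n1292\n' > "$scratch/system.slice/nginx.service/cgroup.procs"
cgroup_pids /system.slice/nginx.service "$scratch"
rm -rf "$scratch"

# On a real box:
#   cgroup_pids "$(systemctl show nginx.service -p ControlGroup --value)"
```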

Pitfalls

Editing a unit file and forgetting daemon-reload. systemd parses unit files into memory and works from the parsed copy. Edit a file on disk and nothing changes until systemctl daemon-reload tells PID 1 to re-read everything — restarting the service is not enough, because the restart uses the stale in-memory unit. Recent systemd versions print a warning when status notices the file changed on disk, but only if you happen to run status and happen to read it. systemctl edit reloads for you; vim does not.
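The same staleness that status warns about is available to scripts as a unit property. A sketch, assuming your systemd exposes NeedDaemonReload through systemctl show; needs_daemon_reload is a hypothetical helper and app.service a placeholder unit name.

```shell
#!/bin/sh
# needs_daemon_reload: succeed (exit status 0) if systemd reports that the
# unit's files changed on disk since PID 1 last parsed them.
needs_daemon_reload() {
    [ "$(systemctl show "$1" -p NeedDaemonReload --value 2>/dev/null)" = "yes" ]
}

# Typical use before a restart in a deploy script (placeholder unit name):
#   if needs_daemon_reload app.service; then
#       sudo systemctl daemon-reload
#   fi
#   sudo systemctl restart app.service
```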

Confusing mask with disable. disable removes the boot-time symlinks, and that is all it does: the unit can still be started by hand, and — the part that surprises people — still gets pulled in by any other unit that lists it in Wants= or Requires=, or by socket activation. mask is the stronger statement: it symlinks the unit name to /dev/null, so nothing can start it, not an admin, not a dependency, not a socket. Mask is the right tool when something keeps resurrecting a service you want dead; it is the wrong tool to reach for casually, because six months later somebody will spend an afternoon discovering why a perfectly healthy unit refuses to start with "Unit app.service is masked."
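Because a mask is nothing more than that symlink, it can be recognised straight from the filesystem. A small sketch, with is_masked_path as a made-up name; it checks one path only, so a runtime mask under /run would need a second call.

```shell
#!/bin/sh
# is_masked_path: succeed if the given path is a symlink pointing at
# /dev/null, which is how a masked unit looks on disk.
is_masked_path() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "/dev/null" ]
}

# Demo in a throwaway directory.
scratch=$(mktemp -d)
ln -s /dev/null "$scratch/app.service"
if is_masked_path "$scratch/app.service"; then
    echo "app.service is masked"
fi
rm -rf "$scratch"

# On a real box:
#   is_masked_path /etc/systemd/system/app.service && echo masked
```

systemctl is-enabled app reports the same condition by printing masked.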

Trusting reload to have done something. systemctl reload succeeds when the ExecReload= command succeeds — and for most units that command is "send SIGHUP to the main PID", which succeeds the moment the signal is delivered, whether or not the daemon does anything with it. A daemon that ignores SIGHUP, or one whose reload handler re-reads some config files but not the one you changed, gives you a green exit code and unchanged behaviour. If the new config absolutely must be live, verify behaviour after a reload, or restart and pay the disruption for certainty.

Editing vendor files instead of drop-ins. The change works, survives every test you run, and disappears the next time the package manager upgrades the package and rewrites its file under /usr/lib. Worse, it disappears silently, weeks from now, on whichever box upgraded first. Overrides belong in /etc: drop-ins via systemctl edit for changing a few settings, or a full copy via systemctl edit --full when you genuinely need to replace the whole file. Either way, systemctl cat will show the next engineer the layers.

A drill you can run right now

Everything below reads state without changing it — the one edit step is deliberately abandoned, and systemd discards an empty override. Ten minutes on any Linux box, virtual machine, or container with systemd as PID 1.

Step 1 — read a real status block slowly. Pick a unit that exists everywhere: systemctl status systemd-journald.service, or ssh / sshd if the box runs one. Read every line against the anatomy above: the dot, the Loaded path and enabled state, the Active timestamp (how long has this actually been up?), the Main PID, the cgroup tree. Then run ps -fp on the main PID and notice that ps gives you one process while the cgroup tree gave you all of them, with the supervisor's view of which one matters.

Step 2 — cat versus edit. Run systemctl cat ssh.service and look at the file path printed in the comment on the first line: that is the copy systemd is actually using. Then run sudo systemctl edit ssh.service, look at the empty override buffer and the commented-out copy of the unit below it, and quit without writing anything. systemd announces it discarded the empty file. You have now seen the override mechanism end to end without changing your machine.

$ systemctl cat ssh.service | head -4
# /usr/lib/systemd/system/ssh.service
[Unit]
Description=OpenBSD Secure Shell server
Documentation=man:sshd(8) man:sshd_config(5)
$ sudo systemctl edit ssh.service
(quit the editor without saving)
Editing "/etc/systemd/system/ssh.service.d/override.conf" cancelled: temporary file is empty.
$ systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.
$ systemctl list-timers --all | head -5
NEXT                        LEFT          LAST                        PASSED   UNIT                      ACTIVATES
Mon 2026-06-08 12:32:00 UTC 2h 4min left  Sun 2026-06-07 12:32:00 UTC 21h ago  apt-daily.timer           apt-daily.service
Tue 2026-06-09 00:00:00 UTC 13h left      Mon 2026-06-08 00:00:04 UTC 10h ago  logrotate.timer           logrotate.service
Tue 2026-06-09 06:14:22 UTC 19h left      Mon 2026-06-08 06:14:22 UTC 4h ago   fstrim.timer              fstrim.service

Step 3 — the failure inventory. systemctl list-units --failed is the first command worth typing on any box you have just been handed. Zero rows is the happy case. Any rows: systemctl status the unit, read the Active line for when it failed and the result for how, then journalctl -u it for the why. A failed unit costs nothing while it sits there, which is exactly why boxes accumulate them unnoticed.
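The status-then-journal routine for each failed unit is easy to loop. The sketch below is read-only and assumes the --plain and --no-legend flags, which strip the bullet, header, and footer so the first field of each row is the unit name; failed_units is a hypothetical helper and the canned rows are invented.

```shell
#!/bin/sh
# failed_units: read `systemctl list-units --failed --plain --no-legend`
# output on stdin and print just the unit names, one per line.
failed_units() {
    awk 'NF { print $1 }'
}

# Demo on canned rows.
failed_units <<'EOF'
app.service    loaded failed failed Example app
backup.service loaded failed failed Nightly backup
EOF

# Live, read-only walk: only runs where systemctl exists.
if command -v systemctl >/dev/null 2>&1; then
    systemctl list-units --failed --plain --no-legend 2>/dev/null | failed_units |
    while read -r unit; do
        systemctl status --no-pager "$unit"
        journalctl -u "$unit" -b --no-pager -n 20
    done
fi
```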

Step 4 — the schedules. systemctl list-timers --all shows every timer with its next and last activation. Find one that ran recently and pull its log with journalctl -u logrotate.service --since yesterday — scheduled work whose output is captured, timestamped, and queryable, no mail spool involved. Compare that with hunting down whichever crontab a predecessor left on this box, and the case for timers mostly makes itself.

If you remember one line. systemctl status unit and actually read all of it; systemctl cat unit for the config that is really in effect; systemctl list-units --failed on any box you have just inherited; and after editing any unit file by hand, systemctl daemon-reload before you conclude anything.
