systemctl
A deploy went out and the service never came back. A daemon has been restarting itself
every eleven seconds for a week and nobody noticed. You raised the open-files limit in
limits.conf and the service still dies at 1024. All of these end at the same
prompt: systemctl, the command that talks to systemd, the process that starts
and supervises every service on a modern Linux box. This page covers the five invocations
that do the daily work, reads a full status block line by line, walks three
production incidents, looks at what a unit actually is on disk, and ends with a drill you
can run anywhere without changing a thing.
The question it answers
The question is "how do I manage — and read the true state of — the services on this box?"
Note the second half. ps can tell you a process exists. It cannot tell you
whether that process is the supervised main process of a service, whether it has crashed
and been resurrected forty times since lunch, whether it will come back after a reboot, or
which config file it was actually started from. systemctl can, because it asks
the one process that knows: systemd, running as PID 1, the ancestor of everything else
on the machine and the supervisor of every service on it.
systemd's model fits in sixty seconds. Everything it manages is a unit,
named by a suffix that says what kind of thing it is: app.service is a daemon,
app.socket is a listening socket that can start the daemon on first connection,
app.timer is a scheduled trigger, data.mount is a filesystem,
multi-user.target is a grouping point. Each unit is described by a small
ini-style text file. Units refer to each other with directives like Wants=,
Requires=, and After=, and those references form a
dependency graph. A target is a unit with no process of
its own; it exists purely as a named place in that graph, a synchronisation point that other
units attach themselves to. Booting the machine is nothing more exotic than systemd picking
the default target and walking the graph until everything that target pulls in is up.
systemctl is the client side of all this. It does not start processes itself;
it sends requests to PID 1 over the system bus and prints what PID 1 reports back.
That distinction matters more than it sounds. When systemctl status says a
service is running, that is not a guess assembled from a process listing — it is the
supervisor's own ledger: the main PID it forked, the control group it placed the processes
in, the timestamps of every state change, the result of the last start attempt. Reading
that ledger properly is most of the skill, which is why a third of this page is one
status block read line by line.
One piece of vocabulary before the commands: a unit has two mostly independent states. The
active state says whether it is running right now. The enabled state says
whether it is wired to start at boot. A service can be running but disabled, stopped but
enabled, and every other combination. Half the confusion people have with
systemctl dissolves once those two axes come apart, so we will keep returning
to them.
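The two axes can be queried separately with systemctl is-active and systemctl is-enabled. As a minimal sketch, assuming a helper named describe_axes (hypothetical, not part of systemd) and handling only the most common answers, the two results can be folded into one sentence:

```shell
# describe_axes ACTIVE ENABLED
# Hypothetical helper: combines the output of `systemctl is-active UNIT`
# and `systemctl is-enabled UNIT` into one line. Only the most common
# values are translated; anything else is passed through unchanged.
describe_axes() {
    local now boot
    case "$1" in
        active)   now="running now" ;;
        inactive) now="stopped now" ;;
        failed)   now="failed" ;;
        *)        now="active state: $1" ;;
    esac
    case "$2" in
        enabled)  boot="wired to start at boot" ;;
        disabled) boot="not wired to start at boot" ;;
        masked)   boot="blocked from starting at all" ;;
        *)        boot="enabled state: $2" ;;
    esac
    printf '%s, %s\n' "$now" "$boot"
}

# On a systemd box (app is a placeholder unit name):
#   describe_axes "$(systemctl is-active app)" "$(systemctl is-enabled app)"
```

Keeping the two questions as two separate calls is deliberate: neither answer can be derived from the other.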
The five invocations that matter
systemctl accepts somewhere north of a hundred verbs. Five invocations cover
nearly all the daily work, and each one carries a distinction that people get wrong for
years before someone spells it out.
| Invocation | What it does | The distinction people miss |
|---|---|---|
| systemctl status app | The supervisor's ledger: load state, active state, main PID, the process tree, recent log lines | It is ten lines of dense signal that almost everyone skims. Decoded fully below. |
| systemctl restart app / systemctl reload app | Restart kills and re-launches; reload asks the running process to re-read config | Reload keeps connections alive but only works if the unit defines it; reload-or-restart picks for you |
| systemctl enable --now app | Enable wires the unit into boot; start runs it now; --now does both | Enable does not start anything. Start does not survive a reboot. The eternal confusion. |
| sudo systemctl edit app | Opens a drop-in override file so you can change settings without touching the vendor unit | The vendor file is not yours to edit; drop-ins merge over it and survive package upgrades |
| systemctl list-units --failed / systemctl cat app | Every unit in the failed state; the full assembled text of one unit with all its drop-ins | The pair that starts every investigation: what is broken, and what config is it really running with |
Restart versus reload is a choice about disruption. restart
stops the service and starts it again: systemd sends the main process SIGTERM, waits up to
TimeoutStopSec, escalates to SIGKILL if it has to, then runs the start sequence
fresh. Every connection the service held is gone. reload instead runs whatever
the unit's ExecReload= line says, which for most daemons is a SIGHUP to the
main process — nginx re-reads its config and re-opens its logs without dropping a single
connection. The signal mechanics behind both, and why TERM-then-KILL is the polite order,
are covered in kill & signals. The
catch is that reload only exists if the unit defines it, so when you are scripting and do
not know the unit intimately, systemctl reload-or-restart app does the right
thing: reload if the unit supports it, restart if not.
Enable versus start is the two-axes point from above made operational.
start changes the active state: the service runs now, and a reboot forgets it
ever did. enable changes the enabled state: it creates a symlink in a target's
.wants/ directory so the unit gets pulled in at boot, and changes nothing about
the present moment. The classic failure is enabling a freshly installed service, seeing no
errors, and filing the ticket as done while the service sits there stopped — or the mirror
image, starting a service during an incident and discovering three months later, after a
kernel patch and a reboot, that nobody ever enabled it. enable --now exists
precisely so you stop choosing.
systemctl edit deserves its own sentence because the
alternative is so tempting. The vendor's unit file lives under
/usr/lib/systemd/system/, and the package manager owns it: edit it directly
and your change is silently overwritten at the next upgrade. systemctl edit app
opens an editor on /etc/systemd/system/app.service.d/override.conf, a
drop-in that merges over the vendor file setting by setting. You write only the
lines you want to change, the vendor file stays pristine, the upgrade keeps working, and
systemctl cat app shows the merged result with each fragment's path printed
above it so the next engineer can see exactly what was overridden and where. As a bonus,
edit runs the daemon-reload step for you, which manual editing
does not — a pitfall with its own entry below.
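Because systemctl cat prints each contributing file's path in a comment line above its contents, the list of layers can be pulled out mechanically. A small sketch, assuming a helper named fragment_paths (hypothetical) that reads that output on stdin:

```shell
# fragment_paths
# Reads `systemctl cat UNIT` output on stdin and prints the path of every
# file that contributed, in the order they were printed.
# Caveat: an ordinary comment inside a unit file that happens to start
# with "# /" would also match; this is a sketch, not a parser.
fragment_paths() {
    sed -n 's|^# \(/.*\)$|\1|p'
}

# On a systemd box (app is a placeholder unit name):
#   systemctl cat app | fragment_paths
```

One path in the output means the vendor file runs unmodified; anything more means somebody layered a change on top.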
Reading the status block
Here is the full output for an nginx that has been up for six days, with a drop-in applied. These are the ten-or-so lines everyone scrolls past on the way to the log tail, and nearly every line answers a question you would otherwise run another command for.
    $ systemctl status nginx.service
    ● nginx.service - A high performance web server and a reverse proxy server
         Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; preset: enabled)
        Drop-In: /etc/systemd/system/nginx.service.d
                 └─override.conf
         Active: active (running) since Mon 2026-06-01 04:11:32 UTC; 6 days ago
           Docs: man:nginx(8)
        Process: 1287 ExecStartPre=/usr/sbin/nginx -t -q (code=exited, status=0/SUCCESS)
       Main PID: 1290 (nginx)
          Tasks: 3 (limit: 4582)
         Memory: 18.4M
            CPU: 2min 41.020s
         CGroup: /system.slice/nginx.service
                 ├─1290 "nginx: master process /usr/sbin/nginx"
                 ├─1291 "nginx: worker process"
                 └─1292 "nginx: worker process"

    Jun 01 04:11:32 web-1 systemd[1]: Starting A high performance web server...
    Jun 01 04:11:32 web-1 systemd[1]: Started A high performance web server and a reverse proxy server.
The dot is the fastest signal on the page: green for active, white for
inactive, red for failed. Loaded tells you three things in one line. The
path says which copy of the unit file systemd parsed — /usr/lib/... means the
vendor's, /etc/... means an admin replaced it wholesale. enabled
is the boot axis: this unit will start at boot. And preset: enabled is the
distro's default policy for this unit, the answer to "should this be enabled on a fresh
install" — when the two disagree, a human made a deliberate choice at some point, which is
occasionally exactly the archaeology you need. Drop-In lists every override
fragment merged over the vendor file; if behaviour does not match the unit file you are
reading, the explanation is almost always in this list.
Active carries the state and, just as usefully, the timestamp.
active (running) with "6 days ago" is a healthy daemon.
active (running) with "8 seconds ago" on a service nobody touched is a
restart loop wearing a green dot. activating means the start sequence has not
finished — common for services with a slow ExecStartPre, and also what
Type=notify services show while systemd waits for their ready signal.
activating (auto-restart) is the brief hold between a crash and the next
attempt. failed means the last attempt is over and systemd has given up; the
reason is one journalctl away.
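The "green dot on a restart loop" case can also be caught without reading timestamps, because systemd keeps a restart counter per service. A sketch, assuming a systemd recent enough to expose the NRestarts property through systemctl show, and a helper named flap_check (hypothetical) with an arbitrary default threshold:

```shell
# flap_check [THRESHOLD]
# Reads KEY=VALUE lines (as printed by `systemctl show`) on stdin and flags
# a unit that is active yet has been auto-restarted more than THRESHOLD
# times. THRESHOLD defaults to 3, an arbitrary choice for this sketch.
flap_check() {
    local threshold=${1:-3} state="" restarts=0 key value
    while IFS='=' read -r key value; do
        case "$key" in
            ActiveState) state=$value ;;
            NRestarts)   restarts=$value ;;
        esac
    done
    if [ "$state" = "active" ] && [ "$restarts" -gt "$threshold" ]; then
        echo "suspicious: active, but restarted $restarts times"
    else
        echo "ok: state=$state restarts=$restarts"
    fi
}

# On a systemd box (app is a placeholder unit name):
#   systemctl show app -p ActiveState -p NRestarts | flap_check
```

The counter is a hint rather than a verdict: a high number on a long-lived box may be old history, so the Active timestamp still deserves a look.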
Main PID is the process whose exit decides the unit's fate. When it dies,
the unit's state changes and any Restart= policy fires. The
CGroup tree below it is the part worth slowing down for: it lists every
process the unit owns, not just the main one. systemd places each service in its own
control group at start, and children stay in that cgroup no matter how they fork, which is
how the supervisor can account for them (the Tasks, Memory, and CPU lines are read straight
from cgroup accounting) and kill the whole tree cleanly at stop. A worker that double-forked
itself into the background twenty minutes ago still shows up here. How cgroups do that
bookkeeping, and how the same mechanism turns into CPU and memory limits, is the subject of
nice, ionice & cgroups.
Finally, the journal tail: the last few log lines for the unit, newest at
the bottom. It is a teaser, not the record. The full view — every line the service and
systemd ever wrote about it, filterable by boot and by time — is
journalctl -u nginx, and reading it well is its own page:
journalctl & dmesg.
Three production scenarios
The service that flaps
Alerts fire, then resolve, then fire again. systemctl status shows a green dot
— and "active (running) since … 9s ago". The service is crashing, and
Restart=always is hiding the body: systemd waits RestartSec
(100 ms by default) after each death and starts it again, so any health check that
samples between crashes sees a running process. The status timestamp is the tell, and the
journal is the proof:
    $ journalctl -u app.service --since "15 min ago" | tail -8
    Jun 08 09:14:02 web-1 app[8841]: panic: connection pool exhausted
    Jun 08 09:14:02 web-1 systemd[1]: app.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
    Jun 08 09:14:02 web-1 systemd[1]: app.service: Failed with result 'exit-code'.
    Jun 08 09:14:02 web-1 systemd[1]: app.service: Scheduled restart job, restart counter is at 47.
    Jun 08 09:14:02 web-1 systemd[1]: Started app.service.
    Jun 08 09:14:13 web-1 app[8854]: panic: connection pool exhausted
    Jun 08 09:14:13 web-1 systemd[1]: app.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
    Jun 08 09:14:13 web-1 systemd[1]: app.service: Scheduled restart job, restart counter is at 48.
The restart counter climbing in lockstep with the same panic is the whole diagnosis. The fix
is whatever the panic says it is; Restart= only decides how loudly the failure
announces itself. There is a rate limiter on top: by default a unit that tries to start more
than five times within ten seconds (StartLimitBurst=5 inside
StartLimitIntervalSec=10) trips the limit, lands in failed with
start-limit-hit, and stays down until systemctl reset-failed app
clears the counter. That limiter confuses people in the other direction too: a service that
crash-looped fast enough trips it, and then even a correct fix appears not to work because
systemctl start keeps refusing until you reset. If your restarts are spaced out
by a longer RestartSec, the default burst window never fills, and the loop can
run for days — which is why "since 9 seconds ago" on the status line deserves a reflexive
second look every single time.
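Why an eleven-second crash loop never trips a five-in-ten-seconds limiter is easier to see with the arithmetic written out. The following is a simplified fixed-window model, not systemd's actual implementation; the function name and the convention of passing start times as whole seconds are assumptions made for the sketch:

```shell
# start_limit_trips BURST INTERVAL_SEC T1 T2 ...
# Simplified model of a fixed-window start limiter: a window opens at the
# first start, at most BURST starts are allowed inside it, and a start that
# arrives more than INTERVAL_SEC after the window opened begins a new one.
# T1, T2, ... are start times in whole seconds, in ascending order.
start_limit_trips() {
    local burst=$1 interval=$2
    shift 2
    local begin=-1 count=0 t
    for t in "$@"; do
        if [ "$begin" -lt 0 ] || [ $((t - begin)) -gt "$interval" ]; then
            begin=$t          # open a new window
            count=1
        elif [ "$count" -lt "$burst" ]; then
            count=$((count + 1))
        else
            echo "tripped at t=$t"
            return 0
        fi
    done
    echo "never tripped"
}

# A crash every second hits the default limit on the sixth start:
#   start_limit_trips 5 10 0 1 2 3 4 5
# A crash every eleven seconds never does:
#   start_limit_trips 5 10 0 11 22 33 44 55 66
```

In this model every start in the slow loop opens a fresh window, so the count never rises above one, which matches the behaviour described above.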
"I changed limits.conf and nothing happened"
The service dies with "too many open files", so you raise nofile in
/etc/security/limits.conf, restart, and it dies again at exactly the same
count. Nothing is broken; you edited the wrong layer. limits.conf is applied by
PAM during login sessions. A systemd service never logs in — PID 1 starts it directly,
and PID 1 sets its limits from the unit's own directives. The fix is a drop-in:
    $ sudo systemctl edit app.service
    # the editor opens override.conf; you write only this:
    [Service]
    LimitNOFILE=65536

    $ sudo systemctl restart app.service
    $ systemctl show app.service -p LimitNOFILE
    LimitNOFILE=65536
    $ cat /proc/$(systemctl show app -p MainPID --value)/limits | grep files
    Max open files            65536                65536                files
Two verification steps, on purpose. systemctl show -p LimitNOFILE confirms
what systemd intends to apply; reading /proc/PID/limits confirms what
the running process actually got, which catches the case where you forgot to
restart after editing. The whole layered story of where limits come from — kernel defaults,
PAM, the shell's ulimit, and systemd's directives, and which one wins for which
kind of process — is laid out in
ulimit & limits.
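The two verification steps can be folded into one comparison. A sketch, assuming helpers named soft_nofile and check_nofile (both hypothetical) that read the contents of /proc/PID/limits on stdin:

```shell
# soft_nofile
# Reads the contents of /proc/PID/limits on stdin and prints the soft
# "Max open files" value (the fourth whitespace-separated field).
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# check_nofile INTENDED
# Compares the soft limit read from stdin with the value the unit asked for.
# A single LimitNOFILE= value sets both soft and hard limit, so comparing
# against the soft column is enough for that case.
check_nofile() {
    local want=$1 got
    got=$(soft_nofile)
    if [ "$got" = "$want" ]; then
        echo "ok: process has $got"
    else
        echo "mismatch: unit says $want, process has ${got:-nothing}"
    fi
}

# On a systemd box (app is a placeholder unit name):
#   pid=$(systemctl show app -p MainPID --value)
#   check_nofile "$(systemctl show app -p LimitNOFILE --value)" < "/proc/$pid/limits"
```

A mismatch usually means the service was not restarted after the drop-in was written.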
Works by hand, dies on boot
You can systemctl start app at any time of day and it comes up clean. Reboot
the box and it is dead every time. There are two families of cause, and
journalctl -b -u app (this boot only, this unit only) tells you which one you
have.
Family one: ordering. At boot, systemd starts everything it can in
parallel, and your service raced something it needs — the database, name resolution, a
mounted volume — and lost. By the time you test by hand, the dependency has long been up,
so the bug only exists in the first few seconds after boot. The fix is to declare the
ordering: After=postgresql.service makes systemd sequence the start,
and the subtlety is that After= alone does not pull the other unit in.
It only sequences units that are starting anyway. You want the pair:
Wants=postgresql.service plus After=postgresql.service — pull it
in, and order behind it. The network variant has its own trap:
network.target means roughly "network management has started", not "the box has
an address". A service that binds a specific address at boot wants
Wants=network-online.target and After=network-online.target
instead.
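An After= with no matching Wants= or Requires= is easy to spot mechanically. The following is a rough lint, assuming a helper named after_without_pull (hypothetical) fed with systemctl cat output. It will also list units that are legitimately ordered-only, such as targets that are always part of boot, so treat its output as questions rather than findings:

```shell
# after_without_pull
# Reads unit-file text on stdin (for example `systemctl cat UNIT` output)
# and prints every unit named in After= that is not also named in Wants=
# or Requires=. Heuristic only: it ignores sections, line continuations,
# and dependencies declared from the other side (WantedBy= and friends).
after_without_pull() {
    awk -F= '
        /^After=/            { n = split($2, a, " "); for (i = 1; i <= n; i++) after[a[i]] = 1 }
        /^(Wants|Requires)=/ { n = split($2, p, " "); for (i = 1; i <= n; i++) pulled[p[i]] = 1 }
        END                  { for (u in after) if (!(u in pulled)) print u }
    ' | sort
}

# On a systemd box (app is a placeholder unit name):
#   systemctl cat app | after_without_pull
```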
Family two: environment. This family bites when "works by hand" meant running
the binary straight from your shell rather than through systemctl start. A process launched
from a shell quietly inherits your world: your PATH with every directory your
dotfiles added to it, your HOME, the environment variables your dotfiles export,
your current directory. A unit started by PID 1 gets almost none of that: a fixed default
PATH, no login environment, and a working directory of / unless
WorkingDirectory= says otherwise. A service that shells out to a binary in a
directory only your profile puts on PATH, or reads a config via a relative path,
or expects AWS_PROFILE from somebody's bashrc, works by hand and dies on boot.
The fix is to make the unit self-contained: absolute paths in Exec*= lines,
WorkingDirectory= set explicitly, variables declared with
Environment= or EnvironmentFile=. The discipline generalises:
a unit file should describe everything the service needs, because the unit file is all it
gets.
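One way to reproduce this family without rebooting is to run the command in a stripped environment. The following is a rough approximation, not what systemd does internally; the helper name run_like_unit and the PATH value are assumptions, and the real manager environment can be compared with systemctl show-environment:

```shell
# run_like_unit CMD [ARGS...]
# Runs CMD from / with an empty environment apart from a deliberately
# minimal PATH, as a rough stand-in for what a system service sees.
# The PATH below is an assumption made for this sketch.
run_like_unit() {
    (
        cd / || exit 1
        env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin "$@"
    )
}

# Example: does the program still find its config and its helpers?
#   run_like_unit /usr/bin/env
#   run_like_unit /bin/sh -c 'echo "${AWS_PROFILE:-unset}"; pwd'
```

A command that fails here but works in your interactive shell depends on something the unit file never declared.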
What is underneath
A unit is a text file, and where the file lives decides who wins. Vendor units installed by
packages live in /usr/lib/systemd/system/. Runtime units generated on the fly
live in /run/systemd/system/. Admin units live in
/etc/systemd/system/, and /etc beats /run beats
/usr/lib: a file with the same name higher in that order replaces the lower one
entirely. Drop-ins are the gentler mechanism — fragments in
app.service.d/*.conf merge over whichever full file won, one setting at a time.
This is why systemctl cat app is the only honest way to read a unit's config:
it prints the winning file plus every drop-in, each with its path, in the order systemd
applied them. Reading the vendor file with a pager shows you what the package author wrote,
which may share only a passing resemblance with what is running.
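The precedence rule is simple enough to express as a lookup. A sketch covering only the three directories named above (the real search path has a few more entries); winning_unit_file and its ROOT argument are hypothetical, the latter added so the function can be tried against a scratch tree:

```shell
# winning_unit_file ROOT UNIT
# Prints the first copy of UNIT found when the three directories are
# searched in precedence order: /etc, then /run, then /usr/lib.
# ROOT is a prefix for experimenting; pass "" to look at the real filesystem.
# Returns non-zero if no copy exists in any of the three.
winning_unit_file() {
    local root=$1 unit=$2 dir
    for dir in etc/systemd/system run/systemd/system usr/lib/systemd/system; do
        if [ -e "$root/$dir/$unit" ]; then
            printf '%s\n' "$root/$dir/$unit"
            return 0
        fi
    done
    return 1
}

# Example (ssh.service is a placeholder unit name):
#   winning_unit_file "" ssh.service
```

On a live system, systemctl cat remains the authority, since it also accounts for drop-ins, which this lookup ignores.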
The directives inside those files build the graph, and the graph has two kinds of edges
that people persistently blur. Wants= and Requires= are
dependency edges: starting this unit pulls that one into the transaction.
(Wants= shrugs if the dependency fails; Requires= takes the
dependent unit down with it, which is usually more drama than you want — prefer
Wants= unless the service is genuinely meaningless without the other.)
After= and Before= are ordering edges: they say nothing
about what starts, only about sequence among things that are starting anyway. Dependency
without ordering means both units start, in a race. Ordering without dependency means a
clean sequence — around a unit that may never have been asked to start at all. Most real
relationships want one edge of each kind, which is exactly what the boot-time scenario
above came down to.
Targets are how the graph gets its shape. multi-user.target is the modern
descendant of runlevel 3 (a fully booted, non-graphical server) and
graphical.target of runlevel 5, but unlike runlevels they are ordinary units:
you can define your own, group services under it, and bring whole slices of the system up
or down together. Boot is just systemd computing the transaction for the default target and
executing it with maximal parallelism — and everything before that moment, from firmware to
the instant PID 1 first reads a unit file, is walkable step by step in the
Linux boot simulator.
Two more unit types earn their keep in daily work. Timers are systemd's
answer to cron: an app-backup.timer with an OnCalendar= schedule
activates a matching app-backup.service. What you buy over a crontab line is
everything around the job: its output lands in the journal instead of mail nobody reads,
Persistent=true runs a missed schedule after the box was down,
systemctl list-timers shows every schedule with its last and next run in one
table, and the job inherits the service machinery — resource limits, dependencies, its own
cgroup. Cron is still the right tool plenty of the time, and the syntax for both lives in
the cron cheat sheet.
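For a sense of how little a timer needs, here is a sketch that writes the service and timer pair into a directory of your choosing. The unit names follow the text; the ExecStart path, the daily schedule, and the helper name write_backup_timer are placeholders invented for the example:

```shell
# write_backup_timer DIR
# Writes app-backup.service and app-backup.timer into DIR.
# /usr/local/bin/app-backup is a placeholder for whatever the job runs.
write_backup_timer() {
    local dir=$1
    cat > "$dir/app-backup.service" <<'EOF'
[Unit]
Description=Back up app data

[Service]
Type=oneshot
ExecStart=/usr/local/bin/app-backup
EOF
    cat > "$dir/app-backup.timer" <<'EOF'
[Unit]
Description=Run app-backup daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# As root on a systemd box:
#   write_backup_timer /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now app-backup.timer
```

Note that it is the timer that gets enabled, not the service: the timer is what boot pulls in, and it activates the service on schedule.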
And underneath all of it: systemd can supervise honestly because of cgroups. Classic init
scripts tracked services by writing a PID to a file and hoping; a daemon that forked twice
was an orphan no supervisor could account for. systemd puts every unit in its own control
group, and the kernel guarantees children cannot leave it. That single property is what
makes the CGroup tree in status trustworthy, what makes
systemctl stop able to kill an entire process tree without leaking workers,
and what the Tasks, Memory, and CPU accounting lines are read from.
Pitfalls
Editing a unit file and forgetting daemon-reload. systemd parses unit files
into memory and works from the parsed copy. Edit a file on disk and nothing changes until
systemctl daemon-reload tells PID 1 to re-read everything — restarting
the service is not enough, because the restart uses the stale in-memory unit. Recent
systemd versions print a warning when status notices the file changed on disk,
but only if you happen to run status and happen to read it.
systemctl edit reloads for you; vim does not.
Confusing mask with disable. disable removes the boot-time
symlinks, and that is all it does: the unit can still be started by hand, and — the part
that surprises people — still gets pulled in by any other unit that lists it in
Wants= or Requires=, or by socket activation.
mask is the stronger statement: it symlinks the unit name to
/dev/null, so nothing can start it, not an admin, not a dependency, not a
socket. Mask is the right tool when something keeps resurrecting a service you want dead;
it is the wrong tool to reach for casually, because six months later somebody will spend an
afternoon discovering why a perfectly healthy unit refuses to start with
"Unit app.service is masked."
Trusting reload to have done something. systemctl reload
succeeds when the ExecReload= command succeeds — and for most units that
command is "send SIGHUP to the main PID", which succeeds the moment the signal is
delivered, whether or not the daemon does anything with it. A daemon that ignores SIGHUP,
or one whose reload handler re-reads some config files but not the one you changed, gives
you a green exit code and unchanged behaviour. If the new config absolutely must be live,
verify behaviour after a reload, or restart and pay the disruption for certainty.
Editing vendor files instead of drop-ins. The change works, survives every
test you run, and disappears the next time the package manager upgrades the package and
rewrites its file under /usr/lib. Worse, it disappears silently, weeks from
now, on whichever box upgraded first. Overrides belong in /etc: drop-ins via
systemctl edit for changing a few settings, or a full copy via
systemctl edit --full when you genuinely need to replace the whole file. Either
way, systemctl cat will show the next engineer the layers.
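Of the pitfalls above, a mask is the easiest to confirm on disk, because it is nothing more than a symlink to /dev/null. A sketch, assuming a helper named is_masked (hypothetical) that takes the path of a unit file:

```shell
# is_masked PATH
# Succeeds if PATH is a symlink pointing at /dev/null, which is how a
# masked unit appears on disk.
is_masked() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "/dev/null" ]
}

# Example (app.service is a placeholder unit name):
#   is_masked /etc/systemd/system/app.service && echo "masked"
```

systemctl is-enabled reports the same condition by printing masked, so the helper is mainly useful when inspecting a filesystem that is not currently booted.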
A drill you can run right now
Everything below reads state without changing it — the one edit step is
deliberately abandoned, and systemd discards an empty override. Ten minutes on any Linux
box, virtual machine, or container with systemd as PID 1.
Step 1 — read a real status block slowly. Pick a unit that exists
everywhere: systemctl status systemd-journald.service, or ssh /
sshd if the box runs one. Read every line against the anatomy above: the dot,
the Loaded path and enabled state, the Active timestamp (how long has this actually been
up?), the Main PID, the cgroup tree. Then run ps -fp on the main PID and
notice that ps gives you one process while the cgroup tree gave you all of
them, with the supervisor's view of which one matters.
Step 2 — cat versus edit. Run systemctl cat ssh.service and
look at the file path printed in the comment on the first line: that is the copy systemd is
actually using. Then run sudo systemctl edit ssh.service, look at the empty
override buffer and the commented-out copy of the unit below it, and quit without writing
anything. systemd announces it discarded the empty file. You have now seen the override
mechanism end to end without changing your machine.
    $ systemctl cat ssh.service | head -4
    # /usr/lib/systemd/system/ssh.service
    [Unit]
    Description=OpenBSD Secure Shell server
    Documentation=man:sshd(8) man:sshd_config(5)

    $ sudo systemctl edit ssh.service
    (quit the editor without saving)
    Editing "/etc/systemd/system/ssh.service.d/override.conf" cancelled: temporary file is empty.

    $ systemctl list-units --failed
      UNIT LOAD ACTIVE SUB DESCRIPTION
    0 loaded units listed.

    $ systemctl list-timers --all | head -5
    NEXT                        LEFT         LAST                        PASSED  UNIT            ACTIVATES
    Mon 2026-06-08 12:32:00 UTC 2h 4min left Sun 2026-06-07 12:32:00 UTC 21h ago apt-daily.timer apt-daily.service
    Tue 2026-06-09 00:00:00 UTC 13h left     Mon 2026-06-08 00:00:04 UTC 10h ago logrotate.timer logrotate.service
    Tue 2026-06-09 06:14:22 UTC 19h left     Mon 2026-06-08 06:14:22 UTC 4h ago  fstrim.timer    fstrim.service
Step 3 — the failure inventory. systemctl list-units --failed
is the first command worth typing on any box you have just been handed. Zero rows is the
happy case. Any rows: systemctl status the unit, read the Active line for
when it failed and the result for how, then journalctl -u it
for the why. A failed unit costs nothing while it sits there, which is exactly why boxes
accumulate them unnoticed.
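For scripting the inventory, the table can be reduced to bare unit names. A sketch, assuming the --plain and --no-legend options of list-units are used to drop decorations and headers, and a helper named failed_unit_names (hypothetical) that tolerates a leading status dot in case they are not:

```shell
# failed_unit_names
# Reads `systemctl list-units --failed` rows on stdin and prints one unit
# name per line. Skips blank lines and a leading "●" marker if present.
# Header and footer lines are not filtered, so pass --no-legend upstream.
failed_unit_names() {
    awk 'NF { if ($1 == "●") print $2; else print $1 }'
}

# On a systemd box:
#   systemctl list-units --failed --plain --no-legend | failed_unit_names
```

Piping the result into a loop that runs systemctl status on each name gives a quick first pass over an inherited box.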
Step 4 — the schedules. systemctl list-timers --all shows
every timer with its next and last activation. Find one that ran recently and pull its log
with journalctl -u logrotate.service --since yesterday — scheduled work whose
output is captured, timestamped, and queryable, no mail spool involved. Compare that with
hunting down whichever crontab a predecessor left on this box, and the case for timers
mostly makes itself.
In short: systemctl status unit, and actually
read all of it; systemctl cat unit for the config that is really in effect;
systemctl list-units --failed on any box you have just inherited; and after
editing any unit file by hand, systemctl daemon-reload before you conclude
anything.
Further reading
- systemctl(1) — the manual page — long, but the "Unit Commands" section maps cleanly onto the five invocations above.
- systemd.unit(5) — the unit-file format, the search-path precedence rules, and the precise semantics of Wants, Requires, After, and Before.
- systemd for Administrators — Lennart Poettering — the original blog series from the project's author; dated in places, still the best narrative of why the design is the way it is.
- systemd.special(7) — what every standard target actually means, including the network-online.target story from the boot scenario.