journalctl & dmesg

A service keeps restarting and its own log file ends mid-sentence. A box rebooted at 3am and nobody asked it to. A process died and left no farewell at all. Your application logs tell you what your code thinks happened; the journal and the kernel ring buffer tell you what the operating system knows happened, including the parts your code never got to see. This page covers the five invocations that do the daily work, decodes a crash-restart cycle and the famous OOM-killer block line by line, walks three production incidents, and ends with a drill that is safe on any machine.

The question they answer

Every debugging session starts with logs, and most engineers stop at the first kind: the application's own output. That works right up until the moment it cannot. When the kernel kills a process with SIGKILL, the process gets no chance to flush a buffer, write a stack trace, or say goodbye; the signal is not catchable, and the log simply stops. When a machine loses power or panics, the application log ends wherever the last flushed write landed. When a disk starts returning errors, the application sees timeouts and retries and reports them as its problem, because it has no idea the hardware underneath is failing. In all three cases the truth was recorded — just not by your code.

Linux keeps two system-level records, and they answer slightly different questions. The journal, kept by systemd-journald, is the userspace record: it captures what every service printed to stdout and stderr, what systemd itself observed about those services starting, stopping, crashing, and being restarted, plus everything sent through the old syslog interface. You read it with journalctl. The kernel ring buffer is the kernel's own record: hardware events, driver messages, filesystem complaints, network interface state changes, and the verdicts of kernel subsystems like the OOM killer. You read it with dmesg, or with journalctl -k, because the journal ingests kernel messages too.

The shift in thinking is small but it changes what you do first during an incident. The application log answers "what was my code doing?" The journal answers "what did the service manager see happen to my process?" The kernel buffer answers "what did the machine itself experience?" A service that "randomly dies" is rarely random once you read the second and third records: there is almost always a line, written by systemd or the kernel, that states exactly who killed it and why. The skill this page teaches is finding that line quickly, and then being able to actually read it, because the most important kernel messages — the OOM block above all — are written in a dialect nobody teaches.

The five invocations that matter

Like most of the systemd surface, journalctl has a long manual and a short working set. Five invocations cover nearly everything you will do with it and with dmesg in a normal month of operations.

Invocation	What it shows	When you reach for it
`journalctl -u nginx --since "1 hour ago"`	One unit's log, time-windowed	The everyday move. Any "what is this service doing" question starts here.
`journalctl -fu nginx`	Live follow of one unit	Watching a deploy land, tailing during an incident — `tail -f` for services
`journalctl -b -1 -e`	The previous boot, jumped to the end	Post-crash forensics: what were the last things written before the machine went down
`journalctl -p err -b`	Everything at priority error or worse, this boot	The triage sweep: one command, every complaint the system considered serious
`journalctl -k` / `dmesg -T`	Kernel messages only, with readable timestamps	Hardware, drivers, filesystems, OOM kills — anything below userspace

A few notes on the grammar, because each flag hides a little more than it shows. The --since and --until filters accept both human phrases ("1 hour ago", yesterday, today) and timestamps ("2026-06-08 03:00"), and combining a unit with a window is the single highest-value habit here: an unfiltered journalctl on a long-lived box will happily page you through months of history. The -e flag jumps the pager to the end, which is where the interesting lines usually are; -n 200 limits output to the last two hundred entries, and --no-pager prints straight to the terminal if you want no pager at all.

The -b selector takes an offset: -b alone means the current boot, -b -1 the one before it, -b -2 the one before that, and journalctl --list-boots prints the full catalogue with start and end times. This is the flag that turns the journal from a log viewer into a forensic instrument, because "what happened before the reboot" is otherwise a genuinely hard question.

Priority filtering with -p uses the eight syslog levels, and the filter means "this level and worse": -p err shows err, crit, alert, and emerg. The levels, from loudest to quietest: emerg (0), alert (1), crit (2), err (3), warning (4), notice (5), info (6), debug (7). In practice -p err is the triage sweep and -p warning is the slightly paranoid version of it.
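The "this level and worse" rule is easy to misremember, so here is a minimal sketch of it in shell. The function name levels_at_or_worse is made up for illustration; the level names and their order are the eight syslog levels listed above.

```shell
# Print the syslog levels that `journalctl -p LEVEL` would include:
# the named level plus everything more severe (lower number).
# levels_at_or_worse is a hypothetical helper, not part of journalctl.
levels_at_or_worse() {
    wanted=$1
    out=""
    for name in emerg alert crit err warning notice info debug; do
        out="${out:+$out }$name"          # append, space-separated
        if [ "$name" = "$wanted" ]; then
            printf '%s\n' "$out"
            return 0
        fi
    done
    echo "unknown level: $wanted" >&2
    return 1
}

levels_at_or_worse err       # emerg alert crit err
levels_at_or_worse warning   # emerg alert crit err warning
```

The list is ordered loudest first, so the answer is always a prefix of it.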

Why dmesg needs -T. By default dmesg prefixes every line with seconds since boot, like [1834502.114866], which is precise and unreadable. dmesg -T converts to wall-clock time. The conversion has a known wrinkle — covered in the pitfalls — but for "when did the disk start complaining," readable beats exact, and journalctl -k sidesteps the issue entirely by stamping kernel messages with real receipt times.

Reading the output

Here is the journal around a service crash, the thing you will actually be staring at when someone says "payments keeps restarting." Run as a user in the systemd-journal group or with sudo; without either you only see your own user's entries.

$ sudo journalctl -u payments --since "12:00" --until "12:06"
Jun 08 12:04:31 web-3 payments[1234]: request completed route=/charge status=200 dur=41ms
Jun 08 12:04:32 web-3 payments[1234]: request completed route=/charge status=200 dur=39ms
Jun 08 12:04:40 web-3 systemd[1]: payments.service: A process of this unit has been killed by the OOM killer.
Jun 08 12:04:40 web-3 systemd[1]: payments.service: Main process exited, code=killed, status=9/KILL
Jun 08 12:04:40 web-3 systemd[1]: payments.service: Failed with result 'oom-kill'.
Jun 08 12:04:50 web-3 systemd[1]: payments.service: Scheduled restart job, restart counter is at 7.
Jun 08 12:04:50 web-3 systemd[1]: Started payments.service - Payments API.

The line shape is fixed: timestamp, hostname, then an identifier with a PID in brackets, then the message. The identifier is the tell. Lines tagged payments[1234] are the service's own stdout and stderr, captured by journald — your application talking. Lines tagged systemd[1] are the service manager talking about your service from the outside, and those are the ones that survive a crash, because PID 1 does not die when your process does. Interleaving the two voices in one timeline is the journal's whole value: the application's last healthy request at 12:04:32, then eight seconds of silence, then the outside view of its death.
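As a rough sketch of that line shape, the awk filter below splits default-format journal lines into time, voice, identifier, and PID. The function name journal_voice and the three voice labels are illustrative assumptions, and the field positions assume the default short output format with a single-token hostname.

```shell
# Label each default-format journal line by who is speaking.
# Assumed shape: "Mon DD HH:MM:SS host ident[pid]: message".
# journal_voice is a hypothetical helper name.
journal_voice() {
    awk '{
        tag = $5                          # e.g. "systemd[1]:" or "payments[1234]:"
        sub(/:$/, "", tag)                # drop the trailing colon
        ident = tag; pid = "-"            # kernel lines carry no PID
        if (match(tag, /\[[0-9]+\]$/)) {
            ident = substr(tag, 1, RSTART - 1)
            pid = substr(tag, RSTART + 1, RLENGTH - 2)
        }
        voice = "application"
        if (ident == "systemd") voice = "manager"
        if (ident == "kernel")  voice = "kernel"
        print $3, voice, ident, pid
    }'
}

# Read-only usage on a live system:
#   journalctl -u payments --since "12:00" | journal_voice
```

Run against the excerpt above, the output switches from application to manager at 12:04:40, which is the handover between the two voices.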

Decode the death line itself. code=killed means the process did not exit; it was terminated by a signal. status=9/KILL names the signal: 9, SIGKILL, the uncatchable one. A process that exits on its own with an error looks different (code=exited, status=1/FAILURE); a segfault different again (status=11/SEGV); a graceful stop that overran its timeout shows status=9/KILL too, but only after a Stopping... line and a State 'stop-sigterm' timed out complaint before it. The pattern above (no stop request, straight to SIGKILL, with an explicit oom-kill result on a recent systemd) points one direction, and the kernel buffer has the rest of the story. On older systems the oom-kill attribution lines are absent and all you get is the bare status=9/KILL, which is exactly when you need journalctl -k.
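The signatures in that paragraph can be turned into a small lookup. This is a sketch only: classify_exit and its labels are hypothetical, and it looks at a single "Main process exited" line, so it cannot tell a stop-timeout SIGKILL from an outside kill without the surrounding lines.

```shell
# Map one systemd "Main process exited" line to a rough category.
# classify_exit and the label names are illustrative, not systemd output.
classify_exit() {
    case "$1" in
        *"code=exited, status=0/"*)      echo "clean-exit" ;;
        *"code=exited"*)                 echo "error-exit" ;;    # exited on its own, non-zero
        *"status=11/SEGV"*)              echo "segfault" ;;
        *"status=9/KILL"*)               echo "sigkill" ;;       # check for a stop timeout, then the kernel log
        *"code=killed"*|*"code=dumped"*) echo "other-signal" ;;
        *)                               echo "unknown" ;;
    esac
}

classify_exit "payments.service: Main process exited, code=killed, status=9/KILL"   # sigkill
```

A sigkill result is the cue to read the lines just before it for a stop request, and then the kernel record for the same timestamp.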

The OOM block, decoded

When the kernel runs out of reclaimable memory, it picks a process and kills it, and it writes a long, dense block into the ring buffer explaining itself. Almost everyone has scrolled past this block; very few people can read it. Here is the abbreviated shape, via journalctl -k around the same timestamp:

$ sudo journalctl -k --since "12:04" --until "12:05"
Jun 08 12:04:40 web-3 kernel: java invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Jun 08 12:04:40 web-3 kernel: Mem-Info:
Jun 08 12:04:40 web-3 kernel: active_anon:3801504 inactive_anon:62110 isolated_anon:0 ...
Jun 08 12:04:40 web-3 kernel: Tasks state (memory values in pages):
Jun 08 12:04:40 web-3 kernel: [  pid  ]   uid  tgid total_vm      rss ... oom_score_adj name
Jun 08 12:04:40 web-3 kernel: [    612]     0   612    73410     1922 ...             0 rsyslogd
Jun 08 12:04:40 web-3 kernel: [   1234]   998  1234  9437184  3145728 ...             0 java
Jun 08 12:04:40 web-3 kernel: Out of memory: Killed process 1234 (java) total-vm:37748736kB, anon-rss:12582912kB, file-rss:1024kB, shmem-rss:0kB, UID:998 pgtables:25600kB oom_score_adj:0

Read it top to bottom. The first line names the invoker, not the victim: java invoked oom-killer means java happened to be the process whose memory allocation could not be satisfied, which tripped the killer. The invoker and the victim are often the same process on a single-tenant box, but they do not have to be — an innocent 16 MB allocation by rsyslogd can invoke the killer, which then chooses the 12 GB java process as the victim. Blaming the invoker is the classic misreading. The gfp_mask and order describe the allocation that failed (order=0 is a single 4 KiB page, the most ordinary request there is — the machine was genuinely out, not fragmented).

Then comes the task table, a census of every candidate at the moment of the kill, and its trap is in the header: memory values in pages, not bytes or kilobytes. One page is 4 KiB on most systems, so java's rss of 3145728 pages is 12 GiB, and its total_vm of 9437184 pages is 36 GiB. The table is where you see the whole field: who else was big, what their oom_score_adj was, and whether the victim was really the heaviest process or just the heaviest one without a protective score.
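The page arithmetic is worth doing once by hand. The sketch below uses hypothetical helper names (pages_to_kib, pages_to_gib) and assumes a 4096-byte page unless told otherwise; getconf PAGESIZE reports the real value on a given machine.

```shell
# Convert a page count from the OOM task table into KiB or whole GiB.
# Helper names are illustrative. The second argument is the page size in
# bytes and defaults to 4096; check `getconf PAGESIZE` on the machine itself.
pages_to_kib() {
    pages=$1
    page_bytes=${2:-4096}
    echo $(( pages * page_bytes / 1024 ))
}

pages_to_gib() {
    # Integer division: values that are not whole GiB are rounded down.
    echo $(( $(pages_to_kib "$1" "${2:-4096}") / 1024 / 1024 ))
}

pages_to_gib 3145728   # rss of java in the table above: 12
pages_to_gib 9437184   # total_vm of java in the table above: 36
```

The KiB figures line up with the verdict line in the same block: pages_to_kib 3145728 gives 12582912, the anon-rss value.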

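Finally the verdict line, the one everyone has seen and few can parse:

Out of memory: Killed process 1234 (java) total-vm:37748736kB, anon-rss:12582912kB, file-rss:1024kB, shmem-rss:0kB, UID:998 pgtables:25600kB oom_score_adj:0

Figure: the verdict line, decoded. total-vm is address space and proves nothing; anon-rss is real RAM and is the number you quote in the postmortem.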

The distinction that matters: total-vm is virtual address space — every mapping the process ever made, most of it never backed by physical memory. A JVM or a Go runtime reserves enormous virtual ranges as a matter of course, so a huge total-vm proves nothing. anon-rss is anonymous resident memory: heap, stacks, the pages the process was genuinely holding in RAM with no file behind them. That 12 GB is what the process was actually costing the machine, and it is the number to put in the incident writeup. file-rss and shmem-rss are file-backed and shared pages, which the kernel could mostly have reclaimed without killing anything, which is why they are usually small in these reports. The deeper machinery — what counts as reclaimable, how the oom score is computed, why the kernel overcommits in the first place — lives in memory management.
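For the writeup, pulling the number out of the verdict line is a one-line sed. The helper name oom_field_kb is an assumption for illustration; the field names are the ones in the verdict line shown earlier.

```shell
# Extract one "name:NNNkB" field from an OOM verdict line on stdin.
# oom_field_kb is a hypothetical helper; pass total-vm, anon-rss,
# file-rss, or shmem-rss as the field name.
oom_field_kb() {
    # The leading [ ,] keeps "rss" from matching inside "anon-rss".
    sed -n "s/.*[ ,]$1:\([0-9][0-9]*\)kB.*/\1/p"
}

line='Out of memory: Killed process 1234 (java) total-vm:37748736kB, anon-rss:12582912kB, file-rss:1024kB, shmem-rss:0kB, UID:998 pgtables:25600kB oom_score_adj:0'

printf '%s\n' "$line" | oom_field_kb anon-rss    # 12582912
echo $(( $(printf '%s\n' "$line" | oom_field_kb anon-rss) / 1024 / 1024 ))   # 12 (GiB)
```

On a live system the same filter can sit behind journalctl -k with a time window, since it ignores everything before the field it is asked for.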

Three production scenarios

"The service randomly restarts"

The report arrives as a mystery: payments has restarted six times today, the application log shows normal traffic and then a gap, the dashboards show the gaps but not the cause. "Randomly" is doing a lot of work in that sentence. Start with the unit's journal over the window, exactly as in the excerpt above, and look for the systemd lines between the gaps. There are only a few endings a process can have, and each leaves a distinct signature: code=exited, status=1 means it crashed on its own and the reason should be in its last stdout lines; status=11/SEGV means a segfault; code=killed, status=9/KILL with no stop request means something outside the process killed it, and the two usual suspects are the kernel OOM killer and a cgroup memory limit.

Confirm with the kernel record: journalctl -k --since "12:00" and look for the OOM block. One detail worth reading carefully: a global OOM kill says Out of memory: Killed process, while a kill caused by the service's own MemoryMax= cgroup limit says Memory cgroup out of memory instead. The first means the machine was out of RAM and the fix is capacity or a leak hunt; the second means the machine was fine and the service simply hit its configured ceiling, and the fix is the ceiling or the workload. Engineers conflate these constantly and the remediation is different for each. For the leak hunt itself — who was growing, how fast, anonymous or cache — the working method is in what's eating my memory? with the supporting numbers in free & vmstat.
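Because the two kinds of kill are told apart by a phrase, a quick tally over a window settles which one you are dealing with. A minimal sketch, with oom_summary as a made-up name and the two phrases taken from the paragraph above:

```shell
# Count global versus cgroup OOM kills in kernel log text on stdin.
# oom_summary is a hypothetical helper; the match strings are the two
# verdict phrases quoted in the text.
oom_summary() {
    awk '
        /Memory cgroup out of memory/   { cgroup++; next }
        /Out of memory: Killed process/ { global++ }
        END { printf "global=%d cgroup=%d\n", global, cgroup }
    '
}

# Read-only usage on a live system:
#   journalctl -k --since "12:00" | oom_summary
```

A result like global=0 cgroup=6 says the machine had memory to spare and the service kept hitting its own ceiling.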

The box rebooted at 3am — why?

Uptime says four hours, nobody deployed anything, and the monitoring gap matches. The current boot's journal cannot help, because the answer is in the boot that died. This is what -b -1 is for:

$ journalctl --list-boots | tail -3
-2 b9c1...  Tue 2026-06-02 09:11:02 UTC - Thu 2026-06-04 22:40:18 UTC
-1 4f7a...  Thu 2026-06-04 22:41:30 UTC - Mon 2026-06-08 03:12:44 UTC
 0 d20e...  Mon 2026-06-08 03:14:09 UTC - Mon 2026-06-08 07:02:51 UTC
$ journalctl -b -1 -e

Read the last lines of the dead boot and they tell you which kind of death it was. A clean shutdown leaves a paper trail: Stopped target Multi-User System, units stopping one by one, Reached target System Reboot. If you see that, something asked for the reboot — a human, an unattended-upgrades job, a cloud provider's maintenance event — and the same journal usually names it a page up. The other signature is the absence of one: ordinary chatter at 03:12:44 and then nothing, no shutdown sequence, the record simply ends. That is a hard stop — power loss, a hardware fault, a hang followed by a watchdog reset, or a kernel panic. A panic usually does not appear in the journal at all, for the bleak reason that the process that writes the journal to disk died with everything else; capturing panics takes pstore or kdump, which is its own topic. But even the abrupt ending is information: check journalctl -b -1 -k -e for hardware complaints in the final minutes, machine-check errors, or thermal warnings, and you have either a lead or a clean bill that points at power. What the machine does from power-on to that first journal line is walked step by step in the Linux boot simulator.
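The clean-versus-abrupt check can be sketched as a filter over the tail of the dead boot. The name boot_ending is hypothetical, and the markers are only the two shutdown messages quoted above; exact wording varies between systemd versions, so treat a miss as a prompt to read the lines yourself.

```shell
# Read the tail of a boot's journal on stdin and report how the boot ended.
# boot_ending is a hypothetical helper. The patterns allow extra text between
# "target" and the description, since newer systemd prints the unit name too.
boot_ending() {
    if grep -qE 'Stopped target .*Multi-User System|Reached target .*System Reboot'; then
        echo "clean shutdown"
    else
        echo "abrupt end"
    fi
}

# Read-only usage on a live system:
#   journalctl -b -1 -n 200 --no-pager | boot_ending
```

An "abrupt end" answer is where journalctl -b -1 -k -e comes next, for hardware complaints in the final minutes.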

The disk that announced its death for weeks

Storage rarely fails without warning; it fails after weeks of warnings nobody read. The kernel logs every failed I/O against a block device, and those lines accumulate in the ring buffer and the journal long before the filesystem gives up:

$ sudo dmesg -T | grep -iE "i/o error|ata[0-9]|exception"
[Tue Jun  2 04:12:09 2026] ata3.00: exception Emask 0x0 SAct 0x400 SErr 0x0 action 0x0
[Tue Jun  2 04:12:09 2026] ata3.00: failed command: READ FPDMA QUEUED
[Tue Jun  2 04:12:09 2026] blk_update_request: I/O error, dev sdb, sector 488282112
[Sat Jun  6 11:38:51 2026] blk_update_request: I/O error, dev sdb, sector 488282113
[Mon Jun  8 02:55:17 2026] EXT4-fs error (device sdb1): ext4_find_entry:1463: inode #2883585: comm java: reading directory lblock 0

The progression reads like a diagnosis. An ATA exception and a failed read command is the drive struggling with a sector; blk_update_request: I/O error is the block layer giving up on it after retries; the same or neighbouring sector numbers recurring across days means a growing defect, not a one-off; and the EXT4-fs error at the end is the filesystem finally tripping over the bad region — often the first moment an application notices anything. The useful habit is checking for the first three signatures while they are still cheap: journalctl -k -p err --since "-7 days" as a weekly glance, or better, shipping kernel-priority errors into the alerting pipeline so a human never has to remember. How system logs feed that pipeline is the subject of logs, metrics & traces.
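For the weekly glance, a per-device count is often enough to spot the drive that is getting worse. The sketch below assumes the "I/O error, dev NAME," wording from the excerpt above; io_errors_by_device is an illustrative name, and other kernel versions may phrase the line differently.

```shell
# Count block-layer I/O error lines per device from kernel log text on stdin.
# io_errors_by_device is a hypothetical helper; it keys on the
# "I/O error, dev NAME," wording shown in the dmesg excerpt.
io_errors_by_device() {
    awk '
        /I\/O error, dev / {
            for (i = 1; i < NF; i++)
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)     # "sdb," -> "sdb"
                    count[dev]++
                    break
                }
        }
        END { for (d in count) printf "%s %d\n", d, count[d] }
    ' | sort
}

# Read-only usage on a live system:
#   sudo dmesg -T | io_errors_by_device
#   journalctl -k --since "-7 days" | io_errors_by_device
```

A count that grows from one week to the next for the same device is the growing-defect pattern described above.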

What's underneath

The two tools make more sense once you see the plumbing they sit on. There are two stores and several inputs, and almost everything on a systemd machine flows through one daemon.

Figure: the log flow map. journald ingests the kernel buffer, every unit's stdout and stderr, and the legacy syslog socket; journalctl queries the result. dmesg bypasses all of it and reads the ring buffer directly.

Start at the bottom of the stack. The kernel ring buffer is a fixed-size circular buffer inside kernel memory — typically a few megabytes, set by log_buf_len — where every printk() from every driver and subsystem lands. Circular means it wraps: on a chatty system, messages from early boot get overwritten by lunchtime, which is why dmesg on a long-running machine sometimes cannot show you the boot sequence at all. It survives nothing — a reboot clears it — and it has no notion of units, users, or priorities beyond the kernel log levels. dmesg is a thin window onto exactly this buffer and nothing else.

One layer up, journald is the collector. It reads the kernel buffer through /dev/kmsg, so kernel messages end up in the journal too. It owns the stdout and stderr of every systemd unit: when systemd starts a service, it wires the process's file descriptors 1 and 2 to a socket journald listens on, which is why services on modern systems just print to stdout and let the system handle the rest. And it listens on /dev/log, the socket behind the ancient syslog() API, so software written decades before systemd flows in as well. For every entry, from any source, journald records not just the message but a set of trusted fields the sender cannot fake, each marked by a leading underscore: _PID, _UID, _SYSTEMD_UNIT, _BOOT_ID, and the timestamp of receipt. PRIORITY travels with the entry too, but it is supplied by the sender rather than verified. The store is a binary, indexed format, which is what makes journalctl -u nginx --since "1 hour ago" an indexed lookup rather than a scan through flat text files, and what makes -b -1 possible at all, since every entry carries the boot it belongs to.

Where the journal lives decides whether it survives a reboot, and this is configuration, not fate. With Storage=volatile the journal sits in /run/log/journal, a tmpfs, and vanishes with the boot. With Storage=persistent it lives in /var/log/journal and accumulates across boots. The common default, Storage=auto, persists only if the /var/log/journal directory already exists — a rule with consequences covered in the pitfalls. On fleets, journald is usually the first hop rather than the destination: it forwards to rsyslog or a shipper, and the entries become part of the centralised pipeline described in logs, metrics & traces. The journal is the ground truth on the box; the pipeline is how anyone finds it without SSHing to the box.

Pitfalls

Assuming the journal survived the reboot. The cruellest discovery in a post-crash investigation: you run journalctl -b -1 and get Specifying boot ID or boot offset has no effect, no persistent journal was found. On distros where /var/log/journal does not exist out of the box, Storage=auto quietly means volatile, and every reboot shreds the evidence. Check now, on a calm day: journalctl --list-boots showing only the current boot is the tell. The fix is one line in /etc/systemd/journald.conf (Storage=persistent) or simply creating the directory, then restarting journald. Do it before the incident, because afterwards is too late by definition.
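The calm-day check can be as small as a directory test. A sketch, assuming the default Storage=auto described above, under which persistence depends only on the directory existing; journal_storage_state is a made-up name and the directory argument exists so the function can be pointed at a test path.

```shell
# Report what Storage=auto would do, based on whether the journal
# directory exists. journal_storage_state is a hypothetical helper and
# is only meaningful when Storage= is left at its auto default.
journal_storage_state() {
    dir=${1:-/var/log/journal}
    if [ -d "$dir" ]; then
        echo "persistent"
    else
        echo "volatile"
    fi
}

# Read-only usage on a live system:
#   journal_storage_state
```

A "volatile" answer on a production box is the finding; the fix is the one described above, applied before the next reboot.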

Forgetting the journal eats itself. Persistence is not forever. journald caps its disk usage — by default a percentage of the filesystem, tunable with SystemMaxUse= — and vacuums the oldest entries when it hits the cap. On a chatty box "persistent" can mean ten days, and the boot you wanted from last month is gone. Check what you actually have with journalctl --disk-usage and the timestamps in --list-boots; trim deliberately with --vacuum-time=30d or --vacuum-size=2G rather than letting the default decide which evidence to keep.

Trusting dmesg timestamps too much. Raw dmesg stamps lines in seconds since boot. dmesg -T converts to wall-clock by adding the boot time — but the kernel clock those stamps come from does not tick while the machine is suspended, so on laptops and suspend-happy VMs the converted times drift by exactly the total suspended time, sometimes hours. On a server that never sleeps, -T is fine. When the exact time matters, prefer journalctl -k: journald stamps each kernel message with the real time it received it.
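The conversion behind -T is plain addition, which is also why it drifts. The sketch below reproduces that arithmetic with a hypothetical helper, dmesg_stamp_to_epoch; it takes the boot time as epoch seconds and knows nothing about time spent suspended, exactly like the conversion it imitates.

```shell
# Convert a raw dmesg stamp to epoch seconds: boot time + seconds since boot.
# dmesg_stamp_to_epoch is a hypothetical helper. Fractional seconds are
# dropped, and suspended time is not accounted for, which is the drift
# described above.
dmesg_stamp_to_epoch() {
    boot_epoch=$1
    # "[1834502.114866]" -> "1834502"; also handles padded stamps like "[    5.123456]"
    secs=$(printf '%s\n' "$2" | sed 's/^\[ *//; s/\..*$//')
    echo $(( boot_epoch + secs ))
}

dmesg_stamp_to_epoch 1000 "[1834502.114866]"   # 1835502
```

On a live system the boot time can usually be obtained with date -d "$(uptime -s)" +%s, assuming GNU date and a procps uptime that supports -s.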

Grepping the journal instead of querying it. Piping journalctl | grep nginx works, but it forces a render of the entire journal just to throw most of it away, and it matches the text nginx anywhere — including some other service complaining about nginx. The journal is a database with indexed fields; ask it like one. -u nginx matches the unit field, _PID=1234 follows one process, -p err filters by priority, --grep searches message text while keeping the other filters cheap, and -o json emits the full field set when a script is the consumer. The field-based query is faster, and more importantly it is precise: -u cannot be fooled by a coincidental substring.

A drill you can run right now

Everything below reads state and writes nothing. Ten minutes on any Linux machine — a server, a VM, a Raspberry Pi — and the three records this page is about stop being abstract: you will have read a unit's journal, swept a boot for complaints, and looked at the kernel's raw record with both timestamp formats.

Step 1 — one unit, one window. Run journalctl -u ssh --since today (the unit is sshd on Fedora-family systems; systemctl list-units --type=service shows what your box calls things). Read the line shape against the anatomy above: timestamp, host, identifier with PID, message. Find a line written by the service itself and, if the service has restarted recently, the systemd[1] lines around it — the two voices in one timeline. If the output is empty, that is a finding too: either nothing connected today, or you are seeing the permission pitfall and need sudo.

Step 2 — the triage sweep. Run journalctl -b -p warning — every entry of the current boot that the system filed at warning or worse. On a healthy machine this is short and oddly interesting: a service that took two tries to start, a firmware grumble, a misconfigured timer. Pick one entry and pull its context with journalctl -u that-unit -e. Then run journalctl --list-boots and journalctl --disk-usage and note what you actually have: how many boots of history, how much disk. If you see exactly one boot, you have found the persistence pitfall on your own machine, on a calm day, which is the cheapest possible way to find it.

Step 3 — the kernel's record, both clocks. Finish with the ring buffer:

$ sudo dmesg | tail -5
[1834502.114866] usb 1-3: new high-speed USB device number 9 using xhci_hcd
[1834502.263531] usb 1-3: New USB device found, idVendor=0951, idProduct=1666
$ sudo dmesg -T | tail -5
[Mon Jun  8 14:02:11 2026] usb 1-3: new high-speed USB device number 9 using xhci_hcd
[Mon Jun  8 14:02:11 2026] usb 1-3: New USB device found, idVendor=0951, idProduct=1666
$ journalctl -k -e -n 5

The same events three ways: raw seconds-since-boot, -T's wall-clock conversion, and the journal's own receipt timestamps. Compare the last two — on a machine that never suspends they agree; on a laptop they may not, and now you know why. If plain dmesg refuses without root, you have met kernel.dmesg_restrict, a hardening default on many distros, and sudo is the answer. Read the five lines you got, whatever they are: a USB device, a network interface flapping, a filesystem mount. Each one is the kernel narrating its own life, in the same voice it will use on the day it writes an OOM block or an I/O error with your pager on the other end.

If you remember one line. journalctl -u SERVICE --since "1 hour ago" for what a service and its manager saw, journalctl -b -1 -e for what the machine said before it went down, and dmesg -T | tail for what the kernel is saying right now.
