
exit code 137

137 is 128 + 9: the process was destroyed by SIGKILL. The OOM killer is the most famous sender, but stop-timeout escalations, CI runners, and orchestrators send the very same signal — check the OOMKilled flag before buying memory.


The symptom

A container or process ends with status 137 — in docker ps, a Kubernetes pod’s last state, or a CI job’s final line. The logs typically just stop mid-sentence: SIGKILL gives the process no chance to write a farewell.

$ docker ps -a --format "table {{.Names}}\t{{.Status}}"
NAMES        STATUS
worker-1     Exited (137) 4 minutes ago

# Kubernetes:    Last State: Terminated, Exit Code: 137
# CI:            ERROR: process exited with code 137
  ← the application logs end abruptly with no shutdown message: the defining
    signature of SIGKILL — nothing gets to run, not even logging

The diagnosis

1 Decode the number

# shells and runtimes report "killed by signal N" as exit code 128 + N
# (see the EXIT STATUS section of bash(1), and waitpid(2) for the raw form)
137 = 128 + 9    → SIGKILL   (uncatchable: the process had no say)
143 = 128 + 15   → SIGTERM   (catchable: it was ASKED to stop)
139 = 128 + 11   → SIGSEGV   (it crashed itself)
  ← so the question is never "what did the app do" — it’s "WHO sent SIGKILL"

SIGKILL cannot be caught, blocked, or handled (signal(7)), so the process tells you nothing — the sender is the whole investigation. There is a short list of suspects: the kernel’s OOM killer, a runtime escalating a stop timeout, an orchestrator or CI system enforcing a deadline, and occasionally a human with kill -9.
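The arithmetic above can be wrapped in a small helper. This is a minimal sketch: `decode_exit` is a hypothetical name, and it assumes a POSIX shell whose `kill -l N` prints the signal name without the SIG prefix (bash and dash both do). A process can also call exit(137) on its own, so treat the result as a strong hint rather than proof.

```shell
# decode_exit STATUS: hypothetical helper that applies the 128 + N convention.
decode_exit() {
  status=$1
  if [ "$status" -gt 128 ] && [ "$status" -le 192 ]; then
    sig=$((status - 128))
    # kill -l N maps a signal number to its name (KILL, TERM, SEGV, ...)
    echo "$status = 128 + $sig -> SIG$(kill -l "$sig")"
  else
    echo "$status -> ordinary exit status, no signal implied"
  fi
}

decode_exit 137   # 137 = 128 + 9 -> SIGKILL
decode_exit 143   # 143 = 128 + 15 -> SIGTERM
decode_exit 1     # 1 -> ordinary exit status, no signal implied
```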

2 Ask the runtime whether it was OOM

$ docker inspect --format "{{.State.OOMKilled}}  exit={{.State.ExitCode}}" worker-1
true  exit=137      ← memory: the cgroup OOM killer — switch to the OOMKilled page

$ docker inspect --format "{{.State.OOMKilled}}  exit={{.State.ExitCode}}" worker-2
false exit=137     ← NOT memory: someone sent SIGKILL deliberately → step 4
# Kubernetes equivalent: Reason: OOMKilled present (or absent) in kubectl describe

This single flag forks the whole investigation. True: it’s a memory problem — sizing, leak, or runtime heap flags; the OOMKilled page takes it from here. False: stop chasing memory graphs and find the sender. One caveat before fully trusting "false": step 3.

3 Check the kernel log anyway

$ dmesg -T | grep -iE "killed process|out of memory" | tail -2
[13:21:44] Memory cgroup out of memory: Killed process 22114 (ffmpeg)
           total-vm:4812340kB, anon-rss:1903112kB
  ← the victim was a CHILD process (ffmpeg), not the container’s main process:
    runtime flags can miss this, and the main process may have exited 137
    or with its own error after losing its child

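The OOMKilled flag is most reliable when the kill lands on the container’s main process; depending on the runtime and its version, a kill that lands on a child can slip past it. If dmesg shows an OOM kill inside your container’s cgroup at the right timestamp, it’s a memory problem regardless of what the flag said. No dmesg line and OOMKilled false: memory is exonerated, provided the kernel log still reaches back to the time of the exit (the ring buffer is finite and does not survive a reboot).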
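Steps 2 and 3 reduce to one decision, sketched below. `classify_137` and its argument values are hypothetical names chosen for illustration, and the commented wiring line assumes the `docker inspect` format string from step 2 and a container called worker-1.

```shell
# classify_137 OOM_FLAG DMESG_HIT
#   OOM_FLAG:  "true" or "false", as printed by the runtime's OOMKilled flag
#   DMESG_HIT: "yes" if dmesg shows an OOM kill in the container's cgroup at
#              the right timestamp, otherwise "no"
classify_137() {
  oom_flag=$1
  dmesg_hit=$2
  if [ "$oom_flag" = "true" ] || [ "$dmesg_hit" = "yes" ]; then
    echo "memory: continue on the OOMKilled page"
  else
    echo "not memory: find the sender (step 4)"
  fi
}

classify_137 true no     # flag alone is enough
classify_137 false yes   # a child was killed; the flag missed it
classify_137 false no    # memory exonerated

# Possible wiring (assumes a container named worker-1):
# classify_137 "$(docker inspect --format '{{.State.OOMKilled}}' worker-1)" no
```

Either signal is sufficient for the memory verdict; only the absence of both sends you on to step 4.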

4 Not OOM? Find which stop-deadline expired

# the senders of deliberate SIGKILL, in rough order of likelihood:
docker stop          → SIGTERM, waits 10s (default), then SIGKILL
Kubernetes           → SIGTERM, waits terminationGracePeriodSeconds (default 30s), then SIGKILL
systemd              → SIGTERM, waits TimeoutStopSec (default 90s), then SIGKILL
CI runners           → job timeout / cancellation → SIGKILL
  ← if 137 appears exactly at deploys, restarts, or job cancellations, your
    process is ignoring SIGTERM and the runtime is escalating on schedule

Correlate the timestamp with deploys and stop commands. The signature of a graceful-stop failure: every routine restart produces a 137, and the gap between the stop command and the exit equals the grace period exactly. That means the SIGTERM either never reached your process or was ignored — in containers, overwhelmingly the PID 1 problem (cause 2).
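The escalation can be reproduced on any Linux box without a container. The sketch below illustrates the pattern and is not what any runtime literally executes: the stand-in process, the 2-second grace period, and the variable names are all assumptions made for the demo.

```shell
#!/bin/sh
# Stand-in for an app that ignores SIGTERM. An ignored signal stays ignored
# across exec, so the sleep process inherits the behaviour.
sh -c 'trap "" TERM; exec sleep 30' >/dev/null 2>&1 &
pid=$!
grace=2            # illustrative grace period, in seconds

sleep 1            # give the child time to start ignoring SIGTERM
kill -TERM "$pid"  # the polite request

# Wait up to $grace seconds for the process to go away on its own.
waited=0
while [ "$waited" -lt "$grace" ] && kill -0 "$pid" 2>/dev/null; do
  sleep 1
  waited=$((waited + 1))
done

# Still alive after the grace period: escalate.
if kill -0 "$pid" 2>/dev/null; then
  kill -KILL "$pid"
fi

status=0
wait "$pid" || status=$?
echo "exit status $status after a ${waited}s grace period"
# prints: exit status 137 after a 2s grace period
```

The wait lasts the full grace period, which is the same gap you see between a stop command and a 137 in production.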

The causes, ranked

  1. The OOM killer: cgroup or node memory

    Confirm: OOMKilled is true in the runtime, or dmesg shows a matching kill line in the container’s cgroup.

  2. Stop-timeout escalation: the app never acts on SIGTERM

    Confirm: 137 lands exactly at deploys/stops, after precisely the grace period; logs show no shutdown messages.

  3. An external enforcer: CI timeout, autoscaler, or a human

    Confirm: the orchestrator’s or runner’s own logs at that timestamp show a cancelled job, a node scale-down, or a manual kill.

The fixes

The OOM killer — cgroup or node memory

It’s a memory investigation now: limit sizing, heap flags, or a leak. The OOMKilled page walks it step by step.

Stop-timeout escalation: the app never acts on SIGTERM

Handle SIGTERM and exit promptly. In containers your process is PID 1, and PID 1 gets no default signal handlers — an unhandled SIGTERM is simply ignored, which is why so many containers only ever die by escalation. Also use the exec form of ENTRYPOINT: the shell form makes /bin/sh PID 1, and it doesn’t forward signals to your app. If the app can’t be changed, run it under a tiny init (tini, or docker run --init) that forwards signals, and size the grace period to real drain time.
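What "handle SIGTERM and exit promptly" looks like depends on the language; the following is a minimal shell sketch of the idea. The `worker` function, its loop, and the log message are hypothetical stand-ins for a real application.

```shell
#!/bin/sh
# Hypothetical worker: on SIGTERM it logs a farewell and exits 0.
worker() {
  trap 'echo "SIGTERM received, shutting down"; exit 0' TERM
  while :; do
    # Sleep in the background and wait on it: a trap is deferred while a
    # foreground command runs, but it interrupts the wait builtin at once.
    sleep 1 &
    wait $!
  done
}

log=$(mktemp)
worker >"$log" 2>&1 &
pid=$!

sleep 1              # let the worker install its trap
kill -TERM "$pid"    # the first signal a stop command sends
status=0
wait "$pid" || status=$?

farewell=$(cat "$log")
rm -f "$log"
echo "worker exited $status: $farewell"
# prints: worker exited 0: SIGTERM received, shutting down
```

Because the worker installs its own handler, it would behave the same way as PID 1 in a container, where an unhandled SIGTERM is dropped.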

An external enforcer: CI timeout, autoscaler, or a human

Per source: raise the CI timeout or split the job; for scale-downs, proper SIGTERM handling (cause 2’s fix) turns them from 137s into clean 143s/0s; for humans, an entry in the runbook.
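A deadline enforcer is easy to imitate locally. This sketch assumes the `timeout` utility from GNU coreutils is installed; the 1-second limit and the `sleep 5` job are placeholders, and real CI runners may send SIGTERM first or kill a whole process group.

```shell
# Stand-in for a CI job deadline: allow 1 second, then send SIGKILL directly.
status=0
timeout -s KILL 1 sleep 5 || status=$?
echo "job exit status: $status"   # 137 with GNU coreutils timeout
```

The job did nothing wrong and used almost no memory, yet it reports 137, which is why the enforcer’s own log is the place to look.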

What people get wrong

  • 137 does not mean OOM by definition. It means SIGKILL, full stop. The OOM killer is one sender among several. Teams that hard-wire "137 = add memory" end up with 8Gi limits on services whose real bug was an ignored SIGTERM. The OOMKilled flag and dmesg settle it in under a minute.
  • The 143-then-137 pair tells the whole stop story. 143 means the process honoured SIGTERM. A 137 where you expected a 143 means it was asked and didn’t comply within the grace period. Watching which of the two your fleet produces at every routine deploy is a free, continuous health check on graceful shutdown.
  • PID 1 in a container plays by different signal rules. For ordinary processes the kernel applies the default signal action (such as terminate-on-SIGTERM) when no handler is installed. For PID 1 of a PID namespace it delivers a signal only if the process has installed a handler for it, so an unhandled SIGTERM is silently dropped. The same binary that stops cleanly on your laptop ignores SIGTERM as a container entrypoint. Exec-form ENTRYPOINT plus an explicit handler, or an init shim, closes the gap.

Quick answers

Is exit code 137 always out-of-memory?

No. 137 means the process was killed by SIGKILL (128 + 9), and the OOM killer is only one possible sender. docker inspect’s OOMKilled flag (or Reason: OOMKilled in Kubernetes) plus a dmesg check distinguishes memory kills from stop-timeout escalations and external kills.

Why does my container exit 137 on every deploy?

Your app isn’t acting on SIGTERM, so the runtime escalates to SIGKILL when the grace period expires. In containers the process runs as PID 1, which gets no default signal handlers — SIGTERM is ignored unless explicitly handled. Handle the signal, use exec-form ENTRYPOINT (shell form swallows signals), or add an init like tini.

What’s the difference between exit codes 137 and 143?

143 = 128 + 15: terminated by SIGTERM — the process was asked and complied (or default-terminated). 137 = 128 + 9: destroyed by SIGKILL — either the OOM killer, or an escalation after SIGTERM was ignored. A fleet that 143s on deploys is healthy; one that 137s has a shutdown bug or a memory problem.
