
CrashLoopBackOff

Your container keeps exiting and the kubelet is waiting longer between restart attempts. The status names the loop, not the crash — the real error is in the previous container’s exit code and logs.


The symptom

A pod shows CrashLoopBackOff with a climbing restart count. Between attempts it sits in this state doing nothing — that pause is the back-off, not a hang. The kubelet doubles the wait after each failure (10s, 20s, 40s…) and caps it at five minutes.

$ kubectl get pods
NAME                    READY   STATUS             RESTARTS         AGE
api-7d4b9c6f5d-x2x4q    0/1     CrashLoopBackOff   17 (2m11s ago)   64m
api-7d4b9c6f5d-9k2lp    1/1     Running            0                64m
  ← 17 restarts in an hour, and one replica of the same Deployment is fine:
    suspect something node- or data-local before blaming the image

The diagnosis

1 Get the exit code of the last attempt

$ kubectl describe pod api-7d4b9c6f5d-x2x4q
Containers:
  api:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1            ← the number that routes everything
      Started:      Tue, 09 Jun 10:14:03
      Finished:     Tue, 09 Jun 10:14:04   ← died one second after starting
Events:
  Warning  BackOff  2m (x214 over 58m)  kubelet  Back-off restarting failed container

The exit code is the router:

  • 1 (or any small number): the app exited on its own. Go to step 2 for its logs.
  • 137: SIGKILL, usually the OOM killer. Switch to the OOMKilled page.
  • 143: SIGTERM. Something asked it to stop, often a failing liveness probe (step 3).
  • 127 or 126: the command doesn’t exist or isn’t executable. Suspect a bad image, entrypoint, or args.

The Started→Finished gap matters too: one second means it crashed during startup; twenty minutes means it ran fine and then hit something.
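The routing rules above fit in a few lines of Go. This is a sketch: route is a hypothetical helper and its messages simply restate this page’s advice. It also encodes the convention behind the larger numbers: a process killed by a signal is reported as 128 plus the signal number, so 137 is 128 + 9 (SIGKILL) and 143 is 128 + 15 (SIGTERM).

```go
package main

import "fmt"

// route maps a container's last exit code to the next diagnostic step.
// The messages restate this guide's routing; route is a hypothetical helper.
func route(exitCode int) string {
	switch {
	case exitCode == 137: // 128 + 9: SIGKILL
		return "SIGKILL: check Last State for Reason: OOMKilled"
	case exitCode == 143: // 128 + 15: SIGTERM
		return "SIGTERM: look for liveness probe kills in Events (step 3)"
	case exitCode == 126 || exitCode == 127:
		return "bad command, entrypoint, or args"
	case exitCode > 128:
		// Usually, though not always, a death by signal (exitCode - 128).
		return fmt.Sprintf("probably killed by signal %d", exitCode-128)
	default:
		return "app exited on its own: read kubectl logs --previous (step 2)"
	}
}

func main() {
	for _, code := range []int{1, 137, 143, 127, 139} {
		fmt.Printf("%3d -> %s\n", code, route(code))
	}
}
```

The 139 case exercises the fallback branch: 128 + 11 is a segmentation fault (SIGSEGV), which this page does not cover, but the same arithmetic decodes it.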

2 Read the logs of the crashed container, not the new one

$ kubectl logs api-7d4b9c6f5d-x2x4q --previous
panic: connect to postgres: dial tcp 10.96.14.3:5432: connect: connection refused

goroutine 1 [running]:
main.main()
        /src/cmd/api/main.go:31 +0x1d4
  ← the crash, in one line: it dies because a dependency is unreachable at boot

--previous reads the last terminated container’s logs. Without it you read the fresh attempt, which has usually printed nothing yet; this is the most common way to lose ten minutes on this error. If --previous prints nothing at all, the process died before it could write a single line: almost always a bad entrypoint, a missing binary, or a config file that fails to load in the first milliseconds.

3 Check whether a probe is doing the killing

$ kubectl describe pod api-7d4b9c6f5d-x2x4q | grep -B2 -A4 Liveness
    Liveness:  http-get http://:8080/healthz delay=5s timeout=1s period=10s #failure=3
Events:
  Warning  Unhealthy  3m  kubelet  Liveness probe failed: Get "http://10.244.1.7:8080/healthz":
                                   context deadline exceeded
  Normal   Killing    3m  kubelet  Container api failed liveness probe, will be restarted

A liveness kill looks identical to a crash from the outside: restarts climb, status says CrashLoopBackOff. If the events show "Liveness probe failed" followed by "Killing" before each restart, the app may be perfectly healthy but slower to start than the probe allows — the fix is probe timing (initialDelaySeconds, or a startupProbe), not the application. No probe events means it’s a real crash: back to the logs.

4 See if the crashes correlate with a node

$ kubectl get pods -l app=api -o wide
NAME                    READY   STATUS             RESTARTS   NODE
api-7d4b9c6f5d-x2x4q    0/1     CrashLoopBackOff   17         node-3
api-7d4b9c6f5d-9k2lp    1/1     Running            0          node-1
  ← all the crashing replicas on one node is a node story, not an app story

If only the replicas on one node loop, the difference is node-local: a stale cached image (same tag, different bits), a missing hostPath, a sysctl, disk pressure. Cordon the node, delete the pod so it reschedules elsewhere, and debug the node separately.
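The node check in step 4 is easy to script once the pod list is in hand. The Go sketch below takes a hypothetical, trimmed-down version of the kubectl get pods -o wide rows (name, status, node) and reports a node only when it hosts every crashing replica while some other node runs the same workload without crashing. PodRow and suspectNode are illustrative names; in practice you would fill the rows from kubectl’s output or the Kubernetes API.

```go
package main

import "fmt"

// PodRow is a hypothetical, trimmed-down row of `kubectl get pods -o wide`.
type PodRow struct {
	Name, Status, Node string
}

// suspectNode returns the node that hosts every crashing replica, but only
// if at least one other node runs the same workload without crashing.
// With a single crashing replica this is weak evidence; it firms up as the
// replica count grows.
func suspectNode(pods []PodRow) (string, bool) {
	crashing := map[string]bool{} // nodes with at least one CrashLoopBackOff pod
	seen := map[string]bool{}     // every node running a replica
	for _, p := range pods {
		seen[p.Node] = true
		if p.Status == "CrashLoopBackOff" {
			crashing[p.Node] = true
		}
	}
	// Exactly one crashing node, plus at least one node with no crashes.
	if len(crashing) != 1 || len(seen) < 2 {
		return "", false
	}
	for node := range crashing {
		return node, true
	}
	return "", false
}

func main() {
	// The two replicas from step 4.
	pods := []PodRow{
		{"api-7d4b9c6f5d-x2x4q", "CrashLoopBackOff", "node-3"},
		{"api-7d4b9c6f5d-9k2lp", "Running", "node-1"},
	}
	if node, ok := suspectNode(pods); ok {
		fmt.Println("crashes are confined to", node+": cordon it and reschedule")
	} else {
		fmt.Println("crashes are not node-local: keep debugging the app")
	}
}
```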

The causes, ranked

  1. The app exits during startup: missing env var, bad config, unreachable dependency

    Confirm: kubectl logs --previous shows a panic, stack trace, or fatal log line within the first seconds; describe shows Exit Code 1 and a Started→Finished gap of a second or so.

  2. A liveness probe is killing a working or slow-starting container

    Confirm: Events show "Liveness probe failed" then "Killing" before each restart; exit code 143 (SIGTERM), or 137 if the process ignored SIGTERM past its termination grace period.

  3. OOMKilled — the memory limit is below the real working set

    Confirm: Last State shows Reason: OOMKilled, Exit Code: 137.

  4. Bad command, args, or entrypoint

    Confirm: exit code 127 ("executable file not found") or 126 (found but not executable); --previous logs are empty.

The fixes

The app exits during startup: missing env var, bad config, unreachable dependency

Fix the config or secret it’s missing (kubectl describe also surfaces failed volume and secret mounts). For dependency-at-boot crashes, make startup retry with backoff instead of exiting: pods restart in arbitrary order on every node drain, and an app that dies when the database is briefly away turns every minor blip into a crash loop.

A liveness probe is killing a working or slow-starting container

Give startup its own budget with a startupProbe (the liveness probe doesn’t run until it passes), or raise initialDelaySeconds. Keep the liveness endpoint cheap and dependency-free: a /healthz that checks the database converts every database hiccup into a restart storm across the fleet.
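What "cheap and dependency-free" looks like in code: a sketch of a /healthz handler matching the probe from step 3 (http-get on :8080/healthz), written with Go’s net/http. newMux, statusOf, and dbPing are hypothetical names; dbPing stands in for a database check and exists only to show that the liveness handler never calls it. main exercises the handler in-process with net/http/httptest instead of opening a real port.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// dbPing is a hypothetical stand-in for a database health check, simulated
// here as failing. The liveness handler below deliberately never calls it.
func dbPing() error { return errors.New("database unreachable") }

// newMux registers the liveness endpoint the probe in step 3 requests.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: succeed whenever the process can serve HTTP at all. No
	// database, no downstream calls, nothing that could exceed the probe's
	// 1s timeout.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok"))
	})
	return mux
}

// statusOf sends an in-process GET to h and returns the HTTP status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	// In the container you would serve it instead: http.ListenAndServe(":8080", mux)
	fmt.Println("database check:", dbPing())
	fmt.Println("GET /healthz ->", statusOf(mux, "/healthz"))
}
```

The process answers 200 while its simulated database is down, which is the point: the kubelet should restart a container only when a restart could help, and restarting does nothing for an unreachable database.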

OOMKilled — the memory limit is below the real working set

Measure, then raise the limit or fix the runtime’s heap sizing. This is its own investigation — see the OOMKilled page in the related links.

Bad command, args, or entrypoint

Compare the pod’s command/args against the image’s ENTRYPOINT and CMD (kubectl get pod -o yaml for the pod; docker image inspect or crane config for the image). A command: in the pod spec replaces the image entrypoint entirely and discards the image’s CMD as well, a common surprise when only args was intended.

What people get wrong

  • CrashLoopBackOff is a state, not an error. Nothing is "stuck". The kubelet is deliberately waiting — up to five minutes between attempts — so a pod that looks idle in this state is just between retries. The thing to debug is whatever the exit code and previous logs say, never the back-off itself.
  • kubectl logs without --previous reads the wrong container. The default shows the current (often just-restarted, still-empty) container. The crash you care about is in the previous one. This single flag is the difference between "no logs, weird" and the actual stack trace.
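  • Jobs don’t crash-loop the same way. A Deployment’s pods restart in place under restartPolicy: Always. A Job’s pods must use OnFailure or Never: under OnFailure the container restarts in place (and can show CrashLoopBackOff) until backoffLimit is reached; under Never every failure creates a new pod, again up to backoffLimit. So advice and symptoms don’t transfer one-to-one between the two.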

Quick answers

What does CrashLoopBackOff actually mean?

The container exits shortly after starting, every time, and the kubelet is applying an exponential back-off (capped at five minutes) between restart attempts. It is a status describing the restart loop — the underlying error is in the container’s exit code and its previous logs.

How do I see the logs of the crashed container?

kubectl logs <pod> --previous. Without --previous you get the freshly restarted container, which usually has not logged anything yet. If --previous is empty too, the process died before writing a line — check the exit code in kubectl describe pod for 127/126 (bad command) or 137 (SIGKILL).

Is CrashLoopBackOff always the application’s fault?

No. Two common non-app causes: a liveness probe that kills a healthy-but-slow container (events show "Liveness probe failed" before each restart), and an OOM kill from a memory limit set too low (Reason: OOMKilled, exit 137). Check both in kubectl describe before touching code.

Related on Semicolony
