CrashLoopBackOff
Your container keeps exiting and the kubelet is waiting longer between restart attempts. The status names the loop, not the crash — the real error is in the previous container’s exit code and logs.
The symptom
A pod shows CrashLoopBackOff with a climbing restart count. Between attempts it sits in this state doing nothing — that pause is the back-off, not a hang. The kubelet doubles the wait after each failure (10s, 20s, 40s…) and caps it at five minutes.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
api-7d4b9c6f5d-x2x4q 0/1 CrashLoopBackOff 17 (2m11s ago) 64m
api-7d4b9c6f5d-9k2lp 1/1 Running 0 64m
← 17 restarts in an hour, and one replica of the same Deployment is fine:
suspect something node- or data-local before blaming the image
The diagnosis
1 Get the exit code of the last attempt
$ kubectl describe pod api-7d4b9c6f5d-x2x4q
Containers:
api:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1 ← the number that routes everything
Started: Tue, 09 Jun 10:14:03
Finished: Tue, 09 Jun 10:14:04 ← died one second after starting
Events:
Warning BackOff 2m (x214 over 58m) kubelet Back-off restarting failed container
The exit code is the router:
- Exit 1 (or any other small number): the app exited on its own. Go to step 2 for its logs.
- Exit 137: SIGKILL, usually the OOM killer. Switch to the OOMKilled page.
- Exit 143: SIGTERM. Something asked it to stop, often a failing liveness probe (step 3).
- Exit 127 or 126: the command doesn’t exist or isn’t executable. Suspect a bad image, entrypoint, or args.
The Started→Finished gap matters too: one second means it crashed during startup; twenty minutes means it ran fine and then hit something.
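The routing above fits in one switch. This is a minimal Go sketch, and likelyCause is a hypothetical helper, not part of kubectl. It decodes any other code above 128 as 128 plus a signal number, the usual convention behind 137 (signal 9) and 143 (signal 15). Exit 139 appears in main only to show that decoding.

```go
package main

import "fmt"

// likelyCause maps a terminated container's exit code to the first place
// to look, following the routing described in step 1.
func likelyCause(code int) string {
	switch {
	case code == 126 || code == 127:
		return "command missing or not executable: check image, entrypoint, args"
	case code == 137:
		return "SIGKILL, usually the OOM killer: check for Reason: OOMKilled"
	case code == 143:
		return "SIGTERM: something asked it to stop, often a failing liveness probe"
	case code > 128:
		// Other codes above 128 conventionally mean 128 + signal number.
		return fmt.Sprintf("killed by signal %d", code-128)
	default:
		return "the app exited on its own: read kubectl logs --previous"
	}
}

func main() {
	for _, code := range []int{1, 137, 143, 127, 139} {
		fmt.Printf("%3d -> %s\n", code, likelyCause(code))
	}
}
```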
2 Read the logs of the crashed container, not the new one
$ kubectl logs api-7d4b9c6f5d-x2x4q --previous
panic: connect to postgres: dial tcp 10.96.14.3:5432: connect: connection refused
goroutine 1 [running]:
main.main()
/src/cmd/api/main.go:31 +0x1d4
← the crash, in one line: it dies because a dependency is unreachable at boot
--previous reads the last terminated container’s logs. Without it you read the fresh attempt, which usually hasn’t printed anything yet; this is the single most common way to lose ten minutes on this error. If --previous prints nothing at all, the process died before it could write a single line. That almost always means a bad entrypoint, a missing binary, or a config file that fails to load in the first milliseconds.
3 Check whether a probe is doing the killing
$ kubectl describe pod api-7d4b9c6f5d-x2x4q | grep -B2 -A4 Liveness
Liveness: http-get http://:8080/healthz delay=5s timeout=1s period=10s #failure=3
Events:
Warning Unhealthy 3m kubelet Liveness probe failed: Get "http://10.244.1.7:8080/healthz":
context deadline exceeded
Normal Killing 3m kubelet Container api failed liveness probe, will be restarted
A liveness kill looks identical to a crash from the outside: restarts climb and the status says CrashLoopBackOff. If the events show "Liveness probe failed" followed by "Killing" before each restart, the app may be perfectly healthy but slower to start than the probe allows. In that case the fix is probe timing (initialDelaySeconds, or a startupProbe), not the application. No probe events means it’s a real crash: back to the logs.
4 See if the crashes correlate with a node
$ kubectl get pods -l app=api -o wide
NAME READY STATUS RESTARTS NODE
api-7d4b9c6f5d-x2x4q 0/1 CrashLoopBackOff 17 node-3
api-7d4b9c6f5d-9k2lp 1/1 Running 0 node-1
← all the crashing replicas on one node is a node story, not an app story
If only the replicas on one node loop, the difference is node-local: a stale cached image (same tag, different bits), a missing hostPath, a sysctl, disk pressure. Cordon the node, delete the pod so it reschedules elsewhere, and debug the node separately.
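With more than a handful of replicas, it helps to tally this per node. The Go sketch below is hypothetical. It assumes a two-column listing (NODE and REASON), such as the output of something like `kubectl get pods -l app=api -o custom-columns=NODE:.spec.nodeName,REASON:.status.containerStatuses[*].state.waiting.reason`, where pods without a waiting reason print `<none>`. It reports the nodes on which every listed pod is crash-looping.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// nodeTally counts the pods listed on one node and how many are crash-looping.
type nodeTally struct {
	Total, Looping int
}

// tallyByNode reads whitespace-separated NODE and REASON columns, skipping
// the header. REASON may list several containers, so it is matched with
// strings.Contains rather than equality.
func tallyByNode(listing string) map[string]nodeTally {
	tallies := make(map[string]nodeTally)
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 2 || f[0] == "NODE" {
			continue // header, blank, or malformed line
		}
		t := tallies[f[0]]
		t.Total++
		if strings.Contains(f[1], "CrashLoopBackOff") {
			t.Looping++
		}
		tallies[f[0]] = t
	}
	return tallies
}

// loopingNodes returns, sorted, the nodes where every listed pod loops:
// the pattern that points at the node rather than the app.
func loopingNodes(tallies map[string]nodeTally) []string {
	var nodes []string
	for node, t := range tallies {
		if t.Looping > 0 && t.Looping == t.Total {
			nodes = append(nodes, node)
		}
	}
	sort.Strings(nodes)
	return nodes
}

func main() {
	// Sample input mirroring the two replicas above.
	listing := `NODE     REASON
node-3   CrashLoopBackOff
node-1   <none>`
	fmt.Println(loopingNodes(tallyByNode(listing))) // [node-3]
}
```

If every node comes back in the list, the node theory is out. The app fails everywhere, so go back to the exit code and logs.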
The causes, ranked
- 1 The app exits during startup: missing env var, bad config, unreachable dependency
confirm kubectl logs --previous shows a panic, stack trace, or fatal log line within the first seconds; describe shows Exit Code 1 and a Started→Finished gap of about a second.
- 2 A liveness probe is killing a working or slow-starting container
confirm Events show "Liveness probe failed" then "Killing" before each restart; exit code 143 (SIGTERM).
- 3 OOMKilled — the memory limit is below the real working set
confirm Last State shows Reason: OOMKilled, Exit Code: 137.
- 4 Bad command, args, or entrypoint
confirm Exit code 127 ("executable file not found") or 126 (found but not executable); --previous logs are empty.
The fixes
Fix the config or secret it’s missing (kubectl describe also surfaces failed volume and secret mounts). For dependency-at-boot crashes, make startup retry with backoff instead of exiting: pods restart in arbitrary order on every node drain, and an app that dies when the database is briefly away turns every minor blip into a crash loop.
Give startup its own budget with a startupProbe (the liveness probe doesn’t run until it passes), or raise initialDelaySeconds. Keep the liveness endpoint cheap and dependency-free: a /healthz that checks the database converts every database hiccup into a restart storm across the fleet.
Measure the real working set, then raise the memory limit or fix the runtime’s heap sizing. This is its own investigation: see the OOMKilled page in the related links.
Compare the pod’s command/args against the image’s ENTRYPOINT and CMD (kubectl get pod -o yaml, then docker image inspect or crane config on the image). A command: in the pod spec replaces the image entrypoint entirely, and the image’s CMD is dropped along with it. This is a common surprise when only args was intended.
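The Kubernetes rules for combining the two are mechanical, so here they are as a small Go function. The four cases follow the documented behavior. The image entrypoint, CMD, and flag in main are hypothetical values chosen to show the "command: instead of args:" mistake.

```go
package main

import "fmt"

// effectiveArgv returns the argv a container actually runs:
//   - neither command nor args: ENTRYPOINT + CMD
//   - command only: command (ENTRYPOINT and CMD are both ignored)
//   - args only: ENTRYPOINT + args (CMD is ignored)
//   - both: command + args
func effectiveArgv(command, args, entrypoint, cmd []string) []string {
	var out []string
	switch {
	case len(command) == 0 && len(args) == 0:
		out = append(out, entrypoint...)
		out = append(out, cmd...)
	case len(args) == 0:
		out = append(out, command...)
	case len(command) == 0:
		out = append(out, entrypoint...)
		out = append(out, args...)
	default:
		out = append(out, command...)
		out = append(out, args...)
	}
	return out
}

func main() {
	entrypoint := []string{"/app/api"}                     // hypothetical image ENTRYPOINT
	cmd := []string{"--config", "/etc/api/config.yaml"} // hypothetical image CMD
	// Meant to add a flag, but set command: instead of args:
	fmt.Println(effectiveArgv([]string{"--verbose"}, nil, entrypoint, cmd))
	// What was intended:
	fmt.Println(effectiveArgv(nil, []string{"--verbose"}, entrypoint, cmd))
}
```

The first call runs just `--verbose` as the executable, which is exactly the kind of bad command that shows up as exit 127. The second keeps the entrypoint but also drops the image's CMD, so the --config flag has to be repeated in args if it is still needed.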
What people get wrong
- CrashLoopBackOff is a state, not an error. Nothing is "stuck". The kubelet is deliberately waiting — up to five minutes between attempts — so a pod that looks idle in this state is just between retries. The thing to debug is whatever the exit code and previous logs say, never the back-off itself.
- kubectl logs without --previous reads the wrong container. The default shows the current (often just-restarted, still-empty) container. The crash you care about is in the previous one. This single flag is the difference between "no logs, weird" and the actual stack trace.
- Jobs don’t crash-loop the same way. A Deployment’s pods restart in place under restartPolicy: Always. A Job’s pods must use OnFailure or Never: with Never, each failure produces a new pod, and with OnFailure the container restarts inside the same pod. Either way, failures count against backoffLimit, so advice and symptoms don’t transfer one-to-one between the two.
Quick answers
What does CrashLoopBackOff actually mean?
The container exits shortly after starting, every time, and the kubelet is applying an exponential back-off (capped at five minutes) between restart attempts. It is a status describing the restart loop — the underlying error is in the container’s exit code and its previous logs.
How do I see the logs of the crashed container?
kubectl logs <pod> --previous. Without --previous you get the freshly restarted container, which usually has not logged anything yet. If --previous is empty too, the process died before writing a line — check the exit code in kubectl describe pod for 127/126 (bad command) or 137 (SIGKILL).
Is CrashLoopBackOff always the application’s fault?
No. Two common non-app causes: a liveness probe that kills a healthy-but-slow container (events show "Liveness probe failed" before each restart), and an OOM kill from a memory limit set too low (Reason: OOMKilled, exit 137). Check both in kubectl describe before touching code.