Errors, decoded.
The strings you paste into a search engine at the worst possible moment. Each page takes one of them apart the way a senior engineer would at the keyboard, with the real output on screen and the commands that narrow it down. No "have you tried restarting", just the actual investigation, written down.
How these pages work
An error string is a symptom wearing a name badge. CrashLoopBackOff doesn't tell you what crashed, exit 137 doesn't tell you who sent the kill, and "connection reset by peer" doesn't tell you which peer — or whether it was a peer at all. So every page here has the same spine: the symptom as you actually see it, a short diagnosis where each command's output routes you to the next step, the causes ranked by how often they're the answer, and a concrete fix per cause. Plus the two or three things people reliably get wrong, because those cost more hours than the errors themselves.
If you're working through the apprenticeship, this shelf is where the build task sends you when a project breaks — which it will, on schedule. ECONNRESET and EADDRINUSE are practically part of that task's syllabus.
The errors
- 01
CrashLoopBackOff
Your container keeps exiting and the kubelet is waiting longer between restart attempts. The status names the loop, not the crash — the real error is in the previous container’s exit code and logs.
- 02
OOMKilled
The kernel’s OOM killer shot your container for exceeding its cgroup memory limit. The node can have gigabytes free and this still happens — the limit is the whole world as far as the cgroup is concerned.
- 03
ImagePullBackOff
The kubelet can’t pull the image and is backing off between attempts. The actual reason — bad tag, auth, rate limit, or network — is spelled out verbatim in the pod’s events, one kubectl describe away.
- 04
connection reset by peer
Something on the path sent a TCP RST instead of a polite close — usually an idle-timeout mismatch in a connection pool, a crashed peer, or a middlebox that forgot your connection. The investigation is finding who sent it.
- 05
context deadline exceeded
A context’s timer ran out before the operation finished. The message tells you a budget was blown, not where — the work is finding which hop spent the time, and whether the server ever even saw the request.
- 06
too many open files
The process hit its file-descriptor limit. Since every socket, pipe, and epoll instance is an fd, this is usually a connection leak or a default 1024 limit on a busy server — rarely about actual files.
- 07
address already in use
bind() failed because something owns the port: either a live process you can name with one ss command, or the ghost of your own previous instance lingering in TIME_WAIT because the listener didn’t set SO_REUSEADDR.
- 08
exit code 137
137 is 128 + 9: the process was killed by SIGKILL. The OOM killer is the most famous sender, but stop-timeout escalations, CI runners, and orchestrators send the very same signal, so check the OOMKilled flag before buying memory.
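On the OOMKilled card's point that the cgroup limit, not free node memory, is what matters: under cgroup v2 the cgroup keeps its own count. Its memory.events file has an oom_kill line that goes up each time the OOM killer kills a process in that cgroup. A minimal Python sketch, assuming cgroup v2 and a cgroup directory you supply yourself (the path and the helper names are hypothetical, and the sample counters are made up for illustration):

```python
from pathlib import Path


def parse_memory_events(text: str) -> dict[str, int]:
    """Parse cgroup v2 memory.events ("key value" per line) into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kills(cgroup_dir: str) -> int:
    """Return the oom_kill counter for one cgroup.

    Assumes cgroup v2; cgroup_dir is whichever cgroup directory your
    container runs in (a hypothetical path you must supply).
    """
    text = Path(cgroup_dir, "memory.events").read_text()
    return parse_memory_events(text).get("oom_kill", 0)


# Sample text in the memory.events format; the numbers are illustrative only.
sample = "low 0\nhigh 0\nmax 12\noom 1\noom_kill 1\n"
print(parse_memory_events(sample)["oom_kill"])  # 1
```

A non-zero oom_kill in the container's own cgroup points at the limit. A zero there, with the node still killing things, sends you looking at node-level pressure instead.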
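For the too-many-open-files card, the two numbers worth comparing are how many fds the process holds and its RLIMIT_NOFILE soft limit. A minimal Linux-only sketch (it reads /proc/self/fd; the function names and the default target are hypothetical) that counts open fds and raises the soft limit as far as the hard limit allows:

```python
import os
import resource


def open_fd_count() -> int:
    """Count this process's open fds via /proc/self/fd (Linux only).

    The directory handle that listdir opens may itself appear in the count.
    """
    return len(os.listdir("/proc/self/fd"))


def raise_nofile_soft_limit(target: int = 65536) -> tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward target; return (soft, hard) afterwards.

    An unprivileged process may raise its soft limit up to, but never past,
    the hard limit. The limit is never lowered here.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("open fds:", open_fd_count())
print("soft/hard:", resource.getrlimit(resource.RLIMIT_NOFILE))
```

Raising the limit only buys headroom. If the fd count keeps climbing under steady load, that points at the connection leak the card mentions. For long-running services the limit is more commonly set outside the code, for example with `ulimit -n` in the launching shell or `LimitNOFILE=` in a systemd unit.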
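For the address-already-in-use card: SO_REUSEADDR has to be set on the listening socket before bind() for it to help with the TIME_WAIT case. A minimal Python sketch (make_listener is a hypothetical helper; port 0 asks the kernel for any free port):

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a TCP listener that can rebind while old connections sit in TIME_WAIT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); setting it after bind() cannot rescue that bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


listener = make_listener()
print(listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0)  # True
listener.close()
```

On Linux this covers only the TIME_WAIT branch. It does not let a second process bind over a live listener; that case still fails with EADDRINUSE, and the ss command from the card names the owner.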
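The 128 + N arithmetic on the exit code 137 card works for any signal, which helps tell a SIGKILL (137) apart from a plain SIGTERM stop (143 = 128 + 15). A minimal Python sketch of the decoding (describe_exit_code is a hypothetical helper):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell/container-style exit code.

    Shells and container runtimes report "killed by signal N" as 128 + N,
    so anything above 128 is read as a signal number.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL (signal 9)
print(describe_exit_code(143))  # killed by SIGTERM (signal 15)
print(describe_exit_code(1))    # exited with status 1
```

This is a convention, not proof: a program can call exit(137) on its own, and Python's subprocess module reports signal deaths as a negative returncode instead. That is why the card says to check the OOMKilled flag rather than trust the number alone.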
Where to go deeper
These pages stop where the specialist material starts. The Linux codex carries the full investigations behind most of them — a held port, a growing process, a suspect network — and the Kubernetes codex covers the machinery (kubelet, probes, the pod lifecycle) that produces half of these statuses in the first place.