
June 1, 2026 · 8 min read · kubernetes

CrashLoopBackOff: What It Means, Why It Happens, and How to Fix It Fast

If you run Kubernetes long enough, you will meet CrashLoopBackOff. It is one of the most common pod states in production — and one of the most misunderstood. The good news: it is almost always one of six causes, and once you know which one, the fix takes minutes.

What CrashLoopBackOff actually means

CrashLoopBackOff is not an error in itself. It is Kubernetes telling you: "I started your container, it exited, so I restarted it — and it exited again. I am now waiting longer between restarts before I try once more."

That waiting is the "BackOff" part. The kubelet uses an exponential delay (10s, 20s, 40s, doubling up to a cap of 5 minutes) so a crashing container does not hammer your node. The pod itself is fine from the scheduler's point of view: it stays bound to its node, and it is the application inside it that keeps dying.
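To see how quickly those waits add up, here is the schedule as a small Python function. It is a sketch that assumes the default 10-second start and 5-minute cap described here; it models the pattern, not the kubelet's actual code.

```python
# Sketch of the restart back-off: start at 10s, double after each crash,
# never exceed 300s. These defaults are assumptions; check your cluster version.

def backoff_delays(restarts: int, initial: int = 10, cap: int = 300) -> list[int]:
    """Return the wait in seconds before each of the first `restarts` restarts."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double, but stay at or below the cap
    return delays


if __name__ == "__main__":
    waits = backoff_delays(8)
    print(waits)       # [10, 20, 40, 80, 160, 300, 300, 300]
    print(sum(waits))  # 1210 seconds, roughly 20 minutes for 8 restarts
```

After the fifth restart every further attempt waits the full 5 minutes. The kubelet also resets this timer once a container has run for 10 minutes without problems, so an app that crashes only occasionally gets restarted quickly each time.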

So the question is never "what is CrashLoopBackOff" — it is "why does my container keep exiting?"

The 6 most common causes

1. Application error on startup. The process starts, throws, and exits non-zero. The classic version is a missing environment variable or an unreachable dependency — the app tries to connect to a database that is not ready and calls process.exit(1).

2. OOMKilled. The container exceeds its memory limit and the kernel kills it (exit code 137). Kubernetes restarts it, it grows again, and the loop repeats. This one is sneaky because the application logs often show nothing — the process is killed before it can write a clean shutdown.

3. Liveness probe failing. If your liveness probe is too aggressive (too short a timeoutSeconds, or an initialDelaySeconds and failureThreshold that run out before a slow startup finishes), Kubernetes will kill a container that was actually fine and never let it finish booting.

4. Wrong command or entrypoint. A typo in command: or args:, or an image whose entrypoint exits immediately (a script that runs once and returns), produces an instant exit every time.

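5. Missing ConfigMap or Secret. Strictly, a reference to an object that does not exist usually stops the container before it ever runs: a missing volume source leaves the pod in ContainerCreating with FailedMount events, and a missing env reference shows CreateContainerConfigError. A true crash loop appears when the config exists but is incomplete, for example a key absent from an envFrom import, so the app starts, cannot read the value it expects, and exits.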

6. Init container failing. If an init container never succeeds, the main container never starts. The pod shows Init:CrashLoopBackOff — easy to miss if you only glance at the status.
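Cause #1 is often avoidable in the app itself: instead of calling process.exit(1) on the first failed connection, retry with a growing delay and exit only once the dependency is clearly gone. The example above is Node, but the pattern is language-neutral, so here is a minimal Python sketch. `connect` is a hypothetical stand-in for your database client, and treating ConnectionError as "not ready yet" is an assumption (real drivers raise their own exception types).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure.

    `connect` is a hypothetical stand-in for your database client's connect
    call; ConnectionError is assumed to mean "dependency not ready yet".
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: now exiting non-zero is the right call
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    calls = 0

    def flaky_db() -> str:
        # Hypothetical dependency that is unavailable for its first two calls.
        global calls
        calls += 1
        if calls < 3:
            raise ConnectionError("database not ready")
        return "connected"

    print(connect_with_retry(flaky_db, sleep=lambda s: None), "after", calls, "calls")
```

The injectable `sleep` is only there so the sketch can be tested without real waiting. With the defaults, the app tolerates about 7.5 seconds of downtime before giving up, which keeps a briefly late database from turning into a restart loop.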

How to diagnose each

The single most useful command is reading the logs from the previous crashed instance:

kubectl logs <pod> --previous

The --previous flag is the trick — by the time you look, the current container may have just restarted and have no output yet. The previous instance holds the stack trace.
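When you script this, the flags compose: -n picks the namespace, -c picks the container (needed for multi-container pods and for init containers), and --previous picks the last terminated instance. Here is a small Python helper that builds and runs the command; the pod and container names in the usage line are hypothetical.

```python
import subprocess
from typing import List, Optional


def logs_command(pod: str, container: Optional[str] = None,
                 namespace: Optional[str] = None, previous: bool = True) -> List[str]:
    """Build the kubectl logs command for a pod, defaulting to the crashed instance."""
    cmd = ["kubectl", "logs", pod]
    if namespace:
        cmd += ["-n", namespace]
    if container:
        cmd += ["-c", container]  # required for multi-container pods and init containers
    if previous:
        cmd.append("--previous")  # last terminated instance, where the stack trace is
    return cmd


def previous_logs(pod: str, **kwargs) -> str:
    """Run the command; raises CalledProcessError if kubectl fails."""
    result = subprocess.run(logs_command(pod, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # "api-7d9f" and "migrate" are hypothetical names.
    print(" ".join(logs_command("api-7d9f", container="migrate")))
```

Because of check=True, previous_logs raises if there is no earlier instance yet (for example, before the first restart), which is usually the point to fall back to the current logs.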

Then describe the pod to see events and the exit reason:

kubectl describe pod <pod>

Look at the Last State block. Reason: OOMKilled and Exit Code: 137 point straight at cause #2. Reason: Error with a non-zero exit code points at #1 or #4. A Liveness probe failed event points at #3.
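The Last State block is also available as structured data, which helps if you want this triage in a script: kubectl get pod <pod> -o json exposes it under status.containerStatuses[].lastState.terminated, with init containers under status.initContainerStatuses. Below is a minimal Python sketch. It assumes that JSON has already been loaded into a dict, and the cause numbers refer to the list above. The mapping is this post's rule of thumb, not an exhaustive classifier. It also decodes exit codes using the shell convention that a status above 128 means "killed by signal (status - 128)", so 137 is signal 9, SIGKILL.

```python
import signal
from typing import List


def describe_exit(code: int) -> str:
    """Exit statuses above 128 mean "killed by signal (code - 128)": 137 is SIGKILL."""
    if code > 128:
        try:
            return f"exit {code} = {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exit {code}"


def triage(pod: dict) -> List[str]:
    """Map each container's Last State to the likely cause from this post's list."""
    status = pod.get("status", {})
    groups = (
        ("init", status.get("initContainerStatuses", [])),
        ("app", status.get("containerStatuses", [])),
    )
    findings = []
    for kind, statuses in groups:
        for cs in statuses:
            term = cs.get("lastState", {}).get("terminated")
            if not term:
                continue  # no previous termination recorded for this container
            reason = term.get("reason", "")
            code = term["exitCode"]
            if kind == "init":
                cause = "#6 init container failing"
            elif reason == "OOMKilled":
                cause = "#2 OOMKilled"
            elif code == 0:
                # A clean exit still loops: Deployments require restartPolicy: Always.
                cause = "#4 entrypoint exits immediately"
            else:
                cause = "#1 or #4 (check events for a liveness failure, #3)"
            findings.append(f"{cs['name']}: {reason} ({describe_exit(code)}) -> {cause}")
    return findings


if __name__ == "__main__":
    sample = {"status": {"containerStatuses": [{
        "name": "api",
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
    }]}}
    print(triage(sample))
```

A 137 on its own is not proof of an OOM kill, because anything that sends SIGKILL produces it. That is why the sketch trusts Reason: OOMKilled rather than the exit code.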

Recent events in the namespace, sorted by time, catch scheduling and config problems (add -A to cover every namespace):

kubectl get events --sort-by=.lastTimestamp

A FailedMount or "couldn't find key" event points at #5. An Init:Error or Init:CrashLoopBackOff status in kubectl get pods points at #6.
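The event checks can be scripted the same way against kubectl get events -o json, where each item carries reason, message, lastTimestamp and involvedObject.name. Below is a sketch under those assumptions. The pattern list holds only the signals named in this post and is meant to be extended. Liveness failures are assumed to arrive with reason Unhealthy and a message starting "Liveness probe failed".

```python
import json
from typing import List, Optional, Tuple

# (event reason, message fragment, cause); None means "match anything".
PATTERNS: List[Tuple[Optional[str], Optional[str], str]] = [
    ("FailedMount", None, "#5 missing ConfigMap/Secret"),
    (None, "couldn't find key", "#5 missing ConfigMap/Secret key"),
    ("Unhealthy", "Liveness probe failed", "#3 liveness probe"),
]


def match_events(events_json: str, pod: str) -> List[str]:
    """Scan `kubectl get events -o json` output for known signals about one pod."""
    items = json.loads(events_json).get("items", [])
    # Oldest first, like --sort-by=.lastTimestamp; some events leave it null.
    items.sort(key=lambda ev: ev.get("lastTimestamp") or "")
    hits = []
    for ev in items:
        if ev.get("involvedObject", {}).get("name") != pod:
            continue
        reason = ev.get("reason") or ""
        message = ev.get("message") or ""
        for want_reason, fragment, cause in PATTERNS:
            if want_reason is not None and reason != want_reason:
                continue
            if fragment is not None and fragment not in message:
                continue
            hits.append(f"{reason}: {cause}")
            break  # first matching pattern wins
    return hits


if __name__ == "__main__":
    sample = json.dumps({"items": [
        {"involvedObject": {"name": "api"}, "reason": "FailedMount",
         "message": 'configmap "app-config" not found',
         "lastTimestamp": "2026-06-01T09:00:00Z"},
    ]})
    print(match_events(sample, "api"))  # ['FailedMount: #5 missing ConfigMap/Secret']
```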

How to fix each

  • Startup error / missing env: set the variable in the deployment, or fix the dependency ordering. For "database not ready," add an init container that waits for the database, or retry-with-backoff in the app, rather than crashing.
  • OOMKilled: raise the memory limit to match real usage plus ~20% headroom (see our OOMKilled deep-dive). For the JVM or Node, also size the heap so it fits inside the container limit (for example -XX:MaxRAMPercentage for the JVM, --max-old-space-size for Node).
  • Liveness probe: increase initialDelaySeconds and failureThreshold, or add a startupProbe so slow boots are not mistaken for crashes.
  • Wrong command: correct command/args, or test the image locally with docker run to confirm the entrypoint stays alive.
  • Missing ConfigMap/Secret: create the object, or fix the name/key reference in the volume or envFrom.
  • Init container: read its logs with kubectl logs <pod> -c <init-container> and fix the failing step.
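Two of these fixes are plain arithmetic that is easy to get slightly wrong, so here they are as a minimal Python sketch. It makes three assumptions. First, peak memory is a number you have already measured (kubectl top pod shows current usage, not peak, and needs metrics-server). Second, the probe budget uses the rough rule initialDelaySeconds + failureThreshold × periodSeconds, the same arithmetic the Kubernetes docs use in their startupProbe example. Third, the 1.5× boot margin is an arbitrary choice, not a Kubernetes default.

```python
import math


def memory_limit(peak_mib: int, headroom_pct: int = 20) -> str:
    """Observed peak plus headroom, as a Kubernetes quantity string in Mi."""
    if peak_mib <= 0:
        raise ValueError("peak_mib must be positive")
    # Integer ceiling division, so 512 MiB * 1.2 = 614.4 rounds up to 615.
    return f"{(peak_mib * (100 + headroom_pct) + 99) // 100}Mi"


def liveness_budget(initial_delay: int = 0, period: int = 10, failure_threshold: int = 3) -> int:
    """Rough seconds a container that never passes the probe gets before being killed.

    Approximation: initialDelaySeconds + failureThreshold * periodSeconds.
    The defaults match Kubernetes' documented probe defaults.
    """
    return initial_delay + failure_threshold * period


def startup_failure_threshold(boot_seconds: int, period: int = 10, margin: float = 1.5) -> int:
    """failureThreshold for a startupProbe that covers the boot time with some margin."""
    return math.ceil(boot_seconds * margin / period)


if __name__ == "__main__":
    print(memory_limit(512))              # 615Mi
    print(liveness_budget() < 90)         # True: a 90s boot outlasts the default ~30s
    print(startup_failure_threshold(90))  # 14, i.e. up to 140s of startup grace
```

A startupProbe is the cleaner fix of the two probe options, because liveness checks do not begin until it succeeds. The liveness probe can then stay strict for the rest of the container's life.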

How to prevent it

Most CrashLoopBackOff incidents are preventable with three habits:

  1. Always set resource requests and limits. This prevents the noisy-neighbour OOM cascade and forces you to know your app's real footprint.
  2. Use a startupProbe for slow-starting apps. Separate "is it still booting" from "is it healthy" so liveness never kills a cold start.
  3. Validate config at deploy time. A missing Secret should fail your CI, not your pod at 3am.
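For habit 3, a CI step can compare what a manifest references against what exists in the target namespace. Here is a minimal Python sketch. It assumes the pod spec has already been loaded into a dict (for example, a Deployment's spec.template.spec) and that the list of existing objects comes from something like kubectl get configmap,secret -o name. It checks object names only, skips projected volumes, and treats optional references as required. The object names in the usage are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Ref = Tuple[str, str]  # (kind, name)


def referenced_config(pod_spec: dict) -> Set[Ref]:
    """Collect the ConfigMaps and Secrets a pod spec refers to by name."""
    refs: Set[Ref] = set()
    for vol in pod_spec.get("volumes", []):
        if "configMap" in vol:
            refs.add(("ConfigMap", vol["configMap"]["name"]))
        if "secret" in vol:
            refs.add(("Secret", vol["secret"]["secretName"]))
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for c in containers:
        for src in c.get("envFrom", []):
            if "configMapRef" in src:
                refs.add(("ConfigMap", src["configMapRef"]["name"]))
            if "secretRef" in src:
                refs.add(("Secret", src["secretRef"]["name"]))
        for env in c.get("env", []):
            value_from = env.get("valueFrom", {})
            if "configMapKeyRef" in value_from:
                refs.add(("ConfigMap", value_from["configMapKeyRef"]["name"]))
            if "secretKeyRef" in value_from:
                refs.add(("Secret", value_from["secretKeyRef"]["name"]))
    return refs


def parse_existing(kubectl_names: str) -> Set[Ref]:
    """Parse `kubectl get configmap,secret -o name` output, e.g. "secret/app-secrets"."""
    kinds: Dict[str, str] = {"configmap": "ConfigMap", "secret": "Secret"}
    existing: Set[Ref] = set()
    for line in kubectl_names.splitlines():
        kind, _, name = line.strip().partition("/")
        if kind in kinds and name:
            existing.add((kinds[kind], name))
    return existing


def missing_config(pod_spec: dict, existing: Set[Ref]) -> List[Ref]:
    """References with no matching object: fail the CI job if this is non-empty."""
    return sorted(referenced_config(pod_spec) - existing)


if __name__ == "__main__":
    spec = {
        "volumes": [{"name": "cfg", "configMap": {"name": "app-config"}}],
        "containers": [{
            "name": "api",
            "env": [{"name": "SECRET_KEY",
                     "valueFrom": {"secretKeyRef": {"name": "app-secrets", "key": "SECRET_KEY"}}}],
        }],
    }
    print(missing_config(spec, parse_existing("configmap/app-config\n")))
```

Here the missing Secret surfaces as a failed pipeline step instead of a pod that never becomes ready.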

How Tracegrid handles it

When a pod enters CrashLoopBackOff, Tracegrid runs the six-way diagnosis automatically. It reads the previous container logs and the pod events, matches them against its failure-pattern library, and tells you which of the six causes it is ("missing SECRET_KEY environment variable," not just "CrashLoopBackOff"). Then it gives you the exact kubectl command to fix it: the work in this article, done in about 40 seconds, before you have finished reading the alert.

Written by Pradip — founder of Tracegrid, building AI infrastructure intelligence so small teams get senior-SRE answers at 3am.
