
July 8, 2026 · 10 min read · kubernetes

Pod Evicted: Why Kubernetes Kills Pods That Aren't Even Using Much Memory

Your pod isn't Running. It isn't CrashLoopBackOff. kubectl get pods shows a status you don't see every day: Evicted. And the part that makes no sense is that the app inside it was barely using any memory. So why did Kubernetes kill it?

Because eviction has almost nothing to do with what your pod was doing. It's the kubelet protecting the node. When a node runs low on disk, memory, or process IDs, the kubelet doesn't wait for things to fall over. It starts terminating pods to reclaim resources, and the pod it kills is frequently not the one causing the problem. Understanding that one fact is most of the fix.

Evicted is not OOMKilled

These get conflated constantly, and they are two different mechanisms with two different fixes.

  • OOMKilled is the Linux kernel killing a container's process because that container exceeded its memory cgroup limit. It's a hard kill (SIGKILL, exit code 137), the container restarts in place, and the cause lives inside the pod.
  • Evicted is the kubelet terminating one or more pods because the node crossed a resource threshold: immediately for a hard threshold, with a grace period for a soft one. The pod object sticks around showing Evicted, it does not restart in place (its controller creates a fresh pod, usually on another node), and the cause is the node's total resource picture, not any one container's limit.

The tell is in the status itself. OOMKilled shows up under a running pod's container state with a restart count climbing. Evicted shows up as a separate dead pod object:

kubectl get pods
NAME            READY   STATUS     RESTARTS   AGE
api-7d9f-abc    0/1     Evicted    0          6m

Zero restarts, status Evicted. The pod was shown the door, not crashed.
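
If you'd rather check this in a script than by eye, here is a minimal Python sketch of the same distinction. It assumes you've loaded the output of kubectl get pod <pod> -o json into a dict; the function name classify_termination is made up for illustration.

```python
def classify_termination(pod: dict) -> str:
    """Return 'Evicted', 'OOMKilled', or 'Other' for a pod object given as a dict."""
    status = pod.get("status", {})

    # Eviction is recorded on the pod itself: phase Failed, reason Evicted.
    if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
        return "Evicted"

    # An OOM kill is recorded per container, in its current or previous state.
    for cs in status.get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = (cs.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                return "OOMKilled"

    return "Other"
```

The point of the split is where each signal lives: eviction is a pod-level status, while an OOM kill is a container-level one.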

Read the eviction message first

describe tells you exactly which resource ran out:

kubectl describe pod api-7d9f-abc

Look at the top:

Status:   Failed
Reason:   Evicted
Message:  The node was low on resource: ephemeral-storage. Container api was using 4Gi, which exceeds its request of 0.

or

Message:  The node was low on resource: memory. Threshold quantity: 100Mi, available: 87Mi.

That low on resource: line is the whole diagnosis. It names one of three things: ephemeral-storage (disk), memory, or pids. Everything after follows from which one it is.
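
When you have dozens of evicted pods, pulling the resource name out of each message by hand gets old. A small sketch that does it with a regular expression, assuming the message follows the "The node was low on resource:" wording shown above (the helper name evicted_resource is hypothetical, and messages with any other wording simply return None):

```python
import re

# Matches the resource name that follows the standard eviction prefix.
_LOW_ON = re.compile(r"The node was low on resource: ([A-Za-z-]+)")


def evicted_resource(message: str):
    """Return the resource named in an eviction message, or None if absent."""
    match = _LOW_ON.search(message)
    return match.group(1) if match else None


print(evicted_resource(
    "The node was low on resource: memory. "
    "Threshold quantity: 100Mi, available: 87Mi."
))
```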

Then look at the node:

kubectl describe node <node-name>

Scroll to Conditions. Under pressure you'll see one of these flip to True:

Type             Status
MemoryPressure   False
DiskPressure     True
PIDPressure      False

DiskPressure: True plus an ephemeral-storage eviction message is the most common combination in the wild. Here are the real causes.
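
The same check can be scripted against kubectl get node <node-name> -o json. This is a rough sketch, not a monitoring tool: it assumes the node object is already parsed into a dict, and the name active_pressure is invented for the example.

```python
PRESSURE_TYPES = ("MemoryPressure", "DiskPressure", "PIDPressure")


def active_pressure(node: dict) -> list:
    """List the pressure conditions currently reporting status "True"."""
    conditions = node.get("status", {}).get("conditions", [])
    return [
        c["type"]
        for c in conditions
        # Condition status is the string "True", not a boolean.
        if c.get("type") in PRESSURE_TYPES and c.get("status") == "True"
    ]
```

Run it across every node and you get a quick map of where the pressure actually is, which is usually more useful than staring at the evicted pods.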

The five real causes

1. The node disk filled up (DiskPressure)

This is the single most common eviction in production, and it usually has nothing to do with your application's data. The node's disk fills with container logs, old images, and emptyDir volumes, the kubelet's nodefs.available drops below its threshold (default 10%), DiskPressure flips to True, and pods start getting evicted to reclaim space.

The cruel part: the kubelet evicts pods that exceed their ephemeral-storage request first, and if no pod set one (most don't), it evicts by total usage. So the pod that logs the most, or writes the most to an emptyDir, gets killed, even if that's a perfectly well behaved app.

Check the node's actual disk:

kubectl get --raw "/api/v1/nodes/<node>/proxy/stats/summary" | grep -A5 '"fs"'

or, if you can reach the node, df -h /var/lib/kubelet and crictl images to see how much old image data is sitting there.
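
To compare that output against the 10% threshold without doing arithmetic in your head, something like the following works. It's a sketch that assumes the stats summary JSON has been parsed into a dict and that node.fs carries availableBytes and capacityBytes, which is how the summary API reports the node filesystem as far as I know; the function names are made up.

```python
def nodefs_available_percent(summary: dict) -> float:
    """Percentage of the node filesystem still available, per the stats summary."""
    fs = summary["node"]["fs"]
    return 100.0 * fs["availableBytes"] / fs["capacityBytes"]


def under_disk_threshold(summary: dict, threshold_percent: float = 10.0) -> bool:
    """True when available space is below the threshold (10% is the default
    mentioned above; pass your own value if the kubelet is configured differently)."""
    return nodefs_available_percent(summary) < threshold_percent
```

Treat the result as a hint rather than ground truth: the kubelet makes its own measurement, and a node configured with a separate image filesystem has a second threshold this sketch ignores.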

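Fix: image garbage collection is on by default, so tune it rather than enable it (lower --image-gc-high-threshold, or imageGCHighThresholdPercent in the kubelet config file, from its default of 85), rotate container logs, and give noisy emptyDir volumes a sizeLimit so one pod can't eat the whole node.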

2. No ephemeral-storage requests or limits

A pod that writes gigabytes to its container filesystem or an emptyDir (think a job that downloads a dataset, or an app logging verbosely to disk) with no ephemeral-storage limit will happily fill the node, trigger DiskPressure, and get itself evicted, often taking innocent neighbors down with it.

Fix: declare what your pod actually needs, and cap it:

resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"

With a limit set, a pod that blows past 2Gi is evicted on its own, before it can pressure the whole node. The blast radius shrinks to the one misbehaving pod.

3. Memory pressure and the QoS pecking order

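When memory.available on the node drops below the threshold (default 100Mi), the kubelet evicts to reclaim memory. It ranks candidates by whether their memory usage exceeds their request, then by pod priority, then by how far usage exceeds the request. The kubelet doesn't read the QoS class directly, but the ranking works out to roughly this pecking order:

  1. BestEffort pods (no requests or limits set at all). Any usage exceeds a request of zero, so they are always in the first wave.
  2. Burstable pods that are using more than their memory request. They share that first wave, ordered by priority and then by how far over their request they are.
  3. Guaranteed pods (requests equal limits) and Burstable pods still under their request, last.

This is why a tiny, idle pod gets evicted while the memory hog survives: the idle pod was BestEffort and the hog was Guaranteed. Eviction order is about usage relative to requests, not about who caused the pressure.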
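
Here is a deliberately simplified Python model of that outcome. It is not the kubelet's code: it assumes the ranking is "pods using more than their memory request first, then lower priority, then largest overage", and the PodMem type and eviction_order function are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class PodMem:
    name: str
    usage_mi: int
    request_mi: int = 0  # 0 models a pod with no memory request (BestEffort)
    priority: int = 0


def eviction_order(pods):
    """Return pod names in the order this simplified model would evict them."""
    def key(p):
        over = p.usage_mi - p.request_mi
        return (
            0 if over > 0 else 1,  # pods over their request are considered first
            p.priority,            # then lower-priority pods
            -over,                 # then the largest overage
        )
    return [p.name for p in sorted(pods, key=key)]


pods = [
    PodMem("hog", usage_mi=3000, request_mi=4096),  # Guaranteed, under request
    PodMem("idle", usage_mi=20),                    # BestEffort, no request
]
print(eviction_order(pods))  # ['idle', 'hog']
```

The idle pod uses 20Mi against a request of zero, so it counts as over its request. The hog uses 3000Mi but asked for 4096Mi, so it doesn't.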

Check a pod's class:

kubectl get pod <pod> -o jsonpath='{.status.qosClass}'

If it says BestEffort, that pod is first against the wall on any node under memory pressure.
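
You can also predict the class before deploying. The sketch below derives it from the containers section of a manifest, under some loud assumptions: it looks only at cpu and memory, compares quantities as plain strings (so "1Gi" and "1024Mi" count as different), and ignores init containers. The function name qos_class is hypothetical.

```python
def qos_class(containers: list) -> str:
    """Estimate the QoS class from dicts shaped like spec.containers[*]."""
    any_set = False
    guaranteed = True
    for c in containers:
        res = c.get("resources") or {}
        requests = res.get("requests") or {}
        limits = res.get("limits") or {}
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(name, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Note what doesn't count: a pod that sets only an ephemeral-storage request is still BestEffort by this logic, because only cpu and memory feed the class.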

4. No resource requests means BestEffort means first to die

Following from #3: the most common reason an unremarkable pod keeps getting evicted is that nobody set requests on it, so it's BestEffort by default. The fix is the cheapest one in Kubernetes: set a memory request. Even a modest one promotes the pod to Burstable, and as long as its usage stays under that request, it moves out of the first eviction wave.

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"

Setting realistic requests does two jobs at once: the scheduler stops overpacking the node, and your pods stop being the kubelet's first target.

5. The eviction cascade

This is the one that turns a small problem into an outage. A node hits DiskPressure and evicts a handful of pods. Their controllers create replacements, and the scheduler places them on another node. That node was already near its limit, so the new arrivals push it over, and it starts evicting. Repeat across the cluster.

Meanwhile, every evicted pod leaves behind a dead Evicted object. Nothing cleans these up until the cluster-wide terminated-pod garbage collection threshold is reached (12500 pods by default), so kubectl get pods slowly fills with hundreds of ghosts:

kubectl get pods -A | grep Evicted | wc -l

Clean them up so the noise doesn't hide the live failures (note this removes every Failed pod, evicted or otherwise, so scan the list first):

kubectl delete pods -A --field-selector status.phase=Failed

But cleanup is cosmetic. The cascade only stops when you fix the underlying node pressure, by adding capacity, setting requests and limits so the scheduler stops overpacking, or freeing the disk.
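
Before deleting anything, it helps to see where the evictions came from, since a cascade shows up as evictions spread over several nodes rather than piled on one. A minimal sketch, assuming the output of kubectl get pods -A -o json parsed into a dict (evictions_by_node is an invented name):

```python
from collections import Counter


def evictions_by_node(pod_list: dict) -> Counter:
    """Count evicted pods per node from a parsed pod list."""
    counts = Counter()
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            # Evicted pods keep the name of the node they were running on.
            node = pod.get("spec", {}).get("nodeName", "<unknown>")
            counts[node] += 1
    return counts
```

One node with a large count points at a local problem such as a full disk. Several nodes with growing counts suggest the cascade described above.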

A fast triage order

  1. kubectl describe pod <pod> → read the Message: The node was low on resource: line. That names the resource: ephemeral-storage, memory, or pids.
  2. kubectl describe node <node> → Conditions. Which one is True: DiskPressure, MemoryPressure, PIDPressure?
  3. DiskPressure? Check node disk usage and image/log buildup. Almost always logs, images, or an uncapped emptyDir (causes #1, #2).
  4. MemoryPressure? Check the evicted pod's qosClass. BestEffort with no requests is your answer (causes #3, #4).
  5. Many pods Evicted across nodes? You're in a cascade (#5). Fix node pressure and set requests, then clean up the ghost objects.

The fastest single check: if the status is Evicted with RESTARTS: 0, stop looking inside the pod. The cause is on the node, and the describe message already told you which resource.

Why this one is so disorienting

Every other pod failure points you at the pod. CrashLoopBackOff, OOMKilled, ImagePullBackOff: the problem is in that pod, and you debug that pod. Eviction breaks the pattern. The pod that died is often blameless; the real culprit is a different pod filling the disk, or a scheduler that overpacked the node, or a logging config from six months ago. So you inspect the evicted app, find nothing wrong, and lose half an hour before thinking to run describe node and notice DiskPressure: True. The answer was never in the pod.

This is exactly the kind of incident Tracegrid is built to short-circuit. When pods start getting evicted, Tracegrid reads the node conditions and the eviction messages, identifies which resource ran out and which pod actually drove the pressure (not just which one got killed), and posts the root cause and the fix to Slack: set an ephemeral-storage limit here, clean up image buildup there, give this BestEffort pod a request. It watches the node, so you don't have to remember to.

It installs in 60 seconds, covers Kubernetes, Linux, Docker, ECS, and Azure, and there's a free tier with the AI explanations included. If you've ever debugged a perfectly healthy pod that Kubernetes evicted for someone else's mess, that's the half hour it's built to give back.

curl -sSL https://tracegrid.app/install.sh | bash, or tracegrid.app.

Written by Pradip, founder of Tracegrid, building AI infrastructure intelligence so small teams get senior-SRE answers at 3am.
