July 5, 2026 · 11 min read · kubernetes

Kubernetes Security in Production: The 10 Checks Every Cluster Should Pass

Kubernetes is secure by capability and insecure by default. Out of the box it will happily run a privileged container as root with no network policy and secrets in plain environment variables — because it prioritises "it works" over "it is locked down." These are the ten checks that close the largest gaps, drawn from the same rules the Tracegrid Advisor runs.

1. Containers running as root

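By default a container runs as whatever user its image declares, and for many images that is UID 0 (root). Without user namespaces, root in the container is root on the node, so an attacker who escapes the container is root on a host whose kernel is shared with every other pod scheduled there.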

Fix: set runAsNonRoot: true and a specific runAsUser in the pod's securityContext, and build images with a non-root user.
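To make the check concrete, here is a minimal Python sketch of how a linter might flag it. It assumes the pod spec has already been parsed from YAML into a plain dict (for example with PyYAML's yaml.safe_load). The function name is hypothetical, and this is an illustration rather than the Tracegrid Advisor's own implementation.

```python
from typing import Any


def root_risk_containers(pod_spec: dict[str, Any]) -> list[str]:
    """Return the names of containers that may end up running as UID 0."""
    pod_ctx = pod_spec.get("securityContext") or {}
    flagged = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        # A container-level setting overrides the pod-level one.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_user == 0:
            flagged.append(container["name"])  # explicitly root
        elif run_as_user is None and run_as_non_root is not True:
            # Nothing set: the image's USER decides, and that is often root.
            flagged.append(container["name"])
    return flagged


pod = {
    "securityContext": {"runAsNonRoot": True, "runAsUser": 10001},
    "containers": [
        {"name": "app"},
        {"name": "debug", "securityContext": {"runAsUser": 0}},
    ],
}
print(root_risk_containers(pod))  # ['debug']
```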

2. No resource limits

A container without memory and CPU limits can starve every other pod on the node — the "noisy neighbour" that turns one bad deploy into a node-wide outage. It is also a denial-of-service primitive.

Fix: set resources.requests and resources.limits on every container. (This also helps prevent the OOM cascades covered in our OOMKilled guide.)
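The next sketch applies the same approach to resource settings. It again assumes a pod spec parsed into a dict and uses a hypothetical function name. It treats CPU and memory requests and limits as all required, matching the fix above; adjust REQUIRED_RESOURCES if your policy differs.

```python
from typing import Any

# Both requests and limits for CPU and memory, as the fix above recommends.
REQUIRED_RESOURCES = ("cpu", "memory")


def missing_resources(pod_spec: dict[str, Any]) -> dict[str, list[str]]:
    """Map each container name to the requests/limits entries it lacks."""
    report: dict[str, list[str]] = {}
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        gaps = [
            f"{section}.{name}"
            for section in ("requests", "limits")
            for name in REQUIRED_RESOURCES
            if name not in (resources.get(section) or {})
        ]
        if gaps:
            report[container["name"]] = gaps
    return report


pod = {
    "containers": [
        {
            "name": "api",
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits": {"memory": "256Mi"},
            },
        },
        {"name": "sidecar"},
    ]
}
print(missing_resources(pod))
# {'api': ['limits.cpu'], 'sidecar': ['requests.cpu', 'requests.memory', 'limits.cpu', 'limits.memory']}
```

If a namespace has a LimitRange, default values are injected at admission. Running the check against live pods rather than source manifests avoids false positives there.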

3. Missing network policies

By default, every pod can talk to every other pod in the cluster. One compromised pod can reach your database, your internal APIs, everything — lateral movement with no friction.

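Fix: adopt a default-deny NetworkPolicy per namespace, then explicitly allow the flows you need. Start with the namespaces that touch sensitive data. NetworkPolicy objects are only enforced if your CNI plugin supports them; with a plugin that does not, they are accepted by the API but have no effect.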
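The sketch below does two things. It builds a default-deny policy as a Python dict (serialise it with a YAML library before applying), and it checks whether a namespace's existing policies already include one. The policy name default-deny-all and both function names are illustrative assumptions.

```python
from typing import Any


def default_deny_policy(namespace: str) -> dict[str, Any]:
    """A NetworkPolicy that blocks all ingress and egress for every pod in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],
            # No "ingress"/"egress" rule lists, so no traffic is allowed.
        },
    }


def has_default_deny(policies: list[dict[str, Any]], direction: str = "Ingress") -> bool:
    """True if some policy selects every pod and allows nothing in `direction`."""
    for policy in policies:
        spec = policy.get("spec", {})
        types = spec.get("policyTypes")
        if types is None:
            # API default: Ingress always, Egress only if egress rules exist.
            types = ["Ingress"] + (["Egress"] if "egress" in spec else [])
        # Only a literally empty podSelector is treated as "all pods" here.
        selects_all = spec.get("podSelector") == {}
        if selects_all and direction in types and not spec.get(direction.lower()):
            return True
    return False


policies = [default_deny_policy("payments")]
print(has_default_deny(policies, "Ingress"), has_default_deny(policies, "Egress"))  # True True
print(has_default_deny([], "Ingress"))  # False
```

Denying egress also blocks DNS lookups. In practice, one of the first allow rules is usually egress to the cluster DNS service.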

4. Privileged containers

A container with privileged: true has essentially the same access as a root process on the host. It is occasionally necessary (some CNI and storage plugins); it is far more often left on by accident.

Fix: remove privileged: true unless you can name exactly why it is required. Drop all capabilities and add back only the specific ones needed.
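One way to automate this check is sketched below in Python over a parsed pod spec (the function name is hypothetical). It flags privileged: true and flags any container that does not drop ALL capabilities. It also lists added-back capabilities so someone has to justify each one.

```python
from typing import Any


def privilege_findings(pod_spec: dict[str, Any]) -> list[str]:
    """Flag privileged containers and capability sets that were not reduced."""
    findings = []
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for container in containers:
        name = container["name"]
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged") is True:
            findings.append(f"{name}: privileged: true")
        caps = ctx.get("capabilities") or {}
        if "ALL" not in (caps.get("drop") or []):
            findings.append(f"{name}: capabilities.drop does not include ALL")
        # Added-back capabilities are not errors, just things to justify.
        for cap in caps.get("add") or []:
            findings.append(f"{name}: review added capability {cap}")
    return findings


pod = {
    "containers": [
        {
            "name": "web",
            "securityContext": {
                "capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]}
            },
        },
        {"name": "agent", "securityContext": {"privileged": True}},
    ]
}
for finding in privilege_findings(pod):
    print(finding)
```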

5. Overly permissive (or missing) RBAC

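A service account bound to cluster-admin means a single compromised workload can control the cluster. The subtler gap is the default service account token, which is mounted into every pod unless you opt out. It hands an attacker API credentials for free, along with any permissions someone later grants that account.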

Fix: apply least privilege with Role/RoleBinding scoped per namespace, set automountServiceAccountToken: false where the token is not needed, and audit any binding to cluster-admin.
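The audit half of this fix can be sketched in Python. It assumes bindings have been fetched as parsed dicts, for example from kubectl get clusterrolebindings,rolebindings -A -o json. The helper names and the ci-deployer-admin binding in the usage data are hypothetical. The cluster-admin binding to the system:masters group is one Kubernetes creates itself and is normally expected. The second helper encodes the documented precedence for automountServiceAccountToken: the pod spec field overrides the ServiceAccount's, and the token is mounted when neither is set.

```python
from typing import Any, Optional


def cluster_admin_grants(bindings: list[dict[str, Any]]) -> list[str]:
    """List every subject granted the cluster-admin ClusterRole.

    Works on ClusterRoleBindings and RoleBindings alike; a RoleBinding to
    cluster-admin is limited to its own namespace but still worth reviewing.
    """
    grants = []
    for binding in bindings:
        ref = binding.get("roleRef", {})
        if ref.get("kind") != "ClusterRole" or ref.get("name") != "cluster-admin":
            continue
        for subject in binding.get("subjects") or []:
            parts = (subject["kind"], subject.get("namespace"), subject["name"])
            who = "/".join(p for p in parts if p)
            grants.append(f"{binding['metadata']['name']} -> {who}")
    return grants


def token_automounted(
    pod_spec: dict[str, Any], service_account: Optional[dict[str, Any]] = None
) -> bool:
    """Whether the API token is mounted: pod field first, then ServiceAccount, else True."""
    if "automountServiceAccountToken" in pod_spec:
        return bool(pod_spec["automountServiceAccountToken"])
    if service_account and "automountServiceAccountToken" in service_account:
        return bool(service_account["automountServiceAccountToken"])
    return True


bindings = [
    {
        "metadata": {"name": "cluster-admin"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "Group", "name": "system:masters"}],
    },
    {
        "metadata": {"name": "ci-deployer-admin"},  # hypothetical
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "ServiceAccount", "name": "deployer", "namespace": "ci"}],
    },
    {
        "metadata": {"name": "view-only"},
        "roleRef": {"kind": "ClusterRole", "name": "view"},
        "subjects": [{"kind": "Group", "name": "auditors"}],
    },
]
print(cluster_admin_grants(bindings))
# ['cluster-admin -> Group/system:masters', 'ci-deployer-admin -> ServiceAccount/ci/deployer']
print(token_automounted({}))  # True
```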

6. Secrets in environment variables

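Secrets passed as env vars leak easily. They end up in logs and crash dumps when an application prints its environment, and in the environment of every child process. They also show up in kubectl describe output whenever a value is written inline rather than referenced from a Secret.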

Fix: mount secrets as files via volumes rather than env vars. Enable encryption at rest for etcd, since by default Secret values are only base64-encoded, not encrypted. Consider an external secrets manager for anything truly sensitive.
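The sketch below shows how a scanner might find the env-var pattern. It also shows the file-mount shape the fix recommends, with both expressed as parsed dicts. The secret name db-creds and the mount path /etc/secrets/db are hypothetical. With a secret volume, each key in the Secret appears as a file under the mount path.

```python
from typing import Any


def secret_env_refs(pod_spec: dict[str, Any]) -> list[str]:
    """Find Secrets exposed to containers through environment variables."""
    refs = []
    for container in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        name = container["name"]
        for var in container.get("env") or []:
            ref = (var.get("valueFrom") or {}).get("secretKeyRef")
            if ref:
                refs.append(f"{name}: env {var['name']} from secret {ref['name']}")
        for source in container.get("envFrom") or []:
            if "secretRef" in source:
                refs.append(f"{name}: envFrom secret {source['secretRef']['name']}")
    return refs


# The pattern to avoid: a Secret value injected as an environment variable.
leaky = {
    "containers": [
        {
            "name": "api",
            "env": [
                {
                    "name": "DB_PASSWORD",
                    "valueFrom": {"secretKeyRef": {"name": "db-creds", "key": "password"}},
                }
            ],
        }
    ]
}

# The file-based alternative: a secret volume mounted read-only.
mounted = {
    "volumes": [{"name": "db-creds", "secret": {"secretName": "db-creds"}}],
    "containers": [
        {
            "name": "api",
            "volumeMounts": [
                {"name": "db-creds", "mountPath": "/etc/secrets/db", "readOnly": True}
            ],
        }
    ],
}

print(secret_env_refs(leaky))  # ['api: env DB_PASSWORD from secret db-creds']
print(secret_env_refs(mounted))  # []
```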

7. Public container registry / no image provenance

Pulling images from a public registry with mutable tags (:latest) means you cannot prove what is actually running, and a poisoned upstream image lands straight in production.

Fix: use a private registry, pin images by digest, and scan images for known CVEs in CI.
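The next example is a minimal Python sketch of the provenance check. ALLOWED_REGISTRIES is a hypothetical allowlist that you would replace with your own registry host. The digest test simply looks for an @sha256: suffix followed by 64 hex characters. It does not replace CVE scanning, which needs a dedicated scanner in CI.

```python
import re

# Hypothetical allowlist: replace with your own private registry host(s).
ALLOWED_REGISTRIES = ("registry.internal.example.com/",)

# A digest reference ends in @sha256: followed by 64 lowercase hex characters.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_findings(image: str) -> list[str]:
    """Return provenance problems for a single image reference."""
    findings = []
    if not image.startswith(ALLOWED_REGISTRIES):
        findings.append("not from an approved registry")
    if not DIGEST.search(image):
        findings.append("not pinned by digest")
    return findings


print(image_findings("nginx:latest"))
# ['not from an approved registry', 'not pinned by digest']
pinned = "registry.internal.example.com/shop/api@sha256:" + "0" * 64
print(image_findings(pinned))  # []
```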

8. No pod security standards

Without an admission policy, nothing stops a teammate from deploying a pod that violates all of the above.

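Fix: enable Pod Security Admission at the baseline or restricted level per namespace so the cluster enforces the rules instead of relying on review. Pod Security Admission covers pod-level settings such as running as root, privileged mode, and capabilities. Resource limits, network policies, and RBAC need separate enforcement.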
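You can audit which namespaces actually enforce a level by reading the pod-security.kubernetes.io/enforce label, which is how Pod Security Admission is configured per namespace. The Python sketch below does this over parsed Namespace objects. The exemption parameter and helper names are assumptions for illustration. Namespaces such as kube-system that host node-level agents may legitimately need the privileged level.

```python
from typing import Any

ENFORCE = "pod-security.kubernetes.io/enforce"
ACCEPTED_LEVELS = {"baseline", "restricted"}


def psa_gaps(namespaces: list[dict[str, Any]], exempt: frozenset[str] = frozenset()) -> list[str]:
    """List non-exempt namespaces that do not enforce baseline or restricted."""
    gaps = []
    for ns in namespaces:
        name = ns["metadata"]["name"]
        level = (ns["metadata"].get("labels") or {}).get(ENFORCE)
        if name not in exempt and level not in ACCEPTED_LEVELS:
            # Unlabelled namespaces fall back to the cluster default, which is
            # "privileged" (enforce nothing) unless an admission config changes it.
            gaps.append(f"{name}: enforce={level or 'unset'}")
    return gaps


def restricted_namespace(name: str) -> dict[str, Any]:
    """A Namespace manifest that enforces and warns at the restricted level."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                ENFORCE: "restricted",
                "pod-security.kubernetes.io/warn": "restricted",
            },
        },
    }


namespaces = [
    restricted_namespace("payments"),
    {"metadata": {"name": "legacy", "labels": {ENFORCE: "privileged"}}},
    {"metadata": {"name": "scratch"}},
    {"metadata": {"name": "kube-system"}},
]
print(psa_gaps(namespaces, exempt=frozenset({"kube-system"})))
# ['legacy: enforce=privileged', 'scratch: enforce=unset']
```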

9. Ingress without TLS

An ingress serving plain HTTP exposes credentials and session tokens to anyone on the network path between client and cluster.

Fix: terminate TLS at the ingress, redirect HTTP to HTTPS, and automate certificate renewal (cert-manager) so it never lapses — a lapsed cert is a self-inflicted outage and a security gap.
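The TLS half of this check can be sketched by comparing the hosts in an Ingress's spec.rules with those listed in spec.tls. This Python version (names hypothetical) treats a wildcard such as *.example.com as covering exactly one extra label. HTTP-to-HTTPS redirects and certificate renewal depend on your ingress controller and cert-manager setup and are not checked here.

```python
from typing import Any


def _covered(host: str, tls_hosts: set[str]) -> bool:
    """Exact match, or a single-label wildcard such as *.example.com."""
    if host in tls_hosts:
        return True
    _, _, parent = host.partition(".")
    return bool(parent) and f"*.{parent}" in tls_hosts


def hosts_without_tls(ingress: dict[str, Any]) -> list[str]:
    """List rule hosts that no spec.tls entry covers."""
    spec = ingress.get("spec", {})
    tls_hosts = {h for entry in spec.get("tls") or [] for h in entry.get("hosts") or []}
    gaps = []
    for rule in spec.get("rules") or []:
        host = rule.get("host")
        if host is None:
            gaps.append("<rule without host>")  # cannot be matched to a tls entry
        elif not _covered(host, tls_hosts):
            gaps.append(host)
    return gaps


ingress = {
    "spec": {
        "tls": [{"hosts": ["*.example.com"], "secretName": "wildcard-tls"}],
        "rules": [{"host": "api.example.com"}, {"host": "example.com"}, {}],
    }
}
print(hosts_without_tls(ingress))  # ['example.com', '<rule without host>']
```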

10. Outdated Kubernetes version

Old clusters miss security patches and run end-of-life components with known exploits.

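Fix: stay on a minor release the Kubernetes project still patches (it maintains the three most recent), keep nodes within the supported version skew of the control plane, subscribe to the Kubernetes security announcements, and schedule upgrades as routine maintenance rather than emergency response.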
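The next example is a small Python sketch of the version check. newest_minor is an input you supply, since the newest release changes over time, and the values in the usage lines are illustrative only. window=3 reflects the project's policy of maintaining the three most recent minor releases. The version string is what kubectl version reports for the server; managed providers may append a suffix, which the pattern ignores.

```python
import re


def is_supported(server_version: str, newest_minor: int, window: int = 3) -> bool:
    """True if a 'v1.MINOR.PATCH' version is among the `window` newest minors."""
    match = re.match(r"v?1\.(\d+)\.\d+", server_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {server_version!r}")
    minor = int(match.group(1))
    # With newest_minor=33 and window=3, minors 31, 32 and 33 pass.
    return minor > newest_minor - window


print(is_supported("v1.31.4", newest_minor=33))  # True
print(is_supported("v1.29.10-eks-abc123", newest_minor=33))  # False
```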

Turning checks into a habit

A one-time audit decays the moment the next deploy ships. Security posture has to be continuously checked, because every change can reintroduce a gap. That is precisely why the Tracegrid Advisor runs all ten of these checks (and 400+ more) automatically, scores your cluster, and flags regressions as they appear — so "secure by default" becomes true for your cluster, not just possible.

Written by Pradip — founder of Tracegrid, building AI infrastructure intelligence so small teams get senior-SRE answers at 3am.