Orphaned and Overprovisioned Resources in EKS Clusters
Yisrael Gross
Service Category
Compute
Cloud Provider
AWS
Service Name
AWS EKS
Inefficiency Type
Inefficient Configuration
Explanation

In EKS environments, cluster sprawl can occur when workloads are removed but their underlying resources remain. Common examples include persistent volumes no longer mounted by any pod, Services still backed by ELBs despite receiving no traffic, and nodes provisioned for workloads that no longer exist.

Node overprovisioning has several common causes: inflated CPU/memory requests (and, to a lesser extent, limits), DaemonSets that run on every node, restrictive Pod Disruption Budgets, anti-affinity rules, uneven Availability Zone (AZ) distribution, and slow scale-down timers. Preventative measures include improving bin-packing efficiency, enabling Karpenter consolidation, and right-sizing node instance types and counts. Dev/test namespaces and short-lived environments also tend to accumulate without clear teardown processes, leading to ongoing idle costs.

These remnants contribute to excess infrastructure cost and control plane noise. Since AWS bills independently for each resource (e.g., EBS, ELB, EC2), inefficiencies can add up quickly. Without structured governance or cleanup tooling, clusters gradually fill with orphaned objects and unused capacity.

Relevant Billing Model

EKS clusters incur compute charges through EC2 nodes or Fargate profiles, storage charges from attached EBS volumes, and networking charges from load balancers created via Kubernetes Services. Additional network-level costs can include NAT Gateway hourly and per-GB data processing fees, cross-Availability Zone (AZ) data transfer, and other inter-VPC or internet-bound traffic. Orphaned resources such as unused PVCs (backed by EBS), idle Services (backed by Elastic Load Balancers), and underutilized nodes can generate persistent charges even when no active workloads are running.
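Because each of these resources is billed independently, a rough monthly estimate for orphaned capacity is straightforward to sketch. The prices below are placeholder assumptions for illustration only, not current AWS rates; consult the pricing pages for your region before acting on any figure.

```python
# Illustrative estimate of recurring charges from orphaned EKS resources.
# Both rates are assumed placeholder values, NOT actual AWS pricing.

EBS_GP3_PER_GB_MONTH = 0.08   # assumed $/GB-month for gp3 volumes
ELB_PER_HOUR = 0.025          # assumed $/hour for an idle load balancer
HOURS_PER_MONTH = 730

def monthly_orphan_cost(orphaned_ebs_gb: float, idle_elb_count: int) -> float:
    """Rough monthly cost of unattached EBS capacity plus idle ELBs."""
    ebs_cost = orphaned_ebs_gb * EBS_GP3_PER_GB_MONTH
    elb_cost = idle_elb_count * ELB_PER_HOUR * HOURS_PER_MONTH
    return round(ebs_cost + elb_cost, 2)

# Example: 500 GB of orphaned volumes and 3 idle load balancers.
print(monthly_orphan_cost(500, 3))
```

Even at modest scale, the hourly ELB charge tends to dominate, which is why idle Services are usually the first target for cleanup.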

Detection
  • Identify namespaces that no longer contain active Deployments or Pods
  • Review Services still provisioned with ELBs but lacking backend endpoints
  • Check for PVCs that are not mounted by any current workloads
  • Analyze node utilization to detect consistently underused EC2 nodes, referencing median CPU and memory utilization per node group (targeting 60–70%+ in production)
  • Audit test or sandbox namespaces that have not been updated recently
  • Verify whether terminated Helm releases or jobs left residual resources
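The first two checks above can be sketched as plain set logic over cluster state, as it might be parsed from `kubectl get pods,svc,pvc,endpoints -o json`. The field names and sample data below are simplified for illustration; this is a sketch of the matching logic, not a complete audit tool.

```python
# Sketch: find PVCs mounted by no pod, and LoadBalancer Services whose
# Endpoints carry no ready addresses. Input dicts are simplified stand-ins
# for objects parsed from the Kubernetes API.

def orphaned_pvcs(pods, pvcs):
    """PVCs not referenced by any pod's volumes in the same namespace."""
    mounted = {
        (pod["namespace"], vol["claimName"])
        for pod in pods
        for vol in pod.get("volumes", [])
        if "claimName" in vol
    }
    return [p for p in pvcs if (p["namespace"], p["name"]) not in mounted]

def idle_lb_services(services, endpoints):
    """LoadBalancer Services whose Endpoints object has no ready addresses."""
    ready = {
        (ep["namespace"], ep["name"]) for ep in endpoints if ep.get("addresses")
    }
    return [
        s for s in services
        if s["type"] == "LoadBalancer" and (s["namespace"], s["name"]) not in ready
    ]

pods = [{"namespace": "app", "volumes": [{"claimName": "data-0"}]}]
pvcs = [
    {"namespace": "app", "name": "data-0"},
    {"namespace": "app", "name": "data-1"},   # no longer mounted by any pod
]
services = [{"namespace": "app", "name": "web", "type": "LoadBalancer"}]
endpoints = [{"namespace": "app", "name": "web", "addresses": []}]

print([p["name"] for p in orphaned_pvcs(pods, pvcs)])              # ['data-1']
print([s["name"] for s in idle_lb_services(services, endpoints)])  # ['web']
```

In a real audit, a PVC flagged here should also be checked against CronJobs and suspended workloads before deletion, since "not currently mounted" is not always the same as "unused."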
Remediation
  • Remove unused PVCs to deprovision backing EBS volumes
  • Delete idle Services to release associated ELBs and IP addresses
  • Clean up inactive namespaces and workloads
  • Resize or scale down overprovisioned EC2 nodes, incorporating proactive prevention strategies such as Karpenter consolidation, bin packing optimization, and right-sizing of node instance types and counts
  • Implement environment lifecycle policies for dev/test clusters
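The right-sizing bullet above reduces to a back-of-the-envelope calculation: given total CPU and memory requests and one node type's allocatable capacity, how many nodes land near the 60–70% utilization target cited in the Detection section? The capacity figures below are illustrative defaults (roughly a 4 vCPU / 16 GiB instance after system reservations), not a recommendation for any specific instance type.

```python
# Sketch: minimum node count so that total requests sit at or below a
# target utilization on a single node type. Capacity defaults are
# illustrative assumptions, not real allocatable values.

import math

def nodes_needed(total_cpu_req, total_mem_req_gib,
                 node_cpu=4.0, node_mem_gib=15.0, target_util=0.65):
    """Nodes required so the binding dimension (CPU or memory) stays
    at or below target_util of allocatable capacity."""
    by_cpu = total_cpu_req / (node_cpu * target_util)
    by_mem = total_mem_req_gib / (node_mem_gib * target_util)
    return math.ceil(max(by_cpu, by_mem))

# Example: 18 vCPU and 60 GiB of requests across the cluster.
print(nodes_needed(18, 60))
```

This ignores bin-packing fragmentation, DaemonSet overhead, and AZ spread, so treat the result as a floor; tools like Karpenter consolidation handle the packing details continuously rather than as a one-off calculation.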