Compute

Excessive Lambda Duration from Synchronous Waiting

Compute

Cloud Provider

AWS

Service Name

AWS Lambda

Inefficiency Type

Inefficient Configuration

Some Lambda functions perform synchronous calls to other services, APIs, or internal microservices and wait for the response before proceeding. During this time, the Lambda is idle from a compute perspective but still fully billed. This anti-pattern can lead to unnecessarily long durations and elevated costs, especially when repeated across high-volume workflows or under memory-intensive configurations.

While this behavior might be functionally correct, it is rarely optimal. Asynchronous invocation patterns—such as decoupling downstream calls with queues, events, or callbacks—can reduce runtime and avoid charging for waiting time. However, detecting this inefficiency is nontrivial, as high duration alone doesn’t always indicate synchronous waiting. Understanding function logic and workload patterns is key.

Learn more

Oversized Hosting Plan for Azure Functions

Compute

Cloud Provider

Azure

Service Name

Inefficiency Type

Teams often choose the Premium or App Service Plan for Azure Functions to avoid cold start delays or enable VNET connectivity, especially early in a project when performance concerns dominate. However, these decisions are rarely revisited—even as usage patterns change.

In practice, many workloads running on Premium or App Service Plans have low invocation frequency, minimal execution time, and no strict latency requirements. This leads to consistent spend on compute capacity that is largely idle. Because these plans still “work” and don’t cause reliability issues, the inefficiency is easy to overlook. Over time, this misalignment between hosting tier and actual usage creates significant invisible waste.

Learn more

Orphaned and Overprovisioned Resources in EKS Clusters

Compute

Cloud Provider

AWS

Service Name

AWS EKS

Inefficiency Type

Inefficient Configuration

In EKS environments, cluster sprawl can occur when workloads are removed but underlying resources remain. Common issues include persistent volumes no longer mounted by pods, services still backed by ELBs despite being unused, and overprovisioned nodes for workloads that no longer exist. Node overprovisioning can result from high CPU/memory requests or limits, DaemonSets running on every node, restrictive Pod Disruption Budgets, anti-affinity rules, uneven AZ distribution, or slow scale-down timers. Preventative measures include improving bin packing efficiency, enabling Karpenter consolidation, and right-sizing node instance types and counts. Dev/test namespaces and short-lived environments often accumulate without clear teardown processes, leading to ongoing idle costs.

These remnants contribute to excess infrastructure cost and control plane noise. Since AWS bills independently for each resource (e.g., EBS, ELB, EC2), inefficiencies can add up quickly. Without structured governance or cleanup tooling, clusters gradually fill with orphaned objects and unused capacity.

Learn more

Suboptimal Architecture Selection for Azure Virtual Machines

Compute

Cloud Provider

Azure

Service Name

Azure Virtual Machines

Inefficiency Type

Suboptimal Pricing Model

Azure provides VM families across three major CPU architectures, but default provisioning often leans toward Intel-based SKUs due to inertia or pre-configured templates. AMD and ARM alternatives offer substantial cost savings; ARM in particular can be 30–50% cheaper for general-purpose workloads. These price differences accumulate quickly at scale.

ARM-based VMs in Azure (e.g., Dps_v5, Eps_v5) are suited for many common workloads, such as microservices, web applications, and containerized environments. However, not all applications are architecture-compatible, especially those with dependencies on x86-specific libraries or instruction sets. Organizations that skip architecture evaluation during provisioning miss out on cost-efficient options.

Learn more

Orphaned Kubernetes Resources

Compute

Cloud Provider

AWS

Service Name

AWS EKS

Inefficiency Type

Orphaned Resource

In Kubernetes environments, resources such as ConfigMaps, Secrets, Services, and Persistent Volume Claims (PVCs) are often created dynamically by applications or deployment pipelines. When applications are removed or reconfigured, these resources may be left behind if not explicitly cleaned up. Over time, they accumulate as orphaned resources — not referenced by any live workload.

Some of these objects, like PVCs or Services of type LoadBalancer, result in active infrastructure that continues to incur cloud charges (e.g., retained EBS volumes or unused Elastic Load Balancers). Even lightweight objects like ConfigMaps and Secrets bloat the API server’s object store, causing latency and impacting deployments/scaling,, clutter the control plane, and complicate configuration management. This issue is especially common during cluster upgrades, namespace decommissioning, and workload migrations.

Learn more

Orphaned Kubernetes Resources

Compute

Cloud Provider

Azure

Service Name

Azure AKS

Inefficiency Type

Orphaned Resource

Kubernetes environments often accumulate unused resources over time as applications evolve. Common examples include Persistent Volume Claims (PVCs) backed by Azure Disks, Services that trigger load balancer provisioning, or stale ConfigMaps and Secrets. When the associated deployments or pods are removed, these resources may remain unless explicitly deleted.

In AKS, this can lead to unmanaged costs, such as idle managed disks from orphaned PVCs or public load balancers from Services of type LoadBalancer. Even lightweight resources like unused Secrets or ConfigMaps degrade cluster hygiene and can introduce security or operational risk. This inefficiency is common across Kubernetes environments and is scoped here to AKS.

Learn more

On-Demand-Only Configuration for Non-Production Databricks Clusters

Compute

Cloud Provider

Databricks

Service Name

Databricks Clusters

Inefficiency Type

Suboptimal Pricing Model

In non-production environments—such as development, testing, and experimentation—many teams default to on-demand nodes out of habit or caution. However, Databricks offers built-in support for using spot instances safely. Its job scheduler and cluster management system are designed to detect spot instance evictions and automatically replace them with on-demand nodes when necessary, making the use of spot compute relatively low-risk.

Failing to enable spot for non-critical or short-lived workloads leads to unnecessary overspend. The inefficiency often arises because spot usage is not enabled by default and must be explicitly selected in cluster settings. In teams that don’t revisit infrastructure defaults or where FinOps guardrails are missing, this results in a persistent cost gap between actual usage and what could be safely optimized.

Learn more

Orphaned and Overprovisioned Resources in AKS Clusters

Compute

Cloud Provider

Azure

Service Name

Azure AKS

Inefficiency Type

Inefficient Configuration

Clusters often accumulate unused components when applications are terminated or environments are cloned. These include PVCs backed by Managed Disks, Services that still front Azure Load Balancers, and test namespaces that are no longer maintained. Node pools are frequently overprovisioned, especially in multi-tenant or CI environments.

The cost impact of these idle resources is magnified in organizations with many environments or without standardized cleanup routines. Since billing is resource-specific, even low-cost items like Managed Disks, load balancer rules, and frontend configurations can accumulate meaningful waste over time.

Learn more

Orphaned and Overprovisioned Resources in GKE Clusters

Compute

Cloud Provider

GCP

Service Name

GCP GKE

Inefficiency Type

Inefficient Configuration

As environments scale, GKE clusters tend to accumulate artifacts from ephemeral workloads, dev environments, or incomplete job execution. PVCs can continue to retain Persistent Disks, Services may continue to expose public IPs and provision load balancers, and node pools are often oversized for steady-state demand. This results in cloud spend that is not aligned with application activity.

Organizations that lack visibility into Kubernetes-level resource usage often miss these inefficiencies because GCP billing tools surface usage at the infrastructure level, not the Kubernetes object level.

Learn more

Suboptimal Use of Compute Savings Plans for Specialized Instances

Compute

Cloud Provider

AWS

Service Name

AWS EC2

Inefficiency Type

Suboptimal Pricing Model

Accelerated EC2 instance types such as `p5.48xlarge` and `p5en.48xlarge (often used for ML/AI workloads)` are eligible for Compute Savings Plans, but the discount rates offered are modest compared to more common instance families. When organizations rely solely on CSPs, these lower priority instances are typically the last to benefit from the plan, especially if other instance types consume most of the discounted hours.

As a result, p5 usage may fall through the cracks and be billed at full On-Demand rates despite an active CSP. This dynamic makes CSPs a potentially inefficient choice for workloads that heavily or predictably rely on these instance types. EC2 Instance Savings Plans provide better discount targeting for known usage patterns, and AWS now offers dedicated P5 and P5en Instance Savings Plans with up to 40% savings specifically for these instance types. Additionally, while Capacity Blocks offer the steepest discount, they come with operational rigidity and inflexible scheduling constraints that limit their applicability.

Learn more

There are no inefficiency matches the current filters.