Compute

Idle EMR Cluster Without Auto-Termination Policy

Compute

Cloud Provider

AWS

Service Name

AWS EMR

Inefficiency Type

Inactive Resource

Amazon EMR clusters often run on large, multi-node EC2 fleets, making them costly to leave running unnecessarily. If a cluster becomes idle—no longer processing jobs—but is not terminated, it continues accruing EC2 and EMR service charges. Many teams forget to shut down clusters manually or leave them running for debugging, staging, or future job use. Without an auto-termination policy, this oversight leads to significant unnecessary spend.

Learn more

Underuse of Fargate Spot for Interruptible Workloads

Compute

Cloud Provider

AWS

Service Name

AWS Fargate

Inefficiency Type

Pricing Model Misalignment

Many teams run workloads on standard Fargate pricing even when the workload is fault-tolerant and could tolerate interruptions. Fargate Spot provides the same performance characteristics at up to 70% lower cost, making it ideal for stateless jobs, batch processing, CI/CD runners, or retry-friendly microservices.

Learn more

Suboptimal Architecture Selection in AWS Lambda

Compute

Cloud Provider

AWS

Service Name

AWS Lambda

Inefficiency Type

Suboptimal Configuration

Lambda functions default to the x86\_64 architecture, which is more expensive than Arm64. For many workloads, especially those written in interpreted languages (e.g., Python, Node.js) or compiled to architecture-neutral bytecode (e.g., Java), there is no dependency on x86-specific binaries. In such cases, moving to Arm64 can reduce compute costs by 20% without impacting functionality. Despite this, many teams continue to run Lambda functions on x86\_64 due to legacy configurations, inertia, or lack of awareness. This leads to avoidable spending, particularly at scale or in high-volume environments.

Learn more

Inefficient Workflow Design in AWS Step Functions

Compute

Cloud Provider

AWS

Service Name

AWS Step Functions

Inefficiency Type

Misconfiguration

Improper design choices in AWS Step Functions can lead to unnecessary charges. For example: * Using **Standard Workflows** for short-lived, high-frequency executions leads to excessive per-transition charges. * Using **Express Workflows** for long-running processes (close to or exceeding the 5-minute limit) may cause timeouts or retries. * Inefficient use of states—such as chaining many simple states instead of combining logic into a Lambda function—can increase cost in both workflow types. * Overuse of payload-passing between states (especially in Express workflows) increases GB-second and data transfer charges.

Learn more

Stale Dedicated Hosts for Stopped EC2 Mac Instances

Compute

Cloud Provider

AWS

Service Name

AWS EC2

Inefficiency Type

Orphaned Resource

When an EC2 Mac instance is stopped or terminated, its associated dedicated host remains allocated by default. Because Mac instances are the only EC2 type billed at the host level, charges continue to accrue as long as the host is retained. This can lead to significant waste when: * Instances are stopped but the host is never released * Hosts are retained unintentionally after workloads are migrated or decommissioned * Automation only terminates instances without deallocating hosts

Learn more

Inefficient BI Queries Driving Excessive Compute Usage

Compute

Cloud Provider

Databricks

Service Name

Interactive Clusters

Inefficiency Type

Inefficient Query Patterns

Business Intelligence dashboards and ad-hoc analyst queries frequently drive Databricks compute usage — especially when: * Dashboards are auto-refreshed too frequently * Queries scan full datasets instead of leveraging filtered views or materialized tables * Inefficient joins or large broadcast operations are used * Redundant or exploratory queries are triggered during interactive exploration This often results in clusters staying active for longer than necessary, or being autoscaled up to handle inefficient workloads, leading to unnecessary DBU consumption.

Learn more

Excessive Lambda Retries (Retry Storms)

Compute

Cloud Provider

AWS

Service Name

AWS Lambda

Inefficiency Type

Retry Misconfiguration

Retry storms occur when a function fails and is automatically retried repeatedly due to default retry behavior for asynchronous events (e.g., SQS, EventBridge). If the error is persistent and unhandled, retries can accumulate rapidly — often invisibly — creating a large volume of billable executions with no successful outcome. This is especially costly when functions run for extended durations or have high memory allocation.

Learn more

Overprovisioned Memory Allocation in Cloud Run Services

Compute

Cloud Provider

GCP

Service Name

GCP Cloud Run

Inefficiency Type

Overprovisioned Resource Allocation

In Cloud Run, each revision is deployed with a fixed memory allocation (e.g., 512MiB, 1GiB, 2GiB, etc.). These settings are often overestimated during initial development or copied from templates. Unlike auto-scaling platforms that adapt instance size based on workload, Cloud Run continues to bill per the allocated amount regardless of actual memory used during execution. If a service consistently uses significantly less memory than allocated, it results in avoidable overpayment per request — especially for high-throughput or long-running services. Since memory and CPU are billed together based on configured values, this inefficiency compounds quickly at scale.

Learn more

Overprovisioned Node Pool in GKE Cluster

Compute

Cloud Provider

GCP

Service Name

GCP GKE

Inefficiency Type

Overprovisioned Resource

Node pools provisioned with large or specialized VMs (e.g., high-memory, GPU-enabled, or compute-optimized) can be significantly overprovisioned relative to the actual pod requirements. If workloads consistently leave a large portion of resources unused (e.g., low CPU/memory request-to-capacity ratio), the organization incurs unnecessary compute spend. This often happens in early cluster design phases, after application demand shifts, or when teams allocate for peak usage without autoscaling.

Learn more

Idle GKE Autopilot Clusters with Always-On System Overhead

Compute

Cloud Provider

GCP

Service Name

GCP GKE

Inefficiency Type

Inactive Resource Consuming Baseline Costs

Even when no user workloads are active, GKE Autopilot clusters continue running system-managed pods that accrue compute and storage charges. These include control plane components and built-in agents for observability and networking. If Autopilot clusters are deployed in non-production or experimental environments and left idle, they may silently accrue ongoing charges unrelated to application activity. This inefficiency often occurs in: * Dev/test clusters that are spun up temporarily but not deleted * Clusters used for one-time jobs or training workloads * Scheduled workloads that run infrequently but don't trigger downscaling

Learn more

There are no inefficiency matches the current filters.