Compute

Inefficient Workflow Design in AWS Step Functions

Compute

Cloud Provider

AWS

Service Name

AWS Step Functions

Inefficiency Type

Misconfiguration

Improper design choices in AWS Step Functions can lead to unnecessary charges. For example: * Using Standard Workflows for short-lived, high-frequency executions leads to excessive per-transition charges. * Using Express Workflows for long-running processes (close to or exceeding the 5-minute limit) may cause timeouts or retries. * Inefficient use of states—such as chaining many simple states instead of combining logic into a Lambda function—can increase cost in both workflow types. * Overuse of payload-passing between states (especially in Express workflows) increases GB-second and data transfer charges.

Learn more

Stale Dedicated Hosts for Stopped EC2 Mac Instances

Compute

Cloud Provider

AWS

Service Name

AWS EC2

Inefficiency Type

Orphaned Resource

When an EC2 Mac instance is stopped or terminated, its associated dedicated host remains allocated by default. Because Mac instances are the only EC2 type billed at the host level, charges continue to accrue as long as the host is retained. This can lead to significant waste when: * Instances are stopped but the host is never released * Hosts are retained unintentionally after workloads are migrated or decommissioned * Automation only terminates instances without deallocating hosts

Learn more

Inefficient BI Queries Driving Excessive Compute Usage

Compute

Cloud Provider

Databricks

Service Name

Interactive Clusters

Inefficiency Type

Inefficient Query Patterns

Business Intelligence dashboards and ad-hoc analyst queries frequently drive Databricks compute usage — especially when: * Dashboards are auto-refreshed too frequently * Queries scan full datasets instead of leveraging filtered views or materialized tables * Inefficient joins or large broadcast operations are used * Redundant or exploratory queries are triggered during interactive exploration This often results in clusters staying active for longer than necessary, or being autoscaled up to handle inefficient workloads, leading to unnecessary DBU consumption.

Learn more

Excessive Lambda Retries (Retry Storms)

Compute

Cloud Provider

AWS

Service Name

AWS Lambda

Inefficiency Type

Retry Misconfiguration

Retry storms occur when a function fails and is automatically retried repeatedly due to default retry behavior for asynchronous events (e.g., SQS, EventBridge). If the error is persistent and unhandled, retries can accumulate rapidly — often invisibly — creating a large volume of billable executions with no successful outcome. This is especially costly when functions run for extended durations or have high memory allocation.

Learn more

Overprovisioned Memory Allocation in Cloud Run Services

Compute

Cloud Provider

GCP

Service Name

GCP Cloud Run

Inefficiency Type

Overprovisioned Resource Allocation

In Cloud Run, each revision is deployed with a fixed memory allocation (e.g., 512MiB, 1GiB, 2GiB, etc.). These settings are often overestimated during initial development or copied from templates. Unlike auto-scaling platforms that adapt instance size based on workload, Cloud Run continues to bill per the allocated amount regardless of actual memory used during execution. If a service consistently uses significantly less memory than allocated, it results in avoidable overpayment per request — especially for high-throughput or long-running services. Since memory and CPU are billed together based on configured values, this inefficiency compounds quickly at scale.

Learn more

Overprovisioned Node Pool in GKE Cluster

Compute

Cloud Provider

GCP

Service Name

GCP GKE

Inefficiency Type

Overprovisioned Resource

Node pools provisioned with large or specialized VMs (e.g., high-memory, GPU-enabled, or compute-optimized) can be significantly overprovisioned relative to the actual pod requirements. If workloads consistently leave a large portion of resources unused (e.g., low CPU/memory request-to-capacity ratio), the organization incurs unnecessary compute spend. This often happens in early cluster design phases, after application demand shifts, or when teams allocate for peak usage without autoscaling.

Learn more

Idle GKE Autopilot Clusters with Always-On System Overhead

Compute

Cloud Provider

GCP

Service Name

GCP GKE

Inefficiency Type

Inactive Resource Consuming Baseline Costs

Even when no user workloads are active, GKE Autopilot clusters continue running system-managed pods that accrue compute and storage charges. These include control plane components and built-in agents for observability and networking. If Autopilot clusters are deployed in non-production or experimental environments and left idle, they may silently accrue ongoing charges unrelated to application activity. This inefficiency often occurs in: * Dev/test clusters that are spun up temporarily but not deleted * Clusters used for one-time jobs or training workloads * Scheduled workloads that run infrequently but don't trigger downscaling

Learn more

Overprovisioned Memory in Cloud Run Services

Compute

Cloud Provider

GCP

Service Name

GCP Cloud Run

Inefficiency Type

Overprovisioned Resource

Cloud Run allows users to allocate up to 8 GB of memory per container instance. If memory is overestimated — often as a buffer or based on unvalidated assumptions — customers pay for more than what the workload consumes during execution. Unlike in VM-based environments where memory might be shared or underutilized without direct cost impact, in Cloud Run, you're billed precisely for what you allocate. This inefficiency often results from: * Defaulting to high memory values for “safety” * Not using monitoring tools to assess actual memory usage * Lack of clear ownership over service tuning

Learn more

Excessive Cold Starts in GCP Cloud Functions

Compute

Cloud Provider

GCP

Service Name

GCP Cloud Functions

Inefficiency Type

Inefficient Configuration

Cloud Functions scale to zero when idle. When invoked after inactivity, they undergo a "cold start," initializing runtime, loading dependencies, and establishing any required network connections (e.g., VPC connectors). These cold starts can dramatically increase execution time, especially for functions with: * High memory allocations * Heavy initialization logic * VPC connector requirements If cold starts are frequent, customers may be paying for unnecessary compute time — particularly in latency-sensitive workloads — without receiving proportional value.

Learn more

Underutilized VM Commitments Due to Architectural Drift

Compute

Cloud Provider

GCP

Service Name

GCP Compute Engine

Inefficiency Type

Underutilized Commitment

VM-based Committed Use Discounts in GCP offer cost savings for predictable workloads, but they are rigid: they apply only to specified VM types, quantities, and regions. When organizations evolve their architecture — such as moving to GKE (Kubernetes), Cloud Run, or autoscaling — usage patterns often shift away from the original commitments. Because GCP lacks flexible reallocation options like AWS Convertible RIs or Savings Plans, underutilized commitments lead to sustained, silent waste. This is especially common when workload changes go uncoordinated with finance or centralized planning.

Learn more

There are no inefficiency matches the current filters.