Improper design choices in AWS Step Functions can lead to unnecessary charges. For example:
* Using Standard Workflows for short-lived, high-frequency executions leads to excessive per-transition charges.
* Using Express Workflows for long-running processes (close to or exceeding the 5-minute limit) may cause timeouts or retries.
* Inefficient use of states, such as chaining many simple states instead of combining logic into a Lambda function, can increase cost in both workflow types.
* Overuse of payload passing between states (especially in Express Workflows) increases GB-second and data transfer charges.
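As a rough sketch (not an official pattern), the boto3 call below creates an Express state machine for this kind of short-lived, high-frequency workload; the account ID, function name, and role ARN are placeholders.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Minimal single-task definition; consolidating chains of simple
# Pass/Choice states into one Lambda task also reduces per-state overhead.
definition = {
    "StartAt": "ProcessEvent",
    "States": {
        "ProcessEvent": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-event",
            "End": True,
        }
    },
}

# EXPRESS workflows are billed per request and duration rather than per
# state transition, which usually suits short-lived, high-volume runs.
sfn.create_state_machine(
    name="short-lived-event-processor",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
    type="EXPRESS",
)
```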
By default, EventBridge includes retry mechanisms for delivery failures, particularly when targets like Lambda functions or Step Functions fail to process an event. However, if these retry policies are disabled or misconfigured, EventBridge may treat failed deliveries as successful, prompting upstream services to republish the same event multiple times in response to undelivered outcomes. This leads to:
* Duplicate event publishing and delivery
* Unnecessary compute triggered by repeated events
* Increased EventBridge, downstream service, and data transfer costs

This behavior is especially problematic in systems where idempotency is not strictly enforced and retries are managed externally by upstream services.
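A minimal boto3 sketch, assuming a hypothetical rule, Lambda target, and SQS dead-letter queue, that bounds retries explicitly instead of relying on defaults:

```python
import boto3

events = boto3.client("events")

# Attach an explicit retry policy and a dead-letter queue to the rule's
# target so persistent delivery failures land in the DLQ instead of
# being republished indefinitely by upstream services.
events.put_targets(
    Rule="order-events-rule",
    Targets=[
        {
            "Id": "order-processor",
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:order-processor",
            "RetryPolicy": {
                "MaximumRetryAttempts": 3,
                "MaximumEventAgeInSeconds": 600,
            },
            "DeadLetterConfig": {
                "Arn": "arn:aws:sqs:us-east-1:123456789012:order-events-dlq"
            },
        }
    ],
)
```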
By default, CloudWatch Log Groups use the Standard log class, which applies higher rates for both ingestion and storage. AWS also offers an Infrequent Access (IA) log class designed for logs that are rarely queried — such as audit trails, debugging output, or compliance records. Many teams assume storage is the dominant cost driver in CloudWatch, but in high-volume environments, ingestion costs can account for the majority of spend. When logs that are infrequently accessed are ingested into the Standard class, it leads to unnecessary costs without impacting observability. The IA log class offers significantly reduced rates for ingestion and storage, making it a better fit for logs used primarily for post-incident review, compliance retention, or ad hoc forensic analysis.
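A hedged example of opting into the IA class at creation time with boto3; the log group name and retention period are illustrative:

```python
import boto3

logs = boto3.client("logs")

# The log class is chosen at creation time; an existing Standard log
# group cannot be converted in place.
logs.create_log_group(
    logGroupName="/audit/payments-service",
    logGroupClass="INFREQUENT_ACCESS",
)

# Pair the cheaper log class with an explicit retention policy so
# compliance logs do not accumulate indefinitely.
logs.put_retention_policy(
    logGroupName="/audit/payments-service",
    retentionInDays=365,
)
```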
When an EC2 Mac instance is stopped or terminated, its associated Dedicated Host remains allocated by default. Because Mac instances are the only EC2 type billed at the host level, charges continue to accrue as long as the host is retained. This can lead to significant waste when:
* Instances are stopped but the host is never released
* Hosts are retained unintentionally after workloads are migrated or decommissioned
* Automation only terminates instances without deallocating hosts
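One possible cleanup sketch using boto3, with the caveat that the host-detection logic may need adjusting for your account; it releases Mac Dedicated Hosts that no longer have instances running on them:

```python
import boto3

ec2 = boto3.client("ec2")

# Find Dedicated Hosts in the "available" state (no running instances).
hosts = ec2.describe_hosts(
    Filter=[{"Name": "state", "Values": ["available"]}]
)["Hosts"]

# Collect Mac hosts with no instances; note that Mac Dedicated Hosts
# have a 24-hour minimum allocation period before they can be released.
mac_host_ids = []
for host in hosts:
    props = host.get("HostProperties", {})
    family = props.get("InstanceFamily") or props.get("InstanceType", "")
    if family.startswith("mac") and not host.get("Instances"):
        mac_host_ids.append(host["HostId"])

if mac_host_ids:
    ec2.release_hosts(HostIds=mac_host_ids)
```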
In many Databricks environments, large Delta tables are created without enabling standard optimization features like partitioning and Z-Ordering. Without these, queries scanning large datasets may read far more data than necessary, increasing execution time and compute usage.
* Partitioning organizes data by a specified column to reduce scan scope.
* Z-Ordering optimizes file sorting to minimize I/O during range queries or filters.
* Delta format enables additional optimizations like data skipping and compaction.

Failing to use these features in high-volume tables often results in avoidable performance overhead and elevated spend, especially in environments with frequent exploratory queries or BI workloads.
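A brief PySpark/SQL sketch of these features, using hypothetical table and column names (run where `spark` is already defined, e.g., a Databricks notebook or job):

```python
# Write a Delta table partitioned by a low-cardinality column that
# queries commonly filter on (e.g., event date).
(
    spark.table("raw.events")
    .write.format("delta")
    .partitionBy("event_date")
    .mode("overwrite")
    .saveAsTable("analytics.events")
)

# Compact small files and co-locate rows on a high-cardinality filter
# column so data skipping can prune files during reads.
spark.sql("OPTIMIZE analytics.events ZORDER BY (user_id)")
```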
Business Intelligence dashboards and ad-hoc analyst queries frequently drive Databricks compute usage, especially when:
* Dashboards are auto-refreshed too frequently
* Queries scan full datasets instead of leveraging filtered views or materialized tables
* Inefficient joins or large broadcast operations are used
* Redundant or exploratory queries are triggered during interactive exploration

This often results in clusters staying active for longer than necessary, or being autoscaled up to handle inefficient workloads, leading to unnecessary DBU consumption.
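One common mitigation is to point dashboards at a pre-aggregated summary table rather than the raw fact table; the sketch below uses hypothetical table names and Databricks SQL:

```python
# Pre-aggregate the metrics a dashboard needs into a small summary
# table refreshed on a schedule, so dashboard refreshes read the
# summary instead of scanning the full fact table.
spark.sql("""
    CREATE OR REPLACE TABLE analytics.daily_sales_summary
    USING DELTA AS
    SELECT
        order_date,
        region,
        SUM(amount) AS total_sales,
        COUNT(*)    AS order_count
    FROM analytics.orders
    GROUP BY order_date, region
""")
```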
AWS Backup does not natively support global deduplication or change block tracking (CBT) across backups. As a result, even traditional incremental or differential backup strategies (e.g., daily incremental, weekly full) can accumulate redundant data. Over time, this leads to higher-than-necessary storage usage and cost, especially in environments with frequent backup schedules or large data volumes that change only minimally between snapshots. While some third-party agents can implement CBT and deduplication at the client level, AWS Backup alone offers no built-in mechanism to avoid storing unchanged data across backup generations.
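To gauge how much backup storage is accumulating, a rough boto3 sketch like the following can total reported recovery point sizes per protected resource; the vault name is a placeholder, and large vaults would need pagination:

```python
from collections import defaultdict

import boto3

backup = boto3.client("backup")

# Sum recovery point sizes per protected resource to estimate how much
# redundant data is piling up across backup generations.
totals = defaultdict(int)
response = backup.list_recovery_points_by_backup_vault(BackupVaultName="default")

for rp in response["RecoveryPoints"]:
    totals[rp["ResourceArn"]] += rp.get("BackupSizeInBytes", 0)

for resource_arn, size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{resource_arn}: {size / 1024**3:.1f} GiB across recovery points")
```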
Retry storms occur when a function fails and is automatically retried repeatedly due to default retry behavior for asynchronous and event-source-driven invocations (e.g., SQS, EventBridge). If the error is persistent and unhandled, retries can accumulate rapidly, often invisibly, creating a large volume of billable executions with no successful outcome. This is especially costly when functions run for extended durations or have high memory allocation.
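A hedged boto3 example that caps retries for asynchronous invocations and routes failures to a DLQ; the names and ARNs are placeholders, and note that SQS-sourced invocations are governed by the queue's redrive policy rather than this setting:

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap automatic retries and maximum event age for asynchronous
# invocations (e.g., EventBridge targets), and send failures to an
# on-failure destination instead of silently re-executing.
lambda_client.put_function_event_invoke_config(
    FunctionName="order-processor",
    MaximumRetryAttempts=0,          # valid range: 0-2
    MaximumEventAgeInSeconds=300,
    DestinationConfig={
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:order-processor-dlq"
        }
    },
)
```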
Bigtable automatically splits data into tablets (shards), which are distributed across provisioned nodes. However, poorly designed row key schemas or excessive shard counts (caused by high cardinality, hash-based keys, or timestamp-first designs) can result in performance bottlenecks or hot spotting. To compensate, users often scale up node counts — increasing costs — when the real issue lies in suboptimal data distribution. This leads to inflated infrastructure spend without actual workload increase.
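The plain-Python sketch below contrasts a timestamp-first row key with a salted, entity-first key for a hypothetical IoT workload; the field names and bucket count are illustrative:

```python
import hashlib

# A timestamp-first key ("2024-06-01#device42") sends all new writes to
# the same tablet and hot-spots a single node.
def hotspot_prone_key(timestamp: str, device_id: str) -> bytes:
    return f"{timestamp}#{device_id}".encode()

# Leading with the entity identifier (optionally salted) spreads writes
# across tablets while keeping per-device rows contiguous for scans.
def distributed_key(device_id: str, timestamp: str, salt_buckets: int = 8) -> bytes:
    # A small, deterministic salt prefix balances load; per-device range
    # scans remain feasible by scanning each bucket in parallel.
    bucket = int(hashlib.md5(device_id.encode()).hexdigest(), 16) % salt_buckets
    return f"{bucket:02d}#{device_id}#{timestamp}".encode()

print(distributed_key("device42", "2024-06-01T12:00:00Z"))
```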
Memorystore instances that are provisioned but unused — whether due to deprecated services, orphaned environments, or development/testing phases ending — continue to incur memory and infrastructure charges. Because usage-based metrics like client connections or cache hit ratios are not tied to billing, an idle instance costs the same as a heavily used one. This makes it critical to identify and decommission inactive caches.
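As one way to surface idle instances, the sketch below (assuming the google-cloud-monitoring client and a placeholder project ID) flags Memorystore for Redis instances with no connected clients over the past week:

```python
import time

from google.cloud import monitoring_v3

project = "projects/my-project-id"  # placeholder project
client = monitoring_v3.MetricServiceClient()

now = time.time()
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": int(now)},
        "start_time": {"seconds": int(now - 7 * 24 * 3600)},
    }
)

# Pull the connected-clients metric for all Redis instances in the project.
results = client.list_time_series(
    request={
        "name": project,
        "filter": 'metric.type = "redis.googleapis.com/clients/connected"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    instance = series.resource.labels.get("instance_id", "unknown")
    peak = max((p.value.int64_value for p in series.points), default=0)
    if peak == 0:
        print(f"{instance}: no connected clients in 7 days; review for deletion")
```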