Excessive Automated Backup Retention in Cloud SQL

Databases

Cloud Provider

GCP

Service Name

Cloud SQL

Inefficiency Type

Excessive Data Retention

This inefficiency occurs when automated Cloud SQL backups are retained longer than required by recovery objectives or governance needs. Because backups accumulate over the retention window (and can grow quickly for high-change databases), excessive retention drives ongoing backup storage charges without improving practical recoverability.
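
The relationship between retention and storage can be sketched with simple arithmetic. The example below is a rough model only: it assumes one full baseline plus one incremental backup per additional retained backup, and the database size, daily change volume, and per-GB price are hypothetical inputs, not published rates.

```python
def retained_backup_gb(base_gb: float, daily_change_gb: float, retained_backups: int) -> float:
    """Approximate stored volume: one baseline plus one increment per additional retained backup."""
    if retained_backups < 1:
        return 0.0
    return base_gb + daily_change_gb * (retained_backups - 1)


def monthly_backup_cost(base_gb: float, daily_change_gb: float,
                        retained_backups: int, price_per_gb_month: float) -> float:
    """Monthly storage charge for the retained backups at a flat per-GB rate."""
    return retained_backup_gb(base_gb, daily_change_gb, retained_backups) * price_per_gb_month


# Hypothetical inputs: 500 GB database, 20 GB of change per day, 0.08 per GB-month.
for retained in (7, 35):
    cost = monthly_backup_cost(500, 20, retained, 0.08)
    print(f"{retained} backups retained: about {cost:.2f} per month")
```

Under these assumed inputs, moving from 7 to 35 retained backups roughly doubles the stored volume, and the gap widens as the daily change rate grows.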

Mixing Production and Non-Production Applications in the Same App Service Plan

Compute

Cloud Provider

Azure

Service Name

Azure App Service Plans

Inefficiency Type

Inefficient environment isolation

This inefficiency occurs when production and non-production applications are hosted within the same App Service Plan. Production workloads often require higher availability, performance, or scaling characteristics, driving the plan toward larger or higher-cost SKUs. When non-production workloads share that plan, they inherit the higher cost structure even though their availability and performance requirements are typically much lower, resulting in unnecessary spend.

Fargate Resource Rounding and Per-Pod Overhead Driving Step-Up Costs

Compute

Cloud Provider

AWS

Service Name

AWS EKS

Inefficiency Type

Suboptimal resource sizing

This inefficiency occurs when pod resource requests—often inflated by sidecar containers—push total memory or CPU just over a Fargate sizing boundary. Because Fargate adds mandatory system overhead and only supports fixed resource combinations, small incremental increases can force a pod into a much larger billing tier. This results in materially higher cost for marginal additional resource needs, especially in workloads that run continuously or at scale.
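
The step-up effect can be illustrated with a small sizing sketch. The vCPU and memory combinations and the 256 MB per-pod overhead below are taken from AWS's published Fargate pod configuration guidance for EKS as understood at the time of writing and should be re-checked against current documentation. The selection logic (smallest vCPU tier that fits, then smallest memory option) is a simplifying assumption, not AWS's exact algorithm.

```python
# Supported vCPU -> memory (GB) options; verify against current AWS documentation.
FARGATE_COMBOS = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),
    2: list(range(4, 17)),
    4: list(range(8, 31)),
    8: list(range(16, 61, 4)),
    16: list(range(32, 121, 8)),
}
OVERHEAD_GB = 0.25  # 256 MB added to each pod's memory for Kubernetes components


def fargate_size(cpu_request: float, mem_request_gb: float) -> tuple:
    """Return the (vCPU, memory GB) combination a pod's summed requests would round up to."""
    needed_mem = mem_request_gb + OVERHEAD_GB
    for vcpu in sorted(FARGATE_COMBOS):
        if vcpu < cpu_request:
            continue
        for mem in FARGATE_COMBOS[vcpu]:
            if mem >= needed_mem:
                return vcpu, mem
    raise ValueError("requests exceed the largest supported Fargate configuration")


# A pod just under the boundary versus one a sidecar has pushed just over it.
print(fargate_size(0.5, 3.75))  # fits in 0.5 vCPU / 4 GB
print(fargate_size(0.5, 4.0))   # overhead pushes it to 1 vCPU / 5 GB
```

In the second case, 0.25 GB of additional request doubles the billed vCPU and adds a gigabyte of billed memory, which is the pattern described above.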

Unnecessary Lambda Provisioned Concurrency on Low-Utilization Functions

Compute

Cloud Provider

AWS

Service Name

AWS Lambda

Inefficiency Type

Unused reserved capacity

This inefficiency occurs when Provisioned Concurrency is enabled for Lambda functions that do not require consistently low latency or steady traffic. In such cases, reserved capacity remains allocated and billed during idle periods, creating ongoing cost without proportional performance or business benefit. This is distinct from standard Lambda execution charges, which are purely usage-based.
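
One way to spot candidates is to compare observed concurrency against the provisioned amount. The sketch below assumes concurrency samples have already been exported (for example, one value per hour); the function names, sample data, and the 20% threshold are hypothetical choices for illustration.

```python
def provisioned_utilization(concurrency_samples: list, provisioned: int) -> float:
    """Average fraction of provisioned concurrency actually in use across the samples."""
    if provisioned <= 0 or not concurrency_samples:
        raise ValueError("need a positive provisioned value and at least one sample")
    # Usage above the provisioned level spills to on-demand, so cap each sample.
    used = sum(min(sample, provisioned) for sample in concurrency_samples)
    return used / (provisioned * len(concurrency_samples))


def flag_low_utilization(functions: dict, threshold: float = 0.2) -> list:
    """Return names of functions whose utilization falls below the threshold."""
    return [
        name
        for name, (samples, provisioned) in functions.items()
        if provisioned_utilization(samples, provisioned) < threshold
    ]


# Hypothetical data: (concurrency samples, provisioned concurrency) per function.
functions = {
    "report-generator": ([0, 0, 1, 5, 0, 0, 0, 2], 10),
    "checkout-api": ([8, 9, 10, 12], 10),
}
print(flag_low_utilization(functions))
```

With this sample data only `report-generator` is flagged, since it uses about a tenth of its provisioned capacity on average.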

Retained Azure Backup Data After Resource Decommissioning

Storage

Cloud Provider

Azure

Service Name

Azure Backup

Inefficiency Type

Orphaned backup data

This inefficiency occurs when a protected resource (such as a virtual machine, database, or file share) is decommissioned without explicitly stopping backup protection. In these cases, Azure Backup continues to retain existing recovery points in the vault: older points age out under the retention policy, but the most recent recovery point is generally kept until protection is stopped and the backup data is deleted. Although the source resource no longer exists, backup storage remains allocated and billable, resulting in unnecessary ongoing costs.

This pattern is common when infrastructure is deleted outside of a formal decommissioning process or when backup ownership is unclear.
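
A simple reconciliation between backup items and live resources can surface this pattern. The sketch below assumes both lists have already been exported from the vault and the resource inventory; the dictionary fields and resource IDs are hypothetical and do not reflect an actual Azure SDK response shape.

```python
def find_orphaned_backups(backup_items: list, live_resource_ids: list) -> list:
    """Return backup items whose source resource no longer appears in the inventory."""
    # Azure resource IDs are case-insensitive, so compare in lower case.
    live = {resource_id.lower() for resource_id in live_resource_ids}
    return [
        item for item in backup_items
        if item["source_resource_id"].lower() not in live
    ]


# Hypothetical exported data.
backup_items = [
    {"name": "vm-web-01", "source_resource_id": "/subscriptions/s1/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm-web-01"},
    {"name": "vm-old-07", "source_resource_id": "/subscriptions/s1/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm-old-07"},
]
live_resource_ids = [
    "/subscriptions/s1/resourcegroups/rg/providers/microsoft.compute/virtualmachines/vm-web-01",
]

for item in find_orphaned_backups(backup_items, live_resource_ids):
    print("backup with no live source:", item["name"])
```

Items flagged this way still need review before deletion, since some backups of decommissioned resources are retained deliberately for compliance.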

Underutilized Azure Savings Plan Due to Overly Narrow Scope

Compute

Cloud Provider

Azure

Service Name

Azure Virtual Machines

Inefficiency Type

Commitment underutilization due to scope configuration

This inefficiency occurs when an Azure Savings Plan is scoped too narrowly relative to where eligible compute usage actually runs. When usage is spread across multiple subscriptions or fluctuates significantly (for example, development and test workloads that are frequently stopped and started), a narrowly scoped Savings Plan may not consistently find enough eligible usage to consume the full commitment. As a result, part of the committed hourly spend goes unused while other eligible workloads outside the scope continue to incur on-demand charges.

Azure supports broader scoping options—such as Management Group or Shared scope—that allow the commitment to be applied across a larger pool of eligible compute. Selecting an overly restrictive scope can therefore directly drive underutilization, even when sufficient total usage exists across the tenant.
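
The effect of scope on utilization can be sketched numerically. The example below is a simplified model: it treats the commitment and eligible usage as hourly amounts in the same currency, ignores discount rates, and uses hypothetical subscription names and figures.

```python
def scoped_hourly_usage(usage_by_subscription: dict, subscriptions_in_scope: list) -> list:
    """Sum eligible hourly usage across the subscriptions covered by the scope."""
    series = [usage_by_subscription[name] for name in subscriptions_in_scope]
    return [sum(hour) for hour in zip(*series)]


def commitment_utilization(hourly_commitment: float, hourly_usage: list) -> float:
    """Fraction of the committed hourly spend consumed by eligible usage."""
    if hourly_commitment <= 0 or not hourly_usage:
        raise ValueError("need a positive commitment and at least one hour of usage")
    consumed = sum(min(hourly_commitment, used) for used in hourly_usage)
    return consumed / (hourly_commitment * len(hourly_usage))


# Hypothetical eligible usage per hour; the dev subscription is stopped every other hour.
usage = {
    "sub-prod": [6.0, 6.0, 6.0, 6.0],
    "sub-dev": [4.0, 0.0, 4.0, 0.0],
}
narrow = commitment_utilization(4.0, scoped_hourly_usage(usage, ["sub-dev"]))
shared = commitment_utilization(4.0, scoped_hourly_usage(usage, ["sub-prod", "sub-dev"]))
print(f"narrow scope: {narrow:.0%}, shared scope: {shared:.0%}")
```

With these figures the same commitment is 50% utilized when scoped to the dev subscription alone and fully utilized under the broader scope.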

Using High-Cost Models for Low-Complexity Tasks

Cloud Provider

GCP

Service Name

GCP Vertex AI

Inefficiency Type

Overpowered Model Selection

Vertex AI workloads often include low-complexity tasks such as classification, routing, keyword extraction, metadata parsing, document triage, or summarization of short and simple text. These operations do **not** require the advanced multimodal reasoning or long-context capabilities of larger Gemini model tiers. When organizations default to a single high-end model (such as Gemini Ultra or Pro) across all applications, they incur elevated token costs for work that could be served efficiently by **Gemini Flash** or smaller task-optimized variants. This mismatch is a common pattern in early deployments where model selection is driven by convenience rather than workload-specific requirements. Over time, this creates unnecessary spend without delivering measurable value.
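
Workload-specific selection can start as a small routing rule. The sketch below is illustrative only: the tier labels `small` and `large`, the task names, and the token limit are hypothetical placeholders rather than Vertex AI model identifiers or limits.

```python
# Task types treated as low complexity, mirroring the examples listed above.
LOW_COMPLEXITY_TASKS = {
    "classification",
    "routing",
    "keyword_extraction",
    "metadata_parsing",
    "document_triage",
    "short_summarization",
}


def choose_model_tier(task_type: str, input_tokens: int, small_tier_max_tokens: int = 8000) -> str:
    """Pick a model tier from the task type and input size (hypothetical policy)."""
    if task_type in LOW_COMPLEXITY_TASKS and input_tokens <= small_tier_max_tokens:
        return "small"
    return "large"


print(choose_model_tier("classification", 300))
print(choose_model_tier("multimodal_reasoning", 300))
print(choose_model_tier("short_summarization", 50000))
```

A rule like this defaults unknown or oversized work to the larger tier, so the cost saving comes only from tasks that have been explicitly judged safe to downgrade.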

Using High-Cost Bedrock Models for Low-Complexity Tasks

Cloud Provider

AWS

Service Name

AWS Bedrock

Inefficiency Type

Overpowered Model Selection

Many Bedrock workloads involve low-complexity tasks such as tagging, classification, routing, entity extraction, keyword detection, document triage, or lightweight summarization. These tasks **do not require** the advanced reasoning or generative capabilities of higher-cost models such as Claude 3 Opus or comparable premium models. When organizations default to a high-end model across all applications—or fail to periodically reassess model selection—they pay elevated costs for work that could be performed effectively by smaller, lower-cost models such as Claude Haiku or other compact model families. This inefficiency becomes more pronounced in high-volume, repetitive workloads where token counts scale quickly.
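
The scaling effect is easy to see with rough arithmetic. The sketch below uses a single blended per-1,000-token price for each model and hypothetical volumes; the prices are placeholders, not Bedrock list prices, and real pricing separates input and output tokens.

```python
def monthly_token_cost(requests_per_day: int, tokens_per_request: int,
                       price_per_1k_tokens: float, days: int = 30) -> float:
    """Approximate monthly spend at a flat blended price per 1,000 tokens."""
    return requests_per_day * tokens_per_request / 1000 * price_per_1k_tokens * days


# Hypothetical workload: 100,000 tagging requests per day at 500 tokens each.
premium = monthly_token_cost(100_000, 500, price_per_1k_tokens=0.015)
compact = monthly_token_cost(100_000, 500, price_per_1k_tokens=0.00025)
print(f"premium model: {premium:,.0f} per month")
print(f"compact model: {compact:,.0f} per month")
```

Because the cost is linear in token volume, the absolute gap between the two options grows in direct proportion to request count.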

Always-On PTUs for Seasonal or Cyclical Azure OpenAI Workloads

Cloud Provider

Azure

Service Name

Azure Cognitive Services

Inefficiency Type

Unnecessary Continuous Provisioning

Many Azure OpenAI workloads—such as reporting pipelines, marketing workflows, batch inference jobs, or time-bound customer interactions—only run during specific periods. When PTUs remain fully provisioned 24/7, organizations incur continuous fixed cost even during extended idle time. Although Azure does not offer native PTU scheduling, teams can use automation to provision and deprovision PTUs based on predictable cycles. This allows them to retain performance during peak windows while reducing cost during low-activity periods.
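
The potential saving from cycle-based provisioning can be estimated with simple arithmetic. The sketch below assumes hourly PTU billing with no reservation, a 30-day month, and a hypothetical hourly rate; it does not model the time or quota risk involved in re-creating deployments.

```python
HOURS_ALWAYS_ON = 24 * 30  # a 30-day month, provisioned around the clock


def ptu_cost(ptus: int, hourly_rate_per_ptu: float, provisioned_hours: float) -> float:
    """Cost of keeping a number of PTUs provisioned for a number of hours."""
    return ptus * hourly_rate_per_ptu * provisioned_hours


def scheduled_savings_fraction(active_hours: float, total_hours: float = HOURS_ALWAYS_ON) -> float:
    """Share of always-on cost avoided by provisioning only during active hours."""
    return 1 - active_hours / total_hours


# Hypothetical workload: 100 PTUs at 1.00 per PTU-hour, needed 10 hours on 22 working days.
active_hours = 10 * 22
print(f"always-on: {ptu_cost(100, 1.0, HOURS_ALWAYS_ON):,.0f}")
print(f"scheduled: {ptu_cost(100, 1.0, active_hours):,.0f}")
print(f"saving:    {scheduled_savings_fraction(active_hours):.0%}")
```

Under these assumptions the scheduled pattern is provisioned for 220 of 720 hours, avoiding roughly 69% of the always-on cost.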

Non-Production Azure OpenAI Deployments Using PTUs Instead of PAYG

Cloud Provider

Azure

Service Name

Azure Cognitive Services

Inefficiency Type

Misaligned Pricing Model

Development, testing, QA, and sandbox environments rarely have the steady, predictable traffic patterns needed to justify PTU deployments. These workloads often run intermittently, with lower throughput and shorter usage windows. When PTUs are assigned to such environments, the fixed hourly billing generates continuous cost with little utilization. Switching non-production workloads to PAYG aligns cost with actual usage and eliminates the overhead of managing PTU quota in low-stakes environments.
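
A break-even estimate makes the mismatch concrete. The sketch below compares a fixed hourly PTU cost against a blended PAYG price per 1,000 tokens; both prices and the traffic figure are hypothetical, and real PAYG pricing distinguishes input from output tokens.

```python
def break_even_tokens_per_hour(ptu_hourly_cost: float, payg_price_per_1k_tokens: float) -> float:
    """Sustained hourly token volume at which PAYG spend equals the fixed PTU cost."""
    return ptu_hourly_cost / payg_price_per_1k_tokens * 1000


def cheaper_option(avg_tokens_per_hour: float, ptu_hourly_cost: float,
                   payg_price_per_1k_tokens: float) -> str:
    """Name the lower-cost pricing model for a given average hourly token volume."""
    payg_hourly = avg_tokens_per_hour / 1000 * payg_price_per_1k_tokens
    return "PAYG" if payg_hourly < ptu_hourly_cost else "PTU"


# Hypothetical figures: PTU deployment at 50 per hour, PAYG at 0.01 per 1,000 tokens.
print(f"break-even: {break_even_tokens_per_hour(50, 0.01):,.0f} tokens per hour")
print("dev environment at 200,000 tokens per hour:", cheaper_option(200_000, 50, 0.01))
```

An intermittent environment would need to sustain the break-even volume around the clock for PTUs to pay off, which non-production traffic rarely does.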
