Mixing Production and Non-Production Applications in the Same App Service Plan
Compute
Cloud Provider: Azure
Service Name: Azure App Service Plans
Inefficiency Type: Inefficient environment isolation

This inefficiency occurs when production and non-production applications are hosted within the same App Service Plan. Production workloads often require higher availability, performance, or scaling characteristics, driving the plan toward larger or higher-cost SKUs. When non-production workloads share that plan, they inherit the higher cost structure even though their availability and performance requirements are typically much lower, resulting in unnecessary spend.
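
As a rough illustration of how this pattern might be detected, the sketch below groups apps by plan and flags any plan that hosts production alongside other environments. The `plan` and `environment` keys are hypothetical; in practice they would come from whatever tagging or inventory convention is in use.

```python
from collections import defaultdict


def mixed_environment_plans(apps):
    """Return the names of plans hosting production and non-production apps together.

    `apps` is a list of dicts with hypothetical keys "plan" and "environment".
    """
    envs_by_plan = defaultdict(set)
    for app in apps:
        envs_by_plan[app["plan"]].add(app["environment"].lower())
    # A plan is flagged when it holds production plus at least one other environment.
    return sorted(
        plan
        for plan, envs in envs_by_plan.items()
        if "production" in envs and len(envs) > 1
    )


# Illustrative inventory, not real data.
inventory = [
    {"plan": "plan-a", "environment": "Production"},
    {"plan": "plan-a", "environment": "dev"},
    {"plan": "plan-b", "environment": "production"},
]
flagged = mixed_environment_plans(inventory)
```

Here only `plan-a` is flagged, since `plan-b` hosts production alone.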

Fargate Resource Rounding and Per-Pod Overhead Driving Step-Up Costs
Compute
Cloud Provider: AWS
Service Name: AWS EKS
Inefficiency Type: Suboptimal resource sizing

This inefficiency occurs when pod resource requests—often inflated by sidecar containers—push total memory or CPU just over a Fargate sizing boundary. Because Fargate adds mandatory system overhead and only supports fixed resource combinations, small incremental increases can force a pod into a much larger billing tier. This results in materially higher cost for marginal additional resource needs, especially in workloads that run continuously or at scale.
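
The step-up effect can be sketched with a small sizing function. The combination table and the 256 MiB per-pod overhead below are assumptions based on AWS's published Fargate pod configurations and should be checked against current documentation. The selection rule (smallest vCPU tier that fits, then smallest memory) is also a simplification of how Fargate picks a configuration.

```python
# Assumed Fargate vCPU -> memory (GB) combinations; verify against current AWS docs.
FARGATE_COMBOS = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),
    2: list(range(4, 17)),
    4: list(range(8, 31)),
    8: list(range(16, 61, 4)),
    16: list(range(32, 121, 8)),
}
POD_OVERHEAD_GB = 0.25  # assumed 256 MiB added per pod for Kubernetes components


def fargate_size(cpu_request, memory_request_gb):
    """Return the (vCPU, memory GB) configuration a pod would be rounded up to."""
    needed_mem = memory_request_gb + POD_OVERHEAD_GB
    for vcpu in sorted(FARGATE_COMBOS):
        if vcpu < cpu_request:
            continue
        for mem in FARGATE_COMBOS[vcpu]:
            if mem >= needed_mem:
                return vcpu, mem
    raise ValueError("request exceeds the largest assumed Fargate configuration")


just_under = fargate_size(0.25, 1.75)  # 1.75 + 0.25 overhead fits 0.25 vCPU / 2 GB
just_over = fargate_size(0.25, 1.8)    # 0.05 GB more forces 0.5 vCPU / 3 GB
```

Under these assumptions, a sidecar that adds only 0.05 GB to the pod's memory request doubles the billed vCPU and raises billed memory by half.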

Unnecessary Lambda Provisioned Concurrency on Low-Utilization Functions
Compute
Cloud Provider: AWS
Service Name: AWS Lambda
Inefficiency Type: Unused reserved capacity

This inefficiency occurs when Provisioned Concurrency is enabled for Lambda functions that do not require consistently low latency or steady traffic. In such cases, reserved capacity remains allocated and billed during idle periods, creating ongoing cost without proportional performance or business benefit. This is distinct from standard Lambda execution charges, which are purely usage-based.
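
One simple way to screen for this is to look at how much of the provisioned capacity is ever in use. The sketch below assumes utilization samples expressed as fractions between 0 and 1, such as those reported by CloudWatch's `ProvisionedConcurrencyUtilization` metric. The function name and the 0.2 threshold are hypothetical and would need tuning per workload.

```python
def is_underused(utilization_samples, threshold=0.2):
    """Return True if peak provisioned-concurrency utilization stays below `threshold`.

    `utilization_samples` is a non-empty sequence of fractions in [0, 1].
    The default threshold is an illustrative placeholder, not a recommendation.
    """
    if not utilization_samples:
        raise ValueError("at least one utilization sample is required")
    return max(utilization_samples) < threshold


quiet_function = is_underused([0.0, 0.05, 0.1])  # peak 0.1 is below the threshold
busy_function = is_underused([0.1, 0.9])         # peak 0.9 is well above it
```

Using the peak rather than the average is a deliberately conservative choice: a function with brief but real bursts is not flagged.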

Retained Azure Backup Data After Resource Decommissioning
Storage
Cloud Provider: Azure
Service Name: Azure Backup
Inefficiency Type: Orphaned backup data

This inefficiency occurs when a protected resource (such as a virtual machine, database, or file share) is decommissioned without explicitly stopping backup protection. In these cases, Azure Backup continues to retain existing recovery points in the vault until the retention policy expires. Although the source resource no longer exists, backup storage remains allocated and billable, resulting in unnecessary ongoing costs.

This pattern is common when infrastructure is deleted outside of a formal decommissioning process or when backup ownership is unclear.
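
A minimal sketch of how such orphans might be surfaced, assuming two exports are available: a list of backup items and a list of resource IDs that still exist. The `source_resource_id` key is hypothetical, and IDs are compared case-insensitively.

```python
def orphaned_backup_items(backup_items, live_resource_ids):
    """Return backup items whose source resource no longer exists.

    `backup_items` is a list of dicts with a hypothetical "source_resource_id" key;
    `live_resource_ids` is an iterable of IDs for resources that still exist.
    """
    live = {rid.lower() for rid in live_resource_ids}
    return [
        item
        for item in backup_items
        if item["source_resource_id"].lower() not in live
    ]


# Illustrative identifiers, not real resources.
items = [
    {"name": "vm-web-backup", "source_resource_id": "/subscriptions/x/vm-web"},
    {"name": "vm-old-backup", "source_resource_id": "/subscriptions/x/vm-old"},
]
orphans = orphaned_backup_items(items, ["/subscriptions/X/VM-WEB"])
```

Only `vm-old-backup` is returned here, because its source resource is absent from the live inventory.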

Underutilized Azure Savings Plan Due to Overly Narrow Scope
Compute
Cloud Provider: Azure
Service Name: Azure Virtual Machines
Inefficiency Type: Commitment underutilization due to scope configuration

This inefficiency occurs when an Azure Savings Plan is scoped too narrowly relative to where eligible compute usage actually runs. When usage is spread across multiple subscriptions or fluctuates significantly (for example, development and test workloads that are frequently stopped and started), a narrowly scoped Savings Plan may not consistently find enough eligible usage to consume the full commitment. As a result, part of the committed hourly spend goes unused while other eligible workloads outside the scope continue to incur on-demand charges.

Azure supports broader scoping options—such as Management Group or Shared scope—that allow the commitment to be applied across a larger pool of eligible compute. Selecting an overly restrictive scope can therefore directly drive underutilization, even when sufficient total usage exists across the tenant.
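
The effect of scope on utilization can be illustrated with a simplified model in which each hour's commitment covers eligible usage up to the committed amount and anything left over is lost. The function name and all figures below are illustrative assumptions, and usage is expressed in the same currency units as the commitment.

```python
def savings_plan_utilization(hourly_commitment, eligible_usage_by_hour):
    """Return the fraction of the commitment consumed over the given hours."""
    if hourly_commitment <= 0:
        raise ValueError("hourly_commitment must be positive")
    if not eligible_usage_by_hour:
        raise ValueError("at least one hour of usage is required")
    # Each hour, the plan can only absorb usage up to the committed amount.
    consumed = sum(min(hourly_commitment, usage) for usage in eligible_usage_by_hour)
    return consumed / (hourly_commitment * len(eligible_usage_by_hour))


# Illustrative hourly eligible usage for two subscriptions.
subscription_a = [10, 2, 0, 10]
subscription_b = [0, 8, 10, 0]

# Narrow scope: the plan only sees subscription A.
narrow = savings_plan_utilization(10, subscription_a)

# Broad scope: the plan sees the combined usage of both subscriptions.
combined = [a + b for a, b in zip(subscription_a, subscription_b)]
shared = savings_plan_utilization(10, combined)
```

In this example the narrowly scoped plan consumes 55% of its commitment, while the same commitment is fully consumed once both subscriptions are in scope.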

Suboptimal Bedrock Custom Model
AI
Cloud Provider: AWS
Service Name: AWS Bedrock
Inefficiency Type: Outdated or Overpowered Model Configuration

Teams often start custom-model deployments with large architectures, full-precision weights, or older model versions carried over from training environments. When these models transition to Bedrock’s managed inference environment, the compute footprint (especially GPU class) becomes a major cost driver. Common inefficiencies include:
* Deploying outdated custom models despite newer, more efficient variants being available
* Running full-size models for tasks that could be served by distilled or quantized versions
* Using accelerators overpowered for the workload’s latency requirements
* Relying on default model artifacts instead of optimizing for inference

Because Bedrock Custom Models bill continuously for the backing compute, even small inefficiencies in model design or versioning translate into substantial ongoing cost.

Excessive Retries for Large Inference Outputs
AI
Cloud Provider: GCP
Service Name: GCP Vertex AI
Inefficiency Type: Excessive Retry-Induced Token Consumption

Generative workloads that produce long outputs—such as detailed summaries, document rewrites, or multi-paragraph chat completions—require extended model runtime.

Unnecessary Use of Embeddings for Simple Retrieval Tasks
AI
Cloud Provider: Databricks
Service Name: Databricks Vector Search
Inefficiency Type: Misapplied Embedding Architecture

Embedding-based retrieval enables semantic matching even when keywords differ. But many Databricks workloads—catalog lookups, metadata search, deterministic classification, or fixed-rule routing—do not require semantic understanding. When embeddings are used anyway, teams incur DBU cost for embedding generation, additional storage for vector columns or indexes, and more expensive similarity-search compute. This often stems from defaulting to a RAG approach rather than evaluating whether a simpler retrieval mechanism would perform equally well.

Unnecessary Use of Embeddings for Simple Retrieval Tasks
AI
Cloud Provider: Azure
Service Name: Azure Cognitive Services
Inefficiency Type: Misapplied Embedding Architecture

Embeddings enable semantic retrieval by capturing the meaning of text, while keyword search returns results based on exact or lexical matches. Many Azure workloads—FAQ search, routing, deterministic classification, or structured lookups—achieve the same or better accuracy using simple keyword or metadata filtering. When embeddings are used for these uncomplicated tasks, organizations pay for token-based embedding generation, vector storage, and compute-heavy similarity search without receiving meaningful quality improvements. This inefficiency often occurs when RAG is used automatically rather than intentionally.

Unnecessary Use of Embeddings for Simple Retrieval Tasks
AI
Cloud Provider: Snowflake
Service Name: Snowflake Cortex
Inefficiency Type: Misapplied Embedding Architecture

Embeddings enable semantic similarity search by representing text as high-dimensional vectors. Keyword search, however, returns results based on lexical matches and is often sufficient for simple retrieval tasks such as FAQ matching, deterministic filtering, metadata lookup, or rule-based routing. When embeddings are used for these low-complexity scenarios, organizations pay for compute to generate embeddings, storage for vector columns, and compute-heavy cosine similarity searches — without improving accuracy or user experience. In Snowflake, this can also increase warehouse load and query runtime.
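
For comparison, a lexical baseline of the kind described above can be very small. The sketch below is platform-neutral Python rather than Snowflake SQL, and the function names and sample FAQ entries are hypothetical. It ranks documents by how many query terms they share, which is often a reasonable first check before committing to an embedding pipeline.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, documents, top_k=3):
    """Return up to `top_k` document IDs ranked by number of shared query terms."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & tokenize(text))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Illustrative FAQ entries, not real data.
faqs = {
    "faq-1": "How do I reset my password?",
    "faq-2": "How do I update billing details?",
}
matches = keyword_search("reset password", faqs)
```

If a baseline like this already returns the expected entries on a representative set of queries, the added cost of embedding generation and vector search may be hard to justify for that workload.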
