Orphaned and Overprovisioned Resources in GKE Clusters
Compute
Cloud Provider
GCP
Service Name
GCP GKE
Inefficiency Type
Inefficient Configuration

As environments scale, GKE clusters tend to accumulate artifacts from ephemeral workloads, dev environments, or incomplete job execution. PVCs can continue to retain Persistent Disks, Services may continue to expose public IPs and provision load balancers, and node pools are often oversized for steady-state demand. This results in cloud spend that is not aligned with application activity.

Organizations that lack visibility into Kubernetes-level resource usage often miss these inefficiencies because GCP billing tools surface usage at the infrastructure level, not the Kubernetes object level.
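
One way to surface oversized node pools is to compare the CPU that pods request against what each pool can allocate. The sketch below assumes those two totals have already been collected per pool; the `NodePool` type, the pool names, and the 50% threshold are illustrative assumptions, not GKE defaults.

```python
from dataclasses import dataclass


@dataclass
class NodePool:
    name: str
    allocatable_vcpu: float  # summed across all nodes in the pool
    requested_vcpu: float    # summed CPU requests of pods scheduled on the pool


def oversized_pools(pools, min_utilization=0.5):
    """Return (name, utilization) for pools whose request ratio is below the threshold."""
    flagged = []
    for pool in pools:
        if pool.allocatable_vcpu <= 0:
            continue  # skip empty pools to avoid dividing by zero
        utilization = pool.requested_vcpu / pool.allocatable_vcpu
        if utilization < min_utilization:
            flagged.append((pool.name, round(utilization, 2)))
    return flagged


# Hypothetical pools for illustration only.
pools = [
    NodePool("batch", allocatable_vcpu=64.0, requested_vcpu=12.0),
    NodePool("web", allocatable_vcpu=32.0, requested_vcpu=24.0),
]
print(oversized_pools(pools))
```

Requests are only a proxy for real usage, so a flagged pool is a candidate for review rather than a confirmed saving.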

Suboptimal Use of Compute Savings Plans for Specialized Instances
Compute
Cloud Provider
AWS
Service Name
AWS EC2
Inefficiency Type
Suboptimal Pricing Model

Accelerated EC2 instance types such as `p5.48xlarge` and `p5en.48xlarge` (often used for ML/AI workloads) are eligible for Compute Savings Plans (CSPs), but the discount rates offered are modest compared to more common instance families. Because a Savings Plan is applied first to the usage with the highest savings percentage, organizations that rely solely on CSPs typically see these lower-discount instances benefit last, especially if other instance types consume most of the committed spend.

As a result, p5 usage may fall through the cracks and be billed at full On-Demand rates despite an active CSP. This dynamic makes CSPs a potentially inefficient choice for workloads that heavily or predictably rely on these instance types. EC2 Instance Savings Plans provide better discount targeting for known usage patterns, and AWS now offers dedicated P5 and P5en Instance Savings Plans with up to 40% savings specifically for these instance types. Additionally, while Capacity Blocks offer the steepest discount, they come with operational rigidity and inflexible scheduling constraints that limit their applicability.
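
The effect can be illustrated with a simplified model of one billing hour, in which the commitment is spent on the usage with the highest discount first. All rates, discounts, and workload names below are hypothetical, and the function ignores details of the real allocation logic such as account-level ordering.

```python
def apply_savings_plan(commitment_per_hour, usage):
    """Spread one hour of Savings Plan commitment across usage items.

    usage: list of dicts with 'name', 'on_demand' (USD/hour) and
    'discount' (fraction off On-Demand under the plan).
    Returns {name: USD billed at On-Demand for the uncovered remainder}.
    """
    remaining = commitment_per_hour
    on_demand_charges = {}
    # The highest savings percentage is served first.
    for item in sorted(usage, key=lambda u: u["discount"], reverse=True):
        plan_rate = item["on_demand"] * (1 - item["discount"])
        covered = min(1.0, remaining / plan_rate) if plan_rate > 0 else 1.0
        remaining -= covered * plan_rate
        on_demand_charges[item["name"]] = (1 - covered) * item["on_demand"]
    return on_demand_charges


# Hypothetical numbers: a small commitment and a large GPU workload.
usage = [
    {"name": "gpu-training", "on_demand": 100.0, "discount": 0.25},
    {"name": "general-fleet", "on_demand": 10.0, "discount": 0.50},
]
charges = apply_savings_plan(commitment_per_hour=5.0, usage=usage)
print(charges)
```

In this example the general fleet absorbs the whole commitment, so the GPU workload is billed entirely at On-Demand rates even though a plan is active.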

Outdated Virtual Machine Version in Azure
Compute
Cloud Provider
Azure
Service Name
Azure Virtual Machines
Inefficiency Type
Outdated Resource

Many organizations choose a VM SKU and version (e.g., `D4s_v3`) during the initial planning phase of a project, often based on availability, compatibility, or early cost estimates. Over time, Microsoft releases newer hardware generations (e.g., `D4s_v4`, `D4s_v5`) that offer equivalent or better performance at the same or reduced cost. However, existing VMs are not automatically migrated, and these newer versions are often overlooked unless intentionally evaluated.

This inefficiency tends to persist because switching to a newer version typically requires VM deallocation and resizing, which introduces temporary downtime. As a result, outdated VM series versions continue to run indefinitely, even in environments where brief downtime is acceptable. The cost delta between series versions—especially across certain families like `D`, `E`, or `F`—can be significant when scaled across environments or multiple VMs. Note that VM series versions (v3, v4, v5) are distinct from VM generations (Gen 1 vs Gen 2), with series versions representing the primary opportunity for cost optimization.
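
A simple inventory check can flag VMs running an older series version. The sketch below parses common size names such as `Standard_D4s_v3`; the regular expression, the `latest_versions` mapping, and the VM names are assumptions for illustration, and the pattern does not cover every Azure naming variant. Availability of the newer size in a given region still needs to be verified separately.

```python
import re

# Matches names like "Standard_D4s_v3" or "D4s_v5"; sizes without a
# "_vN" suffix are treated as version 1.
SKU_RE = re.compile(r"^(?:Standard_)?(?P<base>[A-Z]+\d+[a-z]*)(?:_v(?P<version>\d+))?$")


def parse_sku(sku):
    """Return (base_size, series_version) or None if the name is not recognized."""
    match = SKU_RE.match(sku)
    if match is None:
        return None
    return match.group("base"), int(match.group("version") or 1)


def find_outdated(vm_skus, latest_versions):
    """Return (vm_name, current_sku, suggested_size) for VMs below the latest known version."""
    outdated = []
    for vm_name, sku in vm_skus.items():
        parsed = parse_sku(sku)
        if parsed is None:
            continue  # unrecognized naming scheme, leave for manual review
        base, version = parsed
        latest = latest_versions.get(base)
        if latest is not None and version < latest:
            outdated.append((vm_name, sku, f"{base}_v{latest}"))
    return outdated


# Hypothetical inventory and version table.
vms = {"web-01": "Standard_D4s_v3", "web-02": "Standard_D4s_v5"}
print(find_outdated(vms, latest_versions={"D4s": 5}))
```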

Orphaned Kubernetes Resources
Compute
Cloud Provider
GCP
Service Name
GCP GKE
Inefficiency Type
Orphaned Resource

In GKE environments, it is common for unused Kubernetes resources to accumulate over time. Examples include Persistent Volume Claims (PVCs) that retain provisioned Persistent Disks, or Services of type LoadBalancer that continue to front GCP external load balancers even after the backing pods are gone. ConfigMaps and Secrets may also linger, creating operational overhead.

These orphaned objects often persist due to gaps in CI/CD teardown logic, manual testing workflows, or drift over time. While some carry negligible cost on their own, others can result in significant charges, especially storage and networking artifacts. This inefficiency applies broadly across Kubernetes platforms and is scoped here to GKE.
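
As a rough illustration, the two costliest cases above can be detected from exported object lists. The sketch assumes plain dictionaries shaped like the items in `kubectl get ... -o json` output; the function names and sample objects are hypothetical, and Services without a selector are skipped because their endpoints are managed manually.

```python
def unmounted_pvcs(pvcs, pods):
    """Return (namespace, name) of PVCs that no pod references as a volume."""
    mounted = set()
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        for volume in pod["spec"].get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                mounted.add((namespace, claim["claimName"]))
    return [
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in pvcs
        if (pvc["metadata"]["namespace"], pvc["metadata"]["name"]) not in mounted
    ]


def idle_load_balancers(services, pods):
    """Return (namespace, name) of LoadBalancer Services whose selector matches no pod."""
    idle = []
    for svc in services:
        selector = svc["spec"].get("selector")
        if svc["spec"].get("type") != "LoadBalancer" or not selector:
            continue
        namespace = svc["metadata"]["namespace"]
        has_backend = any(
            pod["metadata"]["namespace"] == namespace
            and all(pod["metadata"].get("labels", {}).get(k) == v for k, v in selector.items())
            for pod in pods
        )
        if not has_backend:
            idle.append((namespace, svc["metadata"]["name"]))
    return idle


# Hypothetical cluster snapshot.
pods = [{
    "metadata": {"namespace": "prod", "labels": {"app": "api"}},
    "spec": {"volumes": [{"persistentVolumeClaim": {"claimName": "api-data"}}]},
}]
pvcs = [
    {"metadata": {"namespace": "prod", "name": "api-data"}},
    {"metadata": {"namespace": "dev", "name": "scratch"}},
]
services = [
    {"metadata": {"namespace": "prod", "name": "api"},
     "spec": {"type": "LoadBalancer", "selector": {"app": "api"}}},
    {"metadata": {"namespace": "dev", "name": "demo"},
     "spec": {"type": "LoadBalancer", "selector": {"app": "demo"}}},
]
print(unmounted_pvcs(pvcs, pods))
print(idle_load_balancers(services, pods))
```

A snapshot like this can miss workloads that are scaled to zero on purpose, so results are best treated as a review list rather than a deletion list.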

Unoptimized Billing Model for BigQuery Dataset Storage
Databases
Cloud Provider
GCP
Service Name
GCP BigQuery
Inefficiency Type
Inefficient Configuration

Highly compressible datasets, such as those with repeated string fields, nested structures, or uniform rows, can benefit significantly from physical storage billing. Yet most datasets remain on logical storage by default, even when physical storage would reduce costs.

This inefficiency is common for cold or infrequently updated datasets that are no longer optimized or regularly reviewed. Because storage behavior and data characteristics evolve, failing to periodically reassess the billing model may result in persistent waste.
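
A periodic reassessment can be as simple as pricing the same dataset under both models. The sketch below takes byte counts and per-GiB monthly rates as inputs; the dictionary keys, the sample sizes, and the rates are illustrative assumptions, and the physical sizes are assumed to already include time-travel and fail-safe bytes, which physical billing charges for.

```python
GIB = 1024 ** 3


def monthly_cost(active_bytes, long_term_bytes, active_rate, long_term_rate):
    """Monthly storage cost in USD given per-GiB monthly rates."""
    return (active_bytes / GIB) * active_rate + (long_term_bytes / GIB) * long_term_rate


def compare_billing_models(dataset, logical_rates, physical_rates):
    """Return (logical_cost, physical_cost, cheaper_model) for one dataset."""
    logical = monthly_cost(
        dataset["active_logical_bytes"], dataset["long_term_logical_bytes"], *logical_rates
    )
    physical = monthly_cost(
        dataset["active_physical_bytes"], dataset["long_term_physical_bytes"], *physical_rates
    )
    return logical, physical, "physical" if physical < logical else "logical"


# Hypothetical dataset that compresses 4:1, with example rates
# (active, long-term) in USD per GiB per month; check current pricing.
dataset = {
    "active_logical_bytes": 1000 * GIB,
    "long_term_logical_bytes": 0,
    "active_physical_bytes": 250 * GIB,
    "long_term_physical_bytes": 0,
}
result = compare_billing_models(dataset, logical_rates=(0.02, 0.01), physical_rates=(0.04, 0.02))
print(result)
```

With these example rates the physical per-GiB price is double the logical one, so physical billing only wins when the compression ratio is comfortably above 2:1.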

Underuse of Serverless for Short or Interactive Workloads
Compute
Cloud Provider
Databricks
Service Name
Databricks SQL
Inefficiency Type
Inefficient Configuration

Many organizations continue running short-lived or low-intensity SQL workloads — such as dashboards, exploratory queries, and BI tool integrations — on traditional clusters. This leads to idle compute, overprovisioning, and high baseline costs, especially when the clusters are always-on. Databricks SQL Serverless is optimized for bursty, interactive use cases with auto-scaling and pay-per-second pricing, making it better suited for this class of workloads. Failing to migrate to serverless for these patterns results in unnecessary cost without performance benefit.
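
Whether a workload fits this pattern can be estimated with a break-even calculation: how many active hours per period a usage-billed warehouse can run before it costs more than an always-on cluster. The hourly costs below are hypothetical all-in figures, not Databricks list prices.

```python
def break_even_active_hours(always_on_hourly, usage_based_hourly, hours_in_period=720):
    """Active hours below which usage-based billing is cheaper than an always-on cluster."""
    return hours_in_period * always_on_hourly / usage_based_hourly


# Hypothetical all-in hourly costs (compute plus DBUs) for a 30-day month.
threshold = break_even_active_hours(always_on_hourly=4.0, usage_based_hourly=10.0)
print(threshold)
```

In this example, a dashboard workload that is active for fewer than 288 hours a month would cost less on the usage-billed option, even at a higher hourly rate.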

Lack of Workload-Specific Cluster Segmentation
Compute
Cloud Provider
Databricks
Service Name
Databricks Compute
Inefficiency Type
Inefficient Configuration

Running varied workload types (e.g., ETL pipelines, ML training, SQL dashboards) on the same cluster introduces inefficiencies. Each workload has different runtime characteristics, scaling needs, and performance sensitivities. When mixed together, resource contention can degrade job performance, increase cost, and obscure cost attribution.

ETL jobs may overprovision memory, while lightweight SQL queries may trigger unnecessary cluster scale-ups. Job failures or retries may increase due to contention, and queued jobs can further inflate runtime costs. Without clear segmentation, teams lose the ability to tune environments for specific use cases or monitor workload-specific efficiency.

Poorly Configured Autoscaling on Databricks Clusters
Compute
Cloud Provider
Databricks
Service Name
Databricks Compute
Inefficiency Type
Inefficient Configuration

Autoscaling is a core mechanism for aligning compute supply with workload demand, yet it's often underutilized or misconfigured. In older clusters or ad-hoc environments, autoscaling may be disabled by default or set with tight min/max worker limits that prevent scaling. This can lead to persistent overprovisioning (and wasted cost during idle periods) or underperformance due to insufficient parallelism and job queuing. Poor autoscaling settings are especially common in manually created all-purpose clusters, where idle resources often go unnoticed.

Overly wide autoscaling ranges can also introduce instability: Databricks may rapidly scale up to the upper limit if demand briefly spikes, leading to cost spikes or degraded performance. Understanding workload characteristics is key to tuning autoscaling appropriately.
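
The misconfigurations described above can be screened for with a small check over cluster specifications. The sketch assumes dictionaries using the `autoscale`, `min_workers`, `max_workers`, and `autotermination_minutes` fields of a cluster spec; the `max_ratio` threshold and the finding messages are arbitrary illustrative choices, and the auto-termination check is only meaningful for all-purpose clusters.

```python
def autoscaling_findings(cluster, max_ratio=8):
    """Return a list of human-readable findings for one cluster spec."""
    findings = []
    autoscale = cluster.get("autoscale")
    if autoscale is None:
        findings.append("autoscaling disabled (fixed size)")
    else:
        low, high = autoscale["min_workers"], autoscale["max_workers"]
        if low == high:
            findings.append("min_workers equals max_workers, so the cluster cannot scale")
        elif high > max(low, 1) * max_ratio:
            findings.append("autoscaling range is very wide")
    # A missing or zero value means the cluster never shuts itself down.
    if not cluster.get("autotermination_minutes"):
        findings.append("auto-termination disabled")
    return findings


# Hypothetical cluster specs.
fixed = {"num_workers": 8, "autotermination_minutes": 0}
tuned = {"autoscale": {"min_workers": 2, "max_workers": 8}, "autotermination_minutes": 30}
print(autoscaling_findings(fixed))
print(autoscaling_findings(tuned))
```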

Overuse of Photon in Non-Production Workloads
Compute
Cloud Provider
Databricks
Service Name
Databricks Compute
Inefficiency Type
Inefficient Configuration

Photon is frequently enabled by default across Databricks workspaces, including for development, testing, and low-concurrency workloads. In these non-production contexts, job runtimes are typically shorter, SLAs are relaxed or nonexistent, and performance gains offer little business value.

Enabling Photon in these environments can inflate DBU costs substantially without meaningful runtime improvements. By not differentiating cluster configurations between production and non-production, organizations may pay a premium for workloads that could run just as efficiently on standard compute.

Cluster policies can be used to restrict Photon usage to explicitly tagged production workloads, helping enforce cost-conscious defaults and reduce unnecessary spend.
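
A minimal sketch of such a policy follows, written as a Python dictionary that serializes to a policy definition. It takes the complementary approach of pinning non-production clusters to the standard engine. The `runtime_engine` and `custom_tags.environment` attribute paths and the `fixed` rule type are assumptions that should be verified against the current cluster policy reference, and the tag name and values are hypothetical.

```python
import json


def non_production_policy(environment="dev"):
    """Build a policy definition that pins clusters to the standard (non-Photon) engine."""
    return {
        # Assumed attribute: forces the non-Photon engine for every cluster under this policy.
        "runtime_engine": {"type": "fixed", "value": "STANDARD"},
        # Assumed attribute: stamps a hypothetical environment tag for cost attribution.
        "custom_tags.environment": {"type": "fixed", "value": environment},
    }


policy = non_production_policy()
print(json.dumps(policy, indent=2))
```

Production clusters would then use a separate policy that leaves the engine choice open or sets it to Photon.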

Inefficient Query Design in Databricks SQL and Spark Jobs
Compute
Cloud Provider
Databricks
Service Name
Databricks SQL
Inefficiency Type
Inefficient Configuration

Many Spark and SQL workloads in Databricks suffer from micro-optimization issues — such as unfiltered joins, unnecessary shuffles, missing broadcast joins, and repeated scans of uncached data. These problems increase compute time and resource utilization, especially in exploratory or development environments. Disabling Adaptive Query Execution (AQE) can further exacerbate inefficiencies. Optimizing queries reduces DBU costs, improves cluster performance, and enhances user experience.
