Lack of Workload-Specific Cluster Segmentation
Compute | Cloud Provider: Databricks | Service Name: Databricks Compute | Inefficiency Type: Inefficient Configuration

Running varied workload types (e.g., ETL pipelines, ML training, SQL dashboards) on the same cluster introduces inefficiencies. Each workload has different runtime characteristics, scaling needs, and performance sensitivities. When mixed together, resource contention can degrade job performance, increase cost, and obscure cost attribution.

ETL jobs may overprovision memory, while lightweight SQL queries may trigger unnecessary cluster scale-ups. Job failures or retries may increase due to contention, and queued jobs can further inflate runtime costs. Without clear segmentation, teams lose the ability to tune environments for specific use cases or monitor workload-specific efficiency.
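As a minimal sketch of workload segmentation, the example below uses the Databricks Jobs 2.1 REST API (via Python's requests library) to give an ETL task and a lightweight SQL refresh task their own job clusters instead of sharing one all-purpose cluster. The workspace URL, runtime version, node types, and worker counts are placeholder assumptions to be tuned per workload.

```python
import os
import requests

# Placeholder workspace host and token -- replace with your own values.
HOST = os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com")
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Each task gets a dedicated job cluster sized for its workload profile,
# rather than contending for resources on a shared cluster.
job_spec = {
    "name": "segmented-workloads-example",
    "tasks": [
        {
            "task_key": "nightly_etl",
            "notebook_task": {"notebook_path": "/Jobs/etl_pipeline"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",   # assumed runtime
                "node_type_id": "i3.2xlarge",          # memory-heavy nodes for ETL (assumption)
                "num_workers": 8,
            },
        },
        {
            "task_key": "dashboard_refresh",
            "notebook_task": {"notebook_path": "/Jobs/sql_refresh"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "m5.xlarge",           # small nodes for light SQL (assumption)
                "num_workers": 1,
            },
        },
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json().get("job_id"))
```

Keeping each workload on its own ephemeral job cluster also makes cost attribution straightforward, since every cluster's spend maps to exactly one pipeline.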

Poorly Configured Autoscaling on Databricks Clusters
Compute | Cloud Provider: Databricks | Service Name: Databricks Compute | Inefficiency Type: Inefficient Configuration

Autoscaling is a core mechanism for aligning compute supply with workload demand, yet it's often underutilized or misconfigured. In older clusters or ad-hoc environments, autoscaling may be disabled by default or set with tight min/max worker limits that prevent scaling. This can lead to persistent overprovisioning (and wasted cost during idle periods) or underperformance due to insufficient parallelism and job queuing. Poor autoscaling settings are especially common in manually created all-purpose clusters, where idle resources often go unnoticed.

Overly wide autoscaling ranges can also introduce instability: Databricks may rapidly scale up to the upper limit when demand briefly spikes, leading to sudden cost increases or degraded performance. Understanding workload characteristics is key to tuning autoscaling appropriately.
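As a rough sketch of a bounded autoscaling configuration, the example below creates a cluster through the Clusters 2.0 REST API with an explicit min/max worker range and an auto-termination window. The node type, worker limits, and termination timeout are assumptions and should be set from the workload's observed parallelism and idle patterns.

```python
import os
import requests

HOST = os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com")
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Bounded autoscaling: min_workers covers baseline demand, max_workers caps
# spend during brief spikes; autotermination reclaims idle all-purpose clusters.
cluster_spec = {
    "cluster_name": "etl-autoscaling-example",
    "spark_version": "14.3.x-scala2.12",   # assumed runtime
    "node_type_id": "i3.xlarge",            # assumed node type
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Created cluster:", resp.json().get("cluster_id"))
```

A narrow, workload-informed range like this avoids both persistent overprovisioning and runaway scale-ups to an overly generous maximum.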

Overuse of Photon in Non-Production Workloads
Compute | Cloud Provider: Databricks | Service Name: Databricks Compute | Inefficiency Type: Inefficient Configuration

Photon is frequently enabled by default across Databricks workspaces, including for development, testing, and low-concurrency workloads. In these non-production contexts, job runtimes are typically shorter, SLAs are relaxed or nonexistent, and performance gains offer little business value.

Enabling Photon in these environments can inflate DBU costs substantially without meaningful runtime improvements. By not differentiating cluster configurations between production and non-production, organizations may pay a premium for workloads that could run just as efficiently on standard compute.

Cluster policies can be used to restrict Photon usage to explicitly tagged production workloads, helping enforce cost-conscious defaults and reduce unnecessary spend.
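As one possible way to enforce this, the sketch below creates a cluster policy through the Cluster Policies 2.0 REST API that pins the runtime engine to standard compute (disabling Photon) and fixes an environment tag for cost attribution. The policy name and tag value are illustrative assumptions.

```python
import json
import os
import requests

HOST = os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com")
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Policy definition: fix runtime_engine to STANDARD so clusters created under
# this policy cannot enable Photon, and stamp a non-production environment tag.
policy_definition = {
    "runtime_engine": {"type": "fixed", "value": "STANDARD"},
    "custom_tags.environment": {"type": "fixed", "value": "non-production"},
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "non-production-no-photon",          # assumed policy name
        "definition": json.dumps(policy_definition),  # definition is a JSON string
    },
)
resp.raise_for_status()
print("Created policy:", resp.json().get("policy_id"))
```

Assigning this policy to development and test workspaces (and reserving a Photon-enabled policy for tagged production workloads) keeps the cost-conscious configuration the default rather than relying on individual users to opt out.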
