Running varied workload types (e.g., ETL pipelines, ML training, SQL dashboards) on the same cluster introduces inefficiencies. Each workload has different runtime characteristics, scaling needs, and performance sensitivities. When mixed together, resource contention can degrade job performance, increase cost, and obscure cost attribution.
ETL jobs may overprovision memory, while lightweight SQL queries may trigger unnecessary cluster scale-ups. Job failures or retries may increase due to contention, and queued jobs can further inflate runtime costs. Without clear segmentation, teams lose the ability to tune environments for specific use cases or monitor workload-specific efficiency.
Databricks bills compute in Databricks Units (DBUs), consumed per node per hour at rates that vary by compute type and configuration; all-purpose compute carries a higher DBU rate than jobs compute. When disparate workloads share a single cluster, especially an all-purpose cluster, compute is inefficiently allocated, jobs contend for resources, and DBU consumption can spike due to overprovisioning, retry inflation, or queuing delays.
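The cost arithmetic can be sketched as nodes × hours × DBU rate × price per DBU. The rates and prices below are illustrative placeholders (not published Databricks pricing), but the sketch shows why a shared, always-up all-purpose cluster can cost far more than right-sized jobs clusters per workload:

```python
# Hypothetical cost model comparing one shared all-purpose cluster against
# segmented jobs clusters. All rates below are assumed for illustration,
# not actual Databricks pricing.

ALL_PURPOSE_DBU_PER_NODE_HOUR = 0.75   # assumed DBU rate for all-purpose compute
JOBS_DBU_PER_NODE_HOUR = 0.30          # assumed DBU rate for jobs compute
PRICE_PER_DBU = 0.55                   # assumed $/DBU

def monthly_cost(nodes: int, hours: float, dbu_rate: float,
                 price_per_dbu: float = PRICE_PER_DBU) -> float:
    """DBU-based cost: nodes x hours x DBU rate x $/DBU."""
    return nodes * hours * dbu_rate * price_per_dbu

# One shared 8-node all-purpose cluster kept up 300 h/month for all workloads:
shared = monthly_cost(nodes=8, hours=300,
                      dbu_rate=ALL_PURPOSE_DBU_PER_NODE_HOUR)

# Segmented: ETL on a 6-node jobs cluster (120 h/month) plus a 2-node
# SQL/BI cluster (200 h/month), each sized for its workload:
segmented = (monthly_cost(nodes=6, hours=120, dbu_rate=JOBS_DBU_PER_NODE_HOUR)
             + monthly_cost(nodes=2, hours=200, dbu_rate=JOBS_DBU_PER_NODE_HOUR))

print(f"shared:    ${shared:,.2f}")     # shared:    $990.00
print(f"segmented: ${segmented:,.2f}")  # segmented: $184.80
```

The gap comes from two compounding factors the section describes: the higher DBU rate of all-purpose compute, and the idle and contention hours a shared cluster accrues that right-sized jobs clusters avoid.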