Many Spark and SQL workloads in Databricks suffer from small, avoidable query-level inefficiencies, such as unfiltered joins, unnecessary shuffles, missing broadcast joins, and repeated scans of uncached data. These problems increase compute time and resource utilization, especially in exploratory or development environments. Disabling Adaptive Query Execution (AQE) can make them worse, because Spark can then no longer re-optimize the query plan using runtime statistics. Optimizing queries reduces DBU costs, improves cluster performance, and enhances user experience.
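The inefficiencies listed above can be illustrated with a minimal PySpark sketch. Everything specific in it is an assumption made for illustration: the function name `build_recent_order_report`, the `orders` and `customers` DataFrames, their column names, and the default cutoff date are all hypothetical. It is not a recommended template, only one way the four fixes (keep AQE on, filter before joining, broadcast the small side, cache reused data) might look together.

```python
def build_recent_order_report(spark, orders, customers, since="2024-01-01"):
    """Join recent orders to customers and return two aggregates.

    `orders` and `customers` are hypothetical DataFrames; `customers` is
    assumed to be small enough to broadcast to every executor.
    """
    # Imported inside the function so this sketch can be loaded on a
    # machine without a Spark installation.
    from pyspark.sql import functions as F

    # Keep AQE enabled (it is the default in recent Spark versions) so Spark
    # can adjust shuffle partitions and join strategies at runtime.
    spark.conf.set("spark.sql.adaptive.enabled", "true")

    # Filter and project before the join so less data is shuffled.
    recent = (
        orders
        .filter(F.col("order_date") >= since)
        .select("order_id", "customer_id", "amount")
    )

    # Broadcast hint: ship the small table to executors instead of
    # shuffling the large one.
    small_customers = customers.select("customer_id", "region")
    joined = recent.join(F.broadcast(small_customers), on="customer_id", how="inner")

    # Cache because the joined result feeds two aggregations below;
    # without it, each aggregation would rescan and rejoin the inputs.
    joined.cache()

    by_region = joined.groupBy("region").agg(F.sum("amount").alias("total_amount"))
    by_customer = joined.groupBy("customer_id").agg(F.count("order_id").alias("order_count"))
    return by_region, by_customer
```

Whether a broadcast or a cache actually helps depends on table sizes and available memory, so changes like these are worth checking against the query plan (for example with `DataFrame.explain()`) before relying on them.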
Databricks charges in Databricks Units (DBUs), which are billed per second based on the compute resources used. Inefficient query design leads to longer execution times, higher memory and shuffle usage, and higher DBU consumption with no corresponding increase in business value.
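To make the link between runtime and cost concrete, here is a small arithmetic sketch. The helper `estimate_dbu_cost`, the DBU rate of 8 per hour, and the price of 0.50 per DBU are hypothetical values chosen for illustration; actual rates depend on the compute type, pricing tier, and cloud provider.

```python
def estimate_dbu_cost(runtime_seconds, dbu_per_hour, price_per_dbu):
    """Estimate the DBU charge for one run under per-second billing.

    All inputs are illustrative assumptions, not published Databricks rates.
    """
    dbus_consumed = dbu_per_hour * runtime_seconds / 3600
    return dbus_consumed * price_per_dbu


# Hypothetical cluster: 8 DBU per hour at 0.50 per DBU.
cost_before = estimate_dbu_cost(runtime_seconds=1800, dbu_per_hour=8.0, price_per_dbu=0.50)
cost_after = estimate_dbu_cost(runtime_seconds=600, dbu_per_hour=8.0, price_per_dbu=0.50)

print(f"30-minute run: {cost_before:.2f}")  # 4 DBUs consumed
print(f"10-minute run: {cost_after:.2f}")   # about 1.33 DBUs consumed
```

Under these assumed rates, cutting a query from 30 minutes to 10 minutes on the same cluster reduces its DBU charge by two thirds, since the charge scales linearly with runtime.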