CER-0332
When Delta Lake tables are partitioned by specific columns — such as date, region, or tenant identifier — the query engine can use partition pruning to limit data scans to only the relevant subset of files. However, when queries against these partitioned tables omit filter predicates on partition columns, the engine is forced to perform a full table scan across all partitions. This means the cluster reads every data file in the table regardless of how much data the query actually needs, directly inflating both execution time and Databricks Unit (DBU) consumption.
This pattern is especially common in several scenarios: legacy SQL queries written before tables were partitioned, dynamically generated queries from applications or BI tools that do not incorporate partition column awareness, and ad-hoc exploratory queries by analysts unfamiliar with the table's partitioning strategy. On large time-series datasets, the difference can be dramatic — a query that should scan only a few gigabytes of recent data may instead process terabytes across the entire table history. Because Databricks bills DBUs per second, a query that runs significantly longer due to scanning unnecessary data consumes proportionally more DBUs, compounding the waste across both the Databricks platform charges and the underlying cloud infrastructure costs.
This inefficiency is distinct from tables that lack partitioning entirely. Here, the partitioning infrastructure exists and is correctly configured, but queries fail to leverage it — making the investment in partitioning effectively wasted while still incurring full-scan costs.
Databricks charges based on Databricks Units (DBUs), which represent processing capability consumed per unit of time, billed on a per-second basis. Total cost is determined by:
Query execution time directly impacts DBU consumption. Queries that scan more data require more compute resources and longer execution times, increasing DBU usage proportionally. A full table scan on a partitioned table that could have been pruned to a fraction of its size results in unnecessary cluster runtime and inflated DBU charges. This waste is further amplified on interactive All-Purpose Compute clusters, which carry higher per-DBU rates than Jobs Compute.