Overselecting Data and Misusing LIMIT for Cost Control in BigQuery


Benjamin van der Maas

CER

CER-0295

Service Category

Other

Cloud Provider

GCP

Service Name

GCP BigQuery

Inefficiency Type

Excessive data processed

Explanation

This inefficiency occurs when analysts use SELECT *, which reads more columns than the query needs, and/or rely on LIMIT as a cost-control mechanism. Because BigQuery stores data by column, every column a query references is read, so projecting unneeded columns increases the amount of data processed and can materially raise query cost, particularly on wide tables and frequently run queries. Separately, adding LIMIT to a query does not by itself reduce bytes processed on non-clustered tables; it caps the number of rows returned. The “LIMIT saves cost” assumption holds only in some cases on clustered tables, where BigQuery may be able to stop scanning early once enough clustered blocks have been read.
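To make the projection cost concrete, here is a minimal Python sketch of a simplified columnar model. It assumes on-demand billing and a non-partitioned, non-clustered table in which every referenced column is read in full. The table, its column names, and the per-column sizes are hypothetical; real figures would come from table metadata or a dry run.

```python
# Hypothetical per-column storage sizes (bytes) for one wide table.
# These numbers are made up for illustration.
COLUMN_BYTES = {
    "event_id": 8_000_000_000,
    "user_id": 8_000_000_000,
    "event_ts": 8_000_000_000,
    "payload_json": 400_000_000_000,
    "user_agent": 60_000_000_000,
    "country": 2_000_000_000,
}


def bytes_processed(columns):
    """Simplified columnar model: every referenced column is read in full."""
    if columns == ["*"]:
        columns = list(COLUMN_BYTES)
    return sum(COLUMN_BYTES[c] for c in columns)


select_star = bytes_processed(["*"])
explicit = bytes_processed(["user_id", "event_ts", "country"])

print(f"SELECT *        : {select_star / 1e9:,.0f} GB")
print(f"3 named columns : {explicit / 1e9:,.0f} GB")
print(f"reduction       : {1 - explicit / select_star:.1%}")
```

In this made-up table a single wide JSON column dominates, so naming three narrow columns avoids almost all of the bytes that SELECT * would read.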

Relevant Billing Model

Under on-demand pricing, BigQuery query cost is driven by the number of bytes the query reads and processes. Selecting unnecessary columns increases bytes processed. LIMIT typically reduces only the rows returned, not the data read, especially on non-clustered tables. On clustered tables, LIMIT can reduce bytes scanned in some cases because the scan may stop once enough blocks have been read.
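The LIMIT behavior can be illustrated with a toy model in Python. It assumes a table split into equal-sized storage blocks (the block count, block size, and rows per block are hypothetical) and treats the clustered case as a best case in which the scan stops as soon as enough blocks have supplied the requested rows. BigQuery does not guarantee this early stop, so the clustered figure is an optimistic bound, not a prediction.

```python
import math


def bytes_scanned(num_blocks, block_bytes, rows_per_block, limit=None, clustered=False):
    """Toy model of bytes read by `SELECT ... FROM t LIMIT n` (all sizes hypothetical)."""
    if limit is None or not clustered:
        # Non-clustered (or no LIMIT): every block of the referenced columns is read;
        # LIMIT only trims the rows returned to the client.
        return num_blocks * block_bytes
    # Clustered best case: stop after the fewest blocks that can supply `limit` rows.
    blocks_needed = min(num_blocks, math.ceil(limit / rows_per_block))
    return blocks_needed * block_bytes


TABLE = dict(num_blocks=1_000, block_bytes=100_000_000, rows_per_block=50_000)

full = bytes_scanned(**TABLE)
limited = bytes_scanned(**TABLE, limit=10)
clustered_limited = bytes_scanned(**TABLE, limit=10, clustered=True)

print(f"no LIMIT                  : {full / 1e9:.1f} GB")
print(f"LIMIT 10, non-clustered   : {limited / 1e9:.1f} GB")
print(f"LIMIT 10, clustered (best): {clustered_limited / 1e9:.1f} GB")
```

For a real query, a dry run (`QueryJobConfig(dry_run=True)` in the google-cloud-bigquery client) reports the estimated bytes processed without running or billing the query, which is a more reliable check than reasoning about LIMIT.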

Detection

Review whether recurring or shared queries use SELECT * instead of selecting required columns
Identify query patterns that treat LIMIT as a primary way to reduce cost, especially on non-clustered tables
Assess whether large or wide tables are commonly queried without intentional column selection or data reduction
Check whether users expect LIMIT to lower cost in cases where clustering is not present or not aligned with the query pattern
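The checks above can be partly automated over saved query texts, for example the `query` column of BigQuery's `INFORMATION_SCHEMA.JOBS` views. The following Python sketch is a regex heuristic, not a SQL parser: it ignores comments, string literals, and nested queries, and the flag names and sample queries are hypothetical.

```python
import re

# Heuristic patterns; they ignore comments, string literals, and nested queries.
SELECT_STAR = re.compile(r"\bSELECT\s+(?:DISTINCT\s+)?(?:\w+\.)?\*", re.IGNORECASE)
LIMIT = re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE)
WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def flag_query(sql: str) -> list[str]:
    """Return heuristic cost flags for one query text."""
    flags = []
    if SELECT_STAR.search(sql):
        flags.append("select_star")
    # LIMIT with no filter at all suggests LIMIT is being used to "save cost".
    if LIMIT.search(sql) and not WHERE.search(sql):
        flags.append("limit_without_filter")
    return flags


# Hypothetical query texts, e.g. as exported from job history.
QUERIES = {
    "daily_dashboard": "SELECT * FROM `proj.ds.events` WHERE event_date = CURRENT_DATE()",
    "quick_peek": "select * from proj.ds.events limit 10",
    "country_report": (
        "SELECT user_id, COUNT(*) AS n FROM proj.ds.events "
        "WHERE country = 'NL' GROUP BY user_id"
    ),
}

for name, sql in QUERIES.items():
    print(name, flag_query(sql))
```

Flagged queries are candidates for review rather than confirmed problems; whether LIMIT helps still depends on the table's clustering.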

Remediation

Replace SELECT * with explicit column selection for production, scheduled, and commonly reused queries
Treat LIMIT as a result-size control, not a default cost-control strategy; reduce data deliberately instead, for example by selecting only the needed columns and filtering on partitioning or clustering columns
Improve discoverability of table schemas (for example, through data catalogs or documentation) so users can choose columns intentionally before querying
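For scheduled or shared queries, one way to make column choice intentional is to generate the projection from the table's schema instead of typing `*`. The helper `explicit_select`, the table name, and the schema below are all hypothetical. With the google-cloud-bigquery client, column names could be taken from `client.get_table(...).schema`, where each `SchemaField` has a `name` attribute.

```python
def explicit_select(table: str, schema: list[str], wanted: list[str]) -> str:
    """Build a SELECT that names only `wanted` columns, checked against `schema`."""
    unknown = [c for c in wanted if c not in schema]
    if unknown:
        # Fail fast instead of silently falling back to SELECT *.
        raise ValueError(f"columns not in {table}: {unknown}")
    cols = ",\n  ".join(f"`{c}`" for c in wanted)
    return f"SELECT\n  {cols}\nFROM `{table}`"


# Hypothetical schema; in practice it might come from a data catalog
# or from the table's metadata.
EVENTS_SCHEMA = ["event_id", "user_id", "event_ts", "payload_json", "user_agent", "country"]

print(explicit_select("proj.ds.events", EVENTS_SCHEMA, ["user_id", "event_ts", "country"]))
```

Raising on unknown names keeps a typo from turning into a broader query than intended.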

Relevant Documentation
