Overselecting Data and Misusing LIMIT for Cost Control in BigQuery
Benjamin van der Maas
CER: GCP-Other-3384
Service Category: Other
Cloud Provider: GCP
Service Name: GCP BigQuery
Inefficiency Type: Excessive data processed
Explanation

This inefficiency occurs when analysts use SELECT *, reading more columns than the query needs, rely on LIMIT as a cost-control mechanism, or both. BigQuery stores data by column, so every projected column adds to the data read; excess columns can materially raise query cost, particularly on wide tables and frequently run queries. Separately, on non-clustered tables LIMIT does not by itself reduce bytes processed; it mainly caps the result set returned. The “LIMIT saves cost” assumption holds only in some cases on clustered tables, where BigQuery may be able to stop scanning once enough clustered blocks have been read.
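To make the difference concrete, here is a minimal sketch of how bytes read scale with column choice rather than with LIMIT. The `events` table, its column names, the per-value sizes, and the row count are all hypothetical, and the model is a deliberate simplification (a full scan of a non-clustered table), not BigQuery's actual size accounting.

```python
# Simplified model of a columnar scan: bytes read depend on which columns are
# referenced, not on how many rows are returned. All names and sizes below are
# hypothetical and chosen only for illustration.

# Assumed average bytes per value for each column of a made-up `events` table.
AVG_BYTES_PER_VALUE = {
    "event_id": 8,
    "user_id": 8,
    "event_ts": 8,
    "payload": 600,     # wide string column
    "user_agent": 120,
}
ROW_COUNT = 1_000_000_000


def estimated_bytes_scanned(columns, limit=None):
    """Estimate bytes read for a full scan of the given columns.

    `limit` is accepted but deliberately ignored: in this model of a
    non-clustered table, LIMIT only caps the rows returned.
    """
    if columns == ["*"]:
        columns = list(AVG_BYTES_PER_VALUE)
    return ROW_COUNT * sum(AVG_BYTES_PER_VALUE[c] for c in columns)


select_star = estimated_bytes_scanned(["*"], limit=10)
projected = estimated_bytes_scanned(["event_id", "event_ts"])

print(f"SELECT * ... LIMIT 10    : {select_star / 1e9:,.0f} GB")
print(f"SELECT event_id, event_ts: {projected / 1e9:,.0f} GB")
```

In this toy model the SELECT * query reads 744 GB whether or not LIMIT 10 is present, while the two-column projection reads 16 GB.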

Relevant Billing Model

BigQuery query cost (under on-demand, bytes-based pricing) is driven by the data the query reads and processes. Selecting unnecessary columns increases bytes processed. Using LIMIT typically reduces only the rows returned, not the data read, especially on non-clustered tables. On clustered tables, LIMIT can reduce bytes scanned in some cases because scanning may stop after enough blocks are read.
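As a rough illustration of how bytes processed translate into spend for a recurring query, the arithmetic can be sketched as below. The price per TiB is an assumed placeholder rather than a quoted rate, and the query sizes and run frequency are made up; substitute figures from your own billing data.

```python
# Back-of-the-envelope monthly cost of a recurring query under bytes-based
# billing. PRICE_PER_TIB_USD is an assumed placeholder; check the current
# price list for your region before relying on it.
PRICE_PER_TIB_USD = 6.25
BYTES_PER_TIB = 2 ** 40


def monthly_cost_usd(bytes_per_run, runs_per_month, price_per_tib=PRICE_PER_TIB_USD):
    """Cost = (bytes per run / bytes per TiB) * price per TiB * run count."""
    return bytes_per_run / BYTES_PER_TIB * price_per_tib * runs_per_month


# Hypothetical dashboard query that runs hourly (about 720 runs per month).
wide_scan = monthly_cost_usd(bytes_per_run=2 * BYTES_PER_TIB, runs_per_month=720)
narrow_scan = monthly_cost_usd(bytes_per_run=BYTES_PER_TIB // 8, runs_per_month=720)

print(f"wide scan  : ${wide_scan:,.2f} per month")
print(f"narrow scan: ${narrow_scan:,.2f} per month")
```

With these assumed inputs, a 2 TiB scan run hourly comes to $9,000.00 per month, against $562.50 for a 0.125 TiB scan. Adding LIMIT to the wide query would not change its figure if the bytes read stay the same.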

Detection
  • Review whether recurring or shared queries use SELECT * instead of selecting required columns
  • Identify query patterns that treat LIMIT as a primary way to reduce cost, especially on non-clustered tables
  • Assess whether large or wide tables are commonly queried without intentional column selection or data reduction
  • Check whether users expect LIMIT to lower cost in cases where clustering is not present or not aligned with the query pattern
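The first two checks above could be approximated with a simple text scan over saved or logged query strings. This is a heuristic sketch only: the patterns and finding names are hypothetical, regular expressions can miss or over-match real SQL, and a WHERE clause by itself does not guarantee fewer bytes read unless it prunes partitions or clustered blocks.

```python
import re

# Heuristic patterns; meant as a first filter before manual review.
SELECT_STAR = re.compile(r"\bselect\s+(?:\w+\.)?\*", re.IGNORECASE)
LIMIT_CLAUSE = re.compile(r"\blimit\s+\d+", re.IGNORECASE)
WHERE_CLAUSE = re.compile(r"\bwhere\b", re.IGNORECASE)


def flag_query(sql):
    """Return a list of findings for one query string."""
    findings = []
    if SELECT_STAR.search(sql):
        findings.append("select_star")
    # LIMIT with no filter at all suggests LIMIT is the only "reduction" used.
    if LIMIT_CLAUSE.search(sql) and not WHERE_CLAUSE.search(sql):
        findings.append("limit_only")
    return findings


# Hypothetical sample queries.
queries = [
    "SELECT * FROM `analytics.events` LIMIT 100",
    "SELECT event_id FROM `analytics.events` WHERE event_ts >= '2024-01-01' LIMIT 100",
    "select count(*) from `analytics.events`",
]
for q in queries:
    print(flag_query(q), "<-", q)
```

Flagged queries are candidates for review, not confirmed waste; pairing the findings with bytes processed per job helps prioritize the expensive ones.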
Remediation
  • Replace SELECT * with explicit column selection for production, scheduled, and commonly reused queries
  • Treat LIMIT as a result-size control, not a default cost-control strategy; rely on intentional data reduction approaches instead
  • Improve discoverability of table schemas (catalogs / docs) so users can choose columns intentionally before querying
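One way to encourage explicit column selection in shared tooling is a small query builder that checks requested columns against a known schema. The sketch below is illustrative only: `CATALOG`, `build_select`, and the table and column names are hypothetical, and a real setup would read the schema from whatever catalog or documentation the team maintains.

```python
# Hypothetical in-memory schema catalog, used only for illustration.
CATALOG = {
    "analytics.events": ["event_id", "user_id", "event_ts", "payload", "user_agent"],
}


def build_select(table, columns):
    """Build a SELECT with an explicit column list.

    Rejects '*', empty column lists, and columns missing from the catalog.
    Identifiers are interpolated directly, so this is for trusted input only.
    """
    known = CATALOG[table]  # raises KeyError for tables not in the catalog
    if not columns or "*" in columns:
        raise ValueError("list the required columns explicitly")
    unknown = [c for c in columns if c not in known]
    if unknown:
        raise ValueError(f"unknown columns for {table}: {unknown}")
    return f"SELECT {', '.join(columns)} FROM `{table}`"


print(build_select("analytics.events", ["event_id", "event_ts"]))
```

The design choice here is to fail loudly on `*` so that column selection becomes a deliberate step rather than a default.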