Unoptimized Billing Model for BigQuery Dataset Storage
Laurent Dumont
Service Category
Databases
Cloud Provider
GCP
Service Name
GCP BigQuery
Inefficiency Type
Inefficient Configuration
Explanation

Highly compressible datasets, such as those with repeated string fields, nested structures, or uniform rows, can benefit significantly from physical storage billing. Yet most datasets remain on logical storage by default, even when physical storage would reduce costs.

This inefficiency is common for cold or infrequently updated datasets that are no longer optimized or regularly reviewed. Because storage behavior and data characteristics evolve, failing to periodically reassess the billing model may result in persistent waste.

Relevant Billing Model

BigQuery supports two billing models for table storage:

Logical Storage (default): Billed based on the uncompressed size of user data. This model includes time travel and fail-safe storage at no extra cost.

Physical Storage: Billed based on the actual compressed bytes on disk. Time travel and fail-safe storage are charged separately at the same rate as active storage.

While physical storage can significantly reduce costs for highly compressible datasets (e.g., compression savings exceeding ~45%), its higher per-byte cost (~1.8x) and additional charges for retention-related features may lead to higher total costs for frequently updated or less compressible data.
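The ~45% break-even figure above follows directly from the per-byte price ratio. A minimal sketch, assuming the document's ~1.8x ratio (actual GCP rates vary by region, and physical billing also charges for time travel and fail-safe bytes, which this intuition ignores):

```python
# Break-even sketch: at what compression savings does physical
# storage billing become cheaper than logical storage billing?
# The 1.8x ratio is taken from the text above, not from a price sheet.
PHYSICAL_TO_LOGICAL_PRICE_RATIO = 1.8

def breakeven_compression_savings(price_ratio: float = PHYSICAL_TO_LOGICAL_PRICE_RATIO) -> float:
    """Fraction of logical bytes that compression must eliminate
    before physical billing wins.

    physical_bytes * ratio * rate < logical_bytes * rate
        <=> physical_bytes / logical_bytes < 1 / ratio
        <=> savings > 1 - 1 / ratio
    """
    return 1 - 1 / price_ratio

savings = breakeven_compression_savings()
print(f"Physical billing wins once compression saves > {savings:.1%}")
# Physical billing wins once compression saves > 44.4%
```

At a 1.8x price ratio the break-even is 1 - 1/1.8 ≈ 44.4%, consistent with the ~45% figure above.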

Detection
  • Evaluate datasets with high logical-to-physical compression ratios.
  • The billing model is applied at the dataset level, so confirm that a favorable compression ratio holds across all tables in the dataset.
  • Run periodic queries against `INFORMATION_SCHEMA.TABLE_STORAGE` to compare logical and physical storage sizes across multiple dimensions: active, long-term, time travel and fail-safe.
  • Prioritize datasets that are infrequently updated (and therefore carry low time travel and fail-safe storage volumes) or are used primarily for historical lookback.
  • Use the published storage rates to simulate cost under the physical model: multiply physical bytes by the physical rates, and add charges for the time travel and fail-safe bytes.
  • Flag datasets whose simulated physical storage cost is lower than their current logical storage cost.
  • Balance the frequency of detection queries against their cost, and avoid excessive scanning of large datasets.
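The simulation step above can be sketched as a small cost model. Byte counts would come from `INFORMATION_SCHEMA.TABLE_STORAGE`; the per-GiB rates below are illustrative placeholders, not official GCP prices:

```python
# Hedged sketch of the detection-step cost simulation: compare a
# dataset's current logical-storage bill against a simulated
# physical-storage bill that also charges time travel and fail-safe
# bytes at the active rate. Rates are placeholder values.
def monthly_cost_usd(
    active_logical_gib: float, long_term_logical_gib: float,
    active_physical_gib: float, long_term_physical_gib: float,
    time_travel_physical_gib: float, fail_safe_physical_gib: float,
    logical_active_rate: float = 0.02, logical_long_term_rate: float = 0.01,
    physical_active_rate: float = 0.04, physical_long_term_rate: float = 0.02,
) -> tuple[float, float]:
    logical = (active_logical_gib * logical_active_rate
               + long_term_logical_gib * logical_long_term_rate)
    # Under physical billing, time travel and fail-safe bytes are
    # charged at the active-storage rate.
    physical = ((active_physical_gib + time_travel_physical_gib
                 + fail_safe_physical_gib) * physical_active_rate
                + long_term_physical_gib * physical_long_term_rate)
    return logical, physical

# Hypothetical cold dataset: 1000 GiB logical long-term data that
# compresses to 300 GiB, with small time travel / fail-safe footprints.
logical, physical = monthly_cost_usd(
    active_logical_gib=0, long_term_logical_gib=1000,
    active_physical_gib=0, long_term_physical_gib=300,
    time_travel_physical_gib=5, fail_safe_physical_gib=5)
if physical < logical:
    print(f"flag: physical ${physical:.2f}/mo beats logical ${logical:.2f}/mo")
# flag: physical $6.40/mo beats logical $10.00/mo
```

A frequently mutated table with large time travel and fail-safe footprints would tip the comparison the other way, which is why the detection steps prioritize cold data.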
Remediation
  • Switch eligible datasets to physical storage billing when compression advantages are material
  • There is no performance difference between the two billing models.
  • A billing model change takes up to 24 hours to be reflected in the GCP billing SKUs.
  • After a change, there is a 14-day waiting period before the billing model can be changed again.
  • Periodically reassess dataset compression ratios to determine if billing model changes are warranted
  • Use partitioning and clustering to improve compressibility where possible
  • Apply billing model changes to cold or infrequently modified data first, as their structure is more stable
  • Document the dataset billing model decisions to ensure transparency and reproducibility
  • Monitor billing changes for 2-3 months after switching to validate expected savings
  • Consider the dataset lifecycle: archive old partitions to cheaper storage classes first.
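The switch itself is a single dataset update; for example, with the `bq` CLI (project and dataset names are placeholders):

```shell
# Switch a dataset to physical storage billing.
# Takes up to 24 hours to appear in billing SKUs; the dataset
# cannot be switched again for 14 days.
bq update -d --storage_billing_model=PHYSICAL my-project:my_dataset

# To revert after the waiting period:
# bq update -d --storage_billing_model=LOGICAL my-project:my_dataset
```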