Unmanaged Growth of Athena Query Output Buckets

Abdeldjallil Koutchoukali

CER

CER-0136

Service Category

Compute

Cloud Provider

AWS

Service Name

Amazon Athena

Inefficiency Type

Missing Lifecycle Policy

Explanation

Athena writes the results of every query to S3 as new objects (a result file plus a metadata file), regardless of whether the output is needed long term. Over time, the output location grows without bound, especially in environments that run repetitive queries such as scheduled cost and usage reporting. Most of these files are transient and have little value once the query result has been consumed. Without lifecycle rules, organizations pay for storage they do not need and accumulate clutter in S3.

Relevant Billing Model

Athena query execution is billed per terabyte of data scanned; the query results are written to S3 and billed separately at S3 storage rates. Every executed query adds objects to the output location, and the storage charge keeps accumulating for as long as those objects persist without automated cleanup.
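The accumulation described above can be made concrete with a rough estimate. The sketch below is only illustrative: the function name, the workload figures, and the default rate of 0.023 USD per GB-month are assumptions, so substitute the actual S3 price for your region and storage class.

```python
def monthly_storage_cost_usd(
    queries_per_day: int,
    avg_result_mb: float,
    days_retained: int,
    usd_per_gb_month: float = 0.023,  # ASSUMED illustrative rate, not a quoted price
) -> float:
    """Estimate the monthly S3 charge for Athena results kept for `days_retained` days.

    The stored volume is approximated as daily output volume times the number
    of days of output currently held in the bucket.
    """
    stored_gb = queries_per_day * avg_result_mb * days_retained / 1024
    return stored_gb * usd_per_gb_month


if __name__ == "__main__":
    # Hypothetical workload: 5,000 queries per day, 2 MB of output each.
    for days in (30, 90, 365):
        cost = monthly_storage_cost_usd(5000, 2, days)
        print(f"{days:>3} days retained: about {cost:.2f} USD per month")
```

Without a lifecycle rule, `days_retained` is effectively the age of the bucket, so the monthly charge keeps rising; with an expiration rule it is capped at the retention window.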

Detection

Review whether Athena query output buckets have lifecycle rules configured to delete or transition old objects
Assess growth trends in the S3 bucket size used for Athena outputs relative to actual business need
Check for repetitive or automated queries (e.g., scheduled Cost and Usage Report queries) that generate large volumes of transient results
Confirm whether audit, compliance, or reporting requirements justify long-term retention of certain outputs
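The first check in the list above can be automated. The following is a minimal sketch, assuming boto3 is available and credentials are configured; the helper names are hypothetical. The rule dictionaries follow the shape returned by the S3 `get_bucket_lifecycle_configuration` call, and rules scoped by tags or object size are deliberately ignored so the check errs on the side of reporting missing coverage.

```python
from typing import Any, Dict, List, Optional


def expiration_days_for_prefix(
    rules: List[Dict[str, Any]], output_prefix: str
) -> Optional[int]:
    """Return the shortest expiration (in days) that applies to `output_prefix`.

    Only enabled rules that are scoped purely by prefix (or to the whole
    bucket) are counted. Returns None when no such rule expires the objects.
    """
    matching_days = []
    for rule in rules:
        if rule.get("Status") != "Enabled":
            continue
        days = rule.get("Expiration", {}).get("Days")
        if days is None:
            continue
        rule_filter = rule.get("Filter", {})
        # Conservative: skip rules that also depend on tags or object size.
        if set(rule_filter) - {"Prefix"}:
            continue
        # Older configurations carry the prefix at the top level of the rule.
        prefix = rule_filter.get("Prefix", rule.get("Prefix", ""))
        if output_prefix.startswith(prefix):
            matching_days.append(days)
    return min(matching_days) if matching_days else None


def fetch_lifecycle_rules(bucket: str) -> List[Dict[str, Any]]:
    """Fetch lifecycle rules for a bucket; an empty list means none are configured."""
    import boto3  # imported lazily so the pure check above has no dependency
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        return s3.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchLifecycleConfiguration":
            return []
        raise
```

A call such as `expiration_days_for_prefix(fetch_lifecycle_rules("my-athena-results"), "query-results/")` (bucket and prefix names are placeholders) returns `None` when the output location has no expiration coverage.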

Remediation

Implement S3 Lifecycle policies on Athena output buckets to automatically expire objects after a set period (e.g., 30, 60, or 90 days)
Use prefixes or tags to differentiate between temporary query outputs and long-term reports, applying tailored retention rules
Regularly review and adjust retention policies to balance cost efficiency with business and compliance needs
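The remediation steps above could be sketched as follows. This is one possible approach rather than a definitive implementation: the function names, rule IDs, prefixes, and retention periods are all hypothetical, and it assumes transient outputs and long-term reports are written under separate prefixes. Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, so any existing rules need to be merged in before applying.

```python
from typing import Any, Dict, List, Optional


def build_athena_lifecycle_rules(
    temp_prefix: str,
    temp_days: int,
    report_prefix: Optional[str] = None,
    report_days: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Build expiration rules for transient outputs and, optionally, reports."""
    if temp_days <= 0:
        raise ValueError("temp_days must be a positive number of days")
    rules: List[Dict[str, Any]] = [
        {
            "ID": "expire-transient-athena-results",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": temp_prefix},
            "Expiration": {"Days": temp_days},
        }
    ]
    if report_prefix is not None and report_days is not None:
        if report_days <= 0:
            raise ValueError("report_days must be a positive number of days")
        rules.append(
            {
                "ID": "expire-long-term-athena-reports",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": report_prefix},
                "Expiration": {"Days": report_days},
            }
        )
    return rules


def apply_lifecycle_rules(bucket: str, rules: List[Dict[str, Any]]) -> None:
    """Apply the rules to the bucket. This OVERWRITES any existing configuration."""
    import boto3  # imported lazily so rule building needs no AWS dependency

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```

For example, `build_athena_lifecycle_rules("query-results/", 30, "reports/", 365)` yields a 30-day rule for transient results and a one-year rule for reports; the retention values should come from the business and compliance review described above.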
