While many AWS customers have migrated EC2 workloads to Graviton to reduce costs, Lambda functions often remain on the default x86 architecture. AWS Graviton2 (ARM) offers lower pricing and equal or better performance for most supported runtimes — yet adoption remains uneven due to legacy defaults or lack of awareness. Continuing to run eligible Lambda functions on x86 leads to unnecessary spending. The migration requires minimal configuration changes and can be verified through benchmarking and workload testing.
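A first step is simply inventorying which functions are still on x86 with an arm64-capable runtime. The sketch below works on dicts shaped like boto3's Lambda `list_functions()` response; the runtime allow-list is an assumption for illustration and should be checked against AWS's documented arm64-supported runtimes before migrating.

```python
# Flag Lambda functions that could move to arm64 (Graviton).
# ARM64_SUPPORTED_RUNTIMES is an assumption - verify against AWS docs.
ARM64_SUPPORTED_RUNTIMES = {
    "python3.12", "python3.11", "nodejs20.x", "nodejs18.x",
    "java21", "java17", "dotnet8", "ruby3.3",
}

def graviton_candidates(functions):
    """Return names of functions still on x86_64 with an arm64-capable runtime."""
    return [
        fn["FunctionName"]
        for fn in functions
        if "x86_64" in fn.get("Architectures", ["x86_64"])
        and fn.get("Runtime") in ARM64_SUPPORTED_RUNTIMES
    ]

sample = [
    {"FunctionName": "etl-job", "Runtime": "python3.12", "Architectures": ["x86_64"]},
    {"FunctionName": "img-fn", "Runtime": "go1.x", "Architectures": ["x86_64"]},
    {"FunctionName": "api-fn", "Runtime": "nodejs20.x", "Architectures": ["arm64"]},
]
print(graviton_candidates(sample))  # ['etl-job']
```

Flagged functions should still go through benchmarking before the architecture is switched, as the paragraph above notes.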
EFS file systems that are no longer attached to any running services — such as EC2 instances or Lambda functions — continue to incur storage charges. This often occurs after workloads are decommissioned but the file system is left behind. A quick indicator of this state is when the EFS file system has no mount targets configured. Without active usage or connection, these orphaned file systems represent pure cost with no functional value. Unlike block storage, EFS does not require an attached instance to incur billing, making it easy for unused resources to go unnoticed.
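The "no mount targets" indicator translates directly into a filter. The dicts below mimic boto3's EFS `describe_file_systems` response, in which `NumberOfMountTargets` is a real field; the sample IDs and sizes are invented.

```python
def orphaned_file_systems(file_systems):
    """Return IDs of file systems that nothing can currently mount."""
    return [
        fs["FileSystemId"]
        for fs in file_systems
        if fs.get("NumberOfMountTargets", 0) == 0
    ]

sample = [
    {"FileSystemId": "fs-0001", "NumberOfMountTargets": 2},
    {"FileSystemId": "fs-0002", "NumberOfMountTargets": 0},
]
print(orphaned_file_systems(sample))  # ['fs-0002']
```

Hits are cleanup candidates, not automatic deletions: confirm the data is no longer needed (or snapshot it via AWS Backup) before removing a file system.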
For Premium SSD and Standard SSD disks 513 GiB or larger, Azure now offers the option to enable Performance Plus, unlocking higher IOPS and MBps at no extra cost. Environments that previously moved to larger disks or higher tiers purely to obtain extra throughput are often still paying for that headroom unnecessarily. By not enabling Performance Plus on eligible disks, organizations miss a straightforward opportunity to reduce disk spend while maintaining or improving performance. The feature is opt-in and must be explicitly enabled on each qualifying disk.
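Eligibility can be screened with a simple filter over disk inventory. Field names here are simplified from `az disk list` output (the SKU is actually nested under `sku.name`), and the eligible-SKU set is an assumption to verify against Azure's Performance Plus documentation.

```python
# SKU names follow Azure conventions; confirm the eligible set in Azure docs.
ELIGIBLE_SKUS = {"Premium_LRS", "Premium_ZRS", "StandardSSD_LRS", "StandardSSD_ZRS"}

def performance_plus_candidates(disks):
    """Disks at or above the 513 GiB threshold that have not enabled the feature."""
    return [
        d["name"]
        for d in disks
        if d["diskSizeGB"] >= 513
        and d["sku"] in ELIGIBLE_SKUS
        and not d.get("performancePlus", False)
    ]

sample = [
    {"name": "data-disk-1", "sku": "Premium_LRS", "diskSizeGB": 1024, "performancePlus": False},
    {"name": "data-disk-2", "sku": "Premium_LRS", "diskSizeGB": 256},
    {"name": "data-disk-3", "sku": "StandardSSD_LRS", "diskSizeGB": 513, "performancePlus": True},
]
print(performance_plus_candidates(sample))  # ['data-disk-1']
```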
Each Azure VM size has a defined limit for total disk IOPS and throughput. When high-performance disks (e.g., Premium SSDs with high IOPS capacity) are attached to low-tier VMs, the disk’s performance capabilities may exceed what the VM can consume. This results in paying for performance that the VM cannot access. For example, attaching a large Premium SSD to a B-series VM will not provide the expected performance because the VM cannot deliver that level of throughput. Without aligning disk selection with VM limits, organizations incur unnecessary storage costs with no corresponding performance benefit.
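The mismatch can be quantified by comparing total provisioned disk IOPS against the VM size's documented uncached IOPS limit. The limit used in the example is a placeholder; look up the real figure for each VM size in Azure's VM sizes documentation.

```python
def wasted_iops(vm_max_iops, disk_iops_list):
    """IOPS provisioned across attached disks beyond what the VM size can consume."""
    return max(0, sum(disk_iops_list) - vm_max_iops)

# Hypothetical example: a small VM capped at 1,280 IOPS with disks
# provisioned at 5,000 and 2,300 IOPS.
print(wasted_iops(1280, [5000, 2300]))  # 6020
print(wasted_iops(10000, [5000]))       # 0
```

A non-zero result means either the VM should be sized up or the disks sized down, whichever matches the workload's actual demand.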
Azure WAF configurations attached to Application Gateways can persist after their backend pool resources have been removed — often during environment reconfiguration or application decommissioning. In these cases, the WAF is no longer serving any functional purpose but continues to incur fixed hourly costs. Because no traffic is routed and no applications are protected, the WAF is effectively inactive. These orphaned WAFs are easy to overlook without regular cleanup processes and can quietly accumulate unnecessary charges over time.
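A gateway with only empty backend pools is a strong signal of this state. The dict shape below loosely follows `az network application-gateway list` output with field names simplified; treat hits as review candidates rather than automatic deletions.

```python
def inactive_gateways(gateways):
    """Gateways where every backend pool is empty (or no pools exist at all)."""
    return [
        gw["name"]
        for gw in gateways
        if all(not pool.get("backendAddresses")
               for pool in gw.get("backendAddressPools", []))
    ]

sample = [
    {"name": "agw-prod",
     "backendAddressPools": [{"backendAddresses": [{"ipAddress": "10.0.0.4"}]}]},
    {"name": "agw-old",
     "backendAddressPools": [{"backendAddresses": []}]},
]
print(inactive_gateways(sample))  # ['agw-old']
```

Cross-checking request metrics over a trailing window before deletion guards against gateways that are temporarily drained rather than truly abandoned.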
Many EC2 workloads—such as development environments, test jobs, stateless services, and data processing pipelines—can tolerate interruptions and do not require the reliability of On-Demand pricing. Using On-Demand instances in these scenarios drives up cost without adding value. Spot Instances offer significantly lower pricing and are well-suited to workloads that can handle restarts, retries, or fluctuations in capacity. Without evaluating workload tolerance and adjusting pricing models accordingly, organizations risk consistently overpaying for compute.
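One lightweight way to start the evaluation is tag-based triage of the current fleet. This sketch assumes an `env` tag convention (a workspace-specific choice, not an AWS standard) and flattens tags to a plain dict for brevity; boto3 actually returns them as a list of Key/Value pairs.

```python
# INTERRUPTIBLE_ENVS encodes an assumed tagging convention - adjust to yours.
INTERRUPTIBLE_ENVS = {"dev", "test", "staging"}

def spot_candidates(instances):
    """On-Demand instances whose environment tag marks them interruption-tolerant."""
    return [
        i["InstanceId"]
        for i in instances
        if i.get("InstanceLifecycle") != "spot"   # already Spot -> skip
        and i.get("Tags", {}).get("env") in INTERRUPTIBLE_ENVS
    ]

sample = [
    {"InstanceId": "i-aaa", "Tags": {"env": "dev"}},
    {"InstanceId": "i-bbb", "Tags": {"env": "prod"}},
    {"InstanceId": "i-ccc", "Tags": {"env": "test"}, "InstanceLifecycle": "spot"},
]
print(spot_candidates(sample))  # ['i-aaa']
```

Tag-based candidacy is only a first pass; each shortlisted workload still needs a check that it actually tolerates the two-minute Spot interruption notice.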
Databricks supports AWS Graviton-based instances for most workloads, including Spark jobs, data engineering pipelines, and interactive notebooks. These instances offer significant cost advantages over traditional x86-based VMs, with comparable or better performance in many cases. When teams default to legacy instance types, they miss an easy opportunity to reduce compute spend. Unless workloads have known compatibility issues or specialized requirements, Graviton should be the default instance family for Databricks clusters.
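The default can be enforced rather than hoped for. Below is a sketch of a Databricks cluster policy, expressed as a Python dict ready to be serialized to JSON; `regex` is a supported policy attribute type, but the specific pattern and Graviton instance families are assumptions to match against what your workspace actually offers.

```python
import json
import re

# Restrict node selection to Graviton families (m6g/m6gd, c6g/c6gd, r6g/r6gd).
graviton_policy = {
    "node_type_id": {
        "type": "regex",
        "pattern": r"(m6gd?|c6gd?|r6gd?)\..*",
    },
}

# The pattern can be sanity-checked locally before uploading the policy:
pattern = graviton_policy["node_type_id"]["pattern"]
print(bool(re.fullmatch(pattern, "m6gd.xlarge")))  # True
print(bool(re.fullmatch(pattern, "i3.xlarge")))    # False
print(json.dumps(graviton_policy))
```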
In Databricks, on-demand instances provide reliable capacity but come at a premium cost. For non-production workloads, such as development, testing, or exploratory analysis, guaranteed capacity is often unnecessary. Spot instances provide equivalent performance at a lower price, with the tradeoff of occasional interruptions. Teams that default to on-demand usage in lower environments incur unnecessary compute costs. Using compute policies to limit on-demand usage ensures greater consistency and efficiency across environments.
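Such a policy might pin non-production clusters to spot capacity while keeping the driver on-demand. The attribute paths (`aws_attributes.availability`, `aws_attributes.first_on_demand`) and the `SPOT_WITH_FALLBACK` value exist in Databricks on AWS; scoping this policy to lower environments is an assumption about your workspace layout.

```python
spot_policy = {
    "aws_attributes.availability": {
        "type": "fixed",
        "value": "SPOT_WITH_FALLBACK",  # fall back to on-demand if spot is unavailable
    },
    "aws_attributes.first_on_demand": {
        "type": "fixed",
        "value": 1,  # keep the driver on-demand so an interruption doesn't kill the cluster
    },
}
print(spot_policy["aws_attributes.availability"]["value"])  # SPOT_WITH_FALLBACK
```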
Databricks users can select from a wide range of instance types for cluster driver and worker nodes. Without guardrails, teams may choose high-cost configurations (e.g., 16xlarge nodes) that exceed workload requirements. This results in inflated costs with little performance benefit. To reduce this risk, administrators can use compute policies to define acceptable node types and enforce size limits across the workspace.
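A policy along these lines might allowlist modest node types and cap autoscaling. `allowlist` and `range` are supported policy attribute types; the specific node list and worker cap below are illustrative choices, not recommendations.

```python
node_policy = {
    "node_type_id": {
        "type": "allowlist",
        "values": ["m5.large", "m5.xlarge", "m5.2xlarge"],
        "defaultValue": "m5.large",
    },
    "driver_node_type_id": {
        "type": "allowlist",
        "values": ["m5.large", "m5.xlarge"],
        "defaultValue": "m5.large",
    },
    "autoscale.max_workers": {
        "type": "range",
        "maxValue": 10,  # hard ceiling on cluster width
    },
}
# Oversized nodes simply aren't selectable under this policy:
print("m5.16xlarge" in node_policy["node_type_id"]["values"])  # False
```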
Workloads using legacy Premium SSD managed disks may be eligible for migration to Premium SSD v2, which delivers equivalent or improved performance characteristics at a lower cost. Premium SSD v2 decouples disk size from performance metrics like IOPS and throughput, enabling more granular cost optimization. Additionally, Premium SSD disks are often overprovisioned in size—for example, a P40 disk with more IOPS and capacity than the workload requires—resulting in inflated storage costs. Rightsizing includes both transitioning to v2 and resizing to smaller SKUs (e.g., P40 → P20) based on observed utilization. Failure to address either form of overprovisioning leads to persistent waste.
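The resizing half of the recommendation (e.g., P40 → P20) reduces to picking the smallest SKU that still covers observed usage. The capacities below are the documented Premium SSD SKU sizes; the v1-to-v2 comparison additionally needs per-GiB, per-IOPS, and per-MBps v2 pricing for your region, which is omitted here.

```python
# Documented Premium SSD (v1) SKU capacities in GiB.
PREMIUM_SSD_SIZES_GIB = {
    "P10": 128, "P15": 256, "P20": 512,
    "P30": 1024, "P40": 2048, "P50": 4096,
}

def smallest_fitting_sku(used_gib):
    """Smallest SKU whose capacity covers observed usage, else None."""
    for sku, size in sorted(PREMIUM_SSD_SIZES_GIB.items(), key=lambda kv: kv[1]):
        if size >= used_gib:
            return sku
    return None

print(smallest_fitting_sku(400))   # P20
print(smallest_fitting_sku(1500))  # P40
```

Note that SKU size also determines baseline IOPS and throughput on v1, so observed performance demand, not just capacity, must fit the target SKU before resizing.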