Azure Virtual Machine Scale Sets can operate in two modes: manual scaling with a fixed instance count, or autoscaling with an instance count that adjusts in response to demand. A scale set configured for manual scaling keeps the same number of VM instances running until someone changes the count by hand, regardless of whether those instances are doing useful work. Every running instance continues to incur compute charges, billed per second, so the organization pays for full capacity even during off-peak hours, weekends, or seasonal lulls when only a fraction of that capacity is needed.
This pattern is especially wasteful for workloads with variable demand — web applications with daily traffic cycles, batch processing jobs that run at specific intervals, or services with clear seasonal peaks. If a scale set is sized for peak demand but runs at that capacity around the clock, the gap between provisioned resources and actual utilization translates directly into unnecessary spend. Microsoft explicitly identifies autoscaling as a mechanism to reduce scale set costs by running only the number of instances required to meet current demand.
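To make the size of that gap concrete, the sketch below compares instance-hours for a scale set pinned at peak capacity against an idealized one that tracks demand hour by hour. The hourly demand profile, the `PRICE_PER_INSTANCE_HOUR` rate, and the two-instance floor are hypothetical placeholders, not measured data or real Azure prices. Real autoscale rules also react with some delay, so actual savings would be smaller than this idealized figure.

```python
# Hypothetical hourly demand (instances needed) over one day: quiet overnight,
# peaking around midday. Illustrative numbers only, not measured data.
HOURLY_DEMAND = [2, 2, 2, 2, 2, 3, 4, 6, 8, 10, 10, 10,
                 10, 10, 10, 9, 8, 7, 6, 5, 4, 3, 2, 2]

# Assumed price per instance-hour (placeholder, not a real Azure rate).
PRICE_PER_INSTANCE_HOUR = 0.20


def fixed_capacity_hours(demand: list[int]) -> int:
    """Instance-hours consumed when the scale set is pinned at peak capacity."""
    return max(demand) * len(demand)


def autoscaled_hours(demand: list[int], min_instances: int = 2) -> int:
    """Instance-hours when capacity tracks demand but never drops below a floor.

    Idealized: assumes capacity changes instantly each hour, which real
    autoscale rules (with evaluation windows and cooldowns) do not.
    """
    return sum(max(need, min_instances) for need in demand)


if __name__ == "__main__":
    fixed = fixed_capacity_hours(HOURLY_DEMAND)
    scaled = autoscaled_hours(HOURLY_DEMAND)
    idle = fixed - scaled
    print(f"Fixed capacity: {fixed} instance-hours (${fixed * PRICE_PER_INSTANCE_HOUR:.2f}/day)")
    print(f"Autoscaled:     {scaled} instance-hours (${scaled * PRICE_PER_INSTANCE_HOUR:.2f}/day)")
    print(f"Idle capacity:  {idle} instance-hours ({idle / fixed:.0%} of fixed spend)")
```

With this made-up profile, the fixed-capacity set accrues 240 instance-hours per day against 137 for the demand-tracking one, so roughly 43% of the fixed spend pays for idle capacity. The floor parameter stands in for whatever minimum instance count an autoscale setting would keep for availability.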
There are legitimate reasons to maintain fixed capacity — stateful applications that cannot tolerate dynamic instance changes, workloads with licensing constraints tied to specific instance counts, or scenarios where consistent performance without scale-up latency is critical. However, many scale sets running at fixed capacity do so simply because autoscaling was never configured, not because it was deliberately excluded. Identifying and addressing these cases represents a significant cost optimization opportunity.
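Finding the scale sets that never had autoscaling configured is essentially a set difference: start from the scale set inventory, drop those targeted by an autoscale setting, and drop those with a documented reason for fixed capacity. The Python sketch below assumes the inventory and the autoscale targets have already been collected (for example, through Azure Resource Graph or the Azure SDKs). The `ScaleSetInfo` record, its `exempt_reason` field, and `find_autoscale_candidates` are hypothetical names for illustration, not part of any Azure API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ScaleSetInfo:
    """Hypothetical inventory record; field names are illustrative, not an Azure schema."""
    name: str
    resource_id: str  # full ARM resource ID of the scale set
    capacity: int     # current fixed instance count
    exempt_reason: Optional[str] = None  # e.g. "stateful" or "licensing" if fixed capacity is deliberate


def find_autoscale_candidates(
    scale_sets: Iterable[ScaleSetInfo],
    autoscale_targets: Iterable[str],
) -> List[ScaleSetInfo]:
    """Return scale sets with no autoscale setting and no recorded exemption.

    autoscale_targets holds the resource IDs that autoscale settings point at
    (for example, each setting's target resource URI). ARM resource IDs are
    case-insensitive, so comparisons use lowercased IDs.
    """
    targeted = {target.lower() for target in autoscale_targets}
    candidates = [
        vmss
        for vmss in scale_sets
        if vmss.resource_id.lower() not in targeted and vmss.exempt_reason is None
    ]
    # Hypothetical prioritization: review the largest fixed fleets first.
    return sorted(candidates, key=lambda vmss: vmss.capacity, reverse=True)


if __name__ == "__main__":
    prefix = ("/subscriptions/example-sub/resourceGroups/example-rg/providers/"
              "Microsoft.Compute/virtualMachineScaleSets/")
    inventory = [
        ScaleSetInfo("web-vmss", prefix + "web-vmss", capacity=10),
        ScaleSetInfo("batch-vmss", prefix + "batch-vmss", capacity=6),
        ScaleSetInfo("db-vmss", prefix + "db-vmss", capacity=4, exempt_reason="stateful"),
    ]
    # Pretend only web-vmss already has an autoscale setting.
    autoscaled = {prefix + "web-vmss"}
    for vmss in find_autoscale_candidates(inventory, autoscaled):
        print(f"{vmss.name}: fixed at {vmss.capacity} instances, no autoscale setting")
```

Recording exemptions explicitly keeps deliberate fixed-capacity deployments (stateful, licensing-bound, or latency-sensitive workloads) out of the results, so what remains are the scale sets where autoscaling was most likely never considered. Sorting by current capacity is only a rough proxy for potential savings; instance size and price matter too.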