Opportunity Name
EcsOptimizeFargate
AWS Resource Type
Amazon ECS (Elastic Container Service) Fargate
Opportunity Description
CloudFix identifies ECS Fargate services that are overprovisioned for their actual workload. Many Fargate services are configured with more CPU and memory than they need, often because they were sized for a peak load that never occurs or left on default configurations. Because Fargate bills for the vCPU and memory a task provisions rather than what it consumes, such a service pays for 100% of its provisioned capacity while using only 10-30% of it.
This finder analyzes your CloudWatch metrics to determine the peak utilization of each Fargate service and recommends downsizing to a smaller, cost-optimized configuration while maintaining safety buffers to ensure performance.
Criteria for Identifying the Opportunity
A Fargate service is identified as overprovisioned when ALL of the following conditions are met:
| Criterion | Description |
|---|---|
| Service Age | Service has been running for at least 30 days (configurable: 7/30/60 days), which ensures sufficient CloudWatch data for reliable analysis |
| Not Smallest Config | Service is not already at the smallest valid Fargate configuration (0.25 vCPU / 512 MB) |
| Low CPU Utilization | Peak CPU utilization (maximum of hourly P99 values) is below the 80% threshold |
| Low Memory Utilization | Peak memory utilization (maximum of hourly P99 values) is below the 80% threshold |
| Valid Downsize Target | A smaller valid Fargate configuration exists that can accommodate the peak utilization plus safety buffers |
Metrics Analysis Method:
- CloudWatch CPUUtilization and MemoryUtilization metrics are queried using the P99 statistic with a 1-hour period
- The maximum of all hourly P99 values is taken to find the worst-case sustained peak
- Safety buffers (default 20% for CPU and memory) are applied to the peak values
- The smallest valid Fargate CPU/memory combination that fits the buffered requirements is selected
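The four steps above can be sketched in Python. This is a minimal illustration, not CloudFix's implementation: the function names (`metric_query`, `peak_percent`, `recommend`) are hypothetical, and the sketch assumes the service is monitored through the `AWS/ECS` namespace with `ClusterName` and `ServiceName` dimensions and that the task runs on Linux, whose valid Fargate size table is hard-coded here.

```python
from datetime import datetime, timedelta, timezone

# Valid Fargate task sizes on Linux: CPU units -> allowed memory values (MiB).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def metric_query(cluster, service, metric_name, days=30, now=None):
    """Build GetMetricStatistics arguments for hourly P99 values (step 1)."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ECS",
        "MetricName": metric_name,  # "CPUUtilization" or "MemoryUtilization"
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 3600,  # 1-hour buckets
        "ExtendedStatistics": ["p99"],
    }


def peak_percent(datapoints):
    """Take the maximum of the hourly P99 values (step 2)."""
    return max(dp["ExtendedStatistics"]["p99"] for dp in datapoints)


def recommend(cpu_units, memory_mib, cpu_peak_pct, mem_peak_pct,
              buffer=0.20, threshold=80.0):
    """Return a smaller (cpu_units, memory_mib) pair, or None if none applies."""
    if cpu_peak_pct >= threshold or mem_peak_pct >= threshold:
        return None  # utilization is too high to count as overprovisioned
    # Step 3: convert peak percentages to absolute needs and add the buffer.
    need_cpu = cpu_units * cpu_peak_pct / 100 * (1 + buffer)
    need_mem = memory_mib * mem_peak_pct / 100 * (1 + buffer)
    # Step 4: walk CPU tiers from smallest to largest and take the first fit.
    for cpu in sorted(FARGATE_SIZES):
        if cpu < need_cpu:
            continue
        for mem in FARGATE_SIZES[cpu]:
            if mem >= need_mem:
                is_current = (cpu, mem) == (cpu_units, memory_mib)
                if is_current or cpu > cpu_units or mem > memory_mib:
                    return None  # no strictly smaller target exists
                return cpu, mem
    return None


# 4 vCPU / 16 GB with peaks of 15% CPU and 20% memory
print(recommend(4096, 16384, cpu_peak_pct=15, mem_peak_pct=20))  # (1024, 4096)
```

With boto3, the dictionary returned by `metric_query` could be passed as `cloudwatch.get_metric_statistics(**params)`, and the `Datapoints` list in the response fed to `peak_percent`.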
Exclusions:
- EC2/ASG-backed ECS clusters (different cost model)
- Standalone Fargate tasks (not part of a service)
- Services tagged with cloudfix:dont-fix-it
- Services where another finder (e.g., EcsOptimizeSporadicToLambda) takes precedence
Potential Savings
Savings vary based on the degree of overprovisioning and current configuration:
| Current Config | Peak Utilization | Recommended Config | Annual Savings | Savings % |
|---|---|---|---|---|
| 4 vCPU / 16 GB | CPU: 15%, Mem: 20% | 1 vCPU / 4 GB | ~$820 | 62% |
| 2 vCPU / 8 GB | CPU: 10%, Mem: 25% | 1 vCPU / 4 GB | ~$290 | 43% |
| 1 vCPU / 4 GB | CPU: 20%, Mem: 30% | 0.5 vCPU / 2 GB | ~$115 | 38% |
ARM (Graviton) Savings: Services running on ARM/Graviton Fargate are approximately 20% cheaper than equivalent x86 configurations.
Savings Plans Portability: Compute Savings Plans cover Fargate regardless of task size. Your existing Savings Plan discount applies to the smaller configuration, so no commitments are stranded.
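For your own services, the savings arithmetic can be sketched as below, assuming a service that runs continuously and is billed per vCPU-hour and per GB-hour. The function names are hypothetical and the rates in the example are placeholders, not quoted AWS prices. The pricing basis behind the table above (region, task count, running hours, discounts) is not stated, so this sketch is not expected to reproduce those figures.

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(vcpu, memory_gb, vcpu_hour_rate, gb_hour_rate, task_count=1):
    """Yearly cost of tasks that run around the clock at the given size."""
    hourly = vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate
    return hourly * HOURS_PER_YEAR * task_count


def annual_savings(current, recommended, vcpu_hour_rate, gb_hour_rate,
                   task_count=1):
    """Return (dollars saved per year, percent saved) for a resize.

    `current` and `recommended` are (vcpu, memory_gb) tuples.
    """
    before = annual_cost(*current, vcpu_hour_rate, gb_hour_rate, task_count)
    after = annual_cost(*recommended, vcpu_hour_rate, gb_hour_rate, task_count)
    saved = before - after
    return saved, saved / before * 100


# Placeholder rates for illustration only; substitute your region's prices.
saved, percent = annual_savings(
    current=(2, 8),
    recommended=(0.5, 3),
    vcpu_hour_rate=0.04,
    gb_hour_rate=0.004,
)
print(f"${saved:,.2f} per year ({percent:.1f}%)")
```

Passing a lower pair of rates models ARM/Graviton pricing or a Savings Plan discount without changing the function.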
What Happens When the Fixer is Executed?
This finder does not have an automatic fixer. CloudFix provides the recommendation, and you must apply the configuration change manually through your standard deployment process.
To implement the recommendation:
- Update your ECS task definition with the recommended CPU and memory values
- Deploy the updated task definition through your pipeline (ECS service update, CloudFormation, Terraform, etc.)
- The ECS service will gradually replace tasks with the new configuration
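If you apply the change through the ECS API rather than infrastructure-as-code, the first step might look like the sketch below. It assumes you start from the `taskDefinition` dictionary returned by ECS `DescribeTaskDefinition`; the helper name `resized_registration` is hypothetical. The helper drops the fields ECS returns but does not accept on registration, sets the new task-level size, and rejects a target smaller than any container's hard memory limit.

```python
# Fields returned by DescribeTaskDefinition that RegisterTaskDefinition rejects.
READ_ONLY_FIELDS = {
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
}


def resized_registration(task_definition, cpu_units, memory_mib):
    """Return RegisterTaskDefinition arguments for a resized copy."""
    args = {key: value for key, value in task_definition.items()
            if key not in READ_ONLY_FIELDS}
    # Task-level cpu and memory are strings in the ECS API.
    args["cpu"] = str(cpu_units)
    args["memory"] = str(memory_mib)
    # A container's hard memory limit cannot exceed the task's memory.
    for container in args.get("containerDefinitions", []):
        if container.get("memory", 0) > memory_mib:
            raise ValueError(
                f"container {container.get('name')!r} reserves more memory "
                f"than the new task size of {memory_mib} MiB"
            )
    return args


described = {
    "family": "web",
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/web:12",
    "revision": 12,
    "status": "ACTIVE",
    "cpu": "4096",
    "memory": "16384",
    "requiresCompatibilities": ["FARGATE"],
    "containerDefinitions": [{"name": "app", "image": "example/app:1", "memory": 2048}],
}
args = resized_registration(described, cpu_units=1024, memory_mib=4096)
print(args["cpu"], args["memory"])  # 1024 4096
```

With boto3, `args` could then be passed to `ecs.register_task_definition(**args)`, and the new revision's ARN to `ecs.update_service(cluster=..., service=..., taskDefinition=...)`, which starts the rolling replacement.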
Is It Possible to Roll Back Once CloudFix Implements the Fixer?
Since this is a recommendation-only finder with no automatic fixer, rollback is your responsibility. If you experience issues after resizing:
- Update the task definition back to the previous configuration
- Redeploy to restore the original CPU/memory settings
- The service will gradually replace tasks with the original configuration
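When the resize was deployed as a new task definition revision, rolling back can mean pointing the service at the earlier revision instead of registering another one. The sketch below assumes the revision immediately before the current one holds the original CPU/memory settings and has not been deregistered; `previous_revision` is a hypothetical helper.

```python
def previous_revision(task_definition):
    """Return the identifier of the revision before the given one.

    Accepts either "family:revision" or a full task definition ARN.
    """
    family, _, revision = task_definition.rpartition(":")
    number = int(revision)  # raises ValueError if no numeric revision is present
    if not family or number <= 1:
        raise ValueError(f"no earlier revision for {task_definition!r}")
    return f"{family}:{number - 1}"


print(previous_revision("web:12"))  # web:11
```

With boto3, the result could be passed as the `taskDefinition` argument of `ecs.update_service`, which accepts either form of identifier.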
Can CloudFix Implement the Fix Automatically Once I Accept the Recommendation?
No. This is a recommendation-only finder. There is no automatic fixer available. You must manually update your ECS task definitions and deploy the changes through your existing infrastructure-as-code or deployment pipeline.
Does the Fix Require Downtime?
No. ECS Fargate services support zero-downtime deployments when using the default deployment configuration:
- New tasks with the updated configuration are launched gradually
- Old tasks are terminated only after new tasks are healthy
- The service remains available throughout the rollout
However, if your application has specific startup requirements or health check dependencies, brief interruptions may occur during the deployment. Test the configuration change in a non-production environment first.
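How much headroom a rolling deployment has depends on the service's `minimumHealthyPercent` and `maximumPercent` settings. The sketch below assumes the usual rolling-update defaults of 100 and 200 and the rounding ECS documents for them (lower bound rounded up, upper bound rounded down); check the values on your own service, since they may have been changed. `rollout_bounds` is a hypothetical helper.

```python
def rollout_bounds(desired_count, minimum_healthy_percent=100, maximum_percent=200):
    """Return (min running tasks, max running tasks) during a rolling update."""
    # Lower bound is rounded up to the nearest whole task (integer ceiling).
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound is rounded down to the nearest whole task.
    max_running = desired_count * maximum_percent // 100
    return min_running, max_running


print(rollout_bounds(4))           # (4, 8)
print(rollout_bounds(3, 50, 150))  # (2, 4)
```

A lower bound equal to the desired count means no old task stops until a replacement is running, which is what keeps the service available during the resize.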
Bill Gleeson