Optimize Overprovisioned ECS Fargate Services

Opportunity Name

EcsOptimizeFargate

AWS Resource Type

Amazon ECS (Elastic Container Service) Fargate

Opportunity Description

CloudFix identifies ECS Fargate services that are overprovisioned for their actual workload. Many Fargate services are configured with more CPU and memory than they need, either because they were sized for a peak load that never occurs or because they were left on default configurations. Because Fargate bills for the vCPU and memory a task requests rather than what it consumes, these services pay for 100% of provisioned capacity while often using only 10-30% of it.

This finder analyzes your CloudWatch metrics to determine the peak utilization of each Fargate service and recommends downsizing to a smaller, cost-optimized configuration while maintaining safety buffers to ensure performance.

Criteria for Identifying the Opportunity

A Fargate service is identified as overprovisioned when ALL of the following conditions are met:

Criterion	Description
Service Age	Service has been running for at least 30 days (configurable: 7/30/60 days) — ensures sufficient CloudWatch data for reliable analysis
Not Smallest Config	Service is not already at the smallest valid Fargate configuration (0.25 vCPU / 512 MB)
Low CPU Utilization	Peak CPU utilization (maximum of hourly P99 values) is below the 80% threshold
Low Memory Utilization	Peak memory utilization (maximum of hourly P99 values) is below the 80% threshold
Valid Downsize Target	A smaller valid Fargate configuration exists that can accommodate the peak utilization plus safety buffers
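
The peak utilization values used in the criteria above could be retrieved from CloudWatch roughly as follows. This is a minimal sketch, not CloudFix's implementation: the function name peak_p99_utilization is hypothetical, the cloudwatch argument is assumed to behave like boto3.client("cloudwatch"), and the service-level metrics are assumed to live in the AWS/ECS namespace under the ClusterName and ServiceName dimensions.

```python
from datetime import datetime, timedelta, timezone


def peak_p99_utilization(cloudwatch, cluster, service, metric_name, days=30):
    """Return the maximum hourly P99 value (percent) for one ECS service metric.

    `cloudwatch` is assumed to behave like boto3.client("cloudwatch").
    Returns None when CloudWatch has no datapoints for the window.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    # One call returns at most 1,440 datapoints, so hourly data covers
    # windows of up to 60 days without pagination.
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ECS",
        MetricName=metric_name,  # "CPUUtilization" or "MemoryUtilization"
        Dimensions=[
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        StartTime=start,
        EndTime=end,
        Period=3600,  # one datapoint per hour
        ExtendedStatistics=["p99"],
    )
    hourly_p99 = [dp["ExtendedStatistics"]["p99"] for dp in response["Datapoints"]]
    return max(hourly_p99) if hourly_p99 else None
```

Calling it once with "CPUUtilization" and once with "MemoryUtilization" gives the two peak values that are compared against the 80% threshold.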

Metrics Analysis Method:

CloudWatch CPUUtilization and MemoryUtilization metrics are queried using the P99 statistic with a 1-hour period
The maximum of all hourly P99 values is taken to find the worst-case sustained peak
Safety buffers (default 20% for CPU and memory) are applied to the peak values
The smallest valid Fargate CPU/memory combination that fits the buffered requirements is selected
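
The buffer and size-selection steps above can be sketched as a small pure function. This is an illustration under stated assumptions rather than CloudFix's actual logic: recommend_size and FARGATE_SIZES are hypothetical names, the size table reflects the Fargate task sizes documented by AWS at the time of writing and should be checked against current AWS documentation, and "smallest" is interpreted as the lowest CPU tier, then the lowest memory within that tier.

```python
# Valid Fargate task sizes: CPU units -> allowed memory values in MiB.
# Assumption: copied from the AWS task-size table; verify before relying on it.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def recommend_size(cpu_units, memory_mib, peak_cpu_pct, peak_mem_pct,
                   buffer=0.20, threshold_pct=80.0):
    """Return the smallest (cpu_units, memory_mib) that fits the buffered
    peaks, or None if the service is not a downsizing candidate."""
    # Services at or above the utilization threshold are left alone.
    if peak_cpu_pct >= threshold_pct or peak_mem_pct >= threshold_pct:
        return None
    # Convert peak percentages to absolute needs and add the safety buffer.
    need_cpu = cpu_units * peak_cpu_pct / 100 * (1 + buffer)
    need_mem = memory_mib * peak_mem_pct / 100 * (1 + buffer)
    current = (cpu_units, memory_mib)
    candidates = [
        (cpu, mem)
        for cpu, mems in FARGATE_SIZES.items()
        for mem in mems
        if need_cpu <= cpu <= cpu_units
        and need_mem <= mem <= memory_mib
        and (cpu, mem) != current
    ]
    # Tuple ordering picks the lowest CPU tier, then the lowest memory.
    return min(candidates) if candidates else None
```

For example, recommend_size(4096, 16384, peak_cpu_pct=15, peak_mem_pct=20) returns (1024, 4096), that is, 1 vCPU / 4 GB. A service already at 256 CPU units and 512 MiB returns None because no smaller valid size exists.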

Exclusions:

EC2/ASG-backed ECS clusters (different cost model)
Standalone Fargate tasks (not part of a service)
Services tagged with cloudfix:dont-fix-it
Services where another finder (e.g., EcsOptimizeSporadicToLambda) takes precedence
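
A comparable pre-filter for your own tooling might look like the sketch below. It is not CloudFix's implementation. It assumes each service is a dictionary shaped like an entry in the ECS DescribeServices response (requested with include=["TAGS"]), that the presence of the cloudfix:dont-fix-it tag key is enough to opt out regardless of its value, and that services using only the FARGATE or FARGATE_SPOT capacity providers count as Fargate-backed. Standalone tasks are excluded implicitly because only services are inspected, and finder precedence is not modeled.

```python
OPT_OUT_TAG = "cloudfix:dont-fix-it"
FARGATE_PROVIDERS = {"FARGATE", "FARGATE_SPOT"}


def is_excluded(service):
    """Return a reason string if the service should be skipped, else None.

    `service` is assumed to be one entry of the `services` list returned
    by ECS DescribeServices with include=["TAGS"].
    """
    # Assumption: the tag key alone signals opt-out; the value is ignored.
    tag_keys = {tag["key"] for tag in service.get("tags", [])}
    if OPT_OUT_TAG in tag_keys:
        return "opted out via tag"
    if service.get("launchType") == "FARGATE":
        return None
    # Services created with a capacity provider strategy may not report
    # a launch type, so inspect the strategy instead.
    providers = {
        item["capacityProvider"]
        for item in service.get("capacityProviderStrategy", [])
    }
    if providers and providers <= FARGATE_PROVIDERS:
        return None
    return "not Fargate-backed"
```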

Potential Savings

Savings vary with the degree of overprovisioning, the current configuration, the region, and the number of running tasks. The examples below are illustrative:

Current Config	Peak Utilization	Recommended Config	Annual Savings	Savings %
4 vCPU / 16 GB	CPU: 15%, Mem: 20%	1 vCPU / 4 GB	~$820	62%
2 vCPU / 8 GB	CPU: 10%, Mem: 25%	1 vCPU / 4 GB	~$290	43%
1 vCPU / 4 GB	CPU: 20%, Mem: 30%	0.5 vCPU / 2 GB	~$115	38%
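
To estimate savings for your own services, Fargate compute cost can be modeled as a per-hour vCPU charge plus a per-hour memory charge. The sketch below is a simplified model with hypothetical function names; the hourly rates are inputs you must supply from the AWS Fargate pricing page for your region and architecture, and it ignores ephemeral storage, data transfer, and Savings Plans discounts. Because the result depends on the rates, task count, and running hours you provide, it will not necessarily reproduce the figures in the table above.

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(vcpu, memory_gb, vcpu_hour_rate, gb_hour_rate,
                tasks=1, hours=HOURS_PER_YEAR):
    """Yearly compute cost for `tasks` identical tasks running `hours` each."""
    return tasks * hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)


def annual_savings(current, target, **pricing):
    """Return (dollars saved per year, percent saved).

    `current` and `target` are (vcpu, memory_gb) pairs; `pricing` is passed
    through to annual_cost (rates, and optionally tasks and hours).
    """
    before = annual_cost(*current, **pricing)
    after = annual_cost(*target, **pricing)
    return before - after, 100 * (before - after) / before
```

A call looks like annual_savings((4, 16), (1, 4), vcpu_hour_rate=..., gb_hour_rate=...), with the two rates filled in from current AWS pricing.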

ARM (Graviton) Savings: Services running on ARM/Graviton Fargate are approximately 20% cheaper than equivalent x86 configurations.

Savings Plans Portability: Compute Savings Plans cover Fargate regardless of task size. Your existing Savings Plan discount applies to the smaller configuration — no commitments are stranded.

What Happens When the Fixer is Executed?

This finder does not have an automatic fixer. CloudFix provides the recommendation, and you must apply the configuration change manually through your standard deployment process.

To implement the recommendation:

Update your ECS task definition with the recommended CPU and memory values
Deploy the updated task definition through your pipeline (ECS service update, CloudFormation, Terraform, etc.)
The ECS service will gradually replace tasks with the new configuration
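
For services that are not managed by infrastructure-as-code, the steps above could be scripted against the ECS API. The following is a hedged sketch, not a CloudFix feature: resized_definition and resize_service are hypothetical helpers, and ecs is assumed to behave like boto3.client("ecs"). The list of read-only fields is an assumption based on the DescribeTaskDefinition response and may need adjusting. The sketch does not copy task definition tags, and it does not touch container-level cpu, memory, or memoryReservation settings, which must still fit within the new task-level values.

```python
# Fields returned by DescribeTaskDefinition that RegisterTaskDefinition
# does not accept (assumed list; adjust if the API rejects other keys).
READ_ONLY_FIELDS = {
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
}


def resized_definition(task_definition, cpu_units, memory_mib):
    """Copy a described task definition with new task-level CPU and memory."""
    params = {k: v for k, v in task_definition.items()
              if k not in READ_ONLY_FIELDS}
    params["cpu"] = str(cpu_units)      # ECS expects strings, e.g. "1024"
    params["memory"] = str(memory_mib)  # MiB as a string, e.g. "4096"
    return params


def resize_service(ecs, cluster, service, cpu_units, memory_mib):
    """Register a resized revision and point the service at it.

    Returns (previous_arn, new_arn) so the previous revision can be
    recorded for rollback.
    """
    described_service = ecs.describe_services(
        cluster=cluster, services=[service])["services"][0]
    previous_arn = described_service["taskDefinition"]
    described = ecs.describe_task_definition(
        taskDefinition=previous_arn)["taskDefinition"]
    registered = ecs.register_task_definition(
        **resized_definition(described, cpu_units, memory_mib))
    new_arn = registered["taskDefinition"]["taskDefinitionArn"]
    # Updating the service starts a new deployment with the new revision.
    ecs.update_service(cluster=cluster, service=service,
                       taskDefinition=new_arn)
    return previous_arn, new_arn
```

If the service is defined in CloudFormation, Terraform, or a similar tool, change the CPU and memory values there instead so the deployed state does not drift from the template.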

Is It Possible to Roll Back Once CloudFix Implements the Fixer?

Since this is a recommendation-only finder with no automatic fixer, rollback is your responsibility. If you experience issues after resizing:

Update the task definition back to the previous configuration
Redeploy to restore the original CPU/memory settings
The service will gradually replace tasks with the original configuration
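
If the previous task definition revision is still active, one way to perform the rollback is to point the service back at that revision instead of registering a new one. The sketch below takes that shortcut, which differs slightly from editing the task definition again. roll_back_service is a hypothetical helper, and ecs is assumed to behave like boto3.client("ecs").

```python
def roll_back_service(ecs, cluster, service, previous_task_definition):
    """Point the service back at an earlier task definition revision.

    `previous_task_definition` is a "family:revision" string or a full ARN
    recorded before the resize. Blocks until the service reports stable.
    """
    response = ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=previous_task_definition,
    )
    # The built-in boto3 waiter polls until the deployment has settled.
    ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
    return response["service"]["taskDefinition"]
```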

Can CloudFix Implement the Fix Automatically Once I Accept the Recommendation?

No. This is a recommendation-only finder. There is no automatic fixer available. You must manually update your ECS task definitions and deploy the changes through your existing infrastructure-as-code or deployment pipeline.

Does the Fix Require Downtime?

No. ECS Fargate services support zero-downtime deployments when using the default rolling update deployment configuration (minimum healthy percent of 100, maximum percent of 200):

New tasks with the updated configuration are launched gradually
Old tasks are terminated only after new tasks are healthy
The service remains available throughout the rollout

However, if your application has specific startup requirements or health check dependencies, brief interruptions may occur during the deployment. Test the configuration change in a non-production environment first.
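
Before deploying, it may be worth confirming that a service's deployment settings let replacement tasks start before old ones stop. The check below is a rough heuristic with a hypothetical function name. It assumes the service uses the rolling update deployment type and that service is an entry from the ECS DescribeServices response; it does not account for health check timing or very small desired counts.

```python
def keeps_capacity_during_rollout(service):
    """Heuristic: True if a rolling update can add new tasks before
    stopping old ones, based on the service's deployment configuration."""
    config = service.get("deploymentConfiguration", {})
    # Fall back to the rolling update defaults when a value is absent.
    minimum_healthy = config.get("minimumHealthyPercent", 100)
    maximum = config.get("maximumPercent", 200)
    # Old tasks must stay until replacements are healthy (minimum >= 100),
    # and there must be headroom to launch the replacements (maximum > 100).
    return minimum_healthy >= 100 and maximum > 100
```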

Bill Gleeson