Opportunity Name:

ML Resize SageMaker


AWS Resource Type:

Amazon SageMaker


Opportunity Description:

Amazon SageMaker instances are often overprovisioned, leading to unnecessary costs. CloudFix offers an automated solution to rightsize SageMaker instances based on actual usage metrics, optimizing instance types and sizes to closely match workloads and reduce overprovisioning waste.


Criteria for identifying the opportunity:

  • Collect SageMaker workload utilization metrics at 5-minute intervals over a 14-day period.
  • Analyze the percentile distributions of CPU, GPU, memory, and network utilization.
  • Recommend downsizing by one instance size when the 99th-percentile utilization indicates the workload fits on the smaller size.
  • Annual cost, extrapolated from the last 7 days of usage, exceeds the annual public cost threshold (default $100).
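
The criteria above can be sketched in a few lines of Python. This is an illustration only, not CloudFix's implementation: the size ladder, the function names, and the 50% cutoff (chosen on the assumption that one size down halves capacity) are hypothetical. Only the 99th percentile, the 7-day extrapolation, and the $100 default come from the criteria.

```python
import math

# Hypothetical size ladder for one instance family (assumption, not from CloudFix).
SIZE_LADDER = ["large", "xlarge", "2xlarge", "4xlarge"]


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def annualized_cost(last_7_days_cost):
    """Extrapolate the last 7 days of spend to a full year."""
    return last_7_days_cost * 365 / 7


def can_downsize(metrics, cutoff_pct=50.0):
    """True when the p99 of every metric is below the cutoff.

    The 50% cutoff is an assumption: it presumes the next size down
    offers half the capacity of the current size.
    """
    return all(percentile(values, 99) < cutoff_pct for values in metrics.values())


def recommend(size, metrics, last_7_days_cost, annual_threshold=100.0):
    """Return the next smaller size, or None if no recommendation applies."""
    if annualized_cost(last_7_days_cost) <= annual_threshold:
        return None  # spend is too low to be worth flagging
    index = SIZE_LADDER.index(size)
    if index == 0 or not can_downsize(metrics):
        return None  # already the smallest size, or the workload needs the capacity
    return SIZE_LADDER[index - 1]


# 14 days of 5-minute samples = 14 * 24 * 12 = 4032 data points per metric.
idle = {"cpu": [20.0] * 4000 + [45.0] * 32, "memory": [30.0] * 4032}
print(recommend("xlarge", idle, last_7_days_cost=10.0))
```

Using a high percentile rather than the mean keeps short bursts in view: a workload that is idle most of the time but saturates the instance for more than 1% of samples is not flagged.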


Potential savings (range in % on annual basis):

  • A recommendation is raised only when the projected cost reduction exceeds 10% of the current cost.
  • Actual savings will depend on the specific instance types and sizes being used, and the extent of overprovisioning.
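
As a rough illustration of the 10% floor, the check below compares the projected saving with the current annual cost. The function names and the choice of current annual cost as the baseline are assumptions made for this sketch, and the dollar figures in the usage line are made up.

```python
def savings_percent(current_annual_cost, recommended_annual_cost):
    """Projected saving as a percentage of the current annual cost."""
    if current_annual_cost <= 0:
        raise ValueError("current_annual_cost must be positive")
    return 100 * (current_annual_cost - recommended_annual_cost) / current_annual_cost


def passes_savings_floor(current_annual_cost, recommended_annual_cost, floor_pct=10.0):
    """True when the saving strictly exceeds the floor (10% by default)."""
    return savings_percent(current_annual_cost, recommended_annual_cost) > floor_pct


# Hypothetical figures: moving from a $1,000/year instance to a $500/year one.
print(savings_percent(1000.0, 500.0), passes_savings_floor(1000.0, 500.0))
```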


Can CloudFix implement the fix automatically once I accept the recommendation?



Does this fix require downtime?



Other considerations:

  • Performance impact: Recommendations are based on actual usage metrics (99th-percentile utilization over 14 days), which keeps the risk to performance low.
  • Data loss considerations: There is no data loss expected as this process involves resizing instances, not deleting data.
  • Security concerns: No additional security concerns as rightsizing does not affect the security posture of SageMaker instances.


Additional Resources:



