ML Retype to Inferentia

Opportunity Name:

ML Retype to Inferentia


AWS Resource Type:

Amazon SageMaker


Opportunity Description:

This opportunity evaluates production machine learning workloads to identify potential cost savings from migrating inference tasks from GPU instances to AWS Inferentia instances. AWS Inferentia is a custom chip designed to provide high-throughput, low-latency inference at a lower cost than GPU instances.


Criteria for identifying the opportunity:

  • The workload must run on Amazon SageMaker using GPU-based instance types.
  • Utilization metrics (GPU, CPU, memory) are analyzed over the past 14 days to establish typical and peak values.
  • Instances are considered when their GPU compute utilization indicates that Inferentia could provide comparable or better performance.
  • Annual cost, extrapolated from the last 31 days of usage, exceeds the annual public cost threshold (default $100).
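
The criteria above can be sketched as a simple filter. This is a minimal illustration, not CloudFix's actual implementation: the `EndpointUsage` record, the list of GPU instance families, and the `max_peak_gpu_util` cutoff are all assumptions made for the example. The article does not state which utilization level qualifies, so that value is a placeholder. Only the 14-day window, the 31-day extrapolation, and the $100 default threshold come from the criteria.

```python
from dataclasses import dataclass

# Assumed GPU-backed SageMaker instance families; verify against current AWS documentation.
GPU_FAMILIES = ("ml.p2.", "ml.p3.", "ml.p3dn.", "ml.p4d.", "ml.g4dn.", "ml.g5.")


@dataclass
class EndpointUsage:
    """Hypothetical record of one SageMaker instance's recent usage."""
    instance_type: str
    gpu_util_14d: list[float]  # GPU utilization samples (percent) from the past 14 days
    cost_31d: float            # public-price cost (USD) over the last 31 days


def annualized_cost(cost_31d: float) -> float:
    """Extrapolate 31 days of cost to a full year."""
    return cost_31d * 365 / 31


def is_candidate(
    usage: EndpointUsage,
    max_peak_gpu_util: float = 60.0,       # placeholder cutoff, not from the article
    annual_cost_threshold: float = 100.0,  # default threshold stated in the criteria
) -> bool:
    """Return True when all listed criteria are met for this instance."""
    if not usage.instance_type.startswith(GPU_FAMILIES):
        return False
    if not usage.gpu_util_14d:
        return False  # no metrics available, so nothing can be concluded
    if max(usage.gpu_util_14d) > max_peak_gpu_util:
        return False
    return annualized_cost(usage.cost_31d) > annual_cost_threshold
```

For instance, an `ml.p3.2xlarge` instance with a peak GPU utilization of 35% and $310 of cost over 31 days would pass this filter, while a CPU-only `ml.m5.large` instance would not.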


Potential savings (range in % on annual basis):

  • Migrating to Inferentia instances can result in significant cost savings because Inferentia instances have a lower hourly cost than GPU instances. For example, moving from an ml.p3.2xlarge GPU instance to an ml.inf1.2xlarge Inferentia instance can reduce the cost per hour by approximately 75%.
  • Actual savings will depend on the specific instance types, usage patterns, and the compatibility of the machine learning models with Inferentia.
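
The savings arithmetic can be sketched as follows. The hourly prices in the usage note are illustrative placeholders, not AWS list prices, and the `inf_per_gpu` parameter (how many Inferentia instances are needed to replace one GPU instance) is an assumption added to show why actual savings may differ from the headline figure.

```python
def hourly_savings_percent(
    gpu_hourly: float,
    inf_hourly: float,
    inf_per_gpu: float = 1.0,  # assumed replacement ratio; 1.0 means a one-for-one swap
) -> float:
    """Percentage reduction in hourly cost after moving to Inferentia."""
    if gpu_hourly <= 0:
        raise ValueError("gpu_hourly must be positive")
    return (1 - (inf_hourly * inf_per_gpu) / gpu_hourly) * 100


def annual_savings(cost_31d: float, savings_percent: float) -> float:
    """Apply a savings percentage to the cost annualized from the last 31 days."""
    return cost_31d * 365 / 31 * savings_percent / 100
```

With placeholder prices of $4.00 per hour for the GPU instance and $1.00 per hour for the Inferentia instance, `hourly_savings_percent(4.0, 1.0)` gives 75.0. If two Inferentia instances were needed to match one GPU instance, the same prices would give 50.0.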


Can CloudFix apply an automatic fix?



Other considerations:

  • Performance impact: It's crucial to validate that the machine learning models are compatible with Inferentia and that the expected inference performance meets the application requirements.
  • Data loss considerations: There should be no data loss involved in migrating instance types for inference tasks. However, thorough testing is recommended to ensure model accuracy and performance are not impacted.
  • Security concerns: Security configurations and compliance should be reviewed as part of the migration process to ensure that they are not affected by the change in instance types.
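
The performance and accuracy validation mentioned above could take the form of a simple acceptance check run before switching traffic. This is only a sketch under stated assumptions: the function names, the nearest-rank percentile method, and both thresholds are hypothetical, and the latency samples and model outputs would need to come from your own test runs on the GPU and Inferentia deployments.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def passes_validation(
    inf_latencies_ms: list[float],  # latencies measured on the Inferentia deployment
    gpu_outputs: list[float],       # reference model outputs from the GPU deployment
    inf_outputs: list[float],       # outputs for the same inputs on Inferentia
    max_p95_ms: float,              # application latency requirement (assumed)
    max_abs_diff: float,            # tolerated output drift (assumed)
) -> bool:
    """Check p95 latency and worst-case output difference against the limits."""
    if not inf_latencies_ms or len(gpu_outputs) != len(inf_outputs):
        return False
    p95 = percentile(inf_latencies_ms, 95)
    worst = max((abs(a - b) for a, b in zip(gpu_outputs, inf_outputs)), default=0.0)
    return p95 <= max_p95_ms and worst <= max_abs_diff
```

A single scalar difference is a deliberately simple accuracy proxy; a real evaluation would more likely use the task's own quality metric on a held-out dataset.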


Additional Resources:


