Opportunity Name:
ML Stop Idle SageMaker Notebooks
AWS Resource Type:
SageMaker Notebook Instances
Opportunity Description:
SageMaker notebook instances that remain idle for extended periods can incur unnecessary costs. CloudFix identifies these idle notebook instances based on their connectivity state and instance metrics. When an idle period threshold is reached, CloudFix can automatically stop the notebook instance to optimize AWS SageMaker costs.
Criteria for identifying the opportunity:
- Monitor SageMaker notebook instances for periods of user inactivity.
- Use the CloudWatch agent to collect OS-level metrics for detecting idle periods.
- Consider a notebook instance idle if its maximum CPU utilization is 0 within the CloudWatch aggregation period, indicating no computation activity.
- The default inactivity duration before auto-stop is 1 hour.
- The annual cost, extrapolated from the last 31 days of usage, exceeds the annual public cost threshold (default $100).
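The criteria above can be sketched as a small check. This is a minimal illustration, not CloudFix's actual implementation. The function names are hypothetical, the datapoint shape is modeled on the `Timestamp` and `Maximum` fields returned by CloudWatch `GetMetricStatistics`, and treating an empty window as "not idle" is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

IDLE_WINDOW = timedelta(hours=1)     # default inactivity duration from the criteria
ANNUAL_COST_THRESHOLD_USD = 100.0    # default annual cost threshold from the criteria


def is_idle(datapoints, now, window=IDLE_WINDOW):
    """Return True if every CPU datapoint inside the window has a maximum of 0.

    `datapoints` is a list of dicts with "Timestamp" (timezone-aware datetime)
    and "Maximum" (CPU utilization in percent). An empty window is treated as
    NOT idle, because missing metrics are not evidence of inactivity
    (an assumption of this sketch).
    """
    recent = [d for d in datapoints if now - window <= d["Timestamp"] <= now]
    if not recent:
        return False
    return all(d["Maximum"] == 0 for d in recent)


def annualized_cost(cost_last_31_days_usd):
    """Extrapolate the last 31 days of cost to a 365-day year."""
    return cost_last_31_days_usd * 365 / 31


def qualifies(datapoints, now, cost_last_31_days_usd):
    """Apply both criteria: idle for the window and above the cost threshold."""
    return (
        is_idle(datapoints, now)
        and annualized_cost(cost_last_31_days_usd) > ANNUAL_COST_THRESHOLD_USD
    )
```

In practice the datapoints would come from the CloudWatch agent's CPU metric for the instance; the sketch leaves that retrieval out so the decision logic can be read on its own.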
Potential savings (range in % on annual basis):
- Stopping an idle notebook instance eliminates its compute charges; only the small storage charge for the underlying ML storage volume remains.
- For example, stopping an ml.t3.medium notebook instance, which costs around $0.066 per hour (~$48 per month), removes that compute charge while you continue to pay $0.50 per month for a 5 GB ML storage volume.
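The arithmetic behind this example can be checked with a short sketch. It uses the rates quoted above and assumes 730 hours per month (8,760 hours per year divided by 12) and an instance that would otherwise run continuously; the function name is hypothetical.

```python
HOURS_PER_MONTH = 730  # average month length in hours; an assumption of this sketch


def monthly_savings(hourly_rate_usd, storage_usd_per_month):
    """Compare a continuously running instance with one stopped all month."""
    compute = hourly_rate_usd * HOURS_PER_MONTH
    total_while_running = compute + storage_usd_per_month
    return {
        "compute_usd": compute,                 # charge avoided by stopping
        "storage_usd": storage_usd_per_month,   # charge that remains
        "savings_pct": 100 * compute / total_while_running,
    }


example = monthly_savings(hourly_rate_usd=0.066, storage_usd_per_month=0.50)
print(f"compute avoided: ${example['compute_usd']:.2f} per month")
print(f"storage retained: ${example['storage_usd']:.2f} per month")
print(f"share of bill saved: {example['savings_pct']:.1f}%")
```

An instance that is idle only part of the time saves proportionally less, so the percentage here is an upper bound for this instance type and volume size.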
Can CloudFix apply an automatic fix?
Yes
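For readers who want to see what a stop action involves, the following is a minimal sketch using the boto3 SageMaker client. It is not CloudFix's fixer: the helper name and the notebook name in the usage comment are hypothetical. The client is passed in as a parameter so the helper can be exercised without AWS access.

```python
def stop_if_in_service(sagemaker_client, notebook_name):
    """Stop the notebook instance if its status is InService.

    Returns True if a stop request was sent, False otherwise (for example,
    when the instance is already Stopped, Stopping, or Pending).
    """
    response = sagemaker_client.describe_notebook_instance(
        NotebookInstanceName=notebook_name
    )
    if response["NotebookInstanceStatus"] != "InService":
        return False
    sagemaker_client.stop_notebook_instance(NotebookInstanceName=notebook_name)
    return True


# Usage (requires boto3 and AWS credentials; the notebook name is hypothetical):
#   import boto3
#   stop_if_in_service(boto3.client("sagemaker"), "my-idle-notebook")
```

Checking the status first avoids sending a stop request for an instance that is not running.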
Other considerations:
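- Data Loss: Stopping a notebook instance does not delete the instance or the data on its ML storage volume, so the risk of data loss is low. Only files saved under /home/ec2-user/SageMaker persist across a stop and restart; unsaved work and files stored elsewhere on the instance are lost. Users continue to be charged for the underlying ML storage volume while the instance is stopped.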
- Performance Impact: Stopping an idle notebook has no impact on active workloads. Notebooks can be easily restarted when needed.
- Security Concerns: Security configurations and data encryption remain intact for the stopped notebook instances and their associated storage volumes.