GPU EC2 Manual Optimization

Opportunity Name:

GPU Finder for G4dn and P3 Instance Optimization.

AWS Resource Type:

EC2

Opportunity Description:

The GPU Finder identifies opportunities to optimize G4dn and P3 GPU-based EC2 instances by leveraging AWS Compute Optimizer recommendations. It surfaces these opportunities in the CloudFix UI as manual optimizations for the user to review and optionally fix.

Criteria for identifying the opportunity:

  • Instance type: Only G4dn and P3 GPU instance type families are considered.
  • EC2 Usage: The instance must have hourly or dedicated usage charges in the CUR (AWS Cost and Usage Report) data, indicating that it is a normally running instance. Spot, EKS, ECS, and some other specialized usage types are excluded.
  • Not Autoscaling: Instances belonging to an autoscaling group are excluded, as these are managed automatically.
  • CloudWatch Agent: The CloudWatch agent must be installed on the instance and configured to collect NVIDIA GPU metrics, which Compute Optimizer needs to generate recommendations for GPU instances.
  • Savings threshold: The projected savings from Compute Optimizer must be above $100 for the previous month.
  • Performance risk: Recommendations flagged as high risk by Compute Optimizer are excluded.
  • Instance Size: Recommendations to right-size to nano/micro instances are excluded.
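
The criteria above can be read as a single filter applied to each candidate instance. The following is a minimal sketch in Python, not CloudFix's actual implementation: the `Candidate` record, its field names, and the `is_gpu_opportunity` function are hypothetical, and it assumes each criterion has already been resolved to a simple value (for example, from CUR data and the Compute Optimizer recommendation).

```python
from dataclasses import dataclass

# Values taken from the criteria list; the names are illustrative.
GPU_FAMILIES = {"g4dn", "p3"}
EXCLUDED_TARGET_SIZES = {"nano", "micro"}
MIN_MONTHLY_SAVINGS_USD = 100.0


@dataclass
class Candidate:
    """Hypothetical record describing one instance and its recommendation."""
    instance_type: str            # current type, e.g. "g4dn.12xlarge"
    recommended_type: str         # type suggested by Compute Optimizer
    has_standard_usage: bool      # hourly or dedicated charges in CUR; not Spot/EKS/ECS
    in_autoscaling_group: bool
    gpu_metrics_configured: bool  # CloudWatch agent collecting NVIDIA GPU metrics
    monthly_savings_usd: float    # projected savings for the previous month
    high_performance_risk: bool   # recommendation flagged as high risk


def is_gpu_opportunity(c: Candidate) -> bool:
    """Return True only if every criterion in the list is satisfied."""
    family = c.instance_type.split(".")[0].lower()
    target_size = c.recommended_type.split(".")[-1].lower()
    return (
        family in GPU_FAMILIES
        and c.has_standard_usage
        and not c.in_autoscaling_group
        and c.gpu_metrics_configured
        and c.monthly_savings_usd > MIN_MONTHLY_SAVINGS_USD
        and not c.high_performance_risk
        and target_size not in EXCLUDED_TARGET_SIZES
    )
```

Because the checks are combined with `and`, failing any single criterion is enough to exclude the instance, which matches how the list is worded.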

Potential savings (range in % on annual basis):

Based on related EC2 optimizations, savings of up to 20% on an annual basis can be achieved.

What happens when the Fixer is executed?

The fixer is not automatic. The instance must be manually resized.

Is it possible to roll back once CloudFix implements the fixer?

There is no automated rollback since the instance resizing is executed manually by the user.

Can CloudFix implement the fix automatically once I accept the recommendation?

No. This resize operation is manually applied and must be manually reversed.

Does this fix require downtime?

Yes. Resizing an instance requires the instance to be stopped, resized, and restarted.
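
For users who script the manual resize, the stop, resize, and restart sequence might look like the following minimal sketch. It is not part of CloudFix. It assumes an EBS-backed instance and a boto3 EC2 client passed in as `ec2`; the function name `resize_instance` is hypothetical.

```python
def resize_instance(ec2, instance_id: str, new_type: str) -> None:
    """Stop an instance, change its type, and start it again.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The instance is unavailable from the stop call until it is running again.
    """
    # 1. Stop the instance and wait until it is fully stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # 2. Change the instance type (only possible while stopped).
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # 3. Start the instance and wait until it is running.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

A call such as `resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "g4dn.xlarge")` would apply the change; the instance ID and target type here are placeholders. Since there is no automated rollback, it is worth recording the original instance type beforehand so that the same function can be used to revert.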

 