Optimize OpenSearch domains with over-provisioned storage

Opportunity Name:

Optimize OpenSearch domains with over-provisioned storage

 

AWS Resource Type:

Amazon OpenSearch Service

 

Opportunity Description:

Amazon charges a premium for the gp2 and gp3 volumes attached to OpenSearch domains: they cost 35% and 52.5% more, respectively, than the equivalent EC2 gp2 and gp3 volumes. Despite this, many OpenSearch domains have over-provisioned storage. This Finder/Fixer (FF) identifies such domains and right-sizes their attached storage volumes to reduce costs.

 

Criteria for identifying the opportunity:

The Finder uses the CloudWatch GetMetricData API endpoint to gather metrics for the OpenSearch domain for the last 30 days. The following metrics are collected using an aggregation period of 5 minutes:

  • FreeStorageSpace (Statistics: Minimum, First, Current)
  • ReadIOPS (Statistic: Maximum)
  • WriteIOPS (Statistic: Maximum)
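As a rough illustration of the metric collection described above, the sketch below builds a GetMetricData request with boto3. It is not the Finder's actual code. It assumes the domain's metrics are read from the `AWS/ES` CloudWatch namespace with the `DomainName` and `ClientId` (account ID) dimensions, and it assumes that "First" and "Current" mean the oldest and newest datapoints in the 30-day window. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PERIOD_SECONDS = 300  # 5-minute aggregation period
LOOKBACK_DAYS = 30

# (metric name, CloudWatch statistic) pairs from the list above.
METRICS = [
    ("FreeStorageSpace", "Minimum"),
    ("ReadIOPS", "Maximum"),
    ("WriteIOPS", "Maximum"),
]


def build_queries(domain_name, account_id):
    """Build the MetricDataQueries payload for one OpenSearch domain."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]
    return [
        {
            "Id": f"m{i}",  # query ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ES",
                    "MetricName": name,
                    "Dimensions": dimensions,
                },
                "Period": PERIOD_SECONDS,
                "Stat": stat,
            },
            "ReturnData": True,
        }
        for i, (name, stat) in enumerate(METRICS)
    ]


def fetch_metrics(domain_name, account_id, region):
    """Return {metric name: [values, oldest first]} for the last 30 days."""
    import boto3  # imported lazily so build_queries works without boto3

    client = boto3.client("cloudwatch", region_name=region)
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=LOOKBACK_DAYS)
    names_by_id = {f"m{i}": name for i, (name, _) in enumerate(METRICS)}
    series = {name: [] for name in names_by_id.values()}
    paginator = client.get_paginator("get_metric_data")
    for page in paginator.paginate(
        MetricDataQueries=build_queries(domain_name, account_id),
        StartTime=start,
        EndTime=end,
        ScanBy="TimestampAscending",
    ):
        for result in page["MetricDataResults"]:
            series[names_by_id[result["Id"]]].extend(result["Values"])
    return series


def summarize_free_space(values):
    """Derive First, Current, and Minimum from an oldest-first series."""
    return {"first": values[0], "current": values[-1], "minimum": min(values)}
```

Calling `fetch_metrics` requires AWS credentials with permission to call `cloudwatch:GetMetricData`; `build_queries` and `summarize_free_space` are pure functions and can be checked offline.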

 

The Finder then predicts the domain’s free storage space 3 months from now by linearly extrapolating the change observed over the last 30 days, and uses that prediction to calculate the recommended volume size:

  • PredictedFreeStorageSpaceIn3Months = 4 x CurrentFreeStorageSpace - 3 x FreeStorageSpace30DaysAgo
  • RecommendedVolumeSize = int((CurrentVolumeSize - MinimumFreeStorageSpace) x 1.3)

Where:

  • MinimumFreeStorageSpace = max(0, min(MinimumFreeStorageSpaceInLast30Days, PredictedFreeStorageSpaceIn3Months))
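The calculation above can be sketched in a few lines of Python. This is an illustration of the published formulas only, not the Finder's implementation. The function and parameter names are hypothetical, and all sizes are assumed to be in the same unit (GB here).

```python
def recommended_volume_size(
    current_volume_size,
    current_free,
    free_30_days_ago,
    min_free_last_30_days,
    headroom=1.3,
):
    """Apply the formulas above; all sizes share one unit (assumed GB)."""
    # Extend the 30-day change in free space three more 30-day periods:
    # current + 3 x (current - 30_days_ago) = 4 x current - 3 x 30_days_ago
    predicted_free = 4 * current_free - 3 * free_30_days_ago

    # Take the more pessimistic of observed and predicted free space,
    # clamped at zero so a steep downward trend cannot go negative.
    minimum_free = max(0, min(min_free_last_30_days, predicted_free))

    # Expected used space plus 30% headroom, truncated to a whole number.
    return int((current_volume_size - minimum_free) * headroom)


# Worked example: a 100 GB volume with 70 GB free now, 75 GB free
# 30 days ago, and a 30-day minimum of 68 GB free.
# predicted_free = 4 x 70 - 3 x 75 = 55, minimum_free = 55,
# recommended = int((100 - 55) x 1.3) = 58
print(recommended_volume_size(100, 70, 75, 68))
```

In the worked example the predicted free space (55 GB) is lower than the observed minimum (68 GB), so the prediction drives the recommendation.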

 

An opportunity is identified if the OpenSearch domain meets the following criteria:

  • The domain has at least 30 days of CloudWatch metrics data.
  • The domain's EBS volume size is greater than 10 GB. (The volume attached to an OpenSearch domain cannot be reduced below 10 GB.)
  • The RecommendedVolumeSize from the above calculation is less than the current volume size.

The performance of gp2 volumes is tied to their size, so the FF will not shrink a gp2 volume below the size needed to sustain the domain's observed ReadIOPS and WriteIOPS.
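The sketch below shows one way the criteria and the gp2 safeguard could fit together. The article does not say how the FF derives its IOPS floor, so the approach here is an assumption: it uses the gp2 baseline of 3 IOPS per GiB and sums the peak ReadIOPS and WriteIOPS as a conservative upper bound on demand. All names are hypothetical.

```python
import math

GP2_BASELINE_IOPS_PER_GIB = 3  # gp2 baseline performance scales with size
MIN_VOLUME_SIZE = 10           # lower bound stated in the criteria above
MIN_DAYS_OF_METRICS = 30


def gp2_size_floor(peak_read_iops, peak_write_iops):
    """Smallest gp2 size whose baseline IOPS covers the combined peaks.

    Assumption: read and write peaks are summed, which over-estimates
    demand when the two peaks do not occur at the same time.
    """
    peak_total = peak_read_iops + peak_write_iops
    return math.ceil(peak_total / GP2_BASELINE_IOPS_PER_GIB)


def target_volume_size(recommended, volume_type, peak_read_iops, peak_write_iops):
    """Raise the recommended size to the 10 GB minimum and, for gp2, the IOPS floor."""
    size = max(recommended, MIN_VOLUME_SIZE)
    if volume_type == "gp2":
        size = max(size, gp2_size_floor(peak_read_iops, peak_write_iops))
    return size


def is_opportunity(days_of_metrics, current_size, target_size):
    """Check the three criteria listed above."""
    return (
        days_of_metrics >= MIN_DAYS_OF_METRICS
        and current_size > MIN_VOLUME_SIZE
        and target_size < current_size
    )


# Example: a recommendation of 58 on a gp2 volume with peaks of
# 150 read and 90 write IOPS is raised to ceil(240 / 3) = 80.
target = target_volume_size(58, "gp2", 150, 90)
print(target, is_opportunity(45, 100, target))
```

For a gp3 volume the same call leaves the recommendation at 58, because gp3 performance is provisioned independently of size.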

 

Potential savings (range in % on annual basis):

The savings that can be achieved will depend on your current OpenSearch usage and configuration, but typically customers can expect savings of at least 12%. 

 

What happens when the Fixer is executed?

The Fixer carries out the following automated steps:

 

Is it possible to roll back once CloudFix implements the fix?

Yes. In the unlikely event that the cluster enters a “red” state, you will be sent an alert notification so that you can restore the index from a snapshot.

 

If the free storage space in the resized OpenSearch domain falls below 20%, a CloudWatch alarm will be triggered and will run the FF’s rollback automation to scale the storage volume back up to provide 30% free storage space. The alarm remains active for 90 days.
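The thresholds above imply a simple calculation, sketched here for illustration. The article does not describe how the rollback automation computes the new size, so this is an assumption: it picks the smallest whole-number size at which the currently used space leaves 30% of the volume free. The function names are hypothetical.

```python
import math

ALARM_FREE_FRACTION = 0.20   # alarm fires below 20% free space
TARGET_FREE_FRACTION = 0.30  # rollback restores 30% free space


def alarm_should_fire(volume_size, free_space):
    """True when free space has dropped below 20% of the volume."""
    return free_space / volume_size < ALARM_FREE_FRACTION


def rollback_volume_size(volume_size, free_space):
    """Smallest whole-number size that leaves 30% of the volume free."""
    used = volume_size - free_space
    return math.ceil(used / (1 - TARGET_FREE_FRACTION))


# Example: a resized 58 GB volume with 10 GB free (about 17% free)
# triggers the alarm; 48 GB used / 0.7 = 68.57, rounded up to 69 GB.
print(alarm_should_fire(58, 10), rollback_volume_size(58, 10))
```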

 

Can CloudFix implement the fix automatically once I accept the recommendation?

Yes.

 

Does this fix require downtime?

No. However, during the update process, Amazon OpenSearch Service temporarily increases the number of instances in the cluster. This temporary increase can strain the cluster's dedicated master nodes and increase search and indexing latencies. We therefore recommend running the Fixer within a maintenance window.

 

Additional Resources:
