
AWS MSK broker instance rightsizing

Opportunity Name

MSK Broker Instance Rightsizing (MskRightsizeInstances)

AWS Resource Type (AWS service name)

Amazon Managed Streaming for Apache Kafka (Amazon MSK – Provisioned/Standard brokers)

Opportunity Description

This CloudFix Finder analyzes Amazon MSK broker utilization (CPU, memory, and partition pressure) and recommends a vertical rightsizing change to a more appropriate broker instance type (for example, kafka.m5.large → kafka.m7g.large) to reduce cost while maintaining required performance headroom.

This spec is instance-type-only rightsizing:

  • In scope: changing broker instance type (vertical resizing)

  • Out of scope: changing number of brokers, storage/EBS tuning, and partition reassignment

Criteria for identifying the opportunity

CloudFix identifies an opportunity when the following conditions are met:

Cost + tagging gates (CUR-based seed list)

  • CUR shows MSK broker “RunBroker” usage for the cluster over a ~31-day window (AmazonMSK, RunBroker)

  • Excludes resources tagged with cloudfix_dont_fix_it

  • Excludes clusters with annualized amortized cost ≤ $100
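The three gates above can be sketched as a simple filter over a seed list. This is only an illustration: the `CurClusterRow` shape and its field names are hypothetical stand-ins for an aggregated CUR row, not CloudFix's actual data model.

```python
from dataclasses import dataclass, field

DONT_FIX_TAG = "cloudfix_dont_fix_it"
MIN_ANNUALIZED_COST_USD = 100.0  # clusters at or below this are excluded


@dataclass
class CurClusterRow:
    """Hypothetical, simplified view of one cluster's aggregated CUR data."""
    resource_id: str
    product_code: str
    operation: str
    annualized_amortized_cost: float
    tags: dict = field(default_factory=dict)


def passes_cost_and_tag_gates(row: CurClusterRow) -> bool:
    # Gate 1: the row must represent MSK broker usage.
    if row.product_code != "AmazonMSK" or row.operation != "RunBroker":
        return False
    # Gate 2: the opt-out tag excludes the resource (assumed: any value counts).
    if DONT_FIX_TAG in row.tags:
        return False
    # Gate 3: annualized amortized cost must be strictly above $100.
    return row.annualized_amortized_cost > MIN_ANNUALIZED_COST_USD
```

Treating the mere presence of the tag as an opt-out, regardless of its value, is an assumption of this sketch.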

Eligibility validations (API-based)

  • Cluster exists (via DescribeCluster / DescribeClusterV2) and the ARN matches the CUR resource_id

  • Cluster state is ACTIVE (exclude creating/updating/deleting/failed)

  • Cluster type is STANDARD

  • Cluster was created more than lookbackPeriodDays ago (default 31)
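A minimal sketch of these eligibility validations follows. The `ClusterSnapshot` type and its fields are hypothetical; in practice the values would be derived from a DescribeCluster / DescribeClusterV2 response, whose exact field mapping is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ClusterSnapshot:
    """Hypothetical summary of a cluster, built from a describe call."""
    arn: str
    state: str         # e.g. "ACTIVE", "CREATING", "UPDATING"
    cluster_type: str  # "STANDARD" as named in this article
    created_at: datetime


def is_eligible(cluster: ClusterSnapshot, cur_resource_id: str,
                now: datetime, lookback_period_days: int = 31) -> bool:
    # The ARN returned by the API must match the CUR resource_id.
    if cluster.arn != cur_resource_id:
        return False
    # Only ACTIVE clusters are considered.
    if cluster.state != "ACTIVE":
        return False
    if cluster.cluster_type != "STANDARD":
        return False
    # The cluster must be older than the lookback window.
    return now - cluster.created_at > timedelta(days=lookback_period_days)
```

Passing `now` explicitly keeps the check deterministic and easy to test.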

Recommendation feasibility checks

  • A cheaper target instance type exists (pricing cached via AWS pricing data) and is available in the region

  • Per-broker utilization over the lookback window supports the target type with configured buffers:

    • CPU utilization statistic (default p99) must stay below (100% - cpuBufferPercent) (default buffer 20% → threshold 80%)

    • Memory utilization statistic (default p99) must stay below (100% - memoryBufferPercent) (default buffer 20% → threshold 80%)

  • Partition pressure check:

    • The target instance type's partition guidance/limit must exceed the observed partitions per broker after applying partitionCountBufferPercent (default 20%)
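The buffer checks above can be illustrated as follows. This is a sketch under stated assumptions: the nearest-rank percentile, the function names, and the way the partition buffer is applied (observed count inflated by the buffer, then compared to the target limit) are illustrative choices. It also assumes the utilization samples are already expressed relative to the target instance type; how utilization is projected from the current type to the target type is not covered here.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fits_with_buffers(cpu_samples, memory_samples,
                      observed_partitions_per_broker, target_partition_limit,
                      cpu_buffer_percent=20, memory_buffer_percent=20,
                      partition_count_buffer_percent=20, stat_pct=99):
    # CPU and memory: the chosen statistic (default p99) must stay
    # below 100% minus the configured buffer (80% with defaults).
    cpu_ok = percentile(cpu_samples, stat_pct) < 100 - cpu_buffer_percent
    mem_ok = percentile(memory_samples, stat_pct) < 100 - memory_buffer_percent
    # Partitions (assumed interpretation): observed count plus buffer
    # must remain under the target type's partition limit.
    buffered = observed_partitions_per_broker * (1 + partition_count_buffer_percent / 100)
    partitions_ok = buffered < target_partition_limit
    return cpu_ok and mem_ok and partitions_ok
```

With the default 20% buffers, a broker whose p99 CPU reaches 90% fails the check even if memory and partition counts are comfortably within limits.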


Potential Savings (if known)

Savings are estimated by comparing:

  • Current annualized cost (annualized amortized cost and public cost from CUR)

  • Target annualized cost, estimated by scaling the current annualized cost by the ratio of target to current broker-hour pricing

CloudFix reports:

  • Estimated annual savings ($)

  • Savings percentage (%)
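As a rough illustration of the calculation described above, the sketch below scales the current annualized cost by the price ratio. The function name is hypothetical and the prices in the usage example are made-up round numbers, not actual MSK pricing.

```python
def estimate_savings(current_annualized_cost, current_hourly_price, target_hourly_price):
    """Return (annual savings in $, savings percentage)."""
    # Target cost = current cost scaled by target/current broker-hour price.
    price_ratio = target_hourly_price / current_hourly_price
    target_annualized_cost = current_annualized_cost * price_ratio
    savings = current_annualized_cost - target_annualized_cost
    savings_pct = savings / current_annualized_cost * 100
    return savings, savings_pct


# Illustrative numbers only: a cluster costing $1,200/year moving to a
# broker type priced at half the current hourly rate.
annual_savings, pct = estimate_savings(1200.0, 0.50, 0.25)
```

Because the target cost is derived from a price ratio, any discounts reflected in the amortized CUR cost are implicitly assumed to carry over proportionally to the target type.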

(Reference pricing: https://aws.amazon.com/msk/pricing/)

What happens when the Fixer is Executed?

There is no Fixer for this opportunity (Finder-only). CloudFix provides the recommended instance type and supporting analysis; customers implement the change manually in AWS.

Is it possible to roll back once CloudFix implements the Fixer?

Not applicable (no CloudFix Fixer). If you change the broker type manually, rollback would also be manual (change back to the previous broker type).

Can CloudFix implement the fix automatically once I accept the recommendation?

No — manual process (Finder-only).

Does the fix require downtime?

CloudFix does not execute any changes. If you change the broker type manually, AWS describes the broker-size update as a rolling operation that is performed while the cluster keeps running. Downsizing can still reduce performance, so plan the change carefully.

Additional Resources

  1. Bill Gleeson
