
Add recovery rebalance support for top state downward transitions#134

Merged
proud-parselmouth merged 2 commits into dev from anubagar/downward_st_prioritzation
Mar 23, 2026

Conversation

@proud-parselmouth
Collaborator

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

Presently, any top state downward transition message (MASTER->SLAVE or LEADER->STANDBY) gets evaluated as a load balance message. When determining the message type, Helix checks how many current participants hold the second-top state (e.g., SLAVE); if that count is >= the min active required number (almost always 1), it treats the top state downward transition as load balance, since it increases the number of replicas in that state. This causes a problem when the top state urgently needs to be transferred but load balance messages are throttled by ST throttle configs.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Added a new cluster config ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION to handle such cases. When enabled, the IntermediateStateCalcStage reclassifies any top state downward ST message (e.g., MASTER->SLAVE, LEADER->STANDBY) as RECOVERY_REBALANCE instead of LOAD_BALANCE, ensuring these transitions are not throttled.

Changes:

  • ClusterConfig.java: Added new ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION config property with getter/setter
  • IntermediateStateCalcStage.java: Added isTopStateDownwardTransition() method that checks if a message transitions from the top state to the second-top state. When the config is enabled and a message is classified as LOAD_BALANCE but is a top state downward transition, it is reclassified as RECOVERY_BALANCE
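The reclassification described in the Changes list can be sketched as follows. This is a minimal, self-contained sketch: the state priority list and the message's from/to states are modeled as plain strings, standing in for Helix's StateModelDefinition and Message types, so the class and method names here are illustrative, not the actual implementation.

```java
import java.util.Arrays;
import java.util.List;

public class TopStateDownwardCheck {

  // In Helix, a state model's states priority list is ordered from the top
  // state downward, e.g. [MASTER, SLAVE, OFFLINE] for MasterSlave.
  static final List<String> STATES_PRIORITY_LIST =
      Arrays.asList("MASTER", "SLAVE", "OFFLINE");

  // True when the message moves a replica from the top state to the
  // second-top state (e.g., MASTER->SLAVE, LEADER->STANDBY).
  static boolean isTopStateDownwardTransition(String fromState, String toState) {
    if (STATES_PRIORITY_LIST.size() < 2) {
      return false;
    }
    return STATES_PRIORITY_LIST.get(0).equals(fromState)
        && STATES_PRIORITY_LIST.get(1).equals(toState);
  }

  // Reclassify a LOAD_BALANCE message as RECOVERY_BALANCE when the config
  // flag is enabled and the message is a top state downward transition.
  static String classify(boolean configEnabled, String rebalanceType,
      String fromState, String toState) {
    if (configEnabled && "LOAD_BALANCE".equals(rebalanceType)
        && isTopStateDownwardTransition(fromState, toState)) {
      return "RECOVERY_BALANCE";
    }
    return rebalanceType;
  }

  public static void main(String[] args) {
    // MASTER->SLAVE with the config on: reclassified as RECOVERY_BALANCE.
    System.out.println(classify(true, "LOAD_BALANCE", "MASTER", "SLAVE"));
    // Same message with the config off: stays LOAD_BALANCE.
    System.out.println(classify(false, "LOAD_BALANCE", "MASTER", "SLAVE"));
    // SLAVE->OFFLINE is not a top state downward transition: stays LOAD_BALANCE.
    System.out.println(classify(true, "LOAD_BALANCE", "SLAVE", "OFFLINE"));
  }
}
```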

Tests

  • The following tests are written for this issue:
  1. TestIntermediateStateCalcStage.testEnableRecoveryRebalanceForTopStateDownwardStateTransition — Verifies MASTER->SLAVE is throttled as LOAD_BALANCE without config, and passes as RECOVERY_REBALANCE with config enabled
  2. TestIntermediateStateCalcStage.testNonTopStateDownwardTransitionNotReclassifiedAsRecovery — Verifies SLAVE->OFFLINE is NOT reclassified even with config enabled (negative test)
  3. TestIntermediateStateCalcStage.testRecoveryRebalanceForTopStateDownwardWithLeaderStandby — Verifies LEADER->STANDBY is correctly reclassified with LeaderStandby state model

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

Add ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION config
to classify top state downward transitions (e.g., MASTER->SLAVE,
LEADER->STANDBY) as RECOVERY_REBALANCE instead of LOAD_BALANCE,
preventing them from being throttled during urgent top-state transfers.

Made-with: Cursor
proud-parselmouth force-pushed the anubagar/downward_st_prioritzation branch from 0f7125e to 4b72540 on March 12, 2026 15:13
if (recoveryRebalanceForTopStateDownwardTransition
    && rebalanceType.equals(RebalanceType.LOAD_BALANCE)
    && isTopStateDownwardTransition(stateModelDef, message)) {
  rebalanceType = RebalanceType.RECOVERY_BALANCE;
}
Collaborator

We are updating the rebalanceType here but there is also quota counting before this stage which will charge this towards load_balance even though we are changing it later to recovery rebalance. IIUC, this could result in overcharging for load balance throttle limits.
chargePendingTransition(resource, currentStateOutput, throttleController, ...)
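The concern above can be illustrated with a toy quota counter (hypothetical names; this is not Helix's actual throttle API): if a pending transition is charged to a quota bucket before the message is reclassified, the LOAD_BALANCE bucket absorbs a charge that should have gone to RECOVERY_BALANCE.

```java
import java.util.HashMap;
import java.util.Map;

public class QuotaChargeOrder {

  // Simulate charging one top state downward pending transition, either
  // reclassifying it before or after the chargePendingTransition-style step.
  static Map<String, Integer> charge(boolean reclassifyBeforeCharging) {
    Map<String, Integer> buckets = new HashMap<>();
    String type = "LOAD_BALANCE";
    if (reclassifyBeforeCharging) {
      // Reclassified first, so the charge lands in the right bucket.
      type = "RECOVERY_BALANCE";
    }
    buckets.merge(type, 1, Integer::sum); // the quota-charging step
    if (!reclassifyBeforeCharging) {
      // Reclassified only after charging: the LOAD_BALANCE bucket was
      // already debited, overcharging the load balance throttle limit.
      type = "RECOVERY_BALANCE";
    }
    return buckets;
  }

  public static void main(String[] args) {
    System.out.println(charge(false)); // charge stuck in LOAD_BALANCE bucket
    System.out.println(charge(true));  // charge in RECOVERY_BALANCE bucket
  }
}
```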

Collaborator

chargePendingTransition: this would be for the already-sent pending transitions, right? Shouldn't be a problem; message throttling for new transitions happens at a later stage. CMIIW @proud-parselmouth

Collaborator Author

@thestreak101 Added the check in chargePendingTransition as well.

/**
 * Check if the message represents a top state downward transition (e.g., MASTER→SLAVE, LEADER→STANDBY).
 * Used when the cluster config enables treating such transitions as recovery rebalance.
 */
private boolean isTopStateDownwardTransition(StateModelDefinition stateModelDef, Message msg) {
Collaborator

Just to confirm: we only want to check whether this message is a top state downward state transition, but we don't need to verify that there is an associated SLAVE -> MASTER transition as well?

Collaborator Author

I don't think that is required; this PR only handles top state downward ST prioritization.

// less than the threshold. Otherwise, only allow downward-transition load balance
boolean onlyDownwardLoadBalance = numPartitionsWithErrorReplica > threshold;

boolean recoveryRebalanceForTopStateDownwardTransition =
Collaborator

I believe we need to make the change in IntermediateStateCalcStageV2 as well. The v2 behaviour is that it keeps the score high for leadership handoff but leaves the message type as load balance, so the throttle scope will differ between v1 and v2.

Collaborator Author

Added the logic in MessageThrottleProcessor.processMessagesWithThrottling, so IntermediateStateCalcStageV2 is handled that way.

/**
 * Check if the message represents a top state downward transition (e.g., MASTER→SLAVE, LEADER→STANDBY).
 * Used when the cluster config enables treating such transitions as recovery rebalance.
 */
private boolean isTopStateDownwardTransition(StateModelDefinition stateModelDef, Message msg) {
Collaborator

…V2 support

- Add shouldReclassifyForTopStateHandOff() to StateTransitionHelper as
  single source of truth for top-state downward transition reclassification
- Remove private isTopStateDownwardTransition() from IntermediateStateCalcStage
- Reuse existing isTopStateHandoff() instead of custom second-top-state check
- Drop redundant LOAD_BALANCE type guard (no-op when already RECOVERY_BALANCE)
- Apply reclassification in chargePendingTransition (V1) for correct quota
  accounting of in-flight messages
- Apply reclassification in MessageThrottleProcessor (V2) for both new
  messages and pending transitions

Made-with: Cursor
Collaborator

Add a TODO to add tests for this change when tests are written for this new class.

@proud-parselmouth proud-parselmouth merged commit 958f00f into dev Mar 23, 2026
1 of 2 checks passed