
Add recovery rebalance support for top state downward transitions#134

Merged
proud-parselmouth merged 2 commits into dev from anubagar/downward_st_prioritzation
Mar 23, 2026

Conversation

@proud-parselmouth
Collaborator

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

Presently, any top state downward transition message (MASTER->SLAVE or LEADER->STANDBY) gets evaluated as a load balance message. When determining the message type, Helix checks how many current participants hold the second-top state (e.g., SLAVE); if that count is >= the min active required number (almost always 1), it treats the top state downward transition as load balance, since it increases the number of replicas in that state. This causes a problem when the top state urgently needs to be transferred but load balance messages are throttled by ST throttle configs.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Added a new cluster config ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION to handle such cases. When enabled, the IntermediateStateCalcStage reclassifies any top state downward ST message (e.g., MASTER->SLAVE, LEADER->STANDBY) as RECOVERY_REBALANCE instead of LOAD_BALANCE, ensuring these transitions are not throttled.

Changes:

  • ClusterConfig.java: Added new ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION config property with getter/setter
  • IntermediateStateCalcStage.java: Added isTopStateDownwardTransition() method that checks if a message transitions from the top state to the second-top state. When the config is enabled and a message is classified as LOAD_BALANCE but is a top state downward transition, it is reclassified as RECOVERY_BALANCE
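The reclassification described in the Changes list can be sketched as follows. This is a minimal, self-contained sketch: the state priority list and the message's from/to states are modeled as plain strings, standing in for Helix's StateModelDefinition and Message types, so the class and method names here are illustrative, not the actual implementation.

```java
import java.util.Arrays;
import java.util.List;

public class TopStateDownwardCheck {

  // In Helix, a state model's states priority list is ordered from the top
  // state downward, e.g. [MASTER, SLAVE, OFFLINE] for MasterSlave.
  static final List<String> STATES_PRIORITY_LIST =
      Arrays.asList("MASTER", "SLAVE", "OFFLINE");

  // True when the message moves a replica from the top state to the
  // second-top state (e.g., MASTER->SLAVE, LEADER->STANDBY).
  static boolean isTopStateDownwardTransition(String fromState, String toState) {
    if (STATES_PRIORITY_LIST.size() < 2) {
      return false;
    }
    return STATES_PRIORITY_LIST.get(0).equals(fromState)
        && STATES_PRIORITY_LIST.get(1).equals(toState);
  }

  // Reclassify a LOAD_BALANCE message as RECOVERY_BALANCE when the config
  // flag is enabled and the message is a top state downward transition.
  static String classify(boolean configEnabled, String rebalanceType,
      String fromState, String toState) {
    if (configEnabled && "LOAD_BALANCE".equals(rebalanceType)
        && isTopStateDownwardTransition(fromState, toState)) {
      return "RECOVERY_BALANCE";
    }
    return rebalanceType;
  }

  public static void main(String[] args) {
    // MASTER->SLAVE with the config on: reclassified as RECOVERY_BALANCE.
    System.out.println(classify(true, "LOAD_BALANCE", "MASTER", "SLAVE"));
    // Same message with the config off: stays LOAD_BALANCE.
    System.out.println(classify(false, "LOAD_BALANCE", "MASTER", "SLAVE"));
    // SLAVE->OFFLINE is not a top state downward transition: stays LOAD_BALANCE.
    System.out.println(classify(true, "LOAD_BALANCE", "SLAVE", "OFFLINE"));
  }
}
```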

Tests

  • The following tests are written for this issue:
  1. TestIntermediateStateCalcStage.testEnableRecoveryRebalanceForTopStateDownwardStateTransition — Verifies MASTER->SLAVE is throttled as LOAD_BALANCE without config, and passes as RECOVERY_REBALANCE with config enabled
  2. TestIntermediateStateCalcStage.testNonTopStateDownwardTransitionNotReclassifiedAsRecovery — Verifies SLAVE->OFFLINE is NOT reclassified even with config enabled (negative test)
  3. TestIntermediateStateCalcStage.testRecoveryRebalanceForTopStateDownwardWithLeaderStandby — Verifies LEADER->STANDBY is correctly reclassified with LeaderStandby state model

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

Add ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION config
to classify top state downward transitions (e.g., MASTER->SLAVE,
LEADER->STANDBY) as RECOVERY_REBALANCE instead of LOAD_BALANCE,
preventing them from being throttled during urgent top-state transfers.

Made-with: Cursor
proud-parselmouth force-pushed the anubagar/downward_st_prioritzation branch from 0f7125e to 4b72540 on March 12, 2026 15:13
if (recoveryRebalanceForTopStateDownwardTransition
    && rebalanceType.equals(RebalanceType.LOAD_BALANCE)
    && isTopStateDownwardTransition(stateModelDef, message)) {
  rebalanceType = RebalanceType.RECOVERY_BALANCE;
}
Collaborator

We are updating the rebalanceType here but there is also quota counting before this stage which will charge this towards load_balance even though we are changing it later to recovery rebalance. IIUC, this could result in overcharging for load balance throttle limits.
chargePendingTransition(resource, currentStateOutput, throttleController, ...)
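The concern above can be illustrated with a toy quota counter (hypothetical names; this is not Helix's actual throttle API): if a pending transition is charged to a quota bucket before the message is reclassified, the LOAD_BALANCE bucket absorbs a charge that should have gone to RECOVERY_BALANCE.

```java
import java.util.HashMap;
import java.util.Map;

public class QuotaChargeOrder {

  // Simulate charging one top state downward pending transition, either
  // reclassifying it before or after the chargePendingTransition-style step.
  static Map<String, Integer> charge(boolean reclassifyBeforeCharging) {
    Map<String, Integer> buckets = new HashMap<>();
    String type = "LOAD_BALANCE";
    if (reclassifyBeforeCharging) {
      // Reclassified first, so the charge lands in the right bucket.
      type = "RECOVERY_BALANCE";
    }
    buckets.merge(type, 1, Integer::sum); // the quota-charging step
    if (!reclassifyBeforeCharging) {
      // Reclassified only after charging: the LOAD_BALANCE bucket was
      // already debited, overcharging the load balance throttle limit.
      type = "RECOVERY_BALANCE";
    }
    return buckets;
  }

  public static void main(String[] args) {
    System.out.println(charge(false)); // charge stuck in LOAD_BALANCE bucket
    System.out.println(charge(true));  // charge in RECOVERY_BALANCE bucket
  }
}
```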

Collaborator

chargePendingTransition: this would be for the already-sent pending transitions, right? Shouldn't be a problem; message throttling for new transitions happens at a later stage. CMIIW @proud-parselmouth

Collaborator Author

@thestreak101 Added the check in chargePendingTransition as well.

/**
 * Check if the message represents a top state downward transition (e.g., MASTER→SLAVE, LEADER→STANDBY).
 * Used when the cluster config enables treating such transitions as recovery rebalance.
 */
private boolean isTopStateDownwardTransition(StateModelDefinition stateModelDef, Message msg) {
Collaborator

Just to confirm: we only want to check whether this message is a top state downward state transition, but we don't need to verify that there is an associated SLAVE -> MASTER transition as well?

Collaborator Author

I don't think that is required; this PR only handles top state downward ST prioritization.

// less than the threshold. Otherwise, only allow downward-transition load balance
boolean onlyDownwardLoadBalance = numPartitionsWithErrorReplica > threshold;

boolean recoveryRebalanceForTopStateDownwardTransition =
Collaborator

I believe we need to make the change in IntermediateStateCalcStageV2 as well. The v2 behaviour is that it keeps the score high for leadership handoff but leaves the message type as load balance, so the throttle scope will differ between v1 and v2.

Collaborator Author

Added the logic in MessageThrottleProcessor.processMessagesWithThrottling, so IntermediateStateCalcStageV2 is handled that way.

/**
 * Check if the message represents a top state downward transition (e.g., MASTER→SLAVE, LEADER→STANDBY).
 * Used when the cluster config enables treating such transitions as recovery rebalance.
 */
private boolean isTopStateDownwardTransition(StateModelDefinition stateModelDef, Message msg) {
Collaborator

…V2 support

- Add shouldReclassifyForTopStateHandOff() to StateTransitionHelper as
  single source of truth for top-state downward transition reclassification
- Remove private isTopStateDownwardTransition() from IntermediateStateCalcStage
- Reuse existing isTopStateHandoff() instead of custom second-top-state check
- Drop redundant LOAD_BALANCE type guard (no-op when already RECOVERY_BALANCE)
- Apply reclassification in chargePendingTransition (V1) for correct quota
  accounting of in-flight messages
- Apply reclassification in MessageThrottleProcessor (V2) for both new
  messages and pending transitions

Made-with: Cursor
Collaborator

Add a TODO to add tests for this change when tests are written for this new class.

@proud-parselmouth proud-parselmouth merged commit 958f00f into dev Mar 23, 2026
1 of 2 checks passed