Add recovery rebalance support for top state downward transitions#134
Add recovery rebalance support for top state downward transitions#134proud-parselmouth merged 2 commits intodevfrom
Conversation
Add ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION config to classify top state downward transitions (e.g., MASTER->SLAVE, LEADER->STANDBY) as RECOVERY_REBALANCE instead of LOAD_BALANCE, preventing them from being throttled during urgent top-state transfers. Made-with: Cursor
0f7125e to
4b72540
Compare
| if (recoveryRebalanceForTopStateDownwardTransition | ||
| && rebalanceType.equals(RebalanceType.LOAD_BALANCE) | ||
| && isTopStateDownwardTransition(stateModelDef, message)) { | ||
| rebalanceType = RebalanceType.RECOVERY_BALANCE; |
There was a problem hiding this comment.
We are updating the rebalanceType here but there is also quota counting before this stage which will charge this towards load_balance even though we are changing it later to recovery rebalance. IIUC, this could result in overcharging for load balance throttle limits.
chargePendingTransition(resource, currentStateOutput, throttleController, ...)
There was a problem hiding this comment.
chargePendingTransition - This would be for the already sent pending transitions, right? Shouldn't be a problem. message throttle for new ones is at later stage. CMIIW @proud-parselmouth
There was a problem hiding this comment.
@thestreak101 added the check in the chargePendingTransition also.
| * Check if the message represents a top state downward transition (e.g., MASTER→SLAVE, LEADER→STANDBY). | ||
| * Used when the cluster config enables treating such transitions as recovery rebalance. | ||
| */ | ||
| private boolean isTopStateDownwardTransition(StateModelDefinition stateModelDef, Message msg) { |
There was a problem hiding this comment.
Just to confirm, We only want to check if this message is an top state downward state transition but we don't need to verify if there is another associated SLAVE -> MASTER transition as well?
There was a problem hiding this comment.
I don't think that is required. this pr only checks for top state downward st prioritaztion.
| // less than the threshold. Otherwise, only allow downward-transition load balance | ||
| boolean onlyDownwardLoadBalance = numPartitionsWithErrorReplica > threshold; | ||
|
|
||
| boolean recoveryRebalanceForTopStateDownwardTransition = |
There was a problem hiding this comment.
I believe we need to do the change in IntermediateStateCalcStageV2 as well. The behaviour for v2 is that - it keeps the score high for leadership handoff but keeps the message type as load balance so throttle scope will be different for both v1 and v2
There was a problem hiding this comment.
Added the logic in MessageThrottleProcessor. processMessagesWithThrottling, so IntermediateStateCalcStageV2 is handled that way.
| * Check if the message represents a top state downward transition (e.g., MASTER→SLAVE, LEADER→STANDBY). | ||
| * Used when the cluster config enables treating such transitions as recovery rebalance. | ||
| */ | ||
| private boolean isTopStateDownwardTransition(StateModelDefinition stateModelDef, Message msg) { |
There was a problem hiding this comment.
Can you see if we can utilise existing isTopStateHandoff method https://github.com/linkedin/helix/blob/4b72540121de1a77aaed990ac5eefe84d85c498e/helix-core/src/main/java/org/apache/helix/controller/stages/StateTransitionHelper.java#L62C1-L69C1
…V2 support - Add shouldReclassifyForTopStateHandOff() to StateTransitionHelper as single source of truth for top-state downward transition reclassification - Remove private isTopStateDownwardTransition() from IntermediateStateCalcStage - Reuse existing isTopStateHandoff() instead of custom second-top-state check - Drop redundant LOAD_BALANCE type guard (no-op when already RECOVERY_BALANCE) - Apply reclassification in chargePendingTransition (V1) for correct quota accounting of in-flight messages - Apply reclassification in MessageThrottleProcessor (V2) for both new messages and pending transitions Made-with: Cursor
There was a problem hiding this comment.
Add a todo to add tests for this change when adding tests for this new class.
Issues
Presently any top state downward transition message from MASTER->SLAVE or LEADER->STANDBY gets evaluated as a load balance message. This is because when determining message type, Helix checks how many present participants are in the second-top state (e.g., SLAVE) and if that count is >= min active required number (almost always 1), it treats the top state downward transition message as load balance since it is increasing the number of replicas in that state. This causes an issue when it is urgently required to transfer the top state and the load balance messages are throttled due to ST throttle configs.
Description
Added a new cluster config
ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITIONto handle such cases. When enabled, the IntermediateStateCalcStage reclassifies any top state downward ST message (e.g., MASTER->SLAVE, LEADER->STANDBY) asRECOVERY_REBALANCEinstead ofLOAD_BALANCE, ensuring these transitions are not throttled.Changes:
ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITIONconfig property with getter/setterisTopStateDownwardTransition()method that checks if a message transitions from the top state to the second-top state. When the config is enabled and a message is classified as LOAD_BALANCE but is a top state downward transition, it is reclassified as RECOVERY_BALANCETests
TestIntermediateStateCalcStage.testEnableRecoveryRebalanceForTopStateDownwardStateTransition— Verifies MASTER->SLAVE is throttled as LOAD_BALANCE without config, and passes as RECOVERY_REBALANCE with config enabledTestIntermediateStateCalcStage.testNonTopStateDownwardTransitionNotReclassifiedAsRecovery— Verifies SLAVE->OFFLINE is NOT reclassified even with config enabled (negative test)TestIntermediateStateCalcStage.testRecoveryRebalanceForTopStateDownwardWithLeaderStandby— Verifies LEADER->STANDBY is correctly reclassified with LeaderStandby state modelChanges that Break Backward Compatibility (Optional)
(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
Documentation (Optional)
(Link the GitHub wiki you added)
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)