Handle ZkInterruptedException with write verification in IdealStateGroupCommit #16900
Conversation
…ruptedException Signed-off-by: Alex Maniates <[email protected]>
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##            master    #16900    +/-  ##
==========================================
  Coverage    63.50%    63.51%
  Complexity    1412      1412
==========================================
  Files         3068      3068
  Lines       180255    180269    +14
  Branches     27583     27586     +3
==========================================
+ Hits        114479    114494    +15
- Misses       56956     56961     +5
+ Partials      8820      8814     -6
Pull Request Overview
This PR adds improved error handling for ZooKeeper interruptions during IdealState updates in the Pinot controller. The solution implements a write verification mechanism to handle race conditions where a ZkInterruptedException occurs but the write may have actually succeeded.
- Adds handling for ZkInterruptedException with verification logic
- Implements dual verification: version advancement check and content equality check
- Provides graceful recovery from transient ZooKeeper interruptions
pinot-common/src/main/java/org/apache/pinot/common/utils/helix/IdealStateGroupCommit.java
  return false;
} else {
  LOGGER.info("IdealState was written successfully after interrupt for resource: {}", resourceName);
  idealStateWrapper._idealState = updatedIdealState;
Copilot AI · Sep 25, 2025
The wrapper should be updated with the writtenIdealState that was successfully read back from ZooKeeper, not the updatedIdealState that we attempted to write. This ensures the wrapper reflects the actual persisted state and its version number.
Suggested change:
-  idealStateWrapper._idealState = updatedIdealState;
+  idealStateWrapper._idealState = writtenIdealState;
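In context, the suggested assignment with an explanatory comment might look like this (a sketch; the comment wording is ours, not from the PR):

```java
// Store the IdealState read back from ZooKeeper rather than the copy we attempted to write,
// so the wrapper reflects the persisted content and its actual ZNode version.
idealStateWrapper._idealState = writtenIdealState;
```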
Is the interruption caused by the controller disconnecting from the cluster (during shutdown)? If so, do we still have access to ZK after disconnection?
cc @xiangfu0 to also review
  LOGGER.warn("Version changed while updating ideal state for resource: {}", resourceName);
  return false;
} catch (ZkInterruptedException e) {
  LOGGER.warn("Caught ZkInterruptedException while updating resource: {}, verifying...",
Please add a comment explaining why we are doing this.
} catch (ZkInterruptedException e) {
  LOGGER.warn("Caught ZkInterruptedException while updating resource: {}, verifying...",
      resourceName);
  IdealState writtenIdealState = dataAccessor.getProperty(idealStateKey);
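Putting the thread's suggestions together, the verification could read roughly as below. This is a hedged sketch rather than the PR's actual diff: `originalVersion` and the exact check expressions are assumptions for illustration, while the logger messages and the `writtenIdealState` lookup come from the snippets quoted above.

```java
} catch (ZkInterruptedException e) {
  // Why we verify: the ZK client can be interrupted after the write request has already reached
  // ZooKeeper, so the update may have persisted even though the call threw. Read the IdealState
  // back and confirm before deciding whether this attempt failed.
  LOGGER.warn("Caught ZkInterruptedException while updating resource: {}, verifying...",
      resourceName);
  IdealState writtenIdealState = dataAccessor.getProperty(idealStateKey);
  boolean versionAdvanced = writtenIdealState != null
      && writtenIdealState.getRecord().getVersion() > originalVersion;         // version advancement check
  boolean contentMatches = writtenIdealState != null
      && writtenIdealState.getRecord().equals(updatedIdealState.getRecord());  // content equality check
  if (!versionAdvanced || !contentMatches) {
    LOGGER.warn("IdealState write could not be verified after interrupt for resource: {}", resourceName);
    return false;
  } else {
    LOGGER.info("IdealState was written successfully after interrupt for resource: {}", resourceName);
    idealStateWrapper._idealState = writtenIdealState;
    return true;
  }
}
```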
Should we wrap another try-catch around this handling?
I feel the interruption is mostly triggered when the controller disconnects from the cluster, in which case we probably can no longer access ZK. Can you verify that?
@Jackie-Jiang you are totally right. Upon further investigation, it looks like this is happening during controller shutdown, and the ZkInterruptedException is quite intentional on the client side when the Helix manager is asked to disconnect: https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java#L1035 (see also the ZkHelixManager source). I need to go back and think about how to solve this from that point of view, since we won't be able to read from ZK (or delete, for that matter) once the client is disconnected, and it won't reconnect because the controller is in shutdown mode. The solution may need a more graceful shutdown to address this timeline:
Perhaps we can attempt to let remaining updates go through with some fixed timeout.
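As a rough illustration of that idea (not part of the PR; the executor, timeout constant, and method names are hypothetical), a shutdown path could drain pending updates for a bounded time before disconnecting:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.helix.HelixManager;

public class GracefulIdealStateShutdownSketch {
  // Hypothetical bound on how long we wait for in-flight IdealState updates during shutdown.
  private static final long SHUTDOWN_DRAIN_TIMEOUT_MS = 30_000L;

  void shutDownGracefully(ExecutorService idealStateUpdateExecutor, HelixManager helixManager)
      throws InterruptedException {
    // Stop accepting new IdealState updates, but let queued ones run.
    idealStateUpdateExecutor.shutdown();
    if (!idealStateUpdateExecutor.awaitTermination(SHUTDOWN_DRAIN_TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
      // Updates that did not finish in time are interrupted and would need to be retried later.
      idealStateUpdateExecutor.shutdownNow();
    }
    // Disconnect from Helix/ZooKeeper only after pending writes have had a chance to complete,
    // so they are not cut off mid-flight with a ZkInterruptedException.
    helixManager.disconnect();
  }
}
```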
I still feel the root cause is Helix not giving a clear abstraction on the ZK update call. It should either succeed, or fail without changing the record.
Potential solution for #16866
When a ZkInterruptedException occurs during IdealState updates, we now verify whether the write actually succeeded by:
1. Checking whether the IdealState's ZNode version advanced past the version we read before writing (version advancement check)
2. Checking whether the IdealState read back from ZooKeeper matches the content we attempted to write (content equality check)
Why both checks are necessary
There are some tradeoffs here: we accept some false negatives on the equality check, since the IdealState could be updated outside of this code block (i.e., if some other thread or process updates the ideal state, the equality check may fail and we will retry). I think this is an acceptable tradeoff, especially since this exception shouldn't occur very often. In exchange, we never incorrectly report success when our specific write didn't persist in the final state.
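From the caller's point of view, a false negative just looks like a failed attempt, so the normal retry path handles it. A minimal sketch, assuming a hypothetical updateIdealState helper that returns false when verification fails (the real code path in IdealStateGroupCommit/HelixHelper may differ):

```java
// Hypothetical retry loop: a false negative (our write persisted but was then overwritten by
// another updater) simply triggers another attempt against the latest IdealState, which is safe.
int maxAttempts = 5;  // assumed value
boolean updated = false;
for (int attempt = 0; attempt < maxAttempts && !updated; attempt++) {
  updated = updateIdealState(dataAccessor, resourceName, updater, idealStateWrapper);
}
if (!updated) {
  throw new RuntimeException("Failed to update IdealState for resource: " + resourceName);
}
```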