-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-25749 Improved logging when interrupting active RPC handlers ho… #6789
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
@apurtell Appreciate your thoughts on this PR. |
This comment has been minimized.
This comment has been minimized.
@@ -8704,6 +8700,8 @@ private void interruptRegionOperations() { | |||
// eligible for interrupt; if so, we should interrupt it. | |||
if (entry.getValue().booleanValue()) { | |||
entry.getKey().interrupt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometime even if you have interuppted the thread it will still stay active and eventually after wait timeout RS will get aborted. Can you try to print the stack trace of such threads as well.
We can do that when we have waited 80%(might be different) of the wait time and there are still some Interruptable thread alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Umeshkumar9414 Please check now. Added the stack trace for thread which are not interruptable.
Most of the failures seems to be in |
This test is failing even in another PR. Example https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6790/1/artifact/yetus-jdk17-hadoop3-check/output/patch-unit-hbase-server.txt |
6515ef7
to
2b0a7da
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…lding the region close lock We should add the thread to regionlockholders map to make sure we have can track the threads holding a lock even if it is not interuuptable. When calling interrupt we will not interrupt such threads but will print warn with thread name to help with debugging
2b0a7da
to
f397308
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@virajjasani If you could give it a look. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement, left one nit, +1 otherwise
if (entry.getValue().booleanValue()) { | ||
entry.getKey().interrupt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we keep this change as is, and log whether the thread is interruptible as part of the common log above:
// print all thread stacks which are still holding locks and are the cause of RS abort
for (Map.Entry<Thread, Boolean> rslocks : regionLockHolders.entrySet()) {
LOG.warn("Thread still holding lock: {} , interruptible: {} , Stack trace: {}",
rslocks.getKey(), rslocks.getValue(), Threads.printStackTrace(rslocks.getKey()));
}
This will help make logs appear at common place and make it more easier to debug.
…lding the region close lock
We should add the thread to regionlockholders map to make sure we can track the threads holding a lock even if it is not interruptible. When calling interrupt we will not interrupt such threads but will print warn with thread name to help with debugging