HBASE-25749 Improved logging when interrupting active RPC handlers ho… #6789

Open
wants to merge 1 commit into
base: master

mnpoonia
Contributor

…lding the region close lock

We should add the thread to the regionLockHolders map so that we can track threads holding the lock even when they are not interruptible. When interrupting region operations we will not interrupt such threads, but we will log a warning with the thread name to help with debugging.
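
The idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual HRegion code: the class name, the `trackLockHolder`/`untrackLockHolder` helpers, and the warning text are hypothetical, and `System.err` stands in for the real logger. Only the `regionLockHolders` map and the `interruptRegionOperations` method name come from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the tracking idea; not the actual HRegion code. */
class RegionLockHolderSketch {
  // Thread -> true if the thread may be interrupted while it holds the lock.
  private final Map<Thread, Boolean> regionLockHolders = new ConcurrentHashMap<>();

  /** Registers every lock holder, interruptible or not, so it stays visible. */
  void trackLockHolder(Thread holder, boolean interruptible) {
    regionLockHolders.put(holder, interruptible);
  }

  /** Drops the holder once it releases the lock. */
  void untrackLockHolder(Thread holder) {
    regionLockHolders.remove(holder);
  }

  /**
   * Interrupts eligible holders and warns about the rest.
   * Returns the names of the holders that were left alone.
   */
  List<String> interruptRegionOperations() {
    List<String> skipped = new ArrayList<>();
    for (Map.Entry<Thread, Boolean> entry : regionLockHolders.entrySet()) {
      Thread holder = entry.getKey();
      if (entry.getValue()) {
        holder.interrupt();
      } else {
        // Stand-in for LOG.warn(...); the message text is an assumption.
        System.err.println("WARN: not interrupting lock holder " + holder.getName());
        skipped.add(holder.getName());
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    RegionLockHolderSketch region = new RegionLockHolderSketch();
    Thread pinned = new Thread(() -> { }, "non-interruptible-handler");
    region.trackLockHolder(Thread.currentThread(), true);
    region.trackLockHolder(pinned, false);

    List<String> skipped = region.interruptRegionOperations();
    System.out.println("skipped: " + skipped);
    // Thread.interrupted() also clears the flag set on the current thread.
    System.out.println("current thread interrupted: " + Thread.interrupted());
  }
}
```

Keeping non-interruptible threads in the same map, with a `false` value, means a single pass over the map can both interrupt the eligible holders and name the ones that are blocking the close.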

@mnpoonia
Contributor Author

mnpoonia commented Mar 13, 2025

@apurtell Appreciate your thoughts on this PR.

@@ -8704,6 +8700,8 @@ private void interruptRegionOperations() {
// eligible for interrupt; if so, we should interrupt it.
if (entry.getValue().booleanValue()) {
entry.getKey().interrupt();
Contributor

Sometimes, even after you have interrupted a thread, it stays active, and after the wait timeout the RS eventually gets aborted. Can you try to print the stack traces of such threads as well?

We can do that once we have waited 80% of the wait time (the exact fraction might differ) and some interruptible threads are still alive.
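
One possible shape for this suggestion is sketched below. It is an illustration under assumptions, not code from the patch: the class and method names, the integer percentage check, and the example timeout values are all hypothetical, and the stack is rendered with the standard `Thread#getStackTrace` rather than any HBase utility.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; names and the percentage check are assumptions. */
class LockHolderStackDumpSketch {

  /** True once at least `percent` of the total wait time has elapsed. */
  static boolean pastThreshold(long elapsedMs, long totalWaitMs, int percent) {
    return elapsedMs * 100 >= totalWaitMs * percent;
  }

  /** Renders one holder's current stack using Thread#getStackTrace. */
  static String formatStack(Thread holder) {
    StringBuilder sb = new StringBuilder("Thread still holding lock: ")
      .append(holder.getName());
    for (StackTraceElement frame : holder.getStackTrace()) {
      sb.append(System.lineSeparator()).append("    at ").append(frame);
    }
    return sb.toString();
  }

  /** Prints every holder that is still alive; returns how many were printed. */
  static int dumpLiveHolders(Map<Thread, Boolean> holders) {
    int dumped = 0;
    for (Map.Entry<Thread, Boolean> entry : holders.entrySet()) {
      Thread holder = entry.getKey();
      if (holder.isAlive()) {
        System.err.println(formatStack(holder) + System.lineSeparator()
          + "    interruptible: " + entry.getValue());
        dumped++;
      }
    }
    return dumped;
  }

  public static void main(String[] args) {
    Map<Thread, Boolean> holders = new ConcurrentHashMap<>();
    holders.put(Thread.currentThread(), true);         // alive, so it is dumped
    holders.put(new Thread(() -> { }, "idle"), false); // never started, skipped

    long totalWaitMs = 60_000; // assumed close timeout
    long elapsedMs = 48_000;   // 80% of the assumed timeout
    if (pastThreshold(elapsedMs, totalWaitMs, 80)) {
      System.out.println("dumped " + dumpLiveHolders(holders) + " holder(s)");
    }
  }
}
```

Dumping only once the threshold is crossed keeps the logs quiet in the common case where an interrupt releases the lock promptly.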

Contributor Author

@mnpoonia mnpoonia Mar 18, 2025

@Umeshkumar9414 Please check now. Added the stack trace for threads that are not interruptible.

@mnpoonia
Contributor Author

mnpoonia commented Mar 18, 2025

Most of the failures seem to be in TestNettyTLSIPCFileWatcher, which looks unrelated. I will check other builds to see whether anything similar is happening there.

@mnpoonia mnpoonia force-pushed the local_HBASE-25749 branch 2 times, most recently from 6515ef7 to 2b0a7da Compare March 18, 2025 06:40

…lding the region close lock

We should add the thread to the regionLockHolders map so that we can track threads holding the lock even when they are not interruptible. When interrupting region operations we will not interrupt such threads, but we will log a warning with the thread name to help with debugging.
@mnpoonia mnpoonia force-pushed the local_HBASE-25749 branch from 2b0a7da to f397308 Compare March 18, 2025 16:08
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 1m 1s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 4m 25s | | master passed |
| +1 💚 | compile | 3m 44s | | master passed |
| +1 💚 | checkstyle | 0m 41s | | master passed |
| +1 💚 | spotbugs | 1m 59s | | master passed |
| +1 💚 | spotless | 1m 8s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 1s | | the patch passed |
| +1 💚 | compile | 3m 33s | | the patch passed |
| +1 💚 | javac | 3m 33s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 43s | | the patch passed |
| +1 💚 | spotbugs | 1m 51s | | the patch passed |
| +1 💚 | hadoopcheck | 12m 52s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 1m 5s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 19s | | The patch does not generate ASF License warnings. |
| | | 45m 16s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6789/4/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6789 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 0e76f7a9037c 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / f397308 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6789/4/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 35s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 4m 10s | | master passed |
| +1 💚 | compile | 1m 11s | | master passed |
| +1 💚 | javadoc | 0m 36s | | master passed |
| +1 💚 | shadedjars | 7m 25s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 26s | | the patch passed |
| +1 💚 | compile | 1m 10s | | the patch passed |
| +1 💚 | javac | 1m 10s | | the patch passed |
| +1 💚 | javadoc | 0m 40s | | the patch passed |
| +1 💚 | shadedjars | 6m 44s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 400m 19s | /patch-unit-hbase-server.txt | hbase-server in the patch failed. |
| | | 437m 16s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6789/4/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6789 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 6b7e7be91ac1 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / f397308 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6789/4/testReport/ |
| Max. process+thread count | 5109 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6789/4/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@mnpoonia
Contributor Author

@virajjasani If you could give it a look. Thanks

@virajjasani virajjasani self-requested a review March 19, 2025 16:06
Contributor

@Umeshkumar9414 Umeshkumar9414 left a comment

LGTM

Contributor

@virajjasani virajjasani left a comment

Nice improvement, left one nit, +1 otherwise

if (entry.getValue().booleanValue()) {
entry.getKey().interrupt();
Contributor

How about we keep this change as is, and log whether the thread is interruptible as part of the common log above:

          // print all thread stacks which are still holding locks and are the cause of RS abort
          for (Map.Entry<Thread, Boolean> rslocks : regionLockHolders.entrySet()) {
            LOG.warn("Thread still holding lock: {} , interruptible: {} , Stack trace: {}",
              rslocks.getKey(), rslocks.getValue(), Threads.printStackTrace(rslocks.getKey()));
          }

This will help the logs appear in one common place and make debugging easier.
