HBASE-29245 Region reopening batch size should be increased when backoff is 0 #6892

Open: wants to merge 1 commit into base: master

junegunn
Contributor

@junegunn junegunn commented Apr 8, 2025

No description provided.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 42s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 4m 48s | | master passed |
| +1 💚 | compile | 4m 10s | | master passed |
| +1 💚 | checkstyle | 0m 54s | | master passed |
| +1 💚 | spotbugs | 1m 58s | | master passed |
| +1 💚 | spotless | 1m 2s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 4s | | the patch passed |
| +1 💚 | compile | 3m 40s | | the patch passed |
| +1 💚 | javac | 3m 40s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 58s | | the patch passed |
| +1 💚 | spotbugs | 2m 27s | | the patch passed |
| +1 💚 | hadoopcheck | 14m 0s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 1m 0s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 17s | | The patch does not generate ASF License warnings. |
| | | 49m 44s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6892/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6892 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux ea72d147db4a 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 6cdd226 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 84 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6892/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 28s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 2s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 14s | | master passed |
| +1 💚 | compile | 0m 57s | | master passed |
| +1 💚 | javadoc | 0m 28s | | master passed |
| +1 💚 | shadedjars | 5m 53s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 13s | | the patch passed |
| +1 💚 | compile | 0m 57s | | the patch passed |
| +1 💚 | javac | 0m 57s | | the patch passed |
| +1 💚 | javadoc | 0m 27s | | the patch passed |
| +1 💚 | shadedjars | 5m 52s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 222m 23s | | hbase-server in the patch passed. |
| | | 248m 29s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6892/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6892 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 417b24f80060 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 6cdd226 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6892/1/testReport/ |
| Max. process+thread count | 4805 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6892/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

Contributor

@rmdmattingly rmdmattingly left a comment

Nice catch. Back in the day, I think my intention was that you'd never configure batching without also configuring a backoff, but that wasn't a good idea because it makes this hard to configure.

FWIW I do wonder if there's no significant value in configuring your clusters this way because it will work basically the same as an unbatched table modification (regions will be reopened as quickly as the HMaster can process their reopen procedures). So maybe it's worth adding a warning log recommending that the operator raise hbase.reopen.table.regions.progressive.batch.backoff.ms?

@junegunn
Contributor Author

junegunn commented Apr 8, 2025

Thanks for the review.

> FWIW I do wonder if there's no significant value in configuring your clusters this way because it will work basically the same as an unbatched table modification (regions will be reopened as quickly as the HMaster can process their reopen procedures).

We expect two benefits from using the option, even without backoff.

  1. Reduce the number of regions that are temporarily unavailable at any given point during an alter operation (determined by the number of region servers and their hbase.regionserver.executor.closeregion.threads), to achieve better region availability and less overall service impact. For example, we can set the option to something like 16 to ensure that at most 16 regions are unavailable at any given point. This helps minimize service disruption for a latency-sensitive application.

    • image
      • CLOSING is when the region is marked CLOSING on hbase:meta
      • REJECT is when the region actually becomes unavailable (client starts getting NotServingRegionException and retries)
    • image
      • This plots the number of regions between REJECT and OPEN at a given point in time
  2. Protect the table from a faulty alter operation, as pointed out in HBASE-29136, because only one region is affected.
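
To illustrate the first benefit, here is a minimal sketch of a progressive batch schedule. It assumes that the batch size starts at one region and doubles after each batch until it reaches the configured maximum; the class `BatchSchedule`, the method `sizes`, and the parameters `totalRegions` and `maxBatchSize` are hypothetical names chosen for this illustration, not taken from the HBase source.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSchedule {
  // Hypothetical model: returns the size of each reopen batch, assuming the
  // batch size starts at 1 and doubles after every batch up to maxBatchSize.
  static List<Integer> sizes(int totalRegions, int maxBatchSize) {
    if (maxBatchSize < 1) {
      throw new IllegalArgumentException("maxBatchSize must be at least 1");
    }
    List<Integer> batches = new ArrayList<>();
    int batchSize = 1;
    int remaining = totalRegions;
    while (remaining > 0) {
      // The last batch may be smaller than the current batch size.
      int current = Math.min(batchSize, remaining);
      batches.add(current);
      remaining -= current;
      // Double the batch size, capped at the maximum (long math avoids overflow).
      batchSize = (int) Math.min((long) maxBatchSize, 2L * batchSize);
    }
    return batches;
  }

  public static void main(String[] args) {
    // prints [1, 2, 4, 8, 16, 16, 16, 16, 16, 5]
    System.out.println(sizes(100, 16));
  }
}
```

Under these assumptions, no batch exceeds 16 regions for a 100-region table, and the very first batch touches a single region, which is what limits the damage of a faulty alter.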

@rmdmattingly
Contributor

Nice, I suppose there's enough latency baked into awaiting the slowest of each batch & issuing the next batch to really limit the disruption at any point in time. Great stuff 🚀

@Apache9
Contributor

Apache9 commented Apr 20, 2025

@rmdmattingly Can we merge this now?

@junegunn
Contributor Author

@Apache9 @rmdmattingly I think it's okay to merge. It's a relatively simple patch. I extended the existing test cases with more assertions and I also tested it manually to confirm that it works as expected with and without backoff.

No backoff (default)

image

image

2-second backoff

image

image

@junegunn
Contributor Author

Please take a look at #6951 as well. Appreciate it!

@junegunn
Contributor Author

junegunn commented May 29, 2025

@Apache9 @rmdmattingly Sorry to ping again, but is there anything else I can do to help move this and #6951 forward? We can live without this one because we can set the backoff value to something like 1ms, but we're particularly interested in getting #6951 into branch-2 so we can avoid maintaining it in our internal fork. I believe supporting multiple throttling configurations without requiring a master server restart makes the feature much more practical.
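
The 1ms workaround mentioned above can be pictured with a rough model. This sketch assumes, as the PR title suggests, that before the patch the batch size grows only when a nonzero backoff is configured; the class `BackoffWorkaround`, the method `batchSizes`, and the flag `growOnlyWithBackoff` are hypothetical and do not mirror the actual HBase code.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffWorkaround {
  // Hypothetical model of the batch schedule.
  // growOnlyWithBackoff = true mimics the assumed pre-patch behavior, where
  // the batch size is increased only if backoffMs is greater than zero.
  static List<Integer> batchSizes(int totalRegions, int maxBatchSize,
      long backoffMs, boolean growOnlyWithBackoff) {
    if (maxBatchSize < 1) {
      throw new IllegalArgumentException("maxBatchSize must be at least 1");
    }
    List<Integer> batches = new ArrayList<>();
    int batchSize = 1;
    int remaining = totalRegions;
    while (remaining > 0) {
      int current = Math.min(batchSize, remaining);
      batches.add(current);
      remaining -= current;
      boolean grow = !growOnlyWithBackoff || backoffMs > 0;
      if (grow) {
        batchSize = (int) Math.min((long) maxBatchSize, 2L * batchSize);
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    // Assumed pre-patch, no backoff: the batch size never grows.
    System.out.println(batchSizes(8, 4, 0, true));  // prints [1, 1, 1, 1, 1, 1, 1, 1]
    // Assumed pre-patch, 1ms backoff (the workaround): the batch size grows.
    System.out.println(batchSizes(8, 4, 1, true));  // prints [1, 2, 4, 1]
    // Assumed post-patch, no backoff: same schedule as the workaround.
    System.out.println(batchSizes(8, 4, 0, false)); // prints [1, 2, 4, 1]
  }
}
```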
