@ethqunzhong
Contributor

@ethqunzhong ethqunzhong commented Oct 13, 2025

Motivation

We use HDFS as the remote offload storage in our production environment. During an HDFS NameNode switchover, a large number of offload operations failed and were retried, which disrupted normal broker service. The logs are as follows:
image

Offload operations in Pulsar's managed ledger share the same scheduler thread pool with core services. When offload operations block or take too long, they can impact critical managed ledger operations like data read/write and metadata management, causing system performance degradation.

This change introduces a dedicated offload scheduler to isolate offload operations from core services, preventing potential blocking issues.

Modifications

  1. Added configuration support:

    • New managedLedgerNumOffloadSchedulerThreads parameter in ServiceConfiguration and broker.conf
    • Default thread count set to available CPU cores
  2. Created dedicated offload scheduler:

    • Added offloadScheduler field in ManagedLedgerFactoryImpl using OrderedScheduler
    • Proper shutdown handling in factory cleanup
  3. Isolated offload operations:

    • Migrated all offload-related async operations from scheduledExecutor to dedicated offloadScheduler
    • Affected operations: maybeOffloadInBackground(), maybeOffload(), cleanupOffloaded(), and tryTransformLedgerInfo()

This ensures offload operations don't interfere with core managed ledger functionality, improving overall system stability and performance.
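The isolation described above can be illustrated with a minimal sketch. It is not the code from this PR: it uses plain `java.util.concurrent` executors instead of BookKeeper's `OrderedScheduler`, and the class name, method name, and single-thread pool sizes are assumptions chosen only to make the effect easy to observe. A simulated offload task blocks until "remote storage" recovers, and the sketch checks whether a core task submitted afterwards still completes in time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; none of these names exist in Pulsar.
class OffloadIsolationSketch {

    /**
     * Submits a blocked "offload" task, then a "core" task, and reports whether
     * the core task finished within 500 ms.
     * With dedicated == false both tasks share one single-threaded executor.
     */
    static boolean coreTaskCompletes(boolean dedicated) {
        ExecutorService coreExecutor = Executors.newSingleThreadExecutor();
        ExecutorService offloadExecutor =
                dedicated ? Executors.newSingleThreadExecutor() : coreExecutor;
        // Stands in for remote storage that is unavailable (e.g. during a switchover).
        CountDownLatch remoteStorageRecovered = new CountDownLatch(1);
        try {
            // The offload task blocks until the latch is released.
            offloadExecutor.submit(() -> {
                try {
                    remoteStorageRecovered.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // A core operation submitted after the offload task.
            Future<String> coreResult = coreExecutor.submit(() -> "entry written");
            try {
                coreResult.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the core task was stuck behind the offload task
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            remoteStorageRecovered.countDown();
            coreExecutor.shutdownNow();
            if (dedicated) {
                offloadExecutor.shutdownNow();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool, core task completes: " + coreTaskCompletes(false));
        System.out.println("dedicated pool, core task completes: " + coreTaskCompletes(true));
    }
}
```

In the shared case the core task queues behind the blocked offload task and times out. With a dedicated executor it completes immediately, which is the behavior this change aims for.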

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: ethqunzhong#11

@github-actions

@ethqunzhong Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@ethqunzhong ethqunzhong reopened this Oct 13, 2025
@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Oct 13, 2025
@dao-jun
Member

dao-jun commented Oct 14, 2025

The offloader already has a separate thread pool; the configuration key is managedLedgerOffloadMaxThreads, and its default value is 2. Why add a new thread pool?

@ethqunzhong
Contributor Author

The offloader already has a separate thread pool; the configuration key is managedLedgerOffloadMaxThreads, and its default value is 2. Why add a new thread pool?

Strictly speaking, the thread pool configured by managedLedgerOffloadMaxThreads (used at FSOffloader.scheduler) belongs to the offloader instance and performs the cold-storage operations themselves, that is, the sequential reads and writes of specific ledgers. The thread pool added in this change schedules the offload operations of all ManagedLedgers at the broker level, a job that was previously handled by ManagedLedgerImpl#scheduledExecutor.
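To make the distinction between the two pools concrete, here is a rough sketch under stated assumptions. It does not use Pulsar or BookKeeper classes; the class, method, and thread-name prefixes are hypothetical. The first stage stands in for the broker-level scheduling step (deciding to offload a ledger) and the second for the offloader's own I/O pool (sized 2 here to mirror the default mentioned above).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of two separate pools; not Pulsar code.
class TwoTierOffloadSketch {

    /** Thread factory that names daemon threads "<prefix>-<n>" so each stage is identifiable. */
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread thread = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        };
    }

    /** Returns "<scheduling thread> -> <I/O thread>" for one simulated offload. */
    static String offloadOnce(ExecutorService offloadScheduler, ExecutorService offloaderIoPool) {
        return CompletableFuture
                // Stage 1: broker-level scheduling decision (which ledger to offload).
                .supplyAsync(() -> Thread.currentThread().getName(), offloadScheduler)
                // Stage 2: the actual copy to cold storage runs on the offloader's pool.
                .thenCompose(schedulingThread -> CompletableFuture.supplyAsync(
                        () -> schedulingThread + " -> " + Thread.currentThread().getName(),
                        offloaderIoPool))
                .join();
    }

    public static void main(String[] args) {
        // Pool sizes are assumptions: CPU cores for scheduling, 2 for offloader I/O.
        ExecutorService offloadScheduler = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors(), named("offload-scheduler"));
        ExecutorService offloaderIoPool = Executors.newFixedThreadPool(2, named("offloader-io"));
        try {
            System.out.println(offloadOnce(offloadScheduler, offloaderIoPool));
        } finally {
            offloadScheduler.shutdown();
            offloaderIoPool.shutdown();
        }
    }
}
```

The printed thread names show that the scheduling step and the storage I/O step run on different pools, so sizing or saturating one does not directly consume threads from the other.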

@ethqunzhong ethqunzhong changed the title [improve][offload]use dedicated bookkeeper-ml-offload-scheduler to avoid any potential … [improve][offload]use dedicated bookkeeper-ml-offload-scheduler to avoid block core service Oct 14, 2025
@dao-jun
Member

dao-jun commented Oct 15, 2025

Labels

doc-not-needed Your PR changes do not impact docs

2 participants