perf(state-transition): avoid per-validator pubkey lookups in epoch updated #3130
Draft
fridrik01 wants to merge 2 commits into
I noticed on our `devnet-core` that block times became unstable after a few hours of load testing. The logs show that we hit timeouts in `process_proposal`, always at epoch boundaries.

Our load testing harness injects thousands of deposits, and each one creates a validator entry. At every epoch boundary `processEpoch` iterates the whole registry, and `processEffectiveBalanceUpdates` did a per-validator pubkey->index lookup plus a single-key balance read. That is ~2N small reads (N being the number of validators) against disk-backed IAVL on Longhorn. With a bloated registry this turns into many random I/O reads, and combined with Longhorn's high latency spikes we often exceed the consensus timeouts at epoch boundaries.

To address this, this PR:

- Uses `GetBalances()` instead of per-validator `GetBalance` (validators and balances are index-keyed and dense, so `balances[i]` == validator `i`).
- Removes `ValidatorIndexByPubkey` lookups in `processEffectiveBalanceUpdates` and `processRegistryUpdates`, since the loop index already is the validator index.
- Instruments `processEpoch` to confirm which step dominates (we can remove this after `devnet-core` testing).

I'm going to push a tag from this PR, deploy it to `devnet-core`, and monitor.
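To illustrate the difference in access pattern described above, here is a minimal, self-contained sketch. It is not the code in this PR: `countingStore` is a hypothetical in-memory stand-in for the state store, and the method signatures for `ValidatorIndexByPubkey`, `GetBalance`, and `GetBalances` are assumptions made for the example. It only counts store calls to show the ~2N versus 1 difference.

```go
package main

import "fmt"

// countingStore is a hypothetical stand-in for the disk-backed state store.
// It counts store calls so the two access patterns can be compared.
type countingStore struct {
	pubkeys       []string       // validator index -> pubkey
	indexByPubkey map[string]int // pubkey -> validator index
	balances      []uint64       // validator index -> balance (dense)
	reads         int            // number of store calls made so far
}

func newCountingStore(n int) *countingStore {
	s := &countingStore{
		pubkeys:       make([]string, n),
		indexByPubkey: make(map[string]int, n),
		balances:      make([]uint64, n),
	}
	for i := 0; i < n; i++ {
		pk := fmt.Sprintf("pubkey-%d", i)
		s.pubkeys[i] = pk
		s.indexByPubkey[pk] = i
		s.balances[i] = uint64(i + 1)
	}
	return s
}

// ValidatorIndexByPubkey models one small keyed read (signature assumed).
func (s *countingStore) ValidatorIndexByPubkey(pk string) (int, error) {
	s.reads++
	idx, ok := s.indexByPubkey[pk]
	if !ok {
		return 0, fmt.Errorf("unknown pubkey %q", pk)
	}
	return idx, nil
}

// GetBalance models one single-key balance read (signature assumed).
func (s *countingStore) GetBalance(idx int) uint64 {
	s.reads++
	return s.balances[idx]
}

// GetBalances models one bulk read. It is counted as a single call here;
// a real store would still scan N entries, but sequentially.
func (s *countingStore) GetBalances() []uint64 {
	s.reads++
	out := make([]uint64, len(s.balances))
	copy(out, s.balances)
	return out
}

// totalPerValidator mirrors the old pattern: two store calls per validator.
func totalPerValidator(s *countingStore) (uint64, error) {
	var total uint64
	for _, pk := range s.pubkeys {
		idx, err := s.ValidatorIndexByPubkey(pk)
		if err != nil {
			return 0, err
		}
		total += s.GetBalance(idx)
	}
	return total, nil
}

// totalBulk mirrors the new pattern: one bulk read, then the loop index
// is used directly as the validator index.
func totalBulk(s *countingStore) uint64 {
	balances := s.GetBalances()
	var total uint64
	for i := range s.pubkeys {
		total += balances[i]
	}
	return total
}

func main() {
	const n = 1000

	perValidator := newCountingStore(n)
	oldTotal, err := totalPerValidator(perValidator)
	if err != nil {
		panic(err)
	}

	bulk := newCountingStore(n)
	newTotal := totalBulk(bulk)

	fmt.Printf("per-validator: total=%d store calls=%d\n", oldTotal, perValidator.reads)
	fmt.Printf("bulk:          total=%d store calls=%d\n", newTotal, bulk.reads)
}
```

Both functions compute the same total; only the number of store calls differs. The sketch relies on the same invariant as the PR description: balances are dense and index-keyed, so `balances[i]` belongs to validator `i`.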