fix(gloas): prevent false peer bans from ePBS block/envelope race #47
Merged
In Gloas (ePBS), blocks and payload envelopes are decoupled. A peer may have imported the block but not yet received or processed the envelope containing the blob data for columns. This causes two issues:
1. Custody column requests penalize peers for returning 0 columns when the peer legitimately doesn't have them yet (envelope not processed). Fix: disable expect_max_responses enforcement in Gloas, since the block/envelope decoupling means having the block doesn't guarantee having the columns.
2. DataColumnsByRange requests that receive ResourceUnavailable (columns pruned within boundary) result in a Fatal peer action (instant ban). Fix: add DataColumnsByRange to the skip list alongside BlobsByRoot and DataColumnsByRoot so ResourceUnavailable doesn't trigger a ban.
Problem
In Gloas devnets, Lighthouse nodes rapidly ban all their peers within minutes of the fork activating, causing the network to collapse.
Root Causes
1. Custody column request penalty (block/envelope race)
In ePBS, the block and payload envelope are separate objects. When a peer gossips a block, other nodes do a lookup and request custody columns. But the proposer may not have published the envelope yet (25ms later). The responding peer returns 0 columns → requester penalizes with `LowToleranceError` → rapid score decay → ban.
The assumption `lookup_peers.contains(&peer_id) → must have columns` is invalid in Gloas since having the block doesn't mean having the envelope/columns.
Fix: Disable `expect_max_responses` enforcement for Gloas epochs since the block/envelope decoupling means a peer can legitimately have the block without columns.
2. DataColumnsByRange ResourceUnavailable → Fatal ban
During custody backfill sync, peers respond with `ResourceUnavailable("columns pruned within boundary")`. This hits the default `PeerAction::Fatal` path for outgoing requests, instantly banning the peer. `BlobsByRoot` and `DataColumnsByRoot` already skip banning for `ResourceUnavailable`, but `DataColumnsByRange` was missing from the skip list.
Fix: Add `DataColumnsByRange` to the skip list.
Testing
Tested on a 6-node Kurtosis devnet (2 Lighthouse, 2 Prysm, 2 Lodestar) with `gloas_fork_epoch: 1` and `preset: minimal`. Without this fix, Lighthouse bans all peers within epochs 3-5. With this fix, peers remain connected.
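For reviewers who want the two decisions in one place, here is a minimal, self-contained sketch of the logic described under Root Causes. It is not the actual diff. The enum definitions, both function names, and the rule that a short response is penalized only when enforcement is on are simplified assumptions made for illustration. Only the identifiers `expect_max_responses`, `LowToleranceError`, `PeerAction::Fatal`, `ResourceUnavailable`, and the three protocol names come from the description above.

```rust
// Illustrative sketch only. These types are simplified stand-ins and are
// NOT Lighthouse's real definitions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    BlobsByRoot,
    DataColumnsByRoot,
    DataColumnsByRange,
    /// Stand-in for any protocol that is not on the skip list.
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PeerAction {
    Fatal,
    LowToleranceError,
}

/// Root cause 1: decide whether a custody column response that is shorter
/// than requested should be penalized.
///
/// Assumption: enforcement is keyed on the epoch of the request and is
/// switched off from the Gloas fork epoch onward, because a peer can hold
/// the block without yet holding the envelope and its columns.
pub fn expect_max_responses(request_epoch: u64, gloas_fork_epoch: Option<u64>) -> bool {
    match gloas_fork_epoch {
        // Gloas is scheduled: enforce only before the fork.
        Some(fork_epoch) => request_epoch < fork_epoch,
        // Gloas is not scheduled: keep the pre-existing behaviour.
        None => true,
    }
}

/// Hypothetical helper: returns the penalty (if any) for a custody column
/// response containing `returned` of the `requested` columns.
pub fn penalty_for_custody_response(
    returned: usize,
    requested: usize,
    request_epoch: u64,
    gloas_fork_epoch: Option<u64>,
) -> Option<PeerAction> {
    if expect_max_responses(request_epoch, gloas_fork_epoch) && returned < requested {
        Some(PeerAction::LowToleranceError)
    } else {
        None
    }
}

/// Root cause 2: decide what to do when an outgoing request fails with
/// `ResourceUnavailable`. Protocols on the skip list are not penalized;
/// everything else falls through to the default fatal action.
pub fn action_for_resource_unavailable(protocol: Protocol) -> Option<PeerAction> {
    match protocol {
        Protocol::BlobsByRoot
        | Protocol::DataColumnsByRoot
        // Newly added to the skip list by this change.
        | Protocol::DataColumnsByRange => None,
        Protocol::Other => Some(PeerAction::Fatal),
    }
}
```

With `gloas_fork_epoch` set to 1, as in the devnet configuration above, an empty custody response in epoch 0 would still be penalized in this sketch, while the same response in epoch 1 or later would not.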