fix(gloas): prevent false peer bans from ePBS block/envelope race#47

Merged
eserilev merged 1 commit into
glamsterdam-devnet-4 from
fix/gloas-peer-scoring
May 21, 2026

Problem

In Gloas devnets, Lighthouse nodes rapidly ban all their peers within minutes of the fork activating, causing the network to collapse.

Root Causes

1. Custody column request penalty (block/envelope race)

In ePBS, the block and the payload envelope are separate objects. When a peer gossips a block, other nodes start a lookup and request custody columns from that peer. But the proposer may not have published the envelope yet (it can follow 25ms later), so the responding peer returns 0 columns → the requester penalizes it with LowToleranceError → rapid score decay → ban.

The assumption that lookup_peers.contains(&peer_id) means the peer must have the columns is invalid in Gloas, since having the block doesn't mean having the envelope or the columns.

Fix: Disable expect_max_responses enforcement for Gloas epochs since the block/envelope decoupling means a peer can legitimately have the block without columns.
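
The decision described in this fix can be sketched as follows. This is a minimal, self-contained illustration and not the actual Lighthouse code: the `Epoch` and `Verdict` types, the function `judge_custody_response`, and its parameters are hypothetical names introduced here. Only `expect_max_responses` and the LowToleranceError penalty come from the description above.

```rust
// Hypothetical, simplified types for illustration; not Lighthouse's definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Epoch(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// The response is accepted, even if it contains fewer columns than requested.
    Accept,
    /// The peer is penalized (stands in for the LowToleranceError path).
    PenalizeLowTolerance,
}

/// Decide whether a short custody column response should be penalized.
///
/// `gloas_fork_epoch` is `None` when the fork is not scheduled (assumption).
/// `peer_claimed_block` models `lookup_peers.contains(&peer_id)`.
pub fn judge_custody_response(
    request_epoch: Epoch,
    gloas_fork_epoch: Option<Epoch>,
    peer_claimed_block: bool,
    requested: usize,
    received: usize,
) -> Verdict {
    let is_gloas = gloas_fork_epoch.map_or(false, |fork| request_epoch >= fork);

    // Pre-Gloas: a peer that has the block is expected to serve every requested column.
    // Gloas: the block can be present without the envelope, so nothing is enforced.
    let expect_max_responses = peer_claimed_block && !is_gloas;

    if expect_max_responses && received < requested {
        Verdict::PenalizeLowTolerance
    } else {
        Verdict::Accept
    }
}
```

Under these assumptions, a peer that returns 0 of 4 requested columns is penalized before the fork epoch and accepted from the fork epoch onward.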

2. DataColumnsByRange ResourceUnavailable → Fatal ban

During custody backfill sync, peers respond with ResourceUnavailable ("columns pruned within boundary"). This hits the default PeerAction::Fatal path for outgoing requests, instantly banning the peer.

BlobsByRoot and DataColumnsByRoot already skip banning for ResourceUnavailable, but DataColumnsByRange was missing from the skip list.

Fix: Add DataColumnsByRange to the skip list.
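
A rough sketch of the skip list, assuming a simplified model of how an outgoing request's ResourceUnavailable response maps to a peer action. The `Protocol` enum (including its catch-all `Other` variant), the single-variant `PeerAction`, and the function `action_for_resource_unavailable` are hypothetical stand-ins, not the real Lighthouse types. Only the protocol names and `PeerAction::Fatal` are taken from the description above.

```rust
// Hypothetical, simplified types for illustration; not Lighthouse's definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    BlobsByRoot,
    DataColumnsByRoot,
    DataColumnsByRange,
    /// Stands in for any protocol that is not on the skip list.
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PeerAction {
    /// Instant ban.
    Fatal,
}

/// Peer action for a ResourceUnavailable response to an outgoing request.
/// Returns `None` when the protocol is on the skip list (no penalty).
pub fn action_for_resource_unavailable(protocol: Protocol) -> Option<PeerAction> {
    match protocol {
        // Skip list: the peer may legitimately lack the data (for example,
        // columns pruned within the boundary), so it is not banned.
        Protocol::BlobsByRoot
        | Protocol::DataColumnsByRoot
        | Protocol::DataColumnsByRange => None,
        // Default path described above: Fatal.
        Protocol::Other => Some(PeerAction::Fatal),
    }
}
```

In this model, the bug corresponds to `DataColumnsByRange` falling through to the default arm. Listing it explicitly alongside the two by-root protocols removes the ban.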

Testing

Tested on a 6-node Kurtosis devnet (2 Lighthouse, 2 Prysm, 2 Lodestar) with gloas_fork_epoch: 1 and preset: minimal. Without this fix, Lighthouse bans all peers within epochs 3-5. With this fix, peers remain connected.

In Gloas (ePBS), blocks and payload envelopes are decoupled. A peer may
have imported the block but not yet received/processed the envelope
containing the blob data for columns. This causes two issues:

1. Custody column requests penalize peers for returning 0 columns when
   the peer legitimately doesn't have them yet (envelope not processed).
   Fix: disable expect_max_responses enforcement in Gloas since the
   block/envelope decoupling means having the block doesn't guarantee
   having the columns.

2. DataColumnsByRange requests that receive ResourceUnavailable (columns
   pruned within boundary) result in a Fatal peer action (instant ban).
   Fix: add DataColumnsByRange to the skip list alongside BlobsByRoot
   and DataColumnsByRoot so ResourceUnavailable doesn't trigger a ban.
@eserilev eserilev merged commit 19ace3a into glamsterdam-devnet-4 May 21, 2026
26 of 30 checks passed