fix(gloas): queue gossip data columns for reprocessing when block root is unknown #49

Merged
eserilev merged 1 commit into glamsterdam-devnet-4 from fix/reprocess-unknown-block-columns
May 21, 2026
Conversation

@eserilev
Owner

Problem

In ePBS (Gloas), data columns frequently arrive via gossip before their corresponding block (~50ms earlier). When this happens, the GossipDataColumnError::BlockRootUnknown handler silently ignores the columns. Once ignored, those columns are lost and never re-processed.

This leaves blocks stuck at data_columns 0/128: the block is imported but never becomes fully data-available. Fork choice then can't advance the head, which sets off the following cascade:

  1. Head gets stuck at the previous slot
  2. Shuffling diverges from the rest of the network (different decision block)
  3. Attestations from other clients get skipped as "incompatible shuffling"
  4. PTC disagreements cascade into peer bans
  5. Network collapses around epoch 80

Fix

Add a reprocess queue path for gossip data columns with unknown block roots:

  • New QueuedGossipDataColumn struct and UnknownBlockDataColumn message variant
  • When BlockRootUnknown is hit, queue the column with its block root
  • On BlockImported, release all queued columns for that root and re-dispatch them through the normal gossip column processing path (send_gossip_data_column_sidecar)

This mirrors the existing pattern for attestations (UnknownBlockUnaggregate) and envelopes (UnknownBlockForEnvelope).
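
The queue-and-release flow described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: only the names QueuedGossipDataColumn, UnknownBlockDataColumn and BlockImported come from the description, while BlockRoot, DataColumn, ReprocessMessage, ReprocessQueue, the struct fields and the handle method are hypothetical stand-ins. It also leaves out anything a real queue would likely need, such as size limits or expiry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a 32-byte block root.
type BlockRoot = [u8; 32];

/// Hypothetical stand-in for a gossip data column sidecar.
#[derive(Debug, Clone, PartialEq)]
struct DataColumn {
    index: u64,
    block_root: BlockRoot,
}

/// Named after the struct in this PR; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct QueuedGossipDataColumn {
    block_root: BlockRoot,
    column: DataColumn,
}

/// Hypothetical message enum; only the variant names come from the PR.
#[derive(Debug)]
enum ReprocessMessage {
    /// A column hit BlockRootUnknown and should wait for its block.
    UnknownBlockDataColumn(QueuedGossipDataColumn),
    /// A block with this root has been imported.
    BlockImported { block_root: BlockRoot },
}

/// Hypothetical queue holding columns keyed by the block root they wait for.
#[derive(Default)]
struct ReprocessQueue {
    awaiting_block: HashMap<BlockRoot, Vec<QueuedGossipDataColumn>>,
}

impl ReprocessQueue {
    /// Handles one message and returns the columns that are now ready to be
    /// re-dispatched through the normal gossip column processing path.
    fn handle(&mut self, msg: ReprocessMessage) -> Vec<DataColumn> {
        match msg {
            ReprocessMessage::UnknownBlockDataColumn(queued) => {
                // Park the column under its block root instead of dropping it.
                self.awaiting_block
                    .entry(queued.block_root)
                    .or_default()
                    .push(queued);
                Vec::new()
            }
            ReprocessMessage::BlockImported { block_root } => self
                .awaiting_block
                // Take every column waiting on this root; none if unknown.
                .remove(&block_root)
                .unwrap_or_default()
                .into_iter()
                .map(|queued| queued.column)
                .collect(),
        }
    }
}
```

In this sketch the caller would pass each returned column back to the regular gossip handler (send_gossip_data_column_sidecar in the PR), so a released column goes through the same verification as one that arrived after its block.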

Testing

Tested on a 6-node Kurtosis devnet. Without this fix, the network collapses around epoch 80 when columns arrive before blocks. With this fix, the network stays healthy past epoch 80.

…t is unknown

In ePBS (Gloas), data columns often arrive via gossip BEFORE their
corresponding block. Previously these columns were silently ignored
(action: 'ignoring'), causing blocks to become stuck with 0/128 columns
and preventing head advancement.

This fix queues columns with unknown block roots in the reprocess queue.
When the block is imported (BlockImported event), the queued columns are
released and re-dispatched through the normal gossip column processing
path.

This addresses the root cause of network degradation observed around
epoch 80+ in devnets: columns arrive ~50ms before the block, get
discarded, block stays at 0/128 columns, fork choice can't advance,
head gets stuck, shuffling diverges from the rest of the network.
@eserilev eserilev merged commit 3265cd8 into glamsterdam-devnet-4 May 21, 2026
21 of 24 checks passed
