
Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Oct 30, 2025

ref: #1914

A choice has been made to put this logic in the executor and avoid extending the reaper and the sequencer.
This is because updating the reaper means passing the last fetched DA height down across all components, which adds a lot of complexity. Adding it in the sequencer may be preferable, but that makes the inclusion in a sync node less straightforward. This is what is being investigated.

Compared to the previous implementation, a forced transaction does not have any structure. It should be the raw transaction bytes from the execution client. This keeps ev-node agnostic about the transaction: no signature checks, no validation of correctness. The execution client must make sure to reject gibberish transactions.
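As an illustration of the contract this puts on the execution side, here is a rough sketch (the function and the decode/validate helpers are hypothetical, not part of ev-node's interface):

// Hypothetical execution-client side filtering of forced inclusion blobs.
// ev-node passes the raw bytes through untouched; anything the execution
// client cannot decode or validate is simply dropped.
func filterForcedTxs(rawTxs [][]byte) [][]byte {
    valid := make([][]byte, 0, len(rawTxs))
    for _, raw := range rawTxs {
        tx, err := decodeTx(raw) // decodeTx / validateTx are execution-client specific (assumed)
        if err != nil {
            continue // gibberish blob posted to the forced inclusion namespace
        }
        if err := validateTx(tx); err != nil {
            continue // e.g. bad signature or nonce
        }
        valid = append(valid, raw)
    }
    return valid
}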

  • implement for executor
    • add size checks to make sure the batch doesn't go over max bytes
  • implement for syncer
  • requirements:
    • syncer must be able to detect if sequencer was malicious and hasn't included a forced tx
    • check inclusion of the fetched tx bytes within the block data
    • if that happened, reject all future sequencer blocks
      • eventually, become based and keep producing blocks
  • tool for submitting forced txs in celestia easily -> feat(server): add force inclusion cmd ev-abci#295
  • make sure epochs are in sync between syncer and executor

---- for later, won't be included in this PR (ref #2797 (comment))

  • improve offline capabilities (if the sequencer goes offline, full nodes become fully based)
    • when the sequencer comes back online, it needs to check whether any blocks were produced :/

@github-actions
Contributor

github-actions bot commented Nov 1, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Nov 10, 2025, 3:58 PM

@claude
Contributor

claude bot commented Nov 1, 2025

Claude finished @julienrbrt's task


Comprehensive Code Review

I've completed a thorough review of the forced inclusion mechanism PR. This is a substantial feature that adds censorship resistance and liveness guarantees to the rollup framework. Here's my detailed feedback:

🎯 Overall Assessment

Summary: This is a well-architected feature with solid implementation. The code demonstrates good understanding of concurrent systems, proper error handling patterns, and thoughtful design around epoch boundaries and transaction smoothing. However, there are several areas requiring attention before production readiness.

Recommendation: Request changes - primarily around resource management, error handling edge cases, and test coverage gaps.


1. Code Quality & Architecture ✅

Strengths:

  • Clean separation of concerns: Forced inclusion logic is properly isolated in dedicated components (retriever, verification)
  • Interface-driven design: Good use of ForcedInclusionRetriever interface for testability
  • Epoch-based scanning: Smart optimization to avoid redundant DA queries by only fetching at epoch boundaries
  • Transaction smoothing: Well-thought-out approach to handle block size constraints with pending queue

Issues:

1.1 Epoch Calculation Edge Case ⚠️

Location: types/epoch.go:36-51

The CalculateEpochBoundaries function has an issue when daHeight < daStartHeight:

if daHeight < daStartHeight {
    return daStartHeight, daStartHeight + daEpochSize - 1, epochNum
}

Problem: Returns epochNum = 0 (from line 37) but still calculates valid boundaries. This creates inconsistent state where you have a valid epoch range but epoch number 0.

Recommendation: Consider returning an error or ensuring the boundaries also indicate an invalid state when epoch is 0.
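As a rough sketch of that recommendation (the exact signature and epoch numbering in types/epoch.go may differ; the boundary arithmetic below is an assumption):

// Sketch: surface the invalid case as an error instead of returning epoch 0
// together with otherwise valid-looking boundaries.
func CalculateEpochBoundaries(daHeight, daStartHeight, daEpochSize uint64) (start, end, epochNum uint64, err error) {
    if daHeight < daStartHeight {
        return 0, 0, 0, fmt.Errorf("da height %d is below da start height %d", daHeight, daStartHeight)
    }
    epochNum = (daHeight-daStartHeight)/daEpochSize + 1 // assumed 1-based numbering
    start = daStartHeight + (epochNum-1)*daEpochSize
    end = start + daEpochSize - 1
    return start, end, epochNum, nil
}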

1.2 Memory Unbounded Growth Risk 🔴

Location: sequencers/single/sequencer.go:58 and sequencers/based/sequencer.go:36

pendingForcedInclusionTxs []pendingForcedInclusionTx
txQueue [][]byte  // Based sequencer

Problem: These slices can grow unbounded if:

  • DA produces forced inclusion transactions faster than they can be processed
  • Block size constraints prevent transactions from being included for extended periods
  • A malicious actor floods the forced inclusion namespace

Recommendation:

  1. Add size limits with configurable thresholds (see the sketch after this list)
  2. Implement LRU eviction or time-based expiry
  3. Add metrics to monitor queue growth
  4. Consider memory pressure alerts
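A minimal sketch of such a bound, using illustrative names and thresholds rather than the PR's actual fields:

const maxPendingForcedTxs = 10_000 // illustrative; would come from config

// Sketch: refuse new entries once the pending queue is full instead of
// letting it grow without bound. Eviction or expiry could be layered on top.
func (s *Sequencer) enqueuePendingForcedTx(tx pendingForcedInclusionTx) error {
    if len(s.pendingForcedInclusionTxs) >= maxPendingForcedTxs {
        return fmt.Errorf("pending forced inclusion queue full (%d entries)", maxPendingForcedTxs)
    }
    s.pendingForcedInclusionTxs = append(s.pendingForcedInclusionTxs, tx)
    return nil
}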

1.3 Based Sequencer Time Dependency 🔴

Location: sequencers/based/sequencer.go:127

Timestamp: time.Now(),

Problem: Uses non-deterministic time.Now() which creates non-reproducible block production. Every node running a based sequencer would produce different timestamps for the same DA state.

Recommendation: Use DA layer timestamps or derive deterministically from DA height/block data.
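For instance, a based sequencer could take the timestamp from the DA block it is draining instead of the local clock; this sketch assumes the DA header (and its timestamp) is available at that point:

// Sketch: derive the batch timestamp deterministically from DA data rather
// than time.Now(), so every based sequencer produces the same header time.
func batchTimestamp(daHeader DAHeader) time.Time {
    // All nodes observe the same DA header for a given height, so this
    // value is identical across based sequencers.
    return daHeader.Time()
}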


2. Error Handling & Resilience ⚠️

Good Practices:

  • Proper error wrapping with context (fmt.Errorf with %w)
  • Distinction between retriable and permanent errors (ErrHeightFromFuture vs validation errors)
  • Critical error channel for halting scenarios

Issues:

2.1 Incomplete Error Propagation

Location: block/internal/syncing/syncer.go:540-546

if err := s.verifyForcedInclusionTxs(currentState, data); err != nil {
    s.logger.Error().Err(err).Uint64("height", nextHeight).Msg("forced inclusion verification failed")
    if errors.Is(err, errMaliciousProposer) {
        s.cache.RemoveHeaderDAIncluded(headerHash)
        return err
    }
}
// Continues execution even if verification fails for non-malicious errors

Problem: Only returns error for errMaliciousProposer. Other errors are logged but execution continues, which could lead to inconsistent state.

Recommendation: Return all verification errors or clearly document why some are non-fatal.

2.2 Missing Context Cancellation Handling

Location: block/internal/da/forced_inclusion_retriever.go:107-122

The DA height scanning loop doesn't check context cancellation:

for epochHeight := epochStart + 1; epochHeight < epochEnd; epochHeight++ {
    result := r.client.RetrieveForcedInclusion(ctx, epochHeight)
    // No ctx.Done() check
}

Problem: Long-running epoch scans can't be interrupted during shutdown.

Recommendation: Add context check in loop:

select {
case <-ctx.Done():
    return nil, ctx.Err()
default:
}

2.3 Pending Transaction Loss Risk 🔴

Location: sequencers/single/queue.go:88-103

// TODO(@julienrbrt): The batch is currently NOT persisted to the DB since these are 
// transactions that were already in the queue or were just processed. 
// -- FI txs are lost, this should be tackled.
func (bq *BatchQueue) Prepend(ctx context.Context, batch coresequencer.Batch) error {

Problem: Forced inclusion transactions that couldn't fit are returned to queue but NOT persisted. If the sequencer crashes, these transactions are permanently lost, violating forced inclusion guarantees.

Impact: Critical - users who paid to force include transactions could lose them without recourse.

Recommendation:

  1. Persist prepended batches to DB immediately (see the sketch after this list)
  2. Track forced inclusion transactions separately from regular mempool txs
  3. Add crash recovery tests for this scenario
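A sketch of recommendation 1, assuming the queue has a mutex and access to the same datastore and batch encoding that AddBatch uses; the key helper and encoding call are hypothetical:

// Sketch: persist a prepended forced inclusion batch before putting it back
// at the head of the in-memory queue, so a crash cannot drop it.
func (bq *BatchQueue) Prepend(ctx context.Context, batch coresequencer.Batch) error {
    bq.mu.Lock()
    defer bq.mu.Unlock()

    data, err := encodeBatch(batch) // assumed: same encoding AddBatch uses
    if err != nil {
        return err
    }
    if err := bq.db.Put(ctx, prependKey(batch), data); err != nil { // hypothetical key helper
        return fmt.Errorf("persist prepended batch: %w", err)
    }
    bq.queue = append([]coresequencer.Batch{batch}, bq.queue...)
    return nil
}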

3. Security 🔒

Issues:

3.1 Resource Exhaustion via Forced Inclusion Namespace

Severity: HIGH

Attack Vector:

  1. Attacker floods forced inclusion namespace with maximum-size (2MB) blobs
  2. Each epoch processes all blobs regardless of size
  3. Memory consumption grows unbounded in pending queues
  4. Based sequencers must process ALL forced inclusion transactions

Impact:

  • Memory exhaustion DoS on sequencers and full nodes
  • Potential chain halt if OOM occurs
  • Cost to attack is just DA submission fees

Mitigations Implemented:

  • ✅ AbsoluteMaxBlobSize check (2MB limit)
  • ✅ Blob size validation (ValidateBlobSize)
  • ✅ Skipping oversized blobs

Missing Mitigations:

  • ❌ No rate limiting on forced inclusion namespace
  • ❌ No total size limit per epoch
  • ❌ No cost/stake requirement beyond DA fees
  • ❌ No backpressure mechanism when overwhelmed

Recommendations:

  1. Epoch-level limits: Add max_forced_inclusion_bytes_per_epoch config (see the sketch after this list)
  2. Rate limiting: Limit forced inclusion txs per DA block
  3. Economic disincentive: Consider requiring higher DA fees for forced inclusion namespace
  4. Graceful degradation: When overwhelmed, sequencer should continue but may drop older forced inclusion txs with clear logging
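A rough sketch of the epoch-level limit from recommendation 1, placed inside the retrieval loop; the config field, logger, and surrounding variables are assumptions:

// Sketch: stop accumulating forced inclusion blobs once a per-epoch byte
// budget is exhausted; the rest is dropped (or deferred) with a warning.
maxEpochBytes := cfg.MaxForcedInclusionBytesPerEpoch // hypothetical config field
totalBytes := 0
for _, blob := range result.Data {
    if totalBytes+len(blob) > maxEpochBytes {
        logger.Warn().Int("blob_size", len(blob)).Msg("forced inclusion epoch byte budget exceeded; skipping remaining blobs")
        break
    }
    totalBytes += len(blob)
    event.Txs = append(event.Txs, blob)
}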

3.2 Malicious Sequencer Detection Window

Location: block/internal/syncing/syncer.go:744-752

for _, pending := range stillPending {
    if currentState.DAHeight > pending.EpochEnd {
        maliciousTxs = append(maliciousTxs, pending)
    } else {
        remainingPending = append(remainingPending, pending)
    }
}

Issue: Detection only occurs when DA height moves PAST the epoch end. If DA is slow or stalled, malicious behavior isn't detected until potentially many blocks later.

Recommendation: Add configurable max_pending_epochs threshold and alert earlier.
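Sketched below, with assumed field names, is one way such a threshold could be checked alongside the existing loop:

// Sketch: warn when a forced inclusion tx has been pending for more than a
// configurable number of epochs, even if DA height has not passed its epoch end yet.
maxPendingEpochs := uint64(2) // illustrative; would come from config
for _, pending := range stillPending {
    epochsWaiting := currentEpoch - pending.EpochNumber // EpochNumber is an assumed field
    if epochsWaiting > maxPendingEpochs {
        s.logger.Warn().
            Str("tx_hash", pending.TxHash).
            Uint64("epochs_waiting", epochsWaiting).
            Msg("forced inclusion tx pending for too long; possible censorship")
    }
}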

3.3 Signature/Validation Bypass

Location: block/internal/da/forced_inclusion_retriever.go:160-163

for _, blob := range result.Data {
    if len(blob) > 0 {
        event.Txs = append(event.Txs, blob)
    }
}

Design Decision: Forced inclusion transactions have NO signature validation or structural checks in ev-node - all validation is deferred to the execution client.

Risk:

  • Garbage data can be submitted to forced inclusion namespace
  • Execution client must reject invalid transactions OR they'll be included in blocks
  • Creates dependency on execution client's validation robustness

ADR Justification: This is a documented design decision (ADR-019) - ev-node remains agnostic to transaction format.

Recommendation: This is acceptable BUT requires:

  1. ✅ Execution client interface MUST document this validation requirement
  2. ✅ Integration tests verifying execution client properly rejects garbage
  3. Missing: E2E tests with malformed forced inclusion data

4. Performance & Resource Efficiency ⚡

Good Practices:

  • Atomic operations for height tracking (atomic.Uint64)
  • Efficient epoch boundary checks avoid redundant DA fetches
  • Concurrent DA and P2P workers

Issues:

4.1 Sequential DA Height Fetching

Location: block/internal/da/forced_inclusion_retriever.go:107-122

for epochHeight := epochStart + 1; epochHeight < epochEnd; epochHeight++ {
    result := r.client.RetrieveForcedInclusion(ctx, epochHeight)
    // Sequential fetching
}

Problem: DA heights within an epoch are fetched sequentially. For large epochs (e.g., 50 blocks), this adds significant latency.

Impact: Identified in PR comments as follow-up work.

Recommendation: Parallelize epoch fetching with errgroup:

g, ctx := errgroup.WithContext(ctx)
for height := epochStart; height <= epochEnd; height++ {
    height := height  // capture
    g.Go(func() error {
        result := r.client.RetrieveForcedInclusion(ctx, height)
        // Process result
        return nil
    })
}
if err := g.Wait(); err != nil {
    return nil, err
}

4.2 Repeated Hash Computation

Location: block/internal/syncing/syncer.go:727-728

for _, tx := range event.Txs {
    txHash := hashTx(tx)  // SHA256 hash
    if _, ok := blockTxMap[txHash]; ok {

Problem: Same transactions may be hashed multiple times across different blocks within an epoch.

Recommendation: Consider caching tx hashes within the forced inclusion event.
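One possible shape for that cache, with an illustrative struct rather than the PR's actual event type:

// Sketch: hash each forced inclusion tx once when the epoch event is built
// and reuse the digests for every block verified within that epoch.
type forcedInclusionEvent struct {
    Txs      [][]byte
    txHashes []string // lazily filled, then reused
}

func (e *forcedInclusionEvent) hashes() []string {
    if e.txHashes == nil {
        e.txHashes = make([]string, len(e.Txs))
        for i, tx := range e.Txs {
            sum := sha256.Sum256(tx)
            e.txHashes[i] = hex.EncodeToString(sum[:])
        }
    }
    return e.txHashes
}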

4.3 Sync.Map vs Regular Map

Location: block/internal/syncing/syncer.go:68

pendingForcedInclusionTxs sync.Map // map[string]pendingForcedInclusionTx

Issue: sync.Map is optimized for read-heavy workloads, but this use case has frequent writes (adding/removing pending txs every block). A regular map with mutex might perform better.

Recommendation: Profile and consider using map[string]pendingForcedInclusionTx with sync.RWMutex.
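A sketch of the mutex-guarded alternative; the names are illustrative:

// Sketch: a plain map behind an RWMutex, which tends to behave better than
// sync.Map for write-heavy patterns like per-block add/remove.
type pendingForcedTxSet struct {
    mu  sync.RWMutex
    txs map[string]pendingForcedInclusionTx
}

func (p *pendingForcedTxSet) replaceAll(remaining []pendingForcedInclusionTx) {
    next := make(map[string]pendingForcedInclusionTx, len(remaining))
    for _, tx := range remaining {
        next[tx.TxHash] = tx
    }
    p.mu.Lock()
    p.txs = next // single swap, no intermediate empty state visible to readers
    p.mu.Unlock()
}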


5. Concurrency Patterns 🔀

Good Practices:

  • Proper use of atomic pointers for state management
  • errgroup for coordinated goroutine management
  • Context-based cancellation

Issues:

5.1 Potential Race in DA Height Updates

Location: Multiple locations

// sequencers/single/sequencer.go:156
c.SetDAHeight(forcedEvent.EndDaHeight)
// vs
// block/internal/executing/executor.go:390
newState.DAHeight = e.sequencer.GetDAHeight()

Problem: DA height is updated by sequencer during GetNextBatch but read by executor during block production. No clear happens-before relationship.

Analysis: Likely safe due to atomic operations, but the relationship between sequencer DA height and state DA height is unclear.

Recommendation: Add documentation clarifying the DA height synchronization protocol between sequencer and executor.

5.2 Queue Compaction Race

Location: sequencers/single/queue.go:119-131

The compaction logic is correct but complex. The condition bq.head > len(bq.queue)/2 && bq.head > 100 could be simplified.

Recommendation: Extract compaction into separate method with clear documentation.
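A sketch of the extracted helper, assuming the queue is a slice plus head index as the surrounding code suggests; the thresholds mirror the current condition:

// Sketch: name the compaction step so the condition is documented in one place.
func (bq *BatchQueue) maybeCompact() {
    // Compact only when the consumed prefix dominates the backing slice and
    // is large enough for the copy to be worthwhile.
    if bq.head <= len(bq.queue)/2 || bq.head <= 100 {
        return
    }
    bq.queue = append([]coresequencer.Batch(nil), bq.queue[bq.head:]...)
    bq.head = 0
}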


6. Testing & Reliability ✅⚠️

Good Practices:

  • Comprehensive test coverage for forced inclusion scenarios
  • Test cases for epoch boundaries, partial inclusion, malicious sequencer detection
  • Mock-based unit tests with clear scenarios

Gaps:

6.1 Missing E2E Tests 🔴

  • No E2E test for actual forced inclusion flow from DA submission → sequencer retrieval → block inclusion → full node verification
  • No crash recovery test for pending forced inclusion transactions
  • No test for based sequencer restart scenarios
  • No test for DA layer slowdowns/stalls

6.2 Missing Edge Case Tests ⚠️

  • Epoch size of 1 (every DA block is an epoch)
  • Extremely large epochs (e.g., 10000 blocks)
  • Forced inclusion transaction exactly at MaxBytes limit
  • Multiple forced inclusion transactions summing to > MaxBytes
  • DA height regression (DA reorg scenario)

6.3 Load/Stress Testing ❌

  • No performance benchmarks for forced inclusion retrieval
  • No stress test for handling 100+ forced inclusion transactions in single epoch
  • No memory profiling with large pending queues

6.4 Test Coverage Gap

Codecov Report: 78.66% patch coverage (112 lines missing)

Critical Missing Coverage (from Codecov):

  • sequencers/single/sequencer.go: 28 missing, 9 partials
  • block/internal/da/forced_inclusion_retriever.go: 12 missing, 5 partials
  • block/internal/executing/executor.go: 11 missing, 6 partials

Recommendation: Increase coverage to at least 85% before merge, focusing on error paths and edge cases.


7. Documentation & Developer Experience 📚

Strengths:

  • Excellent ADR documentation (ADR-019) with clear rationale
  • Good code comments explaining complex logic (epoch boundaries, smoothing)
  • Clear interface documentation

Issues:

7.1 Missing Operational Documentation

  • No runbook for "what to do if forced inclusion verification fails"
  • No guide for monitoring forced inclusion health (metrics to watch)
  • No documentation on how to enable/disable forced inclusion post-deployment

7.2 Incomplete Configuration Documentation

Location: pkg/config/config.go:68

FlagDAForcedInclusionNamespace = FlagPrefixEvnode + "da.forced_inclusion_namespace"

Missing:

  • What happens if namespace is empty?
  • Can it be changed after genesis?
  • Interaction with DAEpochForcedInclusion setting

7.3 Error Message Clarity

Location: block/internal/syncing/syncer.go:473

s.sendCriticalError(fmt.Errorf("sequencer malicious. Restart the node with --node.aggregator --node.based_sequencer or keep the chain halted: %w", err))

Issue: Error message tells operator to restart with specific flags but doesn't explain:

  • Why this makes it safe
  • What happens to pending transactions
  • Whether chain state needs verification

Recommendation: Link to runbook/documentation in error message.


8. Specific Code Issues

8.1 Sequencer Queue Prepend Not Persisted 🔴

Location: sequencers/single/queue.go:89-102

Critical Issue: Already noted as TODO - forced inclusion transactions returned to queue via Prepend() are NOT persisted to DB. Crash = permanent loss.

Priority: MUST FIX before merge or disable feature.

8.2 Genesis Validation

Location: pkg/genesis/genesis.go:63-65

if g.DAEpochForcedInclusion < 1 {
    return fmt.Errorf("da_epoch_forced_inclusion must be at least 1, got %d", g.DAEpochForcedInclusion)
}

Good: Validates epoch size ≥ 1.

Question: Should there be an upper bound? Extremely large epochs (e.g., 1,000,000) could cause issues with:

  • Memory consumption when fetching full epoch
  • Long delay between forced inclusion submission and execution

8.3 Data Race in Syncer Pending Map

Location: block/internal/syncing/syncer.go:744-761

// Update pending map - clear old entries and store only remaining pending
s.pendingForcedInclusionTxs.Range(func(key, value any) bool {
    s.pendingForcedInclusionTxs.Delete(key)
    return true
})
for _, pending := range remainingPending {
    s.pendingForcedInclusionTxs.Store(pending.TxHash, pending)
}

Analysis: sync.Map operations are atomic individually, but this clear-then-repopulate pattern is not atomic as a whole. Another goroutine could observe empty map mid-operation.

Recommendation: Build new sync.Map and swap atomically:

newMap := &sync.Map{}
for _, pending := range remainingPending {
    newMap.Store(pending.TxHash, pending)
}
s.pendingForcedInclusionTxs = *newMap  // But sync.Map is not pointer, so this won't work

Actually, given the access pattern (only modified in processHeightEvent which is sequential), this might be safe. Needs verification.
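If it turns out not to be safe, one way around the limitation noted above is to hold the map behind an atomic pointer (sketch only, assumes Go 1.19+ atomic.Pointer):

// Sketch: swap the whole pending map atomically so readers never observe a
// half-cleared state.
var pendingForcedInclusionTxs atomic.Pointer[sync.Map]

func replacePending(remaining []pendingForcedInclusionTx) {
    next := &sync.Map{}
    for _, p := range remaining {
        next.Store(p.TxHash, p)
    }
    pendingForcedInclusionTxs.Store(next) // readers see the old or new map, never a partial one
}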


9. Configuration & Deployment

Issues:

9.1 Feature Flag

Feature is currently disabled until follow-up work (per PR comments).

Good: Conservative approach for unfinished work.

Recommendation: Add explicit feature flag in config:

EnableForcedInclusion bool

9.2 Default Values

Location: pkg/genesis/genesis.go:39

DAEpochForcedInclusion: 50, // Default epoch size

Question: Is 50 a good default for all chains? This should depend on:

  • DA block time
  • Expected transaction throughput
  • Memory constraints

Recommendation: Document rationale for default and provide guidance on tuning.


10. Follow-Up Work (Per PR Comments)

The PR description lists several items for follow-ups:

  1. Rename apps folder structure - Noted for post-merge
  2. ⚠️ Extend execution interface - Required to avoid congestion (losing transactions)
  3. ⚠️ Async DA fetching - Performance improvement
  4. Commands/tooling - User-facing tooling for forced inclusion submission

Concern: Item #2 (execution interface extension) seems critical for correctness, not just an optimization. Should this block the merge?


Summary & Recommendations

Must Fix Before Merge 🔴

  1. Pending transaction persistence (queue.go:88) - Data loss risk
  2. Resource exhaustion protection - Add memory limits for pending queues
  3. Based sequencer timestamp (based/sequencer.go:127) - Non-deterministic
  4. Test coverage - Increase to 85%+, add E2E tests

Should Fix Before Merge ⚠️

  1. Error handling completeness - Return verification errors properly
  2. Context cancellation in DA loops - Shutdown grace
  3. Epoch boundary edge cases - epoch.go:36-51
  4. Operational documentation - Runbooks and monitoring

Nice to Have (Can Follow Up) ✅

  1. Parallel DA epoch fetching
  2. Performance benchmarks
  3. Load testing with large forced inclusion volumes
  4. Memory profiling and optimization

Final Verdict

This is high-quality work that implements a complex distributed systems feature with good architectural decisions. The epoch-based design and transaction smoothing show strong systems thinking.

However, several critical issues must be addressed before production use:

  • Data loss risk in queue prepend
  • Resource exhaustion vulnerabilities
  • Non-deterministic behavior in based sequencer
  • Test coverage gaps

Recommendation: REQUEST CHANGES - Address critical issues above, then this will be ready to merge with remaining items as documented follow-ups.


Estimated effort to address critical issues: 1-2 days
Risk level after fixes: LOW-MEDIUM (follow-ups should be prioritized)


@julienrbrt julienrbrt changed the title from "[WIP] feat: forced inclusion for executor" to "feat: forced inclusion" on Nov 5, 2025
Contributor

@alpe alpe left a comment


Very nice start!
Can you elaborate on why you decided on a separate namespace for the forced inclusion TXs? The consumers have to read both namespaces anyway to stay up to date.


event.StartDaHeight = epochHeight
event.Txs = append(event.Txs, result.Data...)
}
Contributor

We need to prepare for malicious content. Let's exit the loop early when a tx size threshold is reached. This can be a multiple of common.DefaultMaxBlobSize used by the executor.

Member Author

Makes sense for the height check, yes! However, I was thinking of doing no other checks and letting the execution client deal with gibberish data (this is why I added that as a requirement in the execution interface description).

Contributor

If we want to keep raw TX data in the namespace, there is not much we can do here to validate, indeed. A size check is an easy win but more would require extending the executor interface for a checkTX.

Member Author

I agree, and this may actually be required to avoid congestion issues and losing txs.

@julienrbrt
Member Author

Can you elaborate why you decided on a separate namespace for the force inclusion TX? The consumers have to read both ns anyway to stay up to date

This was a suggestion. Personally I think it makes sense, as we are filtering what comes up in that namespace at the fetching level directly in ev-node. What is posted in the forced inclusion namespace is handled directly by the execution client; ev-node only passes down the bytes.

@julienrbrt julienrbrt marked this pull request as ready for review November 6, 2025 20:46
@julienrbrt julienrbrt marked this pull request as draft November 6, 2025 20:47
@github-actions
Contributor

github-actions bot commented Nov 10, 2025

PR Preview Action v1.6.3

🚀 View preview at
https://evstack.github.io/docs-preview/pr-2797/

Built to branch main at 2025-12-02 16:45 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 78.66667% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.35%. Comparing base (474818d) to head (32c089e).

Files with missing lines                          Patch %   Lines
sequencers/single/sequencer.go                    72.59%    28 Missing and 9 partials ⚠️
block/internal/da/forced_inclusion_retriever.go   79.26%    12 Missing and 5 partials ⚠️
block/internal/executing/executor.go              43.33%    11 Missing and 6 partials ⚠️
sequencers/based/sequencer.go                     80.00%    11 Missing and 5 partials ⚠️
block/internal/syncing/syncer.go                  90.72%    8 Missing and 1 partial ⚠️
block/components.go                               0.00%     6 Missing and 1 partial ⚠️
core/sequencer/dummy.go                           0.00%     3 Missing ⚠️
block/public.go                                   77.77%    2 Missing ⚠️
pkg/config/config.go                              81.81%    1 Missing and 1 partial ⚠️
pkg/genesis/genesis.go                            75.00%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2797      +/-   ##
==========================================
+ Coverage   64.63%   65.35%   +0.72%     
==========================================
  Files          81       85       +4     
  Lines        7382     7852     +470     
==========================================
+ Hits         4771     5132     +361     
- Misses       2069     2154      +85     
- Partials      542      566      +24     
Flag       Coverage Δ
combined   65.35% <78.66%> (+0.72%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@julienrbrt julienrbrt marked this pull request as ready for review November 10, 2025 16:14
@github-actions
Contributor

github-actions bot commented Nov 10, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Dec 2, 2025, 4:45 PM

@julienrbrt
Member Author

List of improvements to do in follow-ups:

  1. Improve DA fetching by parallelizing epoch fetching
  2. Simplify DA requests after [EPIC] Remove DA Interface #2796: fetch the latest DA height instead of checking epoch boundaries
  3. Solve the edge case where the proposer misses blocks and comes back online after forced inclusion blocks have been published

@julienrbrt julienrbrt marked this pull request as draft November 10, 2025 16:19
@julienrbrt
Member Author

julienrbrt commented Nov 11, 2025

We discussed the above in the standup (#2797 (comment)), and a few ideas came up.

1-2. When making the call async, we need to make sure the executor and full node stay in sync within an epoch. This can be done easily by making an epoch lag a few blocks behind the actual DA height.

  • We need to make sure all heights of that epoch are available when we fetch the epoch (there is already code for this)
  • We need to scale that block window based on an average fetching time (the larger the DA epoch, the larger the window)
  1. We can re-use some code from [WIP] HA failover #2814 to automate node restarting (syncing -> based sequencer)
    • When the sequencer comes back online and missed an epoch, it needs to sync up until the head of the da layer
    • Based sequencers must check the forced inclusion transaction namespace (@julienrbrt -- I picked this solution, otherwise it would need to fetch 2 namespaces instead of 1; the alternative is to have the sequencer fetch the header namespace only at the end of the epoch) for a synced checkpoint from the DA layer, and restart as a sync node if one is found.

@julienrbrt julienrbrt marked this pull request as ready for review November 11, 2025 16:29
@julienrbrt julienrbrt marked this pull request as draft November 11, 2025 16:58
Contributor

@alpe alpe left a comment


Thanks for answering all my questions and comments.
There is still the TODO in the code to store unprocessed direct TXs when the max block size is reached.



julienrbrt added a commit that referenced this pull request Nov 13, 2025
we decided to remove the sequencer go.mod, as ev-node can provide
directly the sequencer implementation (sequencers/single was already
depending on ev-node anyway)

this means no go.mod need to be added for the new based sequencers in
#2797
@julienrbrt julienrbrt marked this pull request as ready for review November 13, 2025 10:58
@julienrbrt
Member Author

Once this PR is merged, we should directly after:

In the meantime, I have disabled the feature so it can be merged (0d790ef)

@julienrbrt
Member Author

FYI the upgrade test will fail until tastora is updated.

tac0turtle
tac0turtle previously approved these changes Dec 1, 2025
Contributor

@tac0turtle tac0turtle left a comment


utACK

Contributor

@alpe alpe left a comment


Nice work.
I am a bit uneasy about the epoch window for a forced inclusion TX to land in a block.

// verifyForcedInclusionTxs verifies that forced inclusion transactions from DA are properly handled.
// Note: Due to block size constraints (MaxBytes), sequencers may defer forced inclusion transactions
// to future blocks (smoothing). This is legitimate behavior within an epoch.
// However, ALL forced inclusion txs from an epoch MUST be included before the next epoch begins.
Contributor

"... MUST be included before the next epoch begins."

This can become a bigger problem when the TXs in DA for an epoch exceed the max capacity of the chain for that period. The later DA blocks of an epoch are especially vulnerable as the time window shrinks. A malicious actor could abuse this.

Same problem would exist, when the DA api is not available for some time so that the sequencer can not pull FI TX.

We are building this with the assumption that the ev-node chain moves faster than DA and that DA is always available. What if this is not the case? There can be planned or unplanned downtime on both sides.
The system should be more resilient to recover from this. This could be a bigger time window, for example.

Int("current_size", currentSize).
Msg("completed processing forced inclusion transactions")

return validatedTxs
Contributor

Not sure if it makes sense in this method but with the pendingForcedInclusionTxs, you could detect that the sequencer missed the window for a FI TX and escalate.

bq.queue = append([]coresequencer.Batch{batch}, bq.queue...)
}

return nil
Contributor

Unlike AddBatch, the value is not persisted in the DB. You may also need to check for the max size of the queue.
Looking at the current impl, it seems to me that the queue does not preserve the order on a re-load. Just mentioning this.

Member Author

@julienrbrt julienrbrt Dec 1, 2025


That is true; this is on my follow-up list and described in a comment. I'll make it clearer.

Contributor

@alpe alpe left a comment


It would be great if you could follow up on:

Fixes could go into a new PR, too


return &coresequencer.GetNextBatchResponse{
Batch: batch,
Timestamp: time.Now(),
Contributor

This is not deterministic for all nodes.

Member Author

This isn't really an issue, as every node is the sequencer.

Contributor

This timestamp is used for the headerTime of the next block, which will lead to a different hash for the block. The other thing is that app logic on the chain may use this value in its decision tree or store it. State could diverge across nodes, which makes it hard to recover later.

Member Author

I see. Then we need to use the time of the DA block, as the block production time of a based sequencer can never be in sync across all nodes.


@julienrbrt
Member Author

julienrbrt commented Dec 1, 2025

Some of the changes were going to be tackled as follow-ups (the congestion issue, async fetching, commands) as it was getting hard to review this. This is why the feature cannot be enabled yet. There's still code missing in the execution client as well to get it all working.

I'll check the other comments.

@julienrbrt
Member Author

To recap everything that needs to happen in follow-ups:

Most of them are small and contained.

