feat: forced inclusion #2797
Conversation
The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).
Claude finished @julienrbrt's task

Comprehensive Code Review

I've completed a thorough review of the forced inclusion mechanism PR. This is a substantial feature that adds censorship resistance and liveness guarantees to the rollup framework. Here's my detailed feedback:

🎯 Overall Assessment

Summary: This is a well-architected feature with a solid implementation. The code demonstrates a good understanding of concurrent systems, proper error-handling patterns, and thoughtful design around epoch boundaries and transaction smoothing. However, there are several areas requiring attention before production readiness.

Recommendation: Request changes, primarily around resource management, error-handling edge cases, and test coverage gaps.

1. Code Quality & Architecture ✅

Strengths:

Issues:

1.1 Epoch Calculation Edge Case
alpe left a comment:
Very nice start!
Can you elaborate on why you decided on a separate namespace for the forced inclusion TXs? Consumers have to read both namespaces anyway to stay up to date.
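A minimal sketch of the dual-namespace read being referred to; the getter signature and names here are illustrative, not the actual DA client API:

```go
// fetchBothNamespaces illustrates the cost in question: with a separate
// forced-inclusion namespace, every consumer must query two namespaces
// per DA height to stay up to date.
func fetchBothNamespaces(
	get func(height uint64, namespace []byte) ([][]byte, error),
	height uint64,
	batchNS, forcedInclusionNS []byte,
) ([][]byte, error) {
	var blobs [][]byte
	for _, ns := range [][]byte{batchNS, forcedInclusionNS} {
		res, err := get(height, ns)
		if err != nil {
			return nil, err
		}
		blobs = append(blobs, res...)
	}
	return blobs, nil
}
```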
```go
event.StartDaHeight = epochHeight
event.Txs = append(event.Txs, result.Data...)
}
```
We need to prepare for malicious content. Let's exit the loop early when a tx size threshold is reached. This can be a multiple of common.DefaultMaxBlobSize used by the executor.
Makes sense for the height check, yes! However, I was thinking of doing no other checks and letting the execution client deal with gibberish data (this is why I added that as a requirement in the execution interface description).
If we want to keep raw TX data in the namespace, there is not much we can do here to validate, indeed. A size check is an easy win, but more would require extending the executor interface with a checkTX.
I agree, and this actually may be required to avoid congestion issues and losing txs.
This was a suggestion. Personally I think it makes sense, as we are filtering what's coming up in that namespace at the fetching level, directly in ev-node. What is posted in the forced inclusion namespace is handled directly by the execution client; ev-node only passes down bytes.
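A minimal sketch of the size-threshold early exit discussed in this thread; the function name, the budget parameter, and the defer-the-rest behavior are assumptions, not the merged implementation:

```go
// capForcedInclusionTxs stops consuming forced-inclusion txs once a byte
// budget (e.g. a multiple of common.DefaultMaxBlobSize) is exhausted, and
// hands the remainder back so it can be deferred rather than dropped.
// This bounds memory use against maliciously large namespace content.
func capForcedInclusionTxs(txs [][]byte, maxEpochBytes int) (kept, deferred [][]byte) {
	consumed := 0
	for i, tx := range txs {
		if consumed+len(tx) > maxEpochBytes {
			return kept, txs[i:] // exit early; defer the rest
		}
		kept = append(kept, tx)
		consumed += len(tx)
	}
	return kept, nil
}
```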
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2797      +/-   ##
==========================================
+ Coverage   64.63%   65.35%   +0.72%
==========================================
  Files          81       85       +4
  Lines        7382     7852     +470
==========================================
+ Hits         4771     5132     +361
- Misses       2069     2154      +85
- Partials      542      566      +24
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
List of improvements to do in follow-ups:
We discussed the above in the standup (#2797 (comment)), and a few ideas came up for items 1–2. When making the call async, we need to make sure the executor and full node stay in sync with an epoch. This can be done easily by making an epoch a few blocks behind the actual DA height.
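A minimal sketch of that lagged-epoch idea; epochLag, startHeight, and epochSize are assumed parameters, not the actual implementation:

```go
// epochLag is an assumed number of DA blocks to lag behind the tip, so
// that the executor and full nodes derive the same epoch boundaries even
// while fetching asynchronously.
const epochLag = 3

// epochForDAHeight maps a DA height to an epoch number, based on the
// lagged height. epochSize is assumed to be non-zero.
func epochForDAHeight(daHeight, startHeight, epochSize uint64) uint64 {
	if daHeight < startHeight+epochLag {
		return 0
	}
	return (daHeight - epochLag - startHeight) / epochSize
}
```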
alpe left a comment:
Thanks for answering all my questions and comments.
There is still the TODO in the code to store unprocessed direct TXs when the max block size is reached.
We decided to remove the sequencer go.mod, as ev-node can directly provide the sequencer implementation (sequencers/single was already depending on ev-node anyway). This means no go.mod needs to be added for the new based sequencer in #2797.
Once this PR is merged, we should directly after:
In the meantime, I have disabled the feature so it can be merged (0d790ef).
FYI the upgrade test will fail until tastora is updated.
tac0turtle left a comment:
utACK
alpe left a comment:
Nice work.
I am a bit uneasy about the epoch window for forced inclusion TXs to land in a block.
```go
// verifyForcedInclusionTxs verifies that forced inclusion transactions from DA are properly handled.
// Note: Due to block size constraints (MaxBytes), sequencers may defer forced inclusion transactions
// to future blocks (smoothing). This is legitimate behavior within an epoch.
// However, ALL forced inclusion txs from an epoch MUST be included before the next epoch begins.
```
"... MUST be included before the next epoch begins."
This can be a bigger problem, when the TXs in DA for an epoch exceed the max capacity of the chain for this period. Especially the later DA blocks of an epoch are vulnerable as the time window shrinks. A malicious actor could abuse this.
Same problem would exist, when the DA api is not available for some time so that the sequencer can not pull FI TX.
We are building this with the assumption that the ev-node chain moves faster than DA and that DA is always available. What if this is not the case? There can be planned or unplanned downtime on both sides.
The system should to be more resilient to recover from this.
This can be a bigger time window for example.
| Int("current_size", currentSize). | ||
| Msg("completed processing forced inclusion transactions") | ||
|
|
||
| return validatedTxs |
Not sure if it makes sense in this method, but with pendingForcedInclusionTxs you could detect that the sequencer missed the window for an FI TX and escalate.
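A minimal sketch of that escalation idea; the pendingForcedTx shape and the epoch-end bookkeeping are assumptions:

```go
// pendingForcedTx tracks a forced-inclusion tx together with the last DA
// height at which including it would still satisfy the epoch invariant.
type pendingForcedTx struct {
	Tx         []byte
	EpochEndDA uint64 // assumed: last DA height of the tx's epoch window
}

// detectMissedWindow returns the pending txs whose window has already
// closed, so the caller can escalate (alert, halt, or mark the node
// unhealthy) instead of silently carrying them forward.
func detectMissedWindow(pending []pendingForcedTx, currentDAHeight uint64) []pendingForcedTx {
	var missed []pendingForcedTx
	for _, p := range pending {
		if p.EpochEndDA < currentDAHeight {
			missed = append(missed, p)
		}
	}
	return missed
}
```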
```go
bq.queue = append([]coresequencer.Batch{batch}, bq.queue...)
}
// ...
return nil
```
Unlike AddBatch, the value is not persisted in the DB. You may also need to check the max size of the queue.
Looking at the current impl, it seems to me that the queue does not preserve order on a reload. Just mentioning this.
That is true, and this is on my follow-up list and described as a comment. I'll make it clearer.
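For the follow-up, a minimal sketch of a persisted, size-bounded prepend; maxQueueSize, ErrQueueFull, and bq.store.SaveBatch are assumed names, not the real API:

```go
const maxQueueSize = 1024 // assumed bound on in-memory batches

var ErrQueueFull = errors.New("batch queue full")

// prependBatch sketches an AddBatch-style prepend: persist before mutating
// the in-memory queue so a crash cannot lose the re-queued batch, and
// enforce a size bound so re-queued batches cannot grow unbounded.
func (bq *BatchQueue) prependBatch(batch coresequencer.Batch) error {
	if len(bq.queue) >= maxQueueSize {
		return ErrQueueFull
	}
	if err := bq.store.SaveBatch(batch); err != nil { // assumed persistence hook
		return err
	}
	bq.queue = append([]coresequencer.Batch{batch}, bq.queue...)
	return nil
}
```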
alpe left a comment:
It would be great if you could follow up on:
Fixes could go into a new PR, too.
sequencers/based/sequencer.go (outdated):
```go
return &coresequencer.GetNextBatchResponse{
	Batch:     batch,
	Timestamp: time.Now(),
```
This is not deterministic for all nodes
This isn't really an issue, as every node is the sequencer.
This timestamp is used for the headerTime of the next block. This will lead to a different hash for the block. The other thing is that app logic on the chain may use this value in its decision tree or store it. State could diverge across nodes, which makes it hard to recover later.
I see. Then we need to use the time of the DA block, as the block production time of a based sequencer can never be in sync across all nodes.
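A minimal sketch of the fix being suggested, assuming the DA block time is plumbed through to the sequencer:

```go
// nextBatchResponse derives the batch timestamp from the DA block the txs
// were fetched from, so every based-sequencer node computes the same
// headerTime, instead of each node using its local time.Now().
func nextBatchResponse(batch coresequencer.Batch, daBlockTime time.Time) *coresequencer.GetNextBatchResponse {
	return &coresequencer.GetNextBatchResponse{
		Batch:     batch,
		Timestamp: daBlockTime, // deterministic across nodes
	}
}
```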
Some of the changes are going to be tackled as follow-ups (the congestion issue, async fetching, commands), as it was getting hard to review this. This is why the feature cannot be enabled yet. There's still code missing in the execution client as well to get it all working. I'll check the other comments.
To recap everything that needs to happen in follow-ups:
Most of them are small and contained.
ref: #1914
A choice has been made to put this logic in the executor and avoid extending the reaper and the sequencer.
This is because updating the reaper means passing down the last fetched DA height across all components, which adds a lot of complexity. Adding it in the sequencer may be preferable, but that makes inclusion in a sync node less straightforward. This is what is being investigated.
Compared to the previous implementation, a forced transaction does not have any structure: it should be the raw transaction bytes from the execution client. This keeps ev-node knowing nothing about the transaction. No signature checks, no validation of correctness. The execution client must make sure to reject gibberish transactions.
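A hypothetical illustration of that contract; the interface name and signature are illustrative, not ev-node's actual executor API:

```go
// ExecutionClient illustrates the division of responsibility described
// above: ev-node forwards forced txs as opaque bytes and performs no
// decoding, signature checks, or structural validation before this call.
type ExecutionClient interface {
	// ExecuteTxs must tolerate arbitrary byte blobs and reject gibberish
	// itself, since ev-node passes namespace content down unfiltered.
	ExecuteTxs(rawTxs [][]byte) error
}
```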
---- for later, won't be included in this pr (ref #2797 (comment))