FMA: OP-Supervisor #233

axelKingsley · 2025-03-26T21:02:52Z

Failure Mode Analysis for op-supervisor, a new consensus-critical piece of code for the op-stack.

Copilot

Pull Request Overview

This PR introduces a new markdown document that details the failure modes and recovery path analysis for the OP-Supervisor component. It outlines interop protocol behavior and provides in-depth discussions of multiple failure modes (FM1a–FM6), including their descriptions, risk assessments, and mitigation strategies.

security/fma-supervisor.md

protolambda

Looks good. Commented on a few of the items, but no blockers.

security/fma-supervisor.md

protolambda · 2025-03-28T14:42:18Z

security/fma-supervisor.md

+- Mitigations
+    - The Sequencer could detect cross-unsafe head stall and issue a reorg on the unsafe chain in order to avoid invalid L1 inclusions. Depending on the heuristic used, this could create regular unsafe reorgs with low threshold, or larger, less common ones. This also saves operators from wasted L1 fees when a batch would contain unwanted data.
+    - When promoting local-unsafe to cross-unsafe, the Supervisor can additionally detect if the data it is stalled on is already cross-safe or not. If it is, it can proactively notify the Sequencer that the current chain is hopeless to be valid, creating a more eager reorg point.
+    - The Batcher can decline to post beyond the current cross-unsafe head. This will avoid the publishing of bad data so the sequencer may reorg it out, saving the replacement based reorg. If it went on long enough, the Batcher would prevent any new data from being posted to L1, effectively creating a safe-head stall until the sequencer resolved the issue. This *could* be a preferred scenario for some chains.


Safety over liveness, I think this is valid to do by default.

That said, we should invest more in the syncing of the unsafe chain, so that low-latency batch-submission is not as important for stability. And the longer we defer batch-submission, the larger the gap in L1 fees and actual batch-costs may be.

security/fma-supervisor.md

tynes · 2025-04-02T21:20:53Z

security/fma-supervisor.md

+    - The Batcher can decline to post beyond the current cross-unsafe head. This will avoid the publishing of bad data so the sequencer may reorg it out, saving the replacement based reorg. If it went on long enough, the Batcher would prevent any new data from being posted to L1, effectively creating a safe-head stall until the sequencer resolved the issue. This *could* be a preferred scenario for some chains.
+    - We need to develop and use production-realistic networks in large scale testing to exercise failure cases and get confidence that the system behaves and recovers as expected.
+
+## FM2b: Supervisor Doesn’t Catch Block Invalidation of Safe Data


We could come up with a plan to roll back the chain, but this would break exchange integrations. This is critical severity and the most risky part of interop

security/fma-supervisor.md

K-Ho · 2025-04-02T21:20:06Z

security/fma-supervisor.md

+    - When the Supervisor attempts to promote the local-safe data to cross-safe, it discovers the invalid message and issues an invalidation and replacement.
+    - The Replacement block is applied to the block which contained the invalid message, and the chain has now reorg’d out all blocks from the invalid message to the safe head (effectively resetting the chain back to the stalled cross-unsafe head).
+- Risk Assessment
+    - Medium Impact, Medium Likelihood.


If a third party bridge or centralized exchange trusts unsafe interop messages, this is a risk. Dangerous scenario:

Chain A includes initiating message for "move 1000 ETH to Chain B"

Chain B sees the unsafe initiating message

Executing message to mint 1000 ETH on Chain B is executed

Attacker moves 1000 ETH out to L1 via a third party bridge or CEX which just trusts the unsafe head

An unsafe chain reorg happens on Chain A, and the initiating message is reorged out (this could happen maliciously, or even via just an L1 reorg)

Attacker has stolen 1000 ETH

Mitigation: Ensure that any applications which enabling moving substantial value out of the Superchain understand this risk, and ideally require making sure the chain is safe

Minor clarification - an L1 reorg is actually not a vector for this, because the unsafe chain isn't based in any L1. And from the other perspective, any L1 reorg which changes initiating messages must also change the executing messages downstream from it. So L1 reorgs apply pretty transparently to the interop protocol.

K-Ho · 2025-04-02T21:29:26Z

security/fma-supervisor.md

+    - Any validators who *do not* rely on the failing Supervisor will see the correct chain, but there are currently no alternative implementations to use.
+    - An output root posted from this incorrect state would be open to be fault-proven.
+- Risk Assessment
+    - Even Higher Impact, Low Likelihood.


Clarifying from the FMA review, the reason this is low likelihood, while 2a is medium likelihood is because 2a could happen even if there are no bugs (e.g. L1 has a major reorg) whereas 2b should never happen unless there is a bug in the supervisor.

This is the most critical impact of any of these. This can allow a user to mint uncapped amounts of ETH, and quickly exit that ETH from the superchain via CEXes, third party bridges, etc.
These bridges/cexes look at the safe head, and in this scenario, their checks would pass and enable these withdrawals.
To figure out:

How much value exited out of the superchain would this be over how much time?

How would we even respond if this happened? somehow halt all chains? pause L2->L1 withdrawals? How would we notify all third party bridges/cexes?
^ also assuming this is a huge amount of $, what do we need to do to properly mitigate this risk (alternative supervisor implementation, audit competiton, etc.?)

security/fma-supervisor.md

smartcontracts · 2025-04-15T14:54:23Z

security/fma-supervisor.md

+        - This pause mechanism should be triggerable via admin API
+        - And also should be responsive to automated Alert/Monitoring based triggers.
+        - Once paused, unpause should be manual to start with.
+    - Optionally: Implement a matching Batcher pause to avoid publishing invalid blocks in batches where avoidable.


We need to make a determination on this, possible it will be required, but need to analyze further. Approving for now, it gets the gist across.

FMA: OP-Supervisor

4214383

tynes added the H-interop label Mar 27, 2025

tynes added this to the Interop RC Beta milestone Mar 27, 2025

tynes requested a review from Copilot March 27, 2025 18:42

Copilot AI reviewed Mar 27, 2025

View reviewed changes

security/fma-supervisor.md Outdated Show resolved Hide resolved

security/fma-supervisor.md Outdated Show resolved Hide resolved

security/fma-supervisor.md Outdated Show resolved Hide resolved

security/fma-supervisor.md Outdated Show resolved Hide resolved

tynes linked an issue Mar 27, 2025 that may be closed by this pull request

Interop op-supervisor FMA #225

Closed

tynes mentioned this pull request Mar 27, 2025

Interop op-supervisor FMA #225

Closed

protolambda reviewed Mar 28, 2025

View reviewed changes

FMA: OP-Supervisor

a6c13c8

tynes mentioned this pull request Mar 31, 2025

[Tracker]: Add impl redundancy to superchain mempool ingress checks ethereum-optimism/optimism#15128

Closed

teddyknox reviewed Mar 31, 2025

View reviewed changes

security/fma-supervisor.md Outdated Show resolved Hide resolved

teddyknox reviewed Mar 31, 2025

View reviewed changes

security/fma-supervisor.md Outdated Show resolved Hide resolved

axelKingsley added 2 commits March 31, 2025 14:59

add FM7

00bdbe5

Address comments

44e79d5

axelKingsley marked this pull request as ready for review April 1, 2025 20:28

tynes reviewed Apr 2, 2025

View reviewed changes

security/fma-supervisor.md Outdated Show resolved Hide resolved

tynes reviewed Apr 2, 2025

View reviewed changes

security/fma-supervisor.md Show resolved Hide resolved

tynes reviewed Apr 2, 2025

View reviewed changes

K-Ho reviewed Apr 2, 2025

View reviewed changes

tynes reviewed Apr 2, 2025

View reviewed changes

security/fma-supervisor.md Show resolved Hide resolved

Address Feedback

0a1070c

smartcontracts reviewed Apr 14, 2025

View reviewed changes

security/fma-supervisor.md Show resolved Hide resolved

security/fma-supervisor.md Show resolved Hide resolved

security/fma-supervisor.md Show resolved Hide resolved

Add Action Items

ea4eb80

smartcontracts reviewed Apr 15, 2025

View reviewed changes

smartcontracts approved these changes Apr 15, 2025

View reviewed changes

axelKingsley merged commit 4053328 into main Apr 15, 2025

axelKingsley deleted the fma-supervisor branch April 15, 2025 16:09

FMA: OP-Supervisor #233

FMA: OP-Supervisor #233

Uh oh!

Conversation

axelKingsley commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

protolambda left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

protolambda Mar 28, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tynes Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

K-Ho Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

axelKingsley Apr 11, 2025

Choose a reason for hiding this comment

Uh oh!

K-Ho Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

smartcontracts Apr 15, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

axelKingsley commented Mar 26, 2025 •

edited

Loading

K-Ho Apr 2, 2025 •

edited

Loading