fix: warn if OperatorAdded events not in sync #625

petarjuki7 · 2025-09-22T13:15:19Z

Issue Addressed

Adresses issue #519

Proposed Changes

We keep track of all OperatorAdded events in a new variable stored in the database (max_operator_id_seen).
Then, when we get a new OperatorAdded event, we can check does it match the expected OperatorId we should have seen next.

Also, during historical sync we check if we saw any OperatorAdded event, if not, there is likely a sync error.

Copilot

Pull Request Overview

Adds tracking and validation for OperatorAdded events to detect synchronization issues by monitoring sequence gaps and missing events during historical sync.

Introduces max_operator_id_seen field to track the highest operator ID encountered
Validates that OperatorAdded events arrive in sequential order during live sync
Warns if no OperatorAdded events are found during historical synchronization

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
anchor/eth/src/event_processor.rs	Adds validation logic for OperatorAdded event sequence and missing events detection
anchor/database/src/table_schema.sql	Adds max_operator_id_seen column to metadata table
anchor/database/src/state.rs	Implements getter for max_operator_id_seen from database and state
anchor/database/src/sql_operations.rs	Defines SQL queries for retrieving and updating max_operator_id_seen
anchor/database/src/lib.rs	Adds max_operator_id_seen field to SingleState and bump operation

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

anchor/eth/src/event_processor.rs

dknopik

Nice! Here are my thoughts.

If we decide on not persisting the max operator id seen, I think the implementation might be straight forward enough to warrant inclusion in v1.0.0. wdyt?

dknopik · 2025-09-22T13:52:15Z

anchor/database/src/table_schema.sql

    domain_type INTEGER NOT NULL,
-    block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0)
+    block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0),
+    max_operator_id_seen INTEGER NOT NULL DEFAULT 0


For simplicity, I am not sure if we should persist this. It might be sufficient to keep this as an Option<OperatorId> during sync, initialized on the first seen operatorAdded, and checked subsequently during this anchor instances lifetime

I was thinking about persistence also... I think the problem may be with the historical sync because, if I understood the code correctly, after we do it for the first time we persist the block data, and on the next startup of the anchor instance we only sync from the last_processed_block, but our operator_seen count would start from 0 which could cause inconsistencies.

That is correct. This is why I suggest using an last_operator_add_seen: Option<OperatorId>, initialized to None. To avoid the inconsistency you mentioned, we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_seen. Then we can do the check when receiving more operatorAdded.

The downside is that we do not catch missing operatorsAdded between the last run and the first operatorAdded we get in the current run. I am on the fence whether that is alright.

Ah okay, I understand.
The downside might be fine, because it if it's truly not catching the events, we will find out soon enough?

we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_see

That's the crux of the issue, isn't it? A pruned client could have deleted the state containing some initial operators

anchor/eth/src/event_processor.rs

diegomrsantos · 2025-09-23T16:20:06Z

The current approach might not be enough. Does the operator id start at zero or one and then continue increasing by 1? If so, we need to guarantee that we have seen all of them up to the current value in the contract.

petarjuki7 · 2025-09-23T20:07:19Z

we need to guarantee that we have seen all of them up to the current value in the contract.

I think that was the downside discussed about this approach when we don't persist the max_operator_id_seen variable. We kind of "trust" the first OperatorAdded event we see will initialize the variable with the highest OperatorId.

I think the purpose of this PR is more to warn users that something with syncing is wrong so they configure their EL client correctly.

But I could implement a check that goes through our saved OperatorIds in the database and checks if they are monotonically increasing by 1 up until the max_seen?

diegomrsantos · 2025-09-24T08:52:06Z

What I described is a suggestion about how to check if Anchor has synced correctly starting with an empty DB. Additionally, we also need to persist the highest operator id we have seen to be able to do the check when we start syncing after being offline.

diegomrsantos · 2025-09-24T10:18:45Z

I think we do have to persist the highest_operator_id_seen. It will be created initially with the first value used by the contract.

But I could implement a check that goes through our saved OperatorIds in the database and checks if they are monotonically increasing by 1

This needs to be done during sync, starting with the current value in the DB.

diegomrsantos · 2025-09-24T10:23:30Z

And before we start syncing, we could read the last operator id from the ssv contract and check that see all ids up to this value. But not sure it's necessary

petarjuki7 · 2025-09-24T10:27:31Z

Yeah, I understand the idea, can do it.
Just tagging @dknopik to confirm, if I remember correctly we decided on the non-persistence path yesterday.

dknopik · 2025-09-24T10:30:27Z

@diegomrsantos In yesterdays meeting we discussed that we can also not persist the highest_operator_id_seen and initialize it with the first operator ID seen per sync. This would be sufficient to catch any missing operatorAdded for the lifetime of that instance.

In the meeting, I asked if there is agreement to do it this way for now, as it is a tradeoff for not having to modify the database that closely to the release. While it is of course okay to change one's mind, I believe the simpler option is better for a short term change

dknopik · 2025-09-24T10:30:49Z

@petarjuki7 please rebase onto release-v1.0.0

dknopik · 2025-09-24T13:54:17Z

anchor/eth/src/event_processor.rs

+                but got operator {operator_id}."
+                );
+            }
+            self.db.bump_max_operator_id_seen();


I think it would be a bit cleaner to just have set_max_operator_id_seen

anchor/eth/src/sync.rs

anchor/database/src/sql_operations.rs

anchor/database/src/table_schema.sql

dknopik · 2025-10-30T10:48:21Z

anchor/eth/src/event_processor.rs

+            error!(
+                "Missing OperatorAdded events: database has only seen up to id {max_seen}, \
+                but got operator {operator_id}."
+            );


We should discuss whether we want to keep this a log-only soft error or return an error here to signal that the sync has failed. I think personally, I prefer the latter

The problem I encountered while testing is that by returning an Error(...) type, it gets handled in the process_logs() function, and if its a historical sync it only gets logged as a trace. So it's practically invisible without running the client with RUST_LOG = trace or similar, its not really a "loud warning" as stated in the issue description.
Maybe keep the error!() log and also return an Error() ?

anchor/database/src/table_schema.sql

dknopik

looks good!

anchor/database/src/table_schema.sql

diegomrsantos · 2025-11-12T17:00:27Z

@claude review this PR. Is it worth moving forward with this approach, given https://eips.ethereum.org/EIPS/eip-4444 and the fact that we are currently investigating a solution similar to Ethereum's checkpoint sync to stop relying on all logs existing in execution clients?

claude-code-actions-sigp · 2025-11-12T17:00:47Z

Claude finished @diegomrsantos's task —— View job

Review Progress

Read and analyze changed files
Review code quality and correctness
Assess strategic value given EIP-4444 and checkpoint sync
Provide comprehensive feedback

Code Review

I've reviewed the PR implementation. Here's my assessment:

Implementation Quality

Positives:

Clean migration path: The schema migration from v1 to v2 is properly implemented with the MIGRATION_V1_TO_V2 script (anchor/database/src/schema.rs:16-19)
Good test coverage: The migration test in anchor/database/src/tests/metadata_tests.rs:172-209 validates both the upgrade process and NULL default behavior
Proper NULL handling: Migrated databases correctly have max_operator_id_seen = NULL to distinguish from fresh databases
Sequential validation: The check at anchor/eth/src/event_processor.rs:194-201 properly validates operator IDs are sequential

Issues Found:

DEFAULT 0 vs NULL inconsistency (anchor/database/src/table_schema.sql:8):
- The schema uses DEFAULT 0 but the migration test expects NULL for migrated databases
- @dknopik's latest comment correctly identifies this: fresh databases should start at 0, but the migration adds the column without a default
- Fix needed: Change line 8 to just max_operator_id_seen INTEGER (no default), then update the migration to explicitly set it to 0 for fresh inserts
Logic error in validation (anchor/eth/src/event_processor.rs:195):
- The check max_seen != operatorId - 1 is correct for detecting gaps
- However, this assumes operator IDs start at 1 and increment by 1
- Question: Does the contract guarantee operator IDs start at 1? If they start at 0, the first check will fail
Missing initialization on first operator (anchor/eth/src/event_processor.rs:193-201):
- When max_seen is None (migrated DB), the code skips validation and just sets the current ID
- This is intentional per the discussion, but means we lose gap detection for the first sync after migration
- This is the accepted trade-off mentioned in comments
Warning placement (anchor/eth/src/sync.rs:514-522):
- The warning only triggers after historical sync completes
- Good placement, but consider: what if operators were added before the contract deployment block being synced from?
- Might want to clarify this is "no operators found since block X"

Strategic Assessment: EIP-4444 and Checkpoint Sync

You asked whether this approach is worth pursuing given EIP-4444 and checkpoint sync plans. Here's my analysis:

Short Answer: This is a valuable transitional solution, but has limited long-term value.

Detailed Reasoning:

EIP-4444 Impact:
- EIP-4444 allows execution clients to prune logs older than ~1 year
- This PR's approach will fail once logs are pruned, because:
  - max_operator_id_seen needs full historical sync to detect gaps
  - If logs are missing, the monotonic check becomes unreliable
  - The warning at sync.rs:521 will trigger constantly
Checkpoint Sync Alternative:
- A checkpoint sync approach would provide:
  - Initial state snapshot (including all operator IDs)
  - Ability to verify completeness against snapshot
  - No dependency on historical logs
- This makes the max_operator_id_seen approach obsolete
Current Value:
- Before EIP-4444 is widely adopted: This provides user value by detecting misconfigured EL clients
- Detection window: Useful for ~1 year after deployment (until pruning kicks in)
- User experience: Helps users diagnose sync issues NOW, when they need it

Recommendation:

Proceed with this PR with the understanding that:

It's a tactical fix for the current problem (issue Loudly warn if logs are not being served #519)
It has a limited lifespan (~1-2 years until EIP-4444 adoption)
It should be documented as temporary with a note about EIP-4444 limitations
Consider adding a migration path or deprecation plan for when checkpoint sync is ready

Alternative Consideration:

If checkpoint sync is coming soon (3-6 months), you might:

Skip this PR and focus efforts on checkpoint sync instead
Add a simpler check (e.g., "warn if <10 operators found") without persistence
Document the limitation in release notes and recommend users verify EL configuration

Trade-offs:

Aspect	This PR	Wait for Checkpoint Sync
User value today	✅ Immediate detection	❌ No help for current issues
Long-term maintenance	⚠️ Will need removal later	✅ Permanent solution
Development effort	⚠️ ~1 day work, now	⚠️ ~weeks of work, later
EIP-4444 compatibility	❌ Breaks when logs pruned	✅ Works with pruned logs

Required Changes

Before merging, address:

Fix DEFAULT vs NULL in table_schema.sql:8 per dknopik's comment
Clarify operator ID starting value - verify the contract starts at 1, not 0
Add documentation about EIP-4444 limitations in code comments
Consider adding an issue reference in comments for future removal when checkpoint sync arrives

Verdict

The implementation is sound for its intended purpose, but its strategic value is time-limited. If you need a solution NOW and checkpoint sync is >6 months away, merge it. If checkpoint sync is imminent, consider simpler alternatives.

petarjuki7 · 2025-11-13T09:44:41Z

Clarify operator ID starting value - verify the contract starts at 1, not 0

It does, but I agree it's a good thing to point out / double check

Good placement, but consider: what if operators were added before the contract deployment block being synced from?

By definition they shouldn't be.

For other points I agree but EIP-4444 doesn't seem to be arriving soon, and there will be a number of users receiving this error (unfortunately) so I think it makes sense to include it into 1.2.0.

jking-aus · 2025-11-14T08:12:18Z

hey just chiming in to say four fours is realistically not happening in the next two forks so we dont need to worry about that. We are however having a consistent problem where users aren't aware their sync is working because of missing receipts. So we need something like this PR -- good work Petar.

petarjuki7 · 2025-12-05T14:10:49Z

Forgot to mention here, I tested the happy case, works fine.
I simulated an error by just adding if operatorId == 15 { early return; }, and it showed errors correctly (picture attached)

petarjuki7 requested review from Zacholme7, diegomrsantos and dknopik September 22, 2025 13:15

petarjuki7 self-assigned this Sep 22, 2025

diegomrsantos requested a review from Copilot September 22, 2025 13:42

Copilot AI reviewed Sep 22, 2025

View reviewed changes

anchor/eth/src/event_processor.rs Show resolved Hide resolved

anchor/eth/src/event_processor.rs Outdated Show resolved Hide resolved

dknopik reviewed Sep 22, 2025

View reviewed changes

petarjuki7 force-pushed the warn_missing_logs branch from f52b3f6 to 8764b86 Compare September 24, 2025 11:57

petarjuki7 changed the base branch from unstable to release-v1.0.0 September 24, 2025 11:58

petarjuki7 requested a review from dknopik September 24, 2025 12:44

dknopik reviewed Sep 24, 2025

View reviewed changes

dknopik changed the base branch from release-v1.0.0 to unstable October 20, 2025 13:20

petarjuki7 added 4 commits October 27, 2025 18:54

fix: warn if OperatorAdded events not in sync

38f81a1

small refactor

a6e1e5a

change of error trigger and handling

629e020

persist max_operator_id_seen

2b04394

petarjuki7 force-pushed the warn_missing_logs branch from 9f5f856 to 2b04394 Compare October 28, 2025 09:52

dknopik reviewed Oct 30, 2025

View reviewed changes

anchor/database/src/sql_operations.rs Outdated Show resolved Hide resolved

dknopik reviewed Oct 30, 2025

View reviewed changes

anchor/database/src/table_schema.sql Outdated Show resolved Hide resolved

dknopik reviewed Oct 30, 2025

View reviewed changes

petarjuki7 and others added 2 commits November 4, 2025 10:53

Update SQL operations

b85d45e

Merge branch 'unstable' into warn_missing_logs

7563b62

Updated SQL schema

285b6d5

petarjuki7 force-pushed the warn_missing_logs branch 2 times, most recently from 4b318bb to 285b6d5 Compare November 7, 2025 13:26

petarjuki7 added 2 commits November 7, 2025 14:31

chore: remove Cargo.lock to sync with unstable

469a955

Merge branch 'unstable' into warn_missing_logs

33c5cff

dknopik reviewed Nov 7, 2025

View reviewed changes

anchor/database/src/table_schema.sql Outdated Show resolved Hide resolved

dknopik added the v1.2.0 label Nov 7, 2025

petarjuki7 and others added 2 commits November 10, 2025 04:56

add database migration

3b22d81

Merge branch 'unstable' into warn_missing_logs

b4d7979

petarjuki7 added the ready-for-review This PR is ready to be reviewed label Nov 10, 2025

petarjuki7 requested a review from dknopik November 10, 2025 04:13

dknopik reviewed Nov 12, 2025

View reviewed changes

anchor/database/src/table_schema.sql Outdated Show resolved Hide resolved

dknopik added waiting-on-author and removed ready-for-review This PR is ready to be reviewed labels Nov 12, 2025

petarjuki7 and others added 2 commits November 12, 2025 17:03

fix tests with proper operator_id order

8672a61

Merge branch 'unstable' into warn_missing_logs

d5d425f

petarjuki7 added ready-for-review This PR is ready to be reviewed and removed waiting-on-author labels Nov 12, 2025

dknopik removed the v1.2.0 label Dec 9, 2025

fix: warn if OperatorAdded events not in sync #625

Are you sure you want to change the base?

fix: warn if OperatorAdded events not in sync #625

Uh oh!

Conversation

petarjuki7 commented Sep 22, 2025

Issue Addressed

Proposed Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

dknopik left a comment

Choose a reason for hiding this comment

Uh oh!

dknopik Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

petarjuki7 Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

dknopik Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

petarjuki7 Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

diegomrsantos Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

diegomrsantos commented Sep 23, 2025

Uh oh!

petarjuki7 commented Sep 23, 2025

Uh oh!

diegomrsantos commented Sep 24, 2025

Uh oh!

diegomrsantos commented Sep 24, 2025

Uh oh!

diegomrsantos commented Sep 24, 2025

Uh oh!

petarjuki7 commented Sep 24, 2025

Uh oh!

dknopik commented Sep 24, 2025

Uh oh!

dknopik commented Sep 24, 2025

Uh oh!

dknopik Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dknopik Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

petarjuki7 Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dknopik left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

diegomrsantos commented Nov 12, 2025

Uh oh!

claude-code-actions-sigp bot commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review Progress

Code Review

Implementation Quality

Strategic Assessment: EIP-4444 and Checkpoint Sync

Required Changes

dknopik Sep 22, 2025 •

edited

Loading

claude-code-actions-sigp bot commented Nov 12, 2025 •

edited

Loading