Add Raindex Event Sourcing Sync Specification V1 #41
base: main
Conversation
Added Raindex Event Sourcing Sync Specification V1 documentation detailing the architecture, design goals, synchronization flow, error handling, and remote server API for a local-first event-sourced system.
Walkthrough

Introduces a comprehensive Raindex Event Sourcing Sync Specification V1 document defining a local-first, event-sourced system for synchronizing blockchain data into SQLite. The specification covers RPC-based live synchronization, remote bootstrap, multi-network and multi-orderbook support, database schema, sync coordination flows, and a Remote Server API.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
> 1. **Sync State Tables**: Track synchronization progress per orderbook
> 2. **Event Tables**: Raw logs and processed events (orders, trades, deposits, withdrawals)
> 3. **State Tables**: Current state (vaults, vault balances)
> 4. **Side Effect Tables**: Token info, interpreter bytecode, store Set events
> 5. **Tracking Tables**: e.g. Which stores are being monitored per orderbook
>
> ## Multi-Orderbook Coordination
>
> ### Independent Sync
>
> Each orderbook syncs independently and maintains its own state:
>
> - **Separate sync_state**: Each orderbook has its own record tracking block number, block hash, and sync status
> - **Independent error handling**: One orderbook failing doesn't affect others
> - **Different sync speeds**: Networks with different block times sync at different rates
> - **Flexible configuration**: Each orderbook can use different remotes or sync purely from RPC
> - **Per-orderbook tracking**: Each orderbook tracks its own stores via tracked_stores table
> - **Concurrent syncing**: Multiple orderbooks can sync simultaneously
>
> **Benefits of independent sync:**
> - Add/remove orderbooks without affecting existing syncs
> - Each orderbook's data is always internally consistent
> - Different networks progress at their natural pace
> - Errors are isolated and don't cascade
>
> ### Shared Side Effects
>
> While orderbooks sync independently, certain data is shared across all orderbooks:
>
> #### Token Info (token_info table)
> - **Scope**: One record per network:token (not per orderbook)
> - **Rationale**: Token metadata is network-level data, not orderbook-specific
> - **Benefit**: Fetching token info for one orderbook benefits all orderbooks on same network
>
> #### Interpreter Bytecode (interpreter_bytecode table)
> - **Scope**: One record per network:interpreter
> - **Rationale**: Bytecode is immutable and not orderbook-specific
> - **Benefit**: Multiple orderbooks can reference same interpreter without duplication
>
> #### Store Set Events (store_set_events table)
> - **Scope**: Filtered by orderbook context, but stores can be shared
> - **Rationale**: Stores are shared contracts that multiple orderbooks may use
> - **Benefit**: If multiple orderbooks use the same store, Set events are available to all
>
> **Implications of shared side effects:**
> - Side effect tables use INSERT OR IGNORE pattern during merge
> - Cannot simply delete and replace side effect data during remote merge
> - Store Set events must be carefully filtered by orderbook transaction context
Resolve the contradictory guidance on store_set_events.
This section classifies store_set_events as a shared side-effect table that must be preserved with INSERT OR IGNORE semantics, yet the merge policy later lists it among orderbook-specific tables to delete and replace. The reorg rollback procedure also deletes it while asserting side-effect tables are not rolled back. Implementers cannot act on these conflicting instructions, and following the delete/replace path would violate the stated side-effect guarantees. Please consolidate the specification so store_set_events has a single, consistent lifecycle (either fully shared side effect or fully orderbook-scoped) and update all affected sections accordingly.
Also applies to: 408-419
🤖 Prompt for AI Agents
In i9r/raindex-local-db around lines 78 to 125 (and also apply same change to
lines 408–419), the spec gives contradictory lifecycle rules for
store_set_events by calling it both a shared side-effect table (preserved with
INSERT OR IGNORE and not rolled back) and an orderbook-specific table that is
deleted/replaced during remote merges and reorg rollbacks; consolidate to one
clear policy: decide whether store_set_events is shared (network-scoped) or
orderbook-scoped, then update every mention to match that choice — if shared,
mark it as a side-effect table, remove it from delete/replace and rollback
lists, and add INSERT OR IGNORE merge semantics and filtering notes; if
orderbook-scoped, remove it from the side-effect section, document
delete-and-replace merge behavior and include it in rollback/delete lists; apply
the change consistently in both locations and any related reorg/merge procedure
descriptions.
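For reference, here is a minimal SQLite sketch of the per-orderbook sync_state record the quoted excerpt keeps referring to. The column names are illustrative assumptions, not taken from the PR's actual schema:

```sql
-- Hypothetical sketch of a per-orderbook sync_state row; the (network,
-- orderbook_address) primary key is what makes sync state independent per orderbook.
CREATE TABLE IF NOT EXISTS sync_state (
    network                TEXT    NOT NULL,
    orderbook_address      TEXT    NOT NULL,
    last_synced_block      INTEGER NOT NULL,
    last_synced_block_hash TEXT    NOT NULL,
    status                 TEXT    NOT NULL DEFAULT 'idle',  -- assumed status column
    updated_at             INTEGER NOT NULL,
    PRIMARY KEY (network, orderbook_address)
);
```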
> - **Replace on hash mismatch**: If block numbers match but hashes differ (reorg), trust remote
> - **Preserve side effects**: Never overwrite existing side effect data
> - **Atomic operation**: All merges happen within a transaction
>
> #### Transaction Boundaries
>
> Remote dump merges are atomic operations:
> ```
> BEGIN TRANSACTION
>   - Validate schema version compatibility
>   - For each orderbook-specific table: delete and replace if criteria met
>   - For each side effect table: insert new records only (preserve existing)
>   - Update sync_state with remote's block number and hash
> COMMIT TRANSACTION
> ```
>
> If any step fails during merge, the entire transaction rolls back and the local database remains in its previous consistent state.
>
> #### Detailed Merge Rules
>
> **For orderbook-specific tables** (orders, trades, vaults, vault_balance_changes, raw_logs, tracked_stores, store_set_events):
> ```
> IF remote.last_synced_block > local.last_synced_block
>    OR (remote.last_synced_block == local.last_synced_block
>        AND remote.block_hash != local.block_hash):
>
>     DELETE FROM <table> WHERE network = ? AND orderbook_address = ?
>     INSERT all records from dump
>
> ELSE:
>     Skip (local is current or ahead)
> ```
>
> **For side effect tables** (token_info, interpreter_bytecode):
> ```
> For each record in dump:
>     IF NOT EXISTS (SELECT 1 FROM <table> WHERE id = dump_record.id):
>         INSERT dump_record
>     ELSE:
>         SKIP (preserve existing local data)
> ```
Allow remote dumps to upgrade side-effect records.
The merge rule “insert only when missing, otherwise skip” prevents remote dumps from improving existing side-effect rows. If a local client previously recorded token_info.fetch_succeeded = 0 (or stale metadata) and the remote has a successful, up-to-date record, the insert-or-ignore policy will retain the failed local row indefinitely. That negates the benefit of remote bootstraps delivering richer metadata. Please adjust the spec to permit safe updates (e.g., compare timestamps or prefer successful remote fetches) so remote side-effect data can replace incomplete local state.
🤖 Prompt for AI Agents
i9r/raindex-local-db around lines 388-428: the current spec forbids updating
existing side-effect rows (token_info, interpreter_bytecode), preventing remote
dumps from improving stale or failed local records; change the merge rules to
allow safe updates by replacing a local row only when the remote row is
demonstrably newer or better—for example, if remote.last_updated >
local.last_updated OR (remote.fetch_succeeded = 1 AND local.fetch_succeeded =
0), otherwise skip; apply these comparisons per-record inside the same
transaction (preserve atomicity), and ensure updates are limited to fields
provided by the dump (don’t clobber unrelated local fields) to avoid regressing
local improvements.
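As a concrete illustration of the upsert the reviewer is asking for, one possible SQLite statement is sketched below. It assumes token_info has a unique key on (network, token_address) plus fetch_succeeded and last_updated columns; none of these are confirmed by the spec as quoted:

```sql
-- Sketch: allow a remote dump record to replace a local token_info row only when
-- it is demonstrably newer or strictly better. Requires a UNIQUE index on
-- (network, token_address) for the ON CONFLICT target (an assumption).
INSERT INTO token_info (network, token_address, name, symbol, decimals,
                        fetch_succeeded, last_updated)
VALUES (:network, :token_address, :name, :symbol, :decimals,
        :fetch_succeeded, :last_updated)
ON CONFLICT (network, token_address) DO UPDATE SET
    name            = excluded.name,
    symbol          = excluded.symbol,
    decimals        = excluded.decimals,
    fetch_succeeded = excluded.fetch_succeeded,
    last_updated    = excluded.last_updated
WHERE excluded.last_updated > token_info.last_updated
   OR (excluded.fetch_succeeded = 1 AND token_info.fetch_succeeded = 0);
```

Because the update only fires when the incoming row wins the comparison, re-applying the same dump is idempotent and never regresses a successful local fetch.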
Removed unnecessary RPC sync configuration and clarified remote checking process.
Actionable comments posted: 10
♻️ Duplicate comments (2)
i9r/raindex-local-db (2)
115-124: Unify lifecycle and merge/rollback rules for store_set_events (contradictions).
- It’s called a shared side-effect (Lines 115-124), but is deleted/replaced on remote merges (Lines 406-419) and rolled back on reorg (Lines 258-264). The dump section also treats it as orderbook-scoped (Lines 641-646). This is internally inconsistent and not implementable.
- Additionally, “shared across orderbooks” conflicts with “filter Set events to only those in orderbook transaction hashes” (orderbook-scoped by context).
Choose one model and apply it consistently across all sections:
- Option A (recommended for correctness): Treat store_set_events as orderbook-scoped artifacts derived by filtering global store Set logs by orderbook tx context. Remove it from “Shared Side Effects,” keep it in delete/replace and reorg rollback lists, and ensure dumps are orderbook-filtered.
- Option B: Treat store_set_events as globally shared side-effects. Then remove it from delete/replace and rollback lists, keep only INSERT/UPSERT semantics, and clarify how cross‑orderbook Set events affect each orderbook’s state.
Provide a single authoritative definition and update “Implications,” “Reorg Recovery,” “Merge Policy,” and “Dump Generation” accordingly.
Also applies to: 258-264, 406-419, 641-646
419-426: Permit safe upgrades of side‑effect rows (token_info, interpreter_bytecode).
“Insert if missing, otherwise skip” blocks remote improvements (e.g., upgrading failed or stale token_info). Change to UPSERT with safe precedence, e.g.:
- Prefer remote when remote.last_updated > local.last_updated, or remote.fetch_succeeded = 1 and local.fetch_succeeded = 0.
- Limit updates to provided columns to avoid clobbering unrelated local fields.
Also applies to: 120-124
> networks:
>   arbitrum:
>     rpcs:
>       - https://arbitrum-one-rpc.publicnode.com
>     chain-id: 42161
>     network-id: 42161
>     currency: ETH
🧹 Nitpick | 🔵 Trivial
Clarify chain-id vs network-id to avoid ambiguity.
Having both chain-id and network-id invites misconfiguration. Prefer chain-id (EIP‑155) as the canonical identifier; drop or clearly define network-id usage.
If network-id is required by any component, please point to that dependency so the spec can document the exact need and mapping.
🤖 Prompt for AI Agents
In i9r/raindex-local-db around lines 36 to 43, the config currently includes
both chain-id and network-id which can cause ambiguity; update the spec to use
chain-id (EIP-155) as the canonical identifier by removing the network-id field
(or, if some legacy component requires it, rename it to network-id-legacy and
add a comment linking to the dependent component and its exact requirement); if
you must keep network-id, add a precise comment documenting the dependency (with
repo/package and version) and the mapping between network-id and chain-id so the
spec is unambiguous.
i9r/raindex-local-db
Outdated
> 6. For each batch (with max_concurrent_batches parallelism):
>    a. REORG CHECK: Fetch block_hash for last_synced_block from RPC
>       - Compare with last_synced_block_hash in sync_state
>       - If hashes don't match: REORG DETECTED → execute reorg recovery
>
>    b. Fetch logs via eth_getLogs filtered by orderbook address
Per‑orderbook batch concurrency risks atomicity; enforce single in‑flight DB batch per orderbook.
“max_concurrent_batches” within the same orderbook can race on last_synced_block, commit out of order, undermine reorg checks (6a), and violate “complete, consistent state per block.” Allow parallel RPC fetching, but serialize DB transactions and sync_state updates per orderbook (single in‑flight batch). Explicitly require a per‑orderbook mutex/advisory lock and deterministic commit order.
Also applies to: 218-221
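If the per-orderbook lock also needs to hold across processes (not just threads), one option is to back it with a small lock table in SQLite. The table and statements below are a sketch under that assumption; the spec does not define any such table:

```sql
-- Hypothetical DB-backed per-orderbook write lock; a batch writer proceeds only
-- if its acquire UPDATE changed a row (changes() = 1).
CREATE TABLE IF NOT EXISTS sync_locks (
    network           TEXT NOT NULL,
    orderbook_address TEXT NOT NULL,
    holder            TEXT,               -- NULL when the lock is free
    acquired_at       INTEGER,
    PRIMARY KEY (network, orderbook_address)
);

-- Try to acquire: succeeds only when no other batch holds the lock.
UPDATE sync_locks
SET holder = :worker_id, acquired_at = strftime('%s','now')
WHERE network = :network
  AND orderbook_address = :orderbook
  AND holder IS NULL;

-- Release after the batch COMMIT/ROLLBACK.
UPDATE sync_locks
SET holder = NULL, acquired_at = NULL
WHERE network = :network
  AND orderbook_address = :orderbook
  AND holder = :worker_id;
```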
> d. BEGIN TRANSACTION
>
> e. Insert raw logs into raw_logs table
>
> f. Extract transaction hashes from raw_logs
>
> g. Parse AddOrder logs to identify new store addresses
Define canonical log ordering and idempotency constraints.
Add explicit processing order by (block_number ASC, transaction_index ASC, log_index ASC). Specify unique keys and UPSERTs for raw_logs and derived tables to ensure idempotency under retries/merges. Example constraints:
- raw_logs: UNIQUE(network, block_number, transaction_hash, log_index)
- orders/trades/vault_balance_changes/store_set_events: deterministic primary keys and ON CONFLICT DO NOTHING/UPSERT as appropriate.
Also applies to: 206-214
🤖 Prompt for AI Agents
In i9r/raindex-local-db around lines 190-196 (and also apply same changes to
lines 206-214), the processing lacks a canonical deterministic log ordering and
idempotency guarantees; enforce processing order by sorting by (block_number
ASC, transaction_index ASC, log_index ASC) when selecting/iterating logs, add a
UNIQUE constraint on raw_logs like (network, block_number, transaction_hash,
log_index), and change inserts into raw_logs and derived tables (orders, trades,
vault_balance_changes, store_set_events) to use deterministic primary keys and
UPSERT semantics (ON CONFLICT DO NOTHING or ON CONFLICT (...) DO UPDATE SET ...
with deterministic columns) so repeated runs/merges are idempotent; ensure
derived-table key choices are stable (e.g., composed from network +
transaction_hash + log_index or an actual domain PK) and document the
ordering/constraints in the code comments where the steps (d–g) are implemented.
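A hedged SQLite sketch of the uniqueness and ordering constraints suggested above; table and column names follow the quoted excerpt but may not match the final schema (transaction_index in particular is an assumption):

```sql
-- Identity constraint so re-running a batch or merging a dump cannot duplicate rows.
CREATE UNIQUE INDEX IF NOT EXISTS idx_raw_logs_identity
    ON raw_logs (network, block_number, transaction_hash, log_index);

-- Idempotent insert against that index.
INSERT INTO raw_logs (network, block_number, transaction_index,
                      transaction_hash, log_index, address, data)
VALUES (:network, :block_number, :transaction_index,
        :transaction_hash, :log_index, :address, :data)
ON CONFLICT (network, block_number, transaction_hash, log_index) DO NOTHING;

-- Canonical processing order when deriving orders/trades/etc. from raw logs.
SELECT *
FROM raw_logs
WHERE network = :network
  AND block_number BETWEEN :from_block AND :to_block
ORDER BY block_number ASC, transaction_index ASC, log_index ASC;
```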
i9r/raindex-local-db
Outdated
> h. Fetch Store Set events:
>    - Query tracked_stores for this orderbook to get all store addresses
>    - For each tracked store:
>      * Fetch Set events via eth_getLogs for batch block range
>      * Filter Set events to only those in orderbook transaction hashes
>      * Insert Set event raw_logs and store_set_events records
🧹 Nitpick | 🔵 Trivial
Reduce RPC load when fetching Set events across many stores.
Clarify implementation to:
- Use topics[0] filter for Set events and pass multiple addresses per eth_getLogs call (chunked to provider limits).
- Cache negative results per (store, block_range).
- Bound per‑batch address fan‑out to avoid provider timeouts.
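If negative-result caching is adopted, it could be persisted in a small table like the following sketch (entirely hypothetical; the spec does not define such a table). The tracked_stores columns follow the excerpt's description but are assumptions:

```sql
-- Hypothetical cache of already-scanned (store, block range) windows so empty
-- ranges are not re-queried on the next pass.
CREATE TABLE IF NOT EXISTS store_scan_cache (
    network       TEXT    NOT NULL,
    store_address TEXT    NOT NULL,
    from_block    INTEGER NOT NULL,
    to_block      INTEGER NOT NULL,
    events_found  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (network, store_address, from_block, to_block)
);

-- Before calling eth_getLogs, skip stores whose window is already known to be empty.
SELECT store_address
FROM tracked_stores
WHERE network = :network AND orderbook_address = :orderbook
  AND store_address NOT IN (
      SELECT store_address FROM store_scan_cache
      WHERE network = :network
        AND from_block <= :from_block AND to_block >= :to_block
        AND events_found = 0
  );
```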
> Note: Side effect tables (token_info, interpreter_bytecode) are NOT rolled back
> as they represent immutable on-chain data. tracked_stores is also not rolled back
> as stores remain relevant even after a reorg.
tracked_stores not rolled back can cause stale scans; add validation/pruning.
If a reorg eliminates an AddOrder that introduced a store, keeping tracked_stores forever causes unnecessary Set‑event scans. Add a mechanism to:
- Mark entries “suspect” when rolling back past first_seen_block and re‑validate, or
- Maintain last_seen_block and periodically prune stores not referenced by any canonical AddOrder up to last_synced_block.
Also applies to: 197-205
🤖 Prompt for AI Agents
In i9r/raindex-local-db around lines 277-279 (also apply same fix to lines
197-205), tracked_stores is never rolled back which can cause stale Set-event
scans after reorgs; implement a two-part fix: when rolling back past a store's
first_seen_block mark that tracked_store entry as “suspect” (add a suspect flag
and timestamp) so the next scan will re-validate the store against canonical
AddOrder data, and additionally maintain/track last_seen_block for each store
and run a periodic pruning job that, up to last_synced_block, removes stores not
referenced by any canonical AddOrder (or marks them stale) to avoid perpetual
scanning of orphaned stores. Ensure rollback code sets suspect=true when
appropriate and the periodic job queries canonical AddOrders up to
last_synced_block to prune or clear suspect flags accordingly.
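A possible SQL shape for the suspect/prune mechanism described above; the suspect, suspect_at, and first_seen_block columns on tracked_stores, and the store_address column on orders, are assumptions used purely for illustration:

```sql
-- 1) During reorg rollback past a store's first sighting, mark it suspect.
UPDATE tracked_stores
SET suspect = 1, suspect_at = strftime('%s','now')
WHERE network = :network
  AND orderbook_address = :orderbook
  AND first_seen_block > :common_ancestor_block;

-- 2) Periodic prune: drop suspect stores that no canonical AddOrder references
--    up to the current last_synced_block.
DELETE FROM tracked_stores
WHERE network = :network
  AND orderbook_address = :orderbook
  AND suspect = 1
  AND store_address NOT IN (
      SELECT store_address FROM orders
      WHERE network = :network
        AND orderbook_address = :orderbook
        AND added_block <= (SELECT last_synced_block FROM sync_state
                            WHERE network = :network
                              AND orderbook_address = :orderbook)
  );
```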
> ## Remote Bootstrap
>
> ### Remote Discovery
>
> When local client determines it needs to bootstrap from remote:
> ```
> 1. Fetch remote metadata: GET {remote_url}/metadata
>    Response: {
>      schema_version: 1,
>      orderbooks: [
>        {
>          network: "mainnet",
>          address: "0x...",
>          last_block: 18500000,
>          block_hash: "0xabcd...",
>          timestamp: 1704067200
>        }
>      ]
>    }
>
> 2. Validate schema_version matches local
>
> 3. For each configured orderbook that exists in remote:
>    a. Compare remote's last_block and block_hash with local sync_state
>    b. If remote is ahead OR block_hashes differ at same height:
>       - Fetch dump: GET {remote_url}/dump/{network}/{orderbook}
>    c. Otherwise skip (local is current)
> ```
Serialize remote merges with RPC sync for the same orderbook.
Remote merge “within a transaction” is not sufficient if RPC sync is concurrently running on that orderbook. Specify an exclusive per‑orderbook lock so remote merge pauses RPC processing (and vice versa) to avoid interleaved writes and torn sync_state.
Also applies to: 390-404
🤖 Prompt for AI Agents
In i9r/raindex-local-db around lines 351 to 379 (also apply same change at
390-404), remote merge logic currently runs "within a transaction" but does not
prevent concurrent RPC sync tasks for the same orderbook from interleaving
writes; add an exclusive per-orderbook lock (e.g., keyed mutex by
network+orderbook) that must be acquired by both the remote-merge path and the
RPC-sync path before performing any dump/apply or sync_state updates, hold the
lock for the duration of the merge/sync (including reading/writing sync_state),
and ensure lock acquisition is timeout- or cancel-aware and released in all
error paths so remote merges and RPC syncs are fully serialized for the same
orderbook.
> **Resolution**: Trust the remote dump. Rationale:
> - Remote servers typically have better infrastructure and connectivity
> - Remote servers are more likely to be on the canonical chain
> - Local can detect its own reorgs via block hash checks during sync
> - If remote is wrong, local will detect mismatch on next RPC sync and self-correct
> ```
> Action when hash mismatch detected:
> 1. Log warning about potential reorg
> 2. Replace local data with remote dump (per merge policy)
> 3. Resume RPC sync from remote's last_synced_block
> 4. RPC sync will validate chain continuity and detect if remote was on wrong fork
> ```
Don’t blindly “trust remote” on hash mismatch; add canonicality checks + finality guard.
Before replacing local state:
- Verify remote.block_hash with at least one independent RPC for the same height and chainId.
- Require remote.last_block ≤ (current_block − finality_depth) to avoid importing non‑final blocks.
- If verification fails, skip merge and continue RPC sync.
i9r/raindex-local-db
Outdated
> c. Include side effect tables (full tables, not filtered):
>    - token_info (all networks)
>    - interpreter_bytecode (all networks)
🧹 Nitpick | 🔵 Trivial
Scope side‑effect tables in dumps to relevant networks to keep artifacts small.
Including all networks’ side‑effects in every dump inflates size and bandwidth. Restrict token_info/interpreter_bytecode in dumps to the network(s) referenced by the orderbook, or make inclusion configurable.
Also applies to: 641-646
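In practice the dump generator would only need per-network selects such as the following sketch (assuming both side-effect tables carry a network column, which the excerpt implies but does not show):

```sql
-- Export only the rows relevant to the networks covered by this dump.
SELECT * FROM token_info           WHERE network = :network;
SELECT * FROM interpreter_bytecode WHERE network = :network;
```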
> Queries MUST filter by `block_number <= last_synced_block` to ensure they never see partial state.
>
> Example:
> ```sql
> SELECT * FROM orders
> WHERE network = ?
>   AND orderbook_address = ?
>   AND added_block <= (
>     SELECT last_synced_block
>     FROM sync_state
>     WHERE network = ? AND orderbook_address = ?
>   )
> ```
🧹 Nitpick | 🔵 Trivial
Generalize “safe query” guidance for multi-table reads.
Extend the example to joins and derived tables: every participating table must be filtered to rows with block_number/added_block ≤ last_synced_block for the same (network, orderbook). Consider providing SQL views or a helper that injects these predicates to avoid foot‑guns.
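One way to make this rule hard to get wrong is to expose views that already carry the bound. The sketch below assumes orders has removed_block and order_hash columns and trades has block_number and order_hash; none of these are shown in the quoted excerpt:

```sql
-- View that applies the last_synced_block bound (and removal bound) once.
CREATE VIEW IF NOT EXISTS safe_orders AS
SELECT o.*
FROM orders o
JOIN sync_state s
  ON s.network = o.network AND s.orderbook_address = o.orderbook_address
WHERE o.added_block <= s.last_synced_block
  AND (o.removed_block IS NULL OR o.removed_block > s.last_synced_block);

-- Multi-table read: every participating table is bounded by the same block.
SELECT t.*
FROM trades t
JOIN safe_orders o
  ON o.network = t.network
 AND o.orderbook_address = t.orderbook_address
 AND o.order_hash = t.order_hash
JOIN sync_state s
  ON s.network = t.network AND s.orderbook_address = t.orderbook_address
WHERE t.block_number <= s.last_synced_block;
```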
> Returns metadata about available dumps.
> ```json
> {
>   "schema_version": 1,
>   "dumps": [
>     {
>       "network": "mainnet",
>       "orderbook": "0x1234...",
>       "last_block": 18500000,
>       "block_hash": "0xabcd...",
>       "timestamp": 1704067200,
>       "size_bytes": 52428800,
>       "url": "/dump/mainnet/0x1234..."
>     }
>   ]
> }
> ```
Strengthen metadata and artifact integrity.
Augment metadata with:
- chainId and genesisBlockHash (not just “network” string).
- sha256 (or stronger) of the gz dump, size_bytes, and content schema hash.
- Optionally a signature to authenticate the dump origin.
Clients must verify these before merge.
Also applies to: 357-370
🤖 Prompt for AI Agents
In i9r/raindex-local-db around lines 615 to 631 (and also apply same changes to
lines 357 to 370), the dump metadata is too weak — replace/augment the current
"network" string with explicit chainId and genesisBlockHash fields, add a sha256
checksum for the gzipped dump, ensure size_bytes is accurate, add a
content_schema_hash (hash of the dump's JSON schema or canonicalized content)
and an optional signature field (base64) to authenticate origin; update the JSON
schema_version if needed and ensure any code that produces or reads these dumps
computes and validates sha256, content_schema_hash and signature before
accepting a dump.
Actionable comments posted: 2
♻️ Duplicate comments (12)
i9r/raindex-local-db (12)
36-43: Clarify chainId vs network-id; prefer chainId (EIP-155).
Having both invites misconfiguration. Keep chainId canonical; remove or clearly mark network-id as legacy.

```diff
 networks:
   arbitrum:
     rpcs:
       - https://arbitrum-one-rpc.publicnode.com
-    chain-id: 42161
-    network-id: 42161
+    chain-id: 42161  # Canonical EIP-155 chainId
+    # network-id removed to avoid ambiguity; if a legacy component needs it,
+    # name it `network-id-legacy` and document the dependency and mapping.
     currency: ETH
```
115-124: Contradictory lifecycle for store_set_events; pick one model or split tables.
You mark store_set_events “shared side effects” but also delete/replace on merge and roll them back on reorg. This is internally inconsistent and unimplementable.
Two viable fixes (choose one and apply consistently):
- Option A (split, recommended):
  - Keep a network-scoped side-effect table storing all Set events per store (store_set_events_raw) — never rolled back.
  - Materialize per-orderbook filtered rows into ob_store_set_events — delete/replace on merge and roll back on reorg.
- Option B (orderbook-scoped only):
  - Treat store_set_events as strictly orderbook-scoped everywhere; remove it from “shared side effects.”
Suggested edits:

```diff
-#### Store Set Events (store_set_events table)
-- **Scope**: Filtered by orderbook context, but stores can be shared
-- **Rationale**: Stores are shared contracts that multiple orderbooks may use
-- **Benefit**: If multiple orderbooks use the same store, Set events are available to all
+#### Store Set Events
+- Side-effect (network-scoped): `store_set_events_raw` — all Set events per (network, store_address); never rolled back.
+- Orderbook-scoped derived: `ob_store_set_events` — subset where tx_hash belongs to the orderbook's txs; rolled back on reorg and replaced on remote merge.
```

```diff
-**For orderbook-specific tables** (..., tracked_stores, store_set_events):
+**For orderbook-specific tables** (..., tracked_stores, ob_store_set_events):
```

```diff
 - Delete all records where block_number > common_ancestor_block:
   * raw_logs
   * orders
   ...
   * vault_balance_changes
   * trades
-  * store_set_events
+  * ob_store_set_events
```

```diff
-Note: Side effect tables (token_info, interpreter_bytecode) are NOT rolled back
-as they represent immutable on-chain data. tracked_stores is also not rolled back
-as stores remain relevant even after a reorg.
+Note: Side-effect tables (token_info, interpreter_bytecode, store_set_events_raw) are NOT rolled back.
+Orderbook-scoped derived tables (e.g., ob_store_set_events) ARE rolled back.
```

Also applies to: 258-265, 277-279, 406-417
181-186: Serialize DB batches per orderbook; allow parallel fetch only.
Parallel in-orderbook DB transactions race on last_synced_block and can commit out of order, breaking atomicity/reorg guarantees. Use a per-orderbook mutex; one in-flight DB batch at a time. Parallelize RPC fetches only.

```diff
-6. For each batch (with max_concurrent_batches parallelism):
+6. Fetch multiple ranges in parallel if desired, but:
+   - Acquire a per-(network, orderbook) mutex before BEGIN TRANSACTION.
+   - At most one DB batch may be in-flight per orderbook (writes serialized).
+   - Release the mutex after COMMIT/ROLLBACK.
```

```diff
-2. Transaction Boundaries: Every batch is atomic. All operations (...) happen within a single transaction.
+2. Transaction Boundaries: Every batch is atomic. DB writes are serialized per orderbook (single in-flight batch). RPC fetches may run in parallel, but their results must queue behind the mutex before writing.
```

```diff
-local-db-sync:
-  batch-size: 2000
-  max-concurrent-batches: 5
+local-db-sync:
+  batch-size: 2000
+  max-concurrent-fetches: 5    # fetch concurrency
+  serialize-db-batches: true   # enforce single writer per orderbook
```

Also applies to: 218-221, 239-241
192-205: Add canonical log ordering and idempotent constraints/UPSERTs.
Specify sort by (block_number ASC, transaction_index ASC, log_index ASC) and enforce UNIQUE keys to make retries/merges safe.

```diff
+Processing order: strictly sort logs by (block_number ASC, transaction_index ASC, log_index ASC).
```

Constraints (documentation snippet):

```sql
-- raw_logs UNIQUE(network, block_number, transaction_hash, log_index);
-- orders/trades/vault_balance_changes/ob_store_set_events:
--   choose deterministic PKs (e.g., network + transaction_hash + log_index [+ role])
--   and use UPSERTs: INSERT INTO ... ON CONFLICT (...) DO UPDATE SET ...;
--   or DO NOTHING where correct
```

Also applies to: 206-214, 230-238
199-205: Reduce RPC load for Set events (batch addresses + topics).
Batch store addresses per eth_getLogs call using topics[0] = Set signature, chunked to provider limits; cache negatives per (store, range).

```diff
-h. Fetch Store Set events:
-   - Query tracked_stores ... get all store addresses
-   - For each tracked store:
-     * Fetch Set events via eth_getLogs for batch block range
+h. Fetch Store Set events:
+   - Query tracked_stores to list store addresses.
+   - Call eth_getLogs with topics[0]=Set(...) and batched address arrays (chunked to provider limits) for the batch range.
+   - Cache negative results per (store, block_range) to avoid repeat calls.
```
277-279: tracked_stores not rolled back → stale scans; add suspect/pruning.
Reorgs that remove AddOrder can orphan stores. Mark suspect when rolling back past first_seen_block and prune if not referenced by any canonical AddOrder up to last_synced_block.

```diff
-tracked_stores is also not rolled back as stores remain relevant even after a reorg.
+tracked_stores is not rolled back, but:
+- If rollback passes a store's first_seen_block, set tracked_stores.suspect = 1, suspect_at = now().
+- A periodic job clears suspect or deletes the entry if no canonical AddOrder references the store up to last_synced_block.
```
390-404: Serialize remote merges with RPC sync for the same orderbook.
A DB transaction alone doesn’t prevent interleaving with RPC sync. Require an exclusive per-orderbook lock for both paths.

```diff
-Remote dump merges are atomic operations:
+Remote dump merges are atomic and mutually exclusive with RPC sync:
 BEGIN TRANSACTION
-  ...
+  - Acquire per-(network, orderbook) exclusive lock/mutex (timeout/cancel-aware).
+  - Pause/skip RPC sync while the lock is held.
   ...
 COMMIT TRANSACTION
```

Also applies to: 351-370
384-389: Permit safe upgrades to side-effect rows during merge.
Insert-only blocks improvements (e.g., failed token_info retained forever). Allow updates when the dump is newer/better.

```diff
-**Preserve side effects**: Never overwrite existing side effect data
+**Preserve side effects, but allow safe upgrades**:
+- Update when remote.last_updated > local.last_updated OR
+  (remote.fetch_succeeded = 1 AND local.fetch_succeeded = 0).
+- Only update fields present in the dump; don't clobber unrelated local fields.
```

```diff
-For each record in dump:
-    IF NOT EXISTS (...):
-        INSERT
-    ELSE:
-        SKIP
+For each record in dump:
+    INSERT ... ON CONFLICT(id) DO UPDATE SET
+        ...  -- only if the record is newer or strictly better (per rule above)
```

Also applies to: 419-426
438-457: Don’t blindly “trust remote” on hash mismatch; verify canonicality + finality.
Before replacing local state, independently verify remote.block_hash and ensure the dump is at least finality_depth behind head.

```diff
-**Resolution**: Trust the remote dump.
+**Resolution**: Verify the remote dump then merge:
+- Fetch the block hash for (chainId, remote.last_synced_block) from at least one independent RPC and match remote.block_hash.
+- Require remote.last_synced_block ≤ (current_block − finality_depth).
+- If verification fails, skip merge and continue RPC sync.
```
478-481: Scope side-effect tables in dumps to relevant networks.
Including all networks’ side-effects bloats artifacts. Restrict to the dump’s networks or make inclusion configurable.

```diff
-c. Include side effect tables (full tables, not filtered):
-   - token_info (all networks)
-   - interpreter_bytecode (all networks)
+c. Include side-effect tables for the relevant network(s) only (or make configurable):
+   - token_info (only networks present in this dump)
+   - interpreter_bytecode (only networks present in this dump)
```

```diff
-The dump includes:
-  - Complete side effect tables (token_info, interpreter_bytecode)
+The dump includes:
+  - Side-effect tables limited to the orderbook's network(s) (token_info, interpreter_bytecode)
```

Also applies to: 641-646
549-563: Generalize “safe querying” for joins/derived tables and removed rows.
When joining multiple tables, each must be bounded by last_synced_block for the same (network, orderbook). Also account for removed_block.

```sql
-- Example for live orders:
SELECT * FROM orders o
WHERE o.network = ? AND o.orderbook_address = ?
  AND o.added_block <= (SELECT last_synced_block FROM sync_state
                        WHERE network = ? AND orderbook_address = ?)
  AND (o.removed_block IS NULL
       OR o.removed_block > (SELECT last_synced_block FROM sync_state
                             WHERE network = ? AND orderbook_address = ?));
```

Add a note: “For multi-table queries, apply the same last_synced_block predicate to every participating table (joins/CTEs/views). Consider providing SQL views/helpers that inject these predicates.”
357-370: Strengthen dump metadata and integrity.
Add chainId and genesisBlockHash, sha256 of the gz, content schema hash, and optional signature; require clients to verify before merge.

```diff
 {
   schema_version: 1,
   orderbooks: [
     {
       network: "mainnet",
+      chain_id: 1,
+      genesis_block_hash: "0x...",
       address: "0x...",
       last_block: 18500000,
       block_hash: "0xabcd...",
       timestamp: 1704067200
     }
   ]
 }
```

```diff
 {
   "schema_version": 1,
   "dumps": [
     {
       "network": "mainnet",
+      "chain_id": 1,
+      "genesis_block_hash": "0x...",
       "orderbook": "0x1234...",
       "last_block": 18500000,
       "block_hash": "0xabcd...",
       "timestamp": 1704067200,
       "size_bytes": 52428800,
+      "sha256": "base16-hex...",
+      "content_schema_hash": "base16-hex...",
+      "signature": "base64...",  // optional
       "url": "/dump/mainnet/0x1234..."
     }
   ]
 }
```

Also applies to: 615-631
> ## Database Schema
>
> The database consists of the following key table categories:
>
> 1. **Sync State Tables**: Track synchronization progress per orderbook
> 2. **Event Tables**: Raw logs and processed events (orders, trades, deposits, withdrawals)
> 3. **State Tables**: Current state (vaults, vault balances)
> 4. **Side Effect Tables**: Token info, interpreter bytecode, store Set events
> 5. **Tracking Tables**: e.g. Which stores are being monitored per orderbook
Add a canonical blocks table to persist headers for reorg checks.
You fetch block data but don’t persist it. Store (block_number, hash, parent_hash, timestamp, safe/finalized flags) to support reorg depth scans and continuity checks.
```diff
 1. **Sync State Tables** ...
 2. **Event Tables** ...
 3. **State Tables** ...
-4. **Side Effect Tables** ...
-5. **Tracking Tables** ...
+4. **Blocks Table**: canonical headers per (network, block_number) with hash, parent_hash, timestamp, safe/finalized.
+5. **Side Effect Tables** ...
+6. **Tracking Tables** ...
```

```diff
-3. Get current_block from RPC
+3. Get current_block from RPC
+   - Persist headers for encountered blocks into `blocks` and use them for reorg detection.
```

Also applies to: 188-191
🤖 Prompt for AI Agents
In i9r/raindex-local-db around lines 72 to 81 (and also apply the same change
for lines 188 to 191), the schema lacks a canonical blocks table to persist
fetched block headers; add a new table definition that stores block_number
(bigint, primary key), hash (text), parent_hash (text), timestamp (bigint or
timestamptz), and flags for safe/finalized (booleans) and include indices on
block_number and hash; update any DB migration/schema file and README comment to
describe its purpose (reorg depth scans and continuity checks) and ensure code
paths that fetch blocks persist rows (on insert or upsert) so headers are
available for reorg checks.
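A minimal SQLite sketch of such a blocks table, with illustrative names only (the spec as quoted does not define this table):

```sql
-- Canonical block headers per network, used for reorg depth scans and
-- parent-hash continuity checks.
CREATE TABLE IF NOT EXISTS blocks (
    network      TEXT    NOT NULL,
    block_number INTEGER NOT NULL,
    hash         TEXT    NOT NULL,
    parent_hash  TEXT    NOT NULL,
    timestamp    INTEGER NOT NULL,
    is_finalized INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (network, block_number)
);

CREATE INDEX IF NOT EXISTS idx_blocks_hash ON blocks (network, hash);
```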
> j. Perform side effects:
>    - Token info: fetch name/symbol/decimals for new tokens (non-blocking)
>    - Interpreter bytecode: fetch bytecode for new interpreters (blocking)
Don’t perform network side‑effects inside the DB transaction; use an outbox.
eth_call/eth_getCode during the transaction hold SQLite write locks, risking timeouts and torn state under retries. Stage side‑effect jobs in an outbox, commit the batch, then process jobs asynchronously/idempotently.
```diff
-j. Perform side effects:
-   - Token info ... (non-blocking)
-   - Interpreter bytecode ... (blocking)
+j. Enqueue side-effect jobs into `side_effect_jobs` (token_info fetches, interpreter bytecode).
+   - No external RPC calls inside the DB transaction.
 ...
-2. Transaction Boundaries: ... All operations (raw logs, Set events, event processing, side effects, sync_state update) happen within a single transaction.
+2. Transaction Boundaries: ... All DB writes for the batch happen within one transaction; side-effect jobs are queued and processed after COMMIT.
```

Outbox processing (new subsection):

```diff
+### Side-Effect Outbox
+- Table: `side_effect_jobs(job_id, kind, network, address, created_at, attempts, last_attempt_at, status, payload)`
+- Worker: reads pending jobs, performs eth_call/eth_getCode, UPSERTs into token_info/interpreter_bytecode, marks success/failure.
+- Idempotent: multiple executions safe; updates only when remote/local record improves.
```

Also applies to: 239-241, 329-350, 515-529
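For concreteness, the proposed outbox table could look like the following SQLite sketch; the column set mirrors the reviewer's suggestion and is not an existing schema:

```sql
-- Side-effect outbox: rows are written inside the batch transaction, then a
-- worker performs the RPC calls after COMMIT and records the outcome.
CREATE TABLE IF NOT EXISTS side_effect_jobs (
    job_id          INTEGER PRIMARY KEY AUTOINCREMENT,
    kind            TEXT    NOT NULL,           -- e.g. 'token_info' or 'interpreter_bytecode'
    network         TEXT    NOT NULL,
    address         TEXT    NOT NULL,
    payload         TEXT,                       -- optional JSON payload
    status          TEXT    NOT NULL DEFAULT 'pending',
    attempts        INTEGER NOT NULL DEFAULT 0,
    created_at      INTEGER NOT NULL,
    last_attempt_at INTEGER,
    UNIQUE (kind, network, address)             -- makes enqueueing idempotent
);
```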
Motivation
We need a clear spec and approach to validate our implementation against.
Solution
Added Raindex Event Sourcing Sync Specification V1 documentation detailing the architecture, design goals, synchronization flow, error handling, and remote server API for a local-first event-sourced system.
Checks
By submitting this for review, I'm confirming I've done the following: