graph, store: reject grafts below the base subgraph's earliest_block#6617
Open
rslowinski wants to merge 3 commits into
Open
graph, store: reject grafts below the base subgraph's earliest_block#6617rslowinski wants to merge 3 commits into
rslowinski wants to merge 3 commits into
Conversation
…lock Reject grafts whose block is below the base subgraph's earliest_block_number. When the base has been pruned past the requested graft block, the versions of mutable entities that were live at the graft block have been removed (closed versions with upper(block_range) <= earliest_block are eligible for pruning). The copy then reads versions where lower(block_range) <= graft_block and finds none for those entities, leaving them absent in the destination; the post-copy re-open (RevertClampQuery) has no version to re-open. Entities whose live-at-graft version is still open (never updated again, hence never pruned) and all immutable entities are unaffected — only heavily-updated mutable singletons are silently reset to their default state. Add SubgraphStore::earliest_block_number and use it in Graft::validate to reject the manifest before deployment registration. Make Graft::validate `pub` so tooling and tests can validate a graft directly without resolving a full UnvalidatedSubgraphManifest.
Add a pre-copy check rejecting graft.block < src.earliest_block_number, mirroring the manifest-level check in Graft::validate. Callers that reach DeploymentStore::start_subgraph directly (graphman, custom tooling, test-store helpers) bypass the registrar and therefore the manifest validator; the store-level check guarantees the invariant is enforced for every graft. Also add a post-copy check that fails the transaction if the source's earliest_block_number advanced past the graft block while the copy was running — the prune-during-copy race that copy_earliest_block already acknowledges. Partial copied data remains in the destination's tables and must be cleaned up by deleting and recreating the deployment; document this in the error message.
…raft The graft finalization previously overwrote the destination's history_blocks with the source's value unconditionally. As a result, a destination subgraph manifesting `prune: never` (history_blocks = BLOCK_NUMBER_MAX) grafted from a `prune: auto` source silently inherited the source's much smaller retention window, contradicting the destination's manifest. Use max(src, dst) instead. The destination's retention is at least as long as the source's (otherwise the inherited copied data would be immediately eligible for pruning) while a destination that requests longer retention is no longer silently downgraded. Also expose `SubgraphStore::history_blocks` mirroring `set_history_blocks`, so the test can verify the post-graft retention value.
7ee5da8 to
e9f7dc3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The bug
Graft::validatepermits grafting at a block below the base subgraph'searliest_block_number. When that happens, the copy silently produces a destination with reset state for any entity whose live-at-graft version had been pruned from the base. This is most visible on heavily-updated mutable singletons: every sub-graft-block version of such an entity is a closed historical version, every closed historical version below the prune floor is eligible for pruning, so the live-at-graft version is gone, the copy (lower(block_range) <= graft_block) finds nothing for that entity, and the handler subsequently re-initialises it from defaults. Immutable entities and rarely-updated mutable entities (whose live-at-graft version is still open) are unaffected, so the corruption is selective and easy to miss in spot checks.A structural fingerprint of an affected deployment: its
earliest_block_numberis strictly greater than its head/graft block. That state cannot arise from a correctly-performed graft; it's the inversion produced bycopy_earliest_blockpropagating the parent's prune floor into a destination seeded at a block below it.Reproduction (new tests)
graft_validate_rejects_below_prune_floorandgraft_store_path_rejects_below_prune_floorcapture the minimal repro against a fresh Postgres in ~1–2s:graft.rs).user 2at block 3, closing its[1, 3)version (which was live at block 2).history_blocks=2→earliest_block=4. This removes the closeduser 2 [1, 3)anduser 3 [1, 2).user 2: its only surviving version[3, )is excluded by the copy filterlower(block_range) <= 2. With this fix, the graft is rejected with an actionable error naming the prune floor.Fix (3 commits)
graph, store: validate graft block against base subgraph's earliest_block— addsSubgraphStore::earliest_block_number, uses it inGraft::validateto reject sub-floor grafts at manifest validation time. MakesGraft::validatepubso tooling and tests can validate a graft directly without resolving a fullUnvalidatedSubgraphManifest.store: enforce graft prune-floor invariant in start_subgraph— defence in depth: a pre-copy check inDeploymentStore::start_subgraphcatches callers that reach the store path without going throughGraft::validate(graphman, custom tooling,test_store::create_subgraph). A post-copy check detects the prune-during-copy race thatcopy_earliest_blockalready warns about.store: take the max of source and destination history_blocks during graft— secondary issue found in the same code path. Graft finalization currently overwrites the destination'shistory_blockswith the source's value unconditionally, silently downgradingprune: neverchildren to whatever retention the parent used. Usemax(src, dst)instead.Related upstream work
#6384 added the analogous check on the
graphman copyadmin path. This PR extends the same invariant to the deployment graft path (Graft::validate+start_subgraph). The two PRs are complementary — neither makes the other redundant.Tests
Three new regression tests in
store/test-store/tests/postgres/graft.rs:graft_validate_rejects_below_prune_floor— manifest-level check.graft_store_path_rejects_below_prune_floor— store-level check (also confirms graft at the floor is still accepted).graft_history_blocks_takes_max_of_parent_and_child— secondary fix.Existing
graft,prune,copy,on_synctests are unchanged and green.cargo fmt --all -- --check,cargo check --workspace --tests, andcargo check --release --workspaceall pass on rustc 1.91.Notes for the reviewer
Graft::validatebecomespub. The struct and the validation error type were already public; making the function callable directly from tooling/tests removes the need to resolve a full manifest in those callers. Happy to use a different approach (e.g. a thinpubwrapper, or a test-only re-export) if you'd prefer the function itself to stay crate-private.The
history_blockschange tomax()is a deliberate behaviour change. The previous overwrite was internally consistent if you assumed destinations should match their source's retention, but it silently overrides an explicitprune: neveron the destination.max()honours both constraints simultaneously. Easy to split into its own PR if you'd rather discuss it independently from the prune-floor fix.Does this recover existing subgraphs already corrupted by this bug? No — re-indexing is the remediation. The fix prevents new occurrences only.