Disable embedded-blob ingestion in crash test (T277310719) #14899
anand1976 wants to merge 1 commit
@anand1976 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D110100542.
✅ clang-tidy: No findings on changed lines. Completed in 0.0s.
Summary:
Disables the experimental embedded-blob SST ingestion path in the crash test (`ingest_external_file_with_embedded_blobs` is forced to 0) to unblock the `fbcode_blackbox_crash_test` CI signal tracked in T277310719. The feature still has unresolved correctness bugs, so this keeps it out of the crash test until they are fixed. Re-enable by reverting the one-line change; the gating logic in `finalize_and_sanitize` is left in place.
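As a rough illustration of the one-line change, the sketch below models how a crash-test parameter table with callable defaults can pin a flag to 0. It is a simplified stand-in and not the actual contents of `db_crashtest.py`: the helper names `resolve_params` and `to_flags`, and the randomized `use_merge` default, are assumptions made for the example.

```python
import random

# Hypothetical, trimmed-down stand-in for the crash test's default_params.
# Each value is either a constant or a zero-argument callable that is
# evaluated once per run.
default_params = {
    # Assumed randomized default, shown only for contrast.
    "use_merge": lambda: random.randint(0, 1),
    # Forced off until the embedded-blob correctness bugs are fixed.
    # Reverting this one line re-enables the feature in the crash test.
    "ingest_external_file_with_embedded_blobs": lambda: 0,
}


def resolve_params(params):
    """Evaluate callables so every parameter has a concrete value."""
    return {k: (v() if callable(v) else v) for k, v in params.items()}


def to_flags(resolved):
    """Render resolved parameters as --name=value command-line flags."""
    return [f"--{k}={v}" for k, v in sorted(resolved.items())]


resolved = resolve_params(default_params)
flags = to_flags(resolved)
print(flags)
```

Keeping the key in the table with a constant callable, rather than deleting it, means the flag is still passed explicitly and any downstream gating keyed on its value continues to see a well-defined 0.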
## Culprit
The embedded-blob feature was introduced by D108564468 (commit `5f01ffb1d40c`, "[rocksdb][PR] Add experimental embedded blob SST support"), which also enabled `ingest_external_file_with_embedded_blobs` in `db_crashtest.py`. The first crash-test failure on T277310719 appeared the same day, shortly after that commit. D109796428 ("Fix use-after-free in EmbeddedBlobResolvingIterator when key() called before value()") fixed one of the bugs below, but the corruption persists, so disabling the feature in the crash test is still required.
## Bugs found in the embedded-blob feature
These all stem from one root cause: `EmbeddedBlobResolvingIterator` resolves a same-file blob payload into a buffer (`resolved_pinned_value_` / `resolved_value_`) that is invalidated on the next iterator reposition, but consumers expect the value to outlive a `Next()`.
1. heap-use-after-free (FIXED by D109796428): calling `key()` before `value()` on a whole-value blob entry made `value()` -> `MaterializeValue()` move-assign `resolved_internal_key_`, freeing the buffer the earlier `key()` Slice still pointed to. Surfaced under ASAN during compaction (`CompactionIterator::NextFromInput` -> `ParseInternalKey`).
2. `IsValuePinned()` assertion (NOT fixed): `EmbeddedBlobResolvingIterator::IsValuePinned()` only reports a resolved value as pinned when a `PinnedIteratorsManager` is active, but `DBIter::FindValueForCurrentKeyUsingSeek` (`db_iter.cc:1464`) asserts `iter_.iter()->IsValuePinned()` without setting one. Reproduces under ASAN via the user-iteration path.
3. `IsValueBaseValid(value_base)` assertion / value corruption (NOT fixed): this is the failure in T277310719. With `use_merge=1` plus embedded blobs, a value persisted to the DB comes back with a garbage value base, tripping the assert in `db_stress` (`expected_value.cc:102`, reached from `VerifyDb` -> `VerifyOrSyncValue` and from `TestGet`). Reproduced reliably on the post-D109796428 tree by forcing `use_merge=1` + embedded-blob ingestion (read-heavy, no iteration): fails on the first crash-test iteration; without merge, 36 iterations passed. Same value-lifetime root cause as (2), surfacing through the merge/compaction write path. Exact line not yet pinned down; needs the feature author.
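The shared root cause can be modeled with a toy example. The sketch below is an analogy in Python, not RocksDB code: `ReusingIterator` is a hypothetical class that resolves every value into one reused buffer and hands out zero-copy views, loosely mirroring a `Slice` into `resolved_value_`. In Python the held view silently changes content; in C++ the equivalent mistake can surface as a use-after-free or a garbage value.

```python
class ReusingIterator:
    """Toy iterator that resolves each value into one shared buffer."""

    def __init__(self, entries):
        # All entries are assumed to have the same length so the buffer
        # can be overwritten in place without resizing.
        self._entries = entries
        self._pos = 0
        self._buf = bytearray(entries[0])

    def value(self):
        # Zero-copy view into the internal buffer (like a Slice).
        return memoryview(self._buf)

    def next(self):
        # Repositioning overwrites the buffer that earlier views point at.
        self._pos += 1
        self._buf[:] = self._entries[self._pos]


it = ReusingIterator([b"AAAA", b"BBBB"])
held = it.value()    # consumer keeps the view across a reposition
safe = bytes(held)   # explicit copy that owns its own storage
it.next()

print(bytes(held))   # the held view now reflects the second entry
print(safe)          # the copy still holds the first entry
```

A consumer that needs the value to outlive `Next()` has two options in this model: copy it before repositioning, or rely on a pinning mechanism that keeps the backing storage alive and unchanged, which is the contract `IsValuePinned()` is meant to express.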
Differential Revision: D110100542
🟡 Codex Code Review
Auto-triggered after CI passed, reviewing commit 7e8736b.
❌ Codex review failed before producing findings.
✅ Claude Code Review
Auto-triggered after CI passed, reviewing commit 7e8736b.
Summary
Clean, minimal change that disables an experimental feature (…). No high-severity findings.
Findings
🔴 HIGH: None.
🟡 MEDIUM: None.
🟢 LOW / NIT: L1. Redundant …
| Context | Affected? | Notes |
|---|---|---|
| Production code | No | No production files changed |
| Public API | No | No API changes |
| Serialization format | No | No format changes |
| Other test suites | No | Only db_crashtest.py default_params affected |
| `db_stress` binary | No | The gflag still exists; it just won't be set to 1 by the crash test |
Positive Observations
- The PR description is exceptionally thorough: it documents three specific bugs, identifies the root cause pattern (value lifetime in `EmbeddedBlobResolvingIterator`), references the introducing commit, and explains exactly how to re-enable.
- The change is minimal and surgical: one value change + comment, no unnecessary refactoring.
- The `lambda: 0` pattern (rather than removing the key entirely) preserves the parameter in the crash test infrastructure, making re-enablement trivial.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
- `/claude-review [context]`: Request a code review
- `/claude-query <question>`: Ask about the PR or codebase