
feat(store): Tier-3 content-addressed OHLCV chunks (epic #540 phase 3c) #546

Merged: MDUYN merged 1 commit into feature/iaf-local-tiered-store from feature/iaf-ohlcv-chunk-store on May 12, 2026


@MDUYN MDUYN commented May 11, 2026

Phase 3c — Tier-3 content-addressed OHLCV chunks

Stacked on #545 (Phase 3b). Part of epic #540.

This slice unlocks the headline storage win for users who run many
backtests over a shared universe: identical OHLCV is stored exactly
once across the entire store, no matter how many bundles reference it.

What's in the box

LocalTieredStore.write() now routes save_bundle's OHLCV writes to
<root>/ohlcv/ whenever the backtest carries price data. The bundle
envelope still uses the existing content-addressed manifest
({key -> "<sha256>.parquet"}), so old bundles remain readable and
new ones simply share the storage location.
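To make the dedup invariant concrete, here is a minimal sketch of content-addressed chunk storage. It is not the code in this PR: `write_chunk`, `build_manifest`, and the manifest key format are hypothetical names chosen for illustration, and plain bytes stand in for the real Parquet payload. Only the `{key -> "<sha256>.parquet"}` manifest shape is taken from the description above.

```python
import hashlib
import tempfile
from pathlib import Path


def write_chunk(ohlcv_dir: Path, payload: bytes) -> str:
    """Store payload under its SHA-256 digest and return the file name.

    Hypothetical helper: identical bytes always map to the same name,
    so a second write of the same payload is a no-op.
    """
    name = f"{hashlib.sha256(payload).hexdigest()}.parquet"
    target = ohlcv_dir / name
    if not target.exists():
        ohlcv_dir.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(payload)
        tmp.replace(target)  # rename into place so readers never see a partial file
    return name


def build_manifest(ohlcv_dir: Path, chunks: dict[str, bytes]) -> dict[str, str]:
    """Return a {key -> "<sha256>.parquet"} manifest for one bundle."""
    return {key: write_chunk(ohlcv_dir, data) for key, data in chunks.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        shared = Path(root) / "ohlcv"
        first = build_manifest(shared, {"BTC/EUR:1d": b"candles"})
        second = build_manifest(shared, {"BTC/EUR:1d": b"candles"})
        # Two bundles, one chunk on disk.
        print(first == second, len(list(shared.glob("*.parquet"))))
```

Because the file name is derived from the bytes alone, two bundles that carry the same data converge on one file without any coordination between writers.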

store.open() forwards the same shared directory to open_bundle, so
OHLCV lookups resolve regardless of where the bundle was originally
written.

store.delete() deliberately leaves the ohlcv/ directory alone —
chunks are globally shared. Orphans are reclaimed on demand.

New introspection / GC API

| Method | Purpose |
| --- | --- |
| iter_ohlcv_hashes() | yields one hash per (handle, key) reference (with duplicates) |
| ohlcv_referenced_hashes() -> set | deduplicated set across all bundles |
| ohlcv_stored_hashes() -> set | what's physically on disk under ohlcv/ |
| ohlcv_stats() -> dict | {stored_blobs, stored_bytes, referenced_blobs, orphan_blobs, missing_blobs} |
| garbage_collect_ohlcv(*, dry_run=False) -> list[str] | reclaim orphans; dry_run=True returns the audit list without deleting |

Manifests are decoded straight from the bundle envelope via
_decode_payload — no full Backtest instantiation, so introspection
across thousands of bundles stays cheap.
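The relationship between these helpers can be sketched with plain sets. This is an illustration, not the store's implementation: the function names are hypothetical, the referenced set is passed in rather than decoded from bundle envelopes, and it assumes that an "orphan" is a chunk on disk that no bundle references while a "missing" blob is referenced but absent from disk.

```python
import tempfile
from pathlib import Path


def stored_hashes(ohlcv_dir: Path) -> set[str]:
    """Hashes physically present on disk, taken from the file stems."""
    if not ohlcv_dir.is_dir():
        return set()
    return {p.stem for p in ohlcv_dir.glob("*.parquet")}


def stats(ohlcv_dir: Path, referenced: set[str]) -> dict[str, int]:
    """Counters mirroring the keys listed for ohlcv_stats() (assumed meanings)."""
    stored = stored_hashes(ohlcv_dir)
    return {
        "stored_blobs": len(stored),
        "stored_bytes": sum(
            (ohlcv_dir / f"{h}.parquet").stat().st_size for h in stored
        ),
        "referenced_blobs": len(referenced),
        "orphan_blobs": len(stored - referenced),   # on disk, unreferenced
        "missing_blobs": len(referenced - stored),  # referenced, not on disk
    }


def garbage_collect(
    ohlcv_dir: Path, referenced: set[str], *, dry_run: bool = False
) -> list[str]:
    """Return orphan hashes; delete their files unless dry_run is set."""
    orphans = sorted(stored_hashes(ohlcv_dir) - referenced)
    if not dry_run:
        for h in orphans:
            (ohlcv_dir / f"{h}.parquet").unlink()
    return orphans


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        shared = Path(root)
        (shared / "aa.parquet").write_bytes(b"12345")
        (shared / "bb.parquet").write_bytes(b"678")
        print(stats(shared, {"aa"}))
        print(garbage_collect(shared, {"aa"}, dry_run=True))
```

Running the dry-run pass first gives an audit list that can be reviewed before anything is deleted, which matches how `garbage_collect_ohlcv(dry_run=True)` is described in the table.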

This is exactly the set-difference primitive that
docs/design/ohlcv-dedup-protocol.md
needs for its POST /ohlcv/negotiate step.
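As a rough sketch of that set difference: the protocol document is not reproduced here, so the function name and the idea that the client sends its hash list and gets back the hashes the server lacks are assumptions, not the documented request or response format.

```python
def hashes_to_upload(client_hashes, server_stored):
    """Return the hashes the client holds that the server does not yet store.

    Hypothetical helper: on the server side, server_stored could come from
    something like ohlcv_stored_hashes().
    """
    return sorted(set(client_hashes) - set(server_stored))


if __name__ == "__main__":
    # Duplicates in the client's list collapse before the difference is taken.
    print(hashes_to_upload(["aa", "bb", "aa", "cc"], {"bb"}))
```

Only the chunks in the returned list would need to be sent, so a client whose universe is already on the server uploads nothing.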

Tests

9 new tests in tests/services/backtest_store/test_local_tiered_store_ohlcv.py:

  • No OHLCV → no chunk dir created.
  • Identical OHLCV stored once across distinct handles (the dedup invariant).
  • Different OHLCV → separate chunks.
  • Round-trip via store.open() resolves OHLCV from the shared dir.
  • delete() keeps still-referenced chunks; orphans only appear when the last reference is gone.
  • garbage_collect_ohlcv(dry_run=True) lists without deleting; the real call removes them.
  • iter_ohlcv_hashes() emits per-reference; ohlcv_referenced_hashes() dedups.
  • Hash strings are 64-char lowercase hex (matches the upload-protocol spec).
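The hash-format check in the last bullet can be expressed as a small validator. This is an illustrative sketch rather than the test in the suite; `is_valid_chunk_hash` is a hypothetical name.

```python
import hashlib
import re

# 64 lowercase hexadecimal characters, the form hashlib's hexdigest() produces.
_HASH_RE = re.compile(r"[0-9a-f]{64}")


def is_valid_chunk_hash(value: str) -> bool:
    """True if value is a 64-character lowercase hex string."""
    return _HASH_RE.fullmatch(value) is not None


if __name__ == "__main__":
    digest = hashlib.sha256(b"example").hexdigest()
    print(is_valid_chunk_hash(digest), is_valid_chunk_hash(digest.upper()))
```

Using `fullmatch` rather than `match` rejects strings that merely start with a valid hash, such as a file name that still carries its `.parquet` suffix.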

Targeted suite (tests/services/backtest_store/ + tests/services/backtest_index/ + tests/cli/): 110 / 110 passing.

What's left in Phase 3

| Slice | Scope |
| --- | --- |
| 3d | iaf migrate-store --from local-dir --to local-tiered; byte-identical Tier-2 → Backtest reassembly (.iafbt becomes export-only); parameterised pytest fixture running every backtest test against both stores |

…e (epic #540 phase 3c)

Wires LocalTieredStore into the existing OHLCV side-store machinery
so identical (symbol, timeframe) Parquet bytes are written exactly
once and shared across every bundle that references them.

- write() now routes save_bundle's OHLCV writes to <root>/ohlcv/
  whenever backtest.ohlcv is non-empty. The bundle envelope keeps
  its content-addressed manifest unchanged, so old bundles remain
  readable.
- open() forwards the same shared directory to open_bundle so OHLCV
  lookups resolve regardless of what path the bundle was originally
  written with.
- delete() intentionally does NOT touch ohlcv/. Chunks are globally
  shared; orphans are reclaimed via garbage_collect_ohlcv(dry_run=…).
- Introspection helpers required by the dedup-upload protocol
  (docs/design/ohlcv-dedup-protocol.md):
    * iter_ohlcv_hashes() / ohlcv_referenced_hashes()
    * ohlcv_stored_hashes()
    * ohlcv_stats() -> stored_blobs / stored_bytes / referenced_blobs
                       / orphan_blobs / missing_blobs
    * garbage_collect_ohlcv(dry_run=False)
  Manifests are decoded straight from the bundle envelope
  (_decode_payload) so the cost is one msgpack read per bundle —
  no full Backtest instantiation.

9 new tests:
- No OHLCV -> no chunk dir created.
- Identical OHLCV is stored once across distinct handles (dedup).
- Different OHLCV yields separate chunks.
- Round-trip via store.open() resolves OHLCV from the shared dir.
- delete() keeps still-referenced chunks; orphans only after GC.
- garbage_collect_ohlcv(dry_run=True) lists without deleting; the
  real call removes them.
- iter_ohlcv_hashes() emits per-reference; ohlcv_referenced_hashes()
  dedups.
- Hash strings are 64-char lowercase hex (matches the upload protocol
  spec).

Targeted suite (backtest_store + backtest_index + cli): 110 / 110 passing.
@MDUYN MDUYN merged commit 4b8cf31 into feature/iaf-local-tiered-store May 12, 2026
8 checks passed