
feat: spillable DuckDB buffer for flush thread#27

Merged
YuweiXiao merged 4 commits into main from worktree-spillable-flush on Mar 16, 2026

Conversation

YuweiXiao (Collaborator) commented Mar 14, 2026

Summary

  • Replace the in-memory Vec<Change> accumulator in the flush thread with a DuckDB buffer table that can spill to disk, decoupling queue consumption from flushing and enabling larger batch windows without unbounded Rust memory growth
  • Split FlushWorker.flush(TableQueue) into buffer lifecycle methods: ensure_buffer(), append_to_buffer(), flush_buffer(), clear_buffer()
  • Cache column metadata (target_table, pk_cols, all_cols) and track has_non_inserts in Rust to avoid redundant DuckDB queries and per-flush string formatting
  • Extract report_flush_error() helper to eliminate 4x copy-pasted error handling blocks
  • Replace hot-path QueueMeta clone with a borrow (eliminates Vec<String> deep copy per drain iteration)
  • Two-phase DuckDB memory limits: duckdb_buffer_memory_mb (default 16 MB) caps buffer accumulation for 100+ concurrent tables, duckdb_flush_memory_mb (default 512 MB) allows higher memory during compaction/flush (only ~4 concurrent due to DuckLake commit lock). Both exposed as PG GUCs and daemon CLI args
  • Recreate DuckDB connection after each flush cycle — duckdb_close releases all buffer manager pages and temp files back to the OS, preventing RSS from staying at flush-phase high-water mark
  • Change flush_interval default from 1s to 5s with unlimited upper bound; regression tests override to 100ms

Before: SharedQueue → drain → Vec<Change> (Rust heap) → flush → DuckLake
After: SharedQueue → drain → DuckDB buffer table (spillable) → flush → DuckLake → drop connection
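The two-phase memory limits reduce to a pair of DuckDB `SET memory_limit` statements issued per phase. The helper below is an illustrative sketch, not the PR's actual code; the 16 MB / 512 MB defaults and the GUC names are taken from the PR description.

```rust
/// Build the DuckDB statement that caps memory for one phase.
/// `SET memory_limit = '<n>MB'` is DuckDB's documented syntax; the
/// function itself is a hypothetical helper for illustration.
fn memory_limit_sql(mb: u64) -> String {
    format!("SET memory_limit = '{}MB';", mb)
}

fn main() {
    // Buffer phase: low cap (duckdb_buffer_memory_mb, default 16 MB),
    // since 100+ tables may be accumulating concurrently.
    let buffer_phase = memory_limit_sql(16);
    // Flush phase: high cap (duckdb_flush_memory_mb, default 512 MB),
    // since the DuckLake commit lock keeps concurrency low (~4).
    let flush_phase = memory_limit_sql(512);
    println!("{buffer_phase}");
    println!("{flush_phase}");
}
```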

Test plan

  • cargo fmt — no formatting issues
  • cargo check — compiles with no warnings (only pre-existing pgrx cfg warning)
  • make installcheck — all 30 regression tests pass

🤖 Generated with Claude Code

YuweiXiao and others added 4 commits March 16, 2026 14:39
Replace the in-memory Vec<Change> accumulator in the flush thread with
a DuckDB buffer table that can spill to disk, decoupling queue
consumption from flushing and enabling larger batch windows without
unbounded Rust memory growth.

Before: SharedQueue → drain → Vec<Change> (Rust heap) → flush → DuckLake
After:  SharedQueue → drain → DuckDB buffer table (spillable) → flush → DuckLake

Split FlushWorker.flush(TableQueue) into buffer lifecycle methods:
- ensure_buffer(): lazy-creates buffer table, caches column metadata
- append_to_buffer(): loads changes via DuckDB Appender with seq tracking
- flush_buffer(): compacts (dedup by PK), applies DELETE+INSERT to DuckLake
- clear_buffer(): drops buffer without flushing (shutdown/error)
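The dedup-by-PK step in flush_buffer() could emit SQL along these lines. This is a sketch: the buffer/column names and the `seq` ordering column are assumptions (the PR only says the Appender tracks a seq), while `QUALIFY` is DuckDB's real row-filtering clause for window functions.

```rust
/// Build a DuckDB query selecting only the latest change per primary
/// key from a buffer table, ordered by an assumed monotonic `seq`
/// column. Names are illustrative, not the PR's actual identifiers.
fn dedup_latest_sql(buffer: &str, pk: &str) -> String {
    format!(
        "SELECT * FROM {buffer} \
         QUALIFY row_number() OVER (PARTITION BY {pk} ORDER BY seq DESC) = 1"
    )
}

fn main() {
    // The compacted result would then drive the DELETE+INSERT into DuckLake.
    println!("{}", dedup_latest_sql("buf_orders", "id"));
}
```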

Also includes quality improvements from code review:
- Track has_non_inserts in Rust (avoids DuckDB table scan at flush time)
- Cache target_table, pk_cols, all_cols on FlushWorker (avoid rebuilding)
- Extract report_flush_error() helper (eliminates 4x copy-paste)
- Replace QueueMeta clone with borrow (eliminates hot-path allocation)
- Add parse_target_key() helper using split_once (idiomatic, no Vec alloc)
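A parse_target_key()-style helper built on `str::split_once` can look like this; the signature and the "schema.table" key shape are assumptions, but the point from the review stands: `split_once` borrows from the input instead of allocating a `Vec` of substrings the way `split(...).collect()` would.

```rust
/// Split a "schema.table" target key at the first '.' without
/// allocating. Returns None if the key has no separator.
/// (Hypothetical sketch of the PR's parse_target_key helper.)
fn parse_target_key(key: &str) -> Option<(&str, &str)> {
    key.split_once('.')
}

fn main() {
    assert_eq!(parse_target_key("public.orders"), Some(("public", "orders")));
    assert_eq!(parse_target_key("noschema"), None);
}
```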

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mark "Unbounded DuckDB memory" as done (GUC-based two-phase limits)
and update per-group config TODO with actual GUC names as migration
candidates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop FlushWorker after every successful flush so duckdb_close frees
all buffer manager pages and temp files back to the OS. The worker is
lazily recreated on the next buffer cycle.
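The drop-and-lazily-recreate pattern can be sketched by holding the worker in an `Option`: a successful flush drops it (closing the DuckDB connection, which releases buffer pages and temp files back to the OS), and the next cycle recreates it on demand. `FlushWorker` and `FlushLoop` here are stand-in types, not the PR's actual structs.

```rust
/// Stand-in for the real worker, which would own a DuckDB connection
/// plus cached column metadata.
struct FlushWorker;

impl FlushWorker {
    fn new() -> Self {
        FlushWorker
    }
}

struct FlushLoop {
    worker: Option<FlushWorker>,
}

impl FlushLoop {
    /// Lazily recreate the worker if a previous cycle dropped it.
    fn worker(&mut self) -> &mut FlushWorker {
        self.worker.get_or_insert_with(FlushWorker::new)
    }

    /// Dropping the worker closes its connection; duckdb_close is what
    /// actually returns memory to the OS in the real implementation.
    fn after_successful_flush(&mut self) {
        self.worker = None;
    }
}

fn main() {
    let mut fl = FlushLoop { worker: None };
    fl.worker(); // first cycle: worker created on demand
    assert!(fl.worker.is_some());
    fl.after_successful_flush(); // connection dropped, memory released
    assert!(fl.worker.is_none());
}
```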

Two-phase memory limits preserved: buffer phase uses low limit
(duckdb_buffer_memory_mb, default 16 MB) for 100+ concurrent tables,
flush phase raises to high limit (duckdb_flush_memory_mb, default
512 MB) for compaction. Both exposed as GUCs and daemon CLI args.

Also changes flush_interval default from 1s to 5s with unlimited
upper bound. Regression tests override to 100ms for fast feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
YuweiXiao force-pushed the worktree-spillable-flush branch from cd76e72 to cfeb29f on March 16, 2026 06:39
YuweiXiao merged commit bea23d1 into main on Mar 16, 2026
3 checks passed
YuweiXiao deleted the worktree-spillable-flush branch on March 16, 2026 07:07