feat: spillable DuckDB buffer for flush thread (#27)
Merged
Replace the in-memory Vec<Change> accumulator in the flush thread with a DuckDB buffer table that can spill to disk, decoupling queue consumption from flushing and enabling larger batch windows without unbounded Rust memory growth.

Before: SharedQueue → drain → Vec<Change> (Rust heap) → flush → DuckLake
After: SharedQueue → drain → DuckDB buffer table (spillable) → flush → DuckLake

Split FlushWorker.flush(TableQueue) into buffer lifecycle methods:

- ensure_buffer(): lazily creates the buffer table, caches column metadata
- append_to_buffer(): loads changes via the DuckDB Appender with seq tracking
- flush_buffer(): compacts (dedup by PK), applies DELETE+INSERT to DuckLake
- clear_buffer(): drops the buffer without flushing (shutdown/error)

Also includes quality improvements from code review:

- Track has_non_inserts in Rust (avoids a DuckDB table scan at flush time)
- Cache target_table, pk_cols, all_cols on FlushWorker (avoids rebuilding them per flush)
- Extract a report_flush_error() helper (eliminates 4x copy-pasted error handling)
- Replace the QueueMeta clone with a borrow (eliminates a hot-path allocation)
- Add a parse_target_key() helper using split_once (idiomatic, no Vec allocation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
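A minimal pure-Rust sketch of the dedup-by-PK compaction that `flush_buffer()` performs — modeled here in plain Rust rather than the DuckDB SQL the PR actually uses, with a hypothetical `Change` shape and a single-column primary key for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical buffered change; the real Change type carries full row data.
#[derive(Clone, Debug, PartialEq)]
struct Change {
    seq: u64,        // monotonically increasing sequence number
    pk: String,      // primary-key value (simplified to one column)
    is_insert: bool, // false for UPDATE/DELETE
}

/// Keep only the highest-seq change per primary key — the effect the
/// buffer-table compaction achieves before the DELETE+INSERT flush.
fn compact(changes: &[Change]) -> Vec<Change> {
    let mut latest: HashMap<&str, &Change> = HashMap::new();
    for c in changes {
        let e = latest.entry(c.pk.as_str()).or_insert(c);
        if c.seq >= e.seq {
            *e = c;
        }
    }
    let mut out: Vec<Change> = latest.into_values().cloned().collect();
    out.sort_by_key(|c| c.seq);
    out
}

fn main() {
    let changes = vec![
        Change { seq: 1, pk: "a".into(), is_insert: true },
        Change { seq: 2, pk: "b".into(), is_insert: true },
        Change { seq: 3, pk: "a".into(), is_insert: false }, // supersedes seq 1
    ];
    let compacted = compact(&changes);
    assert_eq!(compacted.len(), 2);
    assert_eq!(compacted.last().unwrap().seq, 3);

    // has_non_inserts tracked on the Rust side, as the review item describes,
    // so the flush path can skip the DELETE phase for insert-only batches.
    let has_non_inserts = changes.iter().any(|c| !c.is_insert);
    assert!(has_non_inserts);
}
```

In the PR this dedup runs inside DuckDB over the spillable buffer table; the Rust version above only illustrates the keep-latest-per-key semantics.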
Mark "Unbounded DuckDB memory" as done (GUC-based two-phase limits) and update the per-group config TODO with the actual GUC names as migration candidates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop FlushWorker after every successful flush so duckdb_close frees all buffer manager pages and temp files back to the OS. The worker is lazily recreated on the next buffer cycle.

Two-phase memory limits preserved: the buffer phase uses a low limit (duckdb_buffer_memory_mb, default 16 MB) for 100+ concurrent tables; the flush phase raises it to a high limit (duckdb_flush_memory_mb, default 512 MB) for compaction. Both are exposed as GUCs and daemon CLI args.

Also changes the flush_interval default from 1s to 5s with an unlimited upper bound. Regression tests override it to 100ms for fast feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
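The drop-after-flush lifecycle can be sketched with a hypothetical stand-in worker. In the real code the `FlushWorker` owns a DuckDB connection whose drop runs `duckdb_close`; the `buffer_cycle` function, fields, and return values below are illustrative assumptions, not the PR's API:

```rust
/// Stand-in for the real FlushWorker, which owns a DuckDB connection.
/// Dropping it models duckdb_close releasing buffer manager pages and
/// temp files back to the OS.
struct FlushWorker {
    buffered: usize, // stand-in for rows held in the DuckDB buffer table
}

impl FlushWorker {
    fn new() -> Self {
        FlushWorker { buffered: 0 }
    }
    fn append(&mut self, rows: usize) {
        self.buffered += rows;
    }
    fn flush(&mut self) -> Result<usize, String> {
        Ok(std::mem::take(&mut self.buffered))
    }
}

/// Lazily (re)create the worker per buffer cycle and drop it after a
/// successful flush, so memory is returned to the OS between cycles.
fn buffer_cycle(slot: &mut Option<FlushWorker>, rows: usize) -> usize {
    let worker = slot.get_or_insert_with(FlushWorker::new);
    worker.append(rows);
    match worker.flush() {
        Ok(flushed) => {
            *slot = None; // drop → duckdb_close frees pages/temp files
            flushed
        }
        Err(_) => 0, // keep the worker (and its buffer) alive for retry
    }
}

fn main() {
    let mut slot: Option<FlushWorker> = None;
    assert_eq!(buffer_cycle(&mut slot, 100), 100);
    assert!(slot.is_none()); // worker dropped after a successful flush
    assert_eq!(buffer_cycle(&mut slot, 3), 3); // lazily recreated next cycle
}
```

The design choice here is that holding `Option<FlushWorker>` rather than a long-lived worker lets RSS fall back to baseline between flushes instead of staying at the flush-phase high-water mark.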
Summary

- Replace the in-memory `Vec<Change>` accumulator in the flush thread with a DuckDB buffer table that can spill to disk, decoupling queue consumption from flushing and enabling larger batch windows without unbounded Rust memory growth
- Split `FlushWorker.flush(TableQueue)` into buffer lifecycle methods: `ensure_buffer()`, `append_to_buffer()`, `flush_buffer()`, `clear_buffer()`
- Cache `target_table`, `pk_cols`, and `all_cols` on `FlushWorker`, and track `has_non_inserts` in Rust, to avoid redundant DuckDB queries and per-flush string formatting
- Extract a `report_flush_error()` helper to eliminate 4x copy-pasted error handling blocks
- Replace the `QueueMeta` clone with a borrow (eliminates a `Vec<String>` deep copy per drain iteration)
- Two-phase memory limits: `duckdb_buffer_memory_mb` (default 16 MB) caps buffer accumulation for 100+ concurrent tables; `duckdb_flush_memory_mb` (default 512 MB) allows higher memory during compaction/flush (only ~4 concurrent due to the DuckLake commit lock). Both exposed as PG GUCs and daemon CLI args
- Drop the `FlushWorker` after every successful flush so `duckdb_close` releases all buffer manager pages and temp files back to the OS, preventing RSS from staying at the flush-phase high-water mark
- Change the `flush_interval` default from 1s to 5s with an unlimited upper bound; regression tests override to 100ms

Before:

SharedQueue → drain → Vec<Change> (Rust heap) → flush → DuckLake

After:

SharedQueue → drain → DuckDB buffer table (spillable) → flush → DuckLake → drop connection

Test plan
- `cargo fmt` — no formatting issues
- `cargo check` — compiles with no warnings (only a pre-existing pgrx cfg warning)
- `make installcheck` — all 30 regression tests pass

🤖 Generated with Claude Code
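For illustration, the `split_once`-based `parse_target_key()` helper mentioned in the review items might look like the sketch below. The `schema.table` key format and the returned tuple shape are assumptions, not confirmed by the PR:

```rust
/// Split a target key of the (assumed) form "schema.table" at the first
/// dot, borrowing from the input instead of allocating a Vec of parts.
fn parse_target_key(key: &str) -> Option<(&str, &str)> {
    key.split_once('.')
}

fn main() {
    assert_eq!(parse_target_key("public.orders"), Some(("public", "orders")));
    // Only the first dot splits; the remainder is returned as-is.
    assert_eq!(parse_target_key("a.b.c"), Some(("a", "b.c")));
    // A key with no separator is rejected rather than panicking.
    assert_eq!(parse_target_key("nodot"), None);
}
```

Compared with `key.split('.').collect::<Vec<_>>()`, `split_once` avoids the heap allocation on every drain iteration, which matches the hot-path motivation given in the commit message.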