Fix StringView buffer bloat in stream_next FFI export#21753
Conversation
PR Reviewer Guide 🔍(Review updated until commit fbd7a83)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to fbd7a83 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 681d734
Suggestions up to commit 68f0a34
Suggestions up to commit c3cae77
Suggestions up to commit 2be0546
Suggestions up to commit 41069b3
|
f679645 to
8ba38ce
Compare
|
Persistent review updated to latest commit 8ba38ce |
|
Persistent review updated to latest commit f877a9a |
f877a9a to
09147cb
Compare
|
Persistent review updated to latest commit 09147cb |
09147cb to
8ba38ce
Compare
|
Persistent review updated to latest commit 8ba38ce |
|
Persistent review updated to latest commit 373b21a |
|
Persistent review updated to latest commit 304654d |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #21753 +/- ##
============================================
+ Coverage 73.43% 73.48% +0.04%
- Complexity 75103 75131 +28
============================================
Files 6016 6016
Lines 341072 341072
Branches 49091 49091
============================================
+ Hits 250469 250621 +152
+ Misses 70682 70452 -230
- Partials 19921 19999 +78 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
304654d to
ef42d8c
Compare
d2f9458 to
ed1305d
Compare
|
Persistent review updated to latest commit ed1305d |
|
❌ Gradle check result for ed1305d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Persistent review updated to latest commit 46abccb |
|
Similar fix in DF - apache/datafusion#20381 |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 3678ddb.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
3678ddb to
a4ec8f2
Compare
|
Persistent review updated to latest commit a4ec8f2 |
a4ec8f2 to
e8b820d
Compare
|
Persistent review updated to latest commit e8b820d |
e8b820d to
41069b3
Compare
|
Persistent review updated to latest commit 41069b3 |
expani
left a comment
There was a problem hiding this comment.
Great finding and deep dive @bowenlan-amzn 👏
41069b3 to
2be0546
Compare
|
Persistent review updated to latest commit 2be0546 |
2be0546 to
c3cae77
Compare
|
Persistent review updated to latest commit c3cae77 |
|
❌ Gradle check result for c3cae77: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Persistent review updated to latest commit 68f0a34 |
|
❌ Gradle check result for 68f0a34: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
StringViewArray::slice() shares ALL backing buffers via Arc::clone. When DataFusion's hash aggregate emits output (EmitTo::All + slice into 8192-row batches), each slice carries the full backing buffer pool. For a 14M-group aggregate with SearchPhrase strings, this means 174MB per batch instead of 0.4MB — a 435x amplification. Add compact_string_view_columns() in stream_next() that calls gc() on Utf8View/BinaryView columns before C Data Interface export. This compacts each batch to contain only its own referenced strings. Validated on 4-shard ClickBench q19: Before: 9748 batches × 174MB = 1.7TB, 12 minutes After: 9748 batches × 0.4MB = 4GB, 11 seconds (65x faster) Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Regression guard for e2fd9bc (StringView buffer bloat fix). Tests prove that sliced StringView/BinaryView batches carry inflated backing buffers and that gc() compacts them to proportional size. Covers: large buffer compaction, inline-only no-op, empty array safety, BinaryView parity, and non-view passthrough. Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
compact_string_view_columns now checks whether backing buffers are over-allocated before calling gc(). Non-sliced batches (common case) pay only an O(n) view scan instead of a full buffer copy. New tests: - view_needs_gc_detects_bloat: proves detection correctly identifies sliced arrays vs non-sliced arrays - non_sliced_batch_skips_gc: proves non-sliced batches pass through without allocation/copy Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
- New integration test (stringview_gc_test.rs): feeds a sliced 10K→100 StringView batch through df_stream_next and asserts the output backing buffers are compact (<10KB, not ~300KB). Fails immediately if compact_string_view_columns is removed from stream_next — the actual regression guard for this fix. - Fix non_sliced_batch_skips_gc: use Arc::ptr_eq to prove the fast path returns the original column without copying, not just that sizes match. Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
68f0a34 to
681d734
Compare
|
Persistent review updated to latest commit 681d734 |
|
❌ Gradle check result for 681d734: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
The makeNodeStatsWithResourceUsage helper was missing the AnalyticsBackendNativeMemoryStats parameter added in opensearch-project#21637. Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
|
Persistent review updated to latest commit fbd7a83 |
Description
StringViewArray::slice()shares ALL backing buffers via Arc. When DataFusion's hash aggregate emits viaEmitTo::Alland slices the result into per-batch chunks, each batch carries the full backing buffer pool — not just the strings it references. This caused up to 435x data amplification on multi-shard aggregate queries with many distinct string values.Root cause: Arrow IPC and C Data Interface intentionally do not compact StringView backing buffers during serialization (arrow-rs #5513). Compaction is the caller's responsibility at the serialization boundary.
Fix: Add
compact_string_view_columns()instream_next()before FFI export. Callsgc()on StringView/BinaryView columns only when backing buffers are significantly over-allocated. Three-level precondition avoids unnecessary work:Performance
Validated on 4-shard ClickBench q19 (14M groups, SearchPhrase):
Testing
The integration test feeds a sliced 10K→100 StringView batch through
df_stream_nextand asserts output buffers are compact (<10KB). Fails immediately if the gc() call is removed (gets 310KB instead).Check List