Add Balanced Docs Slice Allocator to increase document balance for Concurrent Segment Search #17687
base: main
Conversation
Signed-off-by: Kunal Kotwani <[email protected]>
…test performance Signed-off-by: Finn Roblin <[email protected]>
…ghtly to account for non-ordered heap container vs previous ordered arraylist Signed-off-by: Finn Roblin <[email protected]>
❌ Gradle check result for 99803c0: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
```java
public static final String CONCURRENT_SEGMENT_SEARCH_USE_EXPERIMENTAL_SLICING_KEY = "search.concurrent.experimental_slicing.enable";
public static final boolean CONCURRENT_SEGMENT_SEARCH_USE_EXPERIMENTAL_SLICING_DEFAULT_VALUE = false;
```
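For reference, a flag like this would typically be registered as a dynamic, node-scoped boolean setting. The sketch below is illustrative only (the wrapping class and field name are hypothetical); only the key and default mirror the constants above:

```java
// Hypothetical registration of the proposed flag as a dynamic cluster setting.
// Only the key and default come from the quoted constants; the rest is illustrative.
import org.opensearch.common.settings.Setting;
import org.opensearch.common.settings.Setting.Property;

public final class ExperimentalSlicingSettingSketch {

    public static final Setting<Boolean> CONCURRENT_SEGMENT_SEARCH_USE_EXPERIMENTAL_SLICING_SETTING = Setting.boolSetting(
        "search.concurrent.experimental_slicing.enable",  // CONCURRENT_SEGMENT_SEARCH_USE_EXPERIMENTAL_SLICING_KEY
        false,                                             // CONCURRENT_SEGMENT_SEARCH_USE_EXPERIMENTAL_SLICING_DEFAULT_VALUE
        Property.Dynamic,   // can be toggled at runtime to A/B compare old vs. new slicing
        Property.NodeScope
    );

    private ExperimentalSlicingSettingSketch() {}
}
```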
Do we want to gate this behind a setting when this ships? If we can avoid shipping yet another cluster setting, it would be nice.
I suspect that this approach may help (or at least not hurt) the existing concurrent segment search logic. In that case, we should just replace the old slice supplier.
I like having the setting in place during development/testing so that we can compare the old/new implementations easily.
I agree that one fewer setting is desirable and would prefer not to gate it behind a setting when it ships, but I'm worried about performance on non-k-NN use cases compared to the preexisting MaxTargetSliceSupplier. Right now I'm running big5-1000 on a larger cluster with 8 slices to see how performance changes as we scale up cluster size. I'll add a comment when these runs are finished, and hopefully the balanced slicing is clearly better.
For now, I noticed some performance regressions on the big5 workload for scroll and a few other operations when we enable the balanced docs slice supplier. The operations (besides scroll) that had performance regressions returned 0 hits in their queries (for the documents-100 corpus). This might be due to the added overhead of a priority queue, a consequence of the different slice assignment policies, or something else. As another caveat, the big5 benchmarks were run on a small r5.xl instance with only 2 slices.
We could also, as I believe you suggested before, have branching logic in the shouldUseMaxTargetSupplier method to decide between round robin and balanced based on any patterns in the performance benchmarks (for instance, if a scrollContext is present). I'm a little hesitant to make that decision based on just the big5 workload across two clusters.
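For illustration, the branching being discussed could look roughly like the sketch below. This is not the PR's code; the strategy interface and the scroll-context flag are placeholder names, not the actual OpenSearch API behind shouldUseMaxTargetSupplier:

```java
// Illustrative sketch only -- the interface and the scroll-context flag are placeholders.
final class SliceSupplierSelectorSketch {

    /** Groups leaves into slices; implementations could be round-robin or balanced. */
    interface SliceStrategy {
        int[][] slices(int[] leafDocCounts, int targetSliceCount);
    }

    static SliceStrategy choose(boolean hasScrollContext, SliceStrategy roundRobin, SliceStrategy balanced) {
        // Scroll queries regressed with balanced slicing in the big5 runs, so keep the
        // existing sort + round-robin supplier for scroll requests and use the balanced
        // supplier everywhere else.
        return hasScrollContext ? roundRobin : balanced;
    }
}
```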
❌ Gradle check result for a07f46d: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention: Patch coverage is

```diff
@@             Coverage Diff              @@
##               main   #17687      +/-   ##
============================================
- Coverage     72.43%   72.37%    -0.07%
+ Complexity    65694    65681       -13
============================================
  Files          5311     5312        +1
  Lines        304937   304975       +38
  Branches      44226    44231        +5
============================================
- Hits         220872   220712      -160
- Misses        65912    66202      +290
+ Partials      18153    18061       -92
```

☔ View full report in Codecov by Sentry.
Here are performance results for the big5 documents-1000 workload. This is the biggest workload that I could find to test OpenSearch's functionality. I ran the test with a 3x r6g.4xl cluster. The following settings were used:

Here are median throughput, p90 latency, and p90 service time for the operations (big5-documents-1000-comparison.xlsx):

The following operations are worse with the balanced docs slice supplier. Looking at the queries, most of the queries with regressions contain either a term query, a descending sort, a range on timestamps, or a match/match-all query.

If we don't want to add another setting, we could change the slice supplier selection in the `shouldUseMaxTargetSlice()` method. @msfroh, what do you think based on these additional benchmarks?
Will address code coverage in next revision.
Thanks @finnroblin, I had a few thoughts on this.
Thanks @jed326 for the thoughts!
2/3 Performance regressions
Here's a summary table:

2 -- other ways to roll out the change
A search request option is an interesting idea, but I'm not sure it sidesteps the apprehension around adding additional configuration options. From an implementation POV, the option would need to be obtainable from the … If we use KNNConcurrentSearchRequestDecider then I think we must still add a hook on the OpenSearch side (either a cluster/index setting or a search request option) to use BD. If it's an index setting we could use the …
Description
Adds a new slicing mechanism, BalancedDocsSliceSupplier, to concurrent segment search. This mechanism assigns leaves to slices greedily based on the number of documents in the leaf and slice. Each leaf is assigned to the slice with the current lowest number of documents. This mechanism achieves a more balanced distribution of work compared to the preexisting MaxTargetSliceSupplier.
MaxTargetSliceSupplier sorts the leaves by document count and then distributes the sorted leaves to slices in round-robin order. This round-robin implementation can create an uneven document distribution, with a few very large slices and a few very small slices. The proposed BalancedDocsSliceSupplier creates a more balanced distribution of document counts across search slices.
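To make the difference concrete, here is a minimal sketch of the greedy assignment described above. It is not the PR's implementation: it assumes the leaves are first sorted by descending document count (classic LPT-style scheduling) and returns plain lists of leaves rather than Lucene LeafSlice objects.

```java
// Illustrative sketch: greedy "least-loaded slice" assignment. Each leaf goes to the
// slice group that currently holds the fewest documents, tracked with a min-heap
// keyed on accumulated doc count.
import org.apache.lucene.index.LeafReaderContext;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BalancedSliceSketch {

    private static final class SliceBucket {
        final List<LeafReaderContext> leaves = new ArrayList<>();
        long docCount = 0;
    }

    static List<List<LeafReaderContext>> assign(List<LeafReaderContext> leaves, int targetSliceCount) {
        PriorityQueue<SliceBucket> heap = new PriorityQueue<>(Comparator.comparingLong((SliceBucket b) -> b.docCount));
        for (int i = 0; i < targetSliceCount; i++) {
            heap.add(new SliceBucket());
        }
        // Sorting leaves by descending doc count first is assumed here; the key idea is
        // simply that each leaf lands on the currently smallest slice.
        List<LeafReaderContext> sorted = new ArrayList<>(leaves);
        sorted.sort(Comparator.comparingInt((LeafReaderContext ctx) -> ctx.reader().maxDoc()).reversed());
        for (LeafReaderContext ctx : sorted) {
            SliceBucket smallest = heap.poll();   // slice with the fewest docs so far
            smallest.leaves.add(ctx);
            smallest.docCount += ctx.reader().maxDoc();
            heap.add(smallest);                   // re-insert with its updated doc count
        }
        List<List<LeafReaderContext>> result = new ArrayList<>();
        for (SliceBucket bucket : heap) {
            result.add(bucket.leaves);
        }
        return result;
    }
}
```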
Benchmarks on a vector search workload show that this method improves search throughput and latency compared to other slicing mechanisms. Benchmarks on the big5 workload show that this method is better than MaxTargetSliceSupplier in some cases but worse or comparable in others (scroll queries are worse, mixed results for range aggregations). Please see the benchmarking subsections for experiment details and more commentary.
I propose adding this slicing mechanism as a concurrent segment search setting for the following reasons:

- Concurrent segment search already exposes the `max_slice_count` parameter to use MaxTargetSliceSupplier, so there is precedent for an additional setting in order to adjust CSS behavior compared to the Lucene default. We could call out the new slicing option in the documentation, similar to how we already call out the choice between Lucene slicing and round-robin slicing.
- If it is exposed as an index setting, plugins such as k-NN could opt their indices in through the `getAdditionalIndexSettings()` hook.

Vector Search Benchmarks
Vector search benchmarks on a dataset with 10M 768-dimension vectors show search throughput increases of 7-10% and search latency decreases of 12-16%. The comparison was performed on a 3x r6g.4xl cluster with 8 slices. Full benchmark results available here.
Big5 Benchmarks
Big5 benchmarks were performed against a single-node cluster with an r5.xl instance (same as in the GH benchmark configs). Only 2 target slices were used due to the small size of the cluster (4 vCPU). I used the big5-100 corpus. The table comparing median throughput, p90 latency, and p90 service time for each operation is attached at the bottom.

I will run the big5 workload with the `documents-1000` corpus on a large 3x r6g.4xl cluster with 8 target slices to test more performance characteristics. It will take a few hours to get the results, so I am opening the PR now with the more limited big5 results.

The following operations had worse performance with the balanced docs slice supplier. I dug into the queries with performance regressions. Besides the `scroll` operation, all of the worse operations returned 0 hits, so the higher latencies are likely a result of the extra overhead of using a priority queue to assign leaves to slices instead of a sort plus round-robin distribution. However, more benchmarking with `documents-1000` will hopefully better reveal the performance differences.

Operations with worse performance:

One option would be to change `shouldUseMaxTargetSlice()` to use MaxTargetSlicing whenever there is a scroll context present.

Brief comparison of all big5 operations
(Full results and comparison available here).
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.