Increasing Floor Segment Size to 16MB #17699
Conversation
Signed-off-by: Prudhvi Godithi <[email protected]>
Adding @msfroh, please take a look and provide your input.
server/src/main/java/org/opensearch/index/TieredMergePolicyProvider.java
server/src/main/java/org/opensearch/index/LogByteSizeMergePolicyProvider.java
Signed-off-by: Prudhvi Godithi <[email protected]>
Added a small comment, can you please check again, @msfroh?
Signed-off-by: Prudhvi Godithi <[email protected]>
I understand the rationale for going with the Lucene defaults, as they are applicable to a lot of workloads. But should we not conduct some perf tests that mimic heavy deletes/updates, which would be sensitive to a change in these defaults? AFAIK OSB doesn't support such workloads out of the box, so yes, it's tedious. Also, from a user perspective, I believe users who need to tweak these advanced settings would already have index templates in place based on their own benchmarks.
I ran some perf testing from the k-NN side, since we thought this change would boost search performance. On a workload of 10M 768-D vectors we saw 5-7% improvements in query throughput and common-case latency, larger 20%+ improvements in tail latency, a ~27% increase in median merge time, and a ~4% decrease in indexing time. @expani these perf tests didn't involve any deletes/updates, but the change led to good improvements for the k-NN use case. I agree that a delete/update test would be good for understanding the impact of the additional merges resulting from this change.
I agree, we should give users some control to go back to the 2MB setting if they see a regression for their workload. We can only test some general workloads and won't be able to cover all the different types of workloads users might leverage OpenSearch for. We are discussing something similar at #17051 (comment).
Thanks @finnroblin for running some early benchmarks on this change.
So are merges taking longer with the increased floor segment size?
There are more merges, but they're not taking longer. There were 27% fewer segments after the change, and thus roughly 27% more merges. Surprisingly, indexing time was 4% lower after the change. I'm not sure why; I would expect it to be higher since more merging is occurring.
Thanks everyone for the feedback.
@linuxpi and @expani, users can always update this behavior; I can see that both settings are dynamic.
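For illustration, here is a minimal sketch of how a user could dynamically revert the floor segment size to the old 2MB default for a single index via the `index.merge.policy.floor_segment` setting. The index name is hypothetical, and this assumes the standard OpenSearch update-settings request classes:

```java
import org.opensearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.opensearch.common.settings.Settings;

public class FloorSegmentOverride {
    // Builds a dynamic settings update restoring the pre-change 2MB floor
    // for one index; "my-index" is illustrative.
    public static UpdateSettingsRequest revertFloorSegment() {
        return new UpdateSettingsRequest("my-index")
            .settings(Settings.builder()
                .put("index.merge.policy.floor_segment", "2mb")
                .build());
    }
}
```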
@prudhvigodithi - Should we also run a big5 workload for comparison? In the previous discussions, folks wanted to know if there were any perf implications.
{"run-benchmark-test": "id_4"} |
The Jenkins job URL is https://build.ci.opensearch.org/job/benchmark-pull-request/2735/. Final results will be published once the job is completed.
[Minor code-cleanup comment] We can double-check other variables as well, if that applies; otherwise this can be taken up as a follow-up. Unless the reasons are different for why we have these two constants separate?
Thanks @sandeshkr419, yes, we can do that as a follow-up PR.
@prudhvigodithi Submit an indexing-only run as well; maybe use nyc_taxis to confirm any performance impact on indexing.
They aren't exactly the same thing. They have similar behavior in terms of defining the threshold for a "small" segment, but the specifics differ between the two policies.
If my understanding is correct, they're both set to 16MB in Lucene.
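For context, a hedged sketch (not code from this PR) of the two Lucene-level knobs being discussed; `setFloorSegmentMB` and `setMinMergeMB` are the respective Lucene setters, and 16.0 mirrors the new default:

```java
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.TieredMergePolicy;

public class MergePolicyDefaults {
    public static TieredMergePolicy tiered() {
        TieredMergePolicy policy = new TieredMergePolicy();
        // Segments below this size are "rounded up" when merges are scored.
        policy.setFloorSegmentMB(16.0);
        return policy;
    }

    public static LogByteSizeMergePolicy logByteSize() {
        LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
        // Segments below this size are treated as the smallest merge level.
        policy.setMinMergeMB(16.0);
        return policy;
    }
}
```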
Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/2735/
Benchmark Baseline Comparison Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/49/
@finnroblin - I don't think the increase in median merge time should be read as 27% more merges. Since we have increased the floor segment size from 2MB to 16MB, the amount of data being processed per merge operation is more than it was earlier. Hence, it is expected that the median/mean merge time is higher than it was before. Do you also have any numbers on the total merge time? I suspect that should be lower than before, as we are not spending resources/time promoting segments from the bottommost tier to the next tier; we already write the bigger 16MB segments instead of a bunch of smaller 2MB segments.
While I agree with @expani's comment in general that we should have our own perf benchmarks to ensure the setting change does not have any adverse effect, in this case specifically, as pointed out by @prudhvigodithi in #17699 (comment), we already have a dynamic setting for this. That, combined with the good judgment of the Lucene community, gives me confidence to go ahead with this change.
❌ Gradle check result for 6a73b0e: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Hi @ankitkala, this change was merged, which is great; I just wanted to quickly address your comments:
There was around a 31% increase in median cumulative merge time (I think the median is taken across the total merge times of the primary shards). I posted some benchmark runs here. Here's the full spreadsheet of the benchmark runs: I think the phrasing of the setting is a little confusing 🙃 -- the way I understand it is that segments sized below the floor segment are merged more aggressively compared to how they would be merged if their true size were transparent to the policy. The docs phrase it as "for background merges, smaller segments are 'rounded up' to this size." Actually, looking at the source code, we see that this rounding affects how segments are scored: the maximum of the segment's bytes and the floor segment size is taken for scoring. My understanding is that if the score is high enough, a merge occurs (but I also don't have deep knowledge of Lucene).
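To make the scoring point concrete, here is a simplified sketch of the rounding step described above; it only illustrates the max-of-sizes behavior and is not Lucene's actual scoring code:

```java
public class FloorScoringSketch {
    // When candidate merges are scored, a segment's size is floored at the
    // configured floor segment size, so tiny segments look equally cheap to
    // merge and get compacted aggressively. Illustration only.
    public static long flooredSize(long segmentBytes, long floorSegmentBytes) {
        return Math.max(segmentBytes, floorSegmentBytes);
    }

    public static void main(String[] args) {
        long floor = 16L << 20; // 16MB
        // A 2MB segment is scored as if it were 16MB.
        System.out.println(flooredSize(2L << 20, floor));
    }
}
```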
Description
This change helps merge smaller segments more aggressively and makes more efficient use of index structures, with reduced overhead from fewer segments.
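As a rough sketch of what the default change amounts to (the class and field names below are assumptions for illustration, not the exact diff in this PR):

```java
import org.opensearch.core.common.unit.ByteSizeUnit;
import org.opensearch.core.common.unit.ByteSizeValue;

// Hypothetical sketch of the new default; names are illustrative.
public class TieredMergePolicyDefaults {
    // Previously: new ByteSizeValue(2, ByteSizeUnit.MB)
    public static final ByteSizeValue DEFAULT_FLOOR_SEGMENT =
        new ByteSizeValue(16, ByteSizeUnit.MB);
}
```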
Related Issues
Part of #16935
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.