Skip to content

Switch percentiles implementation to MergingDigest #18124

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 28, 2025

Conversation

peteralfonsi
Copy link
Contributor

@peteralfonsi peteralfonsi commented Apr 28, 2025

Description

Switches the percentiles agg implementation from AVLTreeDigest to MergingDigest. MergingDigest is now the recommended implementation from the t-digest library, and we see significant speedups by switching to it. (See more details in the original issue). Some latencies from percentiles aggregations on http_logs:

Field Baseline latency (ms) Modififed latency (ms)
timestamp 13,085 6,293
status 196,794 6,212

Since the implementations extend the same abstract class AbstractTDigest, this is a drag-and-drop change. Our extending class TDigestState just adds some StreamInput/StreamOutput methods, which use the result of the centroids() and centroidCount() methods which are overridden from the abstract class. I don't think we need additional tests. The existing tests in TDigestPercentilesAggregatorTests.java still pass after the change. However I will still mark this PR as a draft, please let me know if I'm missing something here...

I also don't think we need any tuning of the sole parameter compression, since in the library's method createDigest that creates a digest of the default implementation, it doesn't modify the input compression based on which implementation it's actually using.

Related Issues

Resolves #18122

Check List

  • Functionality includes testing.
  • [N/A] API changes companion pull request created, if applicable.
  • [N/A] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 852c178: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peteralfonsi
Copy link
Contributor Author

On further thought I will add some tests around serialization + serialization inter-version compatibility. Both classes provide their own serialization methods. Probably the most logical thing to do would be to add a version check, and for < 3.1 use the preexisting logic assuming TDigestState extends AVLTreeDigest, and for >= 3.1 just rely on the library's serialization code.

Copy link
Contributor

❌ Gradle check result for 4599cff: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented May 5, 2025

❌ Gradle check result for 9c83d7d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Peter Alfonsi <[email protected]>
@peteralfonsi peteralfonsi force-pushed the merging-percentiles branch from 9c83d7d to 0e93844 Compare May 6, 2025 00:26
Copy link
Contributor

github-actions bot commented May 6, 2025

❌ Gradle check result for 0e93844: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peteralfonsi
Copy link
Contributor Author

Flaky test: #14293

@peteralfonsi peteralfonsi force-pushed the merging-percentiles branch from ff7c7da to 955f175 Compare May 6, 2025 15:50
Copy link
Contributor

github-actions bot commented May 6, 2025

✅ Gradle check result for 955f175: SUCCESS

Copy link

codecov bot commented May 6, 2025

Codecov Report

Attention: Patch coverage is 47.36842% with 20 lines in your changes missing coverage. Please review.

Project coverage is 72.56%. Comparing base (3b018ad) to head (e4caace).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...arch/search/aggregations/metrics/TDigestState.java 41.17% 17 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18124      +/-   ##
============================================
+ Coverage     72.52%   72.56%   +0.04%     
- Complexity    67509    67576      +67     
============================================
  Files          5496     5496              
  Lines        311467   311546      +79     
  Branches      45253    45231      -22     
============================================
+ Hits         225891   226076     +185     
+ Misses        67180    67091      -89     
+ Partials      18396    18379      -17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@peteralfonsi peteralfonsi marked this pull request as ready for review May 6, 2025 17:19
@peteralfonsi peteralfonsi requested a review from a team as a code owner May 6, 2025 17:19
@peteralfonsi
Copy link
Contributor Author

Hm, codecov thinks the pre-3.1.0 parts of the serialization functions are not covered, but they're actually run by the BWC tests. (I know this is true because there were failures that I fixed by modifying this code). Probably this is because the test cases are coming in from .yml files rather than being run like a normal UT/IT.

@msfroh I think this is ready for review now - mind taking a look when you get a chance?

Signed-off-by: Peter Alfonsi <[email protected]>
Copy link
Contributor

✅ Gradle check result for 3dc0454: SUCCESS

Copy link
Contributor

❕ Gradle check result for f19c941: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@peteralfonsi
Copy link
Contributor Author

Hey @msfroh , any further comments or do you think we're good to merge this one?

Copy link
Contributor

❕ Gradle check result for 3cea57c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@msfroh
Copy link
Contributor

msfroh commented May 27, 2025

Hey @msfroh , any further comments or do you think we're good to merge this one?

This should be good. Someday, I'd like to revisit the histogram implementation(s) that we use, but this is already a great improvement.

@yupeng9, this might be an interesting thing to keep in mind when we work on metrics.

Copy link
Contributor

✅ Gradle check result for e4caace: SUCCESS

@msfroh msfroh merged commit 22a6194 into opensearch-project:main May 28, 2025
28 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Speed up percentile aggregation by switching implementation
2 participants