
[Feature Request] Speed up percentile aggregation by switching implementation #18122


Closed
peteralfonsi opened this issue Apr 28, 2025 · 0 comments · Fixed by #18124
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance

Comments

@peteralfonsi
Contributor

Is your feature request related to a problem? Please describe

The percentiles aggregation can be very slow. We rely on the t-digest library to get approximate percentiles. While poking around in the code I noticed we use their AVLTreeDigest implementation, but the recommended one is now MergingDigest. It looks like OpenSearch's TDigestState was last meaningfully modified in March 2017, but this new implementation was introduced after that in April 2017, which explains why we aren't already using it.

The comments claim this implementation is both faster and also uses "much less than half" of the memory of AVLTreeDigest. I couldn't find any actual numbers for speed posted online but I did run some benchmarks with OpenSearch that look good.

Describe the solution you'd like

We should switch to the new implementation. Since both implementations extend the same abstract class, it would be a drop-in change.
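To illustrate why a shared abstract class makes the swap cheap, here is a minimal, self-contained sketch. It does not use the t-digest library: `QuantileSketch`, `SortOnQuerySketch`, `SortedInsertSketch`, and `PercentileCaller` are hypothetical stand-ins invented for this example, and both toy implementations compute exact nearest-rank quantiles rather than approximate ones. The point is only that the caller is written against the abstract type, so choosing a different subclass is a one-line change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a shared abstract base class (NOT the t-digest API).
abstract class QuantileSketch {
    abstract void add(double value);

    abstract double quantile(double q); // q in [0, 1]

    // Nearest-rank lookup on an already sorted, non-empty list.
    static double nearestRank(List<Double> sorted, double q) {
        int idx = (int) Math.ceil(q * sorted.size()) - 1;
        return sorted.get(Math.max(0, Math.min(sorted.size() - 1, idx)));
    }
}

// Stand-in for the "old" implementation: stores values, sorts on every query.
class SortOnQuerySketch extends QuantileSketch {
    private final List<Double> values = new ArrayList<>();

    @Override
    void add(double value) {
        values.add(value);
    }

    @Override
    double quantile(double q) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return nearestRank(sorted, q);
    }
}

// Stand-in for the "new" implementation: keeps values sorted as they arrive.
class SortedInsertSketch extends QuantileSketch {
    private final List<Double> sorted = new ArrayList<>();

    @Override
    void add(double value) {
        int pos = Collections.binarySearch(sorted, value);
        sorted.add(pos >= 0 ? pos : -pos - 1, value);
    }

    @Override
    double quantile(double q) {
        return nearestRank(sorted, q);
    }
}

class PercentileCaller {
    // The caller depends only on the abstract type, so either subclass works.
    static double percentile(QuantileSketch sketch, double[] data, double q) {
        for (double v : data) {
            sketch.add(v);
        }
        return sketch.quantile(q);
    }

    public static void main(String[] args) {
        double[] statuses = {200, 200, 404, 200, 500, 301, 200, 404};
        // Swapping implementations only changes which constructor is called.
        System.out.println(percentile(new SortOnQuerySketch(), statuses, 0.5));
        System.out.println(percentile(new SortedInsertSketch(), statuses, 0.5));
    }
}
```

Both calls return the same median for this input. In the real change, the two subclasses would be the library's AVLTreeDigest and MergingDigest, whose results are approximate and may differ slightly from each other.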

I benchmarked this change on http_logs which has 247M docs. I did it for the "@timestamp" field (high cardinality) and the "status" field (low cardinality since it's an HTTP status code). The speedup was especially large for status:

| Field | Baseline latency (ms) | Modified latency (ms) |
|---|---|---|
| timestamp | 13,085 | 6,293 |
| status | 196,794 | 6,212 |
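For a sense of scale, the latencies above work out to roughly a 2.1x speedup on timestamp and roughly a 31.7x speedup on status. A small sketch of that arithmetic follows; the class and method names are made up for this example, and the inputs are just the four numbers from the table.

```java
class PercentileSpeedup {
    // Ratio of baseline latency to modified latency; values above 1 mean the change is faster.
    static double speedup(double baselineMs, double modifiedMs) {
        return baselineMs / modifiedMs;
    }

    public static void main(String[] args) {
        // Latencies in milliseconds, copied from the benchmark table.
        System.out.printf("timestamp: %.1fx%n", speedup(13_085, 6_293));
        System.out.printf("status: %.1fx%n", speedup(196_794, 6_212));
    }
}
```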

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

Projects
Status: Done