Is your feature request related to a problem? Please describe
The percentiles aggregation can be very slow. We rely on the t-digest library to get approximate percentiles. While poking around in the code I noticed we use their AVLTreeDigest implementation, but the recommended one is now MergingDigest. It looks like OpenSearch's TDigestState was last meaningfully modified in March 2017, but this new implementation was introduced after that in April 2017, which explains why we aren't already using it.
The comments claim this implementation is both faster and also uses "much less than half" of the memory of AVLTreeDigest. I couldn't find any actual numbers for speed posted online but I did run some benchmarks with OpenSearch that look good.
Describe the solution you'd like
We should switch to the new implementation. Since both implementations extend the same abstract class, it would be a drop-in change.
I benchmarked this change on http_logs which has 247M docs. I did it for the "@timestamp" field (high cardinality) and the "status" field (low cardinality since it's an HTTP status code). The speedup was especially large for status:
| Field | Baseline latency (ms) | Modified latency (ms) |
| --- | --- | --- |
| @timestamp | 13,085 | 6,293 |
| status | 196,794 | 6,212 |
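To put the table in relative terms, here is a minimal sketch that computes the speedup factor (baseline latency divided by modified latency) for each field. The class and method names are made up for illustration and are not part of OpenSearch or t-digest; the latencies are the ones reported above.

```java
public class PercentilesSpeedup {
    // Ratio of baseline to modified latency; a value above 1 means the change is faster.
    public static double speedup(double baselineMs, double modifiedMs) {
        if (baselineMs <= 0 || modifiedMs <= 0) {
            throw new IllegalArgumentException("latencies must be positive");
        }
        return baselineMs / modifiedMs;
    }

    public static void main(String[] args) {
        // Latencies in milliseconds, copied from the benchmark table above.
        String[] fields = {"@timestamp", "status"};
        double[] baselineMs = {13_085, 196_794};
        double[] modifiedMs = {6_293, 6_212};

        for (int i = 0; i < fields.length; i++) {
            System.out.printf("%-10s %.2fx faster%n",
                    fields[i], speedup(baselineMs[i], modifiedMs[i]));
        }
    }
}
```

This works out to roughly a 2.1x speedup for the high-cardinality field and roughly 31.7x for the low-cardinality one.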
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response