-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Description
Is your feature request related to a problem? Please describe
While working on recent customer investigation, I noticed they were executing following aggregation pattern - top level date histogram with 1s buckets (I know its a lot!) having 100 percentile as sub-aggregation:
"aggregations":{"2":{"date_histogram":{"field":"log_time","format":"epoch_millis","interval":"1s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0,"extended_bounds":{"min":1747560702304,"max":1750152702304}},"aggregations":{"1":{"percentiles":{"field":"time_taken_ms","percents":[100.0],"keyed":true,"tdigest":{"compression":100.0}}}}}}}
This query execution was pretty slow and many times result in node going down. The captured histogram showed lot of objects related to aggregation dominating the heap.
17: 10475440 335214080 com.tdunning.math.stats.IntAVLTree$NodeAllocator
16: 10475440 335214080 com.tdunning.math.stats.IntAVLTree$IntStack
8: 10475440 754231680 com.tdunning.math.stats.AVLGroupTree$1
6: 10475440 838035200 com.tdunning.math.stats.AVLGroupTree
9: 10475410 754229520 org.opensearch.search.aggregations.metrics.InternalTDigestPercentiles
7: 10475410 838032800 org.opensearch.search.aggregations.metrics.TDigestState
10: 10475255 586614280 org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram$Bucket
While we have been able to address the resiliency issue mostly by hardening the cancellation framework for aggregation, thanks to @kaushalmahi12 (#18426, #19066, #19231, #18673), I am still concerned about the performance for this aggregation execution. Since we are looking at 100th percentile only, isn't that equivalent of max aggregation?
Describe the solution you'd like
- We should execute the percentile aggregation as min/max aggregation when we are interested in just 0th or 100th percentile. That combined with the recent skiplist based improvements for date histogram at top level - [META] Skip List Based Optimization for Aggregations in OpenSearch #19384 , should help improve the performance significantly and also reduce the memory required for aggregation execution.
- For other types of percentile aggregation, we should evaluate
MergingDigest
instead ofAVLTreeDigest
. Based of a quick search on google, the primary advantage ofAVLTreeDigest
isMore consistent insertion times
. But that should not be concern the aggregation scenario, as we only care about the amortized cost. Rest everything should be advantage in favor ofMergingDigest
:
Feature | AVLTreeDigest | MergingDigest |
---|---|---|
Performance | More consistent insertion times. | Amortized speed is faster, but has occasional slower compactions. |
Memory | Requires more memory due to dynamic allocation. | Uses less memory, requiring no dynamic allocation after construction. |
Accuracy | Nearly as good as MergingDigest. | Generally more accurate, especially after recent improvements to interpolation and two-level merging. |
Bugs | Had issues with duplicate values in older versions that were fixed in later releases. | Fewer known bugs reported. |
Recommendation | A good alternative if consistent insertion times are critical, but MergingDigest is generally preferred. | The recommended implementation for most use cases due to superior accuracy and memory efficiency. |
Related component
Search:Aggregations
Describe alternatives you've considered
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
Type
Projects
Status