Description
Time-series aggregations, such as {agg}_over_time and rate, against time-series indices are currently slow for several reasons:
- They require two phases:
  - First, grouping by each time series (by tsid and timebucket).
  - Then, grouping by user-specified groups.
- For rate aggregations, data must be provided in timestamp order per time series.
This issue proposes some ideas and tracks optimizations to improve the performance of time-series aggregations in ES|QL.
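To make the two phases and the ordering requirement concrete, here is a minimal Java sketch of something like a sum of per-series rates, grouped by a user-specified field and a time bucket. It is not ES|QL's implementation. The Sample class, the group field, and the simplified rate (per-second increase between the first and last sample, treating any decrease as a counter reset, with no extrapolation to bucket boundaries) are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a two-phase time-series aggregation.
final class TwoPhaseRate {
    // One sample of a counter metric; field names are illustrative only.
    static final class Sample {
        final String tsid;    // time-series id
        final String group;   // user-specified group (constant per tsid)
        final long timestamp; // epoch millis
        final double value;   // counter value, may reset to a lower value

        Sample(String tsid, String group, long timestamp, double value) {
            this.tsid = tsid;
            this.group = group;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Per-second increase over samples already sorted by timestamp.
    // A decrease is treated as a counter reset; no extrapolation is done.
    static double rate(List<Sample> ordered) {
        double increase = 0;
        for (int i = 1; i < ordered.size(); i++) {
            double prev = ordered.get(i - 1).value;
            double cur = ordered.get(i).value;
            increase += cur >= prev ? cur - prev : cur;
        }
        long spanMillis = ordered.get(ordered.size() - 1).timestamp - ordered.get(0).timestamp;
        return increase / (spanMillis / 1000.0);
    }

    // Returns group -> (bucket start -> sum of per-series rates).
    static Map<String, TreeMap<Long, Double>> sumOfRates(List<Sample> samples, long bucketMillis) {
        // Phase 1: group by (tsid, bucket).
        Map<String, Map<Long, List<Sample>>> perSeries = new HashMap<>();
        for (Sample s : samples) {
            long bucket = Math.floorDiv(s.timestamp, bucketMillis) * bucketMillis;
            perSeries.computeIfAbsent(s.tsid, k -> new HashMap<>())
                    .computeIfAbsent(bucket, k -> new ArrayList<>())
                    .add(s);
        }
        // Phase 2: fold each per-series rate into its user-specified group.
        Map<String, TreeMap<Long, Double>> result = new HashMap<>();
        for (Map<Long, List<Sample>> buckets : perSeries.values()) {
            for (Map.Entry<Long, List<Sample>> e : buckets.entrySet()) {
                List<Sample> ordered = e.getValue();
                // rate is order-sensitive, so each (tsid, bucket) group must be in timestamp order.
                ordered.sort(Comparator.comparingLong((Sample s) -> s.timestamp));
                if (ordered.size() < 2
                        || ordered.get(0).timestamp == ordered.get(ordered.size() - 1).timestamp) {
                    continue; // a rate needs at least two distinct timestamps
                }
                String group = ordered.get(0).group;
                result.computeIfAbsent(group, k -> new TreeMap<>())
                        .merge(e.getKey(), rate(ordered), Double::sum);
            }
        }
        return result;
    }
}
```

The sketch sorts every (tsid, bucket) group explicitly; reading documents already in tsid and timestamp order, as a sorted time-series source does, would avoid that cost. Phase 2 is an ordinary hash aggregation over the much smaller phase 1 output.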
Source command
- Translate time-series queries without rate to FROM: Avoid sorted source for time_series aggs without rates #127033
- Avoid comparing tsid when iterating over documents in TS source: Optimize time-series source operator #127095
- Extract fields directly from the time-series source: Push down field extraction to time-series source #127445
- Speed up reading dimension fields: Speed up read dimension fields in TS #128283
- Optimize loading of time-series data using FROM.
Execution
- Execute time-series source in a separate driver: Increase concurrency for TS command #128419
- Execute extract fields in a separate driver: Run field extraction concurrently in TS #128643
- Support segment data partitioning for TS
- Emit final results for non-overlapping buckets (drop tsid for these buckets)
Values aggregation
- Emit ordinal output blocks: Emit ordinal output block for values aggregate #127201
- Handle ordinal input blocks: Optimize ordinal inputs in Values aggregation #127849
- Optimize for single-value aggregations (dimension fields?).
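The ordinal items above rely on dictionary-encoded ("ordinal") blocks, where each position stores a small int ordinal into a per-block dictionary of distinct values. The hypothetical Java sketch below (names and signatures are illustrative, not the compute engine's API) shows why that helps a VALUES(field) BY group aggregation: deduplication works on int ordinals in a bitset, and each distinct string is resolved only once per group.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: distinct values per group over dictionary-encoded input.
final class OrdinalValues {
    // dictionary: distinct strings of one block
    // ordinals[i]: index into dictionary for position i
    // groups[i]: group id (0..groupCount-1) for position i
    static List<List<String>> values(String[] dictionary, int[] ordinals, int[] groups, int groupCount) {
        // Deduplicate per group on int ordinals instead of hashing or comparing strings.
        BitSet[] seen = new BitSet[groupCount];
        for (int g = 0; g < groupCount; g++) {
            seen[g] = new BitSet(dictionary.length);
        }
        for (int i = 0; i < ordinals.length; i++) {
            seen[groups[i]].set(ordinals[i]);
        }
        // Resolve each distinct ordinal to its string once per group, in ordinal order.
        List<List<String>> out = new ArrayList<>();
        for (int g = 0; g < groupCount; g++) {
            List<String> vals = new ArrayList<>();
            for (int ord = seen[g].nextSetBit(0); ord >= 0; ord = seen[g].nextSetBit(ord + 1)) {
                vals.add(dictionary[ord]);
            }
            out.add(vals);
        }
        return out;
    }
}
```

Emitting the result as ordinals plus the shared dictionary, rather than copying strings, would be the output-side counterpart of this idea.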
Block hash
- Enable time-series block hash: Enable time-series block hash #127488
- Leverage ordinal blocks in time-series block hash: Enable time-series block hash #127488
- Emit ordinal blocks in PackedValuesBlockHash.
Planning
- Use a single aggregation for the second phase.
- Optimize for a single target index.
- Skip backing indices whose start_time and end_time fall outside the TRANGE filter.
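Skipping backing indices reduces to an interval-overlap test. A minimal sketch, assuming each backing index exposes its index.time_series.start_time (treated as inclusive) and index.time_series.end_time (treated as exclusive) as epoch millis, and that the TRANGE filter has been resolved to a half-open range [from, to). The BackingIndex class and prune method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pruning backing indices by a query time range.
final class BackingIndexPruning {
    static final class BackingIndex {
        final String name;
        final long startTime; // inclusive, as assumed for index.time_series.start_time
        final long endTime;   // exclusive, as assumed for index.time_series.end_time

        BackingIndex(String name, long startTime, long endTime) {
            this.name = name;
            this.startTime = startTime;
            this.endTime = endTime;
        }
    }

    // Keep only indices whose [startTime, endTime) overlaps the query range [from, to).
    static List<String> prune(List<BackingIndex> indices, long from, long to) {
        List<String> kept = new ArrayList<>();
        for (BackingIndex idx : indices) {
            if (idx.startTime < to && from < idx.endTime) {
                kept.add(idx.name);
            }
        }
        return kept;
    }
}
```

Because the check only needs index-level metadata, it could run at planning time, before any shard is searched.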
Misc
- Ensure ordinal builder emit ordinal blocks #127949
- Load the first seen value only for last_over_time
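For last_over_time, the latest sample of a time series in a bucket is the first one encountered if documents are read in tsid ascending, timestamp descending order. This sketch assumes that order (the usual TSDB index sort) and keeps only the first value per (tsid, bucket); because buckets are contiguous under this order, it compares with the previous document instead of hashing. The Doc and Result classes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of last_over_time using only the first value seen per (tsid, bucket).
final class LastOverTime {
    static final class Doc {
        final String tsid;
        final long timestamp; // epoch millis
        final double value;

        Doc(String tsid, long timestamp, double value) {
            this.tsid = tsid;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static final class Result {
        final String tsid;
        final long bucket;
        final double last;

        Result(String tsid, long bucket, double last) {
            this.tsid = tsid;
            this.bucket = bucket;
            this.last = last;
        }
    }

    // Assumes docs are sorted by tsid ascending, then timestamp descending.
    static List<Result> lastOverTime(List<Doc> docs, long bucketMillis) {
        List<Result> out = new ArrayList<>();
        String prevTsid = null;
        long prevBucket = 0;
        for (Doc d : docs) {
            long bucket = Math.floorDiv(d.timestamp, bucketMillis) * bucketMillis;
            if (d.tsid.equals(prevTsid) && bucket == prevBucket) {
                continue; // older sample of an answered (tsid, bucket); a real reader could skip loading it
            }
            out.add(new Result(d.tsid, bucket, d.value));
            prevTsid = d.tsid;
            prevBucket = bucket;
        }
        return out;
    }
}
```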
Migrated from 105397; to be considered
- Add support for a sparse index to easily navigate time-series documents (Sparse index for tsdb #95701). This is required for determining the last value of a metric and skipping to the last value of the next time series, as well as for other functionality such as interpolation and geofencing. Additionally, a query may be too selective and mask documents that are valid metrics of a time series; a sparse index would allow us to access those metrics even in that case.
- Enhance the time-series grouping operator to also group by time series and time interval. A typical query groups by time series and time interval, which is what happens when the BUCKET syntax is used.