Increase concurrency for TS command #128419

Merged: 3 commits merged into elastic:main on May 26, 2025

Conversation

@dnhatn (Member) commented May 24, 2025

Today, with FROM, we can partition a shard into multiple slices, allowing multiple drivers to execute against a single shard.

For time-series queries, specifically rate aggregation, the data needs to arrive in order. Strictly speaking, the data for each tsid within each bucket must arrive in order to avoid buffering. There are several options to parallelize the execution:

  1. Split the queries into multiple time intervals based on the bucket interval, with multiple drivers executing concurrently at different intervals. However, since the data is sorted by TSID and timestamp, this partitioning might not be efficient within each driver.

  2. Alternatively, split the current single driver vertically into multiple parts, with one driver for each part.

This PR implements the first step of option 2: the time-series source operator and the time-series aggregation operator now execute in two separate drivers.

The field extractions within TS will be executed in a separate driver in a follow-up PR.
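To make the split concrete, here is a minimal, self-contained sketch of the producer/consumer shape this introduces. It is illustration only: the class, record, and queue below are made up for this example and are not the Elasticsearch driver or operator APIs. One thread stands in for the driver running the time-series source (and eval) operators, the other for the driver running the time-series aggregation operator, and a small bounded buffer plays the role of the exchange between them:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Conceptual sketch only (not Elasticsearch code): the single TS pipeline
 * source -> eval -> aggregation is split into two "drivers" running on
 * separate threads that hand pages over through a small bounded exchange,
 * so reading pages overlaps with hashing and aggregating them.
 */
public class TwoDriverSketch {

    record Page(List<String> tsids, List<Long> timestamps, List<Double> values) {}

    private static final Page DONE = new Page(List.of(), List.of(), List.of());

    public static void main(String[] args) throws InterruptedException {
        // The "exchange": a bounded buffer between the two drivers.
        BlockingQueue<Page> exchange = new ArrayBlockingQueue<>(4);

        // Driver 1: stands in for the time-series source (and eval) operators.
        Thread sourceDriver = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    // In the real pipeline this would read docs ordered by tsid, timestamp.
                    exchange.put(new Page(List.of("tsid-" + (i % 2)), List.of((long) i), List.of(i * 1.0)));
                }
                exchange.put(DONE); // signal completion
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Driver 2: stands in for the time-series aggregation operator.
        Thread aggDriver = new Thread(() -> {
            try {
                double sum = 0;
                Page page;
                while ((page = exchange.take()) != DONE) {
                    for (double v : page.values()) {
                        sum += v; // placeholder for rate/bucket aggregation
                    }
                }
                System.out.println("aggregated: " + sum);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        sourceDriver.start();
        aggDriver.start();
        sourceDriver.join();
        aggDriver.join();
    }
}
```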

@dnhatn force-pushed the parallel-ts branch 2 times, most recently from 6daee94 to 7f08143 on May 24, 2025 15:51
@dnhatn requested review from nik9000 and removed request for nik9000 on May 24, 2025 22:42
@dnhatn requested review from nik9000, martijnvg and kkrik-es on May 24, 2025 23:00
@dnhatn marked this pull request as ready for review on May 24, 2025 23:01
@elasticsearchmachine added the Team:Analytics and Team:StorageEngine labels on May 24, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn mentioned this pull request May 22, 2025
@dnhatn (Member, Author) commented May 25, 2025

[
    {
        "operator": "TimeSeriesSourceOperator[shards = [.ds-metrics-hostmetricsreceiver.otel-default-2025.05.08-000001:0], maxPageSize = 2520[maxPageSize=2520, remainingDocs=2147388847]]",
        "status":
        {
            "processed_slices": 1,
            "processed_queries":
            [
                "IndexOrDocValuesQuery(indexQuery=@timestamp:[1746727208001 TO 9223372036854775807], dvQuery=@timestamp:[1746727208001 TO 9223372036854775807])"
            ],
            "processed_shards":
            [
                ".ds-metrics-hostmetricsreceiver.otel-default-2025.05.08-000001:0"
            ],
            "process_nanos": 18334025,
            "slice_index": 0,
            "total_slices": 1,
            "pages_emitted": 38,
            "slice_min": 0,
            "slice_max": 0,
            "current": 0,
            "rows_emitted": 94800,
            "partitioning_strategies":
            {
                ".ds-metrics-hostmetricsreceiver.otel-default-2025.05.08-000001:0": "SHARD"
            },
            "tsid_loaded": 7900,
            "values_loaded": 284400
        }
    },
    {
        "operator": "EvalOperator[evaluator=DateTruncDatetimeEvaluator[fieldVal=Attribute[channel=1], rounding=Rounding[300000 in Z][fixed]]]",
        "status":
        {
            "process_nanos": 310306,
            "pages_processed": 38,
            "rows_received": 94800,
            "rows_emitted": 94800
        }
    },
    {
        "operator": "TimeSeriesAggregationOperator[blockHash=TimeSeriesBlockHash{keys=[BytesRefKey[channel=0], LongKey[channel=4]], entries=7900b}, aggregators=[GroupingAggregator[aggregatorFunction=RateDoubleGroupingAggregatorFunction[channels=[2, 1]], mode=INITIAL], GroupingAggregator[aggregatorFunction=ValuesBytesRefGroupingAggregatorFunction[channels=[3]], mode=INITIAL]]]",
        "status":
        {
            "hash_nanos": 2006304, 
            "aggregation_nanos": 3007861,
            "pages_processed": 38,
            "rows_received": 94800,
            "rows_emitted": 7900,
            "emit_nanos": 1242781
        }
    }
]

With this change, the ~2ms of hash_nanos and ~3ms of aggregation_nanos no longer add to the query time, because the TimeSeriesAggregationOperator now executes concurrently with the TimeSeriesSourceOperator. We also expect to save more from the TimeSeriesSourceOperator (18ms) by executing field extraction in a separate driver.


The time for the following query decreased from 41ms to 36ms:

POST /_query
{
    "query": "TS metrics-hostmetricsreceiver.otel-default | WHERE @timestamp >= \"2025-05-08T18:00:08.001Z\" 
| STATS cpu = avg(rate(`metrics.process.cpu.time`)) BY host.name, BUCKET(@timestamp, 5 minute)"
}
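For reference, the per-operator status shown earlier is the kind of breakdown the query profile returns per driver. Running the same query with the profile flag enabled, along the lines of the request below (shape illustrative), produces that output:

```
POST /_query
{
    "profile": true,
    "query": "TS metrics-hostmetricsreceiver.otel-default | WHERE @timestamp >= \"2025-05-08T18:00:08.001Z\" | STATS cpu = avg(rate(`metrics.process.cpu.time`)) BY host.name, BUCKET(@timestamp, 5 minute)"
}
```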

@kkrik-es (Contributor)

If I understand correctly, this allows using two threads per query, and we may add another one for field extraction? This sounds great, but it may still leave resources unused (e.g. on sockets with dozens of cores), so we may want to investigate partitioning work per tsid later.
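As a purely hypothetical sketch of that later investigation (none of the names below exist in the codebase), per-tsid partitioning could route each tsid to a fixed partition by hashing it; every partition then still receives a given tsid's samples in timestamp order and could be aggregated by its own driver:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Illustration only (hypothetical, not Elasticsearch code): partition work
 * per tsid by hashing each tsid to one of N partitions. All samples for a
 * given tsid land in the same partition, so each partition still sees that
 * tsid's samples in timestamp order and can be aggregated independently.
 */
public class TsidPartitioningSketch {

    record Sample(String tsid, long timestamp, double value) {}

    static int partition(String tsid, int numPartitions) {
        return Math.floorMod(tsid.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        List<Sample> input = IntStream.range(0, 20)
            .mapToObj(i -> new Sample("host-" + (i % 5), i, i * 1.0))
            .collect(Collectors.toList());

        // Group samples by partition; each group could then be handled by its own driver.
        Map<Integer, List<Sample>> partitions = input.stream()
            .collect(Collectors.groupingBy(s -> partition(s.tsid(), numPartitions), TreeMap::new, Collectors.toList()));

        partitions.forEach((p, samples) ->
            System.out.println("partition " + p + " -> " + samples.stream().map(Sample::tsid).distinct().toList()));
    }
}
```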

@dnhatn (Member, Author) commented May 26, 2025

Thanks, Kostas! In the long term, we should explore better partitioning strategies, but this approach helps TS in the short term.

@dnhatn merged commit f76e201 into elastic:main May 26, 2025
18 checks passed
@dnhatn deleted the parallel-ts branch May 26, 2025 02:58
Labels: :Analytics/ES|QL, >non-issue, :StorageEngine/TSDB, Team:Analytics, Team:StorageEngine, v9.1.0