Skip to content

From SQL to SPL:Aggregate according to time interval

esProcSPL edited this page May 30, 2025 · 1 revision

A certain database table stores time-series data, with intervals of several seconds between each record.

dt payload
2024-01-01 12:00:13 0
2024-01-01 12:00:28 9
2024-01-01 12:00:39 2
2024-01-01 12:00:44 3
2024-01-01 12:00:53 0
2024-01-01 12:01:00 6
2024-01-01 12:01:19 0
2024-01-01 12:01:26 4
2024-01-01 12:01:27 2
2024-01-01 12:01:49 6
2024-01-01 12:02:09 5
2024-01-01 12:02:32 6
2024-01-01 12:02:59 7
2024-01-01 12:03:18 5
2024-01-01 12:03:25 4
2024-01-01 12:03:28 5
2024-01-01 12:03:30 3
2024-01-01 12:03:41 1
2024-01-01 12:04:23 9
2024-01-01 12:04:32 7
2024-01-01 12:04:34 5
2024-01-01 12:04:43 4
2024-01-01 12:04:56 7
2024-01-01 12:05:18 7
2024-01-01 12:05:33 3
2024-01-01 12:05:39 5
2024-01-01 12:05:54 1
2024-01-01 12:05:56 8
2024-01-01 12:06:11 7
2024-01-01 12:06:40 2
2024-01-01 12:06:47 5
2024-01-01 12:06:51 6
2024-01-01 12:06:58 5
2024-01-01 12:07:06 3
2024-01-01 12:07:10 5
2024-01-01 12:07:27 0
2024-01-01 12:07:33 1
2024-01-01 12:07:38 3
2024-01-01 12:08:04 2
2024-01-01 12:08:11 4
2024-01-01 12:08:21 1
2024-01-01 12:08:44 1
2024-01-01 12:08:55 3
2024-01-01 12:09:01 4
2024-01-01 12:09:05 0
2024-01-01 12:09:12 2
2024-01-01 12:09:28 8
2024-01-01 12:09:38 5
2024-01-01 12:10:14 3
2024-01-01 12:10:25 2
2024-01-01 12:10:30 7
2024-01-01 12:10:39 1
2024-01-01 12:10:53 6
2024-01-01 12:11:15 5
2024-01-01 12:11:25 6
2024-01-01 12:11:41 1
2024-01-01 12:11:53 8
2024-01-01 12:11:59 5
2024-01-01 12:12:03 9
2024-01-01 12:12:09 5
2024-01-01 12:12:28 8
2024-01-01 12:12:34 2
2024-01-01 12:12:53 1
2024-01-01 12:13:13 3
2024-01-01 12:13:14 8
2024-01-01 12:13:19 8
2024-01-01 12:13:48 3
2024-01-01 12:14:06 5
2024-01-01 12:14:18 1
2024-01-01 12:14:36 3
2024-01-01 12:14:40 4
2024-01-01 12:14:55 3

Now we need to do a group and aggregation every minute, summarizing data for 5 minutes each time. For example, generate three records in minutes 1, 2, and 3, and summarize the data for minutes 1-5, 2-6, and 3-7 respectively.

From To payload
2024-01-01 12:00:00 2024-01-01 12:05:00 100
2024-01-01 12:01:00 2024-01-01 12:06:00 110
2024-01-01 12:02:00 2024-01-01 12:07:00 117
2024-01-01 12:03:00 2024-01-01 12:08:00 111
2024-01-01 12:04:00 2024-01-01 12:09:00 104
2024-01-01 12:05:00 2024-01-01 12:10:00 91
2024-01-01 12:06:00 2024-01-01 12:11:00 86
2024-01-01 12:07:00 2024-01-01 12:12:00 86
2024-01-01 12:08:00 2024-01-01 12:13:00 99
2024-01-01 12:09:00 2024-01-01 12:14:00 110
2024-01-01 12:10:00 2024-01-01 12:15:00 107

SQL:

SELECT
    [From], DATEADD(MINUTE, 1, [To]) [To], payload
FROM (
    SELECT
        dt, MIN(dt) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) [From],
        dt [To], SUM(payload) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) payload
    FROM (
        SELECT
            DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0) dt, 
SUM(payload) payload
        FROM #tmstmp
        GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
    ) q
) q
WHERE DATEDIFF(MINUTE, [From], [To]) > 3

SQL needs to implement it using nested subqueries and multiple window functions, which makes the code cumbersome. SPL provides syntax for directly accessing positions.

https://try.esproc.com/splx?4ql

 A
1 $select * from tmstmp.txt
2 =A1.groups(datetime@m(dt):dt;sum(payload):payload)
3 =A2.new(dt:From, elapse@s(From,300):To, payload[0:4].sum(): payload)
4 =A3.m(:-5)

A1: Load data.

A2: Group and aggregate by minute.

A3: Generate a new two-dimensional table, where To is taken from the current record, From is taken from 5 minutes after the current record, and the payload summarizes the interval from the current record to the 4th record.

A4: Take from the first item backward to the fifth item.

Question source:https://stackoverflow.com/questions/78152707/get-sum-for-each-5-minute-time-interval

Clone this wiki locally