-
Notifications
You must be signed in to change notification settings - Fork 360
From SQL to SPL:Statistics by time window
The Time field of a certain database table is time, and the time interval is sometimes greater than 1 minute.
Time | Value |
---|---|
10:10:00 | 3 |
10:11:00 | 4 |
10:13:00 | 5 |
10:13:00 | 9 |
10:13:00 | 8 |
10:14:00 | 2 |
Now we need to divide the data into windows every minute, fill in the missing windows, and calculate the four values for each window, which are: start_value, the last item of the previous window; end_value, the last item in this window; min, the minimum value of this window; max, the maximum value of this window. The start_value in the first minute is based on the first record in this window; If data for a certain window is missing, replace it with the last item of the previous window (same as the start_value of this window).
start | end | start_value | end_value | min | max |
---|---|---|---|---|---|
10:10:00 | 10:11:00 | 3 | 3 | 3 | 3 |
10:11:00 | 10:12:00 | 3 | 4 | 4 | 4 |
10:12:00 | 10:13:00 | 4 | 4 | 4 | 4 |
10:13:00 | 10:14:00 | 4 | 8 | 5 | 9 |
10:14:00 | 10:15:00 | 8 | 2 | 2 | 2 |
SQL:
with overview as (
SELECT
distinct on (a.time) a.id, a.time, b.time as "end", a.value,
date_trunc('minute', a.time) as minute_start,
date_trunc('minute', b.time) as minute_end
FROM
main a
left join
main b
on
a."time"<b."time" and a.id = b.id
order by
a.time, b.time asc
),
overview2 as (
select
id, value, true as backfill,
date_trunc('minute', "end") as time,
date_trunc('minute', "end") as minute
from
overview
where
minute_start <> minute_end
UNION ALL
select
id, time, value, false as backfill,
date_trunc('minute', time) as minute
from
overview
),
overview3 as (
select
*
from
overview2
UNION ALL (
Select
distinct on (a.missingminute)
c.id,
a.missingminute as time,
a.missingminute as minute,
c.value,
true as backfill
from (
SELECT
date_trunc('minute', time.time) as missingminute
FROM
generate_series((select min(minute) from overview2),(select max(minute) from overview2),'1 minute'::interval) time
left join (
select distinct
minute
from
overview2
) b
on
date_trunc('minute', time) = b.minute
where
b.minute isnull
) a
left join
main c
on
a.missingminute > c.time
order by
a.missingminute,
c.time desc
)
order by
time
)
select
t1.id,
t1.minute as minute_start,
t1.minute + interval '1 minute' as minute_end,
t1.backfill as start_backfill,
t1.start,
t2.end,
coalesce(t3.min, t1.start) as min,
coalesce(t3.max, t1.start) as max
from
(select distinct on (id, minute) id, minute, value as start, backfill from overview3 order by id, minute, time asc) t1
left join
(select distinct on (id, minute) id, minute, value as end from overview3 order by id, minute, time desc) t2 on t1.id = t2.id and t1.minute = t2.minute
left join
(select id, minute, min(value) min, max(value) max from overview2 group by id,minute) t3 on t1.id = t3.id and t1.minute = t3.minute
SQL requires multiple layers of nested subqueries and complex join statements to implement, and the code is lengthy and difficult to understand. SPL directly provides time series functions, sequence aligned functions, and position related syntax:
https://try.esproc.com/splx?55o
A | |
---|---|
1 | $select * from main.txt |
2 | =A1.run(Time=time@m(Time)) |
3 | =list=periods@s(A2.min(Time),A2.max(Time),60) |
4 | =A2.align@a(list,Time) |
5 | =A4.new(list(#):start, elapse@s(start,60):end, sv=ifn(end_value[-1],~.Value):start_value, ifn(~.m(-1).Value, sv):end_value, ifn(~.min(Value),sv):min, ifn(~.max(Value),sv):max) |
A1: Load data.
A2: Change the Time field to full minutes.
A3: Generate a continuous minute time series list.
A4: Align the data with the list, with each group being the data of a window and some windows being empty.
A5: Generate a new two-dimensional table, use the original records from each group to generate a new record, start takes values from the list according to the sequence number, start_value is taken from the end_value of the previous new record, and when this value is null (in the first minute), it is taken from the first record of the current group. end_value is taken from the last record of the previous group, min is taken from the minimum value of this group, and max is taken from the maximum value of this group. If the last three items are missing, use start_value(sv) instead. [-1] represents the relative previous one, and m(-1) represents the last member in the set.
Question source:https://stackoverflow.com/questions/78102526/postgressql-backfill-data-if-doesnt-exist-for-a-start-period
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code