Description
Is your feature request related to a problem?
PPL currently lacks support for the per_* aggregation functions (per_second, per_minute, per_hour, per_day). These functions calculate rate-based metrics by normalizing aggregated values to a specific time unit, converting raw counts into meaningful per-unit rates.
Without these functions, users cannot easily perform rate calculations that are common in performance monitoring, such as packets per second or requests per minute, when using the timechart command.
What solution would you like?
Implement the four per_* aggregation functions in PPL:
- per_second(<value>): returns values normalized to a per-second rate
- per_minute(<value>): returns values normalized to a per-minute rate
- per_hour(<value>): returns values normalized to a per-hour rate
- per_day(<value>): returns values normalized to a per-day rate
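Each function reduces to one formula: rate = bucket sum / bucket span in seconds * unit length in seconds. A minimal Python sketch, assuming fixed-width buckets measured in seconds (per_unit and UNIT_SECONDS are hypothetical names for illustration, not part of PPL):

```python
# Length of each target unit in seconds.
UNIT_SECONDS = {
    "per_second": 1,
    "per_minute": 60,
    "per_hour": 3600,
    "per_day": 86400,
}


def per_unit(bucket_sum: float, span_seconds: float, func: str) -> float:
    """Hypothetical helper: normalize a bucket's sum to a rate per target unit.

    rate = sum / span_seconds * unit_seconds
    """
    return bucket_sum / span_seconds * UNIT_SECONDS[func]


# One 1-minute bucket whose packets sum to 10 + 60 + 20 + 30 = 120.
total = 10 + 60 + 20 + 30
for name in ("per_second", "per_minute", "per_hour", "per_day"):
    print(name, per_unit(total, 60, name))
```

With a 60-second bucket this yields 2.0, 120.0, 7200.0 and 172800.0, matching Examples 1 and 3.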
These functions should work exclusively with the timechart command, because they depend implicitly on the timestamp field:
# Sample data
{"_time":"2025-09-08T10:00:00", "packets":10},
{"_time":"2025-09-08T10:00:05", "packets":60},
{"_time":"2025-09-08T10:00:30", "packets":20},
{"_time":"2025-09-08T10:00:50", "packets":30}
# Example 1
...
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| timechart per_second(packets) span=1m
----------------------|--------------------
_time | per_second(packets)
----------------------|--------------------
2025-09-08T10:00:00 | 2 # (10+60+20+30)/60s
# Example 2
timechart per_second(packets) span=20s
----------------------|--------------------
_time | per_second(packets)
----------------------|--------------------
2025-09-08T10:00:00 | 3.5 # (10+60)/20s
2025-09-08T10:00:20 | 1 # (20)/20s
2025-09-08T10:00:40 | 1.5 # (30)/20s
# Example 3
timechart per_minute(packets), per_hour(packets), per_day(packets) span=1m
----------------------|--------------------|--------------------|--------------------
_time | per_minute(packets)| per_hour(packets) | per_day(packets)
----------------------|--------------------|--------------------|--------------------
2025-09-08T10:00:00 | 120 | 7200 | 172800
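The bucket arithmetic in Examples 1 and 2 can be checked with a small Python sketch. It assumes fixed-width buckets aligned to multiples of the span since the Unix epoch and omits empty buckets; per_second_timechart is a hypothetical name, not a PPL API.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample data from the issue.
EVENTS = [
    {"_time": "2025-09-08T10:00:00", "packets": 10},
    {"_time": "2025-09-08T10:00:05", "packets": 60},
    {"_time": "2025-09-08T10:00:30", "packets": 20},
    {"_time": "2025-09-08T10:00:50", "packets": 30},
]

EPOCH = datetime(1970, 1, 1)


def per_second_timechart(events, span_seconds):
    """Sum packets into fixed-width buckets, then divide each sum by the span."""
    sums = defaultdict(float)
    for event in events:
        ts = datetime.strptime(event["_time"], "%Y-%m-%dT%H:%M:%S")
        offset = int((ts - EPOCH).total_seconds())
        # Floor the timestamp to the start of its bucket.
        bucket_start = EPOCH + timedelta(seconds=offset - offset % span_seconds)
        sums[bucket_start] += event["packets"]
    # Only non-empty buckets are returned, ordered by start time.
    return [
        (start.strftime("%Y-%m-%dT%H:%M:%S"), total / span_seconds)
        for start, total in sorted(sums.items())
    ]


for row in per_second_timechart(EVENTS, 20):
    print(row)
```

With span_seconds=20 this prints rows of 3.5, 1.0 and 1.5 for the 10:00:00, 10:00:20 and 10:00:40 buckets (Example 2); with span_seconds=60 it returns a single 2.0 row (Example 1).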
What alternatives have you considered?
- Manual calculation: Users can manually divide aggregated values by the time span, but this requires knowing the time conversion factors.
Do you have any additional context?
Implementation approaches
- Short-term solution: Implement rewriting for fixed-width buckets
  - Currently, PPL timechart only supports the span option, which is a fixed interval
  - We can simply transform per_* functions into mathematical formulas at compile time
# Example
... | timechart per_second(packets) span=1m
=>
SELECT SUM(packets) / 60
...
GROUP BY SPAN(@timestamp, 1m)
- Long-term solutions [TBD]
  - The primary challenge lies in the dynamic bucketing behavior of bin options.
  - Option 1: Output bucket bounds from the bucketing function, similar to the windowing function in Spark SQL
  - Option 2: Calculate bucket widths dynamically, using the LEAD window function to determine the next bucket's start time
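Both implementation approaches can be sketched in Python. The names rewrite_per_function, span_to_seconds and rates_via_lead are hypothetical. The first function assumes spans use only the fixed-width units s, m, h and d. The second assumes that the caller supplies the end of the last bucket (for example, the query's end time), since that row has no successor for LEAD to read.

```python
from fractions import Fraction

UNIT_SECONDS = {"per_second": 1, "per_minute": 60, "per_hour": 3600, "per_day": 86400}
SPAN_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def span_to_seconds(span: str) -> int:
    """Parse a fixed-width span such as '20s' or '1m' (hypothetical helper)."""
    value, unit = span[:-1], span[-1]
    if unit not in SPAN_SECONDS:
        raise ValueError(f"not a fixed-width span: {span}")
    return int(value) * SPAN_SECONDS[unit]


def rewrite_per_function(func: str, field: str, span: str) -> str:
    """Short-term approach: turn per_*(field) into SUM(field) scaled by unit/span."""
    factor = Fraction(UNIT_SECONDS[func], span_to_seconds(span))
    expr = f"SUM({field})"
    # Omit '* 1' and '/ 1' so per_second with span=1m becomes 'SUM(field) / 60'.
    if factor.numerator != 1:
        expr += f" * {factor.numerator}"
    if factor.denominator != 1:
        expr += f" / {factor.denominator}"
    return expr


def rates_via_lead(buckets, end_seconds, unit_seconds=1):
    """Long-term option 2: derive each bucket's width from the next bucket's start.

    buckets is a list of (start_seconds, sum) sorted by start; the last
    bucket has no successor, so its end must be supplied as end_seconds.
    """
    rates = []
    for i, (start, total) in enumerate(buckets):
        # Same role as LEAD(start) OVER (ORDER BY start), with a fallback for the last row.
        next_start = buckets[i + 1][0] if i + 1 < len(buckets) else end_seconds
        rates.append((start, total / (next_start - start) * unit_seconds))
    return rates


print(rewrite_per_function("per_second", "packets", "1m"))
print(rewrite_per_function("per_hour", "packets", "1m"))
# Example 2 buckets, as seconds after 10:00:00.
print(rates_via_lead([(0, 70), (20, 20), (40, 30)], end_seconds=60))
```

Two caveats apply. In SQL engines where integer division truncates, SUM(packets) / 20 may need a cast to a floating-point type to produce fractional rates such as 3.5. The LEAD approach also yields the true bucket width only if empty buckets are still emitted; if an empty bucket is dropped, the next start time lies too far away and the computed rate is too low.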