[FLINK-39256][table] Support ORDER BY clause in Process Table Functions#27929
[FLINK-39256][table] Support ORDER BY clause in Process Table Functions#27929
Conversation
| ```java | ||
| // Function that processes events in order and captures the ordering | ||
| public static class OrderedProcessor extends ProcessTableFunction<List<Event>> { | ||
| public static class Event { |
There was a problem hiding this comment.
will it work with Event as record?
I wonder if we should apply it to examples as well
| Optional<ChangelogMode> changelogMode(); | ||
|
|
||
| /** Sort direction for ORDER BY columns in table arguments with set semantics. */ | ||
| enum SortDirection implements Serializable { |
There was a problem hiding this comment.
do we need it to have Serializable?
aren't java's enums serializable by default?
| /** Ascending order with nulls first. */ | ||
| ASC_NULLS_FIRST(false, true), | ||
| /** Ascending order with nulls last. */ | ||
| ASC_NULLS_LAST(false, false), | ||
| /** Descending order with nulls first. */ | ||
| DESC_NULLS_FIRST(true, true), | ||
| /** Descending order with nulls last. */ | ||
| DESC_NULLS_LAST(true, false); | ||
|
|
||
| private final boolean descending; | ||
| private final boolean nullsFirst; |
There was a problem hiding this comment.
can we instead reuse Calcite's Direction https://github.com/apache/calcite/blob/0ce0c5ae2f7055f13551ebcde8e1aaa2ff17b469/core/src/main/java/org/apache/calcite/rel/RelFieldCollation.java#L54
?
There was a problem hiding this comment.
also I guess in case we know something more about data like if it is monotonic, then order by might be optimized (Calcite's RelFieldCollation already has some functionality for that)
| } | ||
|
|
||
| @Override | ||
| public PartitionedTable orderBy(Expression... fields) { |
There was a problem hiding this comment.
| public PartitionedTable orderBy(Expression... fields) { | |
| public PartitionedTable orderBy(Expression... expressions) { |
in general it could be not only fields, right?
| * @param fields expressions for ordering (e.g., {@code $("ts").asc(), $("score").desc()}) | ||
| * @return a partitioned and ordered table |
There was a problem hiding this comment.
We have some weird tests with expressions in ORDER BY like for instance
are there any limitations for PTF case?
can there be order by by some PTF?
do we have any validation around this?
|
|
||
| public PartitionQueryOperation( | ||
| List<ResolvedExpression> partitionExpressions, QueryOperation child) { | ||
| this(partitionExpressions, Collections.emptyList(), child); |
There was a problem hiding this comment.
| this(partitionExpressions, Collections.emptyList(), child); | |
| this(partitionExpressions, List.of(), child); |
| private final List<Expression> orderKeys; | ||
|
|
||
| private PartitionedTableImpl(TableImpl table, List<Expression> partitionKeys) { | ||
| this(table, partitionKeys, Collections.emptyList()); |
There was a problem hiding this comment.
| this(table, partitionKeys, Collections.emptyList()); | |
| this(table, partitionKeys, List.of()); |
|
|
||
| final MailboxPartialWatermarkProcessor processor = | ||
| new MailboxPartialWatermarkProcessor( | ||
| "test", mailboxExecutor, Collections.emptyList(), consumedWatermarks::add); |
There was a problem hiding this comment.
| "test", mailboxExecutor, Collections.emptyList(), consumedWatermarks::add); | |
| "test", mailboxExecutor, List.of(), consumedWatermarks::add); |
| new MailboxPartialWatermarkProcessor( | ||
| "test", | ||
| mailboxExecutor, | ||
| Arrays.asList(service1, service2, service3), |
There was a problem hiding this comment.
| Arrays.asList(service1, service2, service3), | |
| List.of(service1, service2, service3), |
| new MailboxPartialWatermarkProcessor( | ||
| "test", | ||
| mailboxExecutor, | ||
| Collections.singletonList(timerService), |
There was a problem hiding this comment.
| Collections.singletonList(timerService), | |
| List.of(timerService), |
| new MailboxPartialWatermarkProcessor( | ||
| "test", | ||
| mailboxExecutor, | ||
| Collections.singletonList(timerService), |
There was a problem hiding this comment.
| Collections.singletonList(timerService), | |
| List.of(timerService), |
| new MailboxPartialWatermarkProcessor( | ||
| "test", | ||
| mailboxExecutor, | ||
| Collections.singletonList(timerService), |
There was a problem hiding this comment.
| Collections.singletonList(timerService), | |
| List.of(timerService), |
| new MailboxPartialWatermarkProcessor( | ||
| "test", | ||
| mailboxExecutor, | ||
| Collections.singletonList(timerService), |
There was a problem hiding this comment.
| Collections.singletonList(timerService), | |
| List.of(timerService), |
| new MailboxPartialWatermarkProcessor( | ||
| "test", | ||
| mailboxExecutor, | ||
| Collections.singletonList(timerService), |
There was a problem hiding this comment.
| Collections.singletonList(timerService), | |
| List.of(timerService), |
| new MailboxPartialWatermarkProcessor( | ||
| "test", | ||
| mailboxExecutor, | ||
| Collections.singletonList(timerService), |
There was a problem hiding this comment.
| Collections.singletonList(timerService), | |
| List.of(timerService), |
What is the purpose of the change
This PR implements ORDER BY support for Process Table Functions (PTFs), enabling guaranteed ordered processing of rows within partitions based on time attributes and additional sort keys.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
and various unit tests.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): yesDocumentation