-
Notifications
You must be signed in to change notification settings - Fork 176
Allow renaming group-by fields to existing field names #4586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
private boolean isInputRef(RexNode node) { | ||
return switch (node.getKind()) { | ||
case AS, DESCENDING, NULLS_FIRST, NULLS_LAST -> { | ||
final List<RexNode> operands = ((RexCall) node).operands; | ||
yield isInputRef(operands.getFirst()); | ||
} | ||
default -> node instanceof RexInputRef; | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the PlanUtil.getInputRefs
be used to replace this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they serve different purposes. PlanUtil.getInputRefs
returns all referred input refs. Besides, if a node refers multiple inputs, it will return all of them. Yet here I just want to check whether a node is an input ref (optionally aliased), keeping the node as is.
// During aggregation, Calcite projects both input dependencies and output group-by fields. | ||
// When names conflict, Calcite adds numeric suffixes (e.g., "value0"). | ||
// Apply explicit renaming to restore the intended aliases. | ||
if (names.size() == reResolved.getLeft().size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the names.size
not equals to reResolved.getLeft().size()
? seems the condition is always true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lengths do not equal when a group key is not aliased -- under which circumstance extractAliasLiteral
will return empty:
private Optional<RexLiteral> extractAliasLiteral(RexNode node) {
if (node == null) {
return Optional.empty();
} else if (node.getKind() == AS) {
return Optional.of((RexLiteral) ((RexCall) node).getOperands().get(1));
} else {
return Optional.empty();
}
Although it seems that all group keys are aliased in practice, this defense check was to prevent unintended future changes to avoid in-correspondent renaming. Should I remove it?
Pair<List<RexNode>, List<AggCall>> reResolved = | ||
resolveAttributesForAggregation(groupExprList, aggExprList, context); | ||
|
||
List<String> names = getGroupKeyNamesAfterAggregation(reResolved.getLeft()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename the var names
to make it more meaningful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
Signed-off-by: Yuanchun Shen <[email protected]>
* Imitates {@code Registrar.registerExpression} of {@link RelBuilder} to derive the output order | ||
* of group-by keys after aggregation. | ||
* | ||
* <p>The projected input reference comes first, while any other computed expression follows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Registrar.registerExpression
, seems the other computed expression won't promise following the original order if there is expression duplication.
But since our PPL only allow span
expr in our group by and it cannot be combined with other span
expr. This logic may be right and I cannot find any bad case so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a bad case: stats count() by value, value, @timestamp
. I'll fix it.
Update: Fixed by checking duplication
/** Whether a rex node is an aliased input reference */ | ||
private boolean isInputRef(RexNode node) { | ||
return switch (node.getKind()) { | ||
case AS, DESCENDING, NULLS_FIRST, NULLS_LAST -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any case that we have DESCENDING, NULLS_FIRST, NULLS_LAST
in our stats .. by ...
command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I didn't manage to create any. It seems there is always a projection after sorting and before aggregation.
E.g.
LogicalAggregate(group=[{0}], count()=[COUNT()])
LogicalProject(value=[$2])
LogicalSort(sort0=[$2], dir0=[DESC-nulls-last])
Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
Description
This PR fixes a bug in Calcite-enabled PPL queries where group-by fields cannot be aliased to their original field names, causing queries to fail with "field not found" errors.
When Calcite is enabled, PPL queries that use span functions with aliases matching the original field names fail with errors like:
field [value] not found; input fields are: [value0, count()]
Affected Query Patterns:
source=time_test | stats count() by span(value, 2000) as value
source=time_test | stats count() by span(timestamp, 1h) as timestamp
Root Cause Analysis
The issue occurs during Calcite's aggregation processing:
Solution Implementation
This PR implements a post-aggregation field renaming strategy that preserves intended aliases.
Related Issues
Resolves #4580
Check List
--signoff
or-s
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.