Infer complex expression type for SSE aggregation function rewrite #17017
yashmayya commented Oct 14, 2025
- Enhancement for "Automatically rewrite MIN / MAX on string col to MINSTRING / MAXSTRING" (#16980), bringing the SSE aggregation function rewrite functionality (nearly) on par with the MSE one.
- The only remaining gap is certain transformation functions that might not have an equivalent scalar function implementation. This gap should be bridged anyway, because transform functions can't be used in MSE's intermediate stages or in various other places where only scalar functions can be used.
Pull Request Overview
This PR enhances the SSE aggregation function rewrite optimizer to support complex expressions containing nested function calls, bringing it closer to feature parity with the MSE optimizer. The main improvement is the ability to infer data types for complex expressions rather than just simple column identifiers.
- Enhanced `AggregateFunctionRewriteOptimizer` to handle complex expressions like `MIN(SUB_STRING(TRIM(OriginCityName), 1))`
- Added new `inferExpressionType` method in `RequestUtils` to recursively determine expression data types
- Added comprehensive test coverage for the new type inference functionality
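The recursive inference summarized in the list above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `TypeInferenceSketch`, `Expr`, `DataType`, the toy `RETURN_TYPES` registry, and `rewriteAggregation` are stand-ins invented for illustration and are not Pinot's `Expression`, `ColumnDataType`, `FunctionRegistry`, or the actual `inferExpressionType` implementation. In particular, the toy registry maps a function name straight to a return type, whereas the PR looks functions up by operator and argument types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, simplified illustration of recursive expression type inference. */
final class TypeInferenceSketch {
  enum DataType { INT, LONG, DOUBLE, STRING, UNKNOWN }

  /** Toy expression tree node: a column reference, a literal, or a function call. */
  static final class Expr {
    final String columnName;    // set for column references
    final Object literalValue;  // set for literals
    final String functionName;  // set for function calls
    final List<Expr> args;

    private Expr(String columnName, Object literalValue, String functionName, List<Expr> args) {
      this.columnName = columnName;
      this.literalValue = literalValue;
      this.functionName = functionName;
      this.args = args;
    }

    static Expr column(String name) {
      return new Expr(name, null, null, Collections.<Expr>emptyList());
    }

    static Expr literal(Object value) {
      return new Expr(null, value, null, Collections.<Expr>emptyList());
    }

    static Expr call(String function, Expr... args) {
      return new Expr(null, null, function, Arrays.asList(args));
    }
  }

  // Assumption: a toy registry keyed only by function name.
  private static final Map<String, DataType> RETURN_TYPES = new HashMap<>();
  static {
    RETURN_TYPES.put("TRIM", DataType.STRING);
    RETURN_TYPES.put("SUB_STRING", DataType.STRING);
    RETURN_TYPES.put("LENGTH", DataType.INT);
  }

  /** Walks the tree bottom-up; any unresolvable node yields UNKNOWN for the whole expression. */
  static DataType infer(Expr expr, Map<String, DataType> schema) {
    if (expr.columnName != null) {
      return schema.getOrDefault(expr.columnName, DataType.UNKNOWN);
    }
    if (expr.functionName != null) {
      for (Expr arg : expr.args) {
        if (infer(arg, schema) == DataType.UNKNOWN) {
          return DataType.UNKNOWN;
        }
      }
      return RETURN_TYPES.getOrDefault(expr.functionName.toUpperCase(Locale.ROOT), DataType.UNKNOWN);
    }
    Object value = expr.literalValue;
    if (value instanceof String) return DataType.STRING;
    if (value instanceof Integer) return DataType.INT;
    if (value instanceof Long) return DataType.LONG;
    if (value instanceof Double) return DataType.DOUBLE;
    return DataType.UNKNOWN;
  }

  /** Rewrites MIN/MAX to MINSTRING/MAXSTRING only when the argument is inferred to be a string. */
  static String rewriteAggregation(String aggregation, Expr arg, Map<String, DataType> schema) {
    boolean minOrMax = aggregation.equals("MIN") || aggregation.equals("MAX");
    return minOrMax && infer(arg, schema) == DataType.STRING ? aggregation + "STRING" : aggregation;
  }
}
```

With a schema that maps `OriginCityName` to `STRING`, calling `rewriteAggregation("MIN", ...)` on the tree for `SUB_STRING(TRIM(OriginCityName), 1)` returns `MINSTRING`, while an argument containing an unregistered function leaves `MIN` unchanged because the inferred type is `UNKNOWN`.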
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `AggregateFunctionRewriteOptimizer.java` | Replaced simple column-only logic with complex expression type inference using `RequestUtils.inferExpressionType` |
| `RequestUtils.java` | Added new `inferExpressionType` method that recursively traverses expression trees to determine data types |
| `RequestUtilsTest.java` | Added comprehensive test suite covering identifier, literal, scalar function, and nested function type inference |
| `BaseClusterIntegrationTestSet.java` | Added integration test to verify rewrite functionality works with complex expressions |
pinot-common/src/main/java/org/apache/pinot/common/utils/request/RequestUtils.java
 * functions aren't supported and {@link ColumnDataType#UNKNOWN} will be returned for them.
 */
@Nullable
public static ColumnDataType inferExpressionType(@Nullable Expression expression, Schema schema) {
I don't think we want to accept a `null` expression here. If we don't accept a `null` expression, the return value is also never `null`.
}
FunctionInfo functionInfo = FunctionRegistry.lookupFunctionInfo(fn.getOperator(), argTypes);
if (functionInfo != null) {
  Class<?> returnClass = functionInfo.getMethod().getReturnType();
This is probably not enough, especially when a function returns `Object`, because the same function can return values of various types depending on its arguments, e.g. `JSON_EXTRACT_SCALAR`.
Ideally we want to use `SqlReturnTypeInference` to infer the return type of a function. This info should be available from `PinotOperatorTable`. All the working functions (at least in MSE) are already registered there.
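The limitation raised in this comment can be reproduced with plain Java reflection. The sketch below is hypothetical and does not use any Pinot classes: `ReturnClassSketch`, its `DataType` enum, and the toy functions `trim` and `parseAs` are invented for illustration. It only shows that a declared return class of `Object` carries no usable type information, which is why a return-type inference that takes the arguments into account is preferable.

```java
import java.lang.reflect.Method;

/** Hypothetical illustration: mapping a method's declared return class to a data type. */
final class ReturnClassSketch {
  enum DataType { INT, LONG, DOUBLE, STRING, UNKNOWN }

  // Toy scalar function with a precise declared return type.
  static String trim(String value) {
    return value.trim();
  }

  // Toy scalar function whose runtime result type depends on an argument,
  // so its declared return type can only be Object.
  static Object parseAs(String value, String resultType) {
    return "LONG".equals(resultType) ? (Object) Long.valueOf(value) : value;
  }

  /** Maps a declared return class to a data type; Object (and anything unrecognized) is UNKNOWN. */
  static DataType fromReturnClass(Class<?> returnClass) {
    if (returnClass == String.class) return DataType.STRING;
    if (returnClass == int.class || returnClass == Integer.class) return DataType.INT;
    if (returnClass == long.class || returnClass == Long.class) return DataType.LONG;
    if (returnClass == double.class || returnClass == Double.class) return DataType.DOUBLE;
    return DataType.UNKNOWN;
  }

  /** Looks up one of the toy functions above by name and parameter types. */
  static DataType inferFromMethod(String methodName, Class<?>... parameterTypes) {
    try {
      Method method = ReturnClassSketch.class.getDeclaredMethod(methodName, parameterTypes);
      return fromReturnClass(method.getReturnType());
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No such toy function: " + methodName, e);
    }
  }
}
```

Here `inferFromMethod("trim", String.class)` resolves to `STRING`, but `inferFromMethod("parseAs", String.class, String.class)` can only resolve to `UNKNOWN`, even though a call such as `parseAs("42", "LONG")` returns a `Long` at runtime.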