Skip to content

Conversation

yashmayya
Copy link
Contributor

  • Enhancement for Automatically rewrite MIN / MAX on string col to MINSTRING / MAXSTRING #16980, bringing SSE aggregation function rewrite functionality (nearly) on par with the MSE one.
  • The only gap remaining is certain transformation functions that might not have an equivalent scalar function implementation - this gap should anyway be bridged because transform functions can't be used in MSE's intermediate stages and in various other places where only scalar functions can be used.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the SSE aggregation function rewrite optimizer to support complex expressions containing nested function calls, bringing it closer to feature parity with the MSE optimizer. The main improvement is the ability to infer data types for complex expressions rather than just simple column identifiers.

  • Enhanced AggregateFunctionRewriteOptimizer to handle complex expressions like MIN(SUB_STRING(TRIM(OriginCityName), 1))
  • Added new inferExpressionType method in RequestUtils to recursively determine expression data types
  • Added comprehensive test coverage for the new type inference functionality

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
AggregateFunctionRewriteOptimizer.java Replaced simple column-only logic with complex expression type inference using RequestUtils.inferExpressionType
RequestUtils.java Added new inferExpressionType method that recursively traverses expression trees to determine data types
RequestUtilsTest.java Added comprehensive test suite covering identifier, literal, scalar function, and nested function type inference
BaseClusterIntegrationTestSet.java Added integration test to verify rewrite functionality works with complex expressions

* functions aren't supported and {@link ColumnDataType#UNKNOWN} will be returned for them.
*/
@Nullable
public static ColumnDataType inferExpressionType(@Nullable Expression expression, Schema schema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to accept null expression here. If we don't accept null expression, the return is also always not null

}
FunctionInfo functionInfo = FunctionRegistry.lookupFunctionInfo(fn.getOperator(), argTypes);
if (functionInfo != null) {
Class<?> returnClass = functionInfo.getMethod().getReturnType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not enough, especially when a function returns Object because various types of value can be returned by the same function based on the argument, e.g. JSON_EXTRACT_SCALAR.
Ideally we want to use SqlReturnTypeInference to inference the return type of a function. This info should be available from PinotOperatorTable. All the working functions (at least in MSE) are already registered there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants