Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support group by span over time based column with Span UDF #3421

Conversation

songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Mar 13, 2025

Description

Support group by span over time based column with Span UDF

Related Issues

Resolves #3354

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

date.atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli(), interval);
return SqlFunctions.timestampToDate(dateEpochValue);
case SqlTypeName.TIME:
if (dateTimeUnit.getId() > 4) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why need this limitation?

Copy link
Contributor Author

@songkant-aws songkant-aws Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment inline. The TIME type usually means date format without year, month, etc like the field '17:59:59.99'

* day, 1 month, 1 hour
* </ol>
*/
public class SpanFunction implements UserDefinedFunction {
Copy link
Collaborator

@qianheng-aws qianheng-aws Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave a TODO here to do refactoring in future. Your previous idea of implementing a self-defined implementor seems to be a better approach.

Looks like it could be implemented by replacing ScalarFunction(we currently use) with ImplementableFunction, and move logic for handling different filed type to the implementor. In that way, this span function should be simpler with only handling timestamp type.

Then, we may have different functions like dateToTs, tsToDate, span(ForTs) and provide us more flexibility to reuse them or combine them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO

@@ -100,6 +101,9 @@ static SqlOperator translate(String op) {
return SqlLibraryOperators.DATE_ADD_SPARK;
case "DATE_ADD":
return SqlLibraryOperators.DATEADD;
// UDF Functions
case "SPAN":
return TransferUserDefinedFunction(SpanFunction.class, "SPAN", ReturnTypes.ARG0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReturnTypes.ARG0 or ReturnTypes.ARG0_NULLABLE? What if the timestamp is null in span(timestamp, ...)?

Copy link
Contributor Author

@songkant-aws songkant-aws Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to ReturnTypes.ARG0_NULLABLE but it will transform it to default long or int value like 0L or 0 in linq4j generated code. Still looking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved the groupby nulls issue due to sql type creation conflicts in RexNodeBuilder.

public void testAvgByTimeSpanAndFields() {
JSONObject actual =
executeQuery(
String.format(
"source=%s | stats avg(balance) by span(birthdate, 1 day) as age_balance",
"source=%s | stats avg(balance) by span(birthdate, 1 month) as age_balance",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need some ITs for such as 15 minutes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 15 minutes IT.

@LantaoJin
Copy link
Member

LantaoJin commented Mar 18, 2025

PPLQueryDataAnonymizer and PPLQueryDataAnonymizerTest are required to update for span expression. @songkant-aws

@LantaoJin
Copy link
Member

Flaky test

CalcitePPLStringBuiltinFunctionIT > testPosition FAILED
    java.lang.AssertionError: expected:<1> but was:<2>
        at __randomizedtesting.SeedInfo.seed([EFDC39F9CD048074:F34E991E6AAE7EDB]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.sql.util.MatcherUtils.verify(MatcherUtils.java:192)
        at org.opensearch.sql.util.MatcherUtils.verifyDataRows(MatcherUtils.java:160)
        at org.opensearch.sql.calcite.standalone.CalcitePPLStringBuiltinFunctionIT.testPosition(CalcitePPLStringBuiltinFunctionIT.java:161)

Let me trigger a re-run attempt.

cc @yuancu

@LantaoJin LantaoJin added the calcite calcite migration releated label Mar 18, 2025
@songkant-aws
Copy link
Contributor Author

PPLQueryDataAnonymizer and PPLQueryDataAnonymizerTest are required to update for span expression. @songkant-aws

Added corresponding change and more IT to cover span over different kinds of date formats.

@songkant-aws
Copy link
Contributor Author

Looks like we have some flaky tests, this time there is one in build-windows-macos, which is irrelevant to this PR's change.

org.opensearch.sql.calcite.standalone.CalcitePPLStringBuiltinFunctionIT.testSubstring(CalcitePPLStringBuiltinFunctionIT.java:148)

@LantaoJin LantaoJin merged commit b99130a into opensearch-project:feature/calcite-engine Mar 19, 2025
18 of 20 checks passed
@LantaoJin
Copy link
Member

Failure due to flaky test, merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
calcite calcite migration releated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants