Skip to content

analytics-engine: More Math function support#21743

Open
mch2 wants to merge 1 commit into
opensearch-project:mainfrom
mch2:math2
Open

analytics-engine: More Math function support#21743
mch2 wants to merge 1 commit into
opensearch-project:mainfrom
mch2:math2

Conversation

@mch2
Copy link
Copy Markdown
Member

@mch2 mch2 commented May 19, 2026

Description

Brings PPL scalar function coverage on the analytics-engine route up to parity.

Arithmetic

  • x % y and mod(x, y) on floating-point operands (fp32, fp64, and mixed fp/int)
  • mod(x, 0) returns NULL instead of raising
  • x / 0 returns NULL instead of raising

Trigonometric

  • sin, cos, tan, asin, acos, atan, atan2, cot on integer-typed columns (was failing with "Unable to convert call SIN(i32?)" etc.)

Base conversion

  • conv(n, fromBase, toBase) — convert integer between bases 2–36, matches MySQL/Java semantics (lowercase digit output, NumberFormatException on out-of-range radix)
  • Accepts both string and integer first argument
  • Round-trips negative numbers

Predicates

  • where rand() > 0 and other non-deterministic filter predicates (was rejected by the planner)

No new PPL syntax — these are existing PPL functions that were silently failing or throwing on the analytics-engine route. Users gain consistent behavior whether their query hits the analytics path.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@mch2 mch2 requested a review from a team as a code owner May 19, 2026 17:42
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit 85764bb)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The isNonZeroLiteral check uses BigDecimal.signum() to detect zero, but RexLiteral.getValueAs(BigDecimal.class) can return null for non-numeric literals (e.g., string '0'). If value is null, calling value.signum() throws NullPointerException. This occurs when the divisor is a non-numeric literal that Calcite typed as numeric but cannot convert to BigDecimal.

private static boolean isNonZeroLiteral(RexNode node) {
    if (!(node instanceof RexLiteral lit) || lit.isNull()) {
        return false;
    }
    BigDecimal value = lit.getValueAs(BigDecimal.class);
    return value != null && value.signum() != 0;
Possible Issue

The radix validation checks from_base < 2 and from_base > 36 but uses i64 for the base parameters. If a user passes a base outside the i32 range (e.g., 2^40), the cast from_base as u32 at line 99 silently truncates the value, potentially accepting an invalid radix or producing incorrect results instead of raising the expected error.

if from_base < 2 {
    return exec_err!("radix {from_base} less than Character.MIN_RADIX");
}
if from_base > 36 {
    return exec_err!("radix {from_base} greater than Character.MAX_RADIX");
}
if to_base < 2 {
    return exec_err!("radix {to_base} less than Character.MIN_RADIX");
}
if to_base > 36 {
    return exec_err!("radix {to_base} greater than Character.MAX_RADIX");
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Code Suggestions ✨

Latest suggestions up to 85764bb
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Handle non-numeric operands correctly

When either operand has a non-numeric type (rank < 0), the code proceeds to emit MOD
without type widening. This may fail substrait signature resolution if the operands
are non-numeric types that Calcite accepted but substrait's modulus doesn't support.
Consider returning original unchanged or adding validation.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/ModAdapter.java [60-67]

 if (lhsRank < 0 || rhsRank < 0) {
-    return wrapWithNullOnZero(
-        rexBuilder.makeCall(original.getType(), SqlStdOperatorTable.MOD, List.of(lhs, rhs)),
-        rhs,
-        rexBuilder,
-        typeFactory
-    );
+    return wrapWithNullOnZero(original, rhs, rexBuilder, typeFactory);
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that non-numeric operands (rank < 0) should be handled more carefully. Returning original unchanged is safer than reconstructing the call, as it preserves the original operator and type information. This prevents potential substrait signature resolution failures.

Low
Remove unreachable operator check

The condition checks if the operator is already DIVIDE, but the adapter is
registered for ScalarFunction.DIVIDE which should always map to this operator. The
fallback branch that reconstructs the call appears unreachable. Remove the
conditional and use original directly, or clarify when the operator would differ.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DivideAdapter.java [50-52]

-RexNode divideExpr = original.getOperator() == SqlStdOperatorTable.DIVIDE
-    ? original
-    : rexBuilder.makeCall(original.getType(), SqlStdOperatorTable.DIVIDE, List.of(lhs, rhs));
+RexNode divideExpr = original;
Suggestion importance[1-10]: 4

__

Why: The condition is defensive but likely unnecessary since DivideAdapter is registered for ScalarFunction.DIVIDE. However, the fallback ensures correctness if the operator differs, so removing it could introduce subtle bugs. The suggestion improves code clarity but has limited impact.

Low
Replace unwrap with expect

The unwrap() call on char::from_digit can panic if d >= base, though this should be
mathematically impossible. For defensive programming in production code, replace
with expect() and a descriptive message, or handle the None case explicitly to avoid
potential panics.

sandbox/plugins/analytics-backend-datafusion/rust/src/udf/conv.rs [150-155]

 let mut abs = (value as i128).unsigned_abs();
 let mut digits = String::new();
 while abs > 0 {
     let d = (abs % base as u128) as u32;
-    digits.push(char::from_digit(d, base).unwrap());
+    digits.push(char::from_digit(d, base).expect("digit must be valid for base"));
     abs /= base as u128;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion improves defensive programming by replacing unwrap() with expect() and a descriptive message. While the panic should be mathematically impossible, the change provides better error diagnostics if the invariant is violated. The impact is minor since the condition should never occur.

Low

Previous suggestions

Suggestions up to commit ebacc88
CategorySuggestion                                                                                                                                    Impact
General
Use safe integer conversion

Casting from_base (i64) to u32 can silently truncate values outside the u32 range.
Although the caller validates the base is between 2 and 36, the cast should be
explicit about the expected range to prevent potential issues if validation logic
changes. Use a checked conversion or document the assumption.

sandbox/plugins/analytics-backend-datafusion/rust/src/udf/conv.rs [138]

-let parsed = i64::from_str_radix(s.trim(), from_base as u32).ok()?;
+let parsed = i64::from_str_radix(s.trim(), from_base.try_into().ok()?).ok()?;
Suggestion importance[1-10]: 3

__

Why: While the validation at lines 95-106 ensures bases are in [2,36], using try_into() would make the code more defensive against future changes. However, the impact is minimal since the validation is explicit and the cast is safe within the validated range.

Low

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for ebacc88: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.48%. Comparing base (8f2d058) to head (85764bb).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21743      +/-   ##
============================================
+ Coverage     73.46%   73.48%   +0.02%     
- Complexity    74825    74867      +42     
============================================
  Files          5997     5996       -1     
  Lines        339688   339679       -9     
  Branches      48961    48961              
============================================
+ Hits         249558   249630      +72     
+ Misses        70272    70248      -24     
+ Partials      19858    19801      -57     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…ger operands

  - Add fp32/fp64 overloads to substrait `modulus` via opensearch_arithmetic_overloads.yaml
    so isthmus can serialize MOD(fp, fp); ModAdapter widens mixed-type operands and wraps
    in CASE for null-on-zero (isthmus can't thread on_domain_error=NULL).
  - DivideAdapter wraps in CASE for null-on-zero (same isthmus option gap); short-circuits
    on non-zero literal divisors.
  - Wire conv(n, fromBase, toBase) end-to-end: ScalarFunction.CONVERT enum, ConvAdapter,
    function mapping, opensearch_scalar_functions.yaml entry, Rust UDF in udf/conv.rs.
    Mirrors Long.toString(Long.parseLong(...)) — lowercase output, NumberFormatException
    message on invalid radix.
  - Add NumericToDoubleAdapter entries for SIN/COS/TAN/ASIN/ACOS/ATAN/ATAN2/COT so
    isthmus binds the call when PPL passes an integer field (substrait declares fp only).
  - OpenSearchFilterRule: pass non-deterministic predicates (e.g. RAND() > 0) through to
    child viable backends instead of throwing on zero field references.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 85764bb

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 85764bb: SUCCESS

Map.entry(ScalarFunction.ACOS, new NumericToDoubleAdapter(SqlStdOperatorTable.ACOS)),
Map.entry(ScalarFunction.ASIN, new NumericToDoubleAdapter(SqlStdOperatorTable.ASIN)),
Map.entry(ScalarFunction.ATAN, new NumericToDoubleAdapter(SqlStdOperatorTable.ATAN)),
Map.entry(ScalarFunction.ATAN2, new NumericToDoubleAdapter(SqlStdOperatorTable.ATAN2)),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need similar adapters for RADIANS and DEGREES
I think we need similar fix for them as well. We need to convert the arguments in those functions to Double as well.

please add

Map.entry(ScalarFunction.RADIANS, new NumericToDoubleAdapter(SqlStdOperatorTable.RADIANS)),
Map.entry(ScalarFunction.DEGREES, new NumericToDoubleAdapter(SqlStdOperatorTable.DEGREES)),

@vinaykpud vinaykpud self-requested a review May 21, 2026 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants