Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add spark_signed_integer_remainder native function for compatibility with spark #1416

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wForget
Copy link
Member

@wForget wForget commented Feb 18, 2025

Which issue does this PR close?

Closes #1412.

Rationale for this change

Fix overflow happened on: minimum integer value % -1

What changes are included in this PR?

add a spark_signed_integer_remainder native function for compatibility with spark

How are these changes tested?

added unit test and bench

@wForget wForget marked this pull request as draft February 18, 2025 12:09
@kazuyukitanimura
Copy link
Contributor

wondering if we can fix in arrow/datafusion

It looks addition is working, I wonder what would be the difference between + and %

checkSparkAnswerAndOperator("select c1 + c2 from t1")

@wForget
Copy link
Member Author

wForget commented Feb 19, 2025

wondering if we can fix in arrow/datafusion
It looks addition is working, I wonder what would be the difference between + and %

It seems to be by design in rust: https://github.com/rust-lang/rust/blob/f44efbf9e11b1b6bba77c046d7dd150de37e0e0f/library/core/src/num/int_macros.rs#L993

@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 39.32%. Comparing base (f09f8af) to head (55170c0).
Report is 42 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1416       +/-   ##
=============================================
- Coverage     56.12%   39.32%   -16.81%     
- Complexity      976     2085     +1109     
=============================================
  Files           119      265      +146     
  Lines         11743    61132    +49389     
  Branches       2251    12962    +10711     
=============================================
+ Hits           6591    24040    +17449     
- Misses         4012    32587    +28575     
- Partials       1140     4505     +3365     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wForget wForget marked this pull request as ready for review February 19, 2025 06:24
@kazuyukitanimura
Copy link
Contributor

wondering if we can fix in arrow/datafusion
It looks addition is working, I wonder what would be the difference between + and %

It seems to be by design in rust: https://github.com/rust-lang/rust/blob/f44efbf9e11b1b6bba77c046d7dd150de37e0e0f/library/core/src/num/int_macros.rs#L993

Thanks for digging into the rust source.
Perhaps I am missing something but still not clear to me...
rust -2147483648 + -1 overflows

>> let r = -2147483648 + -1;
[arithmetic_overflow] Error: this arithmetic operation will overflow
   ╭─[command:1:1]
   │
 1 │ let r = -2147483648 + -1;
   │         ────────┬───────  
   │                 ╰───────── attempt to compute `i32::MIN + -1_i32`, which would overflow
───╯

But DataFusion is allowing the overflow

$datafusion-cli                                             
DataFusion CLI v44.0.0
> create table t1(c1 int, c2 int);
0 row(s) fetched. 
Elapsed 0.016 seconds.

> insert into t1 values(-2147483648, -1);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched. 
Elapsed 0.027 seconds.

> select c1 + c2 from t1;
+---------------+
| t1.c1 + t1.c2 |
+---------------+
| 2147483647    |
+---------------+
1 row(s) fetched. 
Elapsed 0.010 seconds.

> select c1 % c2 from t1;
Arrow error: Arithmetic overflow: Overflow happened on: -2147483648 % -1

It looks postgres also returns 0, I will open a datafusion ticket on this

@kazuyukitanimura
Copy link
Contributor

Filed apache/datafusion#14771

@wForget
Copy link
Member Author

wForget commented Feb 19, 2025

https://github.com/apache/datafusion/blob/c176533d185b76bf4728c21d3b83ca00c633614f/datafusion/physical-expr/src/expressions/binary.rs#L313

https://github.com/apache/arrow-rs/blob/d0a2301697f8a0fe5fc267bc35fd959103e6ec4e/arrow-array/src/arithmetic.rs#L166

@kazuyukitanimura Add without fail_on_overflow will call add_wrapping method, so test like:

let a = -2147483648i32;
let b = -1i32;
let c = a.wrapping_add(b);
println!("a: {}, b: {}, c: {}", a, b, c);

result:

a: -2147483648, b: -1, c: 2147483647

@wForget
Copy link
Member Author

wForget commented Feb 19, 2025

I also tested wrapping_rem and it seems to work as expected:

let a = -2147483648i32;
let b = -1i32;
let c = a.wrapping_rem(b);
println!("a: {}, b: {}, c: {}", a, b, c);

result:

a: -2147483648, b: -1, c: 0

Does this mean we can implement a similar rem_warning function in datafusion?

@parthchandra
Copy link
Contributor

The panic is because of the language specification: #1412 (comment)
The spec does not explain why but the behaviour is specifically for / and % and only for these arguments.
Imho, if the operations work correctly for all other arguments, then it seems like overkill to try to solve this specific use case.

@kazuyukitanimura
Copy link
Contributor

wrapping_rem seems promising. On top we may need to test with both ANSI and non-ANSI mode

@parthchandra
Copy link
Contributor

parthchandra commented Feb 20, 2025

wrapping_rem seems promising. On top we may need to test with both ANSI and non-ANSI mode

Yes, I agree that wrapping_rem seems like a reasonable way to address this.

@wForget wForget marked this pull request as draft February 20, 2025 05:08
@wForget
Copy link
Member Author

wForget commented Feb 20, 2025

Thanks @kazuyukitanimura @parthchandra , I will try to fix this issue in upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overflow happened on: -2147483648 % -1
4 participants