
fix: enable full decimal to decimal support #1385

Open
wants to merge 4 commits into main
Conversation

Contributor

@himadripal himadripal commented Feb 11, 2025

Completes #375

  • enable decimal to decimal casts
  • remove the hard-coded CastOptions passed to native execution
  • fixed castTest to match Arrow's invalid argument error against Spark's "Number out of range" error
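The safe/ANSI distinction behind the second bullet can be sketched in plain Rust (a hypothetical helper, not Comet's actual code): arrow's safe cast turns an overflowing decimal into NULL, while the unsafe (ANSI) cast surfaces an error instead.

```rust
/// Minimal sketch (hypothetical, std-only) of a decimal-to-decimal cast
/// under arrow's `safe` flag: `safe = true` (legacy mode) maps overflow to
/// NULL (`None`), `safe = false` (ANSI mode) returns an error.
fn cast_decimal(
    value: i128, // unscaled decimal value
    from_scale: u32,
    to_scale: u32,
    to_precision: u32,
    safe: bool,
) -> Result<Option<i128>, String> {
    // Rescale the unscaled value to the target scale.
    // (Real implementations round when reducing scale; truncation here for brevity.)
    let rescaled = if to_scale >= from_scale {
        value.checked_mul(10i128.pow(to_scale - from_scale))
    } else {
        Some(value / 10i128.pow(from_scale - to_scale))
    };
    // A value fits Decimal(p, s) iff |unscaled| < 10^p.
    let max = 10i128.pow(to_precision);
    match rescaled {
        Some(v) if v.abs() < max => Ok(Some(v)),
        _ if safe => Ok(None), // legacy: overflow becomes NULL
        _ => Err(format!(
            "{value} cannot be represented as Decimal({to_precision}, {to_scale})"
        )),
    }
}

fn main() {
    // 12.34 as Decimal(4,2) -> Decimal(6,4) fits: unscaled 123400
    assert_eq!(cast_decimal(1234, 2, 4, 6, false), Ok(Some(123_400)));
    // 12.34 -> Decimal(3,2) overflows: NULL in safe mode, error in ANSI mode
    assert_eq!(cast_decimal(1234, 2, 2, 3, true), Ok(None));
    assert!(cast_decimal(1234, 2, 2, 3, false).is_err());
    println!("ok");
}
```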

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Use a regex to match the Arrow invalid argument error.
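The matching idea can be sketched in Rust (a std-only stand-in; the actual suite does this in Scala, and uses a regex rather than substring checks):

```rust
/// Hypothetical helper: treat arrow's invalid argument error as a decimal
/// overflow if it carries either of the known message fragments, without
/// pinning the exact per-version wording.
fn is_decimal_overflow_error(msg: &str) -> bool {
    // arrow reports e.g. "Invalid argument error: 123.45 cannot be
    // represented as Decimal(3, 2)"; some versions say "too large to store".
    msg.contains("cannot be represented as") || msg.contains("too large to store")
}

fn main() {
    assert!(is_decimal_overflow_error(
        "Invalid argument error: 123.45 cannot be represented as Decimal(3, 2)"
    ));
    assert!(is_decimal_overflow_error("value too large to store in Decimal"));
    assert!(!is_decimal_overflow_error("Number out of range"));
    println!("ok");
}
```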
@@ -872,6 +872,13 @@ fn cast_array(
let array = array_with_timezone(array, cast_options.timezone.clone(), Some(to_type))?;
let from_type = array.data_type().clone();

let native_cast_options: CastOptions = CastOptions {
safe: !matches!(cast_options.eval_mode, EvalMode::Ansi), // take safe mode from cast_options passed
format_options: FormatOptions::new()
Contributor

I think one can use a default value defined for FormatOptions here

Contributor Author

@himadripal himadripal Feb 11, 2025


The default CAST_OPTIONS, which this native_cast_options replaces, had these two set to:

static TIMESTAMP_FORMAT: Option<&str> = Some("%Y-%m-%d %H:%M:%S%.f");
           
 timestamp_format: TIMESTAMP_FORMAT,
 timestamp_tz_format: TIMESTAMP_FORMAT,

If we change it to the default, the FormatOptions::default() implementation sets these:

            timestamp_format: None,
            timestamp_tz_format: None,

Hence I kept it as defined inside Comet's default CAST_OPTIONS.

Contributor


Fair enough. (The format options are used only to make the cast of timestamp to string compatible with Spark, and are not needed anywhere else.) But I guess it is a good idea to be consistent everywhere.
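The difference discussed above can be sketched with a std-only mock of the two relevant fields (the struct here is a hypothetical stand-in mirroring arrow-rs FormatOptions field names, not the real type):

```rust
/// Hypothetical stand-in for the two timestamp fields of arrow-rs
/// `FormatOptions`, illustrating why `FormatOptions::default()` is not a
/// drop-in replacement for Comet's CAST_OPTIONS.
#[derive(Debug, PartialEq)]
struct FormatOpts {
    timestamp_format: Option<&'static str>,
    timestamp_tz_format: Option<&'static str>,
}

impl Default for FormatOpts {
    // Mirrors FormatOptions::default(): no explicit timestamp formats.
    fn default() -> Self {
        FormatOpts { timestamp_format: None, timestamp_tz_format: None }
    }
}

// Comet's CAST_OPTIONS pins a Spark-compatible timestamp format instead.
static TIMESTAMP_FORMAT: Option<&str> = Some("%Y-%m-%d %H:%M:%S%.f");

fn comet_format_opts() -> FormatOpts {
    FormatOpts {
        timestamp_format: TIMESTAMP_FORMAT,
        timestamp_tz_format: TIMESTAMP_FORMAT,
    }
}

fn main() {
    // The default differs from Comet's explicit options, which is why the
    // explicit form was kept in native_cast_options.
    assert_ne!(FormatOpts::default(), comet_format_opts());
    println!("ok");
}
```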

@codecov-commenter

codecov-commenter commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 39.32%. Comparing base (f09f8af) to head (4df3e68).
Report is 37 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|-|-|-|
| ...scala/org/apache/comet/expressions/CometCast.scala | 50.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1385       +/-   ##
=============================================
- Coverage     56.12%   39.32%   -16.81%     
- Complexity      976     2085     +1109     
=============================================
  Files           119      265      +146     
  Lines         11743    61128    +49385     
  Branches       2251    12960    +10709     
=============================================
+ Hits           6591    24036    +17445     
- Misses         4012    32587    +28575     
- Partials       1140     4505     +3365     


@kazuyukitanimura kazuyukitanimura changed the title enable full decimal to decimal support fix: enable full decimal to decimal support Feb 14, 2025
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Mostly looks good, thank you @himadripal just minor comments

|-|-|-|
| boolean | byte | |
| boolean | short | |
|-|---------|-|
Contributor


nit: just checking whether this change is due to changes in the producing method; it should be automatically created by make release.

Comment on lines +1132 to 1137
// for comet decimal conversion, arrow throws ArrowError(string); across spark versions the messages don't match.
if (sparkMessage.contains("cannot be represented as")) {
  assert(
    cometMessage.contains("cannot be represented as") || cometMessage.contains(
      "too large to store"))
} else if (CometSparkSessionExtensions.isSpark40Plus) {
  assert(
    sparkException.getMessage
      .replace(".WITH_SUGGESTION] ", "]")
      .startsWith(cometMessage))
} else if (CometSparkSessionExtensions.isSpark34Plus) {
  // for Spark 3.4 we expect to reproduce the error message exactly
  assert(cometMessage == sparkMessage)
} else {
Contributor


There are message modifications below per Spark version. Would you mind updating them instead of creating another if branch?
