Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use shortest scientific notation for cast(real|double as varchar) #12574

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

ccat3z
Copy link
Contributor

@ccat3z ccat3z commented Mar 7, 2025

Use shortest scientific notation for cast(real|double as varchar). Previously, the conversion to scientific notation used a fixed precision which led some results incompatible with IEEE754. Here is some example.

5.957E-4 -> 5.9570000000000001E-4
1.0E-6 -> 9.9999999999999995E-7
-7.639E-4 -> 7.6389999999999997E-4

Change to use dragonbox in fmt library instead of fmt::format("{:.16E}"). It provides shortest digits and better performance. I added a benchmark to compare the performance of several scientific notation conversions. DoubleToFmtExpPrecision is original impl and DoubleToScientificNotation is new impl.

============================================================================
[...]type/tests/FloatingPointBenchmark.cpp     relative  time/iter   iters/s
============================================================================
FloatToScientificNotation                                  99.59ms     10.04
FloatToFmt                                      84.754%   117.51ms      8.51
FloatToFmtExp                                   60.285%   165.20ms      6.05
FloatToFmtExpPrecision                          58.467%   170.34ms      5.87
DoubleToScientificNotation                                145.36ms      6.88
DoubleToFmt                                     101.58%   143.10ms      6.99
DoubleToFmtExp                                  84.720%   171.58ms      5.83
DoubleToFmtExpPrecision                         77.518%   187.52ms      5.33

Also added a fuzzer test to compare the results between java and velox. Due to https://bugs.openjdk.org/browse/JDK-4511638, java <= 18 will produce incorrect or longer decimal. The fuzzer ignored differences caused by this bug.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 7, 2025
Copy link

netlify bot commented Mar 7, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 61ed3e8
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67cffbd2e6cfab00085da57d

@ccat3z ccat3z changed the title Use shortest scientific notation for cast(real|double as varchar) fix: Use shortest scientific notation for cast(real|double as varchar) Mar 7, 2025
@Yuhta Yuhta requested a review from kgpai March 7, 2025 15:57
@Yuhta
Copy link
Contributor

Yuhta commented Mar 7, 2025

@kgpai Would this cause behavior mismatch with Presto?

@kgpai
Copy link
Contributor

kgpai commented Mar 7, 2025

@Yuhta There is always noise related to doubles - whether this results in more noise , Its hard to say but probably yes. Might be good to write some test and compare using presto fuzzer and see if it results in more or less matches.

@ccat3z ccat3z force-pushed the shortest-sci-double branch from 29847c9 to 2116bf7 Compare March 10, 2025 12:23
@ccat3z
Copy link
Contributor Author

ccat3z commented Mar 10, 2025

@kgpai I added a fuzzer test to compare results between java and velox. Both presto and spark use Double.toString or Float.toString to cast floating number to varchar. Due to https://bugs.openjdk.org/browse/JDK-4511638, java <= 18 will produce incorrect or longer decimal. The fuzzer ignored differences caused by this bug.

Test on java 8 and 18:

[ RUN      ] FloatToString.double
Warning: 6.117152267E18 != 6.1171522670000005E18
Warning: 6.199534181665E17 != 6.1995341816649997E17
Warning: 4.90577135932E18 != 4.9057713593200005E18
Warning: 5.854571232282407E18 != 5.8545712322824069E18
Warning: -2.584509E20 != -2.5845089999999998E20
Warning: -3.818241343600966E17 != -3.8182413436009658E17
Warning: 6.92112981002196E17 != 6.9211298100219597E17
Warning: 2.983699071438001E17 != 2.9836990714380013E17
Warning: -5.029338E20 != -5.0293379999999997E20
Warning: 2.44379246333533E17 != 2.44379246333532992E17
Warning: -8.013810855174E17 != -8.0138108551740006E17
Warning: 5.704695365E18 != 5.7046953649999995E18
Warning: -5.94662655662E18 != -5.9466265566200003E18
Warning: 2.2835713357246198E18 != 2.28357133572461978E18
Warning: -4.4563355E19 != -4.4563355000000004E19
Warning: 2.5283202E20 != 2.5283201999999998E20
Warning: -7.707719043100662E16 != -7.7077190431006624E16
[       OK ] FloatToString.double (201 ms)

Test on java 19:

[ RUN      ] FloatToString.double
[       OK ] FloatToString.double (190 ms)

Comment on lines 35 to 36
gflags::gflags
glog::glog)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see gflags and glog being used directly, so these can be removed.

Comment on lines 133 to 136
GTest::gtest
GTest::gtest_main
gflags::gflags
glog::glog)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

@ccat3z
Copy link
Contributor Author

ccat3z commented Mar 11, 2025

@assignUser I've updated the code, could you help to review it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants