fix: Use shortest scientific notation for cast(real|double as varchar) #12574

ccat3z · 2025-03-07T08:04:00Z

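Use shortest scientific notation for cast(real|double as varchar). Previously, the conversion to scientific notation used a fixed precision, so some results exposed IEEE 754 representation error instead of the shortest string that round-trips to the same value. Here are some examples (shortest form -> previous output):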

5.957E-4 -> 5.9570000000000001E-4
1.0E-6 -> 9.9999999999999995E-7
-7.639E-4 -> -7.6389999999999997E-4
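
To make the difference concrete, here is a minimal sketch that contrasts fixed-precision formatting with shortest round-trip formatting. It is not the code in this PR: `std::snprintf` with `%.16E` stands in for `fmt::format("{:.16E}")`, and `std::to_chars` stands in for dragonbox (both are assumptions chosen so the sketch only needs the C++17 standard library). The helper names `fixedPrecision` and `shortestDigits` are hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <cstdio>
#include <string>
#include <system_error>

// Stand-in for the old behavior: always print 16 digits after the
// decimal point, whether or not they are needed.
std::string fixedPrecision(double value) {
  char buf[64];
  const int n = std::snprintf(buf, sizeof(buf), "%.16E", value);
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, n);
}

// Stand-in for the new behavior: std::to_chars without an explicit
// precision emits the fewest digits that still parse back to the
// same double.
std::string shortestDigits(double value) {
  char buf[64];
  const auto res = std::to_chars(
      buf, buf + sizeof(buf), value, std::chars_format::scientific);
  assert(res.ec == std::errc{});
  return std::string(buf, res.ptr);
}
```

With glibc and libstdc++ 11 or later, `fixedPrecision(5.957e-4)` should give `5.9570000000000001E-04` while `shortestDigits(5.957e-4)` gives `5.957e-04`. Both helpers print a two-digit, lowercase-or-uppercase C-style exponent, so the strings differ cosmetically from the Java-style values listed above.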

Switch from fmt::format("{:.16E}") to the dragonbox implementation in the fmt library, which produces the shortest digits and performs better. I added a benchmark that compares several scientific-notation conversions. DoubleToFmtExpPrecision is the original implementation and DoubleToScientificNotation is the new one.

============================================================================
[...]type/tests/FloatingPointBenchmark.cpp     relative  time/iter   iters/s
============================================================================
FloatToScientificNotation                                  99.59ms     10.04
FloatToFmt                                      84.754%   117.51ms      8.51
FloatToFmtExp                                   60.285%   165.20ms      6.05
FloatToFmtExpPrecision                          58.467%   170.34ms      5.87
DoubleToScientificNotation                                145.36ms      6.88
DoubleToFmt                                     101.58%   143.10ms      6.99
DoubleToFmtExp                                  84.720%   171.58ms      5.83
DoubleToFmtExpPrecision                         77.518%   187.52ms      5.33
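
For readers who want to see what "shortest digits, then scientific notation" can look like end to end, the following is a rough sketch under stated assumptions. It uses `std::to_chars` rather than dragonbox, the function name `toJavaStyleScientific` is hypothetical, and it only handles finite values. It also always uses scientific notation, which is an assumption of the sketch rather than a statement about when the cast chooses that form.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: renders a finite double as "d.dddE[-]x",
// e.g. 5.957E-4, using the shortest round-trip digits.
std::string toJavaStyleScientific(double value) {
  char buf[64];
  const auto res = std::to_chars(
      buf, buf + sizeof(buf), value, std::chars_format::scientific);
  assert(res.ec == std::errc{});
  const std::string raw(buf, res.ptr); // e.g. "5.957e-04" or "1e-06"

  const auto ePos = raw.find('e');
  assert(ePos != std::string::npos); // inf and nan are not handled here

  // Keep at least one digit after the decimal point ("1" -> "1.0").
  std::string mantissa = raw.substr(0, ePos);
  if (mantissa.find('.') == std::string::npos) {
    mantissa += ".0";
  }

  // std::stoi accepts the sign and drops the leading zero ("-04" -> -4).
  const int exponent = std::stoi(raw.substr(ePos + 1));
  return mantissa + "E" + std::to_string(exponent);
}
```

Under these assumptions, `toJavaStyleScientific(1.0e-6)` yields `1.0E-6`, matching the shortest form shown in the description.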

Also added a fuzzer test that compares the results between Java and Velox. Due to https://bugs.openjdk.org/browse/JDK-4511638, Java <= 18 can produce an incorrect or longer decimal string. The fuzzer ignores differences caused by this bug.

netlify · 2025-03-07T08:04:19Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`61ed3e8`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/67cffbd2e6cfab00085da57d

Yuhta · 2025-03-07T15:57:42Z

@kgpai Would this cause behavior mismatch with Presto?

kgpai · 2025-03-07T17:38:47Z

@Yuhta There is always noise related to doubles. Whether this results in more noise is hard to say, but probably yes. It might be good to write some tests and compare using the Presto fuzzer to see whether it results in more or fewer matches.

ccat3z · 2025-03-10T12:25:17Z

@kgpai I added a fuzzer test to compare results between Java and Velox. Both Presto and Spark use Double.toString or Float.toString to cast floating-point numbers to varchar. Due to https://bugs.openjdk.org/browse/JDK-4511638, Java <= 18 can produce an incorrect or longer decimal string. The fuzzer ignores differences caused by this bug.

Test on Java 8 and 18:

[ RUN      ] FloatToString.double
Warning: 6.117152267E18 != 6.1171522670000005E18
Warning: 6.199534181665E17 != 6.1995341816649997E17
Warning: 4.90577135932E18 != 4.9057713593200005E18
Warning: 5.854571232282407E18 != 5.8545712322824069E18
Warning: -2.584509E20 != -2.5845089999999998E20
Warning: -3.818241343600966E17 != -3.8182413436009658E17
Warning: 6.92112981002196E17 != 6.9211298100219597E17
Warning: 2.983699071438001E17 != 2.9836990714380013E17
Warning: -5.029338E20 != -5.0293379999999997E20
Warning: 2.44379246333533E17 != 2.44379246333532992E17
Warning: -8.013810855174E17 != -8.0138108551740006E17
Warning: 5.704695365E18 != 5.7046953649999995E18
Warning: -5.94662655662E18 != -5.9466265566200003E18
Warning: 2.2835713357246198E18 != 2.28357133572461978E18
Warning: -4.4563355E19 != -4.4563355000000004E19
Warning: 2.5283202E20 != 2.5283201999999998E20
Warning: -7.707719043100662E16 != -7.7077190431006624E16
[       OK ] FloatToString.double (201 ms)

Test on Java 19:

[ RUN      ] FloatToString.double
[       OK ] FloatToString.double (190 ms)
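
One way to read the "ignore differences caused by this bug" rule is sketched below. This is an illustration only, not the fuzzer added in this PR: the names `Verdict` and `compareRendered` are hypothetical, and the rule itself (both strings parse to the same double, and the Java string is longer) is an assumption that covers the "longer decimal" case but not necessarily every output affected by JDK-4511638.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class Verdict { kMatch, kKnownJdkNoise, kMismatch };

// Hypothetical comparison rule for two rendered floating-point strings.
Verdict compareRendered(
    const std::string& veloxText,
    const std::string& javaText) {
  assert(!veloxText.empty() && !javaText.empty());
  if (veloxText == javaText) {
    return Verdict::kMatch;
  }

  // Parse both strings back to doubles; strtod accepts the "E" exponent.
  char* endVelox = nullptr;
  char* endJava = nullptr;
  const double v = std::strtod(veloxText.c_str(), &endVelox);
  const double j = std::strtod(javaText.c_str(), &endJava);
  const bool fullyParsed = *endVelox == '\0' && *endJava == '\0';

  // Same value, but Java spent more characters on it: report it as
  // known noise (a warning) instead of a failure.
  if (fullyParsed && v == j && javaText.size() > veloxText.size()) {
    return Verdict::kKnownJdkNoise;
  }
  return Verdict::kMismatch;
}
```

A test harness built on this idea could log a warning for `kKnownJdkNoise` and fail only on `kMismatch`, which is consistent with the warnings-but-OK output shown for Java 8 and 18.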

assignUser · 2025-03-10T13:06:10Z

velox/type/fuzzer/CMakeLists.txt

+    gflags::gflags
+    glog::glog)


I don't see gflags and glog being used directly, so these can be removed.

assignUser · 2025-03-10T13:07:53Z

velox/type/tests/CMakeLists.txt

+    GTest::gtest
+    GTest::gtest_main
+    gflags::gflags
+    glog::glog)


ccat3z · 2025-03-11T10:45:57Z

@assignUser I've updated the code. Could you help review it?

ccat3z · 2025-03-19T03:07:09Z

@majetideepak @kgpai Could you help review it?

assignUser

CMake 👍

stale · 2025-06-26T02:09:28Z

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

Use shortest scientific notation for cast(real|double as varchar)

3151c0d

ccat3z requested review from assignUser and majetideepak as code owners March 7, 2025 08:04

facebook-github-bot added the CLA Signed label Mar 7, 2025

ccat3z changed the title ~~Use shortest scientific notation for cast(real|double as varchar)~~ fix: Use shortest scientific notation for cast(real|double as varchar) Mar 7, 2025

code format

4877876

Yuhta requested a review from kgpai March 7, 2025 15:57

ccat3z added 3 commits March 10, 2025 20:11

Remove failure case due to JDK-4511638

7c91fcc

Add fuzzer

86cb522

code format

2116bf7

ccat3z force-pushed the shortest-sci-double branch from 29847c9 to 2116bf7 Compare March 10, 2025 12:23

assignUser reviewed Mar 10, 2025


ccat3z added 2 commits March 11, 2025 14:50

remove boost::process

13583f8

Remove indirect depends

e7e701e

ccat3z mentioned this pull request Mar 11, 2025

[VL] Result mismatch in cast(float|double as string) apache/incubator-gluten#8959

Open

fix add_test

61ed3e8

assignUser approved these changes Mar 28, 2025


stale bot added the stale label Jun 26, 2025

stale bot closed this Jul 10, 2025
