fix: Use shortest scientific notation for cast(real|double as varchar) #12574
@kgpai Would this cause a behavior mismatch with Presto?
@Yuhta There is always noise related to doubles. Whether this results in more noise is hard to say, but probably yes. It might be good to write a test and compare using the Presto fuzzer to see whether it results in more or fewer matches.
@kgpai I added a fuzzer test to compare results between Java and Velox. Both Presto and Spark use Double.toString or Float.toString to cast floating-point numbers to varchar. Due to https://bugs.openjdk.org/browse/JDK-4511638, Java <= 18 produces incorrect or longer decimal strings. The fuzzer ignores differences caused by this bug. Test on Java 8 and 18:
Test on Java 19:
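One plausible way to tolerate such differences is to treat two output strings as equivalent when they parse back to the same double. The sketch below illustrates that idea only; `sameDouble` is a hypothetical helper, and nothing here is taken from the fuzzer in this PR, which may apply a different rule.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the PR): returns true when both strings are
// fully consumed by strtod and yield the same double value.
// Caveats: NaN never compares equal, and "0.0" and "-0.0" compare equal.
bool sameDouble(const std::string& a, const std::string& b) {
  char* endA = nullptr;
  char* endB = nullptr;
  const double x = std::strtod(a.c_str(), &endA);
  const double y = std::strtod(b.c_str(), &endB);
  // Reject empty input and input with trailing unparsed characters.
  const bool parsedAll =
      !a.empty() && !b.empty() && *endA == '\0' && *endB == '\0';
  return parsedAll && x == y;
}
```

Under this rule a longer rendering such as `1.0000000000000001E-1` and a shorter one such as `1.0E-1` count as a match because both denote the same double, while `1.0E-1` and `1.1E-1` do not.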
velox/type/fuzzer/CMakeLists.txt
Outdated
gflags::gflags
glog::glog)
I don't see gflags and glog being used directly, so these can be removed.
velox/type/tests/CMakeLists.txt
Outdated
GTest::gtest
GTest::gtest_main
gflags::gflags
glog::glog)
same here.
@assignUser I've updated the code. Could you help review it?
Use shortest scientific notation for cast(real|double as varchar). Previously, the conversion to scientific notation used a fixed precision, which made some results incompatible with IEEE 754. Here are some examples.
Change to use dragonbox in the fmt library instead of fmt::format("{:.16E}"). It provides the shortest digits and better performance. I added a benchmark to compare the performance of several scientific notation conversions. DoubleToFmtExpPrecision is the original implementation and DoubleToScientificNotation is the new implementation.
Also added a fuzzer test to compare the results between Java and Velox. Due to https://bugs.openjdk.org/browse/JDK-4511638, Java <= 18 produces incorrect or longer decimal strings. The fuzzer ignores differences caused by this bug.
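To illustrate the difference between fixed-precision and shortest output, here is a minimal standard-library sketch. It is not the dragonbox algorithm and not the code in this PR: `fixedScientific` and `shortestScientific` are hypothetical names, and the second function is a brute-force stand-in that tries increasing precisions until the text parses back to the same double. It assumes finite inputs and may return a slightly longer string than a true shortest-digits algorithm in rare edge cases.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Fixed precision: always 16 digits after the decimal point, similar in
// spirit to the "{:.16E}" format mentioned above.
std::string fixedScientific(double value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.16E", value);
  return buf;
}

// Brute-force illustration (assumption, not dragonbox): use the smallest
// precision whose output round-trips to exactly the same double.
// Precision 16 (17 significant digits) always round-trips for finite doubles.
std::string shortestScientific(double value) {
  char buf[64];
  for (int precision = 0; precision <= 16; ++precision) {
    std::snprintf(buf, sizeof(buf), "%.*E", precision, value);
    if (std::strtod(buf, nullptr) == value) {
      break;
    }
  }
  return buf;
}
```

For the input 0.1, `fixedScientific` yields `1.0000000000000001E-01`, exposing digits beyond what is needed to identify the value, while `shortestScientific` yields `1E-01`. The brute-force loop calls `snprintf` and `strtod` repeatedly, so it is only meant to show the expected output shape, not to be fast.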