0.3.0
Pre-release
DataFusion Comet 0.3.0 Changelog
This release consists of 57 commits from 12 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
- fix: Support type coercion for ScalarUDFs #865 (Kimahriman)
- fix: CometTakeOrderedAndProjectExec native scan node should use child operator's output #896 (viirya)
- fix: Fix various memory leaks problems #890 (Kontinuation)
- fix: Add output to Comet operators equal and hashCode #902 (viirya)
- fix: Fallback to Spark when cannot resolve AttributeReference #926 (viirya)
- fix: Fix memory bloat caused by holding too many unclosed `ArrowReaderIterator`s #929 (Kontinuation)
- fix: Normalize NaN and zeros for floating number comparison #953 (viirya)
- fix: window function range offset should be long instead of int #733 (huaxingao)
- fix: CometScanExec on Spark 3.5.2 #915 (Kimahriman)
- fix: div and rem by negative zero #960 (kazuyukitanimura)
Performance related:
- perf: Optimize CometSparkToColumnar for columnar input #892 (mbutrovich)
- perf: Fall back to Spark if query uses DPP with v1 data sources #897 (andygrove)
- perf: Report accurate total time for scans #916 (andygrove)
- perf: Add metric for time spent casting in native scan #919 (andygrove)
- perf: Add criterion benchmark for aggregate expressions #948 (andygrove)
- perf: Add metric for time spent in CometSparkToColumnarExec #931 (mbutrovich)
- perf: Optimize decimal precision check in decimal aggregates (sum and avg) #952 (andygrove)
Implemented enhancements:
- feat: Add config option to enable converting CSV to columnar #871 (andygrove)
- feat: Implement basic version of string to float/double/decimal #870 (andygrove)
- feat: Implement to_json for subset of types #805 (andygrove)
- feat: Add ShuffleQueryStageExec to direct child node for CometBroadcastExchangeExec #880 (viirya)
- feat: Support sort merge join with a join condition #553 (viirya)
- feat: Array element extraction #899 (Kimahriman)
- feat: date_add and date_sub functions #910 (mbutrovich)
- feat: implement scripts for binary release build #932 (parthchandra)
- feat: Publish artifacts to maven #946 (parthchandra)
Documentation updates:
- doc: Documenting Helm chart for Comet Kube execution #874 (comphead)
- doc: Update native code path in development #921 (viirya)
- docs: Add more detailed architecture documentation #922 (andygrove)
Other:
- chore: Update installation.md #869 (haoxins)
- chore: Update versions to 0.3.0 / 0.3.0-SNAPSHOT #868 (andygrove)
- chore: Add documentation on running benchmarks with Microk8s #848 (andygrove)
- chore: Improve CometExchange metrics #873 (viirya)
- chore: Add spilling metrics of SortMergeJoin #878 (viirya)
- chore: change shuffle mode default from jvm to auto #877 (andygrove)
- chore: Enable shuffle by default #881 (andygrove)
- chore: print Comet native version to logs after Comet is initialized #900 (SemyonSinchenko)
- chore: Revise batch pull approach to more follow C Data interface semantics #893 (viirya)
- chore: Close dictionary provider when iterator is closed #904 (andygrove)
- chore: Remove unused function #906 (viirya)
- chore: Upgrade to latest DataFusion revision #909 (andygrove)
- build: fix build #917 (andygrove)
- chore: Revise array import to more follow C Data Interface semantics #905 (viirya)
- chore: Address reviews #920 (viirya)
- chore: Enable Comet shuffle for Spark core-1 test #924 (viirya)
- build: Add maven-compiler-plugin for java cross-build #911 (viirya)
- build: Disable upload-test-reports for macos-13 runner #933 (viirya)
- minor: cast timestamp test #468 #923 (himadripal)
- build: Set Java version arg for scala-maven-plugin #934 (viirya)
- chore: Remove redundant RowToColumnar from CometShuffleExchangeExec for columnar shuffle #944 (viirya)
- minor: rename CometMetricNode `add` to `set` and update documentation #940 (andygrove)
- chore: Add config for enabling SMJ with join condition #937 (andygrove)
- chore: Change maven group ID to `org.apache.datafusion` #941 (andygrove)
- chore: Upgrade to DataFusion 42.0.0 #945 (andygrove)
- build: Fix regression in jar packaging #950 (andygrove)
- chore: Show reason for falling back to Spark when SMJ with join condition is not enabled #956 (andygrove)
- chore: clarify tarball installation #959 (comphead)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
22 Andy Grove
18 Liang-Chi Hsieh
3 Adam Binford
3 Matt Butrovich
2 Kristin Cowalcijk
2 Oleks V
2 Parth Chandra
1 Himadri Pal
1 Huaxin Gao
1 KAZUYUKI TANIMURA
1 Semyon
1 Xin Hao
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.