0.3.0

Pre-release
andygrove released this 28 Sep 14:55
· 52 commits to main since this release

DataFusion Comet 0.3.0 Changelog

This release consists of 57 commits from 12 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Support type coercion for ScalarUDFs #865 (Kimahriman)
  • fix: CometTakeOrderedAndProjectExec native scan node should use child operator's output #896 (viirya)
  • fix: Fix various memory leaks problems #890 (Kontinuation)
  • fix: Add output to Comet operators equal and hashCode #902 (viirya)
  • fix: Fallback to Spark when cannot resolve AttributeReference #926 (viirya)
  • fix: Fix memory bloat caused by holding too many unclosed ArrowReaderIterators #929 (Kontinuation)
  • fix: Normalize NaN and zeros for floating number comparison #953 (viirya)
  • fix: window function range offset should be long instead of int #733 (huaxingao)
  • fix: CometScanExec on Spark 3.5.2 #915 (Kimahriman)
  • fix: div and rem by negative zero #960 (kazuyukitanimura)
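
As an illustration of the idea behind #953, the sketch below shows why floating-point values are usually normalized before a bitwise comparison or hash: IEEE 754 has many NaN bit patterns and two zeros (`0.0` and `-0.0`), so values that should be treated as equal can have different bits. This is not Comet's native implementation; the helper names `normalize_float`, `float_bits`, and `normalized_equal` are hypothetical and exist only for this example.

```python
import math
import struct


def normalize_float(x: float) -> float:
    """Return a canonical form of x for comparison and hashing (hypothetical helper)."""
    if math.isnan(x):
        return math.nan  # collapse every NaN bit pattern to one value
    if x == 0.0:
        return 0.0  # -0.0 == 0.0 is True, so both zeros map to +0.0
    return x


def float_bits(x: float) -> int:
    """Raw IEEE 754 bits of a double, as seen by a bitwise comparison or hash."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def normalized_equal(a: float, b: float) -> bool:
    """Compare two doubles by their bits after normalization."""
    return float_bits(normalize_float(a)) == float_bits(normalize_float(b))
```

Without the normalization step, `float_bits(-0.0)` and `float_bits(0.0)` differ, so a join or grouping key compared by raw bits would treat the two zeros as distinct values.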

Performance related:

  • perf: Optimize CometSparkToColumnar for columnar input #892 (mbutrovich)
  • perf: Fall back to Spark if query uses DPP with v1 data sources #897 (andygrove)
  • perf: Report accurate total time for scans #916 (andygrove)
  • perf: Add metric for time spent casting in native scan #919 (andygrove)
  • perf: Add criterion benchmark for aggregate expressions #948 (andygrove)
  • perf: Add metric for time spent in CometSparkToColumnarExec #931 (mbutrovich)
  • perf: Optimize decimal precision check in decimal aggregates (sum and avg) #952 (andygrove)

Implemented enhancements:

  • feat: Add config option to enable converting CSV to columnar #871 (andygrove)
  • feat: Implement basic version of string to float/double/decimal #870 (andygrove)
  • feat: Implement to_json for subset of types #805 (andygrove)
  • feat: Add ShuffleQueryStageExec to direct child node for CometBroadcastExchangeExec #880 (viirya)
  • feat: Support sort merge join with a join condition #553 (viirya)
  • feat: Array element extraction #899 (Kimahriman)
  • feat: date_add and date_sub functions #910 (mbutrovich)
  • feat: implement scripts for binary release build #932 (parthchandra)
  • feat: Publish artifacts to maven #946 (parthchandra)

Documentation updates:

  • doc: Documenting Helm chart for Comet Kube execution #874 (comphead)
  • doc: Update native code path in development #921 (viirya)
  • docs: Add more detailed architecture documentation #922 (andygrove)

Other:

  • chore: Update installation.md #869 (haoxins)
  • chore: Update versions to 0.3.0 / 0.3.0-SNAPSHOT #868 (andygrove)
  • chore: Add documentation on running benchmarks with Microk8s #848 (andygrove)
  • chore: Improve CometExchange metrics #873 (viirya)
  • chore: Add spilling metrics of SortMergeJoin #878 (viirya)
  • chore: change shuffle mode default from jvm to auto #877 (andygrove)
  • chore: Enable shuffle by default #881 (andygrove)
  • chore: print Comet native version to logs after Comet is initialized #900 (SemyonSinchenko)
  • chore: Revise batch pull approach to more follow C Data interface semantics #893 (viirya)
  • chore: Close dictionary provider when iterator is closed #904 (andygrove)
  • chore: Remove unused function #906 (viirya)
  • chore: Upgrade to latest DataFusion revision #909 (andygrove)
  • build: fix build #917 (andygrove)
  • chore: Revise array import to more follow C Data Interface semantics #905 (viirya)
  • chore: Address reviews #920 (viirya)
  • chore: Enable Comet shuffle for Spark core-1 test #924 (viirya)
  • build: Add maven-compiler-plugin for java cross-build #911 (viirya)
  • build: Disable upload-test-reports for macos-13 runner #933 (viirya)
  • minor: cast timestamp test #468 #923 (himadripal)
  • build: Set Java version arg for scala-maven-plugin #934 (viirya)
  • chore: Remove redundant RowToColumnar from CometShuffleExchangeExec for columnar shuffle #944 (viirya)
  • minor: rename CometMetricNode add to set and update documentation #940 (andygrove)
  • chore: Add config for enabling SMJ with join condition #937 (andygrove)
  • chore: Change maven group ID to org.apache.datafusion #941 (andygrove)
  • chore: Upgrade to DataFusion 42.0.0 #945 (andygrove)
  • build: Fix regression in jar packaging #950 (andygrove)
  • chore: Show reason for falling back to Spark when SMJ with join condition is not enabled #956 (andygrove)
  • chore: clarify tarball installation #959 (comphead)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    22	Andy Grove
    18	Liang-Chi Hsieh
     3	Adam Binford
     3	Matt Butrovich
     2	Kristin Cowalcijk
     2	Oleks V
     2	Parth Chandra
     1	Himadri Pal
     1	Huaxin Gao
     1	KAZUYUKI TANIMURA
     1	Semyon
     1	Xin Hao

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.