forked from apache/datafusion
Upgrade datafusion 42 -> 43 #347
Draft
avantgardnerio wants to merge 46 commits into branch-43 from bg_43
a6d0991 to 1002af6
1002af6 to 5d265e2
* Add pool_size method to MemoryPool * Fix * Fmt Co-authored-by: Daniël Heres <[email protected]>
* ignore writer shutdown error * cargo check
* Try and fix swap_hash_join * Only swap projections when join does not have projections * just backport upstream fix * remove println
* Support Duration in min/max agg functions * Attempt to fix build * Attempt to fix build - Fix chrono version * Revert "Attempt to fix build - Fix chrono version" This reverts commit fd76fe6. * Revert "Attempt to fix build" This reverts commit 9114b86. --------- Co-authored-by: svranesevic <[email protected]>
* Drop rust-toolchain * Fix panics in array_union * Fix the chrono
…4496) v46 * fix: rewrite fetch, skip of the Limit node in correct order * style: fix clippy
* Support aliases in ConstEvaluator (apache#14734) Not sure why they are not supported. It seems that if we're not careful, some transformations can introduce aliases nested inside other expressions. * Format Cargo.toml
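The alias commit above notes that some transformations can leave aliases nested inside other expressions. As a rough illustration of why a constant evaluator would need to look through them, here is a minimal sketch on a toy expression tree. The `Expr` enum, `const_value`, and `fold` are hypothetical names made up for this example; they are not DataFusion's `Expr` or `ConstEvaluator`, and the real fix may work differently.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Alias(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

/// Returns `Some(value)` when the expression is constant.
/// An alias only renames its child, so it is constant exactly when the child is.
fn const_value(expr: &Expr) -> Option<i64> {
    match expr {
        Expr::Literal(v) => Some(*v),
        Expr::Column(_) => None,
        Expr::Alias(inner, _) => const_value(inner),
        Expr::Add(l, r) => const_value(l)?.checked_add(const_value(r)?),
    }
}

/// Folds constant `Add` subtrees bottom-up, treating aliased constants as constants.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => {
            let (l, r) = (fold(*l), fold(*r));
            match (const_value(&l), const_value(&r)) {
                // Both sides are constant: replace the addition with a literal,
                // unless the sum would overflow.
                (Some(a), Some(b)) => match a.checked_add(b) {
                    Some(sum) => Expr::Literal(sum),
                    None => Expr::Add(Box::new(l), Box::new(r)),
                },
                _ => Expr::Add(Box::new(l), Box::new(r)),
            }
        }
        // Keep the alias name, but fold whatever is underneath it.
        Expr::Alias(inner, name) => Expr::Alias(Box::new(fold(*inner)), name),
        other => other,
    }
}
```

In this sketch, an evaluator without the `Alias` arm in `const_value` would treat `(1 AS a) + 2` as non-constant and leave it unfolded, which is the kind of gap the commit message describes.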
…che#14888) v46 Whenever we use `recompute_schema` or `with_exprs_and_inputs`, this ensures that we obtain the same schema.
Co-authored-by: svranesevic <[email protected]>
* fix case_column_or_null with nullable when conditions * improve sqllogictests for case_column_or_null --------- Co-authored-by: zhangli20 <[email protected]>
* fix: Limits are not applied correctly * Add easy fix * Add fix * Add slt testing * Address comments
* Add fix for segfault in ByteGroupValueBuilder * spelling
* add fetch info to CoalescePartitionsExec * use Statistics with_fetch API on CoalescePartitionsExec * check limit_reached only if fetch is assigned Co-authored-by: mertak-synnada <[email protected]>
… v48 * add fetch to CoalescePartitionsExecNode * gen proto code * Add test * fix * fix build * Fix test build * remove comments Co-authored-by: 张林伟 <[email protected]>
* Add JoinContext with JoinLeftData to TaskContext in HashJoinExec * Expose random state as const * re-export ahash::RandomState * JoinContext default impl * Add debug log when setting join left data
… v44 * simple support vectorized append. * fix tests. * some logs. * add `append_n` in `MaybeNullBufferBuilder`. * impl basic append_batch * fix equal to. * define `GroupIndexContext`. * define the structs useful in vectorizing. * re-define some structs for vectorized operations. * impl some vectorized logics. * impl checking hashmap stage. * fix compile. * tmp * define and impl `vectorized_compare`. * fix compile. * impl `vectorized_equal_to`. * impl `vectorized_append`. * finish the basic vectorized ops logic. * impl `take_n`. * fix `renaming clear` and `groups fill`. * fix death loop due to rehashing. * fix vectorized append. * add counter. * use extend rather than resize. * remove dbg!. * remove reserve. * refactor the codes to make simpler and more performant. * clear `scalarized_indices` in `intern` to avoid some corner case. * fix `scalarized_equal_to`. * fallback to total scalarized `GroupValuesColumn` in streaming aggregation. * add unit test for `VectorizedGroupValuesColumn`. * add unit test for emitting first n in `VectorizedGroupValuesColumn`. * sort out tests codes in for group columns and add vectorized tests for primitives. * add vectorized test for byte builder. * add vectorized test for byte view builder. * add test for the all nulls or not nulls branches in vectorized. * fix clippy. * fix fmt. * fix compile in rust 1.79. * improve comments. * fix doc. * add more comments to explain the really complex vectorized intern process. * add comments to explain why we still need origin `GroupValuesColumn`. * remove some stale comments. * fix clippy. * add comments for `vectorized_equal_to` and `vectorized_append`. * fix clippy. * use zip to simplify codes. * use izip to simplify codes.
* Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs Co-authored-by: Jay Zhan <[email protected]> * first_n attempt Signed-off-by: jayzhan211 <[email protected]> * add test Signed-off-by: jayzhan211 <[email protected]> * improve hashtable modifying in emit first n test. * add `emit_group_index_list_buffer` to avoid allocating new `Vec` to store the remaining group indices. * make comments in VectorizedGroupValuesColumn::intern simpler and clearer. * define `VectorizedOperationBuffers` to hold buffers used in vectorized operations to make code clearer. * unify `VectorizedGroupValuesColumn` and `GroupValuesColumn`. * fix fmt. * fix comments. * fix clippy. --------- Signed-off-by: jayzhan211 <[email protected]> Co-authored-by: Jay Zhan <[email protected]> (cherry picked from commit 345117b)
* Fix record batch memory size double counting (cherry picked from commit 172cf8d)
…in GroupedHashAggregateStream (apache#13995) (#302) v45 * Refactor spill handling in GroupedHashAggregateStream to use partial aggregate schema * Implement aggregate functions with spill handling in tests * Add tests for aggregate functions with and without spill handling * Move test related imports into mod test * Rename spill pool test functions for clarity and consistency * Refactor aggregate function imports to use fully qualified paths * Remove outdated comments regarding input batch schema for spilling in GroupedHashAggregateStream * Update aggregate test to use AVG instead of MAX * assert spill count * Refactor partial aggregate schema creation to use create_schema function * Refactor partial aggregation schema creation and remove redundant function * Remove unused import of Schema from arrow::datatypes in row_hash.rs * move spill pool testing for aggregate functions to physical-plan/src/aggregates * Use Arc::clone for schema references in aggregate functions (cherry picked from commit 81b50c4) Co-authored-by: kosiew <[email protected]>
* converted LexOrderingRef to &LexOrdering * using LexOrdering::from_ref fn instead of directly cloning it * using as_ref instead of & * using as_ref * removed commented code * updated cargo lock * updated LexRequirementRef to &LexRequirement * fixed clippy issues * fixed taplo error for cargo.toml in physical-expr-common * removed commented code * fixed clippy errors * fixed clippy error * fixes * removed LexOrdering::from_ref instead using clone and created LexOrdering::empty() fn * Update mod.rs --------- Co-authored-by: Berkay Şahin <[email protected]> Co-authored-by: berkaysynnada <[email protected]> (cherry picked from commit 9005585)
…pache#13201) v44 * refactored nth_value * continue * test * proto and rustlint * fix datatype * cont * cont * apply jcsherins early validation * docs * doc * Apply suggestions from code review Co-authored-by: Sherin Jacob <[email protected]> * passes lint but does not have tests * continue * Update roundtrip_physical_plan.rs * udwf, not udaf * fix bounded but not fixed roundtrip * added * Update datafusion/sqllogictest/test_files/errors.slt Co-authored-by: Sherin Jacob <[email protected]> --------- Co-authored-by: Sherin Jacob <[email protected]> Co-authored-by: berkaysynnada <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> (cherry picked from commit 54ab128)
* Remove BuiltInWindowFunction * fix docs * Fix typo (cherry picked from commit 75a27a8)
…328) v44 * Adds roundtrip physical plan test * Adds enum for udwf to `WindowFunction` * initial fix for serializing udwf * Revives deleted test * Adds codec methods for physical plan * Rewrite error message * Minor: rename binding + formatting fixes * Extends `PhysicalExtensionCodec` for udwf * Minor: formatting * Restricts visibility to tests (cherry picked from commit d840e98)
(cherry picked from commit fda500a)
… keys from window and aggregate operators (#355) (apache#17757) v51
joroKr21 reviewed Oct 7, 2025
Comment on lines -4237 to -4240
| 03)----Limit: skip=0, fetch=2 | ||
| 04)------TableScan: t0 projection=[c1, c2], fetch=2 | ||
| 05)----Limit: skip=0, fetch=2 | ||
| 06)------TableScan: t1 projection=[c1, c2, c3], fetch=2 |
What's wrong with the previous version?
Ok, nvm - I read through the related issues in Datafusion 👍
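For readers following the plan fragment quoted in this thread: each `Limit: skip=0, fetch=2` sits above a `TableScan` that also carries `fetch=2`. One way to read that, assuming the usual rule that a scan under a limit needs to produce `skip + fetch` rows, is sketched below. The helpers `scan_fetch`, `scan`, and `apply_limit` are hypothetical and use plain `Vec`s; they are not DataFusion APIs and do not reproduce the optimizer rule touched by this PR.

```rust
/// Number of rows a scan must produce so that a `Limit { skip, fetch }`
/// above it still sees enough input. `None` means "no limit".
/// Hypothetical helper for illustration, not a DataFusion function.
fn scan_fetch(skip: usize, fetch: Option<usize>) -> Option<usize> {
    fetch.map(|f| skip.saturating_add(f))
}

/// Simulates a table scan with an optional pushed-down fetch.
fn scan<T: Clone>(table: &[T], fetch: Option<usize>) -> Vec<T> {
    match fetch {
        Some(n) => table.iter().take(n).cloned().collect(),
        None => table.to_vec(),
    }
}

/// Applies `Limit: skip, fetch` to already-produced rows:
/// drop `skip` rows first, then keep at most `fetch` rows.
fn apply_limit<T>(rows: Vec<T>, skip: usize, fetch: Option<usize>) -> Vec<T> {
    let remaining = rows.into_iter().skip(skip);
    match fetch {
        Some(n) => remaining.take(n).collect(),
        None => remaining.collect(),
    }
}
```

With `skip=0, fetch=2`, `scan_fetch` returns `Some(2)`, matching the `fetch=2` shown on both scans. With a non-zero skip, pushing only `fetch` into the scan would starve the limit, so the order in which skip and fetch are combined matters.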
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?