@avantgardnerio

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

thinkharderdev and others added 2 commits October 1, 2025 10:53
* Try and fix swap_hash_join

* Only swap projections when join does not have projections

* just backport upstream fix

* remove println
* Support Duration in min/max agg functions

* Attempt to fix build

* Attempt to fix build - Fix chrono version

* Revert "Attempt to fix build - Fix chrono version"

This reverts commit fd76fe6.

* Revert "Attempt to fix build"

This reverts commit 9114b86.

---------

Co-authored-by: svranesevic <[email protected]>
joroKr21 and others added 5 commits October 1, 2025 11:19
* Drop rust-toolchain

* Fix panics in array_union

* Fix the chrono
…4496) v46

* fix: rewrite fetch, skip of the Limit node in correct order

* style: fix clippy
* Support aliases in ConstEvaluator (apache#14734)

Not sure why they are not supported. It seems that if we're not careful,
some transformations can introduce aliases nested inside other expressions.

* Format Cargo.toml
joroKr21 and others added 4 commits October 1, 2025 11:44
…che#14888) v46

Whenever we use `recompute_schema` or `with_exprs_and_inputs`,
this ensures that we obtain the same schema.
* fix case_column_or_null with nullable when conditions

* improve sqllogictests for case_column_or_null

---------

Co-authored-by: zhangli20 <[email protected]>
@github-actions github-actions bot added the common label Oct 1, 2025
thinkharderdev and others added 17 commits October 1, 2025 13:22
* Add fix for segfault in ByteGroupValueBuilder

* spelling
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <[email protected]>
… v48

* add fetch to CoalescePartitionsExecNode

* gen proto code

* Add test

* fix

* fix build

* Fix test build

* remove comments

Co-authored-by: 张林伟 <[email protected]>
* Add JoinContext with JoinLeftData to TaskContext in HashJoinExec

* Expose random state as const

* re-export ahash::RandomState

* JoinContext default impl

* Add debug log when setting join left data
… v44

* simple support vectorized append.

* fix tests.

* some logs.

* add `append_n` in `MaybeNullBufferBuilder`.

* impl basic append_batch

* fix equal to.

* define `GroupIndexContext`.

* define the structs useful in vectorizing.

* re-define some structs for vectorized operations.

* impl some vectorized logics.

* impl checking hashmap stage.

* fix compile.

* tmp

* define and impl `vectorized_compare`.

* fix compile.

* impl `vectorized_equal_to`.

* impl `vectorized_append`.

* finish the basic vectorized ops logic.

* impl `take_n`.

* fix `renaming clear` and `groups fill`.

* fix death loop due to rehashing.

* fix vectorized append.

* add counter.

* use extend rather than resize.

* remove dbg!.

* remove reserve.

* refactor the codes to make simpler and more performant.

* clear `scalarized_indices` in `intern` to avoid some corner case.

* fix `scalarized_equal_to`.

* fallback to total scalarized `GroupValuesColumn` in streaming aggregation.

* add unit test for `VectorizedGroupValuesColumn`.

* add unit test for emitting first n in `VectorizedGroupValuesColumn`.

* sort out test code for group columns and add vectorized tests for primitives.

* add vectorized test for byte builder.

* add vectorized test for byte view builder.

* add test for the all nulls or not nulls branches in vectorized.

* fix clippy.

* fix fmt.

* fix compile in rust 1.79.

* improve comments.

* fix doc.

* add more comments to explain the really complex vectorized intern process.

* add comments to explain why we still need origin `GroupValuesColumn`.

* remove some stale comments.

* fix clippy.

* add comments for `vectorized_equal_to` and `vectorized_append`.

* fix clippy.

* use zip to simplify codes.

* use izip to simplify codes.

* Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs

Co-authored-by: Jay Zhan <[email protected]>

* first_n attempt

Signed-off-by: jayzhan211 <[email protected]>

* add test

Signed-off-by: jayzhan211 <[email protected]>

* improve hashtable modifying in emit first n test.

* add `emit_group_index_list_buffer` to avoid allocating new `Vec` to store the remaining group indices.

* make comments in VectorizedGroupValuesColumn::intern simpler and clearer.

* define `VectorizedOperationBuffers` to hold buffers used in vectorized operations to make code clearer.

* unify `VectorizedGroupValuesColumn` and `GroupValuesColumn`.

* fix fmt.

* fix comments.

* fix clippy.

---------

Signed-off-by: jayzhan211 <[email protected]>
Co-authored-by: Jay Zhan <[email protected]>

(cherry picked from commit 345117b)
* Fix record batch memory size double counting

(cherry picked from commit 172cf8d)
…in GroupedHashAggregateStream (apache#13995) (#302) v45

* Refactor spill handling in GroupedHashAggregateStream to use partial aggregate schema

* Implement aggregate functions with spill handling in tests

* Add tests for aggregate functions with and without spill handling

* Move test related imports into mod test

* Rename spill pool test functions for clarity and consistency

* Refactor aggregate function imports to use fully qualified paths

* Remove outdated comments regarding input batch schema for spilling in GroupedHashAggregateStream

* Update aggregate test to use AVG instead of MAX

* assert spill count

* Refactor partial aggregate schema creation to use create_schema function

* Refactor partial aggregation schema creation and remove redundant function

* Remove unused import of Schema from arrow::datatypes in row_hash.rs

* move spill pool testing for aggregate functions to physical-plan/src/aggregates

* Use Arc::clone for schema references in aggregate functions

(cherry picked from commit 81b50c4)

Co-authored-by: kosiew <[email protected]>
* converted LexOrderingRef to &LexOrdering

* using LexOrdering::from_ref fn instead of directly cloning it

* using as_ref instead of &

* using as_ref

* removed commented code

* updated cargo lock

* updated LexRequirementRef to &LexRequirement

* fixed clippy issues

* fixed taplo error for cargo.toml in physical-expr-common

* removed commented code

* fixed clippy errors

* fixed clippy error

* fixes

* removed LexOrdering::from_ref, using clone instead, and created LexOrdering::empty() fn

* Update mod.rs

---------

Co-authored-by: Berkay Şahin <[email protected]>
Co-authored-by: berkaysynnada <[email protected]>

(cherry picked from commit 9005585)
…pache#13201) v44

* refactored nth_value

* continue

* test

* proto and rustlint

* fix datatype

* cont

* cont

* apply jcsherins early validation

* docs

* doc

* Apply suggestions from code review

Co-authored-by: Sherin Jacob <[email protected]>

* passes lint but does not have tests

* continue

* Update roundtrip_physical_plan.rs

* udwf, not udaf

* fix bounded but not fixed roundtrip

* added

* Update datafusion/sqllogictest/test_files/errors.slt

Co-authored-by: Sherin Jacob <[email protected]>

---------

Co-authored-by: Sherin Jacob <[email protected]>
Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
(cherry picked from commit 54ab128)
* Remove BuiltInWindowFunction

* fix docs

* Fix typo

(cherry picked from commit 75a27a8)
…328) v44

* Adds roundtrip physical plan test

* Adds enum for udwf to `WindowFunction`

* initial fix for serializing udwf

* Revives deleted test

* Adds codec methods for physical plan

* Rewrite error message

* Minor: rename binding + formatting fixes

* Extends `PhysicalExtensionCodec` for udwf

* Minor: formatting

* Restricts visibility to tests

(cherry picked from commit d840e98)
Comment on lines -4237 to -4240
03)----Limit: skip=0, fetch=2
04)------TableScan: t0 projection=[c1, c2], fetch=2
05)----Limit: skip=0, fetch=2
06)------TableScan: t1 projection=[c1, c2, c3], fetch=2

What's wrong with the previous version?

Ok, nvm - I read through the related issues in DataFusion 👍
