Skip to content

Conversation

avantgardnerio
Copy link
Contributor

@avantgardnerio avantgardnerio commented Oct 17, 2025

Which issue does this PR close?

Rationale for this change

Return correct results

What changes are included in this PR?

A fix for the PushPastLimits rule to accommodate the special needs of LEAD()

Are these changes tested?

An 800 line file of slts was added. There are never enough.

Are there any user-facing changes?

Queries using lead() with a limit should return correct results again (but also go fast)

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) proto Related to proto crate functions Changes to functions implementation ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate labels Oct 17, 2025
@alamb alamb changed the title Bg backport [branch-50] Backport Fix bug in LimitPushPastWindows (#18029) Oct 17, 2025
@alamb
Copy link
Contributor

alamb commented Oct 18, 2025

@avantgardnerio -- since this is on the coralogix fork, I can't push changes to this PR

to get a clean CI run, I think you need to

  1. Merge up from the branch-50 branch to get [branch-50] chore: Fix no space left on device #18141
  2. Update the sqllogictests output (looks like it changes a bit)

* Add test

* Use ROWS instead of RANGE

* Fix a test

* progress

* window.slt like master

* passing existing tests

* Break out window limit tests

* LimitEffect

* fix a bug

* repartitions

* refactor

* refactor

* fmt

* remove casual

* two phased approach

* refactor into context

* refactor

* refactor

* refactor

* remove comments

* remove deps

* Fix NthValue

* aggregates

* ranking functions

* More tests

* Max lead test

* More tests, JIC

* More tests, JIC

* Notes

* Notes--

(cherry picked from commit 4e69241)
@avantgardnerio
Copy link
Contributor Author

The tests are failing because the test file seems to have been removed? Why would that happen in a patch fix branch? Can it come back?

 arrow-datafusion % find . -name 'aggregate_test_100_with_dates.csv' | wc -l
       0
       ```

@avantgardnerio
Copy link
Contributor Author

I tracked down the submodule update. Let's see if it breaks other things.

@avantgardnerio avantgardnerio requested a review from alamb October 18, 2025 20:13
03)----BoundedWindowAggExec: wdw=[sum(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "sum(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, sum(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "sum(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, count(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "count(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
04)------SortPreservingMergeExec: [c1@2 ASC NULLS LAST, c2@3 ASC NULLS LAST]
05)--------SortExec: expr=[c1@2 ASC NULLS LAST, c2@3 ASC NULLS LAST], preserve_partitioning=[true]
04)------SortPreservingMergeExec: [c1@2 ASC NULLS LAST, c2@3 ASC NULLS LAST], fetch=5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New plan is better and correct!.

Copy link
Contributor

@akurmustafa akurmustafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff is almost same with the PR. I checked the additional plan change in window.slt. New plan is better and correct. Thanks you @avantgardnerio for this PR.

@avantgardnerio avantgardnerio merged commit d554f1c into apache:branch-50 Oct 19, 2025
28 checks passed
@avantgardnerio avantgardnerio deleted the bg_backport branch October 19, 2025 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants