[branch-50] Backport Fix bug in LimitPushPastWindows (#18029) #18107

avantgardnerio · 2025-10-17T02:39:13Z

Which issue does this PR close?

Related to Release DataFusion 50.3.0 (minor) #18072.
Backports Fix bug in LimitPushPastWindows #18029

Rationale for this change

Return correct results

What changes are included in this PR?

A fix for the LimitPushPastWindows rule to accommodate the special needs of LEAD().

Backport of Fix bug in LimitPushPastWindows #18029
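To illustrate why LEAD() needs special handling when a limit is pushed below a window, here is a minimal toy model in Python. It is not DataFusion code: the function names are hypothetical, and it assumes the problem is that LEAD() reads rows after the current one, so truncating the input to exactly the limit drops rows that the last output rows still need.

```python
from typing import List, Optional


def lead(values: List[int], offset: int = 1) -> List[Optional[int]]:
    """Toy LEAD(value, offset): the value `offset` rows ahead, or None past the end."""
    n = len(values)
    return [values[i + offset] if i + offset < n else None for i in range(n)]


def window_then_limit(values: List[int], limit: int, offset: int = 1):
    """Reference semantics: evaluate the window over all rows, then apply LIMIT."""
    return lead(values, offset)[:limit]


def naive_limit_then_window(values: List[int], limit: int, offset: int = 1):
    """Unsafe rewrite: truncate the input to `limit` rows before the window runs."""
    return lead(values[:limit], offset)


def widened_limit_then_window(values: List[int], limit: int, offset: int = 1):
    """Safe rewrite in this toy model: keep `offset` extra input rows, trim afterwards."""
    return lead(values[: limit + offset], offset)[:limit]


rows = [10, 20, 30, 40, 50]
print(window_then_limit(rows, 3))          # [20, 30, 40]
print(naive_limit_then_window(rows, 3))    # [20, 30, None]
print(widened_limit_then_window(rows, 3))  # [20, 30, 40]
```

In this sketch the naive rewrite returns a spurious None for the last row, while widening the pushed-down limit by the LEAD offset reproduces the reference result.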

Are these changes tested?

An 800-line file of sqllogictest cases (slts) was added. There are never enough.

Are there any user-facing changes?

Queries using lead() with a limit should return correct results again (but also go fast)

alamb · 2025-10-18T10:51:23Z

@avantgardnerio -- since this is on the coralogix fork, I can't push changes to this PR

To get a clean CI run, I think you need to:

* Merge up from the branch-50 branch to get [branch-50] chore: Fix no space left on device #18141
* Update the sqllogictests output (looks like it changes a bit)

* Add test
* Use ROWS instead of RANGE
* Fix a test
* progress
* window.slt like master
* passing existing tests
* Break out window limit tests
* LimitEffect
* fix a bug
* repartitions
* refactor
* refactor
* fmt
* remove casual
* two phased approach
* refactor into context
* refactor
* refactor
* refactor
* remove comments
* remove deps
* Fix NthValue
* aggregates
* ranking functions
* More tests
* Max lead test
* More tests, JIC
* More tests, JIC
* Notes
* Notes

(cherry picked from commit 4e69241)

avantgardnerio · 2025-10-18T18:30:49Z

The tests are failing because the test file seems to have been removed? Why would that happen in a patch fix branch? Can it come back?

```
arrow-datafusion % find . -name 'aggregate_test_100_with_dates.csv' | wc -l
       0
```

avantgardnerio · 2025-10-18T18:43:52Z

I tracked down the submodule update. Let's see if it breaks other things.

akurmustafa · 2025-10-19T03:34:13Z

datafusion/sqllogictest/test_files/window.slt

 03)----BoundedWindowAggExec: wdw=[sum(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "sum(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, sum(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "sum(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, count(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "count(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, 
metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { name: "array_agg(test.c2) FILTER (WHERE test.c2 >= Int64(2) AND test.c2 < Int64(4) AND test.c1 > Int64(0)) ORDER BY [test.c1 ASC NULLS LAST, test.c2 ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
-04)------SortPreservingMergeExec: [c1@2 ASC NULLS LAST, c2@3 ASC NULLS LAST]
-05)--------SortExec: expr=[c1@2 ASC NULLS LAST, c2@3 ASC NULLS LAST], preserve_partitioning=[true]
+04)------SortPreservingMergeExec: [c1@2 ASC NULLS LAST, c2@3 ASC NULLS LAST], fetch=5


The new plan is better and correct.
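As a rough illustration of why a `fetch=5` on the merge step does not change results, here is a toy Python model. It is not DataFusion code: the function names and sample rows are hypothetical, and rows are modelled as `(c1, c2)` tuples in partitions that are already sorted ascending. Merging sorted partitions and stopping after `fetch` rows yields the same first rows as sorting everything and then applying the limit.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Row = Tuple[int, int]  # (c1, c2), compared lexicographically like ORDER BY c1, c2


def merge_with_fetch(sorted_partitions: List[List[Row]], fetch: int) -> List[Row]:
    """Merge already-sorted partitions lazily, stopping after `fetch` rows."""
    return list(islice(heapq.merge(*sorted_partitions), fetch))


def full_sort_then_limit(partitions: List[List[Row]], fetch: int) -> List[Row]:
    """Reference: gather every row, sort globally, then apply the limit."""
    all_rows = [row for part in partitions for row in part]
    return sorted(all_rows)[:fetch]


partitions = [
    [(1, 2), (3, 1)],
    [(1, 1), (2, 5), (4, 0)],
    [(2, 2), (2, 3)],
]

print(merge_with_fetch(partitions, 5))
# [(1, 1), (1, 2), (2, 2), (2, 3), (2, 5)]
print(merge_with_fetch(partitions, 5) == full_sort_then_limit(partitions, 5))
# True
```

The early stop means the merge never has to produce rows beyond the fetch, which is where the saving comes from in this model.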

akurmustafa

The diff is almost the same as the original PR. I checked the additional plan change in window.slt; the new plan is better and correct. Thank you @avantgardnerio for this PR.

avantgardnerio mentioned this pull request Oct 17, 2025

Release DataFusion 50.3.0 (minor) #18072

Open

26 tasks

alamb changed the title ~~Bg backport~~ [branch-50] Backport Fix bug in LimitPushPastWindows (#18029) Oct 17, 2025

avantgardnerio added 2 commits October 18, 2025 12:02

backport

88dbe0b

avantgardnerio force-pushed the bg_backport branch from 13ffb6d to 88dbe0b Compare October 18, 2025 18:06

update submodule

20d7684

update test to match main

b017302

avantgardnerio requested a review from alamb October 18, 2025 20:13

akurmustafa reviewed Oct 19, 2025

akurmustafa approved these changes Oct 19, 2025

avantgardnerio merged commit d554f1c into apache:branch-50 Oct 19, 2025
28 checks passed

avantgardnerio deleted the bg_backport branch October 19, 2025 03:50
