[branch-53] Fix push_down_filter for children with non-empty fetch fields (#21057) by hareshkh · Pull Request #21142 · apache/datafusion

hareshkh · 2026-03-24T23:06:35Z

Which issue does this PR close?

Related to Filter pushdown past children with fetch limits causes correctness bug #21063
Related to Release DataFusion 53.1.0 (minor) (Apr 2026) #21079

Rationale for this change

Currently, if a filter has a limit underneath it, we don't push the filter past the limit. However, sort nodes and table scan nodes can carry fetch fields that do essentially the same thing, and nothing stops filters from being pushed past them. This is a correctness bug that can lead to incorrect query results.
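
To make the bug concrete, here is a minimal sketch that uses plain vectors rather than DataFusion plans. The helper names (`apply_fetch`, `apply_filter`, `filter_above_fetch`, `filter_below_fetch`) and the predicate `r > 2` are hypothetical and exist only for this illustration. It shows that filtering after a sort with a fetch and filtering before it can return different rows.

```rust
/// Keep at most `fetch` rows, in input order (toy stand-in for a fetch field).
fn apply_fetch(rows: &[i32], fetch: usize) -> Vec<i32> {
    rows.iter().copied().take(fetch).collect()
}

/// Keep only rows that satisfy `pred` (toy stand-in for a filter node).
fn apply_filter(rows: &[i32], pred: impl Fn(i32) -> bool) -> Vec<i32> {
    rows.iter().copied().filter(|&r| pred(r)).collect()
}

/// Intended semantics: sort, apply the fetch, then filter what survived.
fn filter_above_fetch(rows: &[i32], fetch: usize) -> Vec<i32> {
    let mut sorted = rows.to_vec();
    sorted.sort();
    apply_filter(&apply_fetch(&sorted, fetch), |r| r > 2)
}

/// Buggy rewrite: the filter has been pushed below the sort's fetch.
fn filter_below_fetch(rows: &[i32], fetch: usize) -> Vec<i32> {
    let mut filtered = apply_filter(rows, |r| r > 2);
    filtered.sort();
    apply_fetch(&filtered, fetch)
}
```

With rows `[5, 1, 4, 2, 3]` and a fetch of 3, the first function returns `[3]` while the second returns `[3, 4, 5]`, so the rewrite changes the query result.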

I added checks for exactly this condition so that the filter is not pushed down. I think the prior expectation was that a limit node would always sit between the filter and any of these nodes, but that does not hold either. In push_down_limit.rs, this code runs when a limit has a sort under it:

LogicalPlan::Sort(mut sort) => {
    let new_fetch = {
        let sort_fetch = skip + fetch;
        Some(sort.fetch.map(|f| f.min(sort_fetch)).unwrap_or(sort_fetch))
    };
    if new_fetch == sort.fetch {
        if skip > 0 {
            original_limit(skip, fetch, LogicalPlan::Sort(sort))
        } else {
            Ok(Transformed::yes(LogicalPlan::Sort(sort)))
        }
    } else {
        sort.fetch = new_fetch;
        limit.input = Arc::new(LogicalPlan::Sort(sort));
        Ok(Transformed::yes(LogicalPlan::Limit(limit)))
    }
}

The first time this runs, it sets the sort's internal fetch to new_fetch. On the second optimisation pass it hits the branch that removes the limit node altogether, leaving the sort node exposed to filters, which can now be pushed down past it.
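
The two-pass behaviour can be reproduced with a toy model. The sketch below is not the DataFusion code: the `Plan` enum and this `push_down_limit` function are simplified, hypothetical stand-ins that mirror only the branch logic of the snippet above.

```rust
/// Toy logical plan; a hypothetical stand-in for DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Sort { fetch: Option<usize>, input: Box<Plan> },
    Limit { skip: usize, fetch: usize, input: Box<Plan> },
}

/// One pass of a simplified limit pushdown over a Limit -> Sort pair.
fn push_down_limit(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { skip, fetch, input } => match *input {
            Plan::Sort { fetch: sort_fetch, input: sort_input } => {
                let wanted = skip + fetch;
                let new_fetch = Some(sort_fetch.map_or(wanted, |f| f.min(wanted)));
                if new_fetch == sort_fetch {
                    // The sort already carries the fetch.
                    let sort = Plan::Sort { fetch: sort_fetch, input: sort_input };
                    if skip > 0 {
                        // A skip still needs the limit node.
                        Plan::Limit { skip, fetch, input: Box::new(sort) }
                    } else {
                        // The limit node is dropped; only the sort remains.
                        sort
                    }
                } else {
                    // First pass: record the fetch on the sort, keep the limit.
                    let sort = Plan::Sort { fetch: new_fetch, input: sort_input };
                    Plan::Limit { skip, fetch, input: Box::new(sort) }
                }
            }
            other => Plan::Limit { skip, fetch, input: Box::new(other) },
        },
        other => other,
    }
}
```

Starting from a limit with `skip = 0` and `fetch = 10` over a sort without a fetch, the first call keeps the limit and sets the sort's fetch to `Some(10)`; the second call returns the bare sort, with no limit node left to stop a filter.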

There is also a related fix in gather_filters_for_pushdown in SortExec, which does the same thing for physical plan nodes. If a given execution plan has a non-empty fetch, it should not allow any parent filters to be pushed down.

What changes are included in this PR?

Added checks in the optimisation rule to avoid pushing filters past children with built-in limits.
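
As a rough illustration of the kind of guard described here, the sketch below uses a hypothetical `Child` enum and hypothetical helpers `child_fetch` and `can_push_filter_past`. None of these names come from DataFusion, and the actual checks in push_down_filter.rs may be structured differently.

```rust
/// Toy child node; a hypothetical stand-in for plan nodes that may carry a fetch.
#[derive(Debug)]
enum Child {
    TableScan { fetch: Option<usize> },
    Sort { fetch: Option<usize> },
    Projection,
}

/// Return the built-in row limit of a child, if it has one.
fn child_fetch(child: &Child) -> Option<usize> {
    match child {
        Child::TableScan { fetch } | Child::Sort { fetch } => *fetch,
        Child::Projection => None,
    }
}

/// A filter may only move past a child that does not truncate its output.
fn can_push_filter_past(child: &Child) -> bool {
    child_fetch(child).is_none()
}
```

Under this sketch, a sort or table scan with `fetch: Some(n)` blocks the pushdown, while the same node with `fetch: None` does not.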

Are these changes tested?

Yes:

Unit tests in push_down_filter.rs
Fixed an existing test in window.slt
Unit tests for the physical plan change in sort.rs
New slt test in push_down_filter_sort_fetch.slt for this exact behaviour

Are there any user-facing changes?

No

…#21057)
- Closes apache#21063
Co-authored-by: Shiv Bhatia <sbhatia@palantir.com>

github-actions bot added the logical-expr (Logical plan and expressions), optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt)), and physical-plan (Changes to the physical-plan crate) labels on Mar 24, 2026

hareshkh mentioned this pull request Mar 24, 2026

Release DataFusion 53.1.0 (minor) (Apr 2026) #21079

Open

5 tasks

hareshkh force-pushed the hk/cp-optimizer-53 branch from 247b2d9 to ec00d02 Compare March 24, 2026 23:30

hareshkh force-pushed the hk/cp-optimizer-53 branch from ec00d02 to 69035c3 Compare March 25, 2026 09:45

hareshkh wants to merge 1 commit into apache:branch-53 from hareshkh:hk/cp-optimizer-53
