Fix/aggregate output ordering streaming by xudong963 · Pull Request #21107 · apache/datafusion

xudong963 · 2026-03-23T05:36:50Z

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

This PR updates EnforceDistribution to keep order-preserving repartition variants when preserving input ordering allows the parent operator to remain incremental/streaming.

Previously, order-preserving variants could be removed when prefer_existing_sort = false or when there was no explicit ordering requirement, even if dropping the ordering would force a parent operator such as AggregateExec to fall back to blocking execution. This change adds a targeted preserving_order_enables_streaming check and uses it to avoid replacing RepartitionExec(..., preserve_order=true) / SortPreservingMergeExec when that preserved ordering is what enables streaming behavior.
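The decision described above can be sketched as a toy model. This is not the code in the PR: `ToyParent`, `ExecMode`, and `toy_preserving_order_enables_streaming` are hypothetical names, and the rule "an aggregate streams when the leading sort column is a group key" is a simplifying assumption standing in for DataFusion's real ordering analysis. It only illustrates the shape of the check: keep the ordering when the parent streams with it and would block without it.

```rust
/// Toy stand-in for how a parent operator executes for a given input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecMode {
    /// Emits results incrementally as input arrives.
    Streaming,
    /// Must consume all input before emitting anything.
    Blocking,
}

/// Hypothetical parent operators (not DataFusion types).
enum ToyParent {
    /// Assumed to stream only when the input's leading sort column is a group key.
    Aggregate { group_by: Vec<&'static str> },
    /// Assumed to stream regardless of input ordering.
    Filter,
}

impl ToyParent {
    /// `input_ordering` lists the columns the child is sorted on (empty = unordered).
    fn mode_for(&self, input_ordering: &[&str]) -> ExecMode {
        match self {
            ToyParent::Filter => ExecMode::Streaming,
            ToyParent::Aggregate { group_by } => match input_ordering.first() {
                Some(col) if group_by.iter().any(|g| *g == *col) => ExecMode::Streaming,
                _ => ExecMode::Blocking,
            },
        }
    }
}

/// Returns true only when the child's ordering is what makes the parent streaming.
fn toy_preserving_order_enables_streaming(parent: &ToyParent, child_ordering: &[&str]) -> bool {
    // Parent is blocking even with the ordering: preserving it brings no benefit.
    if parent.mode_for(child_ordering) == ExecMode::Blocking {
        return false;
    }
    // Parent streams with the ordering; it matters only if an unordered child would block.
    parent.mode_for(&[]) == ExecMode::Blocking
}
```

In this model, an `Aggregate` grouped by `generated_id` with a child sorted on `generated_id` yields `true`, so the order-preserving variant would be kept. A child sorted on an unrelated column, an unordered child, or a `Filter` parent all yield `false`.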

As a result, the optimizer now prefers keeping order-preserving repartitioning in these cases, and the updated sqllogictests reflect the new physical plans: instead of inserting a SortExec above a plain repartition, plans now retain RepartitionExec(... preserve_order=true) so sorted or partially sorted aggregates can continue running incrementally.

Are these changes tested?

Are there any user-facing changes?

In these cases, plans no longer need an extra SortExec.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

alamb · 2026-03-24T13:52:36Z

Previously, order-preserving variants could be removed when prefer_existing_sort = false or when there was no explicit ordering requirement, even if dropping the ordering would force a parent operator such as AggregateExec to fall back to blocking execution.

I am not sure this is a "bug" necessarily -- more like a tradeoff. I believe the enforce_distribution pass will, by default, attempt to increase plan parallelism even if that means it has to re-sort.

My understanding is that this is what the prefer_existing_sort setting controls

prefer_existing_sort = false

So if you want plans to keep existing sorts rather than increase parallelism in that case, you should set prefer_existing_sort = true -- this is what we do in InfluxDB, FWIW.

alamb

Thanks @xudong963 -- this change looks good to me. As I understand it, it avoids adding a HashRepartitioning (and uses a more memory-efficient operator), so that sounds like a win all around.

I was worried that this change would result in trading off "more sortedness" for "less parallelism" but from what I can see that is not the case.

One thing we may want to consider is gating this behavior behind the "prefer existing sort" flag -- that way

I don't think we should be ignoring the errors, but otherwise the code and tests look good to me.

alamb · 2026-03-24T13:54:19Z

datafusion/sqllogictest/test_files/unnest.slt

-10)------------------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
-11)--------------------LazyMemoryExec: partitions=1, batch_generators=[range: start=1, end=5, batch_size=8192]
+03)----RepartitionExec: partitioning=Hash([generated_id@0], 4), input_partitions=4, preserve_order=true, sort_exprs=generated_id@0 ASC NULLS LAST
+04)------AggregateExec: mode=Partial, gby=[generated_id@0 as generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted


This plan looks better to me (it is still fully parallelized and now avoids an unnecessary hash repartition).

alamb · 2026-03-24T13:56:14Z

datafusion/physical-optimizer/src/enforce_distribution.rs

+        // Parent is blocking even with ordering — no benefit
+        return false;
+    }
+    // Build parent with an unordered child (simulating CoalescePartitionsExec)


This comment is strange to me -- the code adds a CoalescePartitionsExec, so I don't think it is "simulating" anything.

alamb · 2026-03-24T13:57:08Z

datafusion/physical-optimizer/src/enforce_distribution.rs

+    let with_ordered =
+        match Arc::clone(parent).with_new_children(vec![Arc::clone(ordered_child)]) {
+            Ok(p) => p,
+            Err(_) => return false,


I don't think we should ignore the Err here or below as it could mask real errors / a bug with this code
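To illustrate the concern, here is a minimal self-contained sketch contrasting the two styles. Everything in it is hypothetical (`PlanError`, `rebuild_parent`, and both `check_*` functions are illustrative stand-ins, not DataFusion APIs); it only shows how returning `false` on `Err` makes a genuine failure indistinguishable from "no benefit", whereas `?` surfaces it to the caller.

```rust
/// Hypothetical error type standing in for a planner error.
#[derive(Debug, PartialEq)]
struct PlanError(String);

/// Hypothetical stand-in for rebuilding a parent node with new children.
/// Fails when the number of children does not match what the parent expects.
fn rebuild_parent(expected_children: usize, children: Vec<&str>) -> Result<String, PlanError> {
    if children.len() != expected_children {
        return Err(PlanError(format!(
            "expected {} children, got {}",
            expected_children,
            children.len()
        )));
    }
    Ok(format!("Parent({})", children.join(", ")))
}

/// Swallowing style: a rebuild failure is reported as "no benefit".
fn check_swallowing(expected_children: usize, child: &str) -> bool {
    match rebuild_parent(expected_children, vec![child]) {
        Ok(_) => true,
        Err(_) => false, // the real cause is lost here
    }
}

/// Propagating style: a rebuild failure reaches the caller unchanged.
fn check_propagating(expected_children: usize, child: &str) -> Result<bool, PlanError> {
    let _parent = rebuild_parent(expected_children, vec![child])?;
    Ok(true)
}
```

With a parent that expects two children, `check_swallowing(2, "child")` returns `false` as if nothing were wrong, while `check_propagating(2, "child")` returns the `Err` describing the mismatch.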

xudong963 marked this pull request as draft March 23, 2026 05:36

github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Mar 23, 2026

xudong963 changed the title from "Fix/aggregate output ordering streaming (#33)" to "Fix/aggregate output ordering streaming" Mar 23, 2026

Fix/aggregate output ordering streaming (apache#33)

3f07243

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

xudong963 force-pushed the agg_order branch from 722ca96 to 3f07243 Compare March 23, 2026 06:37

add ut

11501ff

github-actions bot added the core Core DataFusion crate label Mar 23, 2026

xudong963 added 2 commits March 23, 2026 17:30

refine code

a3f8191

Merge branch 'main' into agg_order

b0318cd

xudong963 added the performance Make DataFusion faster label Mar 23, 2026

xudong963 marked this pull request as ready for review March 23, 2026 09:33

xudong963 requested a review from alamb March 23, 2026 09:33

alamb approved these changes Mar 24, 2026

xudong963 wants to merge 4 commits into apache:main from xudong963:agg_order

2 participants
