Fix/aggregate output ordering streaming#21107
Conversation
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
I am not sure this is a "bug" necessarily -- more like a tradeoff. I believe the enforce_distribution plan will attempt to increase plan parallelism even if it has to resort by default My understanding is that this is what the prefer_existing_sort setting controls So if you want plans to keep existing sorts and not increase parallelism in that case, you should set |
alamb
left a comment
There was a problem hiding this comment.
Thanks @xudong963 -- this change looks good to me. As I understand it it avoids adding a HashRepartitioning (and uses a more memory efficient operator) so that sounds like a win all around
I was worried that this change would result in trading off "more sortedness" for "less parallelism" but from what I can see that is not the case.
One thing we may want to consider it gating this behavior behind the "prefer existing sort" flag -- that way
I don't think we should be ignoring the errors, but otherwise the code and tests look good to me.
| 10)------------------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted] | ||
| 11)--------------------LazyMemoryExec: partitions=1, batch_generators=[range: start=1, end=5, batch_size=8192] | ||
| 03)----RepartitionExec: partitioning=Hash([generated_id@0], 4), input_partitions=4, preserve_order=true, sort_exprs=generated_id@0 ASC NULLS LAST | ||
| 04)------AggregateExec: mode=Partial, gby=[generated_id@0 as generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted |
There was a problem hiding this comment.
this plan looks better to me (it still is fully parallelized and now uses avoids an unecessary hash repartition
| // Parent is blocking even with ordering — no benefit | ||
| return false; | ||
| } | ||
| // Build parent with an unordered child (simulating CoalescePartitionsExec) |
There was a problem hiding this comment.
this commet is strange to me -- the code adds a CoalescePartitionsExec -- so I don't think it is "simulating" anything
| let with_ordered = | ||
| match Arc::clone(parent).with_new_children(vec![Arc::clone(ordered_child)]) { | ||
| Ok(p) => p, | ||
| Err(_) => return false, |
There was a problem hiding this comment.
I don't think we should ignore the Err here or below as it could mask real errors / a bug with this code
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
This PR updates
EnforceDistributionto keep order-preserving repartition variants when preserving input ordering allows the parent operator to remain incremental/streaming.Previously, order-preserving variants could be removed when
prefer_existing_sort = falseor when there was no explicit ordering requirement, even if dropping the ordering would force a parent operator such asAggregateExecto fall back to blocking execution. This change adds a targetedpreserving_order_enables_streamingcheck and uses it to avoid replacingRepartitionExec(..., preserve_order=true)/SortPreservingMergeExecwhen that preserved ordering is what enables streaming behavior.As a result, the optimizer now prefers keeping order-preserving repartitioning in these cases, and the updated sqllogictests reflect the new physical plans: instead of inserting a
SortExecabove a plain repartition, plans now retainRepartitionExec(... preserve_order=true)so sorted or partially sorted aggregates can continue running incrementally.Are these changes tested?
Are there any user-facing changes?
No extra sort needed for these cases