Is your feature request related to a problem or challenge?
#21317 introduced reordering row groups by statistics for sort pushdown (ORDER BY / TopK queries). The same mechanism could be extended to GROUP BY queries to improve aggregate performance.
As @adriangb and @Dandandan suggested in #21580 (comment):
Ordering by grouping keys can:
- Reduce cardinality within partitions (partition state can be smaller)
- Allow for better cache locality (row groups with more equal keys are grouped together)
Describe the solution you'd like
The existing PreparedAccessPlan::reorder_by_statistics method already accepts any LexOrdering — it's generic, not tied to sort pushdown. Extending this for GROUP BY would be:
- In the aggregate planner, compute a preferred row group ordering from the grouping keys
- Pass it through
ParquetSource::sort_order_for_reorder
- Existing reorder logic handles the rest
SELECT category, SUM(x) FROM t GROUP BY category
Before: RGs in random order
→ HashAggregate sees mixed categories
→ Hash table stays large (all categories in memory)
→ Cache misses as different categories interleave
After: RGs ordered by category's min statistics
→ HashAggregate sees category-grouped rows
→ Can finalize/flush groups earlier (streaming aggregation opportunity)
→ Better cache locality (same category rows adjacent)
Describe alternatives you've considered
- Streaming aggregation requires fully sorted input, which is a stronger guarantee. RG-level reorder doesn't guarantee full sort but improves locality within each RG.
Additional context
The infrastructure from #21580 should be directly reusable — this issue is mainly about wiring up the GROUP BY planner to populate sort_order_for_reorder based on grouping keys.
Is your feature request related to a problem or challenge?
#21317 introduced reordering row groups by statistics for sort pushdown (ORDER BY / TopK queries). The same mechanism could be extended to GROUP BY queries to improve aggregate performance.
As @adriangb and @Dandandan suggested in #21580 (comment):
Describe the solution you'd like
The existing
PreparedAccessPlan::reorder_by_statisticsmethod already accepts anyLexOrdering— it's generic, not tied to sort pushdown. Extending this for GROUP BY would be:ParquetSource::sort_order_for_reorderDescribe alternatives you've considered
Additional context
reorder_by_statisticsinfrastructure)The infrastructure from #21580 should be directly reusable — this issue is mainly about wiring up the GROUP BY planner to populate
sort_order_for_reorderbased on grouping keys.