Reorder row groups by GROUP BY keys to reduce aggregate partition state and improve cache locality

## Is your feature request related to a problem or challenge?

#21317 introduced reordering row groups by statistics for **sort pushdown** (ORDER BY / TopK queries). The same mechanism could be extended to **GROUP BY** queries to improve aggregate performance.

As @adriangb and @Dandandan suggested in https://github.com/apache/datafusion/pull/21580#discussion_r3071295010:

> Ordering by grouping keys can:
> - Reduce cardinality within partitions (partition state can be smaller)
> - Allow for better cache locality (row groups with more equal keys are grouped together)

## Describe the solution you'd like

The existing `PreparedAccessPlan::reorder_by_statistics` method already accepts any `LexOrdering` — it's generic, not tied to sort pushdown. Extending this for GROUP BY would be:

1. In the aggregate planner, compute a preferred row group ordering from the grouping keys
2. Pass it through `ParquetSource::sort_order_for_reorder`
3. Existing reorder logic handles the rest

```
SELECT category, SUM(x) FROM t GROUP BY category

Before: RGs in random order
  → HashAggregate sees mixed categories
  → Hash table stays large (all categories in memory)
  → Cache misses as different categories interleave

After: RGs ordered by category's min statistics
  → HashAggregate sees category-grouped rows
  → Can finalize/flush groups earlier (streaming aggregation opportunity)
  → Better cache locality (same category rows adjacent)
```

## Describe alternatives you've considered

- **Streaming aggregation** requires fully sorted input, which is a stronger guarantee. RG-level reorder doesn't guarantee full sort but improves locality within each RG.

## Additional context

- Parent: #21317 (row group reorder for sort pushdown)
- Related PR: #21580 (adds the `reorder_by_statistics` infrastructure)

The infrastructure from #21580 should be directly reusable — this issue is mainly about wiring up the GROUP BY planner to populate `sort_order_for_reorder` based on grouping keys.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorder row groups by GROUP BY keys to reduce aggregate partition state and improve cache locality #21581

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Reorder row groups by GROUP BY keys to reduce aggregate partition state and improve cache locality #21581

Description

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions