Add benchmark for sort pushdown Inexact path (row group reorder) #21582

@zhuqi-lucas

Is your feature request related to a problem or challenge?

The existing sort_pushdown_sorted benchmark covers the Exact path (sort elimination, scan limit). However, the Inexact path optimizations — reverse scan (#19064) and row group reorder by statistics (#21580) — are not benchmarked.

Without an Inexact benchmark, we can't:

Describe the solution you'd like

Extend benchmarks/bench.sh and queries under benchmarks/queries/sort_pushdown/ to add Inexact scenarios:

  1. Data: Generate a single large file with multiple row groups whose statistics overlap or are out of order, which forces the Inexact path. This can be done by:

    • Writing data in non-sorted order with small max_row_group_size
    • Creating synthetic data with controlled row group boundaries
  2. Queries (benchmarks/queries/sort_pushdown/q5.sql, q6.sql, ...):

    • SELECT * FROM t ORDER BY col ASC LIMIT 10 — TopK + RG reorder
    • SELECT * FROM t ORDER BY col DESC LIMIT 10 — TopK + reverse scan + RG reorder
    • SELECT * FROM t ORDER BY col ASC LIMIT 1000 — larger LIMIT
    • Wide-row variant: SELECT * with many columns to show row-level filter benefit
  3. Baseline comparison: With/without datafusion.optimizer.enable_sort_pushdown to isolate the optimization's impact.
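
The data layout in item 1 could be prototyped along these lines. This is a minimal sketch, not the benchmark's actual generator: the helper names (`make_row_groups`, `row_group_stats`, `stats_overlap`, `stats_out_of_order`) and all parameters are hypothetical, and it only models the values and per-row-group min/max statistics in plain Python without writing a Parquet file.

```python
import random


def make_row_groups(num_groups, rows_per_group, overlap, seed=0):
    """Return row groups (lists of ints) in physical write order.

    Hypothetical helper: consecutive value ranges share `overlap` values,
    and the groups are emitted in a non-ascending order.
    """
    if not 0 <= overlap < rows_per_group:
        raise ValueError("overlap must be in [0, rows_per_group)")
    rng = random.Random(seed)
    stride = rows_per_group - overlap  # distance between group minimums
    groups = []
    for g in range(num_groups):
        values = [g * stride + i for i in range(rows_per_group)]
        rng.shuffle(values)  # rows inside a group are unsorted as well
        groups.append(values)
    order = list(range(num_groups))
    rng.shuffle(order)
    if order == sorted(order):
        order.reverse()  # make sure the write order is not ascending
    return [groups[i] for i in order]


def row_group_stats(groups):
    """Per-row-group (min, max), mimicking column statistics."""
    return [(min(g), max(g)) for g in groups]


def stats_overlap(stats):
    """True if any two row groups have intersecting [min, max] ranges."""
    ordered = sorted(stats)
    return any(prev[1] >= cur[0] for prev, cur in zip(ordered, ordered[1:]))


def stats_out_of_order(stats):
    """True if the physical order of row groups is not sorted by min."""
    mins = [lo for lo, _ in stats]
    return mins != sorted(mins)


if __name__ == "__main__":
    groups = make_row_groups(num_groups=8, rows_per_group=100, overlap=20, seed=42)
    stats = row_group_stats(groups)
    print(stats)
    print("overlapping:", stats_overlap(stats))
    print("out of order:", stats_out_of_order(stats))
```

To turn this into a file, the groups could be concatenated into one column and written with a Parquet writer whose maximum row group size is set to `rows_per_group` (for example the `row_group_size` argument of `pyarrow.parquet.write_table`), on the assumption that the writer then cuts row groups at those boundaries. Setting `overlap=0` gives the out-of-order but non-overlapping case.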

Additional context
