feat: reorder row groups by statistics during sort pushdown #21580

Open
zhuqi-lucas wants to merge 4 commits into apache:main from zhuqi-lucas:feat/reorder-row-groups-by-stats

@zhuqi-lucas (Contributor) commented Apr 13, 2026

Which issue does this PR close?

Closes #21317

Rationale for this change

When sort pushdown is active for Inexact paths (e.g., reverse scan or statistics-based reordering that can't be elevated to Exact), row groups within a file are currently read in their original order. This means TopK queries may read suboptimal row groups first, delaying threshold tightening.

By reordering row groups by their min/max statistics to match the requested sort order, TopK finds optimal values first. This gives two levels of benefit:

  1. Row-level filtering (immediate): TopK sets a tight dynamic filter threshold after reading the first (best) row group. For subsequent row groups, the dynamic filter acts as a row-level filter: the parquet reader uses the page index to skip non-matching pages and avoids decoding non-sort columns for filtered rows. This can significantly reduce I/O for wide tables.

  2. Row-group-level skipping (follow-up with morsel API): Once morsel-driven scanning lands, the dynamic filter can be re-evaluated between row groups, skipping entire row groups whose statistics show they cannot beat the TopK threshold (for ASC, a min statistic above the threshold).
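
The effect of visit order on row-group skipping can be illustrated with a toy model. This is a minimal sketch in plain Rust, not DataFusion code: `order_by_min` and `row_groups_decoded` are hypothetical names, row groups are modeled as in-memory `Vec<i64>` values, and the minimum is computed from the data instead of being read from Parquet metadata. It only models `ORDER BY col ASC LIMIT k`.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper: visit order that puts the smallest minimum first.
/// `sort_by_key` is stable, so row groups with equal minimums keep file order.
fn order_by_min(row_groups: &[Vec<i64>]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..row_groups.len()).collect();
    order.sort_by_key(|&i| row_groups[i].iter().min().copied());
    order
}

/// Hypothetical helper: count the row groups that must be decoded to answer
/// `ORDER BY col ASC LIMIT k` when row groups are visited in `order`.
fn row_groups_decoded(row_groups: &[Vec<i64>], order: &[usize], k: usize) -> usize {
    // Max-heap holding the k smallest values seen so far; its top is the threshold.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut decoded = 0;
    for &idx in order {
        let rg = &row_groups[idx];
        // Assumption: a real reader would take this from metadata, not the data.
        let Some(&rg_min) = rg.iter().min() else { continue };
        // Skip when the heap is full and nothing in this row group can beat it.
        let can_skip = heap.len() == k && heap.peek().map_or(false, |&t| rg_min > t);
        if can_skip {
            continue;
        }
        decoded += 1;
        for &v in rg {
            heap.push(v);
            if heap.len() > k {
                heap.pop(); // drop the largest so only the k smallest remain
            }
        }
    }
    decoded
}
```

With three row groups holding `[70, 80, 90]`, `[40, 50, 60]`, and `[1, 2, 3]` and `k = 2`, file order decodes all three row groups, while the min-sorted order decodes only the last one and skips the other two.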

What changes are included in this PR?

  • Add reorder_by_statistics to PreparedAccessPlan that sorts row_group_indexes by the first sort column's statistics (min for ASC, max for DESC)
  • Add sort_order_for_reorder: Option<LexOrdering> field to ParquetMorselizer and ParquetSource
  • Pass sort order from ParquetSource::try_pushdown_sort (Inexact path) through to the opener
  • Reorder is applied after pruning but before reverse — the two optimizations compose
  • Graceful fallback (returns plan unchanged) when:
    • row_selection is present (remapping is complex)
    • 0 or 1 row groups (nothing to reorder)
    • Sort expression is not a simple column reference
    • Statistics converter cannot be built (column missing, type mismatch, etc.)
    • Min/max values cannot be extracted from statistics
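
The reorder-with-fallback behavior listed above can be sketched in plain Rust. This is an illustration under stated assumptions, not the PR's implementation: `RowGroupStats` and `reorder_by_stats` are hypothetical stand-ins, statistics are reduced to optional `i64` min/max values, and the separate reverse step is not modeled.

```rust
/// Hypothetical stand-in for the statistics of one row group's sort column.
#[derive(Clone, Copy)]
struct RowGroupStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Return `indexes` reordered by statistics, or unchanged when a fallback applies.
fn reorder_by_stats(
    indexes: Vec<usize>,
    stats: &[RowGroupStats],
    descending: bool,
    has_row_selection: bool,
) -> Vec<usize> {
    // Fallbacks: a row selection is present, or there is nothing to reorder.
    if has_row_selection || indexes.len() <= 1 {
        return indexes;
    }
    // Use min for ASC and max for DESC; give up if any statistic is missing.
    let keys: Option<Vec<i64>> = indexes
        .iter()
        .map(|&i| {
            let s = stats.get(i)?;
            if descending { s.max } else { s.min }
        })
        .collect();
    let Some(keys) = keys else { return indexes };

    // Pair each key with its row group index and sort (stable) in the requested direction.
    let mut paired: Vec<(i64, usize)> = keys.into_iter().zip(indexes).collect();
    if descending {
        paired.sort_by(|a, b| b.0.cmp(&a.0));
    } else {
        paired.sort_by(|a, b| a.0.cmp(&b.0));
    }
    paired.into_iter().map(|(_, i)| i).collect()
}
```

Because the function takes the already-pruned `indexes`, pruning composes naturally: only the surviving row groups are looked up and reordered.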

Are these changes tested?

Yes. Added 9 unit tests in access_plan.rs covering:

  • ASC reorder with unordered row groups
  • DESC reorder (uses max statistics)
  • DESC with overlapping RGs (verifies max is used instead of min for optimal first RG)
  • Single row group (no reorder needed)
  • Reorder with some row groups skipped (pruning applied first)
  • Skip with row_selection present
  • Already-sorted row groups preserved
  • Skip when sort expr is not a simple column (e.g., col + 1)
  • Skip when column is missing from schema
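
The "DESC with overlapping RGs" case can be made concrete with a small worked example. This is a hedged sketch with hypothetical names (`Range`, `by_min`, `by_max`, `first_for_desc`), not the test from access_plan.rs: it picks the row group that a DESC scan would visit first under each ranking key.

```rust
/// (min, max) statistics of one row group; a simplified, hypothetical model.
type Range = (i64, i64);

fn by_min(r: &Range) -> i64 {
    r.0
}

fn by_max(r: &Range) -> i64 {
    r.1
}

/// Index of the row group visited first for a DESC sort when ranking by `key`.
fn first_for_desc(ranges: &[Range], key: fn(&Range) -> i64) -> Option<usize> {
    (0..ranges.len()).max_by_key(|&i| key(&ranges[i]))
}
```

For row groups with ranges `(50, 60)` and `(1, 100)`, ranking by min visits row group 0 first, even though the largest value (100) lives in row group 1. Ranking by max visits row group 1 first, which is the one a DESC TopK wants.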

SLT tests in sort_pushdown.slt verify end-to-end correctness with multi-RG files.

Benchmark

Note: The existing sort_pushdown_sorted benchmark only covers the Exact path (sort elimination). The performance benefit of this PR (Inexact path with RG reorder) is not captured by existing benchmarks. A follow-up benchmark for the Inexact path is tracked in #21582 and will be added separately to demonstrate the optimization impact.

Are there any user-facing changes?

No API changes. Users benefit from improved TopK performance on queries like SELECT * FROM t ORDER BY col LIMIT N when:

  • Files have multiple row groups
  • Row groups are not in sort order (e.g., append-heavy workloads)
  • Statistics are available on the sort column

Copilot AI review requested due to automatic review settings April 13, 2026 06:40
@github-actions github-actions bot added the datasource Changes to the datasource crate label Apr 13, 2026
When sort pushdown is active, reorder row groups within each file by
their min/max statistics to match the requested sort order. This helps
TopK queries find optimal values first via dynamic filter pushdown.

- Add reorder_by_statistics to PreparedAccessPlan that sorts
  row_group_indexes by the first sort column's min values
- Pass sort order from ParquetSource::try_pushdown_sort through to
  the opener via sort_order_for_reorder field
- Reorder happens after pruning but before reverse (they compose)
- Gracefully skips reorder when statistics unavailable, sort expr
  is not a simple column, row_selection present, or <=1 row groups

Closes apache#21317
@zhuqi-lucas zhuqi-lucas force-pushed the feat/reorder-row-groups-by-stats branch from 3700464 to a013bf6 Compare April 13, 2026 06:42
@zhuqi-lucas (Contributor, Author) commented

cc @alamb @adriangb

@Dandandan (Contributor) commented

run benchmarks

@adriangbot commented

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234363943-1136-k2bb2 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) diff using: tpch
Results will be posted here when complete

@adriangbot commented

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234363943-1134-pcn8f 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

@adriangbot commented

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234363943-1135-k9dwc 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) diff using: tpcds
Results will be posted here when complete

Copilot AI (Contributor) left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves TopK performance for Parquet scans when sort pushdown is Inexact by enabling row-group reordering based on statistics, so likely “best” row groups are read earlier and dynamic filters can tighten sooner.

Changes:

  • Thread an optional LexOrdering from ParquetSource::try_pushdown_sort through ParquetMorselizer to the access-plan preparation step.
  • Add PreparedAccessPlan::reorder_by_statistics to reorder row_group_indexes using Parquet statistics.
  • Add unit tests covering reorder/skip behavior for multiple edge cases.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File: Description

  • datafusion/datasource-parquet/src/source.rs: Plumbs sort ordering into the file source for later row-group reordering.
  • datafusion/datasource-parquet/src/opener.rs: Carries optional sort ordering into the opener and applies reorder_by_statistics during plan preparation.
  • datafusion/datasource-parquet/src/access_plan.rs: Implements row-group reordering by statistics and adds focused unit tests.

Comment on lines +824 to +826
let sort_order = LexOrdering::new(order.iter().cloned());
let mut new_source = self.clone().with_reverse_row_groups(true);
new_source.sort_order_for_reorder = sort_order;
Copilot AI commented Apr 13, 2026

LexOrdering::new(...) appears to return a Result<LexOrdering, _> (as used with .unwrap() in the new unit tests), but here it’s assigned directly without ?/unwrap, and then assigned to sort_order_for_reorder: Option<LexOrdering> without wrapping in Some(...). This should be changed to construct a LexOrdering with error propagation and store it as Some(sort_order) (or skip setting the field on error). Otherwise this won’t compile.

Suggested change
- let sort_order = LexOrdering::new(order.iter().cloned());
- let mut new_source = self.clone().with_reverse_row_groups(true);
- new_source.sort_order_for_reorder = sort_order;
+ let sort_order = LexOrdering::new(order.iter().cloned())?;
+ let mut new_source = self.clone().with_reverse_row_groups(true);
+ new_source.sort_order_for_reorder = Some(sort_order);

Comment on lines +414 to +415
// LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
let first_sort_expr = sort_order.first();
Copilot AI commented Apr 13, 2026

sort_order.first() (if LexOrdering is Vec-like) returns Option<&PhysicalSortExpr>, but the code uses it as if it were &PhysicalSortExpr (first_sort_expr.expr...). This is likely a compile error. A concrete fix is to obtain the first element via iteration and handle the empty case (e.g., early-return Ok(self) if no sort expressions), then use the returned &PhysicalSortExpr.

Suggested change
- // LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
- let first_sort_expr = sort_order.first();
+ let first_sort_expr = match sort_order.iter().next() {
+     Some(expr) => expr,
+     None => {
+         debug!("Skipping RG reorder: empty sort order");
+         return Ok(self);
+     }
+ };

}
};

let descending = first_sort_expr.options.descending;
Copilot AI commented Apr 13, 2026

For DESC ordering, reordering by min values is often a poor proxy for “row group likely contains the largest values first”; typically you want to sort by max when descending == true (and by min when ascending). This can significantly reduce the intended TopK benefit (and can even choose a worse first row group when ranges overlap). Consider switching to row_group_maxs(...) for descending order, and update the doc comment (currently mentions “min/max”) and the DESC unit test accordingly.

Comment on lines +442 to +463
// Get min values for the selected row groups
let rg_metadata: Vec<&RowGroupMetaData> = self
    .row_group_indexes
    .iter()
    .map(|&idx| file_metadata.row_group(idx))
    .collect();

let min_values = match converter.row_group_mins(rg_metadata.iter().copied()) {
    Ok(vals) => vals,
    Err(e) => {
        debug!("Skipping RG reorder: cannot get min values: {e}");
        return Ok(self);
    }
};

// Sort indices by min values
let sort_options = arrow::compute::SortOptions {
    descending,
    nulls_first: first_sort_expr.options.nulls_first,
};
let sorted_indices = match arrow::compute::sort_to_indices(
    &min_values,
Copilot AI commented Apr 13, 2026

For DESC ordering, reordering by min values is often a poor proxy for “row group likely contains the largest values first”; typically you want to sort by max when descending == true (and by min when ascending). This can significantly reduce the intended TopK benefit (and can even choose a worse first row group when ranges overlap). Consider switching to row_group_maxs(...) for descending order, and update the doc comment (currently mentions “min/max”) and the DESC unit test accordingly.

Suggested change
- // Get min values for the selected row groups
- let rg_metadata: Vec<&RowGroupMetaData> = self
-     .row_group_indexes
-     .iter()
-     .map(|&idx| file_metadata.row_group(idx))
-     .collect();
- let min_values = match converter.row_group_mins(rg_metadata.iter().copied()) {
-     Ok(vals) => vals,
-     Err(e) => {
-         debug!("Skipping RG reorder: cannot get min values: {e}");
-         return Ok(self);
-     }
- };
- // Sort indices by min values
- let sort_options = arrow::compute::SortOptions {
-     descending,
-     nulls_first: first_sort_expr.options.nulls_first,
- };
- let sorted_indices = match arrow::compute::sort_to_indices(
-     &min_values,
+ // Get values for the selected row groups: mins for ASC, maxs for DESC
+ let rg_metadata: Vec<&RowGroupMetaData> = self
+     .row_group_indexes
+     .iter()
+     .map(|&idx| file_metadata.row_group(idx))
+     .collect();
+ let sort_values = match if descending {
+     converter.row_group_maxs(rg_metadata.iter().copied())
+ } else {
+     converter.row_group_mins(rg_metadata.iter().copied())
+ } {
+     Ok(vals) => vals,
+     Err(e) => {
+         debug!("Skipping RG reorder: cannot get min/max values: {e}");
+         return Ok(self);
+     }
+ };
+ // Sort indices by the statistics that best match the requested order
+ let sort_options = arrow::compute::SortOptions {
+     descending,
+     nulls_first: first_sort_expr.options.nulls_first,
+ };
+ let sorted_indices = match arrow::compute::sort_to_indices(
+     &sort_values,

Contributor commented

This is a good point

Contributor, Author commented

Yes, this is a good point.

Comment on lines +462 to +466
let sorted_indices = match arrow::compute::sort_to_indices(
    &min_values,
    Some(sort_options),
    None,
) {
Copilot AI commented Apr 13, 2026

If multiple row groups share the same min (or max) statistic, sort_to_indices may not guarantee a deterministic/stable tie-breaker across platforms/versions. Since row-group order can affect scan reproducibility and performance debugging, consider adding a stable secondary key (e.g., original row group index) when statistics are equal.
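
One way to get the deterministic tie-breaker suggested here is to sort on a compound key of (statistic, original row group index). The following is a minimal sketch in plain Rust, not a change to the PR: `sort_with_tiebreak` is a hypothetical helper, statistics are assumed to be non-null `i64` values, and it does not use `sort_to_indices`.

```rust
/// Hypothetical helper: order row group `indexes` by their statistic `keys`,
/// breaking ties by the original row group index so the result is deterministic
/// even with an unstable sort. `indexes[i]` is the row group whose key is `keys[i]`.
fn sort_with_tiebreak(indexes: &[usize], keys: &[i64], descending: bool) -> Vec<usize> {
    let mut paired: Vec<(i64, usize)> = keys
        .iter()
        .copied()
        .zip(indexes.iter().copied())
        .collect();
    paired.sort_unstable_by(|a, b| {
        // Primary key: the statistic, in the requested direction.
        let by_stat = if descending { b.0.cmp(&a.0) } else { a.0.cmp(&b.0) };
        // Secondary key: original row group index, always ascending.
        by_stat.then(a.1.cmp(&b.1))
    });
    paired.into_iter().map(|(_, idx)| idx).collect()
}
```

For row groups `[3, 1, 2]` with keys `[10, 10, 5]`, the ascending result is `[2, 1, 3]` and the descending result is `[1, 3, 2]`: the two row groups that share key 10 always appear in index order.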

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Apr 13, 2026
/// - 0 or 1 row groups (nothing to reorder)
/// - Sort expression is not a simple column reference
/// - Statistics are unavailable
pub(crate) fn reorder_by_statistics(
@Dandandan (Contributor) commented Apr 13, 2026

I think @adriangb had the great idea to also order by grouping keys which can

  • reduce cardinality within partitions (partition-local state can be smaller)
  • allow for better cache locality (row groups with more equal keys are grouped together)

Doesn't have to be in this PR but perhaps we can think about how it fits in.

Contributor, Author commented

Thanks @Dandandan for review! That's a great extension. The reorder_by_statistics method is generic enough to take any LexOrdering — it doesn't need to be tied to TopK specifically. So extending this for GROUP BY should be a matter of:

  1. Computing a preferred RG ordering from grouping keys in the aggregate planner
  2. Passing it through to ParquetSource::sort_order_for_reorder

Happy to track this as a follow-up issue. Will open one after this PR lands.

Contributor, Author commented

Thanks @Dandandan! Created #21581 to track this. The existing infrastructure from this PR should be directly reusable — mainly needs the aggregate planner to populate sort_order_for_reorder from grouping keys.

@Dandandan (Contributor) commented

run benchmarks

env:
    PUSHDOWN_FILTERS: true
    REORDER_FILTERS: true

@adriangbot commented

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234422910-1137-vvpxc 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

@adriangbot commented

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234422910-1138-qjxtt 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: tpcds
Results will be posted here when complete

@adriangbot commented

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234422910-1139-wpv6n 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: tpch
Results will be posted here when complete

@adriangbot commented

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃         feat_reorder-row-groups-by-stats ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.76 / 7.21 ±0.73 / 8.66 ms │              6.66 / 7.05 ±0.75 / 8.56 ms │    no change │
│ QQuery 2  │        145.75 / 146.85 ±1.06 / 148.46 ms │        144.70 / 146.59 ±1.26 / 148.61 ms │    no change │
│ QQuery 3  │        114.48 / 115.73 ±0.96 / 117.37 ms │        113.70 / 114.51 ±0.53 / 115.15 ms │    no change │
│ QQuery 4  │    1336.62 / 1383.93 ±28.36 / 1413.36 ms │    1341.10 / 1369.70 ±25.80 / 1400.67 ms │    no change │
│ QQuery 5  │        172.96 / 173.76 ±0.93 / 175.52 ms │        172.87 / 173.71 ±1.06 / 175.76 ms │    no change │
│ QQuery 6  │       831.80 / 869.64 ±22.94 / 893.03 ms │       817.74 / 885.67 ±36.62 / 924.77 ms │    no change │
│ QQuery 7  │        343.27 / 346.44 ±2.73 / 351.44 ms │        343.10 / 347.19 ±4.26 / 354.07 ms │    no change │
│ QQuery 8  │        115.88 / 117.19 ±0.88 / 118.45 ms │        115.12 / 116.82 ±1.09 / 118.14 ms │    no change │
│ QQuery 9  │        101.81 / 105.19 ±5.49 / 116.11 ms │       100.38 / 109.94 ±10.62 / 129.61 ms │    no change │
│ QQuery 10 │        105.57 / 107.18 ±0.90 / 108.19 ms │        106.58 / 108.12 ±1.52 / 110.94 ms │    no change │
│ QQuery 11 │        951.72 / 963.82 ±6.42 / 968.96 ms │        964.70 / 976.63 ±7.17 / 986.01 ms │    no change │
│ QQuery 12 │           44.04 / 47.17 ±1.95 / 49.81 ms │           46.18 / 47.24 ±0.73 / 48.28 ms │    no change │
│ QQuery 13 │        403.35 / 405.32 ±1.43 / 407.21 ms │        403.48 / 405.90 ±1.57 / 407.89 ms │    no change │
│ QQuery 14 │     1009.53 / 1015.83 ±5.78 / 1026.74 ms │    1002.02 / 1020.87 ±15.11 / 1047.87 ms │    no change │
│ QQuery 15 │           16.12 / 17.91 ±1.76 / 20.55 ms │           17.16 / 19.05 ±1.98 / 22.65 ms │ 1.06x slower │
│ QQuery 16 │              7.31 / 7.59 ±0.20 / 7.91 ms │              7.88 / 8.59 ±0.73 / 9.70 ms │ 1.13x slower │
│ QQuery 17 │        229.47 / 231.19 ±1.44 / 233.08 ms │        240.36 / 243.29 ±2.25 / 246.13 ms │ 1.05x slower │
│ QQuery 18 │        126.61 / 128.79 ±1.68 / 131.55 ms │        134.44 / 135.44 ±0.69 / 136.56 ms │ 1.05x slower │
│ QQuery 19 │        156.40 / 157.51 ±0.85 / 159.00 ms │        162.88 / 164.97 ±1.38 / 166.21 ms │    no change │
│ QQuery 20 │           13.75 / 14.66 ±0.67 / 15.79 ms │           15.45 / 15.77 ±0.24 / 15.99 ms │ 1.08x slower │
│ QQuery 21 │           19.51 / 20.16 ±0.51 / 20.76 ms │           21.05 / 21.45 ±0.35 / 22.06 ms │ 1.06x slower │
│ QQuery 22 │        481.56 / 489.10 ±4.25 / 493.73 ms │        491.92 / 498.86 ±8.92 / 516.06 ms │    no change │
│ QQuery 23 │        873.33 / 884.81 ±6.37 / 892.89 ms │       887.79 / 896.87 ±10.62 / 916.66 ms │    no change │
│ QQuery 24 │        381.97 / 385.40 ±3.50 / 391.57 ms │        382.98 / 387.53 ±2.66 / 390.54 ms │    no change │
│ QQuery 25 │        340.86 / 344.45 ±2.19 / 346.83 ms │        340.45 / 343.50 ±1.73 / 345.12 ms │    no change │
│ QQuery 26 │           80.84 / 81.79 ±0.82 / 83.32 ms │           81.27 / 82.66 ±0.83 / 83.86 ms │    no change │
│ QQuery 27 │              6.88 / 7.52 ±0.74 / 8.95 ms │              6.87 / 7.23 ±0.48 / 8.16 ms │    no change │
│ QQuery 28 │        148.59 / 150.91 ±1.93 / 153.50 ms │        148.77 / 150.19 ±1.12 / 152.08 ms │    no change │
│ QQuery 29 │        282.92 / 285.15 ±1.50 / 287.45 ms │        283.28 / 284.50 ±1.29 / 286.63 ms │    no change │
│ QQuery 30 │           43.82 / 45.47 ±1.49 / 48.04 ms │           42.04 / 44.32 ±1.23 / 45.71 ms │    no change │
│ QQuery 31 │        169.84 / 172.47 ±2.27 / 176.48 ms │        169.29 / 171.30 ±1.68 / 173.99 ms │    no change │
│ QQuery 32 │           57.32 / 58.30 ±0.57 / 59.04 ms │           57.74 / 58.68 ±0.98 / 60.05 ms │    no change │
│ QQuery 33 │        140.09 / 143.20 ±1.85 / 145.61 ms │        141.03 / 142.63 ±0.92 / 143.86 ms │    no change │
│ QQuery 34 │              6.96 / 7.30 ±0.30 / 7.84 ms │              7.00 / 7.23 ±0.26 / 7.61 ms │    no change │
│ QQuery 35 │        107.83 / 109.50 ±1.10 / 110.70 ms │        106.64 / 109.99 ±2.04 / 112.55 ms │    no change │
│ QQuery 36 │              6.51 / 6.99 ±0.48 / 7.88 ms │              6.53 / 6.71 ±0.20 / 7.07 ms │    no change │
│ QQuery 37 │             8.73 / 9.25 ±0.67 / 10.56 ms │              8.21 / 8.81 ±0.46 / 9.57 ms │    no change │
│ QQuery 38 │           85.23 / 88.80 ±4.94 / 98.57 ms │           84.88 / 87.11 ±3.72 / 94.52 ms │    no change │
│ QQuery 39 │        125.77 / 129.50 ±4.44 / 137.69 ms │        127.22 / 128.59 ±1.12 / 130.62 ms │    no change │
│ QQuery 40 │        111.88 / 117.77 ±5.93 / 128.78 ms │        110.11 / 117.40 ±8.84 / 134.20 ms │    no change │
│ QQuery 41 │           15.66 / 16.18 ±0.59 / 17.31 ms │           14.86 / 16.06 ±1.18 / 18.18 ms │    no change │
│ QQuery 42 │        108.25 / 110.15 ±1.58 / 112.51 ms │        107.58 / 109.71 ±1.45 / 111.73 ms │    no change │
│ QQuery 43 │              5.98 / 6.31 ±0.27 / 6.73 ms │              6.03 / 6.53 ±0.80 / 8.12 ms │    no change │
│ QQuery 44 │           11.71 / 12.19 ±0.68 / 13.53 ms │           11.79 / 12.14 ±0.20 / 12.35 ms │    no change │
│ QQuery 45 │           51.06 / 52.40 ±0.78 / 53.28 ms │           50.16 / 52.05 ±1.38 / 54.23 ms │    no change │
│ QQuery 46 │              8.65 / 8.89 ±0.17 / 9.16 ms │              8.63 / 8.88 ±0.20 / 9.11 ms │    no change │
│ QQuery 47 │        710.73 / 722.40 ±6.30 / 729.86 ms │        733.32 / 745.39 ±6.92 / 754.00 ms │    no change │
│ QQuery 48 │        289.87 / 294.72 ±4.74 / 300.92 ms │        288.52 / 294.73 ±6.45 / 306.78 ms │    no change │
│ QQuery 49 │        251.64 / 252.97 ±1.41 / 255.48 ms │        255.34 / 255.97 ±0.56 / 256.78 ms │    no change │
│ QQuery 50 │        222.58 / 228.41 ±4.03 / 235.01 ms │        226.54 / 233.03 ±5.00 / 240.31 ms │    no change │
│ QQuery 51 │        180.59 / 184.08 ±2.83 / 187.39 ms │        183.67 / 186.36 ±1.96 / 188.77 ms │    no change │
│ QQuery 52 │        107.91 / 109.06 ±0.89 / 110.34 ms │        110.03 / 111.42 ±0.81 / 112.47 ms │    no change │
│ QQuery 53 │        104.18 / 105.27 ±1.15 / 107.27 ms │        105.42 / 106.80 ±1.39 / 109.23 ms │    no change │
│ QQuery 54 │        147.22 / 148.77 ±1.46 / 151.25 ms │        149.43 / 151.07 ±0.93 / 152.31 ms │    no change │
│ QQuery 55 │        108.14 / 109.84 ±1.73 / 112.49 ms │        108.46 / 110.09 ±1.84 / 113.65 ms │    no change │
│ QQuery 56 │        141.89 / 144.13 ±1.81 / 146.71 ms │        142.50 / 144.45 ±1.14 / 145.70 ms │    no change │
│ QQuery 57 │        172.45 / 174.79 ±1.90 / 177.95 ms │        176.55 / 178.66 ±1.32 / 180.38 ms │    no change │
│ QQuery 58 │        292.06 / 297.55 ±2.86 / 299.80 ms │        290.26 / 298.13 ±6.17 / 309.20 ms │    no change │
│ QQuery 59 │        199.68 / 202.81 ±4.20 / 210.76 ms │        196.18 / 201.51 ±2.94 / 205.01 ms │    no change │
│ QQuery 60 │        145.63 / 146.61 ±1.30 / 149.18 ms │        144.93 / 145.97 ±0.58 / 146.66 ms │    no change │
│ QQuery 61 │           13.09 / 13.42 ±0.22 / 13.74 ms │           13.26 / 13.48 ±0.20 / 13.79 ms │    no change │
│ QQuery 62 │      903.17 / 972.36 ±73.07 / 1110.97 ms │       893.58 / 940.09 ±28.77 / 975.15 ms │    no change │
│ QQuery 63 │        104.29 / 106.12 ±1.11 / 107.45 ms │        106.05 / 107.70 ±1.58 / 110.72 ms │    no change │
│ QQuery 64 │        684.26 / 697.36 ±8.94 / 709.76 ms │        684.46 / 694.14 ±8.98 / 708.42 ms │    no change │
│ QQuery 65 │        251.36 / 257.09 ±3.42 / 260.31 ms │        254.62 / 257.41 ±1.68 / 259.77 ms │    no change │
│ QQuery 66 │        239.53 / 250.02 ±7.11 / 258.78 ms │        242.28 / 255.27 ±8.91 / 265.85 ms │    no change │
│ QQuery 67 │        312.00 / 314.58 ±2.53 / 317.92 ms │        315.48 / 320.21 ±4.64 / 327.33 ms │    no change │
│ QQuery 68 │             8.86 / 9.83 ±1.00 / 11.23 ms │            8.85 / 10.18 ±0.80 / 10.96 ms │    no change │
│ QQuery 69 │        101.72 / 103.97 ±1.51 / 106.48 ms │        102.05 / 104.18 ±2.49 / 108.67 ms │    no change │
│ QQuery 70 │        349.78 / 359.97 ±8.05 / 369.60 ms │        341.41 / 352.18 ±7.53 / 362.53 ms │    no change │
│ QQuery 71 │        135.76 / 139.31 ±3.34 / 145.38 ms │        136.47 / 138.33 ±2.01 / 141.85 ms │    no change │
│ QQuery 72 │       610.09 / 628.55 ±11.61 / 639.76 ms │        618.47 / 625.85 ±7.96 / 638.13 ms │    no change │
│ QQuery 73 │              7.84 / 8.48 ±0.63 / 9.55 ms │             6.99 / 9.50 ±2.08 / 13.02 ms │ 1.12x slower │
│ QQuery 74 │        578.61 / 594.14 ±8.66 / 602.76 ms │        598.69 / 607.55 ±7.77 / 620.04 ms │    no change │
│ QQuery 75 │        276.80 / 279.31 ±2.06 / 282.54 ms │        280.18 / 283.49 ±2.96 / 288.22 ms │    no change │
│ QQuery 76 │        133.52 / 135.27 ±1.41 / 137.64 ms │        134.75 / 136.00 ±1.23 / 138.35 ms │    no change │
│ QQuery 77 │        187.89 / 190.27 ±1.66 / 192.45 ms │        188.51 / 190.82 ±2.30 / 194.65 ms │    no change │
│ QQuery 78 │        339.14 / 344.57 ±3.86 / 351.08 ms │        335.65 / 342.70 ±4.54 / 349.86 ms │    no change │
│ QQuery 79 │        235.13 / 237.76 ±1.41 / 239.31 ms │        235.50 / 239.71 ±2.39 / 242.04 ms │    no change │
│ QQuery 80 │        321.80 / 323.60 ±1.62 / 326.10 ms │        318.67 / 323.18 ±2.71 / 327.21 ms │    no change │
│ QQuery 81 │           26.78 / 27.93 ±1.53 / 30.94 ms │           26.92 / 27.55 ±0.60 / 28.44 ms │    no change │
│ QQuery 82 │        200.18 / 203.94 ±2.03 / 205.90 ms │        198.56 / 201.76 ±2.24 / 204.67 ms │    no change │
│ QQuery 83 │           38.87 / 40.48 ±1.80 / 43.37 ms │           38.94 / 40.31 ±0.99 / 41.65 ms │    no change │
│ QQuery 84 │           49.28 / 50.21 ±0.71 / 51.04 ms │           48.23 / 49.02 ±0.65 / 49.90 ms │    no change │
│ QQuery 85 │        146.57 / 149.44 ±1.63 / 151.12 ms │        149.68 / 151.20 ±0.91 / 152.20 ms │    no change │
│ QQuery 86 │           39.01 / 40.64 ±0.95 / 41.99 ms │           38.73 / 41.24 ±1.97 / 44.03 ms │    no change │
│ QQuery 87 │           88.77 / 90.74 ±2.50 / 95.56 ms │           87.04 / 89.26 ±2.80 / 94.74 ms │    no change │
│ QQuery 88 │        100.76 / 101.56 ±0.68 / 102.61 ms │        101.35 / 102.49 ±0.61 / 103.04 ms │    no change │
│ QQuery 89 │        118.94 / 121.61 ±1.96 / 124.28 ms │        120.01 / 120.84 ±0.76 / 121.77 ms │    no change │
│ QQuery 90 │           24.31 / 25.44 ±1.09 / 27.13 ms │           24.29 / 24.83 ±0.42 / 25.47 ms │    no change │
│ QQuery 91 │           60.69 / 63.74 ±1.69 / 65.64 ms │           62.38 / 65.54 ±1.79 / 67.28 ms │    no change │
│ QQuery 92 │           57.86 / 58.66 ±0.63 / 59.28 ms │           58.08 / 59.35 ±1.21 / 61.29 ms │    no change │
│ QQuery 93 │        187.91 / 189.87 ±2.11 / 193.21 ms │        187.01 / 189.06 ±1.65 / 192.02 ms │    no change │
│ QQuery 94 │           60.96 / 62.29 ±1.12 / 63.81 ms │           61.39 / 62.58 ±0.65 / 63.32 ms │    no change │
│ QQuery 95 │        129.33 / 129.84 ±0.28 / 130.12 ms │        129.13 / 130.31 ±1.16 / 132.24 ms │    no change │
│ QQuery 96 │           73.64 / 75.06 ±0.88 / 76.02 ms │           72.31 / 74.37 ±1.26 / 76.19 ms │    no change │
│ QQuery 97 │        124.76 / 127.56 ±2.34 / 131.18 ms │        124.99 / 127.31 ±1.47 / 128.75 ms │    no change │
│ QQuery 98 │        152.55 / 155.51 ±2.30 / 158.79 ms │        152.39 / 155.81 ±2.16 / 159.14 ms │    no change │
│ QQuery 99 │ 10799.15 / 10864.77 ±35.59 / 10898.37 ms │ 10810.76 / 10852.07 ±29.64 / 10887.35 ms │    no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 31773.55ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 31858.42ms │
│ Average Time (HEAD)                             │   320.94ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   321.80ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          7 │
│ Queries with No Change                          │         92 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 159.2s |
| Peak memory | 5.5 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 262.4s |
| CPU sys | 17.8s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 159.6s |
| Peak memory | 5.6 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 263.9s |
| CPU sys | 17.1s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.23 / 4.55 ±6.48 / 17.51 ms │          1.19 / 4.44 ±6.37 / 17.18 ms │     no change │
│ QQuery 1  │        14.40 / 14.69 ±0.23 / 14.92 ms │        14.23 / 14.59 ±0.20 / 14.82 ms │     no change │
│ QQuery 2  │        44.12 / 44.31 ±0.17 / 44.54 ms │        44.11 / 44.28 ±0.11 / 44.47 ms │     no change │
│ QQuery 3  │        41.87 / 44.88 ±2.83 / 48.21 ms │        45.19 / 46.05 ±0.92 / 47.82 ms │     no change │
│ QQuery 4  │     301.57 / 305.93 ±4.33 / 312.84 ms │     283.62 / 292.46 ±7.51 / 301.96 ms │     no change │
│ QQuery 5  │     343.76 / 349.26 ±2.90 / 351.52 ms │     340.90 / 346.09 ±3.65 / 350.98 ms │     no change │
│ QQuery 6  │          5.00 / 7.74 ±2.08 / 10.58 ms │          5.80 / 8.85 ±4.45 / 17.63 ms │  1.14x slower │
│ QQuery 7  │        16.79 / 17.42 ±0.40 / 17.97 ms │        16.80 / 16.96 ±0.18 / 17.30 ms │     no change │
│ QQuery 8  │     417.96 / 426.38 ±7.76 / 436.12 ms │     421.02 / 426.12 ±4.84 / 434.24 ms │     no change │
│ QQuery 9  │     666.36 / 676.75 ±8.41 / 686.88 ms │    655.03 / 663.43 ±10.04 / 682.29 ms │     no change │
│ QQuery 10 │        92.21 / 93.67 ±2.08 / 97.80 ms │       93.16 / 96.01 ±4.39 / 104.70 ms │     no change │
│ QQuery 11 │     104.40 / 105.92 ±1.09 / 107.50 ms │     103.33 / 108.14 ±4.01 / 115.55 ms │     no change │
│ QQuery 12 │     345.12 / 351.62 ±5.08 / 358.61 ms │     338.51 / 349.23 ±6.79 / 358.13 ms │     no change │
│ QQuery 13 │    454.95 / 466.86 ±13.39 / 492.97 ms │    459.79 / 482.10 ±32.75 / 546.44 ms │     no change │
│ QQuery 14 │     344.61 / 348.93 ±4.09 / 356.53 ms │     343.57 / 349.57 ±3.91 / 354.96 ms │     no change │
│ QQuery 15 │    354.08 / 376.26 ±20.59 / 412.84 ms │    353.05 / 376.01 ±22.93 / 414.75 ms │     no change │
│ QQuery 16 │    717.04 / 731.91 ±17.98 / 766.45 ms │    714.64 / 749.96 ±28.15 / 784.01 ms │     no change │
│ QQuery 17 │     711.73 / 718.92 ±4.04 / 723.31 ms │     713.17 / 717.82 ±5.10 / 727.41 ms │     no change │
│ QQuery 18 │ 1419.90 / 1476.97 ±45.70 / 1523.25 ms │  1361.04 / 1376.04 ±9.64 / 1390.97 ms │ +1.07x faster │
│ QQuery 19 │       35.97 / 46.26 ±19.56 / 85.37 ms │        35.78 / 38.31 ±1.93 / 41.70 ms │ +1.21x faster │
│ QQuery 20 │    712.30 / 733.16 ±16.60 / 755.99 ms │     707.03 / 714.65 ±8.59 / 731.42 ms │     no change │
│ QQuery 21 │     767.93 / 773.25 ±4.38 / 778.46 ms │     757.44 / 762.39 ±4.21 / 769.10 ms │     no change │
│ QQuery 22 │  1137.01 / 1149.77 ±8.70 / 1162.30 ms │  1134.94 / 1140.41 ±5.69 / 1150.59 ms │     no change │
│ QQuery 23 │ 3090.99 / 3109.19 ±13.77 / 3131.70 ms │ 3079.16 / 3106.21 ±14.77 / 3123.90 ms │     no change │
│ QQuery 24 │     100.24 / 103.67 ±2.55 / 106.89 ms │     100.11 / 102.94 ±1.70 / 105.16 ms │     no change │
│ QQuery 25 │     139.49 / 141.35 ±1.43 / 143.90 ms │     137.95 / 141.81 ±2.79 / 146.51 ms │     no change │
│ QQuery 26 │      98.88 / 101.65 ±1.99 / 104.34 ms │      98.97 / 103.22 ±2.21 / 104.74 ms │     no change │
│ QQuery 27 │     852.53 / 858.70 ±9.74 / 878.11 ms │     846.79 / 851.11 ±4.26 / 857.73 ms │     no change │
│ QQuery 28 │ 3273.16 / 3306.19 ±16.95 / 3319.21 ms │ 3289.86 / 3315.39 ±20.65 / 3344.13 ms │     no change │
│ QQuery 29 │        50.27 / 54.97 ±4.49 / 62.93 ms │        50.24 / 56.60 ±5.57 / 65.85 ms │     no change │
│ QQuery 30 │     361.99 / 367.45 ±5.71 / 374.86 ms │     354.82 / 363.42 ±7.55 / 376.71 ms │     no change │
│ QQuery 31 │     354.41 / 371.28 ±9.10 / 378.15 ms │    361.59 / 378.76 ±12.41 / 394.38 ms │     no change │
│ QQuery 32 │ 1214.59 / 1260.10 ±34.96 / 1305.41 ms │ 1041.72 / 1056.56 ±15.17 / 1084.81 ms │ +1.19x faster │
│ QQuery 33 │ 1515.52 / 1570.85 ±38.41 / 1634.04 ms │  1469.34 / 1474.38 ±7.24 / 1488.32 ms │ +1.07x faster │
│ QQuery 34 │ 1485.86 / 1532.37 ±26.82 / 1565.04 ms │  1477.09 / 1487.24 ±7.28 / 1496.50 ms │     no change │
│ QQuery 35 │    393.36 / 426.17 ±54.55 / 534.85 ms │     391.43 / 401.95 ±7.89 / 411.93 ms │ +1.06x faster │
│ QQuery 36 │     115.02 / 120.80 ±3.82 / 125.19 ms │     118.07 / 122.65 ±3.64 / 128.83 ms │     no change │
│ QQuery 37 │        49.52 / 51.49 ±1.92 / 55.07 ms │        47.48 / 50.62 ±1.83 / 52.79 ms │     no change │
│ QQuery 38 │        74.07 / 76.49 ±1.25 / 77.52 ms │        75.14 / 77.58 ±1.56 / 79.76 ms │     no change │
│ QQuery 39 │     209.85 / 215.78 ±4.18 / 220.73 ms │     203.18 / 218.92 ±8.79 / 228.06 ms │     no change │
│ QQuery 40 │        24.46 / 25.99 ±1.19 / 27.44 ms │        21.42 / 23.66 ±1.70 / 26.54 ms │ +1.10x faster │
│ QQuery 41 │        20.66 / 22.64 ±2.61 / 27.69 ms │        19.87 / 21.02 ±1.09 / 22.36 ms │ +1.08x faster │
│ QQuery 42 │        19.06 / 19.93 ±0.46 / 20.34 ms │        19.08 / 20.03 ±0.64 / 21.10 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 23002.46ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 22498.00ms │
│ Average Time (HEAD)                             │   534.94ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   523.21ms │
│ Queries Faster                                  │          7 │
│ Queries Slower                                  │          1 │
│ Queries with No Change                          │         35 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 115.8s |
| Peak memory | 36.4 GiB |
| Avg memory | 27.0 GiB |
| CPU user | 1079.4s |
| CPU sys | 98.2s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 113.5s |
| Peak memory | 40.1 GiB |
| Avg memory | 33.2 GiB |
| CPU user | 1075.7s |
| CPU sys | 81.0s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@Dandandan
Contributor

run benchmark clickbench_partitioned clickbench_extended

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234477729-1140-pwvsf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234477729-1141-9x5wm 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: clickbench_extended
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃         feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.60 / 7.07 ±0.83 / 8.73 ms │              6.60 / 7.06 ±0.83 / 8.71 ms │     no change │
│ QQuery 2  │        143.88 / 144.99 ±1.22 / 147.28 ms │        145.49 / 146.39 ±0.75 / 147.36 ms │     no change │
│ QQuery 3  │        114.20 / 115.49 ±1.27 / 117.17 ms │        113.54 / 114.31 ±0.75 / 115.71 ms │     no change │
│ QQuery 4  │    1407.17 / 1446.02 ±28.01 / 1488.27 ms │    1346.64 / 1364.53 ±15.80 / 1393.25 ms │ +1.06x faster │
│ QQuery 5  │        171.78 / 174.25 ±2.64 / 179.25 ms │        172.57 / 174.59 ±1.43 / 176.25 ms │     no change │
│ QQuery 6  │       849.48 / 877.16 ±22.87 / 905.49 ms │       828.07 / 868.05 ±30.14 / 901.99 ms │     no change │
│ QQuery 7  │        343.32 / 346.59 ±2.93 / 350.49 ms │        341.03 / 345.45 ±3.39 / 351.37 ms │     no change │
│ QQuery 8  │        117.21 / 119.39 ±1.66 / 121.19 ms │        117.29 / 118.25 ±1.00 / 119.81 ms │     no change │
│ QQuery 9  │        101.98 / 105.46 ±2.06 / 107.89 ms │        101.30 / 103.91 ±2.47 / 107.10 ms │     no change │
│ QQuery 10 │        105.17 / 106.91 ±1.15 / 108.76 ms │        104.99 / 106.26 ±0.68 / 106.86 ms │     no change │
│ QQuery 11 │       950.10 / 964.75 ±15.56 / 992.71 ms │       952.07 / 966.13 ±10.18 / 977.82 ms │     no change │
│ QQuery 12 │           49.17 / 50.87 ±1.79 / 53.14 ms │           44.24 / 45.58 ±1.32 / 48.07 ms │ +1.12x faster │
│ QQuery 13 │        400.79 / 408.52 ±5.63 / 417.80 ms │        401.25 / 405.70 ±3.28 / 410.32 ms │     no change │
│ QQuery 14 │     1004.14 / 1009.46 ±3.40 / 1013.29 ms │     1004.37 / 1006.86 ±1.62 / 1008.42 ms │     no change │
│ QQuery 15 │           15.60 / 16.29 ±0.72 / 17.64 ms │           15.34 / 16.91 ±1.14 / 18.64 ms │     no change │
│ QQuery 16 │              7.29 / 7.58 ±0.23 / 7.82 ms │              7.31 / 7.77 ±0.29 / 8.12 ms │     no change │
│ QQuery 17 │        229.09 / 231.19 ±1.64 / 233.94 ms │        227.78 / 229.37 ±1.34 / 231.54 ms │     no change │
│ QQuery 18 │        129.42 / 129.76 ±0.34 / 130.31 ms │        126.89 / 128.72 ±1.18 / 130.47 ms │     no change │
│ QQuery 19 │        154.96 / 157.06 ±1.46 / 158.84 ms │        155.11 / 156.36 ±1.00 / 157.47 ms │     no change │
│ QQuery 20 │           13.40 / 14.08 ±0.44 / 14.76 ms │           13.71 / 14.26 ±0.30 / 14.57 ms │     no change │
│ QQuery 21 │           18.94 / 19.61 ±0.35 / 19.91 ms │           19.53 / 19.89 ±0.31 / 20.34 ms │     no change │
│ QQuery 22 │        486.41 / 490.25 ±2.46 / 492.89 ms │        485.59 / 488.57 ±2.14 / 491.78 ms │     no change │
│ QQuery 23 │        881.75 / 888.47 ±6.90 / 897.22 ms │        874.89 / 884.92 ±8.92 / 901.54 ms │     no change │
│ QQuery 24 │        382.00 / 384.91 ±2.95 / 389.94 ms │        381.40 / 383.88 ±3.14 / 389.86 ms │     no change │
│ QQuery 25 │        340.39 / 342.38 ±1.35 / 343.76 ms │        336.89 / 340.18 ±2.75 / 343.84 ms │     no change │
│ QQuery 26 │           82.02 / 82.93 ±0.75 / 84.22 ms │           81.69 / 83.69 ±2.44 / 87.06 ms │     no change │
│ QQuery 27 │              7.14 / 7.67 ±0.78 / 9.20 ms │              6.75 / 6.99 ±0.29 / 7.51 ms │ +1.10x faster │
│ QQuery 28 │        148.11 / 151.11 ±2.46 / 155.58 ms │        148.59 / 150.08 ±0.99 / 151.50 ms │     no change │
│ QQuery 29 │        280.02 / 283.14 ±1.85 / 285.10 ms │        278.47 / 282.20 ±2.15 / 284.39 ms │     no change │
│ QQuery 30 │           43.46 / 46.42 ±1.97 / 48.60 ms │           43.38 / 45.04 ±1.56 / 47.82 ms │     no change │
│ QQuery 31 │        169.78 / 171.51 ±1.02 / 172.58 ms │        171.26 / 173.61 ±1.70 / 175.62 ms │     no change │
│ QQuery 32 │           56.82 / 58.73 ±1.23 / 60.51 ms │           57.19 / 57.73 ±0.63 / 58.94 ms │     no change │
│ QQuery 33 │        141.79 / 142.90 ±0.89 / 144.49 ms │        140.06 / 142.63 ±2.83 / 147.88 ms │     no change │
│ QQuery 34 │              7.10 / 7.27 ±0.16 / 7.54 ms │             7.31 / 8.11 ±1.00 / 10.04 ms │  1.12x slower │
│ QQuery 35 │        105.24 / 108.18 ±1.55 / 109.74 ms │        113.10 / 114.31 ±1.13 / 115.81 ms │  1.06x slower │
│ QQuery 36 │              6.52 / 6.61 ±0.11 / 6.82 ms │              6.69 / 7.12 ±0.48 / 8.04 ms │  1.08x slower │
│ QQuery 37 │             8.66 / 9.39 ±0.80 / 10.84 ms │             8.66 / 9.51 ±0.66 / 10.70 ms │     no change │
│ QQuery 38 │           86.45 / 88.58 ±2.96 / 94.37 ms │           87.34 / 90.25 ±4.29 / 98.75 ms │     no change │
│ QQuery 39 │        125.56 / 128.65 ±2.68 / 132.66 ms │        126.42 / 130.74 ±3.44 / 136.60 ms │     no change │
│ QQuery 40 │        108.75 / 116.53 ±6.97 / 129.42 ms │        120.88 / 127.63 ±9.32 / 145.94 ms │  1.10x slower │
│ QQuery 41 │           14.34 / 15.28 ±0.58 / 16.07 ms │           14.30 / 15.82 ±1.19 / 17.47 ms │     no change │
│ QQuery 42 │        108.24 / 109.86 ±1.55 / 112.63 ms │        108.34 / 109.85 ±0.93 / 110.82 ms │     no change │
│ QQuery 43 │              6.00 / 6.12 ±0.12 / 6.31 ms │              5.93 / 6.03 ±0.12 / 6.27 ms │     no change │
│ QQuery 44 │           11.93 / 12.85 ±0.98 / 14.29 ms │           11.79 / 12.23 ±0.34 / 12.81 ms │     no change │
│ QQuery 45 │           51.59 / 52.20 ±0.71 / 53.58 ms │           50.50 / 51.48 ±0.80 / 52.59 ms │     no change │
│ QQuery 46 │              8.37 / 8.86 ±0.32 / 9.30 ms │              8.22 / 8.55 ±0.21 / 8.79 ms │     no change │
│ QQuery 47 │        730.15 / 735.98 ±6.90 / 748.40 ms │        705.59 / 712.82 ±4.86 / 720.66 ms │     no change │
│ QQuery 48 │        293.14 / 296.48 ±3.12 / 301.21 ms │        294.01 / 296.74 ±2.30 / 300.54 ms │     no change │
│ QQuery 49 │        250.28 / 253.44 ±3.22 / 259.53 ms │        251.81 / 253.12 ±1.05 / 254.43 ms │     no change │
│ QQuery 50 │        226.01 / 230.32 ±4.01 / 235.24 ms │        220.67 / 223.64 ±2.76 / 228.09 ms │     no change │
│ QQuery 51 │        183.04 / 185.25 ±2.09 / 189.07 ms │        178.31 / 181.98 ±1.95 / 184.09 ms │     no change │
│ QQuery 52 │        107.65 / 110.58 ±3.03 / 116.28 ms │        108.42 / 110.26 ±2.22 / 114.63 ms │     no change │
│ QQuery 53 │        102.87 / 103.59 ±0.90 / 105.24 ms │        103.27 / 104.20 ±1.01 / 106.01 ms │     no change │
│ QQuery 54 │        144.26 / 147.65 ±2.00 / 150.36 ms │        145.75 / 148.22 ±2.27 / 152.02 ms │     no change │
│ QQuery 55 │        107.20 / 108.13 ±0.76 / 109.28 ms │        107.44 / 109.68 ±1.38 / 111.81 ms │     no change │
│ QQuery 56 │        141.05 / 142.32 ±1.01 / 144.15 ms │        140.48 / 142.52 ±1.42 / 144.84 ms │     no change │
│ QQuery 57 │        172.82 / 175.12 ±1.39 / 176.89 ms │        174.64 / 176.19 ±1.47 / 178.51 ms │     no change │
│ QQuery 58 │        286.62 / 296.24 ±6.87 / 305.53 ms │       285.31 / 298.28 ±13.20 / 317.51 ms │     no change │
│ QQuery 59 │        199.23 / 200.95 ±1.69 / 204.20 ms │        195.69 / 199.36 ±3.05 / 203.59 ms │     no change │
│ QQuery 60 │        144.67 / 145.48 ±0.66 / 146.41 ms │        142.34 / 143.44 ±1.31 / 145.79 ms │     no change │
│ QQuery 61 │           12.99 / 13.45 ±0.35 / 13.95 ms │           12.73 / 13.06 ±0.22 / 13.34 ms │     no change │
│ QQuery 62 │       904.73 / 932.43 ±16.55 / 947.84 ms │       901.55 / 934.20 ±25.10 / 966.87 ms │     no change │
│ QQuery 63 │        103.15 / 106.72 ±3.02 / 110.78 ms │        103.85 / 105.22 ±1.02 / 106.83 ms │     no change │
│ QQuery 64 │        683.07 / 685.79 ±2.81 / 690.87 ms │        680.75 / 687.10 ±3.46 / 690.59 ms │     no change │
│ QQuery 65 │        246.22 / 253.56 ±4.22 / 258.12 ms │        252.05 / 256.03 ±3.55 / 262.20 ms │     no change │
│ QQuery 66 │       234.63 / 253.48 ±10.76 / 265.83 ms │        247.60 / 256.44 ±7.16 / 265.72 ms │     no change │
│ QQuery 67 │        307.25 / 316.77 ±5.71 / 323.28 ms │       319.99 / 334.45 ±14.63 / 357.79 ms │  1.06x slower │
│ QQuery 68 │           10.40 / 11.74 ±1.30 / 14.02 ms │            9.81 / 10.88 ±0.79 / 12.24 ms │ +1.08x faster │
│ QQuery 69 │        100.32 / 103.93 ±2.11 / 106.32 ms │        102.81 / 105.31 ±1.32 / 106.40 ms │     no change │
│ QQuery 70 │       342.77 / 354.40 ±11.96 / 373.37 ms │        337.23 / 344.94 ±6.28 / 351.91 ms │     no change │
│ QQuery 71 │        134.41 / 137.03 ±1.43 / 138.73 ms │        136.55 / 137.88 ±1.11 / 139.85 ms │     no change │
│ QQuery 72 │        611.97 / 618.14 ±5.14 / 627.10 ms │       605.50 / 623.82 ±12.23 / 637.66 ms │     no change │
│ QQuery 73 │              7.45 / 8.17 ±0.58 / 9.07 ms │             7.32 / 8.36 ±1.07 / 10.11 ms │     no change │
│ QQuery 74 │        581.34 / 592.24 ±8.46 / 606.84 ms │        574.83 / 587.08 ±9.45 / 597.06 ms │     no change │
│ QQuery 75 │        277.59 / 280.04 ±2.61 / 285.00 ms │        275.81 / 279.40 ±2.65 / 283.37 ms │     no change │
│ QQuery 76 │        131.53 / 133.57 ±1.57 / 136.07 ms │        131.98 / 133.99 ±1.18 / 135.67 ms │     no change │
│ QQuery 77 │        188.69 / 190.76 ±1.26 / 192.15 ms │        189.33 / 190.25 ±0.58 / 191.04 ms │     no change │
│ QQuery 78 │        340.49 / 343.98 ±3.33 / 350.02 ms │        339.79 / 342.76 ±2.82 / 346.34 ms │     no change │
│ QQuery 79 │        233.09 / 234.57 ±1.62 / 237.15 ms │        233.94 / 236.02 ±1.24 / 237.23 ms │     no change │
│ QQuery 80 │        320.55 / 323.94 ±2.76 / 327.30 ms │        321.31 / 326.39 ±2.91 / 329.11 ms │     no change │
│ QQuery 81 │           26.33 / 27.38 ±0.68 / 28.20 ms │           26.48 / 27.22 ±0.62 / 28.21 ms │     no change │
│ QQuery 82 │        197.82 / 199.31 ±2.29 / 203.86 ms │        198.55 / 200.71 ±2.16 / 203.59 ms │     no change │
│ QQuery 83 │           39.37 / 41.36 ±2.24 / 45.22 ms │           38.52 / 39.36 ±1.33 / 42.00 ms │     no change │
│ QQuery 84 │           48.63 / 49.58 ±0.88 / 50.80 ms │           48.77 / 49.40 ±0.39 / 49.92 ms │     no change │
│ QQuery 85 │        147.39 / 148.66 ±1.16 / 150.63 ms │        147.83 / 148.63 ±0.66 / 149.75 ms │     no change │
│ QQuery 86 │           38.52 / 40.01 ±1.14 / 41.54 ms │           39.86 / 40.90 ±0.87 / 42.04 ms │     no change │
│ QQuery 87 │           85.60 / 88.73 ±3.70 / 95.81 ms │           85.60 / 88.35 ±3.32 / 94.88 ms │     no change │
│ QQuery 88 │        100.63 / 101.95 ±0.96 / 103.51 ms │         99.93 / 101.22 ±1.04 / 102.68 ms │     no change │
│ QQuery 89 │        118.81 / 119.79 ±1.26 / 122.07 ms │        118.70 / 119.92 ±1.02 / 121.42 ms │     no change │
│ QQuery 90 │           23.99 / 24.20 ±0.20 / 24.55 ms │           22.99 / 24.11 ±0.67 / 24.90 ms │     no change │
│ QQuery 91 │           61.98 / 64.38 ±1.66 / 66.73 ms │           62.03 / 64.30 ±2.30 / 68.74 ms │     no change │
│ QQuery 92 │           57.67 / 58.07 ±0.31 / 58.44 ms │           57.81 / 59.39 ±1.21 / 61.43 ms │     no change │
│ QQuery 93 │        184.73 / 185.90 ±0.88 / 187.18 ms │        185.38 / 188.28 ±1.90 / 190.69 ms │     no change │
│ QQuery 94 │           61.74 / 62.66 ±0.75 / 63.87 ms │           60.38 / 62.32 ±1.48 / 64.92 ms │     no change │
│ QQuery 95 │        127.91 / 128.82 ±0.55 / 129.40 ms │        127.77 / 128.56 ±0.72 / 129.74 ms │     no change │
│ QQuery 96 │           73.22 / 74.44 ±0.77 / 75.59 ms │           73.32 / 74.75 ±1.10 / 76.65 ms │     no change │
│ QQuery 97 │        125.16 / 126.41 ±0.79 / 127.42 ms │        124.06 / 127.60 ±2.47 / 130.65 ms │     no change │
│ QQuery 98 │        154.18 / 156.03 ±1.73 / 159.27 ms │        153.08 / 156.89 ±2.23 / 159.74 ms │     no change │
│ QQuery 99 │ 10778.40 / 10822.79 ±36.20 / 10879.92 ms │ 10738.74 / 10797.52 ±47.31 / 10877.80 ms │     no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 31720.03ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 31590.96ms │
│ Average Time (HEAD)                             │   320.40ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   319.10ms │
│ Queries Faster                                  │          4 │
│ Queries Slower                                  │          5 │
│ Queries with No Change                          │         90 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 158.9s |
| Peak memory | 5.5 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 261.6s |
| CPU sys | 17.7s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 158.3s |
| Peak memory | 5.5 GiB |
| Avg memory | 4.7 GiB |
| CPU user | 260.4s |
| CPU sys | 17.4s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      feat_reorder-row-groups-by-stats ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.47 ±6.41 / 17.29 ms │          1.18 / 4.49 ±6.46 / 17.40 ms │    no change │
│ QQuery 1  │        14.15 / 14.60 ±0.26 / 14.84 ms │        14.15 / 14.56 ±0.22 / 14.82 ms │    no change │
│ QQuery 2  │        44.34 / 44.86 ±0.43 / 45.63 ms │        43.30 / 43.63 ±0.25 / 44.03 ms │    no change │
│ QQuery 3  │        44.54 / 45.82 ±1.10 / 47.71 ms │        43.32 / 44.20 ±1.01 / 46.01 ms │    no change │
│ QQuery 4  │     292.18 / 299.58 ±6.01 / 307.53 ms │     286.50 / 297.56 ±6.69 / 305.63 ms │    no change │
│ QQuery 5  │     347.46 / 350.76 ±2.10 / 353.17 ms │     346.62 / 348.87 ±1.91 / 351.08 ms │    no change │
│ QQuery 6  │          5.72 / 7.22 ±1.67 / 10.45 ms │         5.59 / 10.51 ±5.76 / 21.61 ms │ 1.46x slower │
│ QQuery 7  │        16.96 / 17.07 ±0.12 / 17.27 ms │        16.65 / 16.92 ±0.19 / 17.22 ms │    no change │
│ QQuery 8  │     417.45 / 427.68 ±7.65 / 440.48 ms │     426.14 / 431.89 ±5.38 / 441.00 ms │    no change │
│ QQuery 9  │     677.82 / 684.73 ±7.88 / 698.33 ms │    648.52 / 655.74 ±10.01 / 675.29 ms │    no change │
│ QQuery 10 │        94.83 / 95.71 ±0.80 / 97.09 ms │        90.54 / 93.26 ±2.43 / 97.71 ms │    no change │
│ QQuery 11 │     107.39 / 107.95 ±0.67 / 108.92 ms │     104.16 / 105.15 ±0.72 / 106.34 ms │    no change │
│ QQuery 12 │     349.32 / 356.12 ±4.19 / 361.45 ms │     338.19 / 342.31 ±2.21 / 344.83 ms │    no change │
│ QQuery 13 │    452.34 / 466.84 ±13.61 / 486.92 ms │    441.75 / 464.03 ±17.71 / 491.08 ms │    no change │
│ QQuery 14 │     348.39 / 351.36 ±3.27 / 356.18 ms │     348.22 / 351.57 ±1.94 / 353.55 ms │    no change │
│ QQuery 15 │    357.19 / 373.85 ±17.96 / 408.62 ms │     362.86 / 368.16 ±5.64 / 376.07 ms │    no change │
│ QQuery 16 │     714.87 / 726.32 ±6.98 / 736.12 ms │    741.65 / 757.99 ±13.25 / 781.76 ms │    no change │
│ QQuery 17 │    716.44 / 748.86 ±25.39 / 773.58 ms │     721.88 / 729.91 ±6.46 / 738.41 ms │    no change │
│ QQuery 18 │ 1373.87 / 1427.78 ±45.55 / 1482.21 ms │ 1434.15 / 1503.93 ±35.02 / 1525.61 ms │ 1.05x slower │
│ QQuery 19 │        35.59 / 36.41 ±0.62 / 37.03 ms │        36.31 / 38.22 ±1.99 / 41.73 ms │    no change │
│ QQuery 20 │    713.38 / 725.99 ±13.06 / 742.01 ms │    716.03 / 731.70 ±15.57 / 761.50 ms │    no change │
│ QQuery 21 │     765.34 / 768.87 ±3.09 / 772.97 ms │     762.06 / 764.19 ±1.70 / 767.11 ms │    no change │
│ QQuery 22 │  1134.09 / 1142.01 ±5.27 / 1147.72 ms │  1132.10 / 1137.95 ±4.16 / 1143.66 ms │    no change │
│ QQuery 23 │ 3094.85 / 3120.29 ±14.36 / 3137.68 ms │ 3077.09 / 3115.46 ±20.88 / 3134.31 ms │    no change │
│ QQuery 24 │     100.75 / 103.14 ±1.95 / 106.05 ms │     100.97 / 103.96 ±2.97 / 108.54 ms │    no change │
│ QQuery 25 │     139.98 / 141.47 ±1.44 / 144.06 ms │     138.30 / 140.65 ±1.57 / 142.80 ms │    no change │
│ QQuery 26 │     101.23 / 102.52 ±0.77 / 103.55 ms │     102.40 / 104.22 ±1.40 / 105.89 ms │    no change │
│ QQuery 27 │     855.42 / 859.58 ±5.27 / 869.85 ms │     855.77 / 861.08 ±5.43 / 869.79 ms │    no change │
│ QQuery 28 │ 3284.27 / 3308.55 ±14.18 / 3325.76 ms │ 3289.54 / 3316.91 ±14.83 / 3330.08 ms │    no change │
│ QQuery 29 │        50.39 / 55.78 ±5.11 / 62.80 ms │        51.97 / 56.29 ±4.39 / 63.23 ms │    no change │
│ QQuery 30 │     357.77 / 370.56 ±7.03 / 377.46 ms │     362.16 / 368.32 ±5.55 / 378.64 ms │    no change │
│ QQuery 31 │    363.55 / 385.00 ±12.49 / 398.04 ms │     398.54 / 401.59 ±2.74 / 405.02 ms │    no change │
│ QQuery 32 │ 1034.15 / 1059.31 ±22.35 / 1100.09 ms │ 1173.83 / 1288.89 ±81.25 / 1419.40 ms │ 1.22x slower │
│ QQuery 33 │ 1472.92 / 1487.84 ±11.01 / 1499.14 ms │ 1466.40 / 1513.38 ±43.97 / 1593.67 ms │    no change │
│ QQuery 34 │ 1464.67 / 1499.90 ±31.40 / 1548.80 ms │ 1475.45 / 1491.40 ±14.74 / 1517.30 ms │    no change │
│ QQuery 35 │     390.93 / 396.99 ±5.12 / 404.97 ms │     392.12 / 396.84 ±3.54 / 401.88 ms │    no change │
│ QQuery 36 │     120.42 / 122.98 ±1.62 / 125.38 ms │     119.20 / 123.01 ±3.25 / 127.60 ms │    no change │
│ QQuery 37 │        49.66 / 50.72 ±1.27 / 53.16 ms │        47.35 / 50.08 ±1.55 / 51.79 ms │    no change │
│ QQuery 38 │        76.35 / 78.01 ±1.50 / 80.66 ms │        76.64 / 77.90 ±0.90 / 78.73 ms │    no change │
│ QQuery 39 │     208.12 / 219.98 ±6.84 / 229.12 ms │     220.84 / 223.45 ±1.88 / 225.94 ms │    no change │
│ QQuery 40 │        24.82 / 25.18 ±0.37 / 25.85 ms │        24.34 / 26.23 ±2.29 / 30.09 ms │    no change │
│ QQuery 41 │        20.47 / 21.79 ±1.17 / 23.54 ms │        20.58 / 21.45 ±0.94 / 23.06 ms │    no change │
│ QQuery 42 │        19.76 / 20.16 ±0.31 / 20.63 ms │        19.68 / 20.30 ±0.47 / 21.02 ms │    no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 22654.62ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 22958.16ms │
│ Average Time (HEAD)                             │   526.85ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   533.91ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          3 │
│ Queries with No Change                          │         40 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 114.5s |
| Peak memory | 42.0 GiB |
| Avg memory | 32.4 GiB |
| CPU user | 1080.8s |
| CPU sys | 84.9s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 115.9s |
| Peak memory | 37.7 GiB |
| Avg memory | 28.3 GiB |
| CPU user | 1081.1s |
| CPU sys | 96.2s |
| Peak spill | 0 B |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.34 / 4.79 ±6.61 / 18.02 ms │          1.22 / 4.58 ±6.53 / 17.64 ms │     no change │
│ QQuery 1  │        15.01 / 15.50 ±0.43 / 16.28 ms │        14.27 / 14.85 ±0.31 / 15.10 ms │     no change │
│ QQuery 2  │        45.69 / 46.01 ±0.28 / 46.38 ms │        44.25 / 44.67 ±0.33 / 45.07 ms │     no change │
│ QQuery 3  │        45.23 / 49.08 ±3.08 / 53.24 ms │        44.46 / 47.22 ±1.51 / 48.61 ms │     no change │
│ QQuery 4  │    307.65 / 329.18 ±15.17 / 353.16 ms │    333.50 / 349.92 ±10.12 / 363.58 ms │  1.06x slower │
│ QQuery 5  │     383.18 / 390.58 ±5.85 / 398.30 ms │    375.25 / 389.29 ±13.55 / 414.94 ms │     no change │
│ QQuery 6  │          5.21 / 7.51 ±2.87 / 13.10 ms │           5.67 / 7.06 ±1.14 / 8.47 ms │ +1.06x faster │
│ QQuery 7  │        17.58 / 18.20 ±0.57 / 19.20 ms │        17.81 / 21.41 ±6.39 / 34.18 ms │  1.18x slower │
│ QQuery 8  │     467.34 / 477.25 ±9.77 / 495.30 ms │    473.21 / 490.70 ±19.41 / 524.50 ms │     no change │
│ QQuery 9  │    699.39 / 729.91 ±22.29 / 765.71 ms │    745.55 / 764.99 ±14.67 / 789.00 ms │     no change │
│ QQuery 10 │      99.31 / 102.80 ±4.68 / 111.76 ms │       95.14 / 99.07 ±3.46 / 104.88 ms │     no change │
│ QQuery 11 │     107.63 / 109.49 ±1.08 / 110.50 ms │     110.95 / 113.80 ±2.44 / 117.91 ms │     no change │
│ QQuery 12 │     389.20 / 393.99 ±3.59 / 399.55 ms │    378.13 / 397.72 ±15.36 / 416.18 ms │     no change │
│ QQuery 13 │    497.82 / 519.79 ±18.11 / 553.00 ms │    507.72 / 534.68 ±18.52 / 564.44 ms │     no change │
│ QQuery 14 │    356.26 / 382.90 ±14.21 / 394.37 ms │     378.78 / 390.21 ±8.26 / 404.19 ms │     no change │
│ QQuery 15 │    397.89 / 421.76 ±20.69 / 455.22 ms │    406.54 / 430.62 ±31.51 / 491.99 ms │     no change │
│ QQuery 16 │    817.79 / 843.15 ±20.23 / 870.39 ms │    795.58 / 834.18 ±21.31 / 858.28 ms │     no change │
│ QQuery 17 │    769.11 / 793.23 ±12.93 / 806.04 ms │    790.19 / 823.71 ±33.40 / 886.71 ms │     no change │
│ QQuery 18 │ 1592.82 / 1638.07 ±31.84 / 1675.50 ms │ 1536.04 / 1625.95 ±49.00 / 1673.65 ms │     no change │
│ QQuery 19 │        36.17 / 38.43 ±2.73 / 41.97 ms │       39.25 / 52.79 ±14.25 / 76.08 ms │  1.37x slower │
│ QQuery 20 │    742.56 / 763.72 ±21.08 / 796.51 ms │    747.36 / 771.03 ±35.37 / 841.37 ms │     no change │
│ QQuery 21 │     787.42 / 799.07 ±8.52 / 810.66 ms │     794.97 / 798.60 ±2.97 / 803.45 ms │     no change │
│ QQuery 22 │  1173.63 / 1184.40 ±7.48 / 1192.57 ms │  1187.50 / 1195.28 ±6.21 / 1202.63 ms │     no change │
│ QQuery 23 │ 3281.57 / 3306.51 ±21.44 / 3343.54 ms │ 3275.73 / 3301.49 ±20.45 / 3332.33 ms │     no change │
│ QQuery 24 │     108.95 / 111.32 ±1.91 / 114.39 ms │     107.23 / 110.08 ±3.29 / 116.30 ms │     no change │
│ QQuery 25 │     144.33 / 146.42 ±1.45 / 148.05 ms │     143.15 / 145.40 ±1.34 / 146.55 ms │     no change │
│ QQuery 26 │     107.01 / 108.68 ±1.66 / 111.45 ms │     105.69 / 108.26 ±1.82 / 110.40 ms │     no change │
│ QQuery 27 │     883.03 / 891.30 ±4.95 / 898.28 ms │    874.98 / 887.56 ±12.77 / 911.22 ms │     no change │
│ QQuery 28 │ 3386.14 / 3425.51 ±28.74 / 3464.68 ms │ 3398.98 / 3422.81 ±12.65 / 3436.83 ms │     no change │
│ QQuery 29 │        53.27 / 58.56 ±6.00 / 69.01 ms │        52.73 / 57.20 ±4.79 / 64.77 ms │     no change │
│ QQuery 30 │     405.31 / 409.26 ±4.00 / 416.32 ms │     393.87 / 407.38 ±7.40 / 414.80 ms │     no change │
│ QQuery 31 │    383.84 / 403.31 ±16.85 / 432.32 ms │    397.47 / 427.43 ±17.62 / 452.58 ms │  1.06x slower │
│ QQuery 32 │ 1072.71 / 1165.40 ±47.01 / 1203.37 ms │ 1232.95 / 1400.42 ±92.57 / 1512.21 ms │  1.20x slower │
│ QQuery 33 │ 1640.38 / 1658.01 ±11.14 / 1675.05 ms │ 1633.63 / 1658.56 ±17.62 / 1686.57 ms │     no change │
│ QQuery 34 │ 1681.98 / 1711.27 ±17.86 / 1732.02 ms │ 1664.80 / 1679.12 ±11.31 / 1694.41 ms │     no change │
│ QQuery 35 │    454.45 / 479.47 ±16.71 / 500.66 ms │    462.20 / 481.40 ±13.83 / 502.10 ms │     no change │
│ QQuery 36 │     122.21 / 128.91 ±3.68 / 132.42 ms │     124.76 / 133.75 ±5.72 / 142.63 ms │     no change │
│ QQuery 37 │        52.60 / 56.72 ±3.07 / 61.58 ms │        52.08 / 54.36 ±1.53 / 56.64 ms │     no change │
│ QQuery 38 │        77.39 / 80.30 ±1.75 / 82.02 ms │        79.32 / 81.94 ±2.16 / 84.78 ms │     no change │
│ QQuery 39 │     242.23 / 248.31 ±4.99 / 255.10 ms │     246.21 / 254.21 ±7.55 / 264.73 ms │     no change │
│ QQuery 40 │        28.18 / 30.63 ±1.34 / 32.00 ms │        24.84 / 27.65 ±1.91 / 29.46 ms │ +1.11x faster │
│ QQuery 41 │        22.58 / 23.91 ±0.89 / 25.28 ms │        22.53 / 23.86 ±1.25 / 26.03 ms │     no change │
│ QQuery 42 │        21.19 / 22.21 ±1.14 / 24.33 ms │        21.33 / 22.81 ±1.23 / 24.84 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 24524.79ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 24888.03ms │
│ Average Time (HEAD)                             │   570.34ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   578.79ms │
│ Queries Faster                                  │          2 │
│ Queries Slower                                  │          5 │
│ Queries with No Change                          │         36 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 123.9s |
| Peak memory | 42.3 GiB |
| Avg memory | 31.0 GiB |
| CPU user | 1165.0s |
| CPU sys | 98.5s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 125.7s |
| Peak memory | 40.7 GiB |
| Avg memory | 29.1 GiB |
| CPU user | 1166.3s |
| CPU sys | 111.6s |
| Peak spill | 0 B |


@Dandandan
Contributor

I wonder if the ordering should be done before the files / row groups are assigned to partitions, so that they are sorted globally instead of just per partition. It seems they are currently sorted within each partition, which should help, but perhaps not nearly as much as it would if all the partitions contained the optimal row groups.

This would also help in the case of #21581
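
To make the difference concrete, here is a minimal sketch of the two orderings for an ascending sort. It is not DataFusion code: `RowGroup`, `sort_within_partitions`, and `sort_then_assign` are hypothetical names, the round-robin assignment is an assumption about how a global order might be distributed, and `n_partitions` is assumed to be greater than zero.

```rust
/// Hypothetical stand-in for one row group: its index in the file and the
/// minimum value of the sort column taken from its statistics.
#[derive(Debug, Clone, PartialEq)]
struct RowGroup {
    index: usize,
    min: i64,
}

/// Per-partition ordering: the row groups are already assigned, and each
/// partition sorts only what it was given (ascending, by min).
fn sort_within_partitions(mut partitions: Vec<Vec<RowGroup>>) -> Vec<Vec<RowGroup>> {
    for partition in partitions.iter_mut() {
        partition.sort_by_key(|rg| rg.min);
    }
    partitions
}

/// Global ordering: sort every row group first, then deal them out
/// round-robin so each partition starts with one of the best row groups.
/// Assumes `n_partitions > 0`.
fn sort_then_assign(mut row_groups: Vec<RowGroup>, n_partitions: usize) -> Vec<Vec<RowGroup>> {
    row_groups.sort_by_key(|rg| rg.min);
    let mut partitions: Vec<Vec<RowGroup>> = vec![Vec::new(); n_partitions];
    for (i, rg) in row_groups.into_iter().enumerate() {
        partitions[i % n_partitions].push(rg);
    }
    partitions
}

fn main() {
    // Four row groups in file order, with made-up minimum values.
    let rgs: Vec<RowGroup> = [30, 40, 10, 20]
        .iter()
        .enumerate()
        .map(|(index, &min)| RowGroup { index, min })
        .collect();

    // Assigned in file order first: partition 0 holds mins {30, 40},
    // partition 1 holds mins {10, 20}; each is then sorted locally.
    let local = sort_within_partitions(vec![rgs[..2].to_vec(), rgs[2..].to_vec()]);
    // Sorted globally first, then assigned.
    let global = sort_then_assign(rgs, 2);

    for (name, parts) in [("per-partition", &local), ("global", &global)] {
        let first: Vec<i64> = parts.iter().map(|p| p[0].min).collect();
        println!("{name}: first min read by each partition = {first:?}");
    }
}
```

In this toy input, per-partition sorting leaves partition 0 starting at a row group whose minimum is 30, while the global variant has the two partitions start at 10 and 20, so both begin with candidates that could tighten a TopK threshold early.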

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃        feat_reorder-row-groups-by-stats ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0  │     806.64 / 824.04 ±15.06 / 844.04 ms │       818.76 / 831.86 ±9.65 / 847.07 ms │ no change │
│ QQuery 1  │      207.91 / 208.37 ±0.34 / 208.85 ms │       208.05 / 209.42 ±1.18 / 211.29 ms │ no change │
│ QQuery 2  │      493.00 / 495.57 ±2.04 / 499.15 ms │       501.52 / 504.02 ±1.65 / 505.95 ms │ no change │
│ QQuery 3  │      313.03 / 314.64 ±0.96 / 315.57 ms │       313.38 / 315.81 ±1.51 / 317.65 ms │ no change │
│ QQuery 4  │     656.64 / 674.45 ±10.93 / 686.40 ms │       663.78 / 674.03 ±8.68 / 688.82 ms │ no change │
│ QQuery 5  │ 9437.73 / 9707.73 ±166.88 / 9887.44 ms │ 9679.36 / 9939.30 ±174.56 / 10160.05 ms │ no change │
│ QQuery 6  │  1002.26 / 1011.57 ±14.99 / 1041.49 ms │     997.60 / 1006.50 ±9.67 / 1023.43 ms │ no change │
│ QQuery 7  │     773.67 / 806.98 ±35.77 / 873.62 ms │       778.19 / 786.06 ±5.20 / 792.91 ms │ no change │
│ QQuery 8  │      397.92 / 404.38 ±5.08 / 412.20 ms │       398.58 / 404.24 ±5.67 / 415.04 ms │ no change │
│ QQuery 9  │  2807.44 / 2826.33 ±16.14 / 2853.16 ms │   2754.46 / 2797.70 ±24.98 / 2824.10 ms │ no change │
│ QQuery 10 │      633.75 / 639.16 ±5.96 / 648.49 ms │      631.36 / 642.65 ±13.99 / 670.06 ms │ no change │
│ QQuery 11 │  2047.27 / 2070.44 ±19.89 / 2101.14 ms │   2049.92 / 2079.78 ±21.09 / 2115.19 ms │ no change │
│ QQuery 12 │      200.39 / 202.67 ±2.01 / 205.97 ms │       194.24 / 202.01 ±6.44 / 213.63 ms │ no change │
└───────────┴────────────────────────────────────────┴─────────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 20186.32ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 20393.39ms │
│ Average Time (HEAD)                             │  1552.79ms │
│ Average Time (feat_reorder-row-groups-by-stats) │  1568.72ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          0 │
│ Queries with No Change                          │         13 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_extended — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 101.8s |
| Peak memory | 32.6 GiB |
| Avg memory | 27.4 GiB |
| CPU user | 981.3s |
| CPU sys | 48.2s |
| Peak spill | 0 B |

clickbench_extended — branch

| Metric | Value |
|---|---|
| Wall time | 102.8s |
| Peak memory | 34.1 GiB |
| Avg memory | 29.7 GiB |
| CPU user | 986.9s |
| CPU sys | 46.1s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Apr 13, 2026

> I wonder if the ordering should be done before the files / row groups are assigned to partitions? So they are more globally sorted instead of just per partition? It seems now they are sorted within each partition, which should help, but perhaps not nearly as much as it would be if all the partitions contain the optimal row groups?
>
> This would also help in the case of #21581

Great point @Dandandan — you're right that global reorder is much more effective than per-partition reorder. With global reorder + round-robin distribution, each partition's first RG is close to the global optimum, so:

  1. All partitions quickly converge to tight local TopK thresholds in parallel
  2. SPM merging finishes faster because each partition's first few batches contain optimal values → LIMIT can be satisfied with minimal reads across partitions

The current per-partition reorder is limited because even after sorting, partition 0's "best" RG might be much worse than the global best (which may have landed in partition 2).

Moving to global reorder would require changes at the planning / EnforceDistribution layer to load RG statistics and redistribute RGs across partitions. I'd prefer to keep this PR as an incremental step (per-partition) and track global reorder as a follow-up — it would benefit both #21317 and #21581.

Does this make sense?
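
For reference, the global reorder + round-robin idea discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RowGroupRef` and `distribute_globally` are hypothetical names, statistics are simplified to `i64` min/max on the first sort column, and nothing here reflects the actual planning / EnforceDistribution code.

```rust
/// One row group identified by (file, row_group), with min/max statistics
/// for the first sort column. Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowGroupRef {
    pub file: usize,
    pub row_group: usize,
    pub min: i64,
    pub max: i64,
}

/// Sorts all row groups globally (min for ASC, max for DESC), then deals
/// them out round-robin so every partition starts with a near-optimal one.
/// Returns, per partition, the (file, row_group) pairs in read order.
pub fn distribute_globally(
    mut row_groups: Vec<RowGroupRef>,
    partitions: usize,
    descending: bool,
) -> Vec<Vec<(usize, usize)>> {
    if partitions == 0 {
        return Vec::new();
    }
    if descending {
        // DESC: row groups with the largest max come first.
        row_groups.sort_by(|a, b| b.max.cmp(&a.max));
    } else {
        // ASC: row groups with the smallest min come first.
        row_groups.sort_by(|a, b| a.min.cmp(&b.min));
    }
    let mut out = vec![Vec::new(); partitions];
    for (i, rg) in row_groups.iter().enumerate() {
        // Round-robin: the i-th best row group goes to partition i % partitions.
        out[i % partitions].push((rg.file, rg.row_group));
    }
    out
}
```

With three partitions, the three globally best row groups each become the first read of a different partition, which is the property that lets every partition tighten its local TopK threshold early.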

For overlapping row group ranges, sorting by min for DESC can pick
a worse first RG. Example: RG0(50-60) vs RG1(40-100) — min DESC
picks RG0 first (max=60), but RG1 contains the largest values (max=100).

Use min for ASC and max for DESC to correctly prioritize the row
group most likely to contain the optimal values for TopK.
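
A minimal sketch of the min-for-ASC / max-for-DESC rule from the commit message above. The type `RowGroupStats` and the function `order_row_groups` are hypothetical and assume `i64` statistics; the PR's actual `reorder_by_statistics` on `PreparedAccessPlan` may look quite different.

```rust
/// Min/max statistics of the first sort column for one row group.
/// Hypothetical type for illustration; not the struct used in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowGroupStats {
    pub index: usize,
    pub min: i64,
    pub max: i64,
}

/// Returns row group indexes in the order a TopK scan would ideally read them.
pub fn order_row_groups(stats: &[RowGroupStats], descending: bool) -> Vec<usize> {
    let mut sorted = stats.to_vec();
    if descending {
        // DESC wants the largest values first, so rank by max (largest first).
        sorted.sort_by(|a, b| b.max.cmp(&a.max));
    } else {
        // ASC wants the smallest values first, so rank by min (smallest first).
        sorted.sort_by(|a, b| a.min.cmp(&b.min));
    }
    sorted.into_iter().map(|rg| rg.index).collect()
}
```

For the example in the commit message, RG0(50-60) and RG1(40-100), a DESC ordering by max places RG1 first, whereas ordering by min would have placed RG0 first.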
@Dandandan
Contributor

Sure, makes sense.

Labels

datasource: Changes to the datasource crate
sqllogictest: SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Sort pushdown: reorder row groups by statistics within each file

4 participants