I recently came across Hierarchical Performance Testing (HPT), a technique for distilling benchmark measurements into a single number while accounting for the fact that some benchmarks are more consistent/reliable than others. There is an implementation (in bash!!!) for the PARSEC benchmark suite; I ported it to Python and ran it over the big Faster CPython data set.

The results are pretty useful -- for example, while a lot of the main specialization work in 3.11 has a reliability of 100%, some recent changes to the GC have a speed improvement but with a lower reliability, reflecting the fact that GC changes involve a lot more randomness (more moving parts and more interaction with whatever else the OS is doing). I think this reliability number, along with the more stable "expected speedup at the 99th percentile", is a lot more useful for evaluating a change (especially small changes) than the geometric mean. I did not, however, see the massive 3.5x discrepancy between the 99th percentile number and the geometric mean reported in the paper (on a different dataset).
Is there interest in adding this metric to the output of pyperf's compare_to command?
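To make the shape of the computation concrete, here's a loose sketch (simplified, not my exact port or the paper's full procedure): it assumes scipy is available and uses plain dicts mapping benchmark names to lists of timings, which is not pyperf's data model.

```python
# Loose sketch of an HPT-style comparison: per-benchmark Wilcoxon rank-sum
# tests feeding a cross-benchmark Wilcoxon signed-rank test. The input format
# (dict of benchmark name -> list of timings) and function names are
# illustrative only.
from statistics import median
from scipy.stats import mannwhitneyu, wilcoxon

def hpt_reliability(baseline, candidate, alpha=0.05):
    """Confidence that `candidate` outperforms `baseline` across benchmarks."""
    diffs = []
    for name in baseline:
        base, cand = baseline[name], candidate[name]
        # Per-benchmark rank-sum test; benchmarks without a significant
        # difference are treated as ties.
        significant = mannwhitneyu(cand, base, alternative="two-sided").pvalue < alpha
        diffs.append(median(base) - median(cand) if significant else 0.0)
    # Cross-benchmark signed-rank test on the per-benchmark differences.
    return 1.0 - wilcoxon(diffs, alternative="greater", zero_method="zsplit").pvalue

def speedup_at_confidence(baseline, candidate, confidence=0.99, step=0.01):
    """Largest gamma such that 'candidate is at least gamma times faster'
    still holds at the requested confidence (coarse linear scan)."""
    gamma = 1.0
    while True:
        # Pretend the candidate were (gamma + step) times slower and re-test.
        slowed = {n: [t * (gamma + step) for t in ts] for n, ts in candidate.items()}
        if hpt_reliability(baseline, slowed) < confidence:
            return gamma
        gamma += step
```

The paper is more careful about confidence levels and tie handling, but the structure is the same: per-benchmark rank-sum tests feeding a cross-benchmark signed-rank test, plus a scan over scaling factors to find the speedup that still holds at a given confidence.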
In pyperf, I tried to give users the choice of how to display the data rather than making decisions for them. That's why it stores all timings, not just min / avg / max. If there is a way to render the data differently without losing the old way, why not. The implementation looks quite complicated, though.
Yes, to be clear, this wouldn't change how the raw data is stored in the .json files at all -- in fact, it's because all of the raw data is retained that this can easily be computed after data collection.
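For example, something along these lines (a sketch, assuming pyperf's documented BenchmarkSuite API and a hypothetical filename) is all it takes to get the per-run values back out:

```python
# Sketch: recover the raw per-run timings from an existing pyperf .json file.
# "baseline.json" is a hypothetical filename.
import pyperf

suite = pyperf.BenchmarkSuite.load("baseline.json")
timings = {bench.get_name(): list(bench.get_values())
           for bench in suite.get_benchmarks()}
```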
I would suggest adding a flag (e.g. --hpt) to the compare_to command that would add the values from HPT to the bottom of the report. Does that make sense? If so, I'll work up a PR. My current implementation uses NumPy, but for pyperf it's probably best not to add that as a dependency. A pure Python implementation shouldn't be unusably slow (it's not very heavy computation).
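For instance, the per-benchmark rank-sum p-value is just a pairwise count plus a normal approximation, which is easy to write without NumPy (a rough sketch; exact tie handling is omitted):

```python
# Pure-Python one-sided rank-sum p-value via the normal approximation,
# ignoring exact ties between timings (a simplifying assumption).
from math import erfc, sqrt

def rank_sum_pvalue(x, y):
    """P-value for 'x tends to be smaller than y' (Mann-Whitney U)."""
    n, m = len(x), len(y)
    u = sum(1 for xi in x for yj in y if xi < yj)  # pairs where x wins
    mean, sd = n * m / 2.0, sqrt(n * m * (n + m + 1) / 12.0)
    return 0.5 * erfc((u - mean) / (sd * sqrt(2)))  # P(U >= observed) under H0
```

That's O(n·m) per benchmark, which should be negligible for the number of values pyperf typically collects.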
If it's a new option and it doesn't change the default, I'm fine with it. The problem is just how to explain it in the docs, briefly and in simple words 😬