Use PkgBenchmark to detect performance regressions for heaps #531

milesfrain · 2019-09-25T05:46:17Z

This compares benchmark results of future PRs against master (just for heaps at this point). Note that this new benchmarking feature will not be fully operational until merged. An example of what to expect is in my development branch.

The results section of the Travis CI log is where to look for regressions (excerpt below). There are no regressions in this case of updating the readme. An additional enhancement would be to automatically post a comment with this information to the PR (as is done with code coverage). Perhaps also mention the number of non-overlapping baseline and target cases.

  Results
  =========
  A ratio greater than 1.0 denotes a possible regression (marked with ❌),
  while a ratio less than 1.0 denotes a possible improvement (marked with ✅).
  Only significant results - results that indicate possible regressions or
  improvements - are shown below (thus, an empty table means that all
  benchmark results remained invariant between builds).

  | ID                | time ratio | memory ratio |
  |-------------------|------------|--------------|
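
To make the ratio column concrete, here is a minimal sketch of how a single ratio could be classified against a tolerance. The name `classify_ratio` is made up for this illustration, and the symmetric-tolerance rule reflects my understanding of how BenchmarkTools judges ratios rather than a copy of its code. The 5% and 1% tolerances are the values shown in parentheses in the result tables in this thread.

```julia
# Hypothetical helper: classify a benchmark ratio (target / baseline).
# Assumption: the tolerance is applied symmetrically around 1.0.
function classify_ratio(ratio::Real, tolerance::Real)
    if isnan(ratio) || ratio - tolerance > 1.0
        return :regression      # slower or larger than baseline, beyond tolerance
    elseif ratio + tolerance < 1.0
        return :improvement     # faster or smaller than baseline, beyond tolerance
    else
        return :invariant       # within tolerance, so not listed in the table
    end
end

# A time ratio with 5% tolerance and a memory ratio with 1% tolerance.
time_verdict = classify_ratio(1.12, 0.05)
memory_verdict = classify_ratio(1.00, 0.01)
```

Under this rule a time ratio of 1.03 would count as invariant at 5% tolerance, which is why small fluctuations do not appear in the table.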

A current limitation is that the baseline for comparison must be the repo's master branch, which may lead to unexpected results for PRs to feature branches. Not sure if this is a common use case to be concerned about.

This feature is heavily inspired by work done by @tkf and @ericphanson, and I'm wondering how we should handle licensing of the benchmarking files lifted from Transducers.jl.

We should also follow through on splitting up DataStructures.jl as proposed in #310. This becomes more important as automated testing grows, since a change in one functionally independent section should not require re-running the tests for all the others. I could work on updating Heaps.jl and bring regression testing there instead.

milesfrain · 2019-09-25T06:53:09Z

The Travis CI benchmark failure is expected for this change because it's attempting to re-run the test against a master which does not have PkgBenchmark support.

milesfrain · 2019-09-27T08:15:29Z

I have two more changes in the pipeline that I'll submit PRs for once this issue is accepted. If all these changes are combined into a single PR, then performance differences are obscured.

milesfrain@b9ffc1b expands benchmarking coverage in anticipation of milesfrain#12 which tackles the following:

Use Base.Ordering, as discussed in "Deprecate compare in favor of lt and Ordering from Base?" (#243).
Eliminate duplicate code by reusing the arrays_as_heaps.jl functions in binary_heap.jl. This also brings a performance boost and implements the bottom-up heap construction proposed in "Bottom-up heap construction" (#147).
Continue to use the same fast comparison for floats. It assumes no NaN values are used, and can be overridden with the slower and safer alternative. Discussion here.
Slight change to the API. For example, BinaryMinHeap{Int}() is now BinaryMinHeap(Int). This is easier for users to type and is similar to other functions, like zeros(Int,42) and rand(Int,42). Performance should be identical.
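
As a rough illustration of the ordering and bottom-up construction points in the list above, here is a minimal sketch. It is not the code from the linked branch: `NoNaNForward`, `sift_down!`, `heapify_bottom_up!`, and `is_heap` are hypothetical names chosen for this example. Only `Ordering`, `Forward`, `Reverse`, and `lt` come from `Base.Order`.

```julia
import Base.Order: Ordering, Forward, Reverse, lt

# Hypothetical ordering that compares with `<` instead of `isless`.
# Assumption: the data contains no NaN values.
struct NoNaNForward <: Ordering end
lt(::NoNaNForward, a, b) = a < b

# Move the element at index i down until neither child should precede it.
function sift_down!(xs::AbstractVector, i::Int, o::Ordering)
    n = length(xs)
    x = xs[i]
    while 2 * i <= n
        child = 2 * i
        # Pick whichever child should sit closer to the root.
        if child < n && lt(o, xs[child + 1], xs[child])
            child += 1
        end
        lt(o, xs[child], x) || break
        xs[i] = xs[child]
        i = child
    end
    xs[i] = x
    return xs
end

# Bottom-up construction: sift down every internal node, last one first.
function heapify_bottom_up!(xs::AbstractVector, o::Ordering = Forward)
    for i in (length(xs) ÷ 2):-1:1
        sift_down!(xs, i, o)
    end
    return xs
end

# Check the heap property: no element should precede its parent.
is_heap(xs, o::Ordering) = all(!lt(o, xs[i], xs[i ÷ 2]) for i in 2:length(xs))

min_heap = heapify_bottom_up!([5.0, 3.0, 8.0, 1.0, 9.0, 2.0], NoNaNForward())
max_heap = heapify_bottom_up!([5.0, 3.0, 8.0, 1.0, 9.0, 2.0], Reverse)
```

The two orderings differ only when NaN is present: `lt(Forward, 1.0, NaN)` is `true` because `isless` sorts NaN last, while `lt(NoNaNForward(), 1.0, NaN)` is `false`. That difference is why the faster comparison is safe only when NaN values are excluded.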

Updating the docs is still on the todo list, but figured it would be a good idea to discuss thoughts on the API change and deprecation first.

The benchmarking results show improvements in many areas. A highlight is a roughly 2x gain in both execution time and memory for nlargest and nsmallest. There's more noise on Travis CI, so here are the results of a local run:

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 0.40 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^3", "Min"] 0.69 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^1", "Min"] 0.29 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^3", "Min"] 0.60 (5%) ✅    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^1", "Min"] 0.93 (5%) ✅    1.00 (1%)
          ["heap", "BinaryHeap", "pop", "Int64", "10^1", "Min"] 0.87 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^3", "Min"] 0.90 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^3", "Min"] 0.86 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^1"] 1.12 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^3"] 0.73 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^3"] 0.94 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^1"] 0.93 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^3"] 0.93 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^1"] 1.14 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^3"] 0.75 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^1"] 0.89 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "push", "Float64", "10^3"] 0.91 (5%) ✅    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^1", "Min"] 0.45 (5%) ✅    1.00 (1%)
  ["heap", "MutableBinaryHeap", "make", "Int64", "10^1", "Min"] 0.36 (5%) ✅    1.00 (1%)
                 ["heap", "nlargest", "a=rand(10^4)", "n=10^2"] 0.49 (5%) ✅  0.57 (1%) ✅
                ["heap", "nsmallest", "a=rand(10^4)", "n=10^2"] 0.59 (5%) ✅  0.57 (1%) ✅

Full logs available here, and you can reproduce on your own machine with:

git branch -f baseline HEAD~
julia --project=benchmark -e '
    using Pkg; Pkg.instantiate();
    include("benchmark/runjudge.jl");
    include("benchmark/pprintjudge.jl");'

An area for follow-up investigation is why these two results are not identical.

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 0.40 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^1"] 1.14 (5%) ❌    1.00 (1%)

milesfrain · 2019-09-27T08:27:28Z

For reference, here's the regression if FasterForward and FasterReverse are removed:

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 1.32 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^3", "Min"] 2.68 (5%) ❌    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^1", "Min"] 1.50 (5%) ❌    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^3", "Min"] 1.86 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^1", "Min"] 1.17 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^3", "Min"] 1.38 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^1"] 1.35 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^3"] 2.50 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^1"] 1.45 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^3"] 1.84 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^1"] 1.21 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^3"] 1.33 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^1"] 1.36 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^3"] 2.72 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^1"] 1.50 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^3"] 1.85 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "push", "Float64", "10^1"] 1.22 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "push", "Float64", "10^3"] 1.38 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^1", "Min"] 1.22 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^3", "Min"] 1.55 (5%) ❌    1.00 (1%)
 ["heap", "MutableBinaryHeap", "pop", "Float64", "10^1", "Min"] 1.44 (5%) ❌    1.00 (1%)
 ["heap", "MutableBinaryHeap", "pop", "Float64", "10^3", "Min"] 1.78 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "push", "Float64", "10^1", "Min"] 1.15 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "push", "Float64", "10^3", "Min"] 1.20 (5%) ❌    1.00 (1%)
                 ["heap", "nlargest", "a=rand(10^4)", "n=10^2"] 3.22 (5%) ❌    1.00 (1%)
                ["heap", "nsmallest", "a=rand(10^4)", "n=10^2"] 2.66 (5%) ❌    1.00 (1%)

milesfrain · 2019-09-27T17:13:31Z

@oxinabox What are the next steps for merging this?
Is it blocked by the Travis CI failure? Note that this is only a benchmarking failure when attempting to use PkgBenchmark on the current master (which doesn't have PkgBenchmark yet), so this is safe to overlook.

oxinabox · 2019-09-27T21:36:51Z

Let's merge it. We can always fix it further later, e.g. if the benchmarks break too often and we want them not to block.

oxinabox · 2019-09-27T21:37:04Z

thanks!

milesfrain added 2 commits September 24, 2019 21:17

PkgBenchmark to detect performance regressions

f5e759a

notes on PkgBenchmark usage

a86b873

milesfrain mentioned this pull request Sep 25, 2019

Fix heap benchmark and use BenchmarkTools #525

Closed

oxinabox approved these changes Sep 27, 2019

oxinabox merged commit 46f87f1 into JuliaCollections:master Sep 27, 2019

milesfrain mentioned this pull request Oct 25, 2019

Use Base.Ordering for heap, and other performance improvements #547

Merged
