Use PkgBenchmark to detect performance regressions for heaps #531

Merged
merged 2 commits into from
Sep 27, 2019

milesfrain
Contributor

This compares benchmark results of future PRs against master (just for heaps at this point). Note that this new benchmarking feature will not be fully operational until merged. An example of what to expect is in my development branch.

The results section of the Travis CI log is where to look for regressions (excerpt below). There are no regressions in this case, since the change only updates the readme. An additional enhancement would be to automatically post a comment with this information to the PR (as is done with code coverage), perhaps also mentioning the number of non-overlapping baseline and target cases.

  Results
  =========
  A ratio greater than 1.0 denotes a possible regression (marked with ❌),
  while a ratio less than 1.0 denotes a possible improvement (marked with ✅).
  Only significant results - results that indicate possible regressions or
  improvements - are shown below (thus, an empty table means that all
  benchmark results remained invariant between builds).

  | ID | time ratio | memory ratio |
  |----|------------|--------------|
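As a rough illustration of how the report decides which rows are "significant", here is a minimal sketch of a ratio classifier. The function name `classify` is hypothetical, and the exact thresholding rule is an assumption modeled on the tolerances printed in the tables in this thread (5% for time, 1% for memory); it is not taken from the benchmarking scripts in this PR.

```julia
# Hypothetical helper: classify a target/baseline ratio against a tolerance.
# Assumed rule: a ratio within `tolerance` of 1.0 counts as invariant.
function classify(ratio::Real, tolerance::Real)
    if isnan(ratio) || ratio - tolerance > 1.0
        return :regression
    elseif ratio + tolerance < 1.0
        return :improvement
    else
        return :invariant
    end
end

# Time ratios copied from rows of the local-run table later in this thread.
for (id, r) in ["BinaryHeap make Int64 10^1" => 0.29,
                "BinaryHeap push Float64 10^1" => 0.94,
                "BinaryMaxHeap make Float64 10^1" => 1.12]
    println(rpad(id, 34), r, " => ", classify(r, 0.05))
end
```

Under this assumed rule, a memory ratio of 1.00 with a 1% tolerance is invariant, which is why such entries carry no mark in the tables.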

A current limitation is that the baseline for comparison must be the repo's master branch, which may lead to unexpected results for PRs to feature branches. Not sure if this is a common use case to be concerned about.

This feature is heavily inspired by work done by @tkf and @ericphanson, and I'm wondering how we should handle licensing of the benchmarking files lifted from Transducers.jl.

We should also follow through on splitting up DataStructures.jl as proposed in #310. This becomes more important as automated testing grows: a change to one functionally independent section should not require re-running the tests and benchmarks for all the others. I could work on updating Heaps.jl and bring regression testing there instead.

@milesfrain
Contributor Author

The Travis CI benchmark failure is expected for this change because it's attempting to re-run the test against a master which does not have PkgBenchmark support.

@milesfrain
Contributor Author

I have two more changes in the pipeline that I'll submit PRs for once this issue is accepted. If all these changes are combined into a single PR, then performance differences are obscured.

milesfrain@b9ffc1b expands benchmarking coverage in anticipation of milesfrain#12 which tackles the following:

  • Use Base.Ordering, as discussed in "Deprecate compare in favor of lt and Ordering from Base?" (#243).
  • Eliminate duplicate code. Reuse arrays_as_heaps.jl functions in binary_heap.jl. This also includes a performance boost and implements the bottom-up heap construction proposed in "Bottom-up heap construction" (#147).
  • Continue to use the same fast comparison for floats. It assumes no NaN values are used, and can be overridden with the slower and safer alternative. Discussion here.
  • Slight change to the API. For example: BinaryMinHeap{Int}() is now BinaryMinHeap(Int). This is easier for users to type and is similar to other functions, like zeros(Int,42) and rand(Int,42). Performance should be identical.
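To make the "bottom-up heap construction" item concrete, here is a minimal sketch of the general technique written against `Base.Order`. It is not the code from the linked commit; the function names `sift_down!`, `bottom_up_heapify!`, and `is_heap` are hypothetical, and the sketch assumes a 1-based `Vector`.

```julia
using Base.Order: Ordering, Forward, Reverse, lt

# Move the element at index i down until neither child should precede it.
function sift_down!(xs::Vector, i::Int, o::Ordering)
    n = length(xs)
    x = xs[i]
    while 2i <= n
        child = 2i
        # Pick whichever child should sit closer to the root.
        if child < n && lt(o, xs[child + 1], xs[child])
            child += 1
        end
        lt(o, xs[child], x) || break
        xs[i] = xs[child]
        i = child
    end
    xs[i] = x
    return xs
end

# Bottom-up construction: sift down every internal node, last parent first.
function bottom_up_heapify!(xs::Vector, o::Ordering=Forward)
    for i in (length(xs) ÷ 2):-1:1
        sift_down!(xs, i, o)
    end
    return xs
end

# Check the heap property: no child may precede its parent under `o`.
is_heap(xs::Vector, o::Ordering=Forward) =
    all(!lt(o, xs[i], xs[i ÷ 2]) for i in 2:length(xs))

minheap = bottom_up_heapify!([5, 3, 8, 1, 9, 2])
maxheap = bottom_up_heapify!([5, 3, 8, 1, 9, 2], Reverse)
println(first(minheap), " ", first(maxheap))
```

Passing an `Ordering` instead of a custom comparator type is what lets one routine serve both min- and max-heaps, which is the kind of code sharing the list above describes.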

Updating the docs is still on the todo list, but figured it would be a good idea to discuss thoughts on the API change and deprecation first.

The benchmarking results show improvements in many areas. A highlight is a roughly 2x gain in both execution time and memory for nlargest and nsmallest. There's more noise on Travis CI, so here are the results of a local run:

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 0.40 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^3", "Min"] 0.69 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^1", "Min"] 0.29 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^3", "Min"] 0.60 (5%) ✅    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^1", "Min"] 0.93 (5%) ✅    1.00 (1%)
          ["heap", "BinaryHeap", "pop", "Int64", "10^1", "Min"] 0.87 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^3", "Min"] 0.90 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^3", "Min"] 0.86 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^1"] 1.12 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^3"] 0.73 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^3"] 0.94 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^1"] 0.93 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^3"] 0.93 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^1"] 1.14 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^3"] 0.75 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^1"] 0.89 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "push", "Float64", "10^3"] 0.91 (5%) ✅    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^1", "Min"] 0.45 (5%) ✅    1.00 (1%)
  ["heap", "MutableBinaryHeap", "make", "Int64", "10^1", "Min"] 0.36 (5%) ✅    1.00 (1%)
                 ["heap", "nlargest", "a=rand(10^4)", "n=10^2"] 0.49 (5%) ✅  0.57 (1%) ✅
                ["heap", "nsmallest", "a=rand(10^4)", "n=10^2"] 0.59 (5%) ✅  0.57 (1%) ✅

Full logs available here, and you can reproduce on your own machine with:

git branch -f baseline HEAD~
julia --project=benchmark -e '
    using Pkg; Pkg.instantiate();
    include("benchmark/runjudge.jl");
    include("benchmark/pprintjudge.jl");'

An area for follow-up investigation is why these two results, which exercise the same "make" operation on a Float64 min-heap of size 10^1, are not identical:

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 0.40 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^1"] 1.14 (5%) ❌    1.00 (1%)

@milesfrain
Contributor Author

For reference, here's the regression if FasterForward and FasterReverse are removed:

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 1.32 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^3", "Min"] 2.68 (5%) ❌    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^1", "Min"] 1.50 (5%) ❌    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^3", "Min"] 1.86 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^1", "Min"] 1.17 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^3", "Min"] 1.38 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^1"] 1.35 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^3"] 2.50 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^1"] 1.45 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^3"] 1.84 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^1"] 1.21 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^3"] 1.33 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^1"] 1.36 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^3"] 2.72 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^1"] 1.50 (5%) ❌    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^3"] 1.85 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "push", "Float64", "10^1"] 1.22 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "push", "Float64", "10^3"] 1.38 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^1", "Min"] 1.22 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^3", "Min"] 1.55 (5%) ❌    1.00 (1%)
 ["heap", "MutableBinaryHeap", "pop", "Float64", "10^1", "Min"] 1.44 (5%) ❌    1.00 (1%)
 ["heap", "MutableBinaryHeap", "pop", "Float64", "10^3", "Min"] 1.78 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "push", "Float64", "10^1", "Min"] 1.15 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "push", "Float64", "10^3", "Min"] 1.20 (5%) ❌    1.00 (1%)
                 ["heap", "nlargest", "a=rand(10^4)", "n=10^2"] 3.22 (5%) ❌    1.00 (1%)
                ["heap", "nsmallest", "a=rand(10^4)", "n=10^2"] 2.66 (5%) ❌    1.00 (1%)
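The table above shows the cost of giving up the fast float comparison; the semantic trade-off behind it can be sketched as follows. `PlainLess` is a hypothetical ordering defined only for this illustration (it is not the FasterForward type from the PR), and it assumes the fast path amounts to a plain `<`, whereas `Forward` from `Base.Order` compares with `isless`.

```julia
using Base.Order: Ordering, Forward, lt

# Hypothetical ordering for illustration: compares with `<` instead of `isless`.
struct PlainLess <: Ordering end
Base.Order.lt(::PlainLess, a, b) = a < b

# `isless` gives a total order: NaN sorts after every other float.
safe_nan = (lt(Forward, 1.0, NaN), lt(Forward, NaN, 1.0))

# `<` follows IEEE semantics: every comparison involving NaN is false,
# so NaN is neither before nor after 1.0 and heap invariants can break.
fast_nan = (lt(PlainLess(), 1.0, NaN), lt(PlainLess(), NaN, 1.0))

# The two orderings also disagree on signed zeros.
safe_zero = lt(Forward, -0.0, 0.0)
fast_zero = lt(PlainLess(), -0.0, 0.0)

println((safe_nan, fast_nan, safe_zero, fast_zero))
```

When the data is known to contain no NaN values, both orderings agree on the relative order of all distinct finite values, which is the assumption the fast path relies on.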

@milesfrain
Contributor Author

@oxinabox What are the next steps for merging this?
Is it blocked by the Travis CI failure? Note that this is only a benchmarking failure when attempting to use PkgBenchmark on the current master (which doesn't have PkgBenchmark yet), so this is safe to overlook.

@oxinabox
Member

Let's merge it. We can always fix it later if, e.g., the benchmarks break too often and we want them not to block.

@oxinabox oxinabox merged commit 46f87f1 into JuliaCollections:master Sep 27, 2019
@oxinabox
Member

thanks!
