This is a sketch for some sections of documentation that should go in the README.
What to test?
Ideally, benchmarks measure how long our projects (dask, distributed) spend doing something, not the underlying libraries they're built on. We want to limit the variance across runs to just the code we control.
For example, I suspect `(self.data.a > 0).compute()` is not a great benchmark. My guess (without having profiled) is that the `.compute()` part takes the majority of the time, most of which would be spent in pandas / NumPy. (I need to profile all of these. I'm reading through dask now to find places where dask itself is doing a lot of work.)
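
A better benchmark might time only the graph-construction and optimization steps, which exercise dask itself rather than pandas. A rough sketch using asv's `time_*` convention (the class, method, and fixture names here are hypothetical):

```python
import dask
import dask.dataframe as dd
import pandas as pd


class FilterGraph:
    """Time only dask's own work, not the pandas/NumPy execution."""

    def setup(self):
        # Hypothetical fixture; small enough that setup stays cheap.
        pdf = pd.DataFrame({"a": range(100_000)})
        self.data = dd.from_pandas(pdf, npartitions=10)

    def time_build_graph(self):
        # Constructing the lazy expression touches only dask code paths.
        self.data.a > 0

    def time_optimize_graph(self):
        # Optimizing the resulting graph is also pure dask work.
        dask.optimize(self.data.a > 0)
```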
Benchmarking New Code
If you're writing an optimization, say, you can benchmark it by:

- writing a benchmark that exercises your optimization and placing it in `benchmarks/`
- setting the `repo` field in `asv.conf.json` to the path of your dask / distributed repository on your local file system (a sketch follows below)
- running `asv continuous -f 1.1 upstream/master HEAD` (optionally with a regex, `-b <regex>`, to filter to just your benchmark)
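
For the second step, the relevant fields of `asv.conf.json` might look roughly like this (the `project` and `branches` values and the local path are placeholders for your own setup):

```json
{
    "project": "dask",
    "repo": "/home/you/src/dask",
    "branches": ["master"]
}
```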
Naming Conventions
Directory Structure
This repository contains benchmarks for several dask-related projects.
Each project needs its own benchmark directory because `asv` is built around one configuration file (`asv.conf.json`) and benchmark suite per repository.
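
For example, the layout might look roughly like this (the directory and file names are illustrative, not prescriptive):

```
dask-benchmarks/
├── dask/
│   ├── asv.conf.json
│   └── benchmarks/
│       └── dataframe.py
└── distributed/
    ├── asv.conf.json
    └── benchmarks/
        └── scheduler.py
```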