
Documentation #1

Description

@TomAugspurger

This is a sketch for some sections of documentation that should go in the README.

What to test?

Ideally, benchmarks measure how long our projects (dask, distributed) spend doing something, not the underlying libraries they're built on. We want to limit the variance across runs to just the code we control.

For example, I suspect (self.data.a > 0).compute() is not a great benchmark. My guess (without having profiled) is that the .compute() call takes the majority of the time, most of which is spent in pandas / NumPy rather than in dask itself. (I need to profile all of these; I'm reading through dask now to find places where dask itself does a lot of work.)
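
One way to check that guess is to profile the call and see how much cumulative time lands in pandas / NumPy frames versus dask's graph construction and scheduler code. This is a minimal sketch, not part of the benchmark suite; the df fixture below is illustrative.

```python
import cProfile
import pstats

import dask.dataframe as dd
import pandas as pd

# `df` stands in for the benchmark's dataset; the real fixture may differ.
df = dd.from_pandas(pd.DataFrame({"a": range(1_000_000)}), npartitions=10)

profiler = cProfile.Profile()
profiler.enable()
(df.a > 0).compute()
profiler.disable()

# Compare time spent in pandas/NumPy frames against dask's own
# graph-building and scheduling code.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```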

Benchmarking New Code

If you're writing an optimization, say, you can benchmark it by

  • writing a benchmark that exercises your optimization and placing it in benchmarks/ (see the sketch after this list)
  • setting the repo field in asv.conf.json to the path of your dask / distributed repository on your local file system
  • running asv continuous -f 1.1 upstream/master HEAD (optionally with -b <regex> to filter to just your benchmarks)
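
For example, a benchmark file might look like the following sketch. The file name, array sizes, and the particular optimization being exercised are illustrative assumptions; it measures a dask code path (graph construction and fusion) without ever calling compute, so pandas / NumPy stay out of the measurement.

```python
# benchmarks/optimize.py -- illustrative asv benchmark sketch
import dask.array as da
from dask.optimization import fuse


class TimeFuse:
    def setup(self):
        # Build a task graph with many fusable element-wise operations.
        # Nothing is computed here or in the timed method.
        x = da.ones((1000, 1000), chunks=(100, 100))
        self.dsk = dict(((x + 1) * 2).__dask_graph__())

    def time_fuse(self):
        # asv times methods prefixed with ``time_``; this one measures
        # only dask's graph fusion.
        fuse(self.dsk)
```

With a benchmark like this, asv continuous -f 1.1 upstream/master HEAD -b TimeFuse would compare just this benchmark between upstream/master and your branch.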

Naming Conventions

Directory Structure

This repository contains benchmarks for several dask-related projects. Each project needs its own benchmark directory because asv is built around one configuration file (asv.conf.json) and benchmark suite per repository.
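
An illustrative layout (the exact project directories in this repository may differ):

```
dask-benchmarks/
├── dask/
│   ├── asv.conf.json
│   └── benchmarks/
└── distributed/
    ├── asv.conf.json
    └── benchmarks/
```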
