
Dask-GLM doesn't converge with Dask array #17

@pentschev


After a bit of profiling, this is what I found out for Dask-GLM with Dask array:

    14339    0.139    0.000    0.814    0.000 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:430(fire_task)
    44898   19.945    0.000   19.945    0.000 {method 'acquire' of '_thread.lock' objects}
     4055    0.042    0.000   19.992    0.005 /usr/lib/python3.5/threading.py:261(wait)
    14339    0.107    0.000   20.234    0.001 /usr/lib/python3.5/queue.py:147(get)
    14339    0.018    0.000   20.253    0.001 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:140(queue_get)
      122    0.117    0.001   22.327    0.183 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:345(get_async)
      122    0.013    0.000   22.346    0.183 /home/pentschev/.local/lib/python3.5/site-packages/dask/threaded.py:33(get)
      122    0.004    0.000   22.733    0.186 /home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute)
        1    0.020    0.020   23.224   23.224 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:200(admm)
        1    0.000    0.000   23.267   23.267 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/utils.py:13(normalize_inputs)
        1    0.000    0.000   23.268   23.268 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/estimators.py:65(fit)

A large portion of the time seems to be spent waiting on a thread lock. Also, looking at the callers, we see 100 compute() calls originating from admm(). This means the solver is not converging and only stops once it reaches max_iter, as @cicdw suggested:

    /home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute)                               <-     100    0.004   19.637  /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:197(admm)
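To illustrate that reasoning, here is a toy iterative loop. It is a hypothetical sketch and not dask_glm's actual admm() code. It assumes max_iter=100 and one compute() per iteration. If the stopping test never passes, the loop runs exactly max_iter times, which would match the 100 calls above.

```python
# Hypothetical solver loop for illustration only; NOT dask_glm's admm().
# Assumption: each iteration triggers one compute() in a Dask-backed solver.

def run_solver(step, converged, x0, max_iter=100):
    """Iterate x <- step(x) until converged(x_old, x_new) or max_iter is hit.

    Returns the final iterate and the number of iterations performed.
    """
    x = x0
    for k in range(1, max_iter + 1):
        x_new = step(x)          # one compute() per iteration (assumed)
        if converged(x, x_new):  # stopping test, e.g. change below a tolerance
            return x_new, k
        x = x_new
    return x, max_iter           # never converged: stopped only by max_iter


# A contraction with a reachable tolerance stops early...
_, iters = run_solver(lambda x: 0.5 * x, lambda a, b: abs(a - b) < 1e-3, 1.0)
print(iters)        # 10
# ...but a stopping test that never passes always exhausts max_iter.
_, iters_stuck = run_solver(lambda x: 0.5 * x, lambda a, b: False, 1.0)
print(iters_stuck)  # 100
```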

When running with NumPy, the algorithm converges and makes only 7 compute() calls:

    /home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute)                          <-       7    0.000    0.120  /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:197(admm)
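For anyone who wants to reproduce this kind of breakdown, below is a minimal sketch using the standard library's cProfile and pstats. The issue doesn't say exactly how the profile above was collected, so this approach is an assumption. inner() and outer() are hypothetical stand-ins for compute() and admm(). print_callers() produces the same "callee <- ncalls tottime cumtime caller" view shown above.

```python
import cProfile
import io
import pstats


def inner():
    # Hypothetical stand-in for dask.base.compute().
    return sum(range(1000))


def outer(n_iter):
    # Hypothetical stand-in for the solver: one inner() call per iteration.
    for _ in range(n_iter):
        inner()


profiler = cProfile.Profile()
profiler.enable()
outer(7)
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats("inner|outer")  # per-function table (ncalls, tottime, cumtime, ...)
stats.print_callers("inner")      # which functions called inner(), and how often
report = buf.getvalue()
print(report)
```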

I'm running Dask 1.1.4 and the Dask-GLM master branch, to make sure my local changes aren't introducing any bugs. However, if I run my own Dask-GLM branch with CuPy as the backend, it also converges in 7 iterations.

To me this suggests we have one of those well-hidden, hard-to-track bugs in Dask. Before I spend hours on this, does anyone have suggestions on what we could look for?
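One possible way to narrow this down is offered here as a suggestion, not a known fix. Log one scalar per iteration from both the NumPy run and the Dask run on identical inputs, for example the residual norms that the stopping test compares against its tolerance. Then find the first iteration where the two disagree. first_divergence() is a hypothetical helper, and the traces in the usage lines are placeholder numbers, not measurements.

```python
import math
from typing import Optional, Sequence


def first_divergence(trace_a: Sequence[float], trace_b: Sequence[float],
                     rel_tol: float = 1e-6, abs_tol: float = 1e-12) -> Optional[int]:
    """Return the index of the first iteration where two traces disagree.

    Each trace holds a per-iteration scalar recorded from the same solver
    run on two backends (hypothetical helper). Returns None if the traces
    agree over their common length.
    """
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None


# Placeholder traces for illustration only.
numpy_trace = [1.0, 0.5, 0.25, 0.125]
dask_trace = [1.0, 0.5, 0.31, 0.3]
print(first_divergence(numpy_trace, dask_trace))  # 2
```

Once the first diverging iteration is known, the intermediate arrays from that iteration can be compared directly to see which operation produces different results.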

Originally posted by @pentschev in #15
