Dask-GLM doesn't converge with Dask array

After a bit of profiling, this is what I found out for Dask-GLM with Dask array:

```
    14339    0.139    0.000    0.814    0.000 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:430(fire_task)
    44898   19.945    0.000   19.945    0.000 {method 'acquire' of '_thread.lock' objects}
     4055    0.042    0.000   19.992    0.005 /usr/lib/python3.5/threading.py:261(wait)
    14339    0.107    0.000   20.234    0.001 /usr/lib/python3.5/queue.py:147(get)
    14339    0.018    0.000   20.253    0.001 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:140(queue_get)
      122    0.117    0.001   22.327    0.183 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:345(get_async)
      122    0.013    0.000   22.346    0.183 /home/pentschev/.local/lib/python3.5/site-packages/dask/threaded.py:33(get)
      122    0.004    0.000   22.733    0.186 /home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute)
        1    0.020    0.020   23.224   23.224 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:200(admm)
        1    0.000    0.000   23.267   23.267 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/utils.py:13(normalize_inputs)
        1    0.000    0.000   23.268   23.268 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/estimators.py:65(fit)
```

A big portion of the time seems to be spent on waiting for thread lock. Also, looking at the callers, we see 100 `compute()` calls departing from `admm()`, which means it's _not_ converging and stopping only at `max_iter` as @cicdw suggested:

```
/home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute)                               <-     100    0.004   19.637  /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:197(admm)
```

Running with NumPy, the algorithm converges, showing only 7 `compute()` calls:

```
/home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute)                          <-       7    0.000    0.120  /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:197(admm)
```

I'm running Dask 1.1.4 and Dask-GLM master branch, to ensure that my local changes aren't introduce any bugs. However, if I run my Dask-GLM branch and use CuPy as a backend, it also converges in 7 iterations.

To me this seems to suggest that we have one of those very well-hidden and difficult to track bugs in Dask. Before I spent hours with this, any suggestions what could we look for?

_Originally posted by @pentschev in https://github.com/dask/dask-blog/pull/15_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Dask-GLM doesn't converge with Dask array #17

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Dask-GLM doesn't converge with Dask array #17

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions