Description
After a bit of profiling, this is what I found for Dask-GLM with a Dask array (cProfile columns: ncalls, tottime, percall, cumtime, percall, filename:lineno(function)):
14339 0.139 0.000 0.814 0.000 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:430(fire_task)
44898 19.945 0.000 19.945 0.000 {method 'acquire' of '_thread.lock' objects}
4055 0.042 0.000 19.992 0.005 /usr/lib/python3.5/threading.py:261(wait)
14339 0.107 0.000 20.234 0.001 /usr/lib/python3.5/queue.py:147(get)
14339 0.018 0.000 20.253 0.001 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:140(queue_get)
122 0.117 0.001 22.327 0.183 /home/pentschev/.local/lib/python3.5/site-packages/dask/local.py:345(get_async)
122 0.013 0.000 22.346 0.183 /home/pentschev/.local/lib/python3.5/site-packages/dask/threaded.py:33(get)
122 0.004 0.000 22.733 0.186 /home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute)
1 0.020 0.020 23.224 23.224 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:200(admm)
1 0.000 0.000 23.267 23.267 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/utils.py:13(normalize_inputs)
1 0.000 0.000 23.268 23.268 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/estimators.py:65(fit)
Most of the time seems to be spent waiting on a thread lock: roughly 20 of the 23 seconds are in acquire. Also, looking at the callers, we see 100 compute() calls originating from admm(), which means it's not converging and only stops at max_iter, as @cicdw suggested:
/home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute) <- 100 0.004 19.637 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:197(admm)
Running with NumPy, the algorithm converges, showing only 7 compute() calls:
/home/pentschev/.local/lib/python3.5/site-packages/dask/base.py:345(compute) <- 7 0.000 0.120 /home/pentschev/.local/lib/python3.5/site-packages/dask_glm/algorithms.py:197(admm)
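For anyone who wants to produce the same two views (cumulative stats and callers), here is a minimal sketch using the standard-library cProfile and pstats modules. It does not use Dask or Dask-GLM: `worker`, `consume`, and `profile_report` are hypothetical stand-ins, with `consume` playing the role of a main thread that blocks on a queue while another thread does the work.

```python
import cProfile
import io
import pstats
import queue
import threading
import time


def worker(q, n_items):
    # Hypothetical stand-in for a worker thread: do some "work", then
    # hand each result back to the main thread through the queue.
    for i in range(n_items):
        time.sleep(0.01)
        q.put(i)


def consume(n_items=5):
    # The calling thread blocks in q.get() until the worker delivers.
    q = queue.Queue()
    t = threading.Thread(target=worker, args=(q, n_items))
    t.start()
    results = [q.get() for _ in range(n_items)]
    t.join()
    return results


def profile_report(func, pattern):
    # Profile func() in the current thread and return its result together
    # with a text report restricted to entries matching the regex pattern.
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative")
    stats.print_stats(pattern)    # ncalls/tottime/cumtime per function
    stats.print_callers(pattern)  # which functions called the matches
    return result, out.getvalue()


# Restrict the report to queue.get and lock acquisition entries.
results, report = profile_report(consume, r"acquire|\(get\)")
print(report)
```

Note that cProfile only instruments the thread in which it was enabled, so in this sketch the time the main thread spends blocked in queue.get() is charged to the wait and not to the work done in the worker thread. Whether that fully explains the acquire time in the Dask listing above is an assumption I have not verified.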
I'm running Dask 1.1.4 and the Dask-GLM master branch, to make sure that my local changes aren't introducing any bugs. However, if I run my own Dask-GLM branch and use CuPy as the backend, it also converges in 7 iterations.
To me this suggests that we have one of those well-hidden, hard-to-track bugs in Dask. Before I spend hours on this, any suggestions on what we could look for?
Originally posted by @pentschev in #15