As far as I can see, `+=` and the other updating operators are not handled via overloadable methods. For example, `A += B` results in `A = +(A,B)`. There is considerable scope for performance gains, especially for matrices, if one could tailor the behaviour of `+=`. As an example, consider the case of `A += B`, where both are 1000 by 1000 matrices. I get (total elapsed time for 100 such operations) on an 8-core machine:
1. Standard Julia `A+=B`: 1s
2. A hand-coded loop that iterates over all 10^6 elements: 0.5s
3. `Linalg.BLAS.axpy!(1.0,A,B)`: 0.02s
(The advantage of 2 over 1 is that it does not have to allocate a temporary matrix to accommodate the sum. The advantage of 3 over 2 is multiple threads and whatever other magic BLAS does.)
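For concreteness, the three variants could be sketched roughly as follows. This is only an illustration, not the exact code that was timed: `add_loop!` is a hypothetical helper name, and the sketch assumes a current Julia version where `BLAS` is reached through the `LinearAlgebra` standard library. Note that `axpy!(a, X, Y)` updates its last argument, so the in-place analogue of `A += B` passes `B` first.

```julia
using LinearAlgebra  # assumption: current Julia, where BLAS lives in LinearAlgebra

# Hypothetical helper: element-wise accumulation into A, no temporary matrix.
function add_loop!(A::Matrix{Float64}, B::Matrix{Float64})
    size(A) == size(B) || throw(DimensionMismatch("A and B must have the same size"))
    @inbounds for i in eachindex(A, B)
        A[i] += B[i]
    end
    return A
end

n = 1000
B  = rand(n, n)
A1 = rand(n, n)
A2 = copy(A1)
A3 = copy(A1)

A1 += B                   # 1. allocates a new matrix A1 + B, then rebinds A1
add_loop!(A2, B)          # 2. updates A2 in place
BLAS.axpy!(1.0, B, A3)    # 3. overwrites A3 with 1.0 * B + A3 via BLAS
```

Wrapping each variant in a loop of 100 repetitions under `@time` should give a rough comparison of this kind, though absolute numbers will depend on the machine and the Julia version.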