Commit c5fc17c
Faster LazyTensor mul!() for dense suboperators (#39)
Dramatically speed up operations involving LazyTensors consisting of dense factors.
* Use transpose-GEMM-transpose approach for LazyTensor operations so that BLAS can be used.
* Add limited support for summing LazyTensor operators.
* Keep the old code as a special case for purely sparse LazyTensors (it is faster in this case).
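
The transpose-GEMM-transpose idea from the list above can be sketched on a plain array. This is a minimal illustration, not the code from this commit: `apply_on_index` is a hypothetical helper, and the sketch assumes a single dense factor acting on one index of a multi-index state. It moves the target index to the front, flattens the remaining indices so that one dense matrix product (which Julia dispatches to BLAS for supported element types) does the work, and then restores the original index order.

```julia
using LinearAlgebra

# Hypothetical helper (not the library's API): apply the dense matrix `A`
# to index `k` of the multi-index array `x`.
function apply_on_index(A::AbstractMatrix, x::AbstractArray, k::Integer)
    n = ndims(x)
    # 1. Transpose: bring index k to the front.
    perm = [k; setdiff(1:n, k)]
    xp = permutedims(x, perm)
    dims_p = size(xp)
    # 2. GEMM: flatten all other indices into one, then do a single
    #    dense matrix-matrix product.
    X = reshape(xp, dims_p[1], :)
    Y = A * X
    # 3. Transpose back: restore the original index order.
    yp = reshape(Y, size(A, 1), dims_p[2:end]...)
    return permutedims(yp, invperm(perm))
end

# Check against the explicit Kronecker product for a 3-index array.
d = (2, 3, 4)
x = rand(ComplexF64, d...)
A = rand(ComplexF64, 3, 3)
y = apply_on_index(A, x, 2)

# Julia arrays are column-major, so index 1 varies fastest and its factor
# sits rightmost in the Kronecker product.
full = kron(Matrix(I, 4, 4), A, Matrix(I, 2, 2))
@assert vec(y) ≈ full * vec(x)
```

The explicit `kron` is only there to verify the result; avoiding that full matrix is the point of the approach. The two `permutedims` calls cost extra memory traffic, which is one plausible reason a purely sparse operator may be better served by a different code path.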
Co-authored-by: Ashley Milsted <[email protected]>
1 parent 91ac4fd commit c5fc17c
4 files changed (+480, -51 lines), under src and test.