
Avoid allocating large cache array when slicing some sparse matrices#42647

Closed
andreasnoack wants to merge 2 commits into master from an/sparseindex

Conversation

@andreasnoack
Member

@andreasnoack andreasnoack commented Oct 14, 2021

This came up when profiling some code for a customer. It was originally tracked in #12860 but then considered fixed by #12934 because very sparse matrices would no longer hit the methods that allocate large vectors. However, there can still be a significant overhead from the dense cache vector for typical sparse matrices. Currently, we have that

julia> A = spdiagm(ones(10^8));

julia> @btime A[[1], 1]
  260.925 ms (8 allocations: 762.94 MiB)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0

With this PR, it becomes

julia> @btime A[[1], 1]
  258.067 ns (8 allocations: 528 bytes)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0
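As a sanity check on the before numbers (not part of the PR): the 762.94 MiB reported above is exactly the cost of a dense Float64 cache with one slot per row of A.

```julia
m = 10^8                           # number of rows in A
cache_bytes = m * sizeof(Float64)  # dense cache: one Float64 slot per row
cache_mib = cache_bytes / 2^20     # ≈ 762.94, matching the @btime report
```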

Eventually, we should also change getindex_I_sorted_bsearch_I along the same lines as what I'm doing here, since it also allocates the huge cache vector and therefore

julia> @time A[collect(1:(2^8+2)), 1]
  0.448473 seconds (8 allocations: 762.942 MiB, 11.40% gc time)
258-element SparseVector{Float64, Int64} with 1 stored entry:
  [1  ]  =  1.0

I don't have time for that right now so I'll file that in a new issue.

Finally, I'm also changing the code to avoid the nested ternary syntax when deciding sparse matrix slicing algorithm to make the code easier to read.
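For illustration, the shape of that readability change looks roughly like this. The selector functions are stubbed out here (the real implementations live in SparseArrays), and the `alg == 1` heuristic is elided for brevity:

```julia
# Stubs standing in for the real slicing kernels in SparseArrays:
getindex_I_sorted_bsearch_A(A, I, J) = :bsearch_A
getindex_I_sorted_bsearch_I(A, I, J) = :bsearch_I
getindex_I_sorted_linear(A, I, J)    = :linear

# Before: compute an integer code, then dispatch via nested ternaries.
function select_before(m, nzA, nI, A, I, J)
    alg = (m > nzA && m > nI) ? 0 : 2  # `alg == 1` case elided for brevity
    (alg == 0) ? getindex_I_sorted_bsearch_A(A, I, J) :
    (alg == 1) ? getindex_I_sorted_bsearch_I(A, I, J) :
                 getindex_I_sorted_linear(A, I, J)
end

# After: call the chosen kernel directly from the first if/else.
function select_after(m, nzA, nI, A, I, J)
    if m > nzA && m > nI
        getindex_I_sorted_bsearch_A(A, I, J)
    else
        getindex_I_sorted_linear(A, I, J)
    end
end
```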

@andreasnoack andreasnoack added the performance (Must go faster) and sparse (Sparse arrays) labels Oct 14, 2021
(alg == 0) ? getindex_I_sorted_bsearch_A(A, I, J) :
(alg == 1) ? getindex_I_sorted_bsearch_I(A, I, J) :
getindex_I_sorted_linear(A, I, J)
alg = if m > nzA && m > nI
Member


alg not necessary anymore?

Member Author


I'm just moving the function calls into the first if-else, since the second branching was almost identical to the first.

@tanmaykm
Member

I think the dense cache vector helped avoid the overhead of resizing during push!. There have been several improvements to reallocation while growing arrays since then. Testing some more conditions now, I still see:

Without this change

julia> A = spdiagm(ones(10^8));

julia> @btime A[[1], 1]
  391.406 ms (8 allocations: 762.94 MiB)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0


julia> @btime A[[1], :]
  1.409 s (8 allocations: 1.49 GiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 1 stored entry:
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
julia> A[1:5,:] .= 1;

julia> @btime A[[1], :]
  3.571 s (10 allocations: 2.98 GiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 100000000 stored entries:

julia> @btime A[[1,1,1,1], :]
  7.541 s (10 allocations: 7.45 GiB)
4×100000000 SparseMatrixCSC{Float64, Int64} with 400000000 stored entries:

With this change

julia> A = spdiagm(ones(10^8));

julia> @btime A[[1], 1]
  264.046 ns (8 allocations: 576 bytes)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0

julia> @btime A[[1], :]
  1.086 s (8 allocations: 762.94 MiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 1 stored entry:

julia> A[1:5,:] .= 1;

julia> @btime A[[1], :]
  7.152 s (50 allocations: 2.74 GiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 100000000 stored entries:

julia> @btime A[[1,1,1,1], :]
  20.929 s (56 allocations: 7.34 GiB)
4×100000000 SparseMatrixCSC{Float64, Int64} with 400000000 stored entries:

Will it be better to add this change as a new case in the heuristics?
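(The resizing overhead mentioned above is what sizehint! addresses. A minimal sketch, not from the PR: when the output length is known or bounded, reserving capacity once lets push! grow the result without repeated reallocation and without a dense length-m cache.)

```julia
# Illustrative only: grow the result with push! after an up-front
# sizehint!, so no reallocation happens once the capacity is reserved.
function collect_hinted(xs::AbstractVector{Float64})
    out = Float64[]
    sizehint!(out, length(xs))  # reserve capacity once
    for x in xs
        push!(out, x)           # amortized O(1) append
    end
    return out
end

collect_hinted(collect(1.0:5.0))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```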

@andreasnoack
Member Author

Thanks for the comment. Maybe my version here should be factored out into getindex(::SparseMatrixCSC, I::AbstractVector, j::Integer) method. If the second index is just a scalar then it should never be beneficial to allocate the cache. Alternatively, we could try to figure out a heuristic for when to use the cache vector but I'm not sure how easy that would be.

@tanmaykm
Member

Maybe my version here should be factored out into getindex(::SparseMatrixCSC, I::AbstractVector, j::Integer) method. If the second index is just a scalar then it should never be beneficial to allocate the cache.

yes, sounds good!
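A minimal sketch of what such a factored-out method could look like (the name getcol_nocache is hypothetical, not from the PR): binary-search each requested row among the column's stored entries, with no dense cache at all.

```julia
using SparseArrays

# Sketch: index column j of A at rows I without a dense cache, by
# binary-searching the column's stored row indices for each i in I.
function getcol_nocache(A::SparseMatrixCSC{Tv,Ti}, I::AbstractVector, j::Integer) where {Tv,Ti}
    rv, nz = rowvals(A), nonzeros(A)
    rng = nzrange(A, j)               # stored-entry range of column j
    nzind = Ti[]; nzval = Tv[]
    for (outi, i) in enumerate(I)
        k = searchsortedfirst(rv, i, first(rng), last(rng), Base.Order.Forward)
        if k <= last(rng) && rv[k] == i
            push!(nzind, outi); push!(nzval, nz[k])
        end
    end
    return SparseVector(length(I), nzind, nzval)
end

A = spdiagm(ones(10))
getcol_nocache(A, [1, 3], 1)  # 2-element SparseVector with [1] = 1.0
```

Work is O(length(I) · log(nnz in column j)) and allocation is proportional to the output, independent of the number of rows m.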

@DilumAluthge
Member

We have moved the SparseArrays stdlib to an external repository.

Please open this PR against that repository: https://github.com/JuliaLang/SparseArrays.jl

Thank you!

@DilumAluthge DilumAluthge deleted the an/sparseindex branch January 14, 2022 22:36
