
Avoid allocating large cache array when slicing some sparse matrices#42647

Closed
andreasnoack wants to merge 2 commits into master from an/sparseindex

Conversation

@andreasnoack
Member

@andreasnoack andreasnoack commented Oct 14, 2021

This came up when profiling some code for a customer. It was originally tracked in #12860 but then considered fixed by #12934 because very sparse matrices would no longer hit the methods that allocate large vectors. However, there can still be a significant overhead from the dense cache vector for typical sparse matrices. Currently, we have that

julia> A = spdiagm(ones(10^8));

julia> @btime A[[1], 1]
  260.925 ms (8 allocations: 762.94 MiB)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0

With this PR, it becomes

julia> @btime A[[1], 1]
  258.067 ns (8 allocations: 528 bytes)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0
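As a sanity check on the before numbers (not part of the PR): the 762.94 MiB reported above is exactly the cost of a dense Float64 cache with one slot per row of A.

```julia
m = 10^8                           # number of rows in A
cache_bytes = m * sizeof(Float64)  # dense cache: one Float64 slot per row
cache_mib = cache_bytes / 2^20     # ≈ 762.94, matching the @btime report
```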

Eventually, we should also change getindex_I_sorted_bsearch_I along the same lines as what I'm doing here, since it also allocates the huge cache vector and therefore

julia> @time A[collect(1:(2^8+2)), 1]
  0.448473 seconds (8 allocations: 762.942 MiB, 11.40% gc time)
258-element SparseVector{Float64, Int64} with 1 stored entry:
  [1  ]  =  1.0

I don't have time for that right now so I'll file that in a new issue.

Finally, I'm also changing the code to avoid the nested ternary syntax when deciding sparse matrix slicing algorithm to make the code easier to read.
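For illustration, the shape of that readability change looks roughly like this. The selector functions are stubbed out here (the real implementations live in SparseArrays), and the `alg == 1` heuristic is elided for brevity:

```julia
# Stubs standing in for the real slicing kernels in SparseArrays:
getindex_I_sorted_bsearch_A(A, I, J) = :bsearch_A
getindex_I_sorted_bsearch_I(A, I, J) = :bsearch_I
getindex_I_sorted_linear(A, I, J)    = :linear

# Before: compute an integer code, then dispatch via nested ternaries.
function select_before(m, nzA, nI, A, I, J)
    alg = (m > nzA && m > nI) ? 0 : 2  # `alg == 1` case elided for brevity
    (alg == 0) ? getindex_I_sorted_bsearch_A(A, I, J) :
    (alg == 1) ? getindex_I_sorted_bsearch_I(A, I, J) :
                 getindex_I_sorted_linear(A, I, J)
end

# After: call the chosen kernel directly from the first if/else.
function select_after(m, nzA, nI, A, I, J)
    if m > nzA && m > nI
        getindex_I_sorted_bsearch_A(A, I, J)
    else
        getindex_I_sorted_linear(A, I, J)
    end
end
```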

@andreasnoack andreasnoack added the performance (Must go faster) and sparse (Sparse arrays) labels Oct 14, 2021
(alg == 0) ? getindex_I_sorted_bsearch_A(A, I, J) :
(alg == 1) ? getindex_I_sorted_bsearch_I(A, I, J) :
getindex_I_sorted_linear(A, I, J)
alg = if m > nzA && m > nI
Member


alg not necessary anymore?

Member Author


I'm just moving the function calls into the first if-else, since the second branching was almost identical to the first.

@tanmaykm
Member

I think the dense cache vector helped avoid the overhead of resizing during push!. There have been several improvements to reallocation while growing arrays since then. Testing some more conditions now, I still see:

Without this change

julia> A = spdiagm(ones(10^8));

julia> @btime A[[1], 1]
  391.406 ms (8 allocations: 762.94 MiB)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0


julia> @btime A[[1], :]
  1.409 s (8 allocations: 1.49 GiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 1 stored entry:
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
julia> A[1:5,:] .= 1;

julia> @btime A[[1], :]
  3.571 s (10 allocations: 2.98 GiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 100000000 stored entries:

julia> @btime A[[1,1,1,1], :]
  7.541 s (10 allocations: 7.45 GiB)
4×100000000 SparseMatrixCSC{Float64, Int64} with 400000000 stored entries:

With this change

julia> A = spdiagm(ones(10^8));

julia> @btime A[[1], 1]
  264.046 ns (8 allocations: 576 bytes)
1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0

julia> @btime A[[1], :]
  1.086 s (8 allocations: 762.94 MiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 1 stored entry:

julia> A[1:5,:] .= 1;

julia> @btime A[[1], :]
  7.152 s (50 allocations: 2.74 GiB)
1×100000000 SparseMatrixCSC{Float64, Int64} with 100000000 stored entries:

julia> @btime A[[1,1,1,1], :]
  20.929 s (56 allocations: 7.34 GiB)
4×100000000 SparseMatrixCSC{Float64, Int64} with 400000000 stored entries:

Will it be better to add this change as a new case in the heuristics?
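(The resizing overhead mentioned above is what sizehint! addresses. A minimal sketch, not from the PR: when the output length is known or bounded, reserving capacity once lets push! grow the result without repeated reallocation and without a dense length-m cache.)

```julia
# Illustrative only: grow the result with push! after an up-front
# sizehint!, so no reallocation happens once the capacity is reserved.
function collect_hinted(xs::AbstractVector{Float64})
    out = Float64[]
    sizehint!(out, length(xs))  # reserve capacity once
    for x in xs
        push!(out, x)           # amortized O(1) append
    end
    return out
end

collect_hinted(collect(1.0:5.0))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```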

@andreasnoack
Member Author

Thanks for the comment. Maybe my version here should be factored out into getindex(::SparseMatrixCSC, I::AbstractVector, j::Integer) method. If the second index is just a scalar then it should never be beneficial to allocate the cache. Alternatively, we could try to figure out a heuristic for when to use the cache vector but I'm not sure how easy that would be.

@tanmaykm
Member

Maybe my version here should be factored out into getindex(::SparseMatrixCSC, I::AbstractVector, j::Integer) method. If the second index is just a scalar then it should never be beneficial to allocate the cache.

yes, sounds good!
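A minimal sketch of what such a factored-out method could look like (the name getcol_nocache is hypothetical, not from the PR): binary-search each requested row among the column's stored entries, with no dense cache at all.

```julia
using SparseArrays

# Sketch: index column j of A at rows I without a dense cache, by
# binary-searching the column's stored row indices for each i in I.
function getcol_nocache(A::SparseMatrixCSC{Tv,Ti}, I::AbstractVector, j::Integer) where {Tv,Ti}
    rv, nz = rowvals(A), nonzeros(A)
    rng = nzrange(A, j)               # stored-entry range of column j
    nzind = Ti[]; nzval = Tv[]
    for (outi, i) in enumerate(I)
        k = searchsortedfirst(rv, i, first(rng), last(rng), Base.Order.Forward)
        if k <= last(rng) && rv[k] == i
            push!(nzind, outi); push!(nzval, nz[k])
        end
    end
    return SparseVector(length(I), nzind, nzval)
end

A = spdiagm(ones(10))
getcol_nocache(A, [1, 3], 1)  # 2-element SparseVector with [1] = 1.0
```

Work is O(length(I) · log(nnz in column j)) and allocation is proportional to the output, independent of the number of rows m.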

@DilumAluthge
Member

We have moved the SparseArrays stdlib to an external repository.

Please open this PR against that repository: https://github.com/JuliaLang/SparseArrays.jl

Thank you!

@DilumAluthge DilumAluthge deleted the an/sparseindex branch January 14, 2022 22:36
