Conversation

@kshyatt (Member) commented on Nov 1, 2025

Hopefully fixes #2945; this worked for me locally. We no longer need the deleted code in lib/cusparse/generic.jl because we're firmly in the CUSPARSE 12.x era.
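
For context, here is a minimal sketch (not part of this PR) of the kind of check the linked issue is about: comparing CUSPARSE matrix-vector products against the CPU results on the CSC path that mv! in lib/cusparse/generic.jl dispatches to. The sizes, density, and use of sprand below are illustrative assumptions, not code from this repository.

using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

# Illustrative sketch: any sparse CSC matrix and dense vectors exercise the same mv! path.
A = sprand(Float32, 200, 100, 0.05)
x = rand(Float32, 100)
y = rand(Float32, 200)

dA = CuSparseMatrixCSC(A)   # device-side CSC matrix
dx = CuArray(x)
dy = CuArray(y)

@assert collect(dA * dx) ≈ A * x    # 'N' (no transpose) case
@assert collect(dA' * dy) ≈ A' * y  # adjoint ('C'/'T') case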

github-actions bot (Contributor) commented on Nov 1, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/lib/cusparse/generic.jl b/lib/cusparse/generic.jl
index 64d904c39..dcf1c939d 100644
--- a/lib/cusparse/generic.jl
+++ b/lib/cusparse/generic.jl
@@ -159,7 +159,7 @@ function mv!(transa::SparseChar, alpha::Number, A::Union{CuSparseMatrixCSC{TA},C
     transa = T <: Real && transa == 'C' ? 'T' : transa
 
     descA = CuSparseMatrixDescriptor(A, index)
-    m,n = size(A)
+    m, n = size(A)
 
     if transa == 'N'
         chkmvdims(X,n,Y,m)
diff --git a/test/libraries/cusparse/interfaces.jl b/test/libraries/cusparse/interfaces.jl
index 0413197df..293f61f87 100644
--- a/test/libraries/cusparse/interfaces.jl
+++ b/test/libraries/cusparse/interfaces.jl
@@ -154,7 +154,7 @@ nB = 2
                         end
                         @testset "A * CuSparseVector" begin
                             @testset "A * b" begin
-                                c  = opa(geam_A) * b_spvec
+                                c = opa(geam_A) * b_spvec
                                 dc = opa(d_geam_A) * db_spvec
                                 @test c ≈ collect(dc)
                             end

github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 49e51c9 | Previous: f4c05e0 | Ratio |
| --- | --- | --- | --- |
| latency/precompile | 56870767669.5 ns | 56743162658.5 ns | 1.00 |
| latency/ttfp | 8151877959 ns | 8292887489.5 ns | 0.98 |
| latency/import | 4476539160 ns | 4493784612 ns | 1.00 |
| integration/volumerhs | 9626196 ns | 9612835.5 ns | 1.00 |
| integration/byval/slices=1 | 146702 ns | 146961 ns | 1.00 |
| integration/byval/slices=3 | 425728 ns | 425977 ns | 1.00 |
| integration/byval/reference | 144962 ns | 145162 ns | 1.00 |
| integration/byval/slices=2 | 286169 ns | 286531 ns | 1.00 |
| integration/cudadevrt | 103614 ns | 103664 ns | 1.00 |
| kernel/indexing | 14242 ns | 14225 ns | 1.00 |
| kernel/indexing_checked | 15077 ns | 14963.5 ns | 1.01 |
| kernel/occupancy | 679.5894039735099 ns | 712.5909090909091 ns | 0.95 |
| kernel/launch | 2192.6666666666665 ns | 2140.1111111111113 ns | 1.02 |
| kernel/rand | 16554 ns | 17014 ns | 0.97 |
| array/reverse/1d | 20068 ns | 19857 ns | 1.01 |
| array/reverse/2dL_inplace | 66725 ns | 66720 ns | 1.00 |
| array/reverse/1dL | 70175 ns | 70068 ns | 1.00 |
| array/reverse/2d | 21616 ns | 21721 ns | 1.00 |
| array/reverse/1d_inplace | 9814 ns | 11535 ns | 0.85 |
| array/reverse/2d_inplace | 11086 ns | 13153 ns | 0.84 |
| array/reverse/2dL | 73496 ns | 73755 ns | 1.00 |
| array/reverse/1dL_inplace | 66966 ns | 66862 ns | 1.00 |
| array/copy | 20780 ns | 20647 ns | 1.01 |
| array/iteration/findall/int | 157192 ns | 158235 ns | 0.99 |
| array/iteration/findall/bool | 139633 ns | 139770.5 ns | 1.00 |
| array/iteration/findfirst/int | 160992 ns | 161047 ns | 1.00 |
| array/iteration/findfirst/bool | 161761.5 ns | 162113 ns | 1.00 |
| array/iteration/scalar | 72712 ns | 73378 ns | 0.99 |
| array/iteration/logical | 214671 ns | 216537 ns | 0.99 |
| array/iteration/findmin/1d | 50297 ns | 50322 ns | 1.00 |
| array/iteration/findmin/2d | 96461.5 ns | 96281.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43889 ns | 43275 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1 | 46439 ns | 44878 ns | 1.03 |
| array/reductions/reduce/Int64/dims=2 | 61392 ns | 61376 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88876 ns | 89018 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88113 ns | 87717 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 36603 ns | 36706 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 42040 ns | 41841.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 59765 ns | 59890 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52507 ns | 52369 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72274 ns | 71845 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43594 ns | 43034 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 44294 ns | 44568 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 61490 ns | 61598 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 88938 ns | 88831 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88951 ns | 88197 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 36459 ns | 36550 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 41549 ns | 51845 ns | 0.80 |
| array/reductions/mapreduce/Float32/dims=2 | 59765 ns | 60046 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52678.5 ns | 52895 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 72202 ns | 72274 ns | 1.00 |
| array/broadcast | 20156 ns | 20228 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11423 ns | 12997 ns | 0.88 |
| array/copyto!/cpu_to_gpu | 214679 ns | 214588 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 284040.5 ns | 283061 ns | 1.00 |
| array/accumulate/Int64/1d | 124323 ns | 124766 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83426 ns | 83121 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157663.5 ns | 157489 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709133 ns | 1708744 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966104 ns | 966369 ns | 1.00 |
| array/accumulate/Float32/1d | 108723 ns | 109029 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80430 ns | 80115 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147325 ns | 147066 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618198 ns | 1617852.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698043 ns | 697700.5 ns | 1.00 |
| array/construct | 1275.6 ns | 1284.9 ns | 0.99 |
| array/random/randn/Float32 | 48180.5 ns | 44088.5 ns | 1.09 |
| array/random/randn!/Float32 | 24946 ns | 24724 ns | 1.01 |
| array/random/rand!/Int64 | 27204 ns | 27197 ns | 1.00 |
| array/random/rand!/Float32 | 8787 ns | 8847.666666666666 ns | 0.99 |
| array/random/rand/Int64 | 29679 ns | 29769 ns | 1.00 |
| array/random/rand/Float32 | 13217 ns | 13169 ns | 1.00 |
| array/permutedims/4d | 59712 ns | 60066.5 ns | 0.99 |
| array/permutedims/2d | 53842 ns | 53803 ns | 1.00 |
| array/permutedims/3d | 54689.5 ns | 54690 ns | 1.00 |
| array/sorting/1d | 2757106 ns | 2756717 ns | 1.00 |
| array/sorting/by | 3343924 ns | 3343987 ns | 1.00 |
| array/sorting/2d | 1087396 ns | 1080056.5 ns | 1.01 |
| cuda/synchronization/stream/auto | 1027.3 ns | 1028.4 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7506.2 ns | 7619.4 ns | 0.99 |
| cuda/synchronization/stream/blocking | 791.5 ns | 806.3333333333334 ns | 0.98 |
| cuda/synchronization/context/auto | 1180.8 ns | 1172.7 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 7228.9 ns | 7177 ns | 1.01 |
| cuda/synchronization/context/blocking | 902.2272727272727 ns | 911.1923076923077 ns | 0.99 |

This comment was automatically generated by a workflow using github-action-benchmark.

Development

Successfully merging this pull request may close these issues.

Wrong sparse matrix-vector multiplication after v5.9+
