Simplify sparse type mappings #3076

Open
kshyatt wants to merge 2 commits into master from ksh/sparse
kshyatt (Member) commented Apr 1, 2026

github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 5dd9f87 | Previous: 340127b | Ratio |
|---|---|---|---|
| `array/accumulate/Float32/1d` | 100754 ns | 101458 ns | 0.99 |
| `array/accumulate/Float32/dims=1` | 76473 ns | 77354.5 ns | 0.99 |
| `array/accumulate/Float32/dims=1L` | 1585449 ns | 1586530.5 ns | 1.00 |
| `array/accumulate/Float32/dims=2` | 143371 ns | 144422 ns | 0.99 |
| `array/accumulate/Float32/dims=2L` | 657279 ns | 658433.5 ns | 1.00 |
| `array/accumulate/Int64/1d` | 118263 ns | 118510 ns | 1.00 |
| `array/accumulate/Int64/dims=1` | 79581 ns | 79806 ns | 1.00 |
| `array/accumulate/Int64/dims=1L` | 1706091 ns | 1705339 ns | 1.00 |
| `array/accumulate/Int64/dims=2` | 156925 ns | 156169.5 ns | 1.00 |
| `array/accumulate/Int64/dims=2L` | 961168 ns | 961890 ns | 1.00 |
| `array/broadcast` | 20513 ns | 20634 ns | 0.99 |
| `array/construct` | 1274.6 ns | 1265.4 ns | 1.01 |
| `array/copy` | 18269 ns | 18079 ns | 1.01 |
| `array/copyto!/cpu_to_gpu` | 214123 ns | 216989 ns | 0.99 |
| `array/copyto!/gpu_to_cpu` | 280436 ns | 284915 ns | 0.98 |
| `array/copyto!/gpu_to_gpu` | 10977 ns | 10779 ns | 1.02 |
| `array/iteration/findall/bool` | 134196 ns | 134702.5 ns | 1.00 |
| `array/iteration/findall/int` | 148851 ns | 150167 ns | 0.99 |
| `array/iteration/findfirst/bool` | 80843 ns | 81791 ns | 0.99 |
| `array/iteration/findfirst/int` | 83434 ns | 83856 ns | 0.99 |
| `array/iteration/findmin/1d` | 86703.5 ns | 86603.5 ns | 1.00 |
| `array/iteration/findmin/2d` | 116663.5 ns | 117335 ns | 0.99 |
| `array/iteration/logical` | 195739 ns | 198412.5 ns | 0.99 |
| `array/iteration/scalar` | 67797 ns | 69274.5 ns | 0.98 |
| `array/permutedims/2d` | 51765 ns | 52565 ns | 0.98 |
| `array/permutedims/3d` | 52061 ns | 52656 ns | 0.99 |
| `array/permutedims/4d` | 51013 ns | 51433.5 ns | 0.99 |
| `array/random/rand/Float32` | 12888 ns | 13120 ns | 0.98 |
| `array/random/rand/Int64` | 24944 ns | 24957 ns | 1.00 |
| `array/random/rand!/Float32` | 8401.666666666666 ns | 9858.333333333334 ns | 0.85 |
| `array/random/rand!/Int64` | 21701 ns | 21835 ns | 0.99 |
| `array/random/randn/Float32` | 36602 ns | 43515.5 ns | 0.84 |
| `array/random/randn!/Float32` | 30844 ns | 30949.5 ns | 1.00 |
| `array/reductions/mapreduce/Float32/1d` | 34290.5 ns | 34651 ns | 0.99 |
| `array/reductions/mapreduce/Float32/dims=1` | 49209.5 ns | 44540.5 ns | 1.10 |
| `array/reductions/mapreduce/Float32/dims=1L` | 51219 ns | 51138 ns | 1.00 |
| `array/reductions/mapreduce/Float32/dims=2` | 56416 ns | 56567 ns | 1.00 |
| `array/reductions/mapreduce/Float32/dims=2L` | 69240 ns | 69274 ns | 1.00 |
| `array/reductions/mapreduce/Int64/1d` | 42241 ns | 43548.5 ns | 0.97 |
| `array/reductions/mapreduce/Int64/dims=1` | 50891.5 ns | 43771.5 ns | 1.16 |
| `array/reductions/mapreduce/Int64/dims=1L` | 87167 ns | 87029 ns | 1.00 |
| `array/reductions/mapreduce/Int64/dims=2` | 59275 ns | 59317 ns | 1.00 |
| `array/reductions/mapreduce/Int64/dims=2L` | 84506 ns | 84771 ns | 1.00 |
| `array/reductions/reduce/Float32/1d` | 34558 ns | 35127 ns | 0.98 |
| `array/reductions/reduce/Float32/dims=1` | 39512 ns | 42032 ns | 0.94 |
| `array/reductions/reduce/Float32/dims=1L` | 51243 ns | 51445 ns | 1.00 |
| `array/reductions/reduce/Float32/dims=2` | 56338 ns | 57148 ns | 0.99 |
| `array/reductions/reduce/Float32/dims=2L` | 69769 ns | 70183 ns | 0.99 |
| `array/reductions/reduce/Int64/1d` | 42569 ns | 43251 ns | 0.98 |
| `array/reductions/reduce/Int64/dims=1` | 41961 ns | 51781 ns | 0.81 |
| `array/reductions/reduce/Int64/dims=1L` | 86985 ns | 87117 ns | 1.00 |
| `array/reductions/reduce/Int64/dims=2` | 59408 ns | 59507 ns | 1.00 |
| `array/reductions/reduce/Int64/dims=2L` | 84411 ns | 84592 ns | 1.00 |
| `array/reverse/1d` | 17512 ns | 17712 ns | 0.99 |
| `array/reverse/1dL` | 68271 ns | 68269 ns | 1.00 |
| `array/reverse/1dL_inplace` | 65675 ns | 65689 ns | 1.00 |
| `array/reverse/1d_inplace` | 10248 ns | 10212.666666666666 ns | 1.00 |
| `array/reverse/2d` | 20802 ns | 20819 ns | 1.00 |
| `array/reverse/2dL` | 72786 ns | 72976 ns | 1.00 |
| `array/reverse/2dL_inplace` | 65736 ns | 65812 ns | 1.00 |
| `array/reverse/2d_inplace` | 10221 ns | 10867 ns | 0.94 |
| `array/sorting/1d` | 2734313.5 ns | 2735648 ns | 1.00 |
| `array/sorting/2d` | 1067889 ns | 1069639 ns | 1.00 |
| `array/sorting/by` | 3303732 ns | 3305008 ns | 1.00 |
| `cuda/synchronization/context/auto` | 1192.3 ns | 1149 ns | 1.04 |
| `cuda/synchronization/context/blocking` | 968.4827586206897 ns | 961.56 ns | 1.01 |
| `cuda/synchronization/context/nonblocking` | 7872.1 ns | 6843.4 ns | 1.15 |
| `cuda/synchronization/stream/auto` | 997.3 ns | 1040.15 ns | 0.96 |
| `cuda/synchronization/stream/blocking` | 827.7010309278351 ns | 834.2666666666667 ns | 0.99 |
| `cuda/synchronization/stream/nonblocking` | 7482 ns | 7317.4 ns | 1.02 |
| `integration/byval/reference` | 143800.5 ns | 143691 ns | 1.00 |
| `integration/byval/slices=1` | 145773 ns | 145548 ns | 1.00 |
| `integration/byval/slices=2` | 284740 ns | 284456 ns | 1.00 |
| `integration/byval/slices=3` | 423037 ns | 423048 ns | 1.00 |
| `integration/cudadevrt` | 102329 ns | 102314 ns | 1.00 |
| `integration/volumerhs` | 23439655.5 ns | 23431006.5 ns | 1.00 |
| `kernel/indexing` | 13295 ns | 13281.5 ns | 1.00 |
| `kernel/indexing_checked` | 14021 ns | 13966 ns | 1.00 |
| `kernel/launch` | 2028.9 ns | 2168.222222222222 ns | 0.94 |
| `kernel/occupancy` | 667.2967741935483 ns | 669.85625 ns | 1.00 |
| `kernel/rand` | 15019 ns | 14425 ns | 1.04 |
| `latency/import` | 3825173202 ns | 3813842835.5 ns | 1.00 |
| `latency/precompile` | 4577693352.5 ns | 4579317015 ns | 1.00 |
| `latency/ttfp` | 4418922320.5 ns | 4395787908 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

kshyatt force-pushed the ksh/sparse branch 2 times, most recently from a2f9500 to 35efee6 on April 8, 2026 08:05

gdalle (Contributor) commented Apr 11, 2026

Looks like it's time to tag a release of GPUArrays?

Comment thread on lib/cusparse/src/array.jl (outdated)
kshyatt and others added 2 commits April 17, 2026 13:52
Co-authored-by: Guillaume Dalle <22795598+gdalle@users.noreply.github.com>