bertmaher/simplegemm

Pingpong GEMM from scratch

I wrote this kernel to see if I could match CUTLASS's "pingpong" GEMM algorithm using hand-written CUDA. My starting point was https://github.com/pranjalssh/fast.cu by Pranjal Shankhdhar, and I was heavily inspired by the fantastic blog post "Outperforming cuBLAS on H100".

You can run a quick check of the kernel with:

make gemm && ./gemm

And run a sweep through a bunch of different shapes with:

python setup.py develop && python benchmark.py
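
When comparing shapes from a sweep, it can help to convert each measured kernel time into throughput. The snippet below is a minimal sketch of that conversion and is not taken from benchmark.py: the helper name `gemm_tflops` is hypothetical, and it assumes the usual convention of counting 2*M*N*K floating-point operations for an (M x K) by (K x N) matrix multiply.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return TFLOP/s for an (m x k) @ (k x n) GEMM that took `seconds`.

    Hypothetical helper for illustration. Assumes each output element costs
    k multiplies and k adds, i.e. 2*m*n*k floating-point operations in total.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2 * m * n * k
    return flops / seconds / 1e12
```

Given a time `elapsed` in seconds for one kernel launch, `gemm_tflops(M, N, K, elapsed)` gives a number that can be compared across shapes and against other GEMM implementations.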
