Sorry for the previous chaos; I thought these parts would not be published as part of the package. The following changes have been made:
|
|
Hi @maleadt, I am mentoring an Open Source Promotion Plan student to implement Tropical GEMM on GPUs. Regarding the recent update in GemmKernels.jl: JuliaGPU/GemmKernels.jl#101, I suggested that he try GemmKernels.jl to make the implementation compatible with the Julia CUDA ecosystem. However, from the above benchmark, we can see that its performance is not as good as that of the 600-line C code. We might need your help to decide which way to go is technically more feasible:
Also, @ArrogantGao found NOTE: All the benchmarks and implementations are included in this repo. |
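For readers new to the operation being benchmarked, here is a minimal CPU reference for a tropical GEMM. It is only a sketch under stated assumptions: it assumes the max-plus semiring (tropical "addition" is `max`, tropical "multiplication" is `+`) and row-major storage, and the name `tropical_gemm_ref` is hypothetical rather than taken from this repo or from GemmKernels.jl. A reference like this is mainly useful for checking the output of a GPU kernel, not for performance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Additive identity of the max-plus semiring (assumed here): -infinity.
inline float tropical_zero() {
    return -std::numeric_limits<float>::infinity();
}

// Hypothetical reference routine: C[i][j] = max(C[i][j], max_p(A[i][p] + B[p][j])).
// All matrices are row-major: A is m x k, B is k x n, C is m x n.
void tropical_gemm_ref(const std::vector<float>& A,
                       const std::vector<float>& B,
                       std::vector<float>& C,
                       std::size_t m, std::size_t n, std::size_t k) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = C[i * n + j];  // accumulate into the existing C
            for (std::size_t p = 0; p < k; ++p) {
                // tropical multiply is +, tropical add is max
                acc = std::max(acc, A[i * k + p] + B[p * n + j]);
            }
            C[i * n + j] = acc;
        }
    }
}
```

Initialising `C` with `tropical_zero()` gives the plain product; passing an existing `C` accumulates into it, analogous to `beta = 1` in an ordinary GEMM.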
| @@ -0,0 +1 @@ | |||
| {} No newline at end of file | |||
vscode configuration files should not be committed.
| @@ -0,0 +1,627 @@ | |||
| // This CUDA code is modified based on github repo https://github.com/Yinghan-Li/YHs_Sample, which is under GPL 3.0 License | |||
Holy, the GPL3 license, that is sexy. If we decide to keep this version in our code base, we have to include the GPL3 license.
To “propagate” a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.
I would recommend doing so. An all-Julia implementation is always preferable, for so many reasons: support for different datatypes, easier to tune using metaprogramming instead of the hard-coded 128x128x8 here, easier for other people to contribute to, etc. The code generated by GemmKernels.jl is generally pretty good, so it should be possible to compare the generated PTX code of both implementations, and/or use NSight Compute to compare executions. Maybe it's something simple, like GemmKernels.jl not using |
|
Removed the .vscode file and changed the license to GPL 3.0 (indeed, I also like that better). |
|
@maleadt Thank you for your prompt reply. @ArrogantGao Let us do some profiling to get some understanding of the performance issues. Let me merge the PR first and move the discussion to #2; we can post the profiling results and the generated PTX code there. |

