
Commit f4533a7

minor update
1 parent 3dfc77e commit f4533a7

1 file changed: +1 -1 lines changed

README.md

+1-1
@@ -31,7 +31,7 @@ The current version only achieves on average 70% performance of CuBLAS. I am sti

 ## Performance on A100 GPU
 <!-- ![image](static/a100-perf.png) -->
-<img src=static/this.png alt="A100-GEMM-perf" width="2000" height="700">
+<img src=static/this.png alt="A100-GEMM-perf" width="2000" height="600">
 The overall performance comparison among Relay, CuBLAS, CUTLASS, TensorIR, Triton, and our implementations. The y-axis is speedup to Relay+CUTLASS.

 **Overall, the geometric mean speedup to Relay+CUTLASS is 1.73x, to TensorIR (1000 tuning trials using MetaSchedule per case) is 1.22x, to CuBLAS is 1.00x, to CUTLASS is 0.999x, to Triton is 1.07x.**
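
For readers unfamiliar with the metric, here is a minimal sketch of how a geometric mean speedup of this kind could be computed from per-case timings. It is not the script used to produce the README numbers: the function name `geometric_mean_speedup` and the timing values are hypothetical, chosen only for illustration.

```python
import math


def geometric_mean_speedup(baseline_times, our_times):
    """Geometric mean of per-case speedups, where speedup = baseline / ours."""
    if not baseline_times or len(baseline_times) != len(our_times):
        raise ValueError("need two non-empty timing lists of equal length")
    # Average the logs and exponentiate, which avoids overflow on long lists.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_times, our_times))
    return math.exp(log_sum / len(baseline_times))


# Hypothetical timings in milliseconds, NOT measurements from this repository.
baseline_ms = [2.0, 4.0, 8.0]
ours_ms = [1.0, 4.0, 4.0]

speedup = geometric_mean_speedup(baseline_ms, ours_ms)
print(f"geometric mean speedup: {speedup:.3f}x")
```

The geometric mean is the usual choice for averaging ratios such as speedups, because a 2x gain on one case and a 0.5x loss on another cancel to 1.0x, which an arithmetic mean would not report.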
