@zhang-hui-yulo
Contributor

Enable mmf for RDNA3. All mul_mat_f-related test cases should pass; I am still collecting the perf data.

There is also a perf regression in mul_mat_f on my 7900XTX; I assume it is a similar issue to ROCm/ROCm#5727.

If anyone can help collect MUL_MAT perf data on other RDNA3 GPUs, that would be very helpful. If there is a perf improvement, I will keep mul_mat_f enabled on RDNA3 and ask ROCm to improve the perf; otherwise I will suggest disabling mul_mat_f on RDNA3.

MUL_MAT_ID_FUSION_rdna3_test.txt
MUL_MAT_ID_rdna3_test.txt
MUL_MAT_rdna3_test.txt
MUL_MAT_ID_FUSION_rdna4_test.txt
MUL_MAT_ID_rdna4_test.txt
MUL_MAT_rdna4_test.txt

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Dec 9, 2025
zhang-hui-yulo changed the title from "enable mmf for RDNA3" to "HIP: enable mmf for RDNA3" on Dec 9, 2025
@zhang-hui-yulo
Contributor Author

zhang-hui-yulo commented Dec 10, 2025

Adding the perf data of the ops on Windows. The Windows data is unstable, but this is the only RDNA3 setup I have. It would be very helpful if anyone could test other RDNA3 GPUs on Linux, thank you.

MUL_MAT
Backend GGML op Op parameters TFLOPS master TFLOPS mmf_for_rdna3 Speedup
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.84 0.83 0.98
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.66 1.65 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.78 1.76 0.99
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.61 2.75 1.05
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.27 3.23 0.99
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 60.03 57.96 0.97
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.14 3.16 0.61
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=128,n=1,k=16416,bs=[8,1],nr=[4,1],per=[0,1,2,3],k_v=32832,o=1 1.77 1.68 0.95
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=16416,n=1,k=128,bs=[8,1],nr=[4,1],per=[0,2,1,3],k_v=0,o=1 0.19 0.19 0.99
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.85 0.84 0.99
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.67 1.66 0.99
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.26 2.17 0.96
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.63 2.55 0.97
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.88 2.80 0.97
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 54.62 52.17 0.96
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.65 5.81 1.25
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.46 0.45 0.99
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.91 0.91 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.35 1.34 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.78 1.78 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.18 2.10 0.96
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 21.71 21.58 0.99
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.77 2.75 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.69 4.63 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.99 6.78 0.97
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.54 8.44 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.38 9.23 0.98
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.00 9.98 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 49.96 50.82 1.02
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.76 9.86 1.01
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.24 4.94 0.94
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.78 7.37 0.95
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.36 9.45 1.01
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.14 10.06 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.46 11.39 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 47.74 48.65 1.02
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.31 12.16 0.99
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.04 2.02 0.99
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.69 3.65 0.99
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.98 4.80 0.96
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.90 5.73 0.97
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.94 5.94 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 41.00 41.54 1.01
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.64 7.60 0.99
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.83 2.76 0.97
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.86 4.71 0.97
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.38 6.35 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.59 7.41 0.98
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.14 8.17 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 38.11 38.96 1.02
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.92 8.96 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.12 2.08 0.98
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.85 3.76 0.98
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.21 5.08 0.98
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.29 6.10 0.97
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.96 6.91 0.99
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 47.61 48.78 1.02
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.83 7.84 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.96 1.94 0.99
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.68 3.55 0.97
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.00 4.87 0.97
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.02 5.92 0.98
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.08 7.13 1.01
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 47.28 49.05 1.04
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.51 8.54 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.71 2.65 0.98
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.79 4.69 0.98
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.41 6.36 0.99
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.55 7.54 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.41 8.39 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 50.52 52.03 1.03
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.92 9.03 1.01
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.07 4.95 0.98
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.64 7.51 0.98
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.99 8.79 0.98
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.71 9.62 0.99
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.42 10.41 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 52.90 56.76 1.07
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.32 12.33 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.87 5.71 0.97
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.14 8.91 0.97
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.52 10.31 0.98
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.43 11.26 0.98
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.38 12.31 0.99
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 54.99 56.08 1.02
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.89 12.95 1.01
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.99 4.87 0.98
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.64 7.48 0.98
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.00 8.82 0.98
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.53 9.47 0.99
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.66 10.55 0.99
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 52.26 55.86 1.07
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.26 12.32 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.40 3.39 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.45 4.28 0.96
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.84 4.74 0.98
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.01 4.87 0.97
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.05 4.94 0.98
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 23.15 23.55 1.02
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.82 4.81 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.12 2.07 0.98
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.54 3.44 0.97
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.27 4.18 0.98
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.74 4.62 0.97
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.04 4.97 0.99
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 44.18 45.33 1.03
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.90 4.89 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.41 5.22 0.96
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.19 7.88 0.96
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.41 9.12 0.97
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.97 9.62 0.97
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.20 10.08 0.99
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 51.53 53.54 1.04
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.04 11.97 0.99
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.98 5.80 0.97
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.94 8.59 0.96
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.49 10.26 0.98
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.42 10.07 0.97
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.03 10.69 0.97
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 46.48 48.31 1.04
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.47 12.69 1.02
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.67 2.60 0.97
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.91 3.83 0.98
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.52 4.42 0.98
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.71 4.57 0.97
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.86 4.78 0.98
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 48.81 48.84 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.15 5.14 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.01 3.92 0.98
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.44 6.16 0.96
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.95 7.71 0.97
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.57 8.25 0.96
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.40 9.13 0.97
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 48.59 51.12 1.05
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.14 11.10 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.13 4.06 0.98
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.78 6.54 0.96
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.30 7.99 0.96
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.37 9.11 0.97
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.07 9.76 0.97
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 44.47 45.25 1.02
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.84 11.84 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.55 2.47 0.97
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.74 3.69 0.99
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.33 4.28 0.99
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.62 4.50 0.98
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.73 4.66 0.98
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 45.06 46.36 1.03
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.02 5.01 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.26 2.20 0.97
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.55 3.47 0.98
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.38 4.30 0.98
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.94 4.90 0.99
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.35 5.30 0.99
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 32.18 32.19 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.12 6.12 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.71 3.59 0.97
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.10 5.85 0.96
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.42 7.17 0.97
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.24 7.91 0.96
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.66 8.49 0.98
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 50.38 51.88 1.03
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.55 9.43 0.99
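
For anyone cross-checking the table above: the Speedup column appears to be the mmf_for_rdna3 TFLOPS divided by the master TFLOPS (for example 3.16 / 5.14 ≈ 0.61 for bf16, n=8). Below is a minimal Python sketch for parsing the pasted rows and flagging outliers. It assumes the rows are whitespace-separated exactly as shown here; `parse_row`, `classify`, and the 5% noise threshold are illustrative choices of mine, not part of llama.cpp. Because the TFLOPS values are rounded to two decimals, the recomputed ratio can differ slightly from the printed Speedup.

```python
import re

# One table row: backend, op, parameter string (no spaces), then three numbers.
ROW_RE = re.compile(
    r"^(?P<backend>\S+)\s+(?P<op>\S+)\s+(?P<params>\S+)\s+"
    r"(?P<master>\d+\.\d+)\s+(?P<pr>\d+\.\d+)\s+(?P<speedup>\d+\.\d+)$"
)

def parse_row(line):
    """Return a dict for a data row, or None for headers and other lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    for key in ("master", "pr", "speedup"):
        row[key] = float(row[key])
    return row

def classify(row, noise=0.05):
    """Label a row by its recomputed ratio; `noise` is an assumed threshold."""
    ratio = row["pr"] / row["master"]
    if ratio < 1.0 - noise:
        return "regression"
    if ratio > 1.0 + noise:
        return "improvement"
    return "noise"

# Three rows copied from the Windows MUL_MAT table above.
sample = [
    "ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.14 3.16 0.61",
    "ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.65 5.81 1.25",
    "ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.91 0.91 1.00",
]

parsed = [parse_row(line) for line in sample]
labels = [classify(row) for row in parsed]
for row, label in zip(parsed, labels):
    print(row["params"].split(",")[0], row["speedup"], label)
```

With the Windows data being noisy, a wider threshold than 5% may be more appropriate; the value here is only a starting point.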
MUL_MAT_ID
Backend GGML op Op parameters TFLOPS master TFLOPS mmf_for_rdna3 Speedup
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.94 0.95 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 1.07 4.91 4.59
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 2.10 6.23 2.96
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 0.32 1.79 5.66
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.09 1.43 16.66
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 3.83 6.80 1.77
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 0.55 2.92 5.30
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.11 0.89 7.87
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 0.94 0.94 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 2.80 6.00 2.14
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 6.75 5.13 0.76
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 0.69 2.86 4.16
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.19 2.30 11.93
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 12.17 9.31 0.77
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 1.42 4.95 3.49
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.28 1.08 3.81
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.91 0.91 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 0.75 0.75 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 0.85 0.93 1.10
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 0.53 0.52 0.97
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.20 0.20 1.01
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 1.56 1.59 1.02
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 0.81 0.82 1.02
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.27 0.25 0.94
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 0.91 0.92 1.01
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 1.39 1.34 0.96
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 2.77 2.59 0.93
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 0.95 1.00 1.05
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.29 0.26 0.90
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 4.45 4.12 0.93
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 1.15 1.18 1.03
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.46 0.39 0.84
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 2.04 2.05 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 2.83 2.81 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 5.17 5.22 1.01
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.33 2.27 0.98
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.58 0.60 1.04
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 9.49 9.13 0.96
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.75 2.72 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.77 0.85 1.11
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 2.34 2.33 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 5.31 5.31 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 10.09 10.09 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 3.86 3.83 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.73 0.64 0.87
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 19.33 19.38 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 5.02 5.01 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 1.05 1.02 0.96
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=1,k=2880 4.19 4.21 1.00
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=4,k=2880 1.79 1.80 1.00
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=512,k=2880 25.36 25.53 1.01
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=8,k=2880 2.32 2.36 1.02
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 2.69 2.94 1.09
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 3.76 3.74 0.99
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 6.85 6.75 0.99
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.27 1.33 1.05
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.47 1.36 0.93
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 12.77 12.68 0.99
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 3.06 3.05 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.89 1.83 0.97
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 3.52 3.13 0.89
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 7.37 7.34 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 14.07 14.14 1.01
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 2.43 2.43 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.61 1.59 0.99
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 25.60 25.76 1.01
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 6.02 6.01 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.40 2.02 0.84
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 2.22 2.11 0.95
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 3.49 3.47 0.99
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 6.35 6.28 0.99
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.80 2.58 0.92
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.70 1.63 0.96
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 11.80 11.83 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 3.12 3.12 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.99 2.21 1.11
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 2.18 2.20 1.01
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 6.77 6.76 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 12.93 12.95 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 4.95 4.92 0.99
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.77 2.10 1.19
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 23.69 23.89 1.01
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 6.17 6.16 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.57 2.46 0.96
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.69 1.69 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 2.29 2.28 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 4.20 4.14 0.99
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.72 1.77 1.03
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.07 1.10 1.03
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 7.94 7.91 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.01 2.00 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.30 1.34 1.03
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.84 1.85 1.01
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 4.31 4.28 0.99
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 8.20 8.20 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 3.10 3.09 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.10 1.21 1.09
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 15.47 15.44 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 3.86 3.89 1.01
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 1.75 1.85 1.06
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 2.68 2.70 1.01
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 3.70 3.69 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 6.85 6.82 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.27 1.24 0.98
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.37 1.48 1.08
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 12.84 12.80 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.92 2.93 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.81 1.60 0.88
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 2.75 2.75 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 7.02 6.98 0.99
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 13.46 13.46 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 2.23 2.26 1.01
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.58 1.68 1.07
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 24.87 24.91 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 5.53 5.50 0.99
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.18 2.23 1.02
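
In the MUL_MAT_ID table above, the f16 rows range from 0.76 to 16.66 in the Speedup column, while the other types stay within roughly 20% of 1.00. A small Python sketch for summarizing that per `type_a` is shown below. It assumes the same whitespace-separated row layout as the pasted table and reads the printed Speedup column as-is; `speedup_range_by_type` is a hypothetical helper name, not an existing llama.cpp script.

```python
import re

# type_a is the first key in the parameter string of every row.
TYPE_A_RE = re.compile(r"type_a=([^,]+)")

def speedup_range_by_type(lines):
    """Map each type_a to the (min, max) of the printed Speedup column."""
    ranges = {}
    for line in lines:
        fields = line.split()
        m = TYPE_A_RE.search(line)
        # Data rows have exactly six fields; headers and blanks are skipped.
        if len(fields) != 6 or m is None:
            continue
        speedup = float(fields[-1])
        lo, hi = ranges.get(m.group(1), (speedup, speedup))
        ranges[m.group(1)] = (min(lo, speedup), max(hi, speedup))
    return ranges

# Header plus four rows copied from the MUL_MAT_ID table above.
sample = [
    "Backend GGML op Op parameters TFLOPS master TFLOPS mmf_for_rdna3 Speedup",
    "ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.09 1.43 16.66",
    "ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 6.75 5.13 0.76",
    "ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 0.75 0.75 1.00",
    "ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.46 0.39 0.84",
]

result = speedup_range_by_type(sample)
for type_a, (lo, hi) in sorted(result.items()):
    print(f"{type_a}: min {lo:.2f}, max {hi:.2f}")
```

Feeding the whole table through this kind of summary makes it easier to see which types the mmf path actually changes and which only show run-to-run noise.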

@zhang-hui-yulo
Contributor Author

zhang-hui-yulo commented Dec 10, 2025

I finally got Ubuntu 22.04 working, so here is the data from it with ROCm 7.1.0. Unlike my 9070XT, the 7900XTX appears to get a perf improvement on mul_mat_f, which is why I suspect that the ROCm compiler does not optimize well for RDNA4.

MUL_MAT
Backend GGML op Op parameters TFLOPS master TFLOPS mmf_for_rdna3 Speedup
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.80 0.80 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.58 1.58 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.71 1.71 0.99
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.49 2.99 1.20
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.10 3.70 1.19
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 62.91 63.53 1.01
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.94 5.80 1.17
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=128,n=1,k=16416,bs=[8,1],nr=[4,1],per=[0,1,2,3],k_v=32832,o=1 1.67 1.66 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=16416,n=1,k=128,bs=[8,1],nr=[4,1],per=[0,2,1,3],k_v=0,o=1 0.12 0.12 1.02
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.98 0.80 0.82
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.59 1.59 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.16 2.15 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.56 2.55 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.80 2.81 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 58.77 58.40 0.99
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.46 5.90 1.32
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.45 0.45 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.89 0.89 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.31 1.31 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.75 1.74 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.16 2.15 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 21.93 21.95 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.79 2.80 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.11 4.09 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.44 6.41 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.85 7.83 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.87 8.85 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.96 9.00 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 51.83 51.90 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.27 8.22 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.56 4.51 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.01 6.97 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.69 8.61 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.49 9.63 1.01
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.15 10.23 1.01
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 49.47 49.42 1.00
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.74 10.70 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.90 1.89 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.36 3.39 1.01
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.56 4.60 1.01
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.32 5.30 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.18 5.18 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 41.64 41.60 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.86 6.88 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.63 2.62 0.99
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.48 4.47 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.96 6.00 1.01
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.05 7.04 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.67 7.68 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 37.68 37.54 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.58 7.60 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.92 1.94 1.01
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.43 3.40 0.99
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.64 4.62 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.51 5.52 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.41 6.41 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 49.06 49.07 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.82 6.83 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.84 1.83 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.33 3.30 0.99
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.61 4.59 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.56 5.54 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.48 6.49 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 49.16 49.01 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.54 7.57 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.51 2.51 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.43 4.40 0.99
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.10 6.07 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.17 7.15 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.09 8.08 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 51.92 51.88 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.61 7.61 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.16 4.14 0.99
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.72 6.68 0.99
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.10 8.08 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.92 8.95 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.73 9.76 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 55.80 55.77 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.89 11.93 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.84 4.80 0.99
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.96 7.95 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.29 9.26 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.57 10.58 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.53 11.55 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 55.71 55.64 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 12.63 12.65 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.95 3.95 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.53 6.49 0.99
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.61 7.60 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.71 8.70 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.78 9.77 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 56.94 56.21 0.99
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.73 11.76 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.10 3.09 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.03 4.03 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.44 4.43 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.38 4.39 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.73 4.75 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 22.47 22.21 0.99
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.41 4.42 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.95 1.97 1.01
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.23 3.22 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.75 3.74 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.37 4.36 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.61 4.62 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 46.00 45.96 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.13 4.16 1.01
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.50 4.49 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.00 7.00 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.19 8.13 0.99
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.98 9.00 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.45 9.44 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 54.01 54.05 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.03 11.07 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.96 4.92 0.99
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.86 7.87 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.09 9.06 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.43 9.40 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.04 10.10 1.01
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 49.50 49.56 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.82 11.81 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.43 2.42 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.69 3.68 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.07 4.06 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.41 4.40 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.54 4.56 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 49.32 49.28 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.51 4.54 1.01
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.47 3.47 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.79 5.78 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.17 7.15 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.83 7.82 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.71 8.74 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 51.73 51.70 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.80 10.84 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.59 3.57 0.99
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.02 5.97 0.99
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.51 7.48 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.53 8.54 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.44 9.45 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 46.27 46.20 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 11.15 11.18 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.32 2.33 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.52 3.50 0.99
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.99 3.98 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.43 4.42 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.60 4.60 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 47.06 47.06 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.48 4.52 1.01
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.96 1.95 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.02 3.01 0.99
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.84 3.84 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.53 4.54 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.04 5.07 1.01
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 29.28 29.32 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.95 5.96 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.31 3.30 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.58 5.53 0.99
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.79 6.78 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.60 7.60 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.13 8.15 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 54.36 50.99 0.94
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.96 9.03 1.01
MUL_MAT_ID
Backend GGML op Op parameters TFLOPS master TFLOPS mmf_for_rdna3 Speedup
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.87 0.87 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 1.24 4.89 3.95
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 2.35 6.27 2.67
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 0.37 1.69 4.60
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.16 1.63 9.97
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 4.46 6.86 1.54
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 0.63 2.94 4.64
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.18 0.94 5.18
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 0.88 0.88 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 4.33 6.37 1.47
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 8.12 8.12 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 1.12 2.84 2.55
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.33 1.44 4.33
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 14.52 14.42 0.99
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 2.20 4.69 2.13
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.47 1.06 2.27
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.85 0.84 0.99
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 0.73 0.72 0.99
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 0.95 0.95 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 0.31 0.30 0.98
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.13 0.13 0.95
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 1.81 1.84 1.01
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 0.51 0.51 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.16 0.16 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 0.85 0.84 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 1.58 1.62 1.02
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 2.88 2.88 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 1.07 1.04 0.97
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.31 0.35 1.13
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 4.60 4.60 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 1.26 1.25 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.38 0.44 1.16
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.48 1.49 1.01
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 2.50 2.51 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 4.62 4.57 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.18 2.18 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.61 0.65 1.06
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 8.99 8.94 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.74 2.72 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.77 0.85 1.10
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.61 1.62 1.01
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 4.95 4.89 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 9.52 9.47 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 3.70 3.76 1.01
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.54 0.63 1.18
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 19.17 19.07 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 5.01 5.03 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.85 0.85 1.00
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=1,k=2880 3.28 3.28 1.00
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=4,k=2880 1.66 1.46 0.88
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=512,k=2880 25.63 25.68 1.00
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=8,k=2880 2.34 2.49 1.06
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.03 1.03 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 3.61 3.60 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 6.61 6.58 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.30 1.33 1.02
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.25 1.44 1.15
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 12.44 12.38 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.95 2.93 0.99
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.62 1.80 1.11
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 2.45 2.45 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 7.17 7.18 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 13.87 13.88 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 2.33 2.38 1.02
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.43 1.39 0.97
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 25.46 25.27 0.99
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 5.90 5.89 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.23 2.08 0.93
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.92 0.91 0.98
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 3.38 3.38 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 6.21 6.17 0.99
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.45 2.63 1.07
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.57 1.51 0.96
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 11.69 11.58 0.99
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 3.02 3.09 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 2.03 1.98 0.97
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.76 1.75 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 6.61 6.62 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 12.85 12.80 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 4.71 4.67 0.99
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.94 1.66 0.86
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 23.85 23.41 0.98
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 6.06 6.06 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.45 2.52 1.03
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.32 1.33 1.01
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 2.05 2.04 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 3.71 3.70 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.66 1.65 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.97 1.18 1.21
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 7.14 7.12 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 1.91 1.91 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.34 1.24 0.93
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.46 1.46 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 3.90 3.88 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 7.41 7.53 1.02
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 3.10 2.91 0.94
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.12 1.21 1.08
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 14.05 14.02 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 3.85 3.84 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 1.58 1.52 0.96
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.02 1.02 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 3.60 3.59 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 6.61 6.57 0.99
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.21 1.19 0.99
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.37 1.35 0.99
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 12.40 12.38 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.86 2.84 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.75 1.75 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 2.01 2.00 0.99
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 6.78 6.79 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 13.15 13.16 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 2.12 2.08 0.98
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.79 1.46 0.82
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 24.45 24.58 1.01
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 5.41 5.41 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.32 2.12 0.92
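
For anyone skimming the tables above, the Speedup column appears to be the mmf_for_rdna3 TFLOPS divided by the master TFLOPS. Here is a minimal Python sketch that parses rows in this flattened layout and lists the ones that differ from master by more than a chosen tolerance. The names `parse_row` and `outliers` and the 5% tolerance are assumptions for illustration only; they are not part of llama.cpp or its test tooling. Because the ratio is recomputed from the rounded columns, it can differ from the reported Speedup in the last digit.

```python
import re

# One flattened row: backend, op, parameter string (no spaces), then three numbers.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)$")


def parse_row(line):
    """Return a dict for a data row, or None for headers and blank lines."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    backend, op, params, master, pr, reported = m.groups()
    master, pr = float(master), float(pr)
    return {
        "backend": backend,
        "op": op,
        "params": params,
        "master": master,
        "pr": pr,
        "reported": float(reported),   # Speedup as printed in the table
        "speedup": pr / master,        # recomputed from the rounded TFLOPS columns
    }


def outliers(lines, tolerance=0.05):
    """Rows whose recomputed speedup differs from 1.0 by more than `tolerance`."""
    rows = [r for r in map(parse_row, lines) if r is not None]
    return [r for r in rows if abs(r["speedup"] - 1.0) > tolerance]


# A few rows copied from the tables above.
SAMPLE = [
    "Backend GGML op Op parameters TFLOPS master TFLOPS mmf_for_rdna3 Speedup",
    "ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.87 0.87 1.00",
    "ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 1.24 4.89 3.95",
    "ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.98 0.80 0.82",
]

for r in outliers(SAMPLE):
    print(f"{r['op']:<11} {r['speedup']:.2f}x  {r['params']}")
```

Feeding the full tables through `outliers` is one way to separate real changes (such as the f16 MUL_MAT_ID rows) from run-to-run noise around 1.00.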

@lovedheart
Contributor

lovedheart commented Dec 10, 2025

I see a remarkable boost for MUL_MAT_ID on the entry-level 780M iGPU on Linux.

MUL_MAT_ID Master GFLOPS PR GFLOPS Speedup (%)
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048) 40.50 40.44 -0.15%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048) 75.97 72.12 -5.07%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048) 236.84 235.41 -0.60%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048) 136.09 135.77 -0.24%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048) 225.53 228.93 +1.51%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048) 166.65 166.10 -0.33%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048) 261.99 266.16 +1.60%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 41.86 40.89 -2.32%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 39.03 83.35 +113.58%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 164.28 171.61 +4.46%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 121.42 122.17 +0.62%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 171.09 174.34 +1.90%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 133.35 143.51 +7.61%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 97.91 108.93 +11.26%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 47.29 48.20 +1.92%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 44.51 95.61 +114.81%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 196.88 208.02 +5.65%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 164.40 149.59 -8.99%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 193.17 188.16 -2.60%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 179.62 154.65 -13.89%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 113.61 111.65 -1.72%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048) 81.90 85.48 +4.37%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048) 81.54 179.12 +119.57%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048) 136.73 147.13 +7.60%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048) 132.98 137.55 +3.44%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048) 284.92 285.65 +0.26%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048) 117.35 114.91 -2.08%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048) 161.61 171.32 +6.00%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048) 138.24 141.48 +2.34%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048) 152.01 284.85 +87.54%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048) 143.82 144.02 +0.14%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048) 154.02 150.55 -2.25%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048) 225.08 236.00 +4.85%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048) 249.66 246.03 -1.45%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048) 167.35 168.32 +0.58%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048) 169.13 167.47 -0.98%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048) 276.95 480.21 +73.43%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048) 493.82 495.35 +0.31%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048) 143.68 144.96 +0.89%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048) 432.26 436.68 +1.02%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048) 280.60 286.80 +2.21%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048) 397.85 409.08 +2.82%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048) 231.17 233.17 +0.86%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048) 525.46 596.78 +13.57%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048) 907.54 983.31 +8.35%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048) 282.41 280.18 -0.79%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048) 854.11 855.28 +0.14%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048) 555.12 581.03 +4.67%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048) 786.57 810.20 +3.00%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 350.28 345.20 -1.45%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 912.37 677.10 -25.78%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 1840.00 1840.00 +0.00%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 553.47 552.88 -0.11%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 1580.00 1590.00 +0.63%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 1150.00 1150.00 +0.00%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 1550.00 1560.00 +0.65%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048) 40.37 39.31 -2.63%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048) 72.68 70.66 -2.78%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048) 247.79 244.66 -1.27%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048) 135.52 138.55 +2.24%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048) 240.35 236.84 -1.46%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048) 169.76 169.09 -0.40%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048) 272.53 269.17 -1.23%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048) 52.76 52.78 +0.04%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048) 37.88 91.53 +141.63%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048) 215.37 235.55 +9.37%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048) 162.75 150.86 -7.30%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048) 182.63 204.08 +11.75%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048) 152.32 146.52 -3.81%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048) 122.33 112.20 -8.27%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048) 66.83 60.74 -9.12%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048) 50.04 120.82 +141.45%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048) 274.38 275.83 +0.53%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048) 192.81 185.93 -3.57%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048) 244.31 245.63 +0.54%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048) 196.42 199.60 +1.62%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048) 138.05 178.02 +28.96%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048) 150.08 150.04 -0.03%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048) 127.40 300.01 +135.50%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048) 236.29 238.07 +0.75%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048) 227.32 230.48 +1.39%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048) 563.40 549.86 -2.40%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048) 205.24 204.53 -0.35%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048) 291.42 319.31 +9.57%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048) 182.24 188.07 +3.21%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048) 251.97 541.53 +114.93%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048) 280.83 269.39 -4.07%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048) 282.81 275.54 -2.57%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048) 420.40 415.02 -1.28%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048) 485.17 460.99 -5.00%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048) 341.14 319.07 -6.47%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048) 275.71 283.87 +2.96%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048) 480.38 730.11 +52.00%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048) 954.25 968.50 +1.50%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048) 286.95 284.66 -0.79%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048) 867.92 873.01 +0.58%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048) 560.34 545.36 -2.67%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048) 826.45 820.29 -0.75%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048) 397.44 400.42 +0.75%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048) 896.66 891.31 -0.60%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048) 1960.00 1960.00 +0.00%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048) 573.49 566.48 -1.22%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048) 1730.00 1760.00 +1.73%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048) 1170.00 1170.00 +0.00%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048) 1600.00 1620.00 +1.25%
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048) 505.53 503.97 -0.31%
MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048) 1530.00 1540.00 +0.65%
MUL_MAT_ID(type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048) 3760.00 3620.00 -3.72%
MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048) 1140.00 1130.00 -0.88%
MUL_MAT_ID(type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048) 3170.00 3280.00 +3.47%
MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048) 2280.00 2250.00 -1.32%
MUL_MAT_ID(type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048) 3050.00 3120.00 +2.30%
MUL_MAT_ID(type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=1,k=2880) 266.90 270.19 +1.23%
MUL_MAT_ID(type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=4,k=2880) 173.09 231.78 +33.93%
MUL_MAT_ID(type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=8,k=2880) 276.54 254.75 -7.88%
MUL_MAT_ID(type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=512,k=2880) 993.47 994.54 +0.11%

Largest gains:

f16 with n=4, n=8, n=32 → +114% to +142%; with n=64, n=128 → +52% to +115%  
q4_K with n=4 (n_mats=32) → +11.75%; with n=512 (n_mats=32) → +3.47%

Largest losses:

f16 with n=512 → -25.78%  
q4_0 with n=512 → -3.72%
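
The speedup column in these tables appears to be the relative change of the PR throughput against master, `(pr - master) / master * 100`. A minimal Python sketch, assuming that formula and rows in the flattened `OP(params) master pr pct` shape used here, recomputes it and sorts the cases from worst to best. The helper names `parse_row` and `pct_change` are hypothetical and not part of llama.cpp. Because the GFLOPS values are already rounded, the recomputed figure can differ from the printed one in the last digit.

```python
import re

# Matches one flattened row: OP(params) master pr pct%
ROW_RE = re.compile(r"^(\w+\(.*\))\s+([\d.]+)\s+([\d.]+)\s+([+-]?[\d.]+)%$")

def pct_change(master, pr):
    """Relative change of pr against master, in percent (assumed formula)."""
    return (pr - master) / master * 100.0

def parse_row(line):
    """Return (case, master, pr, reported_pct), or None for non-data lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    case, master, pr, pct = m.groups()
    return case, float(master), float(pr), float(pct)

# Three rows copied from the MUL_MAT_ID table
rows = [
    "MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048) 39.03 83.35 +113.58%",
    "MUL_MAT_ID(type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048) 912.37 677.10 -25.78%",
    "MUL_MAT_ID(type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048) 179.62 154.65 -13.89%",
]

parsed = [r for r in map(parse_row, rows) if r is not None]

# Sort ascending so the largest regression comes first
ranked = sorted(parsed, key=lambda r: pct_change(r[1], r[2]))
for case, master, pr, reported in ranked:
    print(f"{pct_change(master, pr):+8.2f}% (reported {reported:+.2f}%)  {case}")
```

Feeding the whole table through `parse_row` instead of three sample rows gives the full ranking. Header and summary lines are skipped because they do not match the row pattern.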

MUL_MAT Master GFLOPS PR GFLOPS Speedup (%)
MUL_MAT(type_a=f16,type_b=f32,m=16416,n=1,k=128,bs=[8,1],nr=[4,1],per=[0,2,1,3],k_v=0,o=1) 20.22 19.51 -3.51%
MUL_MAT(type_a=f16,type_b=f32,m=128,n=1,k=16416,bs=[8,1],nr=[4,1],per=[0,1,2,3],k_v=32832,o=1) 73.50 73.43 -0.10%
MUL_MAT(type_a=f32,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 40.46 40.50 0.10%
MUL_MAT(type_a=f16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 76.33 76.36 0.04%
MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 75.92 76.04 0.16%
MUL_MAT(type_a=q4_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 275.05 275.04 0.00%
MUL_MAT(type_a=q4_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 239.54 249.82 4.29%
MUL_MAT(type_a=q5_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 221.20 219.87 -0.60%
MUL_MAT(type_a=q5_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 208.12 210.05 0.93%
MUL_MAT(type_a=q8_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 148.59 148.77 0.12%
MUL_MAT(type_a=mxfp4,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 268.81 288.83 7.45%
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 428.65 435.62 1.63%
MUL_MAT(type_a=q3_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 280.58 305.21 8.78%
MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 262.12 266.61 1.71%
MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 215.64 222.78 3.31%
MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 187.18 176.74 -5.58%
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 262.70 255.01 -2.93%
MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 411.52 364.84 -11.34%
MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 244.39 245.09 0.29%
MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 356.77 356.66 -0.03%
MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 627.69 625.74 -0.31%
MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 554.11 554.68 0.10%
MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 269.05 256.91 -4.51%
MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 216.11 228.83 5.89%
MUL_MAT(type_a=iq4_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 282.11 290.74 3.06%
MUL_MAT(type_a=f32,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 80.34 80.43 0.11%
MUL_MAT(type_a=f16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 150.65 149.48 -0.78%
MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 147.70 147.85 0.10%
MUL_MAT(type_a=q4_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 523.72 519.24 -0.86%
MUL_MAT(type_a=q4_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 483.51 466.36 -3.55%
MUL_MAT(type_a=q5_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 452.00 439.52 -2.76%
MUL_MAT(type_a=q5_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 392.54 399.51 1.78%
MUL_MAT(type_a=q8_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 295.51 294.83 -0.23%
MUL_MAT(type_a=mxfp4,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 558.01 551.76 -1.12%
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 667.15 668.84 0.25%
MUL_MAT(type_a=q3_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 527.49 541.40 2.64%
MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 410.62 438.11 6.69%
MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 373.56 368.55 -1.34%
MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 363.23 349.41 -3.80%
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 479.64 506.91 5.69%
MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 665.95 672.52 0.99%
MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 437.29 440.13 0.65%
MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 618.33 634.69 2.65%
MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1040.00 1060.00 1.92%
MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 883.69 879.53 -0.47%
MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 543.23 542.01 -0.22%
MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 429.51 436.61 1.65%
MUL_MAT(type_a=iq4_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 573.45 574.21 0.13%
MUL_MAT(type_a=f32,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 120.26 120.35 0.07%
MUL_MAT(type_a=f16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 216.74 217.30 0.26%
MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 162.14 160.30 -1.13%
MUL_MAT(type_a=q4_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 759.68 760.90 0.16%
MUL_MAT(type_a=q4_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 714.67 731.86 2.41%
MUL_MAT(type_a=q5_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 650.08 640.43 -1.48%
MUL_MAT(type_a=q5_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 604.43 615.72 1.87%
MUL_MAT(type_a=q8_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 439.82 437.58 -0.51%
MUL_MAT(type_a=mxfp4,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 810.97 831.60 2.54%
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 746.16 746.71 0.07%
MUL_MAT(type_a=q3_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 610.47 589.86 -3.38%
MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 506.20 511.74 1.09%
MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 474.03 480.72 1.41%
MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 510.60 518.90 1.63%
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 714.20 715.29 0.15%
MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 786.43 792.06 0.72%
MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 570.54 642.69 12.65%
MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 784.85 732.70 -6.64%
MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1150.00 1150.00 0.00%
MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1010.00 1060.00 4.95%
MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 779.32 770.50 -1.13%
MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 555.41 641.51 15.50%
MUL_MAT(type_a=iq4_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 818.27 797.41 -2.55%
MUL_MAT(type_a=f32,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 160.64 160.73 0.06%
MUL_MAT(type_a=f16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 271.54 269.82 -0.63%
MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 220.30 313.90 42.49%
MUL_MAT(type_a=q4_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 943.85 1010.00 7.01%
MUL_MAT(type_a=q4_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 921.69 945.65 2.60%
MUL_MAT(type_a=q5_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 823.89 790.56 -4.05%
MUL_MAT(type_a=q5_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 795.60 760.19 -4.45%
MUL_MAT(type_a=q8_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 567.26 577.90 1.88%
MUL_MAT(type_a=mxfp4,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 986.44 995.99 0.97%
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 581.73 606.76 4.30%
MUL_MAT(type_a=q3_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 479.96 510.14 6.29%
MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 577.80 566.34 -1.98%
MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 536.14 532.65 -0.65%
MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 648.26 634.53 -2.12%
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 818.14 818.15 0.00%
MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 913.05 916.15 0.34%
MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 789.45 778.53 -1.38%
MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 834.26 833.42 -0.10%
MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1220.00 1220.00 0.00%
MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1100.00 1100.00 0.00%
MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 942.04 977.91 3.81%
MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 721.69 682.72 -5.40%
MUL_MAT(type_a=iq4_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 980.47 984.41 0.40%
MUL_MAT(type_a=f32,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 200.67 200.74 0.03%
MUL_MAT(type_a=f16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 287.06 292.48 1.89%
MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 272.60 387.97 42.32%
MUL_MAT(type_a=q4_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1090.00 1160.00 6.42%
MUL_MAT(type_a=q4_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1020.00 1120.00 9.80%
MUL_MAT(type_a=q5_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 929.60 994.19 6.95%
MUL_MAT(type_a=q5_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 918.36 893.82 -2.67%
MUL_MAT(type_a=q8_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 698.54 680.85 -2.53%
MUL_MAT(type_a=mxfp4,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1100.00 1200.00 9.09%
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 680.92 680.54 -0.06%
MUL_MAT(type_a=q3_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 568.59 598.92 5.33%
MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 534.39 531.48 -0.54%
MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 518.46 508.56 -1.91%
MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 733.91 723.39 -1.43%
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 903.36 903.34 0.00%
MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1010.00 1030.00 1.98%
MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 825.89 881.97 6.79%
MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 926.92 938.02 1.20%
MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1520.00 1500.00 -1.32%
MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1160.00 1160.00 0.00%
MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1120.00 1100.00 -1.79%
MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 821.36 812.72 -1.05%
MUL_MAT(type_a=iq4_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1120.00 1090.00 -2.68%
MUL_MAT(type_a=f32,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 282.07 281.72 -0.12%
MUL_MAT(type_a=f16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 438.35 600.71 37.04%
MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 438.54 313.33 -28.55%
MUL_MAT(type_a=q4_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1290.00 1280.00 -0.78%
MUL_MAT(type_a=q4_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1200.00 1260.00 5.00%
MUL_MAT(type_a=q5_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1070.00 1090.00 1.87%
MUL_MAT(type_a=q5_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 886.14 951.43 7.37%
MUL_MAT(type_a=q8_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 785.82 825.89 5.10%
MUL_MAT(type_a=mxfp4,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1230.00 1180.00 -4.07%
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 732.25 745.44 1.80%
MUL_MAT(type_a=q3_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 720.29 708.72 -1.61%
MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 568.94 554.47 -2.54%
MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 528.08 530.77 0.51%
MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 769.58 801.41 4.14%
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1130.00 1130.00 0.00%
MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1200.00 1260.00 5.00%
MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1100.00 1100.00 0.00%
MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1200.00 1270.00 5.83%
MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1560.00 1560.00 0.00%
MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1150.00 1150.00 0.00%
MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1130.00 1130.00 0.00%
MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1080.00 1040.00 -3.70%
MUL_MAT(type_a=iq4_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 1340.00 1350.00 0.75%
MUL_MAT(type_a=f32,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 581.27 613.53 5.55%
MUL_MAT(type_a=f16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 4460.00 4400.00 -1.35%
MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 4460.00 4420.00 -0.90%
MUL_MAT(type_a=q4_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 7500.00 7590.00 1.20%
MUL_MAT(type_a=q4_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 6790.00 6800.00 0.15%
MUL_MAT(type_a=q5_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2370.00 2390.00 0.84%
MUL_MAT(type_a=q5_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 6110.00 6080.00 -0.49%
MUL_MAT(type_a=q8_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2450.00 2460.00 0.41%
MUL_MAT(type_a=mxfp4,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2630.00 2610.00 -0.76%
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 3180.00 3200.00 0.63%
MUL_MAT(type_a=q3_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 6440.00 6400.00 -0.62%
MUL_MAT(type_a=q4_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 6590.00 6510.00 -1.21%
MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 6510.00 6540.00 0.46%
MUL_MAT(type_a=q6_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 4550.00 4570.00 0.44%
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2760.00 2710.00 -1.81%
MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 6230.00 6190.00 -0.64%
MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 5890.00 5860.00 -0.51%
MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2760.00 2740.00 -0.72%
MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 6760.00 6630.00 -1.92%
MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 3970.00 3900.00 -1.76%
MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2530.00 2580.00 1.98%
MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2580.00 2630.00 1.94%
MUL_MAT(type_a=iq4_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) 2500.00 2550.00 2.00%
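
To line up f16 against bf16 at the same shape, the parameter string has to be split into fields first. Below is a minimal Python sketch, assuming the `key=value` layout seen in the rows above. `parse_params` is a hypothetical helper for illustration. Bracketed values such as `bs=[1,1]` contain commas, so a plain `split(",")` would break them apart.

```python
import re

# key=value pairs; a value is either a bracketed list or a run without commas
PARAM_RE = re.compile(r"(\w+)=(\[[^\]]*\]|[^,()]+)")

def parse_params(case):
    """Split 'OP(k=v,...)' into (op, {k: v}), keeping bracketed lists whole."""
    op, _, rest = case.partition("(")
    return op, dict(PARAM_RE.findall(rest))

# (case, master GFLOPS, PR GFLOPS) copied from the MUL_MAT table
rows = [
    ("MUL_MAT(type_a=f16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1)", 271.54, 269.82),
    ("MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1)", 220.30, 313.90),
    ("MUL_MAT(type_a=f16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1)", 438.35, 600.71),
    ("MUL_MAT(type_a=bf16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1)", 438.54, 313.33),
]

# Index the relative change (in percent) by (n, type_a)
change = {}
for case, master, pr in rows:
    _, params = parse_params(case)
    change[(int(params["n"]), params["type_a"])] = (pr - master) / master * 100.0

for n in sorted({key[0] for key in change}):
    print(f"n={n}: f16 {change[(n, 'f16')]:+.2f}%  bf16 {change[(n, 'bf16')]:+.2f}%")
```

On these four rows the two types move in opposite directions: bf16 gains at n=4 while f16 is flat, and f16 gains at n=8 while bf16 regresses.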

@zhang-hui-yulo
Contributor Author

Adding data from real models. The odd thing is the performance difference between the bf16 and fp16 versions of DeepSeek.

DeepSeek-R1-Distill-Qwen-1.5B_f16

| GPU | Model | Microbatch size | Test | t/s master | t/s mmf_for_rdna3 | Speedup |
| --- | --- | ---: | --- | ---: | ---: | ---: |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 1 | pp512 | 172.61 | 172.55 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 2 | pp512 | 286.70 | 286.68 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 4 | pp512 | 471.29 | 472.59 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 8 | pp512 | 711.10 | 1149.95 | 1.62 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 16 | pp512 | 1373.28 | 957.58 | 0.70 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 32 | pp512 | 2534.73 | 2538.37 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 64 | pp512 | 3650.84 | 3655.20 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 128 | pp512 | 6815.18 | 6872.39 | 1.01 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 256 | pp512 | 13620.10 | 13813.76 | 1.01 |
| Radeon RX 7900 XTX | qwen2 1.5B F16 | 512 | pp512 | 12553.66 | 12670.59 | 1.01 |

DeepSeek-R1-Distill-Qwen-1.5B_bf16

| GPU | Model | Microbatch size | Test | t/s master | t/s mmf_for_rdna3 | Speedup |
| --- | --- | ---: | --- | ---: | ---: | ---: |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 1 | pp512 | 173.63 | 173.45 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 2 | pp512 | 286.77 | 287.34 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 4 | pp512 | 355.76 | 591.93 | 1.66 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 8 | pp512 | 695.16 | 1164.28 | 1.67 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 16 | pp512 | 1340.37 | 952.50 | 0.71 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 32 | pp512 | 2523.77 | 2509.62 | 0.99 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 64 | pp512 | 3615.21 | 3586.78 | 0.99 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 128 | pp512 | 6774.06 | 6779.01 | 1.00 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 256 | pp512 | 11010.66 | 10921.43 | 0.99 |
| Radeon RX 7900 XTX | qwen2 1.5B BF16 | 512 | pp512 | 12319.22 | 12368.17 | 1.00 |

granite-3.1-1b-a400m-instruct_f16

| GPU | Model | Microbatch size | Test | t/s master | t/s mmf_for_rdna3 | Speedup |
| --- | --- | ---: | --- | ---: | ---: | ---: |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 1 | pp512 | 289.94 | 289.23 | 1.00 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 2 | pp512 | 108.62 | 449.11 | 4.13 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 4 | pp512 | 187.51 | 811.02 | 4.33 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 8 | pp512 | 312.58 | 1470.76 | 4.71 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 16 | pp512 | 596.65 | 2003.47 | 3.36 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 32 | pp512 | 978.60 | 3550.03 | 3.63 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 64 | pp512 | 1731.90 | 5529.81 | 3.19 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 128 | pp512 | 3085.14 | 8928.85 | 2.89 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 256 | pp512 | 5242.57 | 12423.84 | 2.37 |
| Radeon RX 7900 XTX | granitemoe ?B F16 | 512 | pp512 | 9171.05 | 15309.36 | 1.67 |

granite-3.1-1b-a400m-instruct_bf16

| GPU | Model | Microbatch size | Test | t/s master | t/s mmf_for_rdna3 | Speedup |
| --- | --- | ---: | --- | ---: | ---: | ---: |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 1 | pp512 | 288.69 | 289.18 | 1.00 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 2 | pp512 | 107.47 | 451.01 | 4.20 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 4 | pp512 | 179.41 | 834.92 | 4.65 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 8 | pp512 | 316.67 | 1473.22 | 4.65 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 16 | pp512 | 535.20 | 1976.66 | 3.69 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 32 | pp512 | 968.18 | 3461.44 | 3.58 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 64 | pp512 | 1714.48 | 5399.33 | 3.15 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 128 | pp512 | 3040.33 | 8514.31 | 2.80 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 256 | pp512 | 5872.70 | 12166.70 | 2.07 |
| Radeon RX 7900 XTX | granitemoe ?B BF16 | 512 | pp512 | 9283.42 | 15351.12 | 1.65 |
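
The Speedup column in the model tables is the ratio of the two t/s columns (for example 1149.95 / 711.10 ≈ 1.62). A short Python sketch recomputes the ratio and picks out the microbatch sizes where the branch is slower than master. The `flag_regressions` helper and its 5% threshold are hypothetical, chosen only for illustration.

```python
def speedup(master_tps, branch_tps):
    """Ratio of branch throughput to master throughput."""
    return branch_tps / master_tps

def flag_regressions(results, threshold=0.95):
    """Return (microbatch, ratio) for entries whose ratio falls below threshold."""
    flagged = []
    for ubatch, master_tps, branch_tps in results:
        ratio = speedup(master_tps, branch_tps)
        if ratio < threshold:
            flagged.append((ubatch, round(ratio, 2)))
    return flagged

# (microbatch size, t/s master, t/s mmf_for_rdna3) for qwen2 1.5B F16, pp512
qwen_f16 = [
    (4, 471.29, 472.59),
    (8, 711.10, 1149.95),
    (16, 1373.28, 957.58),
    (32, 2534.73, 2538.37),
]

for ubatch, ratio in flag_regressions(qwen_f16):
    print(f"microbatch {ubatch}: {ratio:.2f}x of master")
```

On the DeepSeek F16 rows only microbatch 16 is flagged, which matches the 0.70 entry in the table. The BF16 run shows the same dip at microbatch 16.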
