[webgpu] Use 64 as the workgroup size of DP4AMatMulQuantize #24129

qjia7 · 2025-03-21T10:11:08Z

Usually, a workgroup size of 1 is not a good option for a compute shader, because only one thread is active in each workgroup. This PR uses 64 as the workgroup size of DP4AMatMulQuantize. With this change, the time spent in DP4AMatMulQuantize is roughly halved in phi4 on NV RTX 2000 Ada.
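For illustration only, here is a minimal sketch of how the workgroup size changes the number of workgroups that have to be dispatched for the same amount of work. It is not code from this PR: `DispatchGroupCount` is a hypothetical helper, and the invocation count of 4096 is an assumed value chosen only to make the arithmetic easy to follow.

```cpp
#include <cstdint>

// Hypothetical helper (not part of onnxruntime): the number of workgroups
// needed so that groups * workgroup_size covers every invocation.
// Written as quotient plus remainder check so the sum cannot overflow.
constexpr uint32_t DispatchGroupCount(uint32_t total_invocations,
                                      uint32_t workgroup_size) {
  return total_invocations / workgroup_size +
         (total_invocations % workgroup_size != 0 ? 1u : 0u);
}

// Assumed, illustrative invocation count.
constexpr uint32_t kTotalInvocations = 4096;

// Workgroup size 1: one workgroup per invocation, one active thread each.
static_assert(DispatchGroupCount(kTotalInvocations, 1) == 4096);

// Workgroup size 64: 64x fewer workgroups, each with 64 active threads.
static_assert(DispatchGroupCount(kTotalInvocations, 64) == 64);

// A count that is not a multiple of 64 rounds up to one extra workgroup,
// so the shader would need a bounds check for the unused invocations.
static_assert(DispatchGroupCount(kTotalInvocations + 1, 64) == 65);
```

With a workgroup size of 64, the same work is covered by far fewer workgroups, each keeping 64 threads busy instead of one. How much this helps depends on the hardware and the shader, so the numbers above say nothing about the measured speedup.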

qjia7 requested review from sushraja-msft and guschmue March 21, 2025 10:12

guschmue added the ep:WebGPU ort-web webgpu provider label Mar 21, 2025

qjia7 added 3 commits March 22, 2025 11:02

simplify

54dced4

add subgroup 32 support

20032c7

add subgroup 16 support

3e0451d

qjia7 mentioned this pull request Mar 26, 2025

Make quantize shader work for all gpus #23676

Closed

Merge branch 'main' into matmul_quantize

3f79803
