Vulkan: Tuning warptile for Mali GPU Performance #13483

rmatif · 2025-05-12T15:20:30Z


I'm working on Local Diffusion, using stable-diffusion.cpp on Android. Vulkan performance on Mali GPUs is currently very poor.

Disabling mul_mat_l in ggml-vulkan.cpp helped a bit. I then tried modifying the m_warptile and s_warptile values. Reducing the first element (m tile?) from 128 to 64 gave a ~3x inference speedup, but the output images were garbage/noisy.
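One possible explanation for the broken output is that the warptile entries are coupled, so changing a single element leaves part of the output tile uncomputed. The sketch below checks that kind of coupling for a generic warp-tiled matmul. The `TileConfig` struct, its field names and order, and the constraints in `check_tile_config` are assumptions for illustration, not taken from ggml-vulkan.cpp or its shaders, so they would need to be compared against the real specialization constants.

```cpp
#include <cstdint>
#include <string>

// Hypothetical description of a warp-tiled matmul configuration.
// Field names, order, and meanings are assumptions for illustration only.
struct TileConfig {
    uint32_t block_size; // threads per workgroup
    uint32_t bm, bn, bk; // workgroup tile: BM x BN outputs, BK-deep steps
    uint32_t wm, wn;     // tile handled by one warp (subgroup)
    uint32_t wmiter;     // sub-iterations of a warp along M
    uint32_t tm, tn;     // tile handled by one thread
    uint32_t warp;       // threads per warp (subgroup size)
};

// Returns an empty string if the configuration is self-consistent,
// otherwise a short description of the first violated constraint.
std::string check_tile_config(const TileConfig & c) {
    if (c.block_size == 0 || c.bm == 0 || c.bn == 0 || c.bk == 0 ||
        c.wm == 0 || c.wn == 0 || c.wmiter == 0 ||
        c.tm == 0 || c.tn == 0 || c.warp == 0) {
        return "zero-sized field";
    }
    if (c.block_size % c.warp != 0) {
        return "block_size is not a multiple of warp";
    }
    if (c.bm % c.wm != 0 || c.bn % c.wn != 0) {
        return "workgroup tile not divisible by warp tile";
    }
    // Every warp tile needs exactly one warp to compute it.
    const uint32_t warps_available = c.block_size / c.warp;
    const uint32_t warps_needed    = (c.bm / c.wm) * (c.bn / c.wn);
    if (warps_available != warps_needed) {
        return "warp count does not match number of warp tiles";
    }
    if (c.wm % c.wmiter != 0) {
        return "wm is not a multiple of wmiter";
    }
    // Outputs covered by one warp per N-iteration; must divide the warp tile.
    const uint32_t per_pass = c.warp * c.tm * c.tn * c.wmiter;
    if ((c.wm * c.wn) % per_pass != 0) {
        return "warp tile not evenly covered by threads";
    }
    return "";
}
```

With illustrative values such as `{128, 64, 64, 16, 32, 32, 2, 4, 2, 32}` the check passes. Halving only `block_size` to 64 fails with a warp-count mismatch: two warps for four warp tiles. Under these assumptions that would leave half of each output tile unwritten, which would be consistent with faster but noisy results.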

Questions:

1. How can I correctly tune m_warptile and s_warptile for Mali GPUs to get both performance and correct output?
2. Are there specific alignment requirements for these values on Mali?
3. Do the matmul shaders need to be adapted if these warptile values are changed?

I'm looking for guidance on improving Vulkan matmul performance on Mali without breaking correctness.
