I'm working on Local Diffusion, which uses stable-diffusion.cpp on Android. Vulkan performance on Mali GPUs is currently very poor.
Disabling mul_mat_l in ggml-vulkan.cpp helped a bit. I then tried modifying the m_warptile and s_warptile values. Reducing the first element (m tile?) from 128 to 64 gave a ~3x inference speedup, but the output images were garbage/noisy.
Questions:
How can I correctly tune m_warptile and s_warptile for Mali GPUs to get both performance and correct output?
Are there specific alignment requirements for these values on Mali?
Do the matmul shaders need to be adapted if these warptile values are changed?
I'm looking for guidance on improving Vulkan matmul performance on Mali without breaking correctness.