@joeldushouyu
Summary

This PR allows running vision models (tested with Gemma 3 4B) on the Hexagon NPU.

For now, it only supports using the CDSP for FP16xFP32 matmul.
Note: I am fully aware that the current FP16xFP32 implementation is not optimal. For example, we could easily avoid unnecessary repeated data movement by using the VTCM as a cache, but I think that belongs in a separate PR that focuses solely on optimization.
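For readers unfamiliar with the mixed-precision path, the following is a minimal scalar sketch of what an FP16xFP32 matmul computes: FP16 weights are widened to FP32, multiplied with FP32 activations, and accumulated in FP32. It is not the kernel from this PR and uses no Hexagon intrinsics. The names f16_to_f32 and matmul_f16_f32 and the row-major layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one IEEE 754 binary16 value (raw bits in a uint16_t) to float. */
static float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t man  = (uint32_t)h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                          /* signed zero */
        } else {
            /* Subnormal: shift the mantissa up until the implicit bit appears. */
            int e = -1;
            do { man <<= 1; e++; } while ((man & 0x400u) == 0);
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/*
 * Hypothetical reference matmul (layout is an assumption):
 *   w: M x K weights, row-major, FP16 bit patterns
 *   x: N x K activations, row-major, FP32
 *   y: N x M outputs, row-major, FP32, y[n][m] = dot(w[m], x[n])
 */
static void matmul_f16_f32(const uint16_t *w, const float *x, float *y,
                           size_t M, size_t N, size_t K) {
    assert(w != NULL && x != NULL && y != NULL);
    for (size_t n = 0; n < N; n++) {
        for (size_t m = 0; m < M; m++) {
            float acc = 0.0f;                     /* accumulate in FP32 */
            for (size_t k = 0; k < K; k++) {
                acc += f16_to_f32(w[m * K + k]) * x[n * K + k];
            }
            y[n * M + m] = acc;
        }
    }
}
```

Note that in this naive form every weight row is re-read and re-converted for each of the N activation rows. That kind of repeated traffic is what keeping tiles resident in a fast local memory such as the VTCM would be expected to reduce.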

Test

I used the F16 vision weights and Q4_0 language weights from Unsloth.

1. Build for Hexagon in Docker

cmake --preset arm64-android-snapdragon-release -B build-snapdragon
cmake --build build-snapdragon
cmake --install build-snapdragon --prefix pkg-adb/llama.cpp

2. Push the weights to the phone (tested with a Samsung S25 Ultra)

adb push mmproj-F16.gguf /data/local/tmp/gguf
adb push gemma-3-4b-it-Q4_0.gguf /data/local/tmp/gguf
adb push hydro_1.png /data/local/tmp/gguf   # image for testing

3. Run the run-mtmd script

E=1 NDEV=1 D=HTP0 MTMD_DEVICE=HTP0 PROF=1 V=1 M=gemma-3-4b-it-Q4_0.gguf MMPROJ=mmproj-F16.gguf IMG=hydro_1.png ./scripts/snapdragon/adb/run-mtmd.sh -p '"What is in this image."'

@joeldushouyu changed the title from "Mtmd hexagon" to "ggml-hexagon: mm for mtmd" on Dec 9, 2025
@joeldushouyu marked this pull request as ready for review on December 9, 2025 22:27
@github-actions bot added the script (Script related) and ggml (changes relating to the ggml tensor library for machine learning) labels on Dec 10, 2025