-
The problem is line 604 in 657b8a7, which invokes the logic at lines 447 to 451 in 657b8a7, and that logic only knows of the (text-model) types listed at lines 7 to 97 in 657b8a7. Support can be added by bypassing this for …
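To make the suggested bypass concrete, here is a minimal hypothetical sketch of the per-tensor policy such a change could apply. The function name `pick_mmproj_outtype`, the tensor names, and the f32 fallback rule are all my assumptions for illustration, not the convert script's actual code:

```python
QK4_0 = 32  # Q4_0 stores weights in blocks of 32, so row lengths must divide evenly

def pick_mmproj_outtype(name: str, shape: tuple[int, ...], requested: str) -> str:
    """Hypothetical policy for an mmproj-aware `--outtype q4_0`.

    Only large 2-D projector/vision weights are worth quantizing; 1-D tensors
    (norms, biases) and rows that are not a multiple of the Q4_0 block size
    stay in f32, mirroring how text-model quantization keeps small tensors
    in higher precision.
    """
    if requested != "q4_0":
        return requested
    if len(shape) < 2 or shape[-1] % QK4_0 != 0:
        return "f32"
    return "q4_0"

# Illustrative tensor names/shapes: a vision attention weight vs. its bias.
print(pick_mmproj_outtype("v.blk.0.attn_q.weight", (1152, 1152), "q4_0"))  # -> q4_0
print(pick_mmproj_outtype("v.blk.0.attn_q.bias", (1152,), "q4_0"))         # -> f32
```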
-
On arm64 devices, `Q4_0` delivers the best speed. `mmproj` can be converted into the datatypes available in the conversion script (`{f32,f16,bf16,q8_0,tq1_0,tq2_0}`). However, when I try to convert it to `Q4_0`, an error is thrown. Is there a way to quantize `mmproj` into `Q4_0`?

Here are comparison results between ExecuTorch (mmproj in `Q4_0`) and `llama.cpp` on a Raspberry Pi 5: prompt evaluation time is 2.55 s vs 4.15 s.

cc: @ngxson
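For reference, this is roughly what `Q4_0` stores and why it is compact: every 32 weights share one f16 scale and are packed into 16 bytes of 4-bit values, about 4.5 bits per weight (vs. 8.5 for `Q8_0`). Below is a pure-NumPy sketch of my understanding of the block layout; it mirrors ggml's reference routine but is not llama.cpp code:

```python
import numpy as np

QK4_0 = 32  # weights per Q4_0 block

def quantize_q4_0(row: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Quantize a float32 row (length divisible by 32) into Q4_0 blocks:
    one f16 scale plus 16 bytes of packed 4-bit values per block."""
    assert row.size % QK4_0 == 0
    blocks = row.reshape(-1, QK4_0).astype(np.float32)

    # The signed value with the largest magnitude in each block sets the scale.
    idx = np.argmax(np.abs(blocks), axis=1)
    maxv = blocks[np.arange(blocks.shape[0]), idx]
    d = maxv / -8.0
    inv_d = np.zeros_like(d)
    inv_d[d != 0.0] = 1.0 / d[d != 0.0]

    # Map each weight to an unsigned 4-bit value centred on 8.
    q = np.clip((blocks * inv_d[:, None] + 8.5).astype(np.int32), 0, 15).astype(np.uint8)

    # First half of each block goes in the low nibbles, second half in the high nibbles.
    packed = (q[:, : QK4_0 // 2] | (q[:, QK4_0 // 2 :] << 4)).astype(np.uint8)
    return d.astype(np.float16), packed

# 2 + 16 bytes per 32 weights; e.g. one 1152-wide row becomes 36 blocks.
scales, packed = quantize_q4_0(np.random.randn(1152).astype(np.float32))
print(scales.shape, packed.shape)  # (36,) (36, 16)
```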