llama-bench and llama-cli inference differs #13273
Unanswered
afsara-ben asked this question in Q&A
Replies: 0
When I run llama-bench I get an eval rate of 88 t/s, but llama-cli with the same prompt length and -n gives me 85 t/s. Is there a reason for the difference?
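For reference, a comparison like the one described might look like the sketch below. The model path and token counts are placeholders, not values from the question; the llama.cpp binary invocations are commented out since they require a local model, and the last line just puts the reported gap in relative terms.

```shell
# Hypothetical model path -- adjust to your setup.
MODEL=model.gguf

# llama-bench: synthetic benchmark; -p sets prompt tokens, -n generated tokens.
# ./llama-bench -m "$MODEL" -p 512 -n 128

# llama-cli: full interactive pipeline; timings are printed when the run ends.
# ./llama-cli -m "$MODEL" -f prompt.txt -n 128

# Relative size of the gap reported above (88 vs 85 t/s):
python3 -c 'print(f"{(88-85)/88*100:.1f}% slower")'
```

Note that llama-bench measures raw decode throughput in a tight loop, while llama-cli also does sampling, tokenization, and console output per token, so a gap of a few percent is plausible even with identical settings.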