It seems llama-batched-bench just uses token id 0 for every input in a batch. For MoE models, identical input tokens in a batch will activate the same experts, whereas in reality different inputs in a batch may activate different experts. Shouldn't the measured performance be different?
Answered by ggerganov, Aug 14, 2025
Yes, this is a mistake and should be fixed. Would you like to submit a PR?
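For context, a minimal sketch of what such a fix could look like (this is not the actual patch): instead of filling the benchmark batch with token id 0, draw random token ids from the model's vocabulary so that MoE routing sees varied inputs across the batch. The helper `fill_batch_random` below is hypothetical; it uses the existing `common_batch_add`, `llama_model_get_vocab`, and `llama_vocab_n_tokens` APIs, but the exact call sites in batched-bench may differ.

```cpp
#include <random>

#include "llama.h"
#include "common.h"

// Hypothetical helper: fill n_tokens prompt positions for sequence seq_id
// with random token ids instead of a constant token id 0.
static void fill_batch_random(llama_batch & batch,
                              const llama_model * model,
                              int n_tokens,
                              llama_seq_id seq_id) {
    const llama_vocab * vocab  = llama_model_get_vocab(model);
    const int32_t      n_vocab = llama_vocab_n_tokens(vocab);

    std::mt19937 rng(1234); // fixed seed keeps benchmark runs reproducible
    std::uniform_int_distribution<int32_t> dist(0, n_vocab - 1);

    for (int i = 0; i < n_tokens; ++i) {
        // previously the benchmark effectively did:
        //     common_batch_add(batch, 0, i, { seq_id }, false);
        common_batch_add(batch, dist(rng), i, { seq_id }, false);
    }
}
```

With varied token ids, different sequences in the batch can route to different experts, which should give a more realistic MoE throughput number than the all-zeros input.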