It seems llama-batched-bench just uses token id 0 for every input in a batch. For MoE models, identical input tokens in a batch will activate the same experts, whereas in reality different inputs in a batch may activate different experts. Shouldn't the measured performance be different?
Answered by ggerganov, Aug 14, 2025
Yes, this is a mistake and should be fixed. Would you like to submit a PR?
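For context, a minimal sketch of what such a fix could look like (this is not the actual patch): instead of filling the benchmark batch with token id 0, draw random token ids from the model's vocabulary so that MoE routing sees varied inputs across the batch. The helper `fill_batch_random` below is hypothetical; it uses the existing `common_batch_add`, `llama_model_get_vocab`, and `llama_vocab_n_tokens` APIs, but the exact call sites in batched-bench may differ.

```cpp
#include <random>

#include "llama.h"
#include "common.h"

// Hypothetical helper: fill n_tokens prompt positions for sequence seq_id
// with random token ids instead of a constant token id 0.
static void fill_batch_random(llama_batch & batch,
                              const llama_model * model,
                              int n_tokens,
                              llama_seq_id seq_id) {
    const llama_vocab * vocab  = llama_model_get_vocab(model);
    const int32_t      n_vocab = llama_vocab_n_tokens(vocab);

    std::mt19937 rng(1234); // fixed seed keeps benchmark runs reproducible
    std::uniform_int_distribution<int32_t> dist(0, n_vocab - 1);

    for (int i = 0; i < n_tokens; ++i) {
        // previously the benchmark effectively did:
        //     common_batch_add(batch, 0, i, { seq_id }, false);
        common_batch_add(batch, dist(rng), i, { seq_id }, false);
    }
}
```

With varied token ids, different sequences in the batch can route to different experts, which should give a more realistic MoE throughput number than the all-zeros input.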