[Bug]: FusedMoE kernel performance depends on input prompt length while decoding #10313
Open
Labels
bug
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
Environment
Description
How to resolve
Bug found with @Byeong-Chan