
[Bug]: Meaningless output when running long-context inference with Qwen2.5 models on vllm>=0.6.3 #10298

Open
1 task done
piamo opened this issue Nov 13, 2024 · 2 comments
Labels
bug Something isn't working

Comments


piamo commented Nov 13, 2024

Your current environment

The output of `python collect_env.py`

Model Input Dumps

models: Qwen2.5-Coder-7B-Instruct, Qwen2.5-7B-Instruct
vllm: 0.6.3
input length: >8000 tokens

🐛 Describe the bug

I have tested vLLM 0.6.0 through 0.6.2 and 0.5.5; all of these older versions work fine.

So this bug appears to have been introduced in 0.6.3.
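
For reference, a minimal reproduction sketch using vLLM's offline `LLM` API. The prompt contents, sampling parameters, and repetition count below are illustrative placeholders, not the exact inputs from my runs:

```python
from vllm import LLM, SamplingParams

# Load one of the affected models (the Coder variant shows the same behavior).
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

# Greedy decoding keeps comparisons between vLLM versions deterministic.
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Build a long prompt (well over 8000 tokens); the exact text is a placeholder.
long_prompt = ("def add(a, b):\n    return a + b\n\n" * 800) + "\nSummarize the code above."

outputs = llm.generate([long_prompt], sampling_params)
# Garbled on 0.6.3, coherent on <= 0.6.2.
print(outputs[0].outputs[0].text)
```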

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
piamo added the bug (Something isn't working) label Nov 13, 2024
@CHNtentes

Same here. I used Qwen2.5-72B-Instruct-AWQ with a 10,000-token input, and the output is garbage.

@CHNtentes

Downgraded to vLLM 0.6.2 and the output is much better.
