[Bug] custom chat template sends to model [{'type': 'text', 'text': '...'}] #10324

Open
1 task done
victorserbu2709 opened this issue Nov 14, 2024 · 2 comments · May be fixed by #10164
Labels
bug Something isn't working

Comments

@victorserbu2709

victorserbu2709 commented Nov 14, 2024

Your current environment

The output of `python collect_env.py`
Collecting environment information...

PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.12.7 (main, Oct  1 2024, 08:52:12) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-118-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H100 80GB HBM3
GPU 1: NVIDIA H100 80GB HBM3
GPU 2: NVIDIA H100 80GB HBM3
GPU 3: NVIDIA H100 80GB HBM3
GPU 4: NVIDIA H100 80GB HBM3
GPU 5: NVIDIA H100 80GB HBM3
GPU 6: NVIDIA H100 80GB HBM3
GPU 7: NVIDIA H100 80GB HBM3

Nvidia driver version: 555.42.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] flashinfer==0.1.6+cu121torch2.4
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.77
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.45.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.3.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536
NVIDIA_DRIVER_CAPABILITIES=compute,utility
VLLM_USAGE_SOURCE=production-docker-image
CUDA_VERSION=12.4.1
LD_LIBRARY_PATH=/usr/local/lib/python3.12/dist-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
CUDA_MODULE_LOADING=LAZY

Model Input Dumps

prompt: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 14 Nov 2024\n\n[{'type': 'text', 'text': 'you are a helpful assistant'}]<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n[{'type': 'text', 'text': 'hello\\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=131004, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), guided_decoding=GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None)

🐛 Describe the bug

Hello.
I created a simple container image that contains the latest tool_chat_template_llama3.2_json.jinja:

FROM docker.io/vllm/vllm-openai:v0.6.3.post1
COPY tool_chat_template_llama3.2_json.jinja vllm-workspace/tool_chat_template_llama3.2_json.jinja
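
The image can be built and tagged roughly like this (docker or podman; the tag just has to match the image name used in the run command below):

docker build -t localhost/vllm/vllm-openai:v0.6.3.post1-tools .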

The container is started with:

localhost/vllm/vllm-openai:v0.6.3.post1-tools \
  --model neuralmagic/Llama-3.2-90B-Vision-Instruct-FP8-dynamic \
  --tensor-parallel-size 8 \
  --served-model-name "Llama3.2 90B" \
  --trust-remote-code \
  --gpu-memory-utilization 0.95 \
  --distributed-executor-backend mp \
  --enforce-eager \
  --max-num-seqs 2 \
  --limit-mm-per-prompt image=5 \
  --tool-call-parser llama3_json --chat-template /vllm-workspace/examples/tool_chat_template_llama3.2_json.jinja --enable-auto-tool-choice

The vLLM OpenAI server receives the following request:

curl -v http://localhost:8000/v1/chat/completions -H 'content-type: application/json' --data '{"stream": false, "model": "Llama3.2 90B", "messages": [{"role": "system", "content": "you are a helpful assistant"}, {"role": "user", "content": "hello\n"}]}'

but in the vLLM logs I see user<|end_header_id|>\n\n[{'type': 'text', 'text': 'hello\n'}]<|eot_id|>:

INFO 11-14 03:51:42 logger.py:37] Received request chat-585357994ead43ab8d485844b632d641: prompt: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 14 Nov 2024\n\n[{'type': 'text', 'text': 'you are a helpful assistant'}]<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n[{'type': 'text', 'text': 'hello\\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=131004, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), guided_decoding=GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), prompt_token_ids: [128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 975, 4723, 220, 2366, 19, 271, 58, 13922, 1337, 1232, 364, 1342, 518, 364, 1342, 1232, 364, 9514, 527, 264, 11190, 18328, 8439, 60, 128009, 128006, 882, 128007, 271, 58, 13922, 1337, 1232, 364, 1342, 518, 364, 1342, 1232, 364, 15339, 1734, 8439, 60, 128009, 128006, 78191, 128007, 271], lora_request: None, prompt_adapter_request: None.

However, if I remove only

--chat-template /vllm-workspace/examples/tool_chat_template_llama3.2_json.jinja

from the vLLM start options, the model receives the expected text (user<|end_header_id|>\n\nhello\n<|eot_id|>):

curl -v http://localhost:8000/v1/chat/completions -H 'content-type: application/json' --data '{"stream": false, "model": "Llama3.2 90B", "messages": [{"role": "system", "content": "you are a helpful assistant"}, {"role": "user", "content": "hello\n"}]}'
INFO 11-14 04:00:43 logger.py:37] Received request chat-fb75d50bb91b4eb68814b86dbe0d4833: prompt: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 14 Nov 2024\n\n[{'type': 'text', 'text': 'you are a helpful assistant'}]<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nhello\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=131017, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), guided_decoding=GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), prompt_token_ids: [128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 975, 4723, 220, 2366, 19, 271, 58, 13922, 1337, 1232, 364, 1342, 518, 364, 1342, 1232, 364, 9514, 527, 264, 11190, 18328, 8439, 60, 128009, 128006, 882, 128007, 271, 15339, 198, 128009, 128006, 78191, 128007, 271], lora_request: None, prompt_adapter_request: None.
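
For comparison, the chat completions API also accepts the content as an explicit list of text parts, and with the custom template the rendered prompt above looks as if the template had received the content in exactly that form; an equivalent request in list form would be (untested sketch, not one of the requests above):

curl -v http://localhost:8000/v1/chat/completions -H 'content-type: application/json' --data '{"stream": false, "model": "Llama3.2 90B", "messages": [{"role": "system", "content": "you are a helpful assistant"}, {"role": "user", "content": [{"type": "text", "text": "hello\n"}]}]}'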

@DarkLight1337
Member

Can you try out #10164?

@victorserbu2709
Author

Thank you @DarkLight1337, it works.
