[Bugfix][Frontend] Update Llama 3.2 Chat Template to support Vision and Non-Tool use #10164
Conversation
Signed-off-by: Travis Johnson <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
cc: @K-Mistele and @maxdebayser for review. Thanks!
@@ -56,7 +84,7 @@
 {{- "Given the following functions, please respond with a JSON for a function call " }}
 {{- "with its proper arguments that best answers the given prompt.\n\n" }}
 {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
-{{- "Do not use variables.\n\n" }}
+{{- " Do not use variables.\n\n" }}
I don't think these changes are related to vision inputs? (since this is only supposed to be activated for tool use, which is incompatible with vision inputs)
Correct, this change only matters when using tool calling. It's just a small tweak to add a space after the period from the line above it.
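For illustration (not from the PR), joining the two literal fragments the way the template does shows why the space matters:

```python
# Not from the PR -- just concatenating the literal fragments as the
# template does, to show the run-together sentence without the space.
head = '... dictionary of argument name and its value}.'
print(head + "Do not use variables.")   # ...its value}.Do not use variables.
print(head + " Do not use variables.")  # ...its value}. Do not use variables.
```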
cc @K-Mistele, see if this chat template still looks good to you for tool use.
Thanks for the ping! I'm getting ready for some travel but can take a look while I'm on the plane tomorrow.
Possibly related: #9859
@@ -80,7 +120,7 @@
 {{- "<|eot_id|>" }}
 {%- elif message.role == "tool" or message.role == "ipython" %}
 {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
-{%- if message.content is mapping %}
+{%- if message.content is mapping or message.content is iterable %}
I've had problems with `is iterable` in the past because it's also True for strings.
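For anyone following along, a minimal demonstration of the pitfall using the jinja2 package directly (not code from this PR):

```python
from jinja2 import Template

# Jinja's built-in tests: plain strings fail `mapping` but pass
# `iterable`, so `content is mapping or content is iterable` is also
# true for string content.
t = Template("{{ content is mapping }} {{ content is iterable }}")
print(t.render(content={"output": "42"}))    # True True
print(t.render(content=["part1", "part2"]))  # False True
print(t.render(content="plain string"))      # False True  <-- the problem
```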
Ah, nice catch! I pulled this change from the HF template, but I can remove it to keep the tool-calling support as it was.
{%- if not image_ns.has_images %}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
{{- "Environment: ipython\n" }}
I think this is fine for JSON tool calling, but is it also true that the pythonic tool calling is incompatible with images?
Tool calling does NOT work with images in the prompt as of now.
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
@@ -120,7 +125,7 @@
 {{- "<|eot_id|>" }}
 {%- elif message.role == "tool" or message.role == "ipython" %}
 {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
-{%- if message.content is mapping or message.content is iterable %}
+{%- if message.content is mapping %}
Maybe we can invert the condition, i.e. check whether it is a string, like in L32?
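Something like the following, perhaps (a sketch of the suggested inversion, rendered here through the jinja2 package; the surrounding template lines are paraphrased, not copied from the diff):

```python
from jinja2 import Template

# Sketch of the inverted check: emit plain strings directly and
# JSON-encode everything else (mappings, lists of content parts).
tmpl = Template(
    "{%- if message.content is string -%}"
    "{{ message.content }}"
    "{%- else -%}"
    "{{ message.content | tojson }}"
    "{%- endif -%}"
)
print(tmpl.render(message={"content": "plain text"}))      # plain text
print(tmpl.render(message={"content": {"output": "42"}}))  # {"output": "42"}
```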
For our use case, we want to serve the Llama 3.2 Vision models while also supporting non-vision requests that use tools. The current recommended/example chat template assumes tool use: it injects a tool-use system prompt even when tools are not requested, and it does not support image inputs. This PR updates the template to support tool use, vision inputs, and plain chat generation depending on the input conversation.
Examples below show the results of templating for a few different use cases. This was done using the meta-llama/Llama-3.2-11B-Vision-Instruct model's tokenizer. "New" refers to the template in this PR, "Old" is the current vLLM example template from `main`, and "Base" is the template from the `tokenizer_config.json` in HF Hub.

FIX #10324
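For reference, a sketch of how the comparison can be reproduced (the message and the template path are illustrative assumptions, not taken from the PR):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
messages = [{"role": "user", "content": "What is the capital of France?"}]

# "Base": the chat template bundled in tokenizer_config.json on HF Hub.
base = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# "New"/"Old": render with an explicit vLLM example template instead.
# The path below is an assumption, for illustration only.
with open("examples/tool_chat_template_llama3.2_json.jinja") as f:
    new = tok.apply_chat_template(
        messages, chat_template=f.read(), tokenize=False, add_generation_prompt=True
    )

print(base)
print(new)
```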
Basic Chat
Input
Old
New
Base
System Prompt
Input
Old
New
Base
NB: vLLM transforms the system prompt's string content into a JSON object for mllama, but the base template assumes it will always be a string.
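To illustrate the shape difference (the field names below are my assumption of the OpenAI-style content-parts format, not copied from vLLM internals):

```python
# What the base template assumes: plain string content.
as_string = {"role": "system", "content": "You are a helpful assistant."}

# What vLLM hands to the template for mllama: a list of typed parts
# (field names assumed, for illustration).
as_parts = {
    "role": "system",
    "content": [{"type": "text", "text": "You are a helpful assistant."}],
}
```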
Image
Input
Old
New
Base
Image with System Prompt
Input
Old
New
Throws exception:
Base
Throws exception:
Tool Use Request
Input
Old
New
Base
NB: vLLM transforms the string content into a JSON object for mllama, but the base template assumes it will be a string when merging the user message with the tool info.