
[Bugfix][Frontend] Update Llama 3.2 Chat Template to support Vision and Non-Tool use #10164

Open · wants to merge 3 commits into base: main
Conversation

@tjohnson31415 (Contributor) commented Nov 8, 2024

For our use case, we want to serve the Llama 3.2 Vision models while also supporting non-vision requests that use tools. The current recommended example chat template assumes tool use: it injects a tool-use system prompt even when no tools are requested, and it does not support image inputs. This PR updates the template to support tool use, vision inputs, and plain chat generation, depending on the input conversation.

Examples below show the results of templating for a few different use-cases. This was done using the meta-llama/Llama-3.2-11B-Vision-Instruct model's tokenizer. "New" refers to the template in this PR, "Old" is the current vLLM example template from main, and "Base" is using the template from the tokenizer_config.json in HF Hub.

FIX #10324
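
For reference, each rendering below can be reproduced with a short script along these lines (a minimal sketch; the template file name is an assumption and may not match the file added in this PR):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

# "New"/"Old": pass an explicit template file; "Base": omit chat_template so the
# template from tokenizer_config.json on HF Hub is used.
with open("examples/tool_chat_template_llama3.2_json.jinja") as f:
    chat_template = f.read()

prompt = tokenizer.apply_chat_template(
    messages,
    chat_template=chat_template,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)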

Basic Chat

Input

[
    {
      "role": "user",
      "content": "What is vLLM?\n"
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is vLLM?\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>


System Prompt

Input

[
    {
        "role": "system",
        "content": "You are an expert on vLLM."
    },
    {
        "role": "user",
        "content": "What is vLLM?\n"
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are an expert on vLLM.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is vLLM?\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>system<|end_header_id|>

You are an expert on vLLM.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are an expert on vLLM.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>


NB: vLLM transforms the system prompt's string content into a JSON object for mllama, but the base template assumes it will always be a string.
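
A minimal Python sketch of that mismatch (the normalization below is an illustration of the behavior, not vLLM's actual code):

messages = [{"role": "system", "content": "You are an expert on vLLM."}]

# For mllama, vLLM wraps plain string content into a list of content parts:
normalized = [
    {
        "role": m["role"],
        "content": (
            [{"type": "text", "text": m["content"]}]
            if isinstance(m["content"], str)
            else m["content"]
        ),
    }
    for m in messages
]

# A template that assumes string content renders the Python list verbatim,
# which is how [{'type': 'text', 'text': ...}] leaks into the prompt.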

Image

Input

[
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Describe the image."},
      ]
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'image'}, {'type': 'text', 'text': 'Describe the image.'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
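
With the new template, such a request can be served through the OpenAI-compatible API; a minimal sketch, assuming a local vLLM server started with this template (image_url is the same placeholder variable as in the input above):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": "Describe the image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)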


Image with System Prompt

Input

[
    {
        "role": "system",
        "content": "You are a helpful assistant model.",
    },
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Describe the image."},
      ]
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are a helpful assistant model.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'image'}, {'type': 'text', 'text': 'Describe the image.'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

Throws exception:

TemplateError: Prompting with images is incompatible with system messages and tool use.

Base

Throws exception:

TemplateError: Prompting with images is incompatible with system messages and tool use.
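
Both templates reject this combination by design. A rough Python paraphrase of the guard (illustrative only, not the actual Jinja logic):

def check_image_compatibility(messages, tools=None):
    # True if any message carries an image content part
    has_images = any(
        isinstance(m.get("content"), list)
        and any(part.get("type") in ("image", "image_url") for part in m["content"])
        for m in messages
    )
    has_system = any(m["role"] == "system" for m in messages)
    # Llama 3.2 vision prompting supports neither a system prompt nor tools
    # alongside images, so the template raises in that case:
    if has_images and (has_system or tools is not None):
        raise ValueError(
            "Prompting with images is incompatible with system messages and tool use."
        )
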
Tool Use Request

Input

messages = [
    {
      "role": "user",
      "content": "What is the weather in San Fransisco?",
    }
]

get_current_temperature = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}
tools = [get_current_temperature]
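
The renderings below pass the tool definitions through to the template; with the same sketch as above, roughly:

prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    chat_template=chat_template,
    tokenize=False,
    add_generation_prompt=True,
)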

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You have access to the following functions. To call a function, please respond with JSON for a function call.Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is the weather in San Francisco?'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You have access to the following functions. To call a function, please respond with JSON for a function call. Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the weather in San Francisco?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

[{'type': 'text', 'text': 'What is the weather in San Francisco?'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


NB: vLLM transforms the string content into a JSON object for mllama, but the base template assumes it will be a string when merging the user message with the tool info.

Signed-off-by: Travis Johnson <[email protected]>
github-actions bot commented Nov 8, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@tjohnson31415 (Contributor, Author):

cc: @K-Mistele and @maxdebayser for review. Thanks!

@@ -56,7 +84,7 @@
 {{- "Given the following functions, please respond with a JSON for a function call " }}
 {{- "with its proper arguments that best answers the given prompt.\n\n" }}
 {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
-{{- "Do not use variables.\n\n" }}
+{{- " Do not use variables.\n\n" }}
Member:

I don't think these changes are related to vision inputs? (since this is only supposed to be activated for tool use, which is incompatible with vision inputs)

Contributor (Author):

Correct, this change only matters when using tool calling. It is just a small tweak to add a space after the period from the line above it.

@DarkLight1337 (Member):

cc @K-Mistele see if this chat template still looks good to you for tool use.

@K-Mistele (Contributor):

> cc @K-Mistele see if this chat template still looks good to you for tool use.

Thanks for the ping! I'm getting ready for some travel but can take a look while I'm on the plane tomorrow.

@K-Mistele (Contributor):

Possibly related #9859

@@ -80,7 +120,7 @@
 {{- "<|eot_id|>" }}
 {%- elif message.role == "tool" or message.role == "ipython" %}
 {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
-{%- if message.content is mapping %}
+{%- if message.content is mapping or message.content is iterable %}
Contributor:

I've had problems with is iterable in the past because it's also True for strings.

Contributor (Author):

Ah, nice catch! I pulled this change from the HF template, but I can remove it to keep the tool-calling support as it was.
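
For context, a quick standalone illustration of the pitfall using Jinja's builtin tests (a minimal sketch):

from jinja2 import Environment

tmpl = Environment().from_string(
    "iterable={{ content is iterable }} string={{ content is string }}"
)

# Strings also pass the `iterable` test, so a branch guarded only by
# `is iterable` fires for plain string content too:
print(tmpl.render(content="plain text"))                      # iterable=True string=True
print(tmpl.render(content=[{"type": "text", "text": "hi"}]))  # iterable=True string=False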

{%- if not image_ns.has_images %}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
{{- "Environment: ipython\n" }}
Contributor:

I think this is fine for JSON tool calling, but is it also true that the pythonic tool calling is incompatible with images?


@@ -120,7 +125,7 @@
 {{- "<|eot_id|>" }}
 {%- elif message.role == "tool" or message.role == "ipython" %}
 {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
-{%- if message.content is mapping or message.content is iterable %}
+{%- if message.content is mapping %}
Member:

Maybe we can invert the condition? i.e. check whether it is a string, like in L32.
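
A sketch of the suggested inversion, rendered standalone via jinja2 (illustrative only; the real template would keep the surrounding ipython block):

from jinja2 import Environment

# Test for the plain-string case first and treat everything else as
# structured content, instead of testing `is mapping or is iterable`:
tmpl = Environment().from_string(
    "{%- if content is string -%}{{ content }}"
    "{%- else -%}{{ content | tojson }}{%- endif -%}"
)

print(tmpl.render(content="plain tool output"))  # plain tool output
print(tmpl.render(content={"result": 42}))       # {"result": 42}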

Successfully merging this pull request may close these issues:

[Bug] custom chat template sends to model [{'type': 'text', 'text': '...'}]