response_format with regex does not seem to work #2423

aymeric-roucher · 2024-07-26T11:37:32Z

Describe the bug

When passing a response_format of type regex to chat_completion, the output does not always respect the format.

Reproduction

This does not follow the regex:

from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3.1-8B-Instruct")

output = client.chat_completion([{"role": "user", "content": "ok"}], response_format={"type": "regex", "value": ".+?\n\nCode:+?"})

print(output.choices[0].message.content)

But going through OpenAI Messages API does work:

url = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-8B-Instruct/v1"

from openai import OpenAI

# init the client but point it to TGI
client = OpenAI(
    base_url=url,
    api_key="*"
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "ok"}
    ],
    response_format={"type": "regex", "value": ".+?\n\nCode:+?"}
)
print(chat_completion.choices[0].message.content)

Logs

No response

System info

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.24.2
- Platform: macOS-14.1-arm64-arm-64bit
- Python version: 3.10.14
- Running in iPython ?: Yes
- iPython shell: ZMQInteractiveShell
- Running in notebook ?: Yes
- Running in Google Colab ?: No
- Token path ?: /Users/aymeric/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: m-ric
- Configured git credential helpers: osxkeychain, store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.3.0
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.3.0
- hf_transfer: N/A
- gradio: 4.38.1
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.7.1
- aiohttp: 3.9.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /Users/aymeric/.cache/huggingface/hub
- HF_ASSETS_CACHE: /Users/aymeric/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/aymeric/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10

The text was updated successfully, but these errors were encountered:

aymeric-roucher · 2024-07-26T11:38:42Z

cc @Wauplin this follows up on #2383.

Wauplin · 2024-07-30T08:18:33Z

Hi @aymeric-roucher, I've made some tests with the reproducible example you've shared. I do think this is a cache issue that has to be fixed either in Inference API or TGI directly. The only difference between OpenAI and InferenceClient clients in your example is that in the first case model="tgi" is passed while in the second model="meta-llama/Meta-Llama-3.1-8B-Instruct" is passed. When I use InferenceClient and also pass model="tgi" (with base_url pointing to Llama 3.1), then the regex works as expected.

I also tested the failing case with "ok 2" instead of "ok" and the answer was correctly using the regex.
Finally, I tested to disable the cache with InferenceClient(..., headers={"x-use-cache": "0"} and the answer was correctly using the regex, which implies the problem comes from an invalid cached answer.

I then tried to reproduce the error by sending a random string twice. First time without a regex (to warm-up the cache) and second time with the regex (to test if it would reuse the cache). I did not manage to reproduce the bug with this technique.

I don't know what is specific with the "ok" message and how it got an invalid answer cached. This is not an huggingace_hub.InferenceClient bug but it still needs to be investigated server-side cc @Narsil @OlivierDehaene

Narsil · 2024-07-30T08:29:15Z

The cache key is computed by hashing the entire input (including parameters so including the regex).

This is unlikely to be a cache issue.
The invalid answer might have been cached, but it seems more a case of your regex formatting was ignored.

I think it's possible that some regex can be ignored in some circumstances (basically to avoid critical failure).
This however is not intended behavior.
If you're able to reproduce that'd help a lot.

Wauplin · 2024-07-30T08:41:55Z

@Narsil I've been able to reproduce it with cache disabled by repeating the exact same request until it fails:

from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3.1-8B-Instruct", headers={"x-use-cache": "0"})

for i in range(50):
    output = client.chat_completion(
        [{"role": "user", "content": "ok"}],
        response_format={"type": "regex", "value": ".+?\n\nCode:+?"},
    )
    answer = output.choices[0].message.content

    if "Code:" in answer:
        print(f"Iteration {i}: OK")
    else:
        print(f"Iteration {i}: NOT OK\n{answer}")
        break

which outputs:

Iteration 0: OK
Iteration 1: OK
Iteration 2: OK
Iteration 3: OK
Iteration 4: OK
Iteration 5: OK
Iteration 6: NOT OK
It seems like you're ready to chat. Is there something specific you'd like to talk about or ask about? I can help with any questions or just have a conversation if you'd like. What's on your mind?  Would you like some suggestions? I could share some interesting topics if you are interested? like how the processor works, Some conspiracy theories or fun facts. let me know! :). There are some space news updates. Or there have been some epilepsy clues about satellites, other

Though it's not reproducing the error 100% of the time, it's still happening once every few requests.

Narsil · 2024-07-30T17:37:36Z

Calling in @drbh on this. I know it can happen, I didn't expect 6 iteration would be enough to trigger.

aymeric-roucher added the bug Something isn't working label Jul 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

response_format with regex does not seem to work #2423

response_format with regex does not seem to work #2423

aymeric-roucher commented Jul 26, 2024

aymeric-roucher commented Jul 26, 2024

Wauplin commented Jul 30, 2024

Narsil commented Jul 30, 2024

Wauplin commented Jul 30, 2024

Narsil commented Jul 30, 2024

response_format with regex does not seem to work #2423

response_format with regex does not seem to work #2423

Comments

aymeric-roucher commented Jul 26, 2024

Describe the bug

Reproduction

Logs

System info

aymeric-roucher commented Jul 26, 2024

Wauplin commented Jul 30, 2024

Narsil commented Jul 30, 2024

Wauplin commented Jul 30, 2024

Narsil commented Jul 30, 2024