MistralCommonTokenizer does not match PreTrainedTokenizer #39841

@Fhrozen

System Info

on docker
os: ubuntu 24.04
transformers: 4.55.0.dev0
mistral_common: 1.8.3

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Command to launch the container:

docker run --gpus all -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model mistralai/Voxtral-Mini-3B-2507

Expected behavior

The container output ends with the following traceback:

vllm-1  |   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer_group.py", line 24, in __init__  
vllm-1  |     self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
vllm-1  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  |   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 309, in get_tokenizer
vllm-1  |     tokenizer = get_cached_tokenizer(tokenizer)
vllm-1  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  |   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 104, in get_cached_tokenizer
vllm-1  |     tokenizer_all_special_tokens = tokenizer.all_special_tokens
vllm-1  |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | AttributeError: 'MistralCommonTokenizer' object has no attribute 'all_special_tokens'. Did you mean: '_all_special_ids'?

The vLLM Docker server expects the tokenizer to follow the PreTrainedTokenizer format:
https://github.com/vllm-project/vllm/blob/49314869887e169be080201ab8bcda14e745c080/vllm/transformers_utils/tokenizer.py#L97-L101

That format must include the default properties all_special_ids, all_special_tokens, and all_special_tokens_extended. However, MistralCommonTokenizer does not implement them. Is there a plan to standardize both tokenizers?
