8 changes: 8 additions & 0 deletions keras_hub/src/models/mistral/mistral_presets.py
@@ -42,4 +42,12 @@
},
"kaggle_handle": "kaggle://keras/mistral/keras/mistral_0.3_instruct_7b_en/1",
},
"devstral_small_1_1": {
"metadata": {
"description": "Devstral Small 1.1 24B finetuned base model",
"params": 23572403200,
"path": "devstral_small_1_1",
},
# "kaggle_handle": "kaggle://keras/mistral/keras/devstral_small_1_1/1",
},
}
2 changes: 2 additions & 0 deletions keras_hub/src/utils/transformers/convert_mistral.py
@@ -113,4 +113,6 @@ def convert_weights(backbone, loader, transformers_config):


def convert_tokenizer(cls, preset, **kwargs):
if preset == "devstral_small_1_1":
preset = "mistralai/Mistral-Small-24B-Base-2501"
return cls(get_file(preset, "tokenizer.model"), **kwargs)
18 changes: 16 additions & 2 deletions tools/checkpoint_conversion/convert_mistral_checkpoints.py
@@ -22,6 +22,7 @@
"mistral_instruct_7b_en": "mistralai/Mistral-7B-Instruct-v0.1",
"mistral_0.2_instruct_7b_en": "mistralai/Mistral-7B-Instruct-v0.2",
"mistral_0.3_instruct_7b_en": "mistralai/Mistral-7B-Instruct-v0.3",
"devstral_small_1_1": "mistralai/Devstral-Small-2507",
}

FLAGS = flags.FLAGS
@@ -220,7 +221,13 @@ def main(_):
try:
# === Load the Huggingface model ===
hf_model = MistralForCausalLM.from_pretrained(hf_preset)
hf_tokenizer = AutoTokenizer.from_pretrained(hf_preset)

if preset == "devstral_small_1_1":
hf_tokenizer = AutoTokenizer.from_pretrained(
"mistralai/Mistral-Small-24B-Base-2501"
)
else:
hf_tokenizer = AutoTokenizer.from_pretrained(hf_preset)
Comment on lines +224 to +230
Collaborator:

Can't we use tekken.json, since they have mentioned "Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size"?

Author:

We would need to add a dependency on https://github.com/mistralai/mistral-common, since transformers' AutoTokenizer does not support tekken.json.
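For context, loading tekken.json through mistral-common would look roughly like the sketch below. This is only to illustrate the extra dependency; the encode_chat_completion / UserMessage API names are assumptions based on mistral-common's documentation and may shift between releases.

```python
# Hedged sketch: requires the mistral-common and huggingface_hub packages.
from huggingface_hub import hf_hub_download
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Download the tekken.json file shipped with the Devstral checkpoint.
tekken_path = hf_hub_download("mistralai/Devstral-Small-2507", "tekken.json")
tokenizer = MistralTokenizer.from_file(tekken_path)

# Tokenize a single user turn and inspect the resulting token ids.
request = ChatCompletionRequest(messages=[UserMessage(content="Hello Devstral")])
tokens = tokenizer.encode_chat_completion(request).tokens
print(len(tokens), tokens[:8])
```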

Collaborator:

Got it. They have mentioned that going forward they will only use tekken.json. What is the difference between the base model's tokenizer.json and Devstral's tekken.json?

As I observed, they also included tokenizer.json in today's release of the Devstral 2 model.

Collaborator:

If, after looking at the other Mistral models, we find that they only ship tekken.json like this model does, then we can think about adding the dependency.

Author (@omkar-334, Dec 10, 2025):

> Got it. They have mentioned that going forward they will only use tekken.json. What is the difference between the base model's tokenizer.json and Devstral's tekken.json?
>
> As I observed, they also included tokenizer.json in today's release of the Devstral 2 model.

I think they are including tokenizer.json so that people can continue using it until frameworks support tekken.json.

This is the current state of their tokenizer formats for newer models:

  1. mistralai/Devstral-Small-2507 - tekken.json (Add Devstral Small 1.1 #2333)
  2. mistralai/Devstral-Small-2-24B-Instruct-2512 - tekken.json, tokenizer.json
  3. mistralai/Mistral-Small-24B-Base-2501 - tekken.json, tokenizer.json
  4. mistralai/Mistral-Small-3.1-24B-Base-2503 - tekken.json, tokenizer.json (Add Mistral-Small-3.1 #2334)
  5. mistralai/Ministral-3-8B-Base-2512 - tekken.json, tokenizer.json
  6. mistralai/Magistral-Small-2509 - tekken.json (Add Magistral to Keras-Hub #2314)
  7. mistralai/Voxtral-Mini-3B-2507 - tekken.json (Add Voxtral #2349)

Older models:

  1. All of the mistral and mixtral models implemented in keras-hub include both tokenizer.model and tokenizer.json.
  2. Hence, the keras-hub implementation loads the tokenizer from the tokenizer.model file.

My earlier changes do not work, since we don't use the tokenizer.json format. Going forward, we need to use tekken.json.

transformers has started supporting the Tekken tokenizer and uses mistral-common as its backend for the Mistral models (https://github.com/huggingface/transformers/blob/471d7ce9abbb3bc1b3bab673367378f9dbc3caac/src/transformers/tokenization_mistral_common.py).
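A rough sketch of loading through that backend is below, assuming the linked module exposes a MistralCommonTokenizer class with the usual from_pretrained entry point; mistral-common still has to be installed, and the exact import path may differ between transformers versions.

```python
# Hedged sketch: class name and import path are assumptions based on
# transformers' tokenization_mistral_common module; requires mistral-common.
from transformers.tokenization_mistral_common import MistralCommonTokenizer

tok = MistralCommonTokenizer.from_pretrained("mistralai/Devstral-Small-2507")
ids = tok.encode("def fibonacci(n):")
print(ids[:8])
print(tok.decode(ids))
```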

Collaborator:

Great, thanks for putting all the details here. If AutoTokenizer supports loading tekken.json, we can handle it with an if/else condition.
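One way to express that branch in the conversion script is sketched below: instead of hard-coding preset names, key off which tokenizer files the checkpoint actually ships. The helper name and the fallback repo are illustrative only.

```python
from huggingface_hub import list_repo_files
from transformers import AutoTokenizer


def load_hf_tokenizer(hf_preset):
    # Hypothetical helper for convert_mistral_checkpoints.py: use the preset's
    # own tokenizer when it ships tokenizer.json / tokenizer.model, otherwise
    # fall back to the base checkpoint's tokenizer, as this PR currently does.
    files = set(list_repo_files(hf_preset))
    if files & {"tokenizer.json", "tokenizer.model"}:
        return AutoTokenizer.from_pretrained(hf_preset)
    return AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Base-2501")
```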

Author:

Yes, we can do that for hf_tokenizer, but how do we support it in keras-hub? Should we write a NewMistralTokenizer for tekken.json-based models?

Collaborator:

I think we can add the underlying TikTokenizer implementation to Keras Hub here https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/tokenizers, unless NewMistralTokenizer is different from the TikTokenizer.

Collaborator:

And add an option to use the base TikTokenizer in the Mistral model here https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/models/mistral/mistral_tokenizer.py.
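To make that concrete, a rough sketch of how tekken.json could back a tiktoken-style encoding is below. The tekken.json field names ("config", "vocab", "token_bytes", "rank", "pattern") are assumptions about the published file layout, and special tokens are ignored, so this would need verification before wiring it into keras_hub/src/tokenizers.

```python
import base64
import json

import tiktoken


def load_tekken_encoding(path):
    # Hedged sketch: field names below are assumptions about tekken.json's
    # layout; special tokens and id offsets are omitted for brevity.
    with open(path) as f:
        data = json.load(f)
    ranks = {
        base64.b64decode(item["token_bytes"]): item["rank"]
        for item in data["vocab"]
    }
    return tiktoken.Encoding(
        name="tekken",
        pat_str=data["config"]["pattern"],
        mergeable_ranks=ranks,
        special_tokens={},
    )


enc = load_tekken_encoding("tekken.json")
print(enc.encode("def hello():"))
```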

Author:

Okay, I'll work on that and update this. Until then, I'll mark this PR as a draft. Thanks!

hf_model.eval()
print("\n-> Huggingface model and tokenizer loaded")

@@ -239,7 +246,14 @@ def main(_):
)
keras_hub_backbone = MistralBackbone(**backbone_kwargs)

keras_hub_tokenizer = MistralTokenizer.from_preset(f"hf://{hf_preset}")
if "devstral" in hf_preset.lower():
keras_hub_tokenizer = MistralTokenizer.from_preset(
"hf://mistralai/Mistral-Small-24B-Base-2501"
)
else:
keras_hub_tokenizer = MistralTokenizer.from_preset(
f"hf://{hf_preset}"
)
print("\n-> Keras 3 model and tokenizer loaded.")

# === Port the weights ===