Conversation


@omkar-334 omkar-334 commented Dec 9, 2025

Description of the change

Added presets for Devstral Small 1.1

Reference

GitHub Issue - #2333
Model HF - https://huggingface.co/mistralai/Devstral-Small-2507
The Devstral HF repo contains only tekken.json, but we need the tokenizer.json format. There are two solutions for this -

  1. Use the conversion script from huggingface/transformers - this converts the tekken.json so the model loads with AutoTokenizer
  2. Use the tokenizer from mistralai/Mistral-Small-24B-Base-2501 - referencing this issue from Unsloth. Since Devstral is just a finetune, we can use the earlier model's tokenizer (yes, that model does ship a tokenizer.json)

I've gone ahead with Option 2 and implemented it in the code.
I've updated presets in mistral_presets.py, convert_mistral.py, and convert_mistral_checkpoints.py.

Colab Notebook

I could not load it in Colab since the model is 24B and the runtime crashes, but I will try it on Modal/Lambda and attach the results here.

Doubts

  1. This is the Kaggle link for the model - https://www.kaggle.com/models/mistral-ai/devstral-small-2507, but I'm unsure of the kaggle_handle format.
  2. cc @sachinprasadhs - let me know if there are any changes to make.

@gemini-code-assist
Contributor

Summary of Changes

Hello @omkar-334, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Devstral Small 1.1 model into the system by adding its configuration and adapting the tokenizer conversion process. The primary challenge addressed was the absence of a tokenizers.json file for the Devstral model, which was resolved by leveraging the tokenizer from a compatible Mistral base model. This ensures the new model can be properly utilized within the existing framework.

Highlights

  • New Model Integration: Presets for the Devstral Small 1.1 model have been added, which is a 24B parameter model finetuned from Mistral-Small-3.1.
  • Tokenizer Resolution: A specific solution was implemented to use the tokenizer from "mistralai/Mistral-Small-24B-Base-2501" for Devstral Small 1.1, addressing the issue where the original Devstral HF model lacked a tokenizers.json file.
  • Configuration Updates: Relevant configuration files, including mistral_presets.py, convert_mistral.py, and convert_mistral_checkpoints.py, were updated to support the new model and its unique tokenizer loading requirements.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds presets for the Devstral Small 1.1 model. The changes correctly handle the tokenizer issue by using the tokenizer from a compatible base model. However, there's a minor typo in the model description within the preset file. Additionally, the code in convert_mistral.py and convert_mistral_checkpoints.py for handling the special case of the 'devstral' model can be improved by using a more robust check and avoiding hardcoded strings to enhance maintainability and readability. I've provided suggestions to address these points.
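The "more robust check, avoiding hardcoded strings" suggestion could be sketched as a lookup table, so that future tokenizer overrides only need a new dict entry. This is a hypothetical illustration, not the repo's actual code: the helper name `resolve_tokenizer_handle` is made up, while the preset name and HF handle come from this PR.

```python
# Hypothetical sketch: replace the inline string comparison with a
# lookup table mapping preset names to tokenizer sources.
TOKENIZER_OVERRIDES = {
    # Devstral Small 1.1 ships only tekken.json, so we borrow the
    # base model's tokenizer (Option 2 from the PR description).
    "devstral_small_1_1": "mistralai/Mistral-Small-24B-Base-2501",
}

def resolve_tokenizer_handle(preset, hf_preset):
    """Return the HF handle the tokenizer should be loaded from."""
    return TOKENIZER_OVERRIDES.get(preset, hf_preset)
```

The call site then collapses to a single line, e.g. `AutoTokenizer.from_pretrained(resolve_tokenizer_handle(preset, hf_preset))`, with no branch per special-cased model.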

omkar-334 and others added 4 commits December 9, 2025 17:54
@sachinprasadhs sachinprasadhs self-requested a review December 9, 2025 19:04

@sachinprasadhs sachinprasadhs left a comment


Thanks for the PR. Please attach screenshots showing numerics matching, parameter count, tokenizer matching, and output matching.
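For reference, the comparison step in such a verification notebook usually boils down to checks like the following. This is a minimal sketch: the converted Keras model, the HF model, and both tokenizers are assumed to be loaded elsewhere, and the helper names are made up for illustration.

```python
import numpy as np

def tokenizers_match(keras_ids, hf_ids):
    """Token ID sequences from both tokenizers should be identical."""
    return list(keras_ids) == list(hf_ids)

def numerics_match(keras_logits, hf_logits, atol=1e-3):
    """Output logits should agree within a small absolute tolerance."""
    return np.allclose(np.asarray(keras_logits), np.asarray(hf_logits), atol=atol)
```

Parameter counts are usually compared directly (e.g. `keras_model.count_params()` against the HF model's parameter total).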

Comment on lines +224 to +230

if preset == "devstral_small_1_1":
    hf_tokenizer = AutoTokenizer.from_pretrained(
        "mistralai/Mistral-Small-24B-Base-2501"
    )
else:
    hf_tokenizer = AutoTokenizer.from_pretrained(hf_preset)
Collaborator


Can't we use tekken.json, since they have mentioned "Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size"?

Author


We would need to add a dependency on https://github.com/mistralai/mistral-common, since the transformers AutoTokenizer does not support tekken.json.

Collaborator


Got it. They have mentioned that going forward they will only use tekken.json. What is the difference between the base model's tokenizer.json and Devstral's tekken.json?

As I observed, they also included tokenizer.json in today's release of the Devstral 2 model.

Collaborator


Let's first observe the other Mistral models; if they also ship only tekken.json like this model does, then we can think about adding the dependency.

Author

@omkar-334 omkar-334 Dec 10, 2025


> Got it. They have mentioned that going forward they will only use tekken.json. What is the difference between the base model's tokenizer.json and Devstral's tekken.json?
>
> As I observed, they also included tokenizer.json in today's release of the Devstral 2 model.

I think they are including tokenizer.json so that people can continue using it until frameworks support tekken.json.

This is the current state of their tokenizer formats for newer models:

  1. mistralai/Devstral-Small-2507 - tekken.json (Add Devstral Small 1.1 #2333)
  2. mistralai/Devstral-Small-2-24B-Instruct-2512 - tekken.json, tokenizer.json
  3. mistralai/Mistral-Small-24B-Base-2501 - tekken.json, tokenizer.json
  4. mistralai/Mistral-Small-3.1-24B-Base-2503 - tekken.json, tokenizer.json (Add Mistral-Small-3.1 #2334)
  5. mistralai/Ministral-3-8B-Base-2512 - tekken.json, tokenizer.json
  6. mistralai/Magistral-Small-2509 - tekken.json (Add Magistral to Keras-Hub #2314)
  7. mistralai/Voxtral-Mini-3B-2507 - tekken.json (Add Voxtral #2349)

Older Models -

  1. All of the Mistral and Mixtral models implemented in keras-hub include tokenizer.model and tokenizer.json.
  2. Hence, the keras-hub implementation loads the tokenizer from the tokenizer.model file.

My earlier changes do not work, since we don't use the tokenizer.json format. Going forward, we need to use tekken.json.

transformers has started supporting the Tekken tokenizer, using mistral-common as the backend for its Mistral models (https://github.com/huggingface/transformers/blob/471d7ce9abbb3bc1b3bab673367378f9dbc3caac/src/transformers/tokenization_mistral_common.py).
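Given the file availability listed above (newer releases ship tekken.json, sometimes alongside tokenizer.json), a converter could pick the tokenizer artifact with a simple preference order. This is a sketch under the assumption that recent transformers versions can load tekken.json through their mistral-common-backed tokenizer; `pick_tokenizer_file` is a hypothetical helper name, not code from this PR.

```python
def pick_tokenizer_file(available_files):
    """Prefer tokenizer.json (loadable by a plain AutoTokenizer) and
    fall back to tekken.json for Tekken-only releases."""
    if "tokenizer.json" in available_files:
        return "tokenizer.json"
    if "tekken.json" in available_files:
        return "tekken.json"
    raise ValueError("No supported tokenizer file found")
```

With such a check, transitional models like Mistral-Small-24B-Base-2501 keep using tokenizer.json, while Tekken-only models like Devstral-Small-2507 would route to the tekken.json path.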

Author


> We would need to add a dependency on https://github.com/mistralai/mistral-common, since the transformers AutoTokenizer does not support tekken.json.

Apparently, the latest version does support it. My bad.
