Add Devstral Small 1.1 #2468
Conversation
Gemini Code Assist summary: This pull request integrates the Devstral Small 1.1 model by adding its preset configuration and adapting the tokenizer conversion process. The primary challenge addressed was the absence of a `tokenizer.json` file in the Devstral release.
Code Review
This pull request adds presets for the Devstral Small 1.1 model. The changes correctly handle the tokenizer issue by using the tokenizer from a compatible base model. However, there's a minor typo in the model description within the preset file. Additionally, the code in convert_mistral.py and convert_mistral_checkpoints.py for handling the special case of the 'devstral' model can be improved by using a more robust check and avoiding hardcoded strings to enhance maintainability and readability. I've provided suggestions to address these points.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
sachinprasadhs left a comment:
Thanks for the PR! Please attach screenshots showing numerics matching, parameter count matching, tokenizer matching, and output matching.
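The checks the reviewer asks for can be sketched with plain-Python helpers. A minimal sketch — the function names `outputs_match`/`tokenizers_match` and the tolerances are illustrative, not part of this PR:

```python
import math

def outputs_match(expected, actual, rel_tol=1e-5, abs_tol=1e-6):
    """Return True if two flat lists of logits agree within tolerance."""
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )

def tokenizers_match(hf_ids, keras_ids):
    """Token-ID sequences from both tokenizers must be identical."""
    return list(hf_ids) == list(keras_ids)

# Example: compare logits from the two implementations on one prompt.
hf_logits = [0.120001, -3.400002, 7.800001]
keras_logits = [0.120002, -3.400001, 7.800003]
print(outputs_match(hf_logits, keras_logits))  # True (within tolerance)
print(tokenizers_match([1, 529, 34], [1, 529, 34]))  # True
```

In practice the two logit lists would come from running the HF checkpoint and the converted keras-hub checkpoint on the same input.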
```python
if preset == "devstral_small_1_1":
    hf_tokenizer = AutoTokenizer.from_pretrained(
        "mistralai/Mistral-Small-24B-Base-2501"
    )
else:
    hf_tokenizer = AutoTokenizer.from_pretrained(hf_preset)
```
Can't we use `tekken.json`? They have mentioned: "Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size."
We would need to add a dependency on https://github.com/mistralai/mistral-common, since the transformers `AutoTokenizer` does not support `tekken.json`.
Got it. They have mentioned that going forward they will only use `tekken.json`. What is the difference between the base model's `tokenizer.json` and Devstral's `tekken.json`?
I also observed that they included `tokenizer.json` in today's release of the Devstral 2 model.
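One way to answer the "what is the difference" question is to diff the two vocabularies directly. Below is a stdlib-only sketch on miniature, made-up stand-ins for the two files; the field layout assumed here (`vocab` entries with a `rank` and base64 `token_bytes` in `tekken.json`, a `model.vocab` token-to-id map in `tokenizer.json`) is an assumption based on published file examples, and the real files hold on the order of 131k entries:

```python
import base64

# Miniature stand-ins for the two tokenizer files (layout is assumed).
tekken = {
    "vocab": [
        {"rank": 0, "token_bytes": base64.b64encode(b"<s>").decode()},
        {"rank": 1, "token_bytes": base64.b64encode(b"hello").decode()},
    ]
}
hf_tokenizer_json = {"model": {"vocab": {"<s>": 0, "hello": 1, "extra": 2}}}

def tekken_vocab(data):
    """Map decoded token bytes -> rank for a tekken.json-style dict."""
    return {
        base64.b64decode(e["token_bytes"]): e["rank"] for e in data["vocab"]
    }

def hf_vocab(data):
    """Map token string (as bytes) -> id for a tokenizer.json-style dict."""
    return {tok.encode(): i for tok, i in data["model"]["vocab"].items()}

a, b = tekken_vocab(tekken), hf_vocab(hf_tokenizer_json)
only_in_hf = set(b) - set(a)
id_mismatches = {t for t in set(a) & set(b) if a[t] != b[t]}
print(sorted(only_in_hf), sorted(id_mismatches))  # [b'extra'] []
```

Running the same comparison on the real files would show whether the base model's `tokenizer.json` assigns the same ids as Devstral's `tekken.json`.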
Before adding the dependency, let's look at the other Mistral models; if they also ship only `tekken.json` like this model, then we can consider adding the dependency.
> Got it. They have mentioned that going forward they will only use `tekken.json`. What is the difference between the base model's `tokenizer.json` and Devstral's `tekken.json`? I also observed that they included `tokenizer.json` in today's release of the Devstral 2 model.
I think they are including tokenizer.json so that people can continue using them until frameworks support tekken.json
This is the current state of their tokenizer formats for newer models:

- mistralai/Devstral-Small-2507 - `tekken.json` (Add Devstral Small 1.1 #2333)
- mistralai/Devstral-Small-2-24B-Instruct-2512 - `tekken.json`, `tokenizer.json`
- mistralai/Mistral-Small-24B-Base-2501 - `tekken.json`, `tokenizer.json`
- mistralai/Mistral-Small-3.1-24B-Base-2503 - `tekken.json`, `tokenizer.json` (Add Mistral-Small-3.1 #2334)
- mistralai/Ministral-3-8B-Base-2512 - `tekken.json`, `tokenizer.json`
- mistralai/Magistral-Small-2509 - `tekken.json` (Add Magistral to Keras-Hub #2314)
- mistralai/Voxtral-Mini-3B-2507 - `tekken.json` (Add Voxtral #2349)
Older models:

- All of the Mistral and Mixtral models implemented in `keras-hub` include `tokenizer.model` and `tokenizer.json`.
- Hence, the `keras-hub` implementation loads the tokenizer using the `tokenizer.model` file format.
My earlier changes do not work since we don't use the `tokenizer.json` format. Going forward, we need to use `tekken.json`.

transformers has started supporting the Tekken tokenizer, using mistral-common as its backend for the Mistral models (https://github.com/huggingface/transformers/blob/471d7ce9abbb3bc1b3bab673367378f9dbc3caac/src/transformers/tokenization_mistral_common.py).
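Given the file-format survey above, a converter could dispatch on whichever tokenizer file a checkpoint actually ships. This helper is a hypothetical sketch, not code from the PR:

```python
import tempfile
from pathlib import Path

def tokenizer_strategy(checkpoint_dir):
    """Pick a tokenizer-loading path from the files present in a checkpoint dir."""
    d = Path(checkpoint_dir)
    if (d / "tokenizer.json").exists():
        return "hf_tokenizer_json"  # loadable via transformers AutoTokenizer
    if (d / "tekken.json").exists():
        return "tekken"             # needs a Tekken-aware loader
    if (d / "tokenizer.model").exists():
        return "sentencepiece"      # older Mistral/Mixtral checkpoints
    raise FileNotFoundError(f"no known tokenizer file in {d}")

# Demo with a throwaway directory mimicking the Devstral-Small-2507 layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tekken.json").touch()
    print(tokenizer_strategy(tmp))  # tekken
```

With this kind of check, the special-cased `devstral_small_1_1` branch would become a general rule rather than a hardcoded preset name.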
> We would need to add a dependency on https://github.com/mistralai/mistral-common, since the transformers `AutoTokenizer` does not support `tekken.json`.
Apparently, the latest version does support it. My bad.
Description of the change
Added presets for Devstral Small 1.1
Reference
Github Issue - #2333
Model HF - https://huggingface.co/mistralai/Devstral-Small-2507
The Devstral HF repo contains only `tekken.json`, but we need the `tokenizer.json` format. There are 2 solutions for this:

1. `huggingface/transformers` - this converts the `tekken.json` and loads the `AutoTokenizer`.
2. `mistralai/Mistral-Small-24B-Base-2501` - referencing this issue from unsloth: since Devstral is just a finetune, we can use the earlier model to obtain the tokenizer (yes, this model has a `tokenizer.json` file).

I've gone ahead with Option 2 and implemented it in the code.
I've updated presets in `mistral_presets.py`, `convert_mistral.py`, and `convert_mistral_checkpoints.py`.

Colab Notebook

I could not load the model in Colab since it is 24B and the runtime keeps crashing, but I will try it on Modal/Lambda and attach the results here.
Doubts