|
87 | 87 | "\"datasets<4.0.0\" \\\n", |
88 | 88 | "\"accelerate\" \\\n", |
89 | 89 | "\"gradio>=4.19\" \\\n", |
90 | | - "\"huggingface-hub>=0.26.5\" \\\n", |
| 90 | + "\"huggingface-hub==0.35.3\" \\\n", |
91 | 91 | " \"einops\" \"transformers==4.53.3\" \"transformers_stream_generator\" \"tiktoken\" \"bitsandbytes\"\n", |
92 | 92 | "\n", |
93 | 93 | "if platform.system() == \"Darwin\":\n", |
|
298 | 298 | " except OSError:\n", |
299 | 299 | " notebook_login()\n", |
300 | 300 | "```\n", |
| 301 | + "* **Mistral-Small-24B-Instruct-2501** - Mistral Small is a 24B parameter instruction-tuned language model from Mistral AI. More details about the model can be found in the [Mistral AI model card](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).\n", |
| 302 | + ">**Note**: run model with demo, you will need to accept license agreement. \n", |
| 303 | + ">You must be a registered user in 🤗 Hugging Face Hub. Please visit [HuggingFace model card](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), carefully read terms of usage and click accept button. You will need to use an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).\n", |
| 304 | + ">You can login on Hugging Face Hub in notebook environment, using following code:\n", |
| 305 | + " \n", |
| 306 | + "```python\n", |
| 307 | + " ## login to huggingfacehub to get access to pretrained model \n", |
| 308 | + "\n", |
| 309 | + " from huggingface_hub import notebook_login, whoami\n", |
301 | 310 | "\n", |
| 311 | + " try:\n", |
| 312 | + " whoami()\n", |
| 313 | + " print('Authorization token already provided')\n", |
| 314 | + " except OSError:\n", |
| 315 | + " notebook_login()\n", |
| 316 | + "```\n", |
302 | 317 | "* **zephyr-7b-beta** - Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-beta is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) that was trained on on a mix of publicly available, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). You can find more details about model in [technical report](https://arxiv.org/abs/2310.16944) and [HuggingFace model card](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).\n", |
303 | 318 | "* **neural-chat-7b-v3-1** - Mistral-7b model fine-tuned using Intel Gaudi. The model fine-tuned on the open source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) and aligned with [Direct Preference Optimization (DPO) algorithm](https://arxiv.org/abs/2305.18290). More details can be found in [model card](https://huggingface.co/Intel/neural-chat-7b-v3-1) and [blog post](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).\n", |
304 | 319 | "* **notus-7b-v1** - Notus is a collection of fine-tuned models using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). and related [RLHF](https://huggingface.co/blog/rlhf) techniques. This model is the first version, fine-tuned with DPO over zephyr-7b-sft. Following a data-first approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. Proposed approach for dataset creation helps to effectively fine-tune Notus-7b that surpasses Zephyr-7B-beta and Claude 2 on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). More details about model can be found in [model card](https://huggingface.co/argilla/notus-7b-v1).\n", |
|
624 | 639 | }, |
625 | 640 | { |
626 | 641 | "cell_type": "code", |
627 | | - "execution_count": 9, |
| 642 | + "execution_count": null, |
628 | 643 | "id": "c4ef9112", |
629 | 644 | "metadata": { |
630 | 645 | "collapsed": false, |
|
637 | 652 | "source": [ |
638 | 653 | "from pathlib import Path\n", |
639 | 654 | "\n", |
| 655 | + "from llm_config import compression_configs\n", |
| 656 | + "\n", |
| 657 | + "\n", |
640 | 658 | "pt_model_id = model_configuration[\"model_id\"]\n", |
641 | 659 | "pt_model_name = model_id.value.split(\"-\")[0]\n", |
642 | 660 | "fp16_model_dir = Path(model_id.value) / \"FP16\"\n", |
|
672 | 690 | "\n", |
673 | 691 | "\n", |
674 | 692 | "def convert_to_int4():\n", |
675 | | - " compression_configs = {\n", |
676 | | - " \"zephyr-7b-beta\": {\n", |
677 | | - " \"sym\": True,\n", |
678 | | - " \"group_size\": 64,\n", |
679 | | - " \"ratio\": 0.6,\n", |
680 | | - " },\n", |
681 | | - " \"mistral-7b\": {\n", |
682 | | - " \"sym\": True,\n", |
683 | | - " \"group_size\": 64,\n", |
684 | | - " \"ratio\": 0.6,\n", |
685 | | - " },\n", |
686 | | - " \"minicpm-2b-dpo\": {\n", |
687 | | - " \"sym\": True,\n", |
688 | | - " \"group_size\": 64,\n", |
689 | | - " \"ratio\": 0.6,\n", |
690 | | - " },\n", |
691 | | - " \"gemma-2b-it\": {\n", |
692 | | - " \"sym\": True,\n", |
693 | | - " \"group_size\": 64,\n", |
694 | | - " \"ratio\": 0.6,\n", |
695 | | - " },\n", |
696 | | - " \"notus-7b-v1\": {\n", |
697 | | - " \"sym\": True,\n", |
698 | | - " \"group_size\": 64,\n", |
699 | | - " \"ratio\": 0.6,\n", |
700 | | - " },\n", |
701 | | - " \"neural-chat-7b-v3-1\": {\n", |
702 | | - " \"sym\": True,\n", |
703 | | - " \"group_size\": 64,\n", |
704 | | - " \"ratio\": 0.6,\n", |
705 | | - " },\n", |
706 | | - " \"llama-2-chat-7b\": {\n", |
707 | | - " \"sym\": True,\n", |
708 | | - " \"group_size\": 128,\n", |
709 | | - " \"ratio\": 0.8,\n", |
710 | | - " },\n", |
711 | | - " \"llama-3-8b-instruct\": {\n", |
712 | | - " \"sym\": True,\n", |
713 | | - " \"group_size\": 128,\n", |
714 | | - " \"ratio\": 0.8,\n", |
715 | | - " },\n", |
716 | | - " \"llama-3.1-8b-instruct\": {\n", |
717 | | - " \"sym\": True,\n", |
718 | | - " \"group_size\": 128,\n", |
719 | | - " \"ratio\": 1.0,\n", |
720 | | - " },\n", |
721 | | - " \"gemma-7b-it\": {\n", |
722 | | - " \"sym\": True,\n", |
723 | | - " \"group_size\": 128,\n", |
724 | | - " \"ratio\": 0.8,\n", |
725 | | - " },\n", |
726 | | - " \"chatglm2-6b\": {\n", |
727 | | - " \"sym\": True,\n", |
728 | | - " \"group_size\": 128,\n", |
729 | | - " \"ratio\": 0.72,\n", |
730 | | - " },\n", |
731 | | - " \"qwen-7b-chat\": {\"sym\": True, \"group_size\": 128, \"ratio\": 0.6},\n", |
732 | | - " \"red-pajama-3b-chat\": {\n", |
733 | | - " \"sym\": False,\n", |
734 | | - " \"group_size\": 128,\n", |
735 | | - " \"ratio\": 0.5,\n", |
736 | | - " },\n", |
737 | | - " \"qwen2.5-7b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n", |
738 | | - " \"qwen2.5-3b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n", |
739 | | - " \"qwen2.5-14b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n", |
740 | | - " \"qwen2.5-1.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n", |
741 | | - " \"qwen2.5-0.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n", |
742 | | - " \"default\": {\n", |
743 | | - " \"sym\": False,\n", |
744 | | - " },\n", |
745 | | - " }\n", |
746 | | - "\n", |
747 | 693 | " model_compression_params = compression_configs.get(model_id.value, compression_configs[\"default\"])\n", |
748 | 694 | " if (int4_model_dir / \"openvino_model.xml\").exists():\n", |
749 | 695 | " return\n", |
|