|
86 | 86 | "\n", |
87 | 87 | "* **tiny-llama-1b-chat** - This is the chat model fine-tuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T). The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens, adopting the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into many open-source projects built upon Llama. In addition, TinyLlama is compact, with only 1.1B parameters, which suits the many applications that demand a restricted computation and memory footprint. More details about the model can be found in the [model card](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).\n", |
88 | 88 | "* **phi-2** - Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as [Phi-1.5](https://huggingface.co/microsoft/phi-1_5), augmented with a new data source consisting of various synthetic NLP texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showed nearly state-of-the-art performance among models with fewer than 13 billion parameters. More details about the model can be found in the [model card](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2).\n", |
89 | | - "* **dolly-v2-3b** - Dolly 2.0 is an instruction-following large language model trained on the Databricks machine-learning platform that is licensed for commercial use. It is based on [Pythia](https://github.com/EleutherAI/pythia) and is trained on ~15k instruction/response fine-tuning records generated by Databricks employees in various capability domains, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. Dolly 2.0 works by processing natural language instructions and generating responses that follow the given instructions. It can be used for a wide range of applications, including closed question-answering, summarization, and generation. More details about model can be found in [model card](https://huggingface.co/databricks/dolly-v2-3b).\n", |
90 | 89 | "* **red-pajama-3b-instruct** - A 2.8B parameter pre-trained language model based on the GPT-NeoX architecture. The model was fine-tuned for few-shot applications on the data of [GPT-JT](https://huggingface.co/togethercomputer/GPT-JT-6B-v1), excluding tasks that overlap with the HELM core scenarios. More details about the model can be found in the [model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1).\n", |
91 | 90 | "* **mistral-7b** - The Mistral-7B-v0.2 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. You can find more details about the model in the [model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [paper](https://arxiv.org/abs/2310.06825) and [release blog post](https://mistral.ai/news/announcing-mistral-7b/).\n", |
92 | 91 | "* **llama-3-8b-instruct** - Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. More details about the model can be found in the [Meta blog post](https://ai.meta.com/blog/meta-llama-3/), [model website](https://llama.meta.com/llama3) and [model card](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).\n", |
|
138 | 137 | }, |
139 | 138 | { |
140 | 139 | "cell_type": "code", |
141 | | - "execution_count": 3, |
| 140 | + "execution_count": null, |
142 | 141 | "id": "27b42290-a9b5-4453-9a4c-ffa44bbd966d", |
143 | 142 | "metadata": {}, |
144 | | - "outputs": [ |
145 | | - { |
146 | | - "data": { |
147 | | - "application/vnd.jupyter.widget-view+json": { |
148 | | - "model_id": "aed83fe4a63f44a8af7508cd115fcf4b", |
149 | | - "version_major": 2, |
150 | | - "version_minor": 0 |
151 | | - }, |
152 | | - "text/plain": [ |
153 | | - "Dropdown(description='Model:', index=1, options=('tiny-llama-1b', 'phi-2', 'dolly-v2-3b', 'red-pajama-instruct…" |
154 | | - ] |
155 | | - }, |
156 | | - "execution_count": 3, |
157 | | - "metadata": {}, |
158 | | - "output_type": "execute_result" |
159 | | - } |
160 | | - ], |
| 143 | + "outputs": [], |
161 | 144 | "source": [ |
162 | 145 | "model_ids = list(SUPPORTED_LLM_MODELS)\n", |
163 | 146 | "\n", |
|
173 | 156 | }, |
174 | 157 | { |
175 | 158 | "cell_type": "code", |
176 | | - "execution_count": 4, |
| 159 | + "execution_count": null, |
177 | 160 | "id": "37e9634f-4fc7-4d9c-9ade-b3e8684a0828", |
178 | 161 | "metadata": {}, |
179 | | - "outputs": [ |
180 | | - { |
181 | | - "name": "stdout", |
182 | | - "output_type": "stream", |
183 | | - "text": [ |
184 | | - "Selected model dolly-v2-3b\n" |
185 | | - ] |
186 | | - } |
187 | | - ], |
| 162 | + "outputs": [], |
188 | 163 | "source": [ |
189 | 164 | "model_configuration = SUPPORTED_LLM_MODELS[model_id.value]\n", |
190 | 165 | "print(f\"Selected model {model_id.value}\")" |
|
328 | 303 | }, |
329 | 304 | { |
330 | 305 | "cell_type": "code", |
331 | | - "execution_count": 6, |
| 306 | + "execution_count": null, |
332 | 307 | "id": "0430babb-f8c3-4b7e-86e0-1ea23de68477", |
333 | 308 | "metadata": {}, |
334 | 309 | "outputs": [ |
|
390 | 365 | " \"group_size\": 128,\n", |
391 | 366 | " \"ratio\": 0.5,\n", |
392 | 367 | " },\n", |
393 | | - " \"dolly-v2-3b\": {\"sym\": False, \"group_size\": 32, \"ratio\": 0.5},\n", |
394 | 368 | " \"llama-3-8b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n", |
395 | 369 | " \"default\": {\n", |
396 | 370 | " \"sym\": False,\n", |
|
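The hunk above edits a per-model table of weight-compression parameters (`sym`, `group_size`, `ratio`) that includes a `"default"` entry as a fallback for models without their own settings. A minimal sketch of that lookup pattern in plain Python — the dictionary name, helper function, and the values for models not shown in the diff are illustrative assumptions, not the notebook's exact code:

```python
# Per-model weight-compression parameters with a "default" fallback.
# Only "llama-3-8b-instruct" is taken from the diff above; the other
# entries and values here are assumed for illustration.
compression_configs = {
    "llama-3-8b-instruct": {"sym": True, "group_size": 128, "ratio": 1.0},
    "mistral-7b": {"sym": True, "group_size": 64, "ratio": 1.0},
    "default": {"sym": False, "group_size": 128, "ratio": 0.8},
}

def get_compression_config(model_id: str) -> dict:
    """Return the model-specific parameters, falling back to 'default'."""
    return compression_configs.get(model_id, compression_configs["default"])

print(get_compression_config("llama-3-8b-instruct"))  # model-specific entry
print(get_compression_config("tiny-llama-1b-chat"))   # falls back to "default"
```

The fallback keeps the notebook working for any newly added model id: removing a model's entry (as this PR does for dolly-v2-3b) simply routes it to the `"default"` parameters rather than raising a `KeyError`.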
514 | 488 | }, |
515 | 489 | { |
516 | 490 | "cell_type": "code", |
517 | | - "execution_count": 10, |
| 491 | + "execution_count": null, |
518 | 492 | "id": "5259c1c5-4128-4210-9ad2-faf33ee40e86", |
519 | 493 | "metadata": {}, |
520 | | - "outputs": [ |
521 | | - { |
522 | | - "name": "stdout", |
523 | | - "output_type": "stream", |
524 | | - "text": [ |
525 | | - "Loading model from dolly-v2-3b/INT8_compressed_weights\n" |
526 | | - ] |
527 | | - } |
528 | | - ], |
| 494 | + "outputs": [], |
529 | 495 | "source": [ |
530 | 496 | "from transformers import AutoTokenizer\n", |
531 | 497 | "from openvino_tokenizers import convert_tokenizer\n", |
|