|
86 | 86 | "\n", |
87 | 87 | "* **tiny-llama-1b-chat** - This is the chat model fine-tuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T). The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens, adopting the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into many open-source projects built upon Llama. In addition, TinyLlama is compact, with only 1.1B parameters, which suits the many applications that demand a restricted computation and memory footprint. More details about the model can be found in the [model card](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).\n", |
88 | 88 | "* **phi-2** - Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as [Phi-1.5](https://huggingface.co/microsoft/phi-1_5), augmented with a new data source consisting of various synthetic NLP texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showed nearly state-of-the-art performance among models with fewer than 13 billion parameters. More details about the model can be found in the [model card](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2).\n", |
89 | | - "* **dolly-v2-3b** - Dolly 2.0 is an instruction-following large language model trained on the Databricks machine-learning platform that is licensed for commercial use. It is based on [Pythia](https://github.com/EleutherAI/pythia) and is trained on ~15k instruction/response fine-tuning records generated by Databricks employees in various capability domains, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. Dolly 2.0 works by processing natural language instructions and generating responses that follow the given instructions. It can be used for a wide range of applications, including closed question-answering, summarization, and generation. More details about model can be found in [model card](https://huggingface.co/databricks/dolly-v2-3b).\n", |
90 | 89 | "* **red-pajama-3b-instruct** - A 2.8B parameter pre-trained language model based on the GPT-NeoX architecture. The model was fine-tuned for few-shot applications on the data of [GPT-JT](https://huggingface.co/togethercomputer/GPT-JT-6B-v1), excluding tasks that overlap with the HELM core scenarios. More details about the model can be found in the [model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1).\n", |
91 | 90 | "* **mistral-7b** - The Mistral-7B-v0.2 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. You can find more details about the model in the [model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [paper](https://arxiv.org/abs/2310.06825) and [release blog post](https://mistral.ai/news/announcing-mistral-7b/).\n", |
92 | 91 | "* **llama-3-8b-instruct** - Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. More details about the model can be found in the [Meta blog post](https://ai.meta.com/blog/meta-llama-3/), [model website](https://llama.meta.com/llama3) and [model card](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).\n", |
|
138 | 137 | }, |
139 | 138 | { |
140 | 139 | "cell_type": "code", |
141 | | - "execution_count": 3, |
| 140 | + "execution_count": null, |
142 | 141 | "id": "27b42290-a9b5-4453-9a4c-ffa44bbd966d", |
143 | 142 | "metadata": {}, |
144 | | - "outputs": [ |
145 | | - { |
146 | | - "data": { |
147 | | - "application/vnd.jupyter.widget-view+json": { |
148 | | - "model_id": "aed83fe4a63f44a8af7508cd115fcf4b", |
149 | | - "version_major": 2, |
150 | | - "version_minor": 0 |
151 | | - }, |
152 | | - "text/plain": [ |
153 | | - "Dropdown(description='Model:', index=1, options=('tiny-llama-1b', 'phi-2', 'dolly-v2-3b', 'red-pajama-instruct…" |
154 | | - ] |
155 | | - }, |
156 | | - "execution_count": 3, |
157 | | - "metadata": {}, |
158 | | - "output_type": "execute_result" |
159 | | - } |
160 | | - ], |
| 143 | + "outputs": [], |
161 | 144 | "source": [ |
162 | 145 | "model_ids = list(SUPPORTED_LLM_MODELS)\n", |
163 | 146 | "\n", |
|
173 | 156 | }, |
174 | 157 | { |
175 | 158 | "cell_type": "code", |
176 | | - "execution_count": 4, |
| 159 | + "execution_count": null, |
177 | 160 | "id": "37e9634f-4fc7-4d9c-9ade-b3e8684a0828", |
178 | 161 | "metadata": {}, |
179 | | - "outputs": [ |
180 | | - { |
181 | | - "name": "stdout", |
182 | | - "output_type": "stream", |
183 | | - "text": [ |
184 | | - "Selected model dolly-v2-3b\n" |
185 | | - ] |
186 | | - } |
187 | | - ], |
| 162 | + "outputs": [], |
188 | 163 | "source": [ |
189 | 164 | "model_configuration = SUPPORTED_LLM_MODELS[model_id.value]\n", |
190 | 165 | "print(f\"Selected model {model_id.value}\")" |
|
328 | 303 | }, |
329 | 304 | { |
330 | 305 | "cell_type": "code", |
331 | | - "execution_count": 6, |
| 306 | + "execution_count": null, |
332 | 307 | "id": "0430babb-f8c3-4b7e-86e0-1ea23de68477", |
333 | 308 | "metadata": {}, |
334 | 309 | "outputs": [ |
|
390 | 365 | " \"group_size\": 128,\n", |
391 | 366 | " \"ratio\": 0.5,\n", |
392 | 367 | " },\n", |
393 | | - " \"dolly-v2-3b\": {\"sym\": False, \"group_size\": 32, \"ratio\": 0.5},\n", |
394 | 368 | " \"llama-3-8b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n", |
395 | 369 | " \"default\": {\n", |
396 | 370 | " \"sym\": False,\n", |
|
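The hunk above edits a per-model table of weight-compression parameters (`sym`, `group_size`, `ratio`) that includes a `"default"` entry as a fallback for models without their own settings. A minimal sketch of that lookup pattern in plain Python — the dictionary name, helper function, and the values for models not shown in the diff are illustrative assumptions, not the notebook's exact code:

```python
# Per-model weight-compression parameters with a "default" fallback.
# Only "llama-3-8b-instruct" is taken from the diff above; the other
# entries and values here are assumed for illustration.
compression_configs = {
    "llama-3-8b-instruct": {"sym": True, "group_size": 128, "ratio": 1.0},
    "mistral-7b": {"sym": True, "group_size": 64, "ratio": 1.0},
    "default": {"sym": False, "group_size": 128, "ratio": 0.8},
}

def get_compression_config(model_id: str) -> dict:
    """Return the model-specific parameters, falling back to 'default'."""
    return compression_configs.get(model_id, compression_configs["default"])

print(get_compression_config("llama-3-8b-instruct"))  # model-specific entry
print(get_compression_config("tiny-llama-1b-chat"))   # falls back to "default"
```

The fallback keeps the notebook working for any newly added model id: removing a model's entry (as this PR does for dolly-v2-3b) simply routes it to the `"default"` parameters rather than raising a `KeyError`.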
514 | 488 | }, |
515 | 489 | { |
516 | 490 | "cell_type": "code", |
517 | | - "execution_count": 10, |
| 491 | + "execution_count": null, |
518 | 492 | "id": "5259c1c5-4128-4210-9ad2-faf33ee40e86", |
519 | 493 | "metadata": {}, |
520 | | - "outputs": [ |
521 | | - { |
522 | | - "name": "stdout", |
523 | | - "output_type": "stream", |
524 | | - "text": [ |
525 | | - "Loading model from dolly-v2-3b/INT8_compressed_weights\n" |
526 | | - ] |
527 | | - } |
528 | | - ], |
| 494 | + "outputs": [], |
529 | 495 | "source": [ |
530 | 496 | "from transformers import AutoTokenizer\n", |
531 | 497 | "from openvino_tokenizers import convert_tokenizer\n", |
|