2 files changed: +11 −4 lines changed

python/llm/example/NPU/HF-Transformers-AutoModels/LLM
````diff
@@ -78,12 +78,16 @@
 
 ## 4. Run Optimized Models (Experimental)
 The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
-- [Llama2-7B](./llama2.py)
+- [Llama2-7B](./llama.py)
+- [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
 
-```
+```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py
+python llama.py
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py
@@ -102,7 +106,10 @@ Arguments info:
 If you encounter output problem, please try to disable the optimization of transposing value cache with following command:
 ```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py --disable-transpose-value-cache
+python llama.py --disable-transpose-value-cache
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py --disable-transpose-value-cache
````
llama2.py → llama.py: file renamed without changes.