Commit e246f1e

update llama3 npu example (#11933)
1 parent 14dddfc commit e246f1e

File tree

2 files changed: +11 -4 lines changed


python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md

+11 -4
@@ -78,12 +78,16 @@ done
 
 ## 4. Run Optimized Models (Experimental)
 The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
-- [Llama2-7B](./llama2.py)
+- [Llama2-7B](./llama.py)
+- [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
 
-```
+```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py
+python llama.py
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py
@@ -102,7 +106,10 @@ Arguments info:
 If you encounter output problems, please try disabling the optimization of transposing the value cache with the following command:
 ```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py --disable-transpose-value-cache
+python llama.py --disable-transpose-value-cache
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py --disable-transpose-value-cache
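For context, here is a minimal sketch of how an optimized-NPU example script like the renamed `llama.py` might be structured, assuming ipex-llm's `ipex_llm.transformers.npu_model.AutoModelForCausalLM` wrapper. The keyword arguments that mirror the CLI flags (e.g. `transpose_value_cache`) and the low-bit setting are illustrative assumptions inferred from the flags in this diff, not the script's actual contents.

```python
# Hypothetical sketch of an optimized-NPU example script; keyword arguments
# mirroring the CLI flags are assumptions, not verified against llama.py.
import argparse

from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

parser = argparse.ArgumentParser()
parser.add_argument("--repo-id-or-model-path",
                    default="meta-llama/Llama-2-7b-chat-hf")
parser.add_argument("--disable-transpose-value-cache", action="store_true")
args = parser.parse_args()

# Load the model with the optimized NPU implementation; the low-bit
# setting is an assumed default for these examples.
model = AutoModelForCausalLM.from_pretrained(
    args.repo_id_or_model_path,
    trust_remote_code=True,
    load_in_low_bit="sym_int4",
    optimize_model=True,
    transpose_value_cache=not args.disable_transpose_value_cache,
)
tokenizer = AutoTokenizer.from_pretrained(args.repo_id_or_model_path,
                                          trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Under this layout, pointing one script at either checkpoint via `--repo-id-or-model-path` is what lets the single `llama.py` cover both Llama2-7B and Llama3-8B, which is the substance of this commit.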
