2 files changed: +11 −4 lines changed

python/llm/example/NPU/HF-Transformers-AutoModels/LLM
````diff
@@ -78,12 +78,16 @@
 
 ## 4. Run Optimized Models (Experimental)
 The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
-- [Llama2-7B](./llama2.py)
+- [Llama2-7B](./llama.py)
+- [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
 
-```
+```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py
+python llama.py
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py
@@ -102,7 +106,10 @@ Arguments info:
 If you encounter output problem, please try to disable the optimization of transposing value cache with following command:
 ```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py --disable-transpose-value-cache
+python llama.py --disable-transpose-value-cache
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py --disable-transpose-value-cache
````
llama2.py → llama.py: file renamed without changes.