docs/mddocs/Quickstart/llama_cpp_quickstart.md (+116 −102)
@@ -12,9 +12,9 @@
 > For installation on Intel Arc B-Series GPU (such as **B580**), please refer to this [guide](./bmg_quickstart.md).
 
 > [!NOTE]
-> Our latest version is consistent with [3f1ae2e](https://github.com/ggerganov/llama.cpp/commit/3f1ae2e32cde00c39b96be6d01c2997c29bae555) of llama.cpp.
+> Our latest version is consistent with [d7cfe1f](https://github.com/ggml-org/llama.cpp/commit/d7cfe1ffe0f435d0048a6058d529daf76e072d9c) of llama.cpp.
 >
-> `ipex-llm[cpp]==2.2.0b20241204` is consistent with [a1631e5](https://github.com/ggerganov/llama.cpp/commit/a1631e53f6763e17da522ba219b030d8932900bd) of llama.cpp.
+> `ipex-llm[cpp]==2.2.0b20250320` is consistent with [ba1cb19](https://github.com/ggml-org/llama.cpp/commit/ba1cb19cdd0d92e012e0f6e009e0620f854b6afd) of llama.cpp.
 
 See the demo of running LLaMA2-7B on Intel Arc GPU below.
@@ -158,7 +158,7 @@ Before running, you should download or copy community GGUF model to your current
 - For **Linux users**:
 
   ```bash
-  ./llama-cli -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -c 1024 -t 8 -e -ngl 99 --color
+  ./llama-cli -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -c 1024 -t 8 -e -ngl 99 --color -no-cnv
   ```
 
 > **Note**:
@@ -170,7 +170,7 @@ Before running, you should download or copy community GGUF model to your current
   Please run the following command in Miniforge Prompt.
 
   ```cmd
-  llama-cli -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -c 1024 -t 8 -e -ngl 99 --color
+  llama-cli -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -c 1024 -t 8 -e -ngl 99 --color -no-cnv
   ```
 
 > **Note**:
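Both the Linux and Windows commands gain a `-no-cnv` flag. In the newer llama.cpp builds this quickstart now tracks, `llama-cli` defaults to an interactive conversation (chat) session when the model ships a chat template, so `-no-cnv` is needed to keep the one-shot completion behavior that `--prompt` and `-n` assume. A minimal sketch of the distinction, reusing the same GGUF file from the commands above:

```bash
# Without -no-cnv, recent llama-cli builds may drop into an interactive
# chat session for models that provide a chat template.
./llama-cli -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -ngl 99

# With -no-cnv, llama-cli runs a single completion of --prompt,
# emits -n tokens, and exits -- the behavior this quickstart relies on.
./llama-cli -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -ngl 99 -n 32 \
  --prompt "Once upon a time" -no-cnv
```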
@@ -179,11 +179,10 @@ Before running, you should download or copy community GGUF model to your current
 
 #### Sample Output
 ```
-Log start
-main: build = 1 (6f4ec98)
-main: built with MSVC 19.39.33519.0 for
-main: seed = 1724921424
-llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from D:\gguf-models\mistral-7b-instruct-v0.1.Q4_K_M.gguf (version GGUF V2)
+main: llama backend init
+main: load the model and apply lora adapter, if any
+llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /home/arda/ruonan/mistral-7b-instruct-v0.1.Q4_K_M.gguf (version GGUF V2)
 llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
@@ -304,9 +319,8 @@
 Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun exploring the world. But sometimes, she found it hard to find friends who shared her interests.
-Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun exploring the world. She lived in a small village where there weren't many opportunities for adventures, but that didn't stop her. She would often read
-llama_print_timings: load time = xxxx ms
-llama_print_timings: sample time = x.xx ms / 32 runs ( xx.xx ms per token, xx.xx tokens per second)
-llama_print_timings: prompt eval time = xx.xx ms / 31 tokens ( xx.xx ms per token, xx.xx tokens per second)
-llama_print_timings: eval time = xx.xx ms / 31 runs ( xx.xx ms per token, xx.xx tokens per second)
-llama_print_timings: total time = xx.xx ms / 62 tokens
-Log end
+One day, she decided to take matters into her own
 
+llama_perf_sampler_print: sampling time = x.xx ms / 63 runs ( x.xx ms per token, xx.xx tokens per second)
+llama_perf_context_print: load time = xx.xx ms
+llama_perf_context_print: prompt eval time = xx.xx ms / 31 tokens ( xx.xx ms per token, xx.xx tokens per second)
+llama_perf_context_print: eval time = xx.xx ms / 31 runs ( xx.xx ms per token, xx.xx tokens per second)
+llama_perf_context_print: total time = xx.xx ms / 62 tokens
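The updated sample output also reflects llama.cpp's logging rename: the old `Log start`/`llama_print_timings` lines are replaced by `llama_perf_sampler_print` and `llama_perf_context_print`. If you post-process benchmark logs, a small sketch of the updated filter, assuming the perf summary still goes to stderr as in earlier builds (hence the `2>&1`):

```bash
# Old format:  llama_print_timings: eval time = ...
# New format:  llama_perf_context_print: eval time = ...
./llama-cli -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 \
  --prompt "Once upon a time" -ngl 99 -no-cnv 2>&1 \
  | grep -E 'llama_perf_(context|sampler)_print'
```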