
Commit 33b9a9c

Update part of Overview guide in mddocs (1/2) (#11378)
* Create install.md
* Update install_cpu.md
* Delete original docs/mddocs/Overview/install_cpu.md
* Update install_cpu.md
* Update install_gpu.md
* update llm.md and install.md
* Update docs in KeyFeatures
* Review and fix typos
* Fix on folded NOTE
* Small fix
* Small fix
* Remove empty known_issue.md
* Small fix
* Small fix
* Further fix
* Fixes
* Fix

---------

Co-authored-by: Yuwen Hu <[email protected]>
1 parent 4ba8219 commit 33b9a9c

11 files changed: +478 −558 lines changed

docs/mddocs/Overview/KeyFeatures/multi_gpus_selection.md (+28 −35)
````diff
@@ -6,24 +6,22 @@ In [Inference on GPU](inference_on_gpu.md) and [Finetune (QLoRA)](finetune.md),
 
 The `sycl-ls` tool enumerates a list of devices available in the system. You can use it after you setup oneapi environment:
 
-```eval_rst
-.. tabs::
-   .. tab:: Windows
+- For **Windows users**:
 
-      Please make sure you are using CMD (Miniforge Prompt if using conda):
+  Please make sure you are using CMD (Miniforge Prompt if using conda):
 
-      .. code-block:: cmd
+  ```cmd
+  call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
+  sycl-ls
+  ```
 
-         call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-         sycl-ls
+- For **Linux users**:
 
-   .. tab:: Linux
+  ```bash
+  source /opt/intel/oneapi/setvars.sh
+  sycl-ls
+  ```
 
-      .. code-block:: bash
-
-         source /opt/intel/oneapi/setvars.sh
-         sycl-ls
-```
 
 If you have two Arc770 GPUs, you can get something like below:
 ```
@@ -40,7 +38,7 @@ This output shows there are two Arc A770 GPUs as well as an Intel iGPU on this m
 
 ## Devices selection
 To enable xpu, you should convert your model and input to xpu by below code:
-```
+```python
 model = model.to('xpu')
 input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
 ```
@@ -50,7 +48,7 @@ To select the desired devices, there are two ways: one is changing the code, ano
 To specify a xpu, you can change the `to('xpu')` to `to('xpu:[device_id]')`, this device_id is counted from zero.
 
 If you you want to use the second device, you can change the code like this:
-```
+```python
 model = model.to('xpu:1')
 input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu:1')
 ```
@@ -59,28 +57,23 @@ input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu:1')
 Device selection environment variable, `ONEAPI_DEVICE_SELECTOR`, can be used to limit the choice of Intel GPU devices. As upon `sycl-ls` shows, the last three lines are three Level Zero GPU devices. So we can use `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]` to select devices.
 For example, you want to use the second A770 GPU, you can run the python like this:
 
-```eval_rst
-.. tabs::
-   .. tab:: Windows
-
-      .. code-block:: cmd
+- For **Windows users**:
 
-         set ONEAPI_DEVICE_SELECTOR=level_zero:1
-         python generate.py
+  ```cmd
+  set ONEAPI_DEVICE_SELECTOR=level_zero:1
+  python generate.py
+  ```
+  Through ``set ONEAPI_DEVICE_SELECTOR=level_zero:1``, only the second A770 GPU will be available for the current environment.
 
-      Through ``set ONEAPI_DEVICE_SELECTOR=level_zero:1``, only the second A770 GPU will be available for the current environment.
+- For **Linux users**:
 
-   .. tab:: Linux
+  ```bash
+  ONEAPI_DEVICE_SELECTOR=level_zero:1 python generate.py
+  ```
 
-      .. code-block:: bash
+  ``ONEAPI_DEVICE_SELECTOR=level_zero:1`` in upon command only affect in current python program. Also, you can export the environment variable, then run your python:
 
-         ONEAPI_DEVICE_SELECTOR=level_zero:1 python generate.py
-
-      ``ONEAPI_DEVICE_SELECTOR=level_zero:1`` in upon command only affect in current python program. Also, you can export the environment variable, then run your python:
-
-      .. code-block:: bash
-
-         export ONEAPI_DEVICE_SELECTOR=level_zero:1
-         python generate.py
-
-```
+  ```bash
+  export ONEAPI_DEVICE_SELECTOR=level_zero:1
+  python generate.py
+  ```
````
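
For reference, the device-selection flow this file now documents can be exercised end to end with a short script. The sketch below is illustrative only and not part of this diff; it assumes a local checkpoint at a placeholder path and the `ipex-llm` `transformers`-style API covered elsewhere in these docs.

```python
# Illustrative sketch (not from the diff above): target the second XPU device in code.
# The model path and prompt are placeholders.
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "/path/to/model"
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Device ids are counted from zero, so 'xpu:1' addresses the second GPU;
# alternatively, set ONEAPI_DEVICE_SELECTOR=level_zero:1 before launching Python.
model = model.to("xpu:1")
input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu:1")

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```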

docs/mddocs/Overview/KeyFeatures/native_format.md (+5 −10)
````diff
@@ -2,17 +2,15 @@
 
 You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
 
-```eval_rst
-.. note::
+> [!NOTE]
+> Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Hugging Face ``transformers`` format as described [here](./hugging_face_format.md))
 
-   Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Hugging Face ``transformers`` format as described `here <./hugging_face_format.html>`_).
-```
 
 ```python
 # convert the model
 from ipex_llm import llm_convert
 ipex_llm_path = llm_convert(model='/path/to/model/',
-        outfile='/path/to/output/', outtype='int4', model_family="llama")
+                            outfile='/path/to/output/', outtype='int4', model_family="llama")
 
 # load the converted model
 # switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
@@ -25,8 +23,5 @@ output_ids = llm.generate(input_ids, ...)
 output = llm.batch_decode(output_ids)
 ```
 
-```eval_rst
-.. seealso::
-
-   See the complete example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models>`_
-```
+> [!NOTE]
+> See the complete example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models)
````
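
The load-and-generate portion of this file falls between the two hunks above, so here is a hedged end-to-end sketch of the native-format flow for readability; the import path, the `from_pretrained` arguments, and the `tokenize` helper in the load step are assumptions inferred from the surrounding context, not a verbatim copy of the file, and all paths are placeholders.

```python
# Hedged sketch of the full native-format flow; the load step is an assumption,
# since the exact call lives between the hunks shown above.
from ipex_llm import llm_convert
from ipex_llm.transformers import LlamaForCausalLM  # assumed import path for the llama family

# convert the model (as shown in the first hunk)
ipex_llm_path = llm_convert(model='/path/to/model/',
                            outfile='/path/to/output/', outtype='int4', model_family="llama")

# load the converted model and run it (arguments assumed, see lead-in)
llm = LlamaForCausalLM.from_pretrained(ipex_llm_path, native=True)  # assumed signature
input_ids = llm.tokenize("What is AI?")                             # assumed helper
output_ids = llm.generate(input_ids, max_new_tokens=32)
output = llm.batch_decode(output_ids)
print(output)
```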

docs/mddocs/Overview/KeyFeatures/optimize_model.md (+3 −6)
````diff
@@ -60,10 +60,7 @@ model = load_low_bit(model, saved_dir) # Load the optimized model
 ```
 
 
-```eval_rst
-.. seealso::
+> [!NOTE]
+> - Please refer to the [API documentation](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/optimize.html) for more details.
+> - We also provide detailed examples on how to run PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using IPEX-LLM. See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models)
 
-   * Please refer to the `API documentation <https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/optimize.html>`_ for more details.
-
-   * We also provide detailed examples on how to run PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using IPEX-LLM. See the complete CPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models>`_ and GPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models>`_.
-```
````
docs/mddocs/Overview/KeyFeatures/transformers_style_api.md (+6)

````diff
@@ -0,0 +1,6 @@
+# `transformers`-style API
+
+You may run the LLMs using `transformers`-style API in `ipex-llm`.
+
+* [Hugging Face `transformers` Format](./hugging_face_format.md)
+* [Native Format](./native_format.md)
````
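
Both pages linked by this new index share the same loading pattern; a brief, hedged sketch of the 4-bit save/reload round trip with the `transformers`-style API follows. The `save_low_bit`/`load_low_bit` helpers and the paths are taken as assumptions here, not from this diff.

```python
# Hedged sketch: load in 4-bit once, save the low-bit weights, and reload them later.
# Paths are placeholders; save_low_bit/load_low_bit are assumed helpers of this API.
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/path/to/model", load_in_4bit=True)
model.save_low_bit("/path/to/low-bit-model")  # persist the quantized weights

# Later runs can skip re-quantization by loading the saved low-bit weights directly.
reloaded = AutoModelForCausalLM.load_low_bit("/path/to/low-bit-model")
```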

docs/mddocs/Overview/KeyFeatures/transformers_style_api.rst (−10)

This file was deleted.

docs/mddocs/Overview/install.md (+6)
````diff
@@ -0,0 +1,6 @@
+# IPEX-LLM Installation
+
+Here, we provide instructions on how to install `ipex-llm` and best practices for setting up your environment. Please refer to the appropriate guide based on your device:
+
+- [CPU](./install_cpu.md)
+- [GPU](./install_gpu.md)
````

docs/mddocs/Overview/install.rst (−7)

This file was deleted.

docs/mddocs/Overview/install_cpu.md (+35 −51)
````diff
@@ -4,33 +4,26 @@
 
 Install IPEX-LLM for CPU supports using pip through:
 
-```eval_rst
-.. tabs::
+- For **Linux users**:
 
-   .. tab:: Linux
+  ```bash
+  pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
+  ```
 
-      .. code-block:: bash
+- For **Windows users**:
 
-         pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
-
-   .. tab:: Windows
-
-      .. code-block:: cmd
-
-         pip install --pre --upgrade ipex-llm[all]
-```
+  ```cmd
+  pip install --pre --upgrade ipex-llm[all]
+  ```
 
 Please refer to [Environment Setup](#environment-setup) for more information.
 
-```eval_rst
-.. note::
+> [!NOTE]
+> `all` option will trigger installation of all the dependencies for common LLM application development.
 
-   ``all`` option will trigger installation of all the dependencies for common LLM application development.
+> [!IMPORTANT]
+> `ipex-llm` is tested with Python 3.9, 3.10 and 3.11; Python 3.11 is recommended for best practices.
 
-   .. important::
-
-      ``ipex-llm`` is tested with Python 3.9, 3.10 and 3.11; Python 3.11 is recommended for best practices.
-```
 
 ## Recommended Requirements
 
````
````diff
@@ -53,48 +46,39 @@ For optimal performance with LLM models using IPEX-LLM optimizations on Intel CP
 
 First we recommend using [Conda](https://conda-forge.org/download/) to create a python 3.11 enviroment:
 
-```eval_rst
-.. tabs::
-
-   .. tab:: Linux
+- For **Linux users**:
 
-      .. code-block:: bash
+  ```bash
+  conda create -n llm python=3.11
+  conda activate llm
 
-         conda create -n llm python=3.11
-         conda activate llm
+  pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
+  ```
 
-         pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
+- For **Windows users**:
+  ```cmd
+  conda create -n llm python=3.11
+  conda activate llm
 
-   .. tab:: Windows
-
-      .. code-block:: cmd
-
-         conda create -n llm python=3.11
-         conda activate llm
-
-         pip install --pre --upgrade ipex-llm[all]
+  pip install --pre --upgrade ipex-llm[all]
 ```
 
 Then for running a LLM model with IPEX-LLM optimizations (taking an `example.py` an example):
 
-```eval_rst
-.. tabs::
-
-   .. tab:: Client
-
-      It is recommended to run directly with full utilization of all CPU cores:
-
-      .. code-block:: bash
+- For **running on Client**:
 
-         python example.py
+  It is recommended to run directly with full utilization of all CPU cores:
 
-   .. tab:: Server
+  ```bash
+  python example.py
+  ```
 
-      It is recommended to run with all the physical cores of a single socket:
+- For **running on Server**:
 
-      .. code-block:: bash
+  It is recommended to run with all the physical cores of a single socket:
 
-         # e.g. for a server with 48 cores per socket
-         export OMP_NUM_THREADS=48
-         numactl -C 0-47 -m 0 python example.py
-```
+  ```bash
+  # e.g. for a server with 48 cores per socket
+  export OMP_NUM_THREADS=48
+  numactl -C 0-47 -m 0 python example.py
+  ```
````
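
A quick, hedged way to confirm the CPU installation described above before moving on to the run commands is a bare import check inside the new conda environment.

```python
# Hedged post-install sanity check: confirm that ipex-llm and its
# transformers-style entry point import cleanly on CPU.
import ipex_llm
from ipex_llm.transformers import AutoModelForCausalLM

print("ipex-llm imported successfully")
```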
