
Commit 33b9a9c

Update part of Overview guide in mddocs (1/2) (#11378)
* Create install.md
* Update install_cpu.md
* Delete original docs/mddocs/Overview/install_cpu.md
* Update install_cpu.md
* Update install_gpu.md
* update llm.md and install.md
* Update docs in KeyFeatures
* Review and fix typos
* Fix on folded NOTE
* Small fix
* Small fix
* Remove empty known_issue.md
* Small fix
* Small fix
* Further fix
* Fixes
* Fix

---------

Co-authored-by: Yuwen Hu <[email protected]>
1 parent 4ba8219 commit 33b9a9c

11 files changed: +478 −558 lines changed

docs/mddocs/Overview/KeyFeatures/multi_gpus_selection.md (+28 −35)
````diff
@@ -6,24 +6,22 @@ In [Inference on GPU](inference_on_gpu.md) and [Finetune (QLoRA)](finetune.md),
 
 The `sycl-ls` tool enumerates a list of devices available in the system. You can use it after you setup oneapi environment:
 
-```eval_rst
-.. tabs::
-   .. tab:: Windows
+- For **Windows users**:
 
-      Please make sure you are using CMD (Miniforge Prompt if using conda):
+  Please make sure you are using CMD (Miniforge Prompt if using conda):
 
-      .. code-block:: cmd
+  ```cmd
+  call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
+  sycl-ls
+  ```
 
-         call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-         sycl-ls
+- For **Linux users**:
 
-   .. tab:: Linux
+  ```bash
+  source /opt/intel/oneapi/setvars.sh
+  sycl-ls
+  ```
 
-      .. code-block:: bash
-
-         source /opt/intel/oneapi/setvars.sh
-         sycl-ls
-```
 
 If you have two Arc770 GPUs, you can get something like below:
 ```
@@ -40,7 +38,7 @@ This output shows there are two Arc A770 GPUs as well as an Intel iGPU on this m
 
 ## Devices selection
 To enable xpu, you should convert your model and input to xpu by below code:
-```
+```python
 model = model.to('xpu')
 input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
 ```
@@ -50,7 +48,7 @@ To select the desired devices, there are two ways: one is changing the code, ano
 To specify a xpu, you can change the `to('xpu')` to `to('xpu:[device_id]')`, this device_id is counted from zero.
 
 If you you want to use the second device, you can change the code like this:
-```
+```python
 model = model.to('xpu:1')
 input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu:1')
 ```
@@ -59,28 +57,23 @@ input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu:1')
 Device selection environment variable, `ONEAPI_DEVICE_SELECTOR`, can be used to limit the choice of Intel GPU devices. As upon `sycl-ls` shows, the last three lines are three Level Zero GPU devices. So we can use `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]` to select devices.
 For example, you want to use the second A770 GPU, you can run the python like this:
 
-```eval_rst
-.. tabs::
-   .. tab:: Windows
-
-      .. code-block:: cmd
+- For **Windows users**:
 
-         set ONEAPI_DEVICE_SELECTOR=level_zero:1
-         python generate.py
+  ```cmd
+  set ONEAPI_DEVICE_SELECTOR=level_zero:1
+  python generate.py
+  ```
+  Through ``set ONEAPI_DEVICE_SELECTOR=level_zero:1``, only the second A770 GPU will be available for the current environment.
 
-      Through ``set ONEAPI_DEVICE_SELECTOR=level_zero:1``, only the second A770 GPU will be available for the current environment.
+- For **Linux users**:
 
-   .. tab:: Linux
+  ```bash
+  ONEAPI_DEVICE_SELECTOR=level_zero:1 python generate.py
+  ```
 
-      .. code-block:: bash
+  ``ONEAPI_DEVICE_SELECTOR=level_zero:1`` in upon command only affect in current python program. Also, you can export the environment variable, then run your python:
 
-         ONEAPI_DEVICE_SELECTOR=level_zero:1 python generate.py
-
-      ``ONEAPI_DEVICE_SELECTOR=level_zero:1`` in upon command only affect in current python program. Also, you can export the environment variable, then run your python:
-
-      .. code-block:: bash
-
-         export ONEAPI_DEVICE_SELECTOR=level_zero:1
-         python generate.py
-
-```
+  ```bash
+  export ONEAPI_DEVICE_SELECTOR=level_zero:1
+  python generate.py
+  ```
````
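
For reference, the device-selection flow this file now documents can be exercised end to end with a short script. The sketch below is illustrative only and not part of this diff; it assumes a local checkpoint at a placeholder path and the `ipex-llm` `transformers`-style API covered elsewhere in these docs.

```python
# Illustrative sketch (not from the diff above): target the second XPU device in code.
# The model path and prompt are placeholders.
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "/path/to/model"
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Device ids are counted from zero, so 'xpu:1' addresses the second GPU;
# alternatively, set ONEAPI_DEVICE_SELECTOR=level_zero:1 before launching Python.
model = model.to("xpu:1")
input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu:1")

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```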

docs/mddocs/Overview/KeyFeatures/native_format.md (+5 −10)
````diff
@@ -2,17 +2,15 @@
 
 You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
 
-```eval_rst
-.. note::
+> [!NOTE]
+> Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Hugging Face ``transformers`` format as described [here](./hugging_face_format.md))
 
-   Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Hugging Face ``transformers`` format as described `here <./hugging_face_format.html>`_).
-```
 
 ```python
 # convert the model
 from ipex_llm import llm_convert
 ipex_llm_path = llm_convert(model='/path/to/model/',
-        outfile='/path/to/output/', outtype='int4', model_family="llama")
+                            outfile='/path/to/output/', outtype='int4', model_family="llama")
 
 # load the converted model
 # switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
@@ -25,8 +23,5 @@ output_ids = llm.generate(input_ids, ...)
 output = llm.batch_decode(output_ids)
 ```
 
-```eval_rst
-.. seealso::
-
-   See the complete example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models>`_
-```
+> [!NOTE]
+> See the complete example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models)
````
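
The load-and-generate portion of this file falls between the two hunks above, so here is a hedged end-to-end sketch of the native-format flow for readability; the import path, the `from_pretrained` arguments, and the `tokenize` helper in the load step are assumptions inferred from the surrounding context, not a verbatim copy of the file, and all paths are placeholders.

```python
# Hedged sketch of the full native-format flow; the load step is an assumption,
# since the exact call lives between the hunks shown above.
from ipex_llm import llm_convert
from ipex_llm.transformers import LlamaForCausalLM  # assumed import path for the llama family

# convert the model (as shown in the first hunk)
ipex_llm_path = llm_convert(model='/path/to/model/',
                            outfile='/path/to/output/', outtype='int4', model_family="llama")

# load the converted model and run it (arguments assumed, see lead-in)
llm = LlamaForCausalLM.from_pretrained(ipex_llm_path, native=True)  # assumed signature
input_ids = llm.tokenize("What is AI?")                             # assumed helper
output_ids = llm.generate(input_ids, max_new_tokens=32)
output = llm.batch_decode(output_ids)
print(output)
```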

docs/mddocs/Overview/KeyFeatures/optimize_model.md (+3 −6)
````diff
@@ -60,10 +60,7 @@ model = load_low_bit(model, saved_dir) # Load the optimized model
 ```
 
 
-```eval_rst
-.. seealso::
+> [!NOTE]
+> - Please refer to the [API documentation](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/optimize.html) for more details.
+> - We also provide detailed examples on how to run PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using IPEX-LLM. See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models)
 
-   * Please refer to the `API documentation <https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/optimize.html>`_ for more details.
-
-   * We also provide detailed examples on how to run PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using IPEX-LLM. See the complete CPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models>`_ and GPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models>`_.
-```
````
docs/mddocs/Overview/KeyFeatures/transformers_style_api.md (+6)

````diff
@@ -0,0 +1,6 @@
+# `transformers`-style API
+
+You may run the LLMs using `transformers`-style API in `ipex-llm`.
+
+* [Hugging Face `transformers` Format](./hugging_face_format.md)
+* [Native Format](./native_format.md)
````
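
Both pages linked by this new index share the same loading pattern; a brief, hedged sketch of the 4-bit save/reload round trip with the `transformers`-style API follows. The `save_low_bit`/`load_low_bit` helpers and the paths are taken as assumptions here, not from this diff.

```python
# Hedged sketch: load in 4-bit once, save the low-bit weights, and reload them later.
# Paths are placeholders; save_low_bit/load_low_bit are assumed helpers of this API.
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/path/to/model", load_in_4bit=True)
model.save_low_bit("/path/to/low-bit-model")  # persist the quantized weights

# Later runs can skip re-quantization by loading the saved low-bit weights directly.
reloaded = AutoModelForCausalLM.load_low_bit("/path/to/low-bit-model")
```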

docs/mddocs/Overview/KeyFeatures/transformers_style_api.rst (−10)

This file was deleted.

docs/mddocs/Overview/install.md (+6)
````diff
@@ -0,0 +1,6 @@
+# IPEX-LLM Installation
+
+Here, we provide instructions on how to install `ipex-llm` and best practices for setting up your environment. Please refer to the appropriate guide based on your device:
+
+- [CPU](./install_cpu.md)
+- [GPU](./install_gpu.md)
````

docs/mddocs/Overview/install.rst (−7)

This file was deleted.

docs/mddocs/Overview/install_cpu.md (+35 −51)
````diff
@@ -4,33 +4,26 @@
 
 Install IPEX-LLM for CPU supports using pip through:
 
-```eval_rst
-.. tabs::
+- For **Linux users**:
 
-   .. tab:: Linux
+  ```bash
+  pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
+  ```
 
-      .. code-block:: bash
+- For **Windows users**:
 
-         pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
-
-   .. tab:: Windows
-
-      .. code-block:: cmd
-
-         pip install --pre --upgrade ipex-llm[all]
-```
+  ```cmd
+  pip install --pre --upgrade ipex-llm[all]
+  ```
 
 Please refer to [Environment Setup](#environment-setup) for more information.
 
-```eval_rst
-.. note::
+> [!NOTE]
+> `all` option will trigger installation of all the dependencies for common LLM application development.
 
-   ``all`` option will trigger installation of all the dependencies for common LLM application development.
+> [!IMPORTANT]
+> `ipex-llm` is tested with Python 3.9, 3.10 and 3.11; Python 3.11 is recommended for best practices.
 
-   .. important::
-
-      ``ipex-llm`` is tested with Python 3.9, 3.10 and 3.11; Python 3.11 is recommended for best practices.
-```
 
 ## Recommended Requirements
 
````
````diff
@@ -53,48 +46,39 @@ For optimal performance with LLM models using IPEX-LLM optimizations on Intel CP
 
 First we recommend using [Conda](https://conda-forge.org/download/) to create a python 3.11 enviroment:
 
-```eval_rst
-.. tabs::
-
-   .. tab:: Linux
+- For **Linux users**:
 
-      .. code-block:: bash
+  ```bash
+  conda create -n llm python=3.11
+  conda activate llm
 
-         conda create -n llm python=3.11
-         conda activate llm
+  pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
+  ```
 
-         pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
+- For **Windows users**:
+  ```cmd
+  conda create -n llm python=3.11
+  conda activate llm
 
-   .. tab:: Windows
-
-      .. code-block:: cmd
-
-         conda create -n llm python=3.11
-         conda activate llm
-
-         pip install --pre --upgrade ipex-llm[all]
+  pip install --pre --upgrade ipex-llm[all]
 ```
 
 Then for running a LLM model with IPEX-LLM optimizations (taking an `example.py` an example):
 
-```eval_rst
-.. tabs::
-
-   .. tab:: Client
-
-      It is recommended to run directly with full utilization of all CPU cores:
-
-      .. code-block:: bash
+- For **running on Client**:
 
-         python example.py
+  It is recommended to run directly with full utilization of all CPU cores:
 
-   .. tab:: Server
+  ```bash
+  python example.py
+  ```
 
-      It is recommended to run with all the physical cores of a single socket:
+- For **running on Server**:
 
-      .. code-block:: bash
+  It is recommended to run with all the physical cores of a single socket:
 
-         # e.g. for a server with 48 cores per socket
-         export OMP_NUM_THREADS=48
-         numactl -C 0-47 -m 0 python example.py
-```
+  ```bash
+  # e.g. for a server with 48 cores per socket
+  export OMP_NUM_THREADS=48
+  numactl -C 0-47 -m 0 python example.py
+  ```
````
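
A quick, hedged way to confirm the CPU installation described above before moving on to the run commands is a bare import check inside the new conda environment.

```python
# Hedged post-install sanity check: confirm that ipex-llm and its
# transformers-style entry point import cleanly on CPU.
import ipex_llm
from ipex_llm.transformers import AutoModelForCausalLM

print("ipex-llm imported successfully")
```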
