---
title: Exporting Llama
sidebar_position: 2
---

To make the export process as simple as possible, we created a script that runs a Docker container and exports the model.

## Steps to export Llama

### 1. Create an Account:
Get a [HuggingFace](https://huggingface.co/) account. This will allow you to download the needed files. You can also use the [official Llama website](https://www.llama.com/llama-downloads/).
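
If you prefer working from the command line, you can authenticate once and reuse the credentials for the downloads in step 3. This is a minimal sketch assuming you use the `huggingface_hub` CLI, which is not part of the export script itself; create an access token at https://huggingface.co/settings/tokens first:

```bash
# Install the Hugging Face CLI (assumes a working Python/pip setup)
pip install -U "huggingface_hub[cli]"

# Log in; paste your Hugging Face access token when prompted
huggingface-cli login
```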

### 2. Select a Model:
Pick the model that suits your needs. Before you download it, you'll need to accept a license. For best performance, we recommend using Spin-Quant or QLoRA versions of the model:
 - [Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/tree/main/original)
 - [Llama 3.2 1B](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main/original)
 - [Llama 3.2 3B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8/tree/main)
 - [Llama 3.2 1B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8/tree/main)
 - [Llama 3.2 3B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8/tree/main)
 - [Llama 3.2 1B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8/tree/main)

### 3. Download Files:
Download the `consolidated.00.pth`, `params.json`, and `tokenizer.model` files. If you can't see them, make sure to check the `original` directory.
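
As an illustration, here is one way to fetch these files from the command line. This sketch assumes the `huggingface_hub` CLI from step 1 and uses the 1B Spin-Quant repository as an example; file names and locations can differ between repositories (for the base models, the files live in the `original` subdirectory):

```bash
# Download just the three files the export script needs
# (repo and file names mirror the links above; adjust them for your model)
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 \
  consolidated.00.pth params.json tokenizer.model \
  --local-dir ./llama-files
```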

### 4. Rename the Tokenizer File:
Rename the `tokenizer.model` file to `tokenizer.bin` as required by the library:
```bash
mv tokenizer.model tokenizer.bin
```

### 5. Run the Export Script:
Navigate to the `llama_export` directory and run the following command:
```bash
./build_llama_binary.sh --model-path /path/to/consolidated.00.pth --params-path /path/to/params.json
```

The script will pull a Docker image from Docker Hub and then run it to export the model. By default, the output (a `llama3_2.pte` file) will be saved in the `llama_export/outputs` directory. However, you can override that behavior with the `--output-path [path]` flag.
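
For example, to save the exported model somewhere else, pass the flag explicitly (the destination directory below is just an illustration):

```bash
./build_llama_binary.sh \
  --model-path /path/to/consolidated.00.pth \
  --params-path /path/to/params.json \
  --output-path ./exported-models
```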

:::note[Note]
This Docker image was tested on macOS with an ARM chip. It might not work in other environments.
:::