Moonlight

In this directory, you will find examples of how you could apply IPEX-LLM INT4 optimizations to the Moonlight model on Intel GPUs. For illustration purposes, we use moonshotai/Moonlight-16B-A3B-Instruct as the reference Moonlight model.

0. Requirements & Installation

To run these examples with IPEX-LLM on Intel GPUs, there are some recommended requirements for your machine; please refer to here (Windows or Linux) for more information.

0.1 Installation

Visit Install IPEX-LLM on Intel GPU with PyTorch 2.6, and follow Install ipex-llm (Windows or Linux).

Then, install the other dependencies required by the Moonlight model with IPEX-LLM optimizations:

conda activate llm-pt26

pip install transformers==4.45.0
pip install accelerate==0.33.0
pip install "trl<0.12.0" 

pip install tiktoken blobfile
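
After installation, a quick sanity check can confirm the environment before moving on. The snippet below is a minimal sketch (the file name sanity_check.py is hypothetical); it assumes the llm-pt26 environment is active and that your PyTorch build exposes the XPU backend:

# sanity_check.py -- hypothetical file name; a minimal environment check
import torch
import ipex_llm  # noqa: F401  -- confirms IPEX-LLM is importable

print("PyTorch version:", torch.__version__)
# True only if an Intel GPU and its driver are visible to PyTorch
print("XPU available:", torch.xpu.is_available())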

0.2 Runtime Configuration

Visit Install IPEX-LLM on Intel GPU with PyTorch 2.6, and follow Runtime Configurations (Windows or Linux).

1. Download & Convert Model

To run the Moonlight model with IPEX-LLM optimizations, we need to download and convert it first to make sure it can be successfully loaded by transformers.

1.1 Download Model

To download moonshotai/Moonlight-16B-A3B-Instruct from Hugging Face, you could use download.py as follows:

python download.py --repo-id moonshotai/Moonlight-16B-A3B-Instruct --commit-id 95583251e616c46a80715897a705cd38659afc27

By default, Moonlight-16B-A3B-Instruct will be downloaded to the current folder. You could also specify the download folder path with --download-dir-path DOWNLOAD_DIR_PATH.

Tip

Refer to here for alternative methods to download models from Hugging Face.

For moonshotai/Moonlight-16B-A3B-Instruct, please make sure to use its revision/commit id 95583251e616c46a80715897a705cd38659afc27.
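
For reference, the download step is roughly equivalent to calling huggingface_hub.snapshot_download with the pinned revision. This is a minimal sketch, not the exact contents of download.py; the local_dir value is a placeholder:

# A minimal sketch of the download step, assuming huggingface_hub is installed
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="moonshotai/Moonlight-16B-A3B-Instruct",
    revision="95583251e616c46a80715897a705cd38659afc27",  # pinned commit id
    local_dir="Moonlight-16B-A3B-Instruct",               # placeholder download folder
)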

1.2 Convert Model

Next, convert the downloaded model with convert.py:

python convert.py --model-path DOWNLOAD_DIR_PATH

The converted model will be saved at <DOWNLOAD_DIR_PATH>-converted.

2. Example: Predict Tokens using generate() API

In the example generate.py, we show a basic use case of predicting the next N tokens for a Moonlight model using the generate() API, with IPEX-LLM INT4 optimizations on Intel GPUs.
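
The core of this flow is IPEX-LLM's low-bit loading path. The sketch below shows the general shape, not the exact script; the model path and prompt are placeholders, and load_in_4bit=True is what applies the INT4 optimization at load time:

# A minimal sketch of the generate() flow with IPEX-LLM INT4 optimizations
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Moonlight-16B-A3B-Instruct-converted"  # i.e. <DOWNLOAD_DIR_PATH>-converted

# load_in_4bit=True applies IPEX-LLM INT4 optimizations when loading the weights
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")  # move the model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))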

2.1 Running example

python generate.py --converted-model-path <DOWNLOAD_DIR_PATH>-converted --prompt PROMPT --n-predict N_PREDICT

Arguments info:

  • --converted-model-path CONVERTED_MODEL_PATH: argument defining the path to the model converted by convert.py
  • --prompt PROMPT: argument defining the prompt to be inferred (with the integrated chat prompt format; see the sketch after this list). It defaults to 'What is AI?'.
  • --n-predict N_PREDICT: argument defining the max number of tokens to predict. It defaults to 32.
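
The chat prompt format visible in the sample output below is produced by the model's chat template. A sketch of how such a prompt could be built, assuming the converted model ships a chat template usable through transformers' apply_chat_template:

# A sketch of building the chat-format prompt via the tokenizer's chat template
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Moonlight-16B-A3B-Instruct-converted",
                                          trust_remote_code=True)
messages = [{"role": "user", "content": "Is 123 a prime?"}]
prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
print(prompt)  # e.g. ...<|im_user|>user<|im_middle|>Is 123 a prime?<|im_end|>...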

2.2 Sample Outputs

Inference time: xxxx s
-------------------- Prompt --------------------
Is 123 a prime?
-------------------- Output --------------------
<|im_system|>system<|im_middle|>You are a helpful assistant provided by Moonshot-AI.<|im_end|><|im_user|>user<|im_middle|>Is 123 a prime?<|im_end|><|im_assistant|>assistant<|im_middle|>No, 123 is not a prime number. A prime number is a number greater than 1 that has no positive divisors other than 1 and itself