Commit d1f2247

prusnak and ggerganov authored

Add quantize script for batch quantization (#92)

* Add quantize script for batch quantization
* Indentation
* README for new quantize.sh
* Fix script name
* Fix file list on Mac OS

Co-authored-by: Georgi Gerganov <[email protected]>

1 parent: 1808ee0

2 files changed: +18, -31 lines

README.md (+3, -31)

````diff
@@ -145,44 +145,16 @@ python3 -m pip install torch numpy sentencepiece
 python3 convert-pth-to-ggml.py models/7B/ 1
 
 # quantize the model to 4-bits
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
+./quantize.sh 7B
 
 # run the inference
 ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
 ```
 
-For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
-will create 2 ggml files, instead of one:
-
-```bash
-ggml-model-f16.bin
-ggml-model-f16.bin.1
-```
-
-You need to quantize each of them separately like this:
-
-```bash
-./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
-./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
-```
-
-Everything else is the same. Simply run:
-
-```bash
-./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
-```
-
-The number of files generated for each model is as follows:
-
-```
-7B -> 1 file
-13B -> 2 files
-30B -> 4 files
-65B -> 8 files
-```
-
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
 
+TODO: add model disk/mem requirements
+
 ### Interactive mode
 
 If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
````

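For context, the new script just batches the per-shard commands the README used to spell out. Going by the removed README text (which says converting LLaMA-13B produces two f16 shards), `./quantize.sh 13B` effectively runs:

```bash
# What `./quantize.sh 13B` expands to when models/13B/ holds the two f16
# shards named in the removed README text; the trailing 2 is the
# quantization type argument (q4_0 here).
./quantize models/13B/ggml-model-f16.bin   models/13B/ggml-model-q4_0.bin   2
./quantize models/13B/ggml-model-f16.bin.1 models/13B/ggml-model-q4_0.bin.1 2
```

The output filename comes from the `${i/f16/q4_0}` substitution in the script below, which rewrites the first `f16` in each input path to `q4_0`.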
quantize.sh (new file, +15)

````diff
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+
+if ! [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
+    echo
+    echo "Usage: quantize.sh 7B|13B|30B|65B [--remove-f16]"
+    echo
+    exit 1
+fi
+
+for i in `ls models/$1/ggml-model-f16.bin*`; do
+    ./quantize "$i" "${i/f16/q4_0}" 2
+    if [[ "$2" == "--remove-f16" ]]; then
+        rm "$i"
+    fi
+done
````
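As committed, the script takes the model size plus an optional cleanup flag: `./quantize.sh 13B --remove-f16` quantizes every f16 shard under `models/13B/` and then deletes the originals. One caveat about the loop as written: iterating over backtick `ls` output word-splits the file list, so it would misbehave on paths containing spaces. A minimal alternative sketch (not part of this commit) that uses a bare glob instead:

```bash
#!/usr/bin/env bash
# Hypothetical variant of the committed loop: a bare glob avoids
# word-splitting on `ls` output, and nullglob makes the loop body
# run zero times when no f16 shards match.
shopt -s nullglob
for i in models/"$1"/ggml-model-f16.bin*; do
    ./quantize "$i" "${i/f16/q4_0}" 2   # same q4_0 output naming as the script
done
```

For well-formed paths the behavior is identical; only how the file list is produced changes.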
