I was converting Gemma 2 2b to GGUF (llama.cpp release b3496) and measuring perplexity with the following commands, using wikitext-2-raw/wiki.test.raw as the dataset:
```
# Download the HF checkpoint (llama.cpp b3496 cloned and built alongside it)
git clone https://huggingface.co/google/gemma-2-2b

# Convert to GGUF: an F32 reference plus a Q8_0 conversion
python ./llama.cpp/convert_hf_to_gguf.py gemma-2-2b --outtype f32 --outfile gemma-2-2b.FP32.gguf
python ./llama.cpp/convert_hf_to_gguf.py gemma-2-2b --outtype q8_0 --outfile gemma-2-2b-Q8_0.gguf

# Quantize the F32 reference into the K-quant variants
cd llama.cpp
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q6_K.gguf Q6_K
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q5_K_M.gguf Q5_K_M
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q5_K_S.gguf Q5_K_S
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q4_K_M.gguf Q4_K_M
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q4_K_S.gguf Q4_K_S
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q3_K_L.gguf Q3_K_L
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q3_K_M.gguf Q3_K_M
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q3_K_S.gguf Q3_K_S
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q2_K.gguf Q2_K

# Measure perplexity of every model on the same test file
./llama-perplexity -m ../gemma-2-2b.FP32.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q8_0.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q6_K.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q5_K_M.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q5_K_S.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q4_K_M.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q4_K_S.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q3_K_L.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q3_K_M.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q3_K_S.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q2_K.gguf -f ../wikitext-2-raw/wiki.test.raw
```
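(The same sweep can be written as a loop; this is just a minimal sketch assuming the directory layout above, run from inside llama.cpp with the GGUF files and wikitext-2-raw/ in the parent directory.)

```
#!/usr/bin/env bash
set -e  # stop on the first failing command

# Quantize the F32 reference into each K-quant variant
for q in Q6_K Q5_K_M Q5_K_S Q4_K_M Q4_K_S Q3_K_L Q3_K_M Q3_K_S Q2_K; do
  ./llama-quantize ../gemma-2-2b.FP32.gguf "../gemma-2-2b-$q.gguf" "$q"
done

# Evaluate every model on the same test file
./llama-perplexity -m ../gemma-2-2b.FP32.gguf -f ../wikitext-2-raw/wiki.test.raw
for q in Q8_0 Q6_K Q5_K_M Q5_K_S Q4_K_M Q4_K_S Q3_K_L Q3_K_M Q3_K_S Q2_K; do
  ./llama-perplexity -m "../gemma-2-2b-$q.gguf" -f ../wikitext-2-raw/wiki.test.raw
done
```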
The results I am obtaining for the FP32 version are very strange: they are the worst of the whole set. For the other versions, the numbers look OK relative to each other, with the expected gradual degradation as the quantization gets more aggressive.
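For reference, as I understand it, llama-perplexity reports the exponentiated mean negative log-likelihood over the evaluation chunks, so lower is better:

$$
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)
$$

Since every quantized model is only an approximation of the F32 weights, I would expect the F32 model to have the lowest perplexity of the set, not the highest.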