I'm quantizing my first model in llama.cpp, and I understand that an importance matrix (imatrix) can improve the process.

I'm starting with a 2B-parameter model that is fairly multilingual (36 languages, I believe). I see there is a tool to generate the imatrix from a calibration dataset (I imagine I run it against the fp16 GGUF, then use the result when quantizing).
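To make sure I have the pipeline right, here's roughly what I'm planning to run (the model and file names are just placeholders I made up):

```sh
# Generate the importance matrix from the fp16 GGUF using a calibration text file
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Then apply it when quantizing
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```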
How many samples from each language should I use with the llama-imatrix tool to generate the imatrix? And should the sample count grow with larger models?
Also, what should the samples look like? Do they need to follow a particular format, or does the structure not really matter?