An alternative solution to filling with zero or -inf could be to duplicate the vector from the output row of … The consequence would be that the output logits for …
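A minimal sketch of that duplication idea (assuming numpy arrays, and arbitrarily duplicating the last existing lm_head row since the original text above is cut off and doesn't say which row was meant):

```python
import numpy as np

# Shapes quoted in the thread: embed_tokens covers 128264 token ids,
# while lm_head only covers 128256, so 8 output rows are missing.
n_vocab_embd, n_vocab_out, n_embd = 128_264, 128_256, 4_096

# Stand-in for the real lm_head weights loaded from safetensors.
lm_head = np.zeros((n_vocab_out, n_embd), dtype=np.float32)

# Duplicate an existing output row (here the last one, chosen purely for
# illustration) once per missing token id, so those ids always get the same
# logit as that existing token instead of a zero or -inf logit.
pad = np.repeat(lm_head[-1:, :], n_vocab_embd - n_vocab_out, axis=0)
lm_head_padded = np.concatenate([lm_head, pad], axis=0)

assert lm_head_padded.shape == (n_vocab_embd, n_embd)
```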
Hi @danbev, thanks for the very detailed notes.
Re. your problem about n_vocab, what I observed is that the shapes of the tensors are:

- `language_model.model.embed_tokens.weight`: [128264, 4096]
- `language_model.lm_head.weight`: [128256, 4096]

So as you found out, the output tensor has 8 fewer tokens than the embd tensor.
However, instead of modifying llama.cpp internals to handle this exception, I propose that upon converting safetensors to GGUF, we simply extend the lm_head to add these tokens. We can set these parameters to 0 (or -inf? I'm not sure yet) so that their logits are always small.
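A minimal sketch of the zero-padding variant, assuming the weights are torch tensors loaded from safetensors; the `pad_lm_head` helper and where it would hook into the conversion script are my own illustration, only the tensor names and shapes come from the thread above:

```python
import torch

def pad_lm_head(lm_head: torch.Tensor, n_vocab_target: int) -> torch.Tensor:
    """Extend lm_head with extra rows so its vocab size matches embed_tokens.

    Zero rows make the logit of each added token id exactly 0 regardless of
    the hidden state; whether 0 or a large negative value is the better
    choice is the open question above.
    """
    n_vocab_out, n_embd = lm_head.shape
    missing = n_vocab_target - n_vocab_out  # 128264 - 128256 = 8 in this case
    if missing <= 0:
        return lm_head
    pad = torch.zeros(missing, n_embd, dtype=lm_head.dtype)
    return torch.cat([lm_head, pad], dim=0)

# e.g. in the conversion step, before writing the GGUF output tensor
# (state_dict key name as quoted above):
# lm_head = pad_lm_head(state_dict["language_model.lm_head.weight"], 128_264)
```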