An alternative solution to filling with zero or -inf could be to duplicate the vector from the output row of … The consequence would be that the output logits for …
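A minimal sketch of that duplication idea (assuming numpy arrays, and arbitrarily duplicating the last existing lm_head row since the original text above is cut off and doesn't say which row was meant):

```python
import numpy as np

# Shapes quoted in the thread: embed_tokens covers 128264 token ids,
# while lm_head only covers 128256, so 8 output rows are missing.
n_vocab_embd, n_vocab_out, n_embd = 128_264, 128_256, 4_096

# Stand-in for the real lm_head weights loaded from safetensors.
lm_head = np.zeros((n_vocab_out, n_embd), dtype=np.float32)

# Duplicate an existing output row (here the last one, chosen purely for
# illustration) once per missing token id, so those ids always get the same
# logit as that existing token instead of a zero or -inf logit.
pad = np.repeat(lm_head[-1:, :], n_vocab_embd - n_vocab_out, axis=0)
lm_head_padded = np.concatenate([lm_head, pad], axis=0)

assert lm_head_padded.shape == (n_vocab_embd, n_embd)
```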
Hi @danbev, thanks for the very detailed notes.
Re. your problem about n_vocab, what I observed is that the shapes of the tensors are:

- `language_model.model.embed_tokens.weight`: [128264, 4096]
- `language_model.lm_head.weight`: [128256, 4096]

So as you found out, the output tensor has 8 fewer tokens than the embd tensor.
However, instead of modifying llama.cpp internals to handle this exception, I propose that upon converting safetensors to GGUF, we simply extend the lm_head to add these tokens. We can set these parameters to 0 (or -inf? I'm not sure yet) so that their logits are always small.
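A minimal sketch of the zero-padding variant, assuming the weights are torch tensors loaded from safetensors; the `pad_lm_head` helper and where it would hook into the conversion script are my own illustration, only the tensor names and shapes come from the thread above:

```python
import torch

def pad_lm_head(lm_head: torch.Tensor, n_vocab_target: int) -> torch.Tensor:
    """Extend lm_head with extra rows so its vocab size matches embed_tokens.

    Zero rows make the logit of each added token id exactly 0 regardless of
    the hidden state; whether 0 or a large negative value is the better
    choice is the open question above.
    """
    n_vocab_out, n_embd = lm_head.shape
    missing = n_vocab_target - n_vocab_out  # 128264 - 128256 = 8 in this case
    if missing <= 0:
        return lm_head
    pad = torch.zeros(missing, n_embd, dtype=lm_head.dtype)
    return torch.cat([lm_head, pad], dim=0)

# e.g. in the conversion step, before writing the GGUF output tensor
# (state_dict key name as quoted above):
# lm_head = pad_lm_head(state_dict["language_model.lm_head.weight"], 128_264)
```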