What it takes to support bidirectional Llama3 for LLM2Vec #13368

nv-jsaito · 2025-05-08T00:31:20Z


Can someone help me understand the llama.cpp modifications needed to support a Llama3-based LLM2Vec model?

Background

We have a project that uses a Llama3-based LLM2Vec model as the text embedder for a text-to-content model. The model was trained with the original Llama3-based LLM2Vec from HF, but we would like to make deployment easier by running LLM2Vec with llama.cpp.

AFAIK, LLM2Vec is just a fine-tune on top of the original LLM architecture (Llama3 in our case), but with bidirectional attention. Because llama.cpp already supports Llama3, I am hoping that running a Llama3-based LLM2Vec with llama.cpp is not too much work.
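To make the causal versus bidirectional distinction concrete, here is a minimal pure-Python sketch of an additive attention mask. It is illustrative only and is not llama.cpp code; `attention_mask` and `softmax_row` are hypothetical helpers. With a causal mask, token i can only attend to tokens 0..i; with a bidirectional mask, every token attends to every non-padding token, which is the behavior the non-causal flag described below is meant to enable.

```python
import math
from typing import List, Optional

NEG_INF = -math.inf


def attention_mask(seq_len: int, causal: bool, n_valid: Optional[int] = None) -> List[List[float]]:
    """Additive mask: 0.0 where query i may attend to key j, -inf where it may not."""
    if n_valid is None:
        n_valid = seq_len  # no padding
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Padding keys are hidden in both modes.
            allowed = j < n_valid
            if causal:
                # Causal (decoder-style): token i only sees tokens 0..i.
                allowed = allowed and j <= i
            row.append(0.0 if allowed else NEG_INF)
        mask.append(row)
    return mask


def softmax_row(scores: List[float], mask_row: List[float]) -> List[float]:
    """Attention weights for one query after adding its mask row to the raw scores."""
    masked = [s + m for s, m in zip(scores, mask_row)]
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0]  # equal raw scores for a 3-token prompt
    # First token, causal: it can only attend to itself.
    print(softmax_row(scores, attention_mask(3, causal=True)[0]))  # [1.0, 0.0, 0.0]
    # First token, bidirectional: weight is spread over all three tokens.
    print(softmax_row(scores, attention_mask(3, causal=False)[0]))
```

Note that padding still has to be masked in bidirectional mode; the `n_valid` parameter models that.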

What has been done so far

- Merged the LLM2Vec fine-tuned parameters into the original Meta-Llama-3-8B-Instruct and saved the result as an HF model
  - I confirmed that this merged model produces exactly the same outputs as the original, non-merged model
- Converted the merged model to GGUF using a modified convert_hf_to_gguf.py, as shown below, that sets the flag for non-causal attention

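```python
# Added to convert_hf_to_gguf.py, where Model, LlamaModel and gguf are already in scope.
# "LlamaBiModel" must match the architecture name listed in the HF model's config.json.
@Model.register("LlamaBiModel")
class LlamaBiModel(LlamaModel):
    # Reuse the standard Llama architecture (and tensor layout) in the GGUF file.
    model_arch = gguf.MODEL_ARCH.LLAMA

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        # Write llama.attention.causal = false into the GGUF metadata.
        self.gguf_writer.add_causal_attention(False)
```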

- Made LLM_ARCH_LLAMA honor the optional causal-attention flag in llama-model.cpp
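The merge step in the list above can be illustrated with a small sketch. It assumes, for illustration only, that the LLM2Vec fine-tuned parameters are LoRA-style low-rank adapters (the post does not say so explicitly). In that case, merging folds the scaled update (alpha / rank) · B · A into each base weight matrix, which is why a merged model can reproduce the un-merged outputs, up to floating-point rounding. `merge_lora`, `adapter_forward`, and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Fold a LoRA update into the base weight: W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)  # (out x rank) @ (rank x in) -> (out x in)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def adapter_forward(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int, x: Vector) -> Vector:
    """Un-merged forward pass: base projection plus the scaled low-rank path."""
    scale = alpha / rank
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y + scale * u for y, u in zip(base, update)]


if __name__ == "__main__":
    w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # base weight, 2 x 3
    a = [[1.0, 1.0, 0.0]]                   # LoRA A, rank 1 x 3
    b = [[2.0], [0.5]]                      # LoRA B, 2 x rank 1
    x = [1.0, 2.0, 3.0]
    merged = merge_lora(w, a, b, alpha=2.0, rank=1)
    print(matvec(merged, x))                    # [19.0, 5.0]
    print(adapter_forward(w, a, b, 2.0, 1, x))  # [19.0, 5.0]
```

Libraries such as PEFT expose this fold as `merge_and_unload()`; the sketch only shows the arithmetic behind it.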

Observations

- The embeddings from the setup above seem to be garbage
- Interestingly, when I enable causal attention, the model performs close to expectations, although it loses some details in the text prompts
- With or without causal attention, the embeddings are numerically far off from those of the original HF model
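When chasing a numerical mismatch like this, it can help to compare the two backends with one fixed recipe. The sketch below assumes the HF-side embedding is the mean of the token hidden states over non-padding positions (mean pooling); that is an assumption to check against the pooling mode actually used with the HF model, and the llama.cpp side would need the same pooling (for example `--pooling mean` in llama-embedding). `mean_pool`, `cosine_similarity`, and `max_abs_diff` are hypothetical helpers, and the numbers in the usage block are placeholders, not measurements.

```python
import math
from typing import List, Sequence


def mean_pool(token_states: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the hidden states of tokens whose mask value is 1 (padding is 0)."""
    dim = len(token_states[0])
    total = [0.0] * dim
    count = 0
    for state, keep in zip(token_states, mask):
        if keep:
            total = [t + s for t, s in zip(total, state)]
            count += 1
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [t / count for t in total]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_abs_diff(u: Sequence[float], v: Sequence[float]) -> float:
    return max(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    # Placeholder numbers; in practice use the HF per-token hidden states and the
    # pooled vector returned by llama.cpp for the same prompt.
    hf_tokens = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1], [9.9, 9.9, 9.9]]
    hf_mask = [1, 1, 0]  # last position is padding
    hf_vec = mean_pool(hf_tokens, hf_mask)
    llama_cpp_vec = [0.2, 0.2, 0.2]
    print("cosine:", cosine_similarity(hf_vec, llama_cpp_vec))
    print("max |diff|:", max_abs_diff(hf_vec, llama_cpp_vec))
```

Cosine similarity is insensitive to any L2 normalization one side applies, whereas `max_abs_diff` is not, so looking at both helps separate a scaling difference from a genuine mismatch.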

Please let me know what I am missing here, and/or any ideas to try.

Replies: 0 comments

Select a reply

What it takes to support bidirectional Llama3 for LLM2Vec #13368

nv-jsaito May 8, 2025

Background

What has been done so far

Observations

Replies: 0 comments

nv-jsaito
May 8, 2025