LLM as text encoder implementation from LLM2CLIP by gabrieletijunaityte · Pull Request #42 · WUR-AI/aether

gabrieletijunaityte · 2026-02-04T08:55:20Z

What does this PR do?

Normalisation check moved to init
Cosine similarity fixed not to normalise twice.
The implementation of LLM as a text encoder from LLM2CLIP.
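A minimal sketch of the two points above (deciding once, at init, whether inputs are already normalised, and then not normalising again inside the cosine similarity). This is not the code in this PR: the class name `CosineSimilarity`, the `inputs_normalised` flag and the `l2_normalise` helper are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class CosineSimilarity:
    """Hypothetical helper: the normalisation decision is made once, at init."""

    def __init__(self, inputs_normalised: bool):
        # Assumption: the caller knows whether the encoders already
        # return unit-norm embeddings.
        self.inputs_normalised = inputs_normalised

    def __call__(self, a, b):
        # Normalise only if the encoders have not done so already.
        if not self.inputs_normalised:
            a, b = l2_normalise(a), l2_normalise(b)
        # For unit vectors the dot product is the cosine similarity.
        return sum(x * y for x, y in zip(a, b))
```

With this layout, an encoder that already returns unit-norm embeddings is paired with `CosineSimilarity(inputs_normalised=True)`, and raw embeddings with `CosineSimilarity(inputs_normalised=False)`; both paths give the same similarity for the same underlying vectors.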

I had to use a lot of workarounds to make it work, due to dependency issues and unclear environment recommendations from LLM2CLIP. Right now, it uses my forked LLM2Vec repo and a customised LLaMA version to get rid of flash-attn and SDPA.
What remains is to unify/decide on model dtypes and potentially adapt the LLM2CLIP vision branch as the EO encoder.

Before submitting

Did you make sure title is self-explanatory and the description concisely explains the PR?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you list all the breaking changes introduced by this pull request?
Did you test your PR locally with pytest command?

vdplasthijs

Great work, curious to see the results! I'll merge this, and we can change the normalisation in a new PR? Thanks!

gabrieletijunaityte added 8 commits February 4, 2026 08:49

Fix cos sim so inputs are not normalised twice + add text and eo encoder dtype unification

81b5fc2

Add Rob's note about training instructions

68cba0e

Add llm2vec and llama dependency

d4e3cd5

Move normalisation to init, handle devices and dtypes

eb28488

LLM2CLIP text encoder implementation

db9f3d4

Fix cosine similarity with the regards to normalisation presence

ce94f5e

Fix cosine similarity with the regards to normalisation presence

f42baa2

Add test

1b3f2b2

gabrieletijunaityte requested a review from vdplasthijs February 4, 2026 08:55

vdplasthijs approved these changes Feb 4, 2026

vdplasthijs merged commit 9ed2c26 into develop Feb 4, 2026
4 checks passed

vdplasthijs merged 8 commits into develop from feature/llm2vec
