Handling Irrelevant Results in CLIP-Based Image Retrieval When No Match Exists #1058
spikejones1040 asked this question in Q&A (unanswered)
Hi,
I am using CLIP-based embeddings for image retrieval and running into an issue where irrelevant images are retrieved when no relevant images exist in the database. I have computed CLIP image embeddings for ~6000 images in my database. For retrieval, I compute a text embedding for the query and run a cosine similarity search against the image embeddings. This approach generally works well when relevant images are present in the dataset.

However, when no relevant images exist (e.g., querying "badminton" when there are no badminton-related images in the dataset), CLIP still returns results with seemingly high cosine similarity but low actual relevance. I have tried thresholding the similarity scores, but it does not fully resolve the issue: even with a reasonable cosine similarity threshold I still see false positives, i.e. retrieved images that are semantically unrelated to the query, while raising the threshold too much reduces recall and prevents retrieval of relevant images when they do exist.
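For reference, the retrieval step looks roughly like this (a simplified sketch; the function and variable names and the 0.25 threshold are only illustrative, and both embedding matrices are L2-normalised beforehand):

```python
import torch

def retrieve(query_emb, image_embs, image_paths, top_k=10, threshold=0.25):
    """Rank images by cosine similarity to a text query embedding.

    query_emb:   (1, d) L2-normalised text embedding of the query.
    image_embs:  (N, d) L2-normalised CLIP image embeddings (pre-computed for ~6000 images).
    threshold:   illustrative value only; this is the part that breaks down
                 when no relevant image exists in the database.
    """
    sims = (image_embs @ query_emb.T).squeeze(1)            # cosine similarity, both sides normalised
    scores, idx = sims.topk(min(top_k, image_embs.shape[0]))
    return [(image_paths[i], s.item())
            for i, s in zip(idx.tolist(), scores)
            if s.item() >= threshold]
```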
For text embeddings, I am using `M-CLIP/XLM-Roberta-Large-Vit-B-16Plus` (https://huggingface.co/M-CLIP/XLM-Roberta-Large-Vit-B-16Plus).

For image embeddings, I am using the matching OpenCLIP model:

```python
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16-plus-240', pretrained="laion400m_e32")
```
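For context, this is roughly how I compute the two sets of embeddings (a simplified sketch; batching, device handling and error handling are omitted):

```python
import torch
import open_clip
import transformers
from PIL import Image
from multilingual_clip import pt_multilingual_clip

# Text encoder: multilingual XLM-Roberta projected into the CLIP embedding space
text_model_name = 'M-CLIP/XLM-Roberta-Large-Vit-B-16Plus'
text_model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(text_model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(text_model_name)

# Image encoder: the OpenCLIP model the M-CLIP text encoder was aligned to
image_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16-plus-240', pretrained='laion400m_e32')

def embed_text(queries):
    with torch.no_grad():
        emb = text_model.forward(queries, tokenizer)
    return emb / emb.norm(dim=-1, keepdim=True)   # L2-normalise for cosine similarity

def embed_image(path):
    image = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
    with torch.no_grad():
        emb = image_model.encode_image(image)
    return emb / emb.norm(dim=-1, keepdim=True)
```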
- What is the best way to handle false positives in CLIP retrieval when no relevant images exist?
- Are there recommended techniques to detect or filter out such cases?
- Are there research papers, prior discussions, or known best practices addressing this issue in CLIP retrieval?
Any insights, references, or sample implementations would be greatly appreciated!
Thanks in advance for your help.