SentenceTransformersTextEmbedder zero embeddings for single text queries when using precision="int8" #9100

mattrothery opened this issue Mar 24, 2025 · 0 comments
Labels: P3 Low priority, leave it in the backlog
Describe the bug
I'm trying to use quantised embeddings within a RAG pipeline. The base model I am using is "sentence-transformers/all-mpnet-base-v2", and the precision I intend to use is precision="int8".

I can successfully use SentenceTransformersDocumentEmbedder to compute document embeddings and store them in an Elasticsearch document store. However, when it comes to using ElasticsearchEmbeddingRetriever with the query embedder, I get a division-by-zero error.

After digging into SentenceTransformers, I've found that when using quantized embedding models, a calibration dataset is usually passed in to compute the min/max value range used to map floating-point embeddings into 8-bit integer space.

I then considered using a subset of the document embeddings already in the document store as a calibration set, but as far as I can tell there is no way to pull all (or some) documents/embeddings back out of the store.

Is there a solution to this that I'm missing? I'd appreciate your help!
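What seems to be happening (a sketch of the range computation, assuming it matches the divide in sentence_transformers/quantization.py quoted below): with a single query embedding there is only one row, so each dimension's min and max coincide, the step size is zero, and 0/0 produces NaN in every dimension, which then casts to an all-zero int8 vector:

```python
import numpy as np

# One row, as for a single query embedding: per-dimension min == max.
embedding = np.random.rand(1, 768).astype(np.float32)

starts = embedding.min(axis=0)
steps = (embedding.max(axis=0) - starts) / 255  # all zeros: no spread in one row

with np.errstate(divide="ignore", invalid="ignore"):
    quantized = (embedding - starts) / steps  # 0/0 -> NaN everywhere

print(np.all(steps == 0))           # True
print(np.all(np.isnan(quantized)))  # True; NaN cast to int8 yields a zero vector
```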

Error message

/usr/local/lib/python3.10/site-packages/sentence_transformers/quantization.py:434: RuntimeWarning: invalid value encountered in divide
  return ((embeddings - starts) / steps - 128).astype(np.int8)
/usr/local/lib/python3.10/site-packages/sentence_transformers/quantization.py:434: RuntimeWarning: invalid value encountered in cast
  return ((embeddings - starts) / steps - 128).astype(np.int8)

BadRequestError(400, 'search_phase_execution_exception', 'failed to create query: The [cosine] similarity does not support vectors with zero magnitude. Preview of invalid vector: [0.0, 0.0, 0.0, 0.0, 0.0, ...]
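The second error follows from the first: cosine similarity divides by the vector norms, so a zero-magnitude query vector is undefined and Elasticsearch rejects it up front. A minimal illustration (the 4-dimensional vectors are purely for demonstration):

```python
import numpy as np

# Cosine similarity is dot(q, d) / (||q|| * ||d||); a zero query vector
# makes the denominator zero, so the similarity is undefined.
q = np.zeros(4)
d = np.array([0.1, 0.2, 0.3, 0.4])

denominator = np.linalg.norm(q) * np.linalg.norm(d)
print(denominator == 0.0)  # True: cosine(q, d) would divide by zero
```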

Expected behavior
Single-query embeddings should be calculated correctly when using quantization, or there should be a way of creating/passing calibration data.
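One possible workaround (a sketch, not a Haystack API; quantize_int8 and the random calibration data below are illustrative stand-ins): embed the query at the default float32 precision and quantize it yourself against per-dimension ranges taken from a calibration set, e.g. a sample of the stored document embeddings. This mirrors what sentence-transformers' quantize_embeddings helper does when given calibration_embeddings:

```python
import numpy as np

def quantize_int8(embeddings: np.ndarray, calibration: np.ndarray) -> np.ndarray:
    """Map float embeddings to int8 using per-dimension ranges from a calibration set."""
    starts = calibration.min(axis=0)
    steps = (calibration.max(axis=0) - starts) / 255
    steps[steps == 0] = 1.0  # guard dimensions with no spread
    return np.clip((embeddings - starts) / steps - 128, -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
calibration = rng.normal(size=(1000, 768)).astype(np.float32)  # stand-in for stored document embeddings
query = rng.normal(size=(1, 768)).astype(np.float32)           # stand-in for a float32 query embedding

q = quantize_int8(query, calibration)
print(q.dtype, q.shape)  # int8 (1, 768)
print(np.any(q != 0))    # True: the query no longer collapses to zeros
```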

Additional context

# text embedding
text_embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    precision="int8",
    batch_size=self.batch_size,
    backend="openvino",
)
text_embedder.warm_up()

# pipeline
sparse_retriever = ElasticsearchBM25Retriever(
    document_store=self.document_store,
    top_k=top_k,
)
dense_retriever = ElasticsearchEmbeddingRetriever(
    document_store=self.document_store,
    top_k=top_k,
)
joiner = DocumentJoiner(join_mode="merge", weights=weights, top_k=top_k)

pipeline = Pipeline()
pipeline.add_component(name="text_embedder", instance=text_embedder)
pipeline.add_component(name="sparse_retriever", instance=sparse_retriever)
pipeline.add_component(name="dense_retriever", instance=dense_retriever)
pipeline.add_component(name="joiner", instance=joiner)
pipeline.connect("text_embedder", "dense_retriever")
pipeline.connect("sparse_retriever", "joiner")
pipeline.connect("dense_retriever", "joiner")

# running pipeline
result = pipeline.run(
    {
        "text_embedder": {"text": question},
        "sparse_retriever": {"query": question},
    }
)

To Reproduce

  1. Create a document store
  2. Store documents using precision="int8"
  3. Use the EmbeddingRetriever for that store, on a single query, with precision="int8"

FAQ Check

System:

  • OS: Ubuntu 20.04.1
  • GPU/CPU: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
  • Haystack version (commit or version number): haystack-ai==2.11.2
  • DocumentStore: Elasticsearch, elasticsearch==8.16.0, elasticsearch-haystack==1.0.1
  • Reader: -
  • Retriever: ElasticsearchEmbeddingRetriever
@julian-risch julian-risch added the P3 Low priority, leave it in the backlog label Mar 28, 2025