Checked other resources
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
# generate a large doc
offending_doc = " ".join(["a" for i in range(0, 28000)])

from langchain_mistralai import MistralAIEmbeddings

embeddings = MistralAIEmbeddings(
    model="mistral-embed",
    # should match your API limits
    max_concurrent_requests=6,
)

embeddings.embed_query(offending_doc)
Enabling httpx logging might help to observe the response headers.
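For reference, a minimal sketch of how such logging can be enabled (stdlib logging only; "httpx" and "httpcore" are the loggers those libraries use, nothing LangChain-specific):

import logging

# httpx and httpcore log through the standard logging module
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.DEBUG)     # request/response status lines
logging.getLogger("httpcore").setLevel(logging.DEBUG)  # lower-level detail, e.g. headers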
Error Message and Stack Trace (if applicable)
RetryError: RetryError[<Future at 0x7bbb65252990 state=finished raised HTTPStatusError>]
Description
I am embedding documents of varying length with the Mistral model, usually through the in-memory vector store.
I expect long documents to be batched at 16,000 tokens max. However, when passing a document of around ~27,000 characters or more, I hit a 400 error.
It seems that some content-length rate limiting is going on. The first problem is that the issue is obfuscated:
there shouldn't be a retry in this case, although that might be Mistral's fault for not returning a 429 status here
there is no explicit error message (see the snippet after this list for how I surfaced the underlying exception)
there isn't much debug info visible in the logs, for instance to observe the batch calls, and LangSmith doesn't track embedding models by default
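For completeness, this is roughly how I dug the underlying HTTPStatusError out of the RetryError (a sketch relying on tenacity's and httpx's public APIs, not on anything LangChain-specific):

from tenacity import RetryError

try:
    embeddings.embed_query(offending_doc)
except RetryError as e:
    underlying = e.last_attempt.exception()  # the wrapped httpx.HTTPStatusError
    print(type(underlying), underlying)
    if hasattr(underlying, "response"):
        # status code and body of the last failed attempt (a 400 in my case)
        print(underlying.response.status_code, underlying.response.text)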
The second problem: I would expect this example to be split into two or more requests in order to respect MistralAI's limits. I can't obtain logs to observe the batching logic, but it seems that I hit some size limitation.
MAX_TOKENS, which sets the max batch length for Mistral, is a hard-coded value, so it doesn't seem to be configurable. I am not setting up a HuggingFace tokenizer, so the token count might be approximate. Maybe a safety margin is missing here, which leads to batches slightly over the limit?
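As a workaround for now, pre-splitting the text before embedding keeps each request well under the limit. A sketch, assuming langchain-text-splitters is available and using an arbitrary character-based chunk size of my own choosing:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# split the long doc into pieces that comfortably fit the per-request token budget
splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=0)
chunks = splitter.split_text(offending_doc)
vectors = embeddings.embed_documents(chunks)  # one vector per chunk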
System Info
System Information
OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
Python Version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]
Other Dependencies
aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
docling: 2.28.2
httpx: 0.28.1
httpx-sse<1,>=0.3.1: Installed. No version info available.
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
httpx<1,>=0.25.2: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.45: Installed. No version info available.
langchain-core<1.0.0,>=0.3.47: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.7: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.21: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy<3,>=1.26.2: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: 1.31.1
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: 1.31.1
orjson: 3.10.15
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.10.6
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3,>=2: Installed. No version info available.
pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
pytest: 8.3.5
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: 13.9.4
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
tokenizers<1,>=0.15.1: Installed. No version info available.
typing-extensions>=4.