Checked other resources
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
# generate a large doc
offending_doc = " ".join(["a" for i in range(0, 28000)])

from langchain_mistralai import MistralAIEmbeddings

embeddings = MistralAIEmbeddings(
    model="mistral-embed",
    # should match your API limits
    max_concurrent_requests=6,
)

embeddings.embed_query(offending_doc)
Enabling httpx logging might help to observe the response headers.
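For reference, a minimal sketch of how such logging can be enabled (stdlib logging only; "httpx" and "httpcore" are the loggers those libraries use, nothing LangChain-specific):

import logging

# httpx and httpcore log through the standard logging module
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.DEBUG)     # request/response status lines
logging.getLogger("httpcore").setLevel(logging.DEBUG)  # lower-level detail, e.g. headers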
Error Message and Stack Trace (if applicable)
RetryError: RetryError[<Future at 0x7bbb65252990 state=finished raised HTTPStatusError>]
Description
I am embedding documents of varying length with the Mistral model, usually through the in-memory vector store.
I expect long documents to be batched at 16,000 tokens max. However, when passing a document of around ~27,000 characters or more, I hit a 400 error.
It seems that some content-length rate limiting is going on. The first problem is that the issue is obfuscated:
there shouldn't be a retry in this case, although that might be Mistral's fault for not returning a 429 status here
there is no explicit error message (see the snippet after this list for how I surfaced the underlying exception)
there isn't much debug info visible in the logs, for instance to observe the batch calls, and LangSmith doesn't track embedding models by default
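For completeness, this is roughly how I dug the underlying HTTPStatusError out of the RetryError (a sketch relying on tenacity's and httpx's public APIs, not on anything LangChain-specific):

from tenacity import RetryError

try:
    embeddings.embed_query(offending_doc)
except RetryError as e:
    underlying = e.last_attempt.exception()  # the wrapped httpx.HTTPStatusError
    print(type(underlying), underlying)
    if hasattr(underlying, "response"):
        # status code and body of the last failed attempt (a 400 in my case)
        print(underlying.response.status_code, underlying.response.text)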
The second problem: I would expect this example to be split into two or more requests in order to respect MistralAI's limits. I can't obtain logs to observe the batching logic, but it seems that I hit some size limitation.
MAX_TOKENS, which sets the max batch length for Mistral, is a hard-coded value, so it doesn't seem to be configurable. I am not setting up a HuggingFace tokenizer, so the token count might be approximate. Maybe a safety margin is missing here, which leads to batches slightly over the limit?
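As a workaround for now, pre-splitting the text before embedding keeps each request well under the limit. A sketch, assuming langchain-text-splitters is available and using an arbitrary character-based chunk size of my own choosing:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# split the long doc into pieces that comfortably fit the per-request token budget
splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=0)
chunks = splitter.split_text(offending_doc)
vectors = embeddings.embed_documents(chunks)  # one vector per chunk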
System Info
System Information
OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
Python Version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]
Other Dependencies
aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
docling: 2.28.2
httpx: 0.28.1
httpx-sse<1,>=0.3.1: Installed. No version info available.
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
httpx<1,>=0.25.2: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.45: Installed. No version info available.
langchain-core<1.0.0,>=0.3.47: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.7: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.21: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy<3,>=1.26.2: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: 1.31.1
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: 1.31.1
orjson: 3.10.15
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.10.6
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3,>=2: Installed. No version info available.
pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
pytest: 8.3.5
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: 13.9.4
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
tokenizers<1,>=0.15.1: Installed. No version info available.
typing-extensions>=4.