Allowing for controlling maximum image size before feeding image into LLMImageBlobParser #30391

Open
alberto-agudo opened this issue Mar 20, 2025 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

alberto-agudo commented Mar 20, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from dotenv import load_dotenv

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.document_loaders.parsers.images import LLMImageBlobParser
from langchain_aws.chat_models import ChatBedrock

def main():
    # Include Bedrock credentials
    load_dotenv()
    
    # Ingest document
    # Note you can download this file from: https://documents1.worldbank.org/curated/en/099101824180532047/pdf/BOSIB13bdde89d07f1b3711dd8e86adb477.pdf
    fp = "./data/world-bank-report-example.pdf"
    
    prompt = (
        "You are an assistant tasked with describing images for retrieval. "
        "1. These descriptions will be embedded and used to retrieve the raw image. "
        "Give a concise description of the image that is well optimized for retrieval\n"
        "2. extract all the text from the image. "
        "Do not exclude any content from the page.\n"
        "Format your answer in markdown without explanatory text "
        "and without markdown delimiter ``` at the beginning. "
    )

    # 1) Load and parse documents
    llm_img_parser = ChatBedrock(
        model_id="anthropic.claude-3-sonnet-20240229-v1:0",
        model_kwargs=dict(temperature=0.1)
    )

    img_parser = LLMImageBlobParser(
        model=llm_img_parser,
        prompt=prompt
    )
    loader = PyMuPDFLoader(
        file_path=fp,
        mode="page",
        extract_images=True, 
        images_parser=img_parser,
        extract_tables="markdown",
        images_inner_format="text"
    )
    
    docs = []
    docs_lazy = loader.lazy_load()
    
    for doc in docs_lazy:
        print(f"Processing doc {doc}")
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)


if __name__ == "__main__": 
    main()

Error Message and Stack Trace (if applicable)

Error raised by bedrock service
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/langchain_aws/llms/bedrock.py", line 956, in _prepare_input_and_invoke
response = self.client.invoke_model(**request_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 570, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/context.py", line 124, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 1031, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: messages.0.content.0.image.source: image exceeds 5 MB maximum: 8033316 bytes > 5242880 bytes
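
For context on the numbers in the error: the 5242880-byte cap is exactly 5 MiB, and the check appears to apply to the decoded image bytes. Because the parser sends the image base64-encoded, the request payload is also roughly a third larger than the PNG itself. A quick back-of-the-envelope check (assuming standard base64 without line breaks):

```python
import base64

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5242880, the cap quoted in the error message
png_size = 8_033_316               # the image size reported in the traceback

# Standard base64 encodes every 3 input bytes as 4 output characters.
encoded_size = 4 * ((png_size + 2) // 3)

print(png_size > MAX_IMAGE_BYTES)  # True: the raw PNG already exceeds the cap
print(encoded_size)                # 10711088: the payload after base64 encoding
```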

Description

  • I want to use the LLMImageBlobParser to extract descriptions from images in a PDF, as an addition to the PyMuPDF document loader.
  • I am using Anthropic on Bedrock to parse these images, and I am aware that the maximum size for an image passed to Anthropic models is 5 MB.
  • The inner parsing behavior of the PDF parsers does not take these file-size limits into account. There should be a helper function that resizes images to a user-specified maximum size in MB, which the user could pass when calling, for instance, the PyMuPDF document loader.
  • Here is where the images are created, and where such a size-reduction helper would go:
    img_list = page.get_images()
    images = []
    for img in img_list:
        if self.images_parser:
            xref = img[0]
            pix = pymupdf.Pixmap(doc, xref)
            image = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
                pix.height, pix.width, -1
            )
            image_bytes = io.BytesIO()
            numpy.save(image_bytes, image)
            blob = Blob.from_data(
                image_bytes.getvalue(), mime_type="application/x-npy"
            )
            image_text = next(self.images_parser.lazy_parse(blob)).page_content
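
A hedged sketch of what such a helper might look like at this point in the loader, operating on the NumPy array before it is serialized into the blob. The name `downsample_to_max_bytes` and the stride-based strategy are illustrative only, not part of langchain-community; a real implementation would likely use a proper resampling filter:

```python
import numpy as np


def downsample_to_max_bytes(image: np.ndarray, max_bytes: int) -> np.ndarray:
    """Halve the image resolution until the raw pixel buffer fits max_bytes.

    Hypothetical helper, not part of langchain-community. Stride sampling
    (image[::2, ::2]) keeps every other row and column; a production version
    would likely use an anti-aliased resampling filter instead.
    """
    while image.nbytes > max_bytes:
        if image.shape[0] <= 1 and image.shape[1] <= 1:
            break  # cannot shrink further; give up rather than loop forever
        image = image[::2, ::2]
    return image
```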
  • Another option would be to modify this part of the LLMImageBlobParser instead, which is probably more modular:
    def _analyze_image(self, img: "Image") -> str:
        """Analyze an image using the provided language model.

        Args:
            img: The image to be analyzed.

        Returns:
            The extracted textual content.
        """
        image_bytes = io.BytesIO()
        img.save(image_bytes, format="PNG")
        img_base64 = base64.b64encode(image_bytes.getvalue()).decode("utf-8")
        msg = self.model.invoke(

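For the second option, a minimal sketch of a resize step that could run inside _analyze_image before the base64 encoding, written here as a standalone function. The name `shrink_png_to_max_bytes` and the default 5 MiB budget are assumptions, not LangChain API; the scale ratio is derived from the square root of the byte overshoot and capped at 0.9 so the loop always makes progress:

```python
import io
import math

from PIL import Image


def shrink_png_to_max_bytes(
    img: Image.Image, max_bytes: int = 5 * 1024 * 1024
) -> bytes:
    """Re-encode img as PNG, downscaling until the encoding fits max_bytes.

    Hypothetical helper, not part of langchain-community.
    """
    while True:
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        if buf.tell() <= max_bytes or (img.width <= 1 and img.height <= 1):
            return buf.getvalue()
        # Scale both dimensions toward the byte budget; the 0.9 cap
        # guarantees the image shrinks on every iteration.
        ratio = min(0.9, math.sqrt(max_bytes / buf.tell()))
        img = img.resize(
            (max(1, int(img.width * ratio)), max(1, int(img.height * ratio))),
            Image.LANCZOS,
        )
```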
System Info

System Information

OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.12.9 (main, Feb 25 2025, 02:40:13) [GCC 12.2.0]

Package Information

langchain_core: 0.3.46
langchain: 0.3.21
langchain_community: 0.3.20
langsmith: 0.3.18
langchain_aws: 0.2.16
langchain_text_splitters: 0.3.7

Optional packages not installed

langserve

Other Dependencies

aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
boto3: 1.37.13
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.28.1
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.45: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.7: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.21: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy: 2.2.4
numpy<3,>=1.26.2: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: Installed. No version info available.
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: Installed. No version info available.
orjson: 3.10.15
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.10.6
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
pytest: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
zstandard: 0.23.0

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Mar 20, 2025
@keenborder786
Contributor

I understand the requirement, but it is difficult to change either LLMImageBlobParser or PyMuPDFLoader because every LLM provider has a different maximum image size. How do you propose we make it dynamic enough to account for every LLM provider? We simply can't make LLMImageBlobParser or PyMuPDFLoader dependent on specific provider requirements.

@alberto-agudo
Author

Yes, you're absolutely right. Tailoring the solution to every LLM provider would be overkill. However, what I'd propose is an argument that lets the user set a maximum image size, with resizing strategies provided within the function; this leaves it to the developer to check the maximum image size for their FM provider of choice. I think this would be helpful while maintaining a flexible design, especially for the LLMImageBlobParser.
