Allowing for controlling maximum image size before feeding image into LLMImageBlobParser #30391

Open
alberto-agudo opened this issue Mar 20, 2025 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

alberto-agudo commented Mar 20, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from dotenv import load_dotenv

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.document_loaders.parsers.images import LLMImageBlobParser
from langchain_aws.chat_models import ChatBedrock

def main():
    # Include Bedrock credentials
    load_dotenv()
    
    # Ingest document
    # Note you can download this file from: https://documents1.worldbank.org/curated/en/099101824180532047/pdf/BOSIB13bdde89d07f1b3711dd8e86adb477.pdf
    fp = "./data/world-bank-report-example.pdf"
    
    prompt = (
        "You are an assistant tasked with describing images for retrieval. "
        "1. These descriptions will be embedded and used to retrieve the raw image. "
        "Give a concise description of the image that is well optimized for retrieval\n"
        "2. extract all the text from the image. "
        "Do not exclude any content from the page.\n"
        "Format your answer in markdown without explanatory text "
        "and without markdown delimiter ``` at the beginning. "
    )

    # 1) Load and parse documents
    llm_img_parser = ChatBedrock(
        model_id="anthropic.claude-3-sonnet-20240229-v1:0",
        model_kwargs=dict(temperature=0.1)
    )

    img_parser = LLMImageBlobParser(
        model=llm_img_parser,
        prompt=prompt
    )
    loader = PyMuPDFLoader(
        file_path=fp,
        mode="page",
        extract_images=True, 
        images_parser=img_parser,
        extract_tables="markdown",
        images_inner_format="text"
    )
    
    docs = []
    docs_lazy = loader.lazy_load()
    
    for doc in docs_lazy:
        print(f"Processing doc {doc}")
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)


if __name__ == "__main__": 
    main()

Error Message and Stack Trace (if applicable)

Error raised by bedrock service
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/langchain_aws/llms/bedrock.py", line 956, in _prepare_input_and_invoke
response = self.client.invoke_model(**request_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 570, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/context.py", line 124, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 1031, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: messages.0.content.0.image.source: image exceeds 5 MB maximum: 8033316 bytes > 5242880 bytes
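
For context on the numbers in the error: the 5242880-byte cap is exactly 5 MiB, and the check appears to apply to the decoded image bytes. Because the parser sends the image base64-encoded, the request payload is also roughly a third larger than the PNG itself. A quick back-of-the-envelope check (assuming standard base64 without line breaks):

```python
import base64

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5242880, the cap quoted in the error message
png_size = 8_033_316               # the image size reported in the traceback

# Standard base64 encodes every 3 input bytes as 4 output characters.
encoded_size = 4 * ((png_size + 2) // 3)

print(png_size > MAX_IMAGE_BYTES)  # True: the raw PNG already exceeds the cap
print(encoded_size)                # 10711088: the payload after base64 encoding
```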

Description

  • I want to use the LLMImageBlobParser to extract descriptions from images in a PDF, as an addition to the PyMuPDF document loader.
  • I am using Anthropic on Bedrock to parse these images, and I am aware that the maximum size for an image passed to Anthropic models is 5 MB.
  • The inner parsing behavior of the PDF parsers does not take these file-size limits into account. There should be a helper function that resizes images to a user-specified maximum size in MB, which the user could pass when calling, for instance, the PyMuPDF document loader.
  • Here is where the images are created, and where such a size-reduction helper would go:
    img_list = page.get_images()
    images = []
    for img in img_list:
        if self.images_parser:
            xref = img[0]
            pix = pymupdf.Pixmap(doc, xref)
            image = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
                pix.height, pix.width, -1
            )
            image_bytes = io.BytesIO()
            numpy.save(image_bytes, image)
            blob = Blob.from_data(
                image_bytes.getvalue(), mime_type="application/x-npy"
            )
            image_text = next(self.images_parser.lazy_parse(blob)).page_content
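
A hedged sketch of what such a helper might look like at this point in the loader, operating on the NumPy array before it is serialized into the blob. The name `downsample_to_max_bytes` and the stride-based strategy are illustrative only, not part of langchain-community; a real implementation would likely use a proper resampling filter:

```python
import numpy as np


def downsample_to_max_bytes(image: np.ndarray, max_bytes: int) -> np.ndarray:
    """Halve the image resolution until the raw pixel buffer fits max_bytes.

    Hypothetical helper, not part of langchain-community. Stride sampling
    (image[::2, ::2]) keeps every other row and column; a production version
    would likely use an anti-aliased resampling filter instead.
    """
    while image.nbytes > max_bytes:
        if image.shape[0] <= 1 and image.shape[1] <= 1:
            break  # cannot shrink further; give up rather than loop forever
        image = image[::2, ::2]
    return image
```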
  • Another option would be to modify this part of the LLMImageBlobParser instead, which is probably more modular:
    def _analyze_image(self, img: "Image") -> str:
        """Analyze an image using the provided language model.

        Args:
            img: The image to be analyzed.

        Returns:
            The extracted textual content.
        """
        image_bytes = io.BytesIO()
        img.save(image_bytes, format="PNG")
        img_base64 = base64.b64encode(image_bytes.getvalue()).decode("utf-8")
        msg = self.model.invoke(

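For the second option, a minimal sketch of a resize step that could run inside _analyze_image before the base64 encoding, written here as a standalone function. The name `shrink_png_to_max_bytes` and the default 5 MiB budget are assumptions, not LangChain API; the scale ratio is derived from the square root of the byte overshoot and capped at 0.9 so the loop always makes progress:

```python
import io
import math

from PIL import Image


def shrink_png_to_max_bytes(
    img: Image.Image, max_bytes: int = 5 * 1024 * 1024
) -> bytes:
    """Re-encode img as PNG, downscaling until the encoding fits max_bytes.

    Hypothetical helper, not part of langchain-community.
    """
    while True:
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        if buf.tell() <= max_bytes or (img.width <= 1 and img.height <= 1):
            return buf.getvalue()
        # Scale both dimensions toward the byte budget; the 0.9 cap
        # guarantees the image shrinks on every iteration.
        ratio = min(0.9, math.sqrt(max_bytes / buf.tell()))
        img = img.resize(
            (max(1, int(img.width * ratio)), max(1, int(img.height * ratio))),
            Image.LANCZOS,
        )
```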
System Info

System Information

OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.12.9 (main, Feb 25 2025, 02:40:13) [GCC 12.2.0]

Package Information

langchain_core: 0.3.46
langchain: 0.3.21
langchain_community: 0.3.20
langsmith: 0.3.18
langchain_aws: 0.2.16
langchain_text_splitters: 0.3.7

Optional packages not installed

langserve

Other Dependencies

aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
boto3: 1.37.13
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.28.1
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.45: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.7: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.21: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy: 2.2.4
numpy<3,>=1.26.2: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: Installed. No version info available.
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: Installed. No version info available.
orjson: 3.10.15
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.10.6
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
pytest: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
zstandard: 0.23.0

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Mar 20, 2025
@keenborder786
Contributor

I understand the requirement, but it is difficult to change either LLMImageBlobParser or PyMuPDFLoader because every LLM provider has a different maximum image size. How do you propose we make it dynamic enough to account for every LLM provider? We simply can't make LLMImageBlobParser or PyMuPDFLoader dependent on specific provider requirements.

@alberto-agudo
Author

Yes, you're absolutely right. Tailoring the solution to every LLM provider would be overkill. However, what I'd propose is an argument that lets the user set a maximum image size, with resizing strategies provided within the function; this leaves it to the developer to check the maximum image size for their FM provider of choice. I think this would be helpful while maintaining a flexible design, especially for the LLMImageBlobParser.
