Skip to content

Dealing with images #201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
animanathome opened this issue May 15, 2025 · 2 comments · May be fixed by #210
Open

Dealing with images #201

animanathome opened this issue May 15, 2025 · 2 comments · May be fixed by #210
Assignees
Labels
enhancement New feature or request

Comments

@animanathome
Copy link

animanathome commented May 15, 2025

I'm trying to create an Agent that describes the content of an image. When doing so, I noticed that the Agent uses the text property type in the request and sends the raw image data, which is unexpected.

"data":{"data":{"meta":null,"content":[{"type":"text","text":"\ufffd\ufffd\ufffd\ufffd\u0000\u0010Lavc61.19.101\u0000\ufffd\ufffd\u0000C\u0000\b\f\f\u000e\f\u000e\u0010\u0010\u0010\u0010\u0010\u0010\u0013\u0012\u0013\u0014\u0014\u0014\u0013\u0013\u0013\u0013\u0014\u0014\u0014\u0015\u0015\u0015\u0019\u0019\u0019\u0015\u0015\u0015\u0014\u0....fd\ufffd\ufffd'}]

Instead of using the text property, the Agent should use the dedicated property {type: "image_url", {detail: "high", url: imageAsBase64String }} to make the request. See link for more information.

Am I doing something wrong, or maybe they don't support images yet? When looking at the roadmap, I noticed it's not listed @andrew-lastmile.

@saqadri saqadri self-assigned this May 15, 2025
@saqadri saqadri added the enhancement New feature or request label May 15, 2025
@saqadri
Copy link
Collaborator

saqadri commented May 15, 2025

@animanathome thanks for reporting and you're correct, currently images aren't supported properly. There are 2 changes required here --

  1. For AugmentedLLM providers like OpenAI, Anthropic, etc. to handle non-text messages correctly.
  2. To support resources and MCP message types for Image/AudioContent, etc.

cc'ing @StreetLamb if there is a fix we can add to OpenAIAugmentedLLM to unblock the first of these.

@StreetLamb
Copy link
Collaborator

StreetLamb commented May 17, 2025

Hi @animanathome, could you clarify how you’re passing the image to the agent? Are you attempting to return the image as a tool response from an MCP server? If so, one potential blocker is that OpenAI’s tool message currently support only text content. This means you cannot directly pass an image back to the LLM using a tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants