generate image responses to user questions of the EA where appropriate #37

Merged
bmiller59 merged 6 commits into main from jh/visual-preference on Mar 6, 2026


@jencompgeek jencompgeek commented Mar 3, 2026

What is in this PR?

  • Adds the ability to use gemini-3-pro-image-preview to generate images in response to user questions, when the user prefers visuals and the question warrants one
  • Uses the LLM specified for the conversation both to classify whether a visual is appropriate and to produce the textual answer to the question. If the LLM determines that a visual response is warranted, Gemini generates the image from that answer asynchronously
  • Adds the ability to set user preferences; currently there is one preference, visualResponses
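
The flow in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `respond`, `Deps`, `OutgoingMessage`, and all of their members are hypothetical names, and the ordering (send the text, then classify, then generate) is an assumption based on the description. Only the `visualResponses` preference and the `parentMessage` field come from the PR text.

```typescript
// Sketch only: every name here is hypothetical except the visualResponses
// preference and parentMessage; the real agent code is not shown in this PR.
interface Preferences {
  visualResponses?: boolean;
}

interface OutgoingMessage {
  text?: string;
  imageBase64?: string;
  parentMessage?: string; // id of the user question an image relates to
}

interface Deps {
  answer(question: string): Promise<string>; // conversation LLM
  wantsVisual(question: string, answer: string): Promise<boolean>; // classification prompt
  generateImage(answerText: string): Promise<string>; // image model, returns base64
  send(message: OutgoingMessage): void;
}

async function respond(
  questionId: string,
  question: string,
  prefs: Preferences,
  deps: Deps,
): Promise<{ image?: Promise<void> }> {
  const text = await deps.answer(question);
  // The text answer goes out first so the user is not kept waiting.
  deps.send({ text });

  if (!prefs.visualResponses) return {};
  if (!(await deps.wantsVisual(question, text))) return {};

  // Deliberately not awaited: the image arrives later as its own message.
  // It is wrapped in an object because an async function would otherwise
  // wait for a directly returned promise before resolving.
  const image = deps
    .generateImage(text)
    .then((imageBase64) => deps.send({ imageBase64, parentMessage: questionId }));
  return { image };
}
```

The returned `image` promise lets a caller (or a test) observe when the image message has been sent. A real implementation would also need to decide what happens when image generation fails.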

Changes in the codebase

  • Add a preferences object to user.model, plus API methods to get and set preferences
  • Add an imageGenerator helper, used by all flavors of EA to answer questions when the visualResponse pref is set to true
  • EA answers questions as usual. If the visualResponse pref is set, it uses a classification prompt to determine whether a visual is appropriate. If so, it places the message with the text answer on the image-gen channel, as well as on the direct channel so that the text goes back to the user immediately
  • EA listens on the image-gen channel for the text response to the user question, then produces a new response message containing the image, with parentMessage set to the original user question so the UI can flag the relationship
  • Small infra change to allow agents to receive messages from other agents, including themselves
  • Add a multimodal bodyType to message.model
  • Increase the websocket maxHttpBuffer so the entire base64 image data can be transmitted in a single message
  • Update reporting so the direct messages report includes the text accompanying the image (blank, but this at least gets the message reported), with a [multimodal] flag to indicate that an image was generated
  • Also includes an unrelated fix for a race condition between EA and periodic agents making chat contributions
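
To make the message-level changes above concrete, here is a hedged sketch of what a multimodal reply and its report line might look like. Only `bodyType`, the `multimodal` value, `parentMessage`, the `image-gen` and direct channels, and the `[multimodal]` flag come from this description. The other field names, the `"text"` body type, the report line format, and the helper functions are assumptions for illustration. The last helper shows why the websocket buffer needs to grow: base64 encodes every 3 bytes as 4 characters.

```typescript
// Hypothetical shapes: only bodyType, "multimodal", parentMessage, and the
// channel names are taken from the PR description.
type BodyType = "text" | "multimodal";

interface Message {
  id: string;
  channel: "direct" | "image-gen";
  bodyType: BodyType;
  text: string;
  imageBase64?: string;
  parentMessage?: string;
}

// Build the image follow-up for a user question. The accompanying text is
// left blank, as the description says it currently is.
function buildImageReply(questionId: string, imageBase64: string, newId: string): Message {
  return {
    id: newId,
    channel: "direct",
    bodyType: "multimodal",
    text: "",
    imageBase64,
    parentMessage: questionId,
  };
}

// Assumed report format: prefix multimodal messages with a flag.
function formatReportLine(message: Message): string {
  return message.bodyType === "multimodal" ? `[multimodal] ${message.text}` : message.text;
}

// Length in characters of the padded base64 encoding of byteCount bytes.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}
```

For example, `base64Length(1_500_000)` is 2,000,000, so a 1.5 MB image already needs about 2 MB of buffer before any message envelope is added.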

Documentation and automated testing

Did you:

  • document any breaking changes in your commit messages?
  • document your changes as comments in the code? Use TSDoc format where appropriate.
  • update the README and docs to be clear and easy to use for end users and developers?
  • add and/or update automated tests?
  • update team documentation of any new or changed environment variables?

Testing this PR

  • Ensure GOOGLE_API_KEY and GOOGLE_BASE_URL env variables are set (should be unchanged from previous testing with Gemini models)
  • Test in conjunction with the FE changes in nextspace#49 (show images in assistant messages and allow users to set a visual preference)
  • Create an event assistant plus proactive conversation (any of the EA types should work).
  • Launch the assistant page. You should see a dialog on the EA tab allowing you to set a visual preference. Set that preference. Once set, the dialog will not reappear until you clear the cookie and refresh as a different user
  • Ask EA questions that you think should produce a visual response. Asking specifically for a timeline or the steps in a process, or even just asking it to summarize the event or what you've missed so far, should do the trick
  • Verify the text response comes back in the usual amount of time with a banner on top showing the question it relates to
  • Verify the image eventually shows up with the same banner (you can try asking a different question in the meantime)
  • Reload the page and ensure all messages, including images, are still displayed
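
Before running through the steps above, a small preflight check for the two environment variables can save a confusing failure later. The helper below is a hypothetical convenience, not part of this PR; only the variable names GOOGLE_API_KEY and GOOGLE_BASE_URL come from the testing notes.

```typescript
// Hypothetical helper: returns the names of required variables that are
// missing or blank in the given environment map.
function missingEnv(
  env: Record<string, string | undefined>,
  required: string[] = ["GOOGLE_API_KEY", "GOOGLE_BASE_URL"],
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node.js process this would be called as `missingEnv(process.env)`; an empty array means both variables are set.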

Additional information

Opening a follow-up ticket to include images somehow in directMessagesReport for evaluation

@jencompgeek jencompgeek force-pushed the jh/visual-preference branch 3 times, most recently from ceb40be to 923a062 on March 4, 2026 23:34
…sual pref

classify if visual response, generate with gemini, store and send with msg (increasing websocket
buffer)
@jencompgeek jencompgeek force-pushed the jh/visual-preference branch from 923a062 to f4a6709 on March 5, 2026 00:23

@bmiller59 bmiller59 left a comment

Works! Very cool.

@bmiller59 bmiller59 merged commit 029296c into main Mar 6, 2026
1 check passed
@bmiller59 bmiller59 deleted the jh/visual-preference branch March 6, 2026 01:05

2 participants