@KennyVaneetvelde
Member

Summary

This PR implements support for nested multimodal content in ChatHistory, fixing Issue #141.

The Problem: Previously, ChatHistory.get_history() only detected multimodal content (Image, PDF, Audio from the instructor library) at the top level of input schemas. When multimodal content was nested within other schemas (e.g., a list of documents each containing a PDF), it was incorrectly serialized with json.dumps, causing API errors.
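
To make the failure mode concrete, here is a minimal sketch of a top-level-only check. It is not the library's code: `FakePDF`, `Document`, `Request`, and `top_level_multimodal_fields` are hypothetical names, and plain dataclasses stand in for the Pydantic schemas and instructor types.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FakePDF:
    """Hypothetical stand-in for a multimodal type such as instructor's PDF."""
    source: str


@dataclass
class Document:
    title: str
    pdf: FakePDF


@dataclass
class Request:
    question: str
    cover: FakePDF             # multimodal at the top level
    documents: List[Document]  # multimodal nested one level down


def top_level_multimodal_fields(schema) -> list:
    """Return names of fields whose value is itself multimodal (no recursion)."""
    return [
        f.name
        for f in fields(schema)
        if isinstance(getattr(schema, f.name), FakePDF)
    ]


request = Request(
    question="Summarize these",
    cover=FakePDF("cover.pdf"),
    documents=[Document("Report", FakePDF("report.pdf"))],
)

# Only the top-level field is detected. The PDF inside `documents` is missed,
# so the whole `documents` field would be handed to json.dumps.
found = top_level_multimodal_fields(request)
```

In this sketch `found` contains only `cover`, which mirrors the situation described above: anything nested inside a list or another schema goes undetected.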

The Solution: Add recursive detection and extraction of multimodal content at any nesting depth:

  • _contains_multimodal(obj) - Recursively checks if an object contains any multimodal content
  • _extract_multimodal_objects(obj) - Recursively extracts all multimodal objects from nested structures
  • _build_non_multimodal_dict(obj) - Builds a JSON-serializable dict excluding multimodal content
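
The extraction and stripping helpers might look roughly like the sketch below. This is an illustration under assumptions, not the PR's implementation: `FakeImage`, `ImageDocument`, `extract_multimodal_objects`, and `build_non_multimodal` are hypothetical names, and dataclasses stand in for Pydantic models so the example has no third-party dependencies.

```python
import json
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class FakeImage:
    """Hypothetical stand-in for a multimodal type such as instructor's Image."""
    source: str


MULTIMODAL_TYPES = (FakeImage,)
_OMIT = object()  # sentinel meaning "this value was multimodal, leave it out"


def _is_model(obj) -> bool:
    """True for dataclass instances (the stand-in for Pydantic models here)."""
    return is_dataclass(obj) and not isinstance(obj, type)


def extract_multimodal_objects(obj) -> list:
    """Collect every multimodal object found at any depth, in traversal order."""
    if isinstance(obj, MULTIMODAL_TYPES):
        return [obj]
    if isinstance(obj, list):
        return [m for item in obj for m in extract_multimodal_objects(item)]
    if isinstance(obj, dict):
        return [m for v in obj.values() for m in extract_multimodal_objects(v)]
    if _is_model(obj):
        return [
            m
            for f in fields(obj)
            for m in extract_multimodal_objects(getattr(obj, f.name))
        ]
    return []


def build_non_multimodal(obj):
    """Return a JSON-serializable copy of obj with multimodal content removed."""
    if isinstance(obj, MULTIMODAL_TYPES):
        return _OMIT
    if isinstance(obj, list):
        return [v for v in map(build_non_multimodal, obj) if v is not _OMIT]
    if isinstance(obj, dict):
        stripped = {k: build_non_multimodal(v) for k, v in obj.items()}
        return {k: v for k, v in stripped.items() if v is not _OMIT}
    if _is_model(obj):
        return build_non_multimodal({f.name: getattr(obj, f.name) for f in fields(obj)})
    return obj


@dataclass
class ImageDocument:
    owner: str
    image: FakeImage


payload = {
    "docs": [
        ImageDocument("alice", FakeImage("a.png")),
        ImageDocument("bob", FakeImage("b.png")),
    ]
}
images = extract_multimodal_objects(payload)
text_part = json.dumps(build_non_multimodal(payload))
```

Here `images` holds the two nested image objects and `text_part` is a JSON string containing only the `owner` values, so the text and the multimodal parts can be passed on separately.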

Changes

  • atomic-agents/atomic_agents/context/chat_history.py: Add 3 recursive helper functions and update get_history() to use them
  • atomic-agents/tests/context/test_chat_history.py: Add 6 new tests for nested multimodal scenarios
  • atomic-examples/nested-multimodal/: New example demonstrating the fix with nested ImageDocument schemas
  • Fix deprecated instructor.multimodal imports → instructor.processing.multimodal

Test Plan

  • All 30 unit tests pass
  • End-to-end validation with OpenAI GPT-4.1 using nested Image content
  • Example successfully processes nested List[ImageDocument] with Image fields

Closes #141

Add recursive detection and extraction of multimodal content (Image, PDF, Audio)
at any nesting depth within input schemas.

Changes:
- Add _contains_multimodal() for recursive multimodal detection
- Add _extract_multimodal_objects() for recursive extraction
- Add _build_non_multimodal_dict() for building JSON without multimodal content
- Update get_history() to use recursive functions
- Fix deprecated instructor.multimodal imports
- Add 6 new tests for nested multimodal scenarios
- Add nested-multimodal example demonstrating the fix

Closes #141
@greptile-apps

greptile-apps bot commented Nov 25, 2025

Automated review by Greptile

Greptile Overview

Greptile Summary

This PR successfully implements nested multimodal content support in ChatHistory, solving Issue #141 where multimodal objects (Image, PDF, Audio) nested within schemas were incorrectly serialized.

Key Changes

  • Added _extract_multimodal_content() function with recursive extraction logic and circular reference protection using _seen set tracking
  • Refactored get_history() from field-by-field inspection to single-pass recursive extraction
  • Fixed deprecated instructor.multimodal imports → instructor.processing.multimodal
  • Added 6 comprehensive unit tests covering nested lists, dicts, deeply nested schemas, and edge cases
  • Includes working example demonstrating List[ImageDocument] with nested Image fields

Implementation Quality

The refactored approach is cleaner and more maintainable than the previous top-level-only implementation. The circular reference protection addresses the previously identified concern about infinite recursion.

Confidence Score: 4/5

  • This PR is safe to merge with good test coverage and circular reference protection
  • The score reflects a solid implementation with comprehensive testing. One point is deducted because the recursive traversal logic is complex and could benefit from additional edge-case validation in production use
  • Pay close attention to atomic-agents/atomic_agents/context/chat_history.py - the recursive extraction logic is complex and handles multiple object types

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| atomic-agents/atomic_agents/context/chat_history.py | 4/5 | Added recursive multimodal extraction with circular reference protection; refactored get_history() to use single-pass extraction; updated imports from deprecated instructor.multimodal |
| atomic-agents/tests/context/test_chat_history.py | 5/5 | Added 6 comprehensive tests for nested multimodal content; updated deprecated imports; all tests verify correct extraction and JSON serialization |
| atomic-examples/nested-multimodal/nested_multimodal/main.py | 5/5 | New example demonstrating nested multimodal content with ImageDocument schema; supports both OpenAI and Gemini; includes custom .env loader |

Sequence Diagram

sequenceDiagram
    participant User
    participant ChatHistory
    participant Extract as _extract_multimodal_content
    participant Message
    participant Instructor

    User->>ChatHistory: add_message(role, content)
    ChatHistory->>Message: Create Message with nested multimodal content
    Note over Message: content contains ImageDocument<br/>with nested Image objects

    User->>ChatHistory: get_history()
    ChatHistory->>Extract: _extract_multimodal_content(message.content)
    
    Extract->>Extract: Check if BaseModel (ImageDocument)
    Extract->>Extract: Add to _seen set (circular ref protection)
    
    loop For each field in BaseModel
        Extract->>Extract: _extract_multimodal_content(field_value)
        alt Field is Image/Audio/PDF
            Extract-->>Extract: Return MultimodalContent(objects=[obj], json_data=None)
        else Field is string/primitive
            Extract-->>Extract: Return MultimodalContent(objects=[], json_data=value)
        end
        Extract->>Extract: Accumulate objects and json_data
    end
    
    Extract-->>ChatHistory: MultimodalContent(objects=[Image, ...], json_data={owner, category, ...})
    
    ChatHistory->>ChatHistory: Build content array
    Note over ChatHistory: [json.dumps(json_data), Image1, Image2, ...]
    
    ChatHistory-->>Instructor: Return history with separated JSON and multimodal
    Note over Instructor: Instructor can now properly<br/>handle multimodal objects

@greptile-apps greptile-apps bot left a comment

7 files reviewed, 1 comment

Comment on lines 16 to 39
def _contains_multimodal(obj) -> bool:
    """
    Recursively checks if an object contains any multimodal content.

    Args:
        obj: The object to check.

    Returns:
        bool: True if the object contains multimodal content, False otherwise.
    """
    if isinstance(obj, INSTRUCTOR_MULTIMODAL_TYPES):
        return True
    elif isinstance(obj, list):
        return any(_contains_multimodal(item) for item in obj)
    elif isinstance(obj, dict):
        return any(_contains_multimodal(value) for value in obj.values())
    elif hasattr(obj, "__class__") and hasattr(obj.__class__, "model_fields"):
        # Pydantic model - check all fields
        for field_name in obj.__class__.model_fields:
            if hasattr(obj, field_name):
                if _contains_multimodal(getattr(obj, field_name)):
                    return True
        return False
    return False

logic: The recursive functions _contains_multimodal, _extract_multimodal_objects, and _build_non_multimodal_dict don't have protection against circular references in object graphs. If a Pydantic model has circular references (e.g., a parent-child relationship where child references parent), these functions could cause infinite recursion and stack overflow. Consider adding a visited set parameter to track already-processed objects.
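
One way the suggested visited set could be applied to the detection function is sketched here. It is a hypothetical variant, not the fix that was merged: `FakeImage` and `Node` are illustrative dataclasses standing in for instructor types and Pydantic models, and `contains_multimodal` with its `visited` parameter is an assumed signature.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Optional


@dataclass
class FakeImage:
    """Hypothetical stand-in for a multimodal type."""
    source: str


@dataclass
class Node:
    name: str
    image: Optional[FakeImage] = None
    parent: Optional["Node"] = None  # back-reference that creates a cycle
    children: list = field(default_factory=list)


def contains_multimodal(obj, visited=None) -> bool:
    visited = set() if visited is None else visited
    if isinstance(obj, FakeImage):
        return True

    is_model = is_dataclass(obj) and not isinstance(obj, type)
    if isinstance(obj, (list, dict)) or is_model:
        # Track containers by identity; revisiting one means a cycle or a
        # shared reference that has already been checked.
        if id(obj) in visited:
            return False
        visited.add(id(obj))

    if isinstance(obj, list):
        return any(contains_multimodal(item, visited) for item in obj)
    if isinstance(obj, dict):
        return any(contains_multimodal(v, visited) for v in obj.values())
    if is_model:
        return any(
            contains_multimodal(getattr(obj, f.name), visited) for f in fields(obj)
        )
    return False


# Parent and child reference each other, so a naive recursive walk never ends.
root = Node("root")
leaf = Node("leaf", parent=root)
root.children.append(leaf)

without_image = contains_multimodal(root)

leaf.image = FakeImage("leaf.png")
with_image = contains_multimodal(root)
```

Tracking by `id()` avoids relying on hashing or equality of the models themselves, which matters because comparing two mutually referencing objects can itself recurse.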

@greptile-apps greptile-apps bot left a comment

6 files reviewed, no comments

- Replace 3 duplicate recursive functions with single _extract_multimodal_content()
- Add MultimodalContent dataclass for clean return type
- Use Python 3.10+ match/case for idiomatic pattern matching
- Fix circular reference vulnerability with _seen set tracking
- Fix Pydantic deprecation warning (use type(obj).model_fields)
- Consolidate 3 unit tests into 1 comprehensive test
- Simplify README to be user-facing
- Remove python-dotenv dependency, use simple .env loader

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@KennyVaneetvelde KennyVaneetvelde force-pushed the feature/nested-multimodal-support branch from a3db8d4 to 228d967 Compare November 30, 2025 11:27
@greptile-apps greptile-apps bot left a comment

6 files reviewed, no comments

Development

Successfully merging this pull request may close these issues.

AgentMemory: support nested multimodal data

2 participants