feat: Support nested multimodal content in ChatHistory (Issue #141) #192

KennyVaneetvelde · 2025-11-25T20:53:52Z

Summary

This PR implements support for nested multimodal content in ChatHistory, fixing Issue #141.

The Problem: Previously, ChatHistory.get_history() only detected multimodal content (Image, PDF, Audio from the instructor library) at the top level of input schemas. When multimodal content was nested within other schemas (e.g., a list of documents each containing a PDF), it was incorrectly serialized with json.dumps, causing API errors.

The Solution: Add recursive detection and extraction of multimodal content at any nesting depth:

_contains_multimodal(obj) - Recursively checks if an object contains any multimodal content
_extract_multimodal_objects(obj) - Recursively extracts all multimodal objects from nested structures
_build_non_multimodal_dict(obj) - Builds a JSON-serializable dict excluding multimodal content
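
As a rough illustration of the extraction and dict-building helpers described above, here is a minimal sketch. It does not reproduce the PR's code: `FakeImage`, `ImageDocument`, `Request`, `extract_multimodal_objects`, and `build_non_multimodal` are hypothetical names, and plain dataclasses stand in for Pydantic schemas and the instructor multimodal types so the sketch runs without dependencies.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List


@dataclass
class FakeImage:
    """Stand-in for a multimodal type such as instructor's Image (assumption)."""
    source: str


@dataclass
class ImageDocument:
    """Hypothetical nested schema: plain metadata plus one multimodal field."""
    owner: str
    image: FakeImage


@dataclass
class Request:
    """Hypothetical top-level input schema holding a list of documents."""
    instruction: str
    documents: List[ImageDocument]


MULTIMODAL_TYPES = (FakeImage,)
_SKIP = object()  # sentinel marking values that must not reach json.dumps


def _children(obj: Any) -> List[Any]:
    """Return the nested values of a list, dict, or dataclass instance."""
    if isinstance(obj, list):
        return list(obj)
    if isinstance(obj, dict):
        return list(obj.values())
    if is_dataclass(obj) and not isinstance(obj, type):
        return [getattr(obj, f.name) for f in fields(obj)]
    return []


def extract_multimodal_objects(obj: Any) -> List[Any]:
    """Collect every multimodal object found at any nesting depth."""
    if isinstance(obj, MULTIMODAL_TYPES):
        return [obj]
    return [m for child in _children(obj) for m in extract_multimodal_objects(child)]


def build_non_multimodal(obj: Any) -> Any:
    """Return a JSON-serializable copy of obj with multimodal values left out."""
    if isinstance(obj, MULTIMODAL_TYPES):
        return _SKIP
    if isinstance(obj, list):
        built = [build_non_multimodal(item) for item in obj]
        return [value for value in built if value is not _SKIP]
    if is_dataclass(obj) and not isinstance(obj, type):
        obj = {f.name: getattr(obj, f.name) for f in fields(obj)}
    if isinstance(obj, dict):
        built = {key: build_non_multimodal(value) for key, value in obj.items()}
        return {key: value for key, value in built.items() if value is not _SKIP}
    return obj


request = Request(
    instruction="Describe each image",
    documents=[
        ImageDocument(owner="alice", image=FakeImage("a.png")),
        ImageDocument(owner="bob", image=FakeImage("b.png")),
    ],
)
images = extract_multimodal_objects(request)          # the two FakeImage objects
text_part = json.dumps(build_non_multimodal(request))  # metadata only, safe to serialize
```

The point of the split is that `json.dumps` only ever sees the metadata, while the multimodal objects are passed along untouched.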

Changes

atomic-agents/atomic_agents/context/chat_history.py: Add 3 recursive helper functions and update get_history() to use them
atomic-agents/tests/context/test_chat_history.py: Add 6 new tests for nested multimodal scenarios
atomic-examples/nested-multimodal/: New example demonstrating the fix with nested ImageDocument schemas
Fix deprecated instructor.multimodal imports → instructor.processing.multimodal

Test Plan

All 30 unit tests pass
End-to-end validation with OpenAI GPT-4.1 using nested Image content
Example successfully processes nested List[ImageDocument] with Image fields

Closes #141

Add recursive detection and extraction of multimodal content (Image, PDF, Audio) at any nesting depth within input schemas.

Changes:
- Add _contains_multimodal() for recursive multimodal detection
- Add _extract_multimodal_objects() for recursive extraction
- Add _build_non_multimodal_dict() for building JSON without multimodal content
- Update get_history() to use recursive functions
- Fix deprecated instructor.multimodal imports
- Add 6 new tests for nested multimodal scenarios
- Add nested-multimodal example demonstrating the fix

Closes #141

greptile-apps · 2025-11-25T21:00:28Z

Automated review by Greptile

Greptile Overview

Greptile Summary

This PR successfully implements nested multimodal content support in ChatHistory, solving Issue #141 where multimodal objects (Image, PDF, Audio) nested within schemas were incorrectly serialized.

Key Changes

Added _extract_multimodal_content() function with recursive extraction logic and circular reference protection using _seen set tracking
Refactored get_history() from field-by-field inspection to single-pass recursive extraction
Fixed deprecated instructor.multimodal imports → instructor.processing.multimodal
Added 6 comprehensive unit tests covering nested lists, dicts, deeply nested schemas, and edge cases
Includes working example demonstrating List[ImageDocument] with nested Image fields

Implementation Quality

The refactored approach is cleaner and more maintainable than the previous top-level-only implementation. The circular reference protection addresses the previously identified concern about infinite recursion.
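
To make the single-pass idea concrete, here is a hedged sketch of an extractor that returns both the multimodal objects and the remaining JSON data in one traversal, with cycle protection. It is not the PR's code: the `MultimodalContent` fields follow the sequence diagram in this review, but `FakeImage`, `Node`, and the dataclass stand-ins for Pydantic models are assumptions. Whether the real `_seen` set is cleared on the way back up is also an assumption.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, List, NamedTuple, Optional, Set


class MultimodalContent(NamedTuple):
    objects: List[Any]  # multimodal objects, in traversal order
    json_data: Any      # JSON-serializable remainder (None for a multimodal value)


@dataclass
class FakeImage:
    """Stand-in for an instructor multimodal type (assumption)."""
    source: str


@dataclass
class Node:
    """Hypothetical schema with a parent/child cycle."""
    name: str
    image: Optional[FakeImage] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


MULTIMODAL_TYPES = (FakeImage,)


def extract_multimodal_content(obj: Any, _seen: Optional[Set[int]] = None) -> MultimodalContent:
    """Split obj into multimodal objects and JSON data in a single traversal."""
    if _seen is None:
        _seen = set()
    if isinstance(obj, MULTIMODAL_TYPES):
        return MultimodalContent([obj], None)

    # Normalize containers to a key -> value mapping; anything else is a primitive.
    if is_dataclass(obj) and not isinstance(obj, type):
        children = {f.name: getattr(obj, f.name) for f in fields(obj)}
    elif isinstance(obj, dict):
        children = obj
    elif isinstance(obj, list):
        children = dict(enumerate(obj))
    else:
        return MultimodalContent([], obj)

    if id(obj) in _seen:
        # Already on the current path: cut the cycle instead of recursing forever.
        return MultimodalContent([], None)
    _seen.add(id(obj))

    objects: List[Any] = []
    kept = {}
    for key, value in children.items():
        part = extract_multimodal_content(value, _seen)
        objects.extend(part.objects)
        if not isinstance(value, MULTIMODAL_TYPES):
            kept[key] = part.json_data  # multimodal values are left out of the JSON

    # Design choice in this sketch: forget obj on exit so shared, non-cyclic
    # references are still processed each time they appear.
    _seen.discard(id(obj))
    json_data = list(kept.values()) if isinstance(obj, list) else kept
    return MultimodalContent(objects, json_data)


root = Node("root", image=FakeImage("r.png"))
leaf = Node("leaf", image=FakeImage("l.png"), parent=root)
root.children.append(leaf)  # root -> leaf -> root forms a cycle
result = extract_multimodal_content(root)
```

In this sketch the back-reference from `leaf` to `root` is serialized as `None` rather than followed, so the traversal terminates.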

Confidence Score: 4/5

This PR is safe to merge with good test coverage and circular reference protection
Score reflects a solid implementation with comprehensive testing. One point was deducted for the complexity of the recursive traversal logic, which could benefit from additional edge-case validation in production use.
Pay close attention to atomic-agents/atomic_agents/context/chat_history.py - the recursive extraction logic is complex and handles multiple object types

Important Files Changed

File Analysis

Filename	Score	Overview
atomic-agents/atomic_agents/context/chat_history.py	4/5	Added recursive multimodal extraction with circular reference protection; refactored get_history() to use single-pass extraction; updated imports from deprecated instructor.multimodal
atomic-agents/tests/context/test_chat_history.py	5/5	Added 6 comprehensive tests for nested multimodal content; updated deprecated imports; all tests verify correct extraction and JSON serialization
atomic-examples/nested-multimodal/nested_multimodal/main.py	5/5	New example demonstrating nested multimodal content with ImageDocument schema; supports both OpenAI and Gemini; includes custom .env loader

Sequence Diagram

sequenceDiagram
    participant User
    participant ChatHistory
    participant Extract as _extract_multimodal_content
    participant Message
    participant Instructor

    User->>ChatHistory: add_message(role, content)
    ChatHistory->>Message: Create Message with nested multimodal content
    Note over Message: content contains ImageDocument<br/>with nested Image objects

    User->>ChatHistory: get_history()
    ChatHistory->>Extract: _extract_multimodal_content(message.content)
    
    Extract->>Extract: Check if BaseModel (ImageDocument)
    Extract->>Extract: Add to _seen set (circular ref protection)
    
    loop For each field in BaseModel
        Extract->>Extract: _extract_multimodal_content(field_value)
        alt Field is Image/Audio/PDF
            Extract-->>Extract: Return MultimodalContent(objects=[obj], json_data=None)
        else Field is string/primitive
            Extract-->>Extract: Return MultimodalContent(objects=[], json_data=value)
        end
        Extract->>Extract: Accumulate objects and json_data
    end
    
    Extract-->>ChatHistory: MultimodalContent(objects=[Image, ...], json_data={owner, category, ...})
    
    ChatHistory->>ChatHistory: Build content array
    Note over ChatHistory: [json.dumps(json_data), Image1, Image2, ...]
    
    ChatHistory-->>Instructor: Return history with separated JSON and multimodal
    Note over Instructor: Instructor can now properly<br/>handle multimodal objects
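
The last step of the diagram, building the content array, could look roughly like the sketch below. `build_history_entry` is a hypothetical helper, not a function from the PR, and the text-only branch is an assumption about how messages without multimodal content are represented.

```python
import json
from typing import Any, Dict, List


def build_history_entry(role: str, json_data: Any, objects: List[Any]) -> Dict[str, Any]:
    """Combine extracted JSON data and multimodal objects into one history entry."""
    if not objects:
        # Assumption: text-only messages keep a plain JSON string as content.
        return {"role": role, "content": json.dumps(json_data)}
    # JSON text first, then each multimodal object as its own content part.
    return {"role": role, "content": [json.dumps(json_data), *objects]}
```

Keeping the multimodal objects out of `json.dumps` is what lets the downstream client handle them as real attachments rather than as stringified data.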


7 files reviewed, 1 comment


greptile-apps · 2025-11-25T21:00:28Z

atomic-agents/atomic_agents/context/chat_history.py

+def _contains_multimodal(obj) -> bool:
+    """
+    Recursively checks if an object contains any multimodal content.
+
+    Args:
+        obj: The object to check.
+
+    Returns:
+        bool: True if the object contains multimodal content, False otherwise.
+    """
+    if isinstance(obj, INSTRUCTOR_MULTIMODAL_TYPES):
+        return True
+    elif isinstance(obj, list):
+        return any(_contains_multimodal(item) for item in obj)
+    elif isinstance(obj, dict):
+        return any(_contains_multimodal(value) for value in obj.values())
+    elif hasattr(obj, "__class__") and hasattr(obj.__class__, "model_fields"):
+        # Pydantic model - check all fields
+        for field_name in obj.__class__.model_fields:
+            if hasattr(obj, field_name):
+                if _contains_multimodal(getattr(obj, field_name)):
+                    return True
+        return False
+    return False


logic: The recursive functions _contains_multimodal, _extract_multimodal_objects, and _build_non_multimodal_dict don't have protection against circular references in object graphs. If a Pydantic model has circular references (e.g., a parent-child relationship where child references parent), these functions could cause infinite recursion and stack overflow. Consider adding a visited set parameter to track already-processed objects.
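
One possible shape for the suggested `visited` parameter, applied to the detection helper, is sketched below. This is an illustration under assumptions rather than the fix that was merged: `FakePDF`, `Parent`, and `Child` are hypothetical, and dataclasses stand in for Pydantic models so the sketch has no dependencies.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional, Set


@dataclass
class FakePDF:
    """Stand-in for an instructor multimodal type (assumption)."""
    path: str


@dataclass
class Parent:
    name: str
    child: Optional["Child"] = None


@dataclass
class Child:
    name: str
    parent: Optional[Parent] = None
    attachment: Optional[FakePDF] = None


MULTIMODAL_TYPES = (FakePDF,)
PRIMITIVES = (str, bytes, int, float, bool, type(None))


def contains_multimodal(obj: Any, visited: Optional[Set[int]] = None) -> bool:
    """Recursively check for multimodal content, visiting each container once."""
    if visited is None:
        visited = set()
    if isinstance(obj, MULTIMODAL_TYPES):
        return True
    if isinstance(obj, PRIMITIVES):
        return False
    if id(obj) in visited:
        # Already checked or currently being checked: do not descend again.
        return False
    visited.add(id(obj))

    if isinstance(obj, list):
        return any(contains_multimodal(item, visited) for item in obj)
    if isinstance(obj, dict):
        return any(contains_multimodal(value, visited) for value in obj.values())
    if is_dataclass(obj) and not isinstance(obj, type):
        return any(contains_multimodal(getattr(obj, f.name), visited) for f in fields(obj))
    return False


parent = Parent("p")
child = Child("c", parent=parent)
parent.child = child  # parent -> child -> parent forms a cycle
without_attachment = contains_multimodal(parent)
child.attachment = FakePDF("report.pdf")
with_attachment = contains_multimodal(parent)
```

Tracking `id(obj)` rather than the object itself avoids requiring the models to be hashable.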
