-
-
Notifications
You must be signed in to change notification settings - Fork 439
feat: Support nested multimodal content in ChatHistory (Issue #141) #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add recursive detection and extraction of multimodal content (Image, PDF, Audio) at any nesting depth within input schemas. Changes: - Add _contains_multimodal() for recursive multimodal detection - Add _extract_multimodal_objects() for recursive extraction - Add _build_non_multimodal_dict() for building JSON without multimodal content - Update get_history() to use recursive functions - Fix deprecated instructor.multimodal imports - Add 6 new tests for nested multimodal scenarios - Add nested-multimodal example demonstrating the fix Closes #141
|
Automated review by Greptile Greptile OverviewGreptile SummaryThis PR successfully implements nested multimodal content support in Key Changes
Implementation QualityThe refactored approach is cleaner and more maintainable than the previous top-level-only implementation. The circular reference protection addresses the previously identified concern about infinite recursion. Confidence Score: 4/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant User
participant ChatHistory
participant Extract as _extract_multimodal_content
participant Message
participant Instructor
User->>ChatHistory: add_message(role, content)
ChatHistory->>Message: Create Message with nested multimodal content
Note over Message: content contains ImageDocument<br/>with nested Image objects
User->>ChatHistory: get_history()
ChatHistory->>Extract: _extract_multimodal_content(message.content)
Extract->>Extract: Check if BaseModel (ImageDocument)
Extract->>Extract: Add to _seen set (circular ref protection)
loop For each field in BaseModel
Extract->>Extract: _extract_multimodal_content(field_value)
alt Field is Image/Audio/PDF
Extract-->>Extract: Return MultimodalContent(objects=[obj], json_data=None)
else Field is string/primitive
Extract-->>Extract: Return MultimodalContent(objects=[], json_data=value)
end
Extract->>Extract: Accumulate objects and json_data
end
Extract-->>ChatHistory: MultimodalContent(objects=[Image, ...], json_data={owner, category, ...})
ChatHistory->>ChatHistory: Build content array
Note over ChatHistory: [json.dumps(json_data), Image1, Image2, ...]
ChatHistory-->>Instructor: Return history with separated JSON and multimodal
Note over Instructor: Instructor can now properly<br/>handle multimodal objects
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 files reviewed, 1 comment
| def _contains_multimodal(obj) -> bool: | ||
| """ | ||
| Recursively checks if an object contains any multimodal content. | ||
| Args: | ||
| obj: The object to check. | ||
| Returns: | ||
| bool: True if the object contains multimodal content, False otherwise. | ||
| """ | ||
| if isinstance(obj, INSTRUCTOR_MULTIMODAL_TYPES): | ||
| return True | ||
| elif isinstance(obj, list): | ||
| return any(_contains_multimodal(item) for item in obj) | ||
| elif isinstance(obj, dict): | ||
| return any(_contains_multimodal(value) for value in obj.values()) | ||
| elif hasattr(obj, "__class__") and hasattr(obj.__class__, "model_fields"): | ||
| # Pydantic model - check all fields | ||
| for field_name in obj.__class__.model_fields: | ||
| if hasattr(obj, field_name): | ||
| if _contains_multimodal(getattr(obj, field_name)): | ||
| return True | ||
| return False | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The recursive functions _contains_multimodal, _extract_multimodal_objects, and _build_non_multimodal_dict don't have protection against circular references in object graphs. If a Pydantic model has circular references (e.g., a parent-child relationship where child references parent), these functions could cause infinite recursion and stack overflow. Consider adding a visited set parameter to track already-processed objects.
Prompt To Fix With AI
This is a comment left during a code review.
Path: atomic-agents/atomic_agents/context/chat_history.py
Line: 16:39
Comment:
**logic:** The recursive functions `_contains_multimodal`, `_extract_multimodal_objects`, and `_build_non_multimodal_dict` don't have protection against circular references in object graphs. If a Pydantic model has circular references (e.g., a parent-child relationship where child references parent), these functions could cause infinite recursion and stack overflow. Consider adding a `visited` set parameter to track already-processed objects.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, no comments
- Replace 3 duplicate recursive functions with single _extract_multimodal_content() - Add MultimodalContent dataclass for clean return type - Use Python 3.10+ match/case for idiomatic pattern matching - Fix circular reference vulnerability with _seen set tracking - Fix Pydantic deprecation warning (use type(obj).model_fields) - Consolidate 3 unit tests into 1 comprehensive test - Simplify README to be user-facing - Remove python-dotenv dependency, use simple .env loader 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
a3db8d4 to
228d967
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, no comments
Summary
This PR implements support for nested multimodal content in
ChatHistory, fixing Issue #141.The Problem: Previously,
ChatHistory.get_history()only detected multimodal content (Image, PDF, Audio from the instructor library) at the top level of input schemas. When multimodal content was nested within other schemas (e.g., a list of documents each containing a PDF), it was incorrectly serialized withjson.dumps, causing API errors.The Solution: Add recursive detection and extraction of multimodal content at any nesting depth:
_contains_multimodal(obj)- Recursively checks if an object contains any multimodal content_extract_multimodal_objects(obj)- Recursively extracts all multimodal objects from nested structures_build_non_multimodal_dict(obj)- Builds a JSON-serializable dict excluding multimodal contentChanges
get_history()to use themImageDocumentschemasinstructor.multimodalimports →instructor.processing.multimodalTest Plan
List[ImageDocument]withImagefieldsCloses #141