-
Notifications
You must be signed in to change notification settings - Fork 448
Enable multi agent session persistence #900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Enable multi agent session persistence #900
Conversation
a2fa154
to
29a7dd8
Compare
29a7dd8
to
41d2f44
Compare
41d2f44
to
64a2ecb
Compare
src/strands/experimental/multiagent_session/multiagent_events.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/multiagent_session/multiagent_events.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/multiagent_session/multiagent_state.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/multiagent_session/multiagent_state_adapter.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/multiagent_session/multiagent_state_adapter.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/multiagent_session/multiagent_state_adapter.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/multiagent_session/multiagent_state_adapter.py
Outdated
Show resolved
Hide resolved
c9a0034
to
a774434
Compare
a774434
to
b53ba36
Compare
b53ba36
to
947261a
Compare
947261a
to
4e3c44b
Compare
4e3c44b
to
f47d3b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you doing getattr
throughout the code. I would like to avoid doing this, and instead check if objects are instances of classes. Two classes could have the same attribute, leading to undesirable behavior.
src/strands/experimental/hooks/multiagent_hooks/multiagent_events.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/hooks/multiagent_hooks/multiagent_events.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/hooks/multiagent_hooks/multiagent_events.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/hooks/multiagent_hooks/multiagent_events.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/hooks/multiagent_hooks/multiagent_events.py
Outdated
Show resolved
Hide resolved
…s, refactor interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments.
"""Return a JSON-serializable snapshot of the orchestrator state.""" | ||
raise NotImplementedError | ||
|
||
def deserialize_state(self, payload: dict[str, Any]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see deserialize as being a mutative action. If this is actually doing
"""Restore orchestrator state from a session dict and prepare for execution.
why not call restore_from_state or session
self.state.results[node_id] = node_result | ||
|
||
# Persist failure here | ||
self.hooks.invoke_callbacks(AfterNodeCallEvent(self, node_id=node_id, invocation_state=invocation_state)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we just do a finally here for both the happy and sad path case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is we should fire this event right after a node failed. But for sucess node we should fire after it has been added to self.state.node_history
that's where it represent state
updated. .
return result | ||
|
||
except Exception as e: | ||
execution_time = round((time.time() - start_time) * 1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about execution time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above.
payload: Dictionary containing persisted state data including status, | ||
completed nodes, results, and next nodes to execute. | ||
""" | ||
if not payload.get("next_nodes_to_execute"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For graph is this actually expected behavior? Ignoring persisted state, if I had a graph locally only, the graph completed, then I re-executed, would we reset? This seems like strange logic to have ONLY for session peristence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a graph is failed with a blocking node or graph is completed , next_nodes_to_execute
will be [] , otherwise we will always have some [node_ids] here.
How graph complete is while ready_nodes
which is a [nodes] so it is actually the same logic.
from typing import Any, Callable, Optional, Tuple | ||
|
||
from opentelemetry import trace as trace_api | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment, I am having a difficult time reviewing all of this code. It is complex. Can we please merge the events and base (potentially in a branch) and then split graph and swarm as well as the session manager implementations.
I think this review is taking so long because every time I take a look I am focussing on different things because the cognitive load is so high in each review session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright
Description
This PR introduce interfaces, hooks and utils that multiagent persistence needs.
Key Change:
Hooks and Events
add Multiagent events and register within
SessionManager
. All newly added events are used now.Graph and Swarm:
add session persistence to Graph and Swarm. Swarm and Graph can restore from saved session.
completed
.completed
since Swarm has execution different logic but Swarm will call _build_result() with saved context.New APIs:
Test code/ User reference:
S3:
File:
Save multiagent_state_json example:
Related Issues
#867
#500
Documentation PR
Type of Change
Bug fix
New feature
Breaking change
Documentation update
Other (please describe):
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.