Disconnected runtime now stops the agent loop #6829

Draft
wants to merge 15 commits into
base: main
from


tofarr
Collaborator

tofarr commented Feb 19, 2025

End-user friendly description of the problem this fixes or functionality that this introduces
The AgentLoop is now stopped when it detects that a runtime disconnects.

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

If the runtime stops (possibly due to an external error, running out of memory, or an issue with Kubernetes/Docker), the server previously continued without a runtime, emitting errors but never actually handling the failure. After this change, the AgentLoop stops once the final error has been emitted.
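A minimal sketch of the behaviour described above, assuming a hypothetical `ConversationManager` that owns one asyncio task per conversation and receives runtime status callbacks. `RuntimeStatus`, `on_runtime_status`, `send_to_client` and `close_session` are illustrative names, not the OpenHands API. The point is the ordering: the final error is delivered to the client first, and only then is the loop task cancelled.

```python
import asyncio
from enum import Enum


class RuntimeStatus(Enum):
    # Hypothetical status values a runtime might report.
    READY = "ready"
    DISCONNECTED = "disconnected"


class ConversationManager:
    """Hypothetical manager owning one agent-loop task per conversation."""

    def __init__(self) -> None:
        self.loops: dict[str, asyncio.Task] = {}
        self.sent: list[tuple[str, str]] = []  # messages delivered to clients

    async def send_to_client(self, sid: str, message: str) -> None:
        # Stand-in for pushing an error string to the UI.
        self.sent.append((sid, message))

    def start_loop(self, sid: str) -> None:
        async def agent_loop() -> None:
            while True:  # placeholder for the real agent step loop
                await asyncio.sleep(0.01)

        self.loops[sid] = asyncio.create_task(agent_loop())

    async def close_session(self, sid: str) -> None:
        task = self.loops.pop(sid, None)
        if task is not None:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass

    async def on_runtime_status(self, sid: str, status: RuntimeStatus) -> None:
        if status is RuntimeStatus.DISCONNECTED:
            # Emit the final error first, then stop the loop so that a
            # later page refresh starts a fresh loop (and runtime).
            await self.send_to_client(sid, "Runtime disconnected")
            await self.close_session(sid)


async def demo() -> ConversationManager:
    manager = ConversationManager()
    manager.start_loop("abc")
    await manager.on_runtime_status("abc", RuntimeStatus.DISCONNECTED)
    return manager


manager = asyncio.run(demo())
print(manager.loops, manager.sent)
```

After the disconnect callback, the conversation has no running loop left, and the error message was recorded before the loop was removed.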

Example
This silly conversation...
[Screenshot]

If the Docker container is deleted...
[Screenshot]

On main, subsequent prompts fail, telling users to refresh the page...
[Screenshot]

But refreshing the page does not clear the issue - the Agent remains in the Error state.

After the change, a page refresh restarts the run loop (because it has been stopped!). The agent is still aware that something went wrong, as evidenced by the output from a "continue" prompt:
[Screenshot]


Link to any specific issues this addresses


To run this PR locally, use the following command:

docker run -it --rm \
    -p 3000:3000 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --add-host host.docker.internal:host-gateway \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e65a8db-nikolaik \
    --name openhands-app-e65a8db \
    docker.all-hands.dev/all-hands-ai/openhands:e65a8db

async def stop_agent_loop_for_error(self):
    if self.controller is not None:
        await self.controller.set_agent_state_to(AgentState.ERROR)

Collaborator Author

I removed this method because:

  • Contrary to its name, it does not actually stop the agent loop.
  • Stopping the agent loop should be done in the conversation manager.

Collaborator

It doesn't stop the loop in the sense of closing it, but it basically paused it, right? And then the UI waited for the user to take action.
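The distinction being drawn here, pausing in an error state versus actually ending the loop, can be illustrated with a small asyncio sketch. `ToyAgentLoop`, its `state` strings and its `user_input` queue are hypothetical and are not OpenHands classes: setting the state to `"error"` only makes the loop block until the user sends something, so the task stays alive, whereas cancelling the task ends it.

```python
import asyncio


class ToyAgentLoop:
    """Hypothetical loop: in the 'error' state it idles until the user acts."""

    def __init__(self) -> None:
        self.state = "running"
        self.user_input: asyncio.Queue = asyncio.Queue()
        self.handled: list[str] = []

    async def run(self) -> None:
        while True:
            if self.state == "error":
                # "Paused": the task is still alive, just blocked on the user.
                message = await self.user_input.get()
                self.handled.append(message)
                self.state = "running"
            else:
                await asyncio.sleep(0.01)  # placeholder for an agent step


async def main() -> tuple[bool, bool]:
    loop = ToyAgentLoop()
    task = asyncio.create_task(loop.run())
    loop.state = "error"  # analogous to set_agent_state_to(AgentState.ERROR)
    await asyncio.sleep(0.05)
    paused_but_alive = not task.done()

    task.cancel()  # actually stopping the loop
    try:
        await task
    except asyncio.CancelledError:
        pass
    return paused_but_alive, task.cancelled()


print(asyncio.run(main()))
```

A paused task still occupies the conversation's slot, which matters for whether a refresh can start a new loop.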

@enyst
Collaborator

enyst commented Feb 19, 2025

Can't we try, on refresh, to reconnect the runtime?

@tofarr
Collaborator Author

tofarr commented Feb 19, 2025

> Can't we try, on refresh, to reconnect the runtime?

That is what this PR does. Before this PR, the agent loop kept running after the runtime disconnected, and therefore would not restart. Now, a disconnected runtime stops the agent loop, so a page refresh restarts the agent loop and thereby triggers a reconnect/restart of the runtime.
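The restart-on-refresh mechanism described here amounts to a "join or start" rule when a client connects. The following is a minimal sketch, assuming a hypothetical registry that keys running loop tasks by conversation id (`LoopRegistry`, `join_or_start` and `stop` are illustrative names). A refresh only starts a new loop, and with it a new runtime, when no loop is running, which is why a loop left running with a dead runtime blocked recovery.

```python
import asyncio


class LoopRegistry:
    """Hypothetical registry of running agent loops keyed by conversation id."""

    def __init__(self) -> None:
        self.running: dict[str, asyncio.Task] = {}
        self.runtimes_started = 0

    async def _agent_loop(self) -> None:
        await asyncio.Event().wait()  # placeholder: runs until cancelled

    def join_or_start(self, sid: str) -> str:
        task = self.running.get(sid)
        if task is not None and not task.done():
            return "joined"  # reuse the existing loop, even if its runtime died
        self.runtimes_started += 1  # a new loop would provision a new runtime
        self.running[sid] = asyncio.create_task(self._agent_loop())
        return "started"

    async def stop(self, sid: str) -> None:
        task = self.running.pop(sid, None)
        if task is not None:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass


async def main() -> list[str]:
    registry = LoopRegistry()
    events = [registry.join_or_start("abc")]       # first page load
    events.append(registry.join_or_start("abc"))   # refresh while loop runs
    await registry.stop("abc")                     # runtime disconnected: stop
    events.append(registry.join_or_start("abc"))   # refresh after the stop
    await registry.stop("abc")
    return events


print(asyncio.run(main()))
```

The second refresh only starts a fresh loop because the disconnect handler removed the old one first.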

tofarr marked this pull request as ready for review February 19, 2025 21:19
Collaborator

enyst left a comment

You're right we need a solution for that behavior, thank you for this.

I'm not sure this is quite the way to do it, but I may be wrong. The message callback channel was intended for displaying error strings in the UI; using it to close the entire session is a bit surprising. What if we reconnect the runtime at refresh, and close the old agent loop if it was disconnected at that time (at refresh time)? Does that make sense?
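The alternative suggested here, checking at refresh time rather than stopping the loop as soon as the runtime drops, might look like the following sketch. `Session`, `runtime_connected`, `on_runtime_error`, `on_refresh` and `start_session` are hypothetical names. The error callback only records the message and marks the runtime as disconnected; the decision to close the old loop and start a new one is deferred to the next refresh, while a healthy session is simply rejoined.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Hypothetical per-conversation session owning an agent-loop task."""

    sid: str
    runtime_connected: bool = True
    task: Optional[asyncio.Task] = None
    errors: list[str] = field(default_factory=list)


async def _agent_loop() -> None:
    await asyncio.Event().wait()  # placeholder: runs until cancelled


async def _cancel(task: asyncio.Task) -> None:
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass


class RefreshingManager:
    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}

    def start_session(self, sid: str) -> Session:
        # A new session stands for a fresh agent loop and a fresh runtime.
        session = Session(sid, task=asyncio.create_task(_agent_loop()))
        self.sessions[sid] = session
        return session

    def on_runtime_error(self, sid: str, message: str) -> None:
        # The callback only records the error; nothing is closed here.
        session = self.sessions[sid]
        session.runtime_connected = False
        session.errors.append(message)

    async def on_refresh(self, sid: str) -> Session:
        session = self.sessions.get(sid)
        if session is not None and session.runtime_connected:
            return session  # healthy: just rejoin
        if session is not None and session.task is not None:
            await _cancel(session.task)  # close the old, disconnected loop
        return self.start_session(sid)


async def main() -> tuple[bool, bool]:
    manager = RefreshingManager()
    first = manager.start_session("abc")
    same = (await manager.on_refresh("abc")) is first
    manager.on_runtime_error("abc", "Runtime disconnected")
    second = await manager.on_refresh("abc")
    replaced = second is not first and first.task.cancelled()
    await _cancel(second.task)
    return same, replaced


print(asyncio.run(main()))
```

Compared with stopping the loop immediately, this keeps the session alive between the failure and the refresh.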

@enyst
Collaborator

enyst commented Feb 19, 2025

On a side note, I wonder what will happen when the user has useful things to do even without a runtime available. Right now they can chat with the LLM, and they can try to create a delegate (these are not runnable actions, so they don't require a runtime). What if we made a summarization tool the user could use (it doesn't need the runtime either), or integrated MCP?

Just wondering; maybe I'm missing something. Would those still be possible with this PR?
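The point about non-runnable actions can be made concrete: only actions that must execute in the sandbox need a runtime, so a loop that stayed alive could keep serving the rest. The following is a minimal sketch with hypothetical action classes carrying a `runnable` class attribute; the class names echo the discussion but are illustrative only, and `dispatch` is not an OpenHands function.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Action:
    """Hypothetical base action; `runnable` marks sandbox execution."""

    runnable: ClassVar[bool] = False


@dataclass
class MessageAction(Action):
    content: str = ""  # chatting with the LLM needs no runtime


@dataclass
class DelegateAction(Action):
    agent: str = ""  # delegating to another agent needs no runtime


@dataclass
class CmdRunAction(Action):
    command: str = ""
    runnable: ClassVar[bool] = True  # must execute inside the sandbox


def dispatch(action: Action, runtime_connected: bool) -> str:
    """Return how a loop without a runtime might treat each action."""
    if action.runnable and not runtime_connected:
        return "error: runtime unavailable"
    return "handled"


actions = [MessageAction("hi"), DelegateAction("browser"), CmdRunAction("ls")]
print([dispatch(a, runtime_connected=False) for a in actions])
```

Under this model, stopping the whole loop on disconnect would also block the first two kinds of action, which is the trade-off the question raises.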

@tofarr
Collaborator Author

tofarr commented Feb 19, 2025

> What if we reconnect the runtime at refresh, and close the old agent loop if it was disconnected, at that time, at refresh

I like this approach better. I'll update the PR.

tofarr marked this pull request as draft February 19, 2025 21:46
Labels
None yet
Projects
None yet
3 participants