Python: Add CuaAgentMiddleware for Computer-Use tool #1338
Motivation and Context
This PR implements the integration between Microsoft Agent Framework and Cua as discussed in issue #1095.
Why is this needed?
Implementation approach:
Following @eavanvalkenburg's guidance in #1095, this uses the `ChatMiddleware` pattern rather than implementing Cua as a Tool. This delegates the entire agent loop to Cua while maintaining Agent Framework's orchestration and human-in-the-loop capabilities.

Why wrap `ComputerAgent` instead of just `Computer`?
- `ComputerAgent` provides the complete agent loop (model inference → parsing → computer actions → multi-step execution) with support for 100+ model configurations
- `Computer` is just the low-level tool for executing actions (click, type, screenshot, etc.)
- By wrapping `ComputerAgent`, we get all of Cua's model support for free without reimplementing provider-agnostic parsers for OpenCUA, InternVL, UI-Tars, GLM, etc.

Related issue: #1095
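As a rough illustration of the delegation idea above, the sketch below shows a middleware that hands the whole request to a wrapped agent, stores the result on the context, and flags termination so the downstream chat client is never called. Every name here (`SketchContext`, `FakeComputerAgent`, `DelegatingMiddleware`, `run_pipeline`) is a hypothetical stand-in written for this example. None of it is the actual Agent Framework `ChatMiddleware` interface or Cua's `ComputerAgent` API, whose real signatures are defined by their own packages.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Records every time the stand-in chat client is invoked.
client_calls: list[str] = []


@dataclass
class SketchContext:
    """Hypothetical stand-in for a chat middleware context."""
    messages: list[str]
    result: Optional[str] = None
    terminate: bool = False


NextHandler = Callable[[SketchContext], Awaitable[None]]


class FakeComputerAgent:
    """Hypothetical stand-in for the wrapped agent; it only echoes the task."""

    async def run(self, task: str) -> str:
        return f"[cua] completed: {task}"


class DelegatingMiddleware:
    """Delegates the request to the wrapped agent and stops the pipeline."""

    def __init__(self, agent: FakeComputerAgent) -> None:
        self.agent = agent

    async def process(self, context: SketchContext, next_handler: NextHandler) -> None:
        # The wrapped agent runs its own loop and produces the final answer.
        context.result = await self.agent.run(context.messages[-1])
        # Mark the request as finished; next_handler is deliberately not awaited.
        context.terminate = True


async def chat_client(context: SketchContext) -> None:
    """Terminal stage standing in for the chat client."""
    client_calls.append(context.messages[-1])
    context.result = "[chat client] reply"


async def run_pipeline(middleware: DelegatingMiddleware, context: SketchContext) -> Optional[str]:
    """Runs the middleware with the chat client as the next stage."""
    await middleware.process(context, chat_client)
    return context.result
```

Calling `asyncio.run(run_pipeline(DelegatingMiddleware(FakeComputerAgent()), SketchContext(messages=["open the browser"])))` returns the agent's string and leaves `client_calls` empty, which is the sense in which the chat client becomes a no-op.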
Description
This PR adds `agent-framework-cua`, a new integration package that provides `CuaAgentMiddleware`.

Key components:
- `CuaAgentMiddleware` - Middleware that intercepts chat requests and delegates to Cua's `ComputerAgent`
  - … `context.terminate = True`
  - … `ComputerAgent` (supports 100+ models)
  - … (`require_approval`, `approval_interval`)
  - … `ChatResponse` format
- Type definitions - `CuaModelId`, `CuaProviderType`, `CuaOSType`, etc. for type safety
- Examples:
  - `basic_example.py` - Claude Sonnet 4.5 with Linux Docker
  - `composite_agent_example.py` - UI-Tars + GPT-4o composite agent
- Package structure - Follows existing integration patterns (`agent-framework-redis`, `agent-framework-mem0`)

Architecture:
The chat client becomes a no-op since `CuaAgentMiddleware` terminates middleware execution and returns the response directly from Cua.

Technical notes:
- … (`cua-agent` dependency)
- … `chat_client` since middleware terminates execution before reaching it
- … `ChatMessage.content` → `ChatMessage.text`/`contents` attribute usage in middleware

Contribution Checklist