Open
Labels: .NET, python, squad: workflows (Agent Framework Workflows Squad), workflows (Related to Workflows in agent-framework)
Description
Define Workflow Concurrency Semantics
Problem
Workflows execute with concurrent operations in multiple scenarios (fan-out edges, multiple source executors, shared state access), but the concurrency guarantees, ordering, and thread-safety characteristics are not explicitly documented or consistently defined between Python and .NET. This creates:
- Ambiguity for developers - Unclear what ordering and atomicity guarantees exist
- Risk of race conditions - Developers may write incorrect concurrent code without realizing it
- Non-deterministic behavior - Workflows may behave differently on each run
- Inconsistent implementations - Python and .NET may have different concurrency semantics
- Debugging difficulties - Hard to reproduce issues without clear concurrency rules
Current State
Python:
- Superstep-based execution with asyncio.gather() for concurrent delivery
- SharedState uses asyncio.Lock with hold() context manager for multi-operation atomicity
- Message ordering within supersteps: unspecified
- Fan-out delivery: concurrent with no ordering guarantees
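To make the difference between per-call locking and a multi-operation hold concrete, here is a minimal sketch. `SketchSharedState` is a hypothetical stand-in written for this issue, not the framework's `SharedState`; only `asyncio.Lock`, `asyncio.gather()` and the idea of a `hold()` context manager come from the description above.

```python
import asyncio
from contextlib import asynccontextmanager


class SketchSharedState:
    """Hypothetical lock-guarded key/value store (not the framework class)."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}
        self._lock = asyncio.Lock()

    async def get(self, key: str, default: int = 0) -> int:
        async with self._lock:  # atomic for this single call only
            return self._data.get(key, default)

    async def set(self, key: str, value: int) -> None:
        async with self._lock:  # atomic for this single call only
            self._data[key] = value

    @asynccontextmanager
    async def hold(self):
        # Keep the lock for the whole block. asyncio.Lock is not reentrant,
        # so the block works on the raw dict instead of calling get()/set().
        async with self._lock:
            yield self._data


async def increment(state: SketchSharedState) -> None:
    async with state.hold() as data:
        current = data.get("counter", 0)
        await asyncio.sleep(0)  # suspension point; other tasks wait on the lock
        data["counter"] = current + 1


async def run_increments(n: int) -> int:
    state = SketchSharedState()
    await asyncio.gather(*(increment(state) for _ in range(n)))
    return await state.get("counter")


if __name__ == "__main__":
    print(asyncio.run(run_increments(10)))
```

Because the read and the write happen under one lock acquisition, every increment is preserved even though each task suspends between them. Doing the same with separate `get()` and `set()` calls would release the lock between the two steps, which is exactly the case the documentation needs to call out.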
.NET:
- Superstep-based execution with configurable modes (OffThread, Lockstep, Subworkflow)
- Task.WhenAll() for concurrent fan-out
- Shared state synchronization: not explicitly documented
- Message ordering: unspecified across execution modes
Desired End State
Well-Defined Concurrency Model
A documented concurrency model that clearly specifies:
- Message Ordering Guarantees
  - Within a single source → target path
  - From multiple sources to one target in the same superstep
  - During fan-out to multiple targets
  - During fan-in from multiple sources
- Shared State Semantics
  - What operations are atomic
  - What locking patterns are required for read-modify-write operations
  - Whether locks span operation boundaries or individual calls
  - How to perform multi-operation atomic blocks
- Parallelism Guarantees
  - Whether fan-out execution is guaranteed concurrent or implementation-defined
  - Whether executors in the same superstep can run in parallel
  - What happens when one executor blocks
- Superstep Boundary Semantics
  - When exactly a superstep completes
  - Whether all executors must finish before boundary
  - What state is visible at boundaries
  - Checkpointing consistency guarantees
- Determinism vs Performance Trade-offs
  - Whether message processing order is deterministic
  - Whether workflows can be made reproducible
  - Performance implications of ordering guarantees
- Deadlock Prevention
  - Whether circular message dependencies are allowed
  - How to detect and handle deadlock scenarios
  - Runtime enforcement mechanisms
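As one possible shape for a message ordering guarantee, the sketch below collects fan-in messages in whatever order concurrent delivery produces and then sorts them by a stable key before the target sees them. The `Message` type, its `source_id` and `sequence` fields, and `canonical_order()` are all hypothetical names invented for illustration; neither implementation is claimed to work this way today.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source_id: str  # hypothetical executor identifier
    sequence: int   # hypothetical per-source send counter
    payload: str


async def deliver(msg: Message, yields: int, inbox: list[Message]) -> None:
    # Simulate uneven scheduling: yield to the event loop a different
    # number of times before the message lands in the target's inbox.
    for _ in range(yields):
        await asyncio.sleep(0)
    inbox.append(msg)


def canonical_order(inbox: list[Message]) -> list[Message]:
    # Deterministic order that does not depend on arrival timing.
    return sorted(inbox, key=lambda m: (m.source_id, m.sequence))


async def fan_in_demo() -> tuple[list[str], list[str]]:
    inbox: list[Message] = []
    sends = [
        (Message("a", 0, "from-a"), 3),
        (Message("b", 0, "from-b"), 1),
        (Message("c", 0, "from-c"), 2),
    ]
    await asyncio.gather(*(deliver(m, y, inbox) for m, y in sends))
    arrival = [m.payload for m in inbox]
    ordered = [m.payload for m in canonical_order(inbox)]
    return arrival, ordered


if __name__ == "__main__":
    arrival, ordered = asyncio.run(fan_in_demo())
    print("arrival:", arrival)
    print("ordered:", ordered)
```

The arrival list reflects scheduling, while the ordered list is the same on every run. The trade-off is that the target cannot start processing until the whole batch for the superstep has been collected, which is the performance cost the trade-offs item above asks to be documented.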
Documentation Requirements
For Developers:
- Clear API documentation for SharedState operations and locking requirements
- Examples showing correct concurrent patterns
- Anti-patterns and common pitfalls
- Guidelines for when atomicity is needed
For Framework:
- Specification of execution semantics that both languages must implement
- Test cases demonstrating expected concurrent behavior
- Decision on whether Python and .NET must have identical semantics or language-appropriate patterns
Real-World Impact Scenarios
Without clear concurrency semantics:
- Lost Updates - Multiple executors increment a shared counter, and some updates are lost due to race conditions
- Non-Deterministic Results - Aggregator receives messages in random order, producing different outputs each run
- Checkpoint Corruption - Checkpoint occurs mid-execution with partially updated shared state
- Workflow Hangs - Circular message dependencies cause deadlock
- Performance Confusion - Developer expects parallelism but sees serial execution
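Two of these scenarios can be reproduced in a few lines of plain asyncio, with no framework code involved. The sketch below uses an ordinary dict as the shared state; all function names are hypothetical and exist only for this illustration.

```python
import asyncio


async def unsafe_increment(state: dict[str, int]) -> None:
    current = state["counter"]      # read
    await asyncio.sleep(0)          # suspension point, e.g. awaiting I/O
    state["counter"] = current + 1  # write back a possibly stale value


async def lost_update_demo(n: int) -> int:
    state = {"counter": 0}
    await asyncio.gather(*(unsafe_increment(state) for _ in range(n)))
    return state["counter"]


async def transfer(state: dict[str, int], amount: int) -> None:
    state["a"] -= amount
    await asyncio.sleep(0)  # the two halves of the update are not atomic
    state["b"] += amount


async def take_snapshot(state: dict[str, int]) -> dict[str, int]:
    # Copies whatever is visible right now, like a checkpoint taken
    # without waiting for in-flight updates to finish.
    return dict(state)


async def torn_snapshot_demo() -> dict[str, int]:
    state = {"a": 10, "b": 0}
    _, snapshot = await asyncio.gather(transfer(state, 5), take_snapshot(state))
    return snapshot


if __name__ == "__main__":
    print("counter after 10 increments:", asyncio.run(lost_update_demo(10)))
    print("snapshot taken mid-transfer:", asyncio.run(torn_snapshot_demo()))
```

The counter ends up below 10 because every task reads before any task writes, and the snapshot's values do not add up to the original total of 10 because it was copied between the two halves of the transfer. Neither failure raises an error, which is why they are hard to notice without documented rules.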
Questions
- Should message ordering be guaranteed within supersteps? If so, how (insertion order, timestamp, explicit priority)?
- Should shared state operations document required locking patterns for atomic updates?
- Should fan-out guarantee concurrent execution, or allow runtime flexibility?
- Should superstep boundaries be fully synchronous (all executors complete) or allow streaming?
- Should workflows provide determinism modes (reproducible order vs maximum parallelism)?
- How should deadlock scenarios be detected and handled?
- Should .NET and Python have identical concurrency semantics, or language-appropriate patterns?
- Should concurrency semantics be part of the workflow specification or implementation detail?
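To ground the questions about superstep boundaries and circular dependencies, here is a minimal sketch of a fully synchronous superstep loop with a step limit. It is an assumption-laden toy, not either implementation: `run_supersteps()`, `SuperstepLimitExceeded`, `forward_to()` and `sink()` are hypothetical names, and a step limit is only one of several ways a runtime could handle cycles that never terminate.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

# An executor receives its inbox for one superstep and returns
# (target, payload) pairs to be delivered in the next superstep.
Executor = Callable[[list[str]], Awaitable[list[tuple[str, str]]]]


class SuperstepLimitExceeded(RuntimeError):
    """Raised when messages are still pending after the allowed supersteps."""


async def run_supersteps(
    executors: dict[str, Executor],
    initial: dict[str, list[str]],
    max_supersteps: int = 10,
) -> int:
    pending = {name: list(msgs) for name, msgs in initial.items() if msgs}
    steps = 0
    while pending:
        if steps >= max_supersteps:
            raise SuperstepLimitExceeded(f"still pending after {steps} supersteps")
        targets = sorted(pending)  # fixed dispatch order for reproducibility
        results = await asyncio.gather(*(executors[t](pending[t]) for t in targets))
        # Boundary: every executor has finished, and only now do the
        # messages they produced become visible to the next superstep.
        next_pending: dict[str, list[str]] = defaultdict(list)
        for outgoing in results:
            for target, payload in outgoing:
                next_pending[target].append(payload)
        pending = dict(next_pending)
        steps += 1
    return steps


def forward_to(target: str) -> Executor:
    async def run(inbox: list[str]) -> list[tuple[str, str]]:
        return [(target, payload) for payload in inbox]
    return run


async def sink(inbox: list[str]) -> list[tuple[str, str]]:
    return []  # consumes messages and sends nothing further


if __name__ == "__main__":
    linear = {"a": forward_to("b"), "b": sink}
    print(asyncio.run(run_supersteps(linear, {"a": ["start"]})))
```

With the linear graph the loop finishes once no messages remain. With a cycle such as `{"ping": forward_to("pong"), "pong": forward_to("ping")}` it raises `SuperstepLimitExceeded` instead of running forever. A streaming boundary would instead deliver each executor's output as soon as it is produced, which changes what state is visible at a checkpoint.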
Related Code
Python:
- python/packages/core/agent_framework/_workflows/_runner.py - superstep execution
- python/packages/core/agent_framework/_workflows/_edge_runner.py - fan-out/fan-in runners
- python/packages/core/agent_framework/_workflows/_shared_state.py - locking semantics
.NET:
- dotnet/src/Microsoft.Agents.AI.Workflows/InProcessExecution.cs - execution modes
- dotnet/src/Microsoft.Agents.AI.Workflows/Execution/FanOutEdgeRunner.cs - concurrent instantiation
- dotnet/src/Microsoft.Agents.AI.Workflows/Execution/ - execution infrastructure