Improve code coverage of mcp-agent #215

Open · wants to merge 93 commits into base: main

Conversation

@StreetLamb (Collaborator) commented May 17, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of type serialization, model reconstruction, and tool call logic across multiple LLM integrations.
    • Enhanced error handling and robustness in tool execution and message processing for Azure, Bedrock, Google, and OpenAI LLM workflows.
    • Fixed variable naming and data structure consistency for signal handling and workflow orchestration.
  • New Features

    • Added extensive new and refactored unit and integration tests for agents, workflows, LLM providers, orchestrator, router, swarm, and connection management.
    • Introduced new test fixtures and documentation to support comprehensive testing and easier maintenance.
  • Chores

    • Updated and added development dependencies for coverage reporting and asynchronous concurrency.
    • Removed obsolete or redundant test fixtures and documentation files.
  • Style

    • Refined prompt templates and documentation for orchestrator and intent classifier workflows.
    • Corrected naming and formatting issues in orchestrator and signal handler modules.
  • Tests

    • Significantly expanded test coverage for all major components, including LLM integrations, agent logic, workflow orchestration, and routing strategies.
    • Improved test isolation, mocking, and fixture management for reliable and maintainable test suites.

StreetLamb and others added 30 commits May 14, 2025 00:26
- Deleted the `test_event_progress.py` file as it is no longer needed.
- Added a new README.md file in the LLM tests directory to document test coverage and usage.
- Created a `conftest.py` file for shared fixtures used across LLM provider tests.
- Implemented comprehensive tests for the `OpenAIAugmentedLLM` class, covering various functionalities including text generation, structured output, tool usage, error handling, and request parameter merging.
* add support for Azure Identity instead of keys

* make credential_scopes configurable, but with a default to `https://cognitiveservices.azure.com/.default`

---------

Co-authored-by: Brian Caswell <[email protected]>
* Add streamable HTTP support to mcp-agent
* Session ID support for session resumption
* Example to showcase streamable HTTP server support
- Created test package for orchestrator workflow module.
- Implemented unit tests for orchestrator methods, including initialization and execution of plans.
- Added integration tests for full and iterative workflow executions.
- Developed tests for orchestrator data models and prompt templates.
- Ensured comprehensive coverage of task, step, and plan result formatting.
@StreetLamb marked this pull request as ready for review May 25, 2025 04:36
@coderabbitai (bot) left a comment

Actionable comments posted: 27

🔭 Outside diff range comments (2)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)

1029-1051: 🛠️ Refactor suggestion

Limited content-type coverage in mcp_content_to_openai_content_part
The helper downgrades ImageContent and binary EmbeddedResource blocks to plain text ("<mime>:<data>"). The OpenAI SDK already offers ChatCompletionContentPartImageParam; losing the structured form reduces model performance on vision-capable models.

Recommend adding proper image part construction:

@@
     elif isinstance(content, ImageContent):
-        # Best effort to convert an image to text
-        return ChatCompletionContentPartTextParam(
-            type="text", text=f"{content.mimeType}:{content.data}"
-        )
+        return ChatCompletionContentPartImageParam(
+            type="image_url",
+            image_url={"url": f"data:{content.mimeType};base64,{content.data}"},
+        )

…and import ChatCompletionContentPartImageParam.

src/mcp_agent/utils/pydantic_type_serializer.py (1)

648-662: ⚠️ Potential issue

Custom default_factory functions are silently discarded

Only the three built-ins are restored; any user-defined factory is lost,
leading to wrong defaults or validation errors.

-                else:
-                    default_factory = None
+                else:
+                    # Attempt to re-import the factory by name from the original
+                    # module, otherwise keep the string for later resolution.
+                    try:
+                        mod = importlib.import_module(field_info["default_factory_module"])
+                        default_factory = getattr(mod, default_factory)
+                    except Exception:
+                        # Fallback: keep the stored name so consumers can tell a
+                        # factory existed even though it couldn't be re-imported.
+                        pass

You’ll need to store default_factory_module during serialization to
support this restoration.
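For the serialization side, a minimal sketch of what that could look like; the field_info dict shape and the built-in whitelist are assumptions about this serializer, not its confirmed internals:

# Hypothetical sketch -- `field` and `field_info` are assumed serializer locals.
factory = field.default_factory
if factory is not None:
    field_info["default_factory"] = getattr(factory, "__name__", repr(factory))
    if factory not in (list, dict, set):
        # Custom factory: also record where it lives so it can be re-imported.
        field_info["default_factory_module"] = getattr(factory, "__module__", None)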

♻️ Duplicate comments (2)
tests/workflows/orchestrator/test_orchestrator.py (2)

357-365: Same SIM117 issue—combine the with statements

See previous comment for justification.

-with patch.object(Orchestrator, "_get_next_step", mock_get_next_step):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_next_step", mock_get_next_step), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):
🧰 Tools
🪛 Ruff (0.11.9)

357-358: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


430-437: Combine the patch contexts here as well

-with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):
🧰 Tools
🪛 Ruff (0.11.9)

430-431: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🧹 Nitpick comments (45)
src/mcp_agent/workflows/llm/augmented_llm_google.py (1)

442-451: Improve code readability with ternary operator.

The role normalization logic is correct and aligns with MCP compatibility requirements. However, the code can be simplified using a ternary operator as suggested by static analysis.

Apply this diff to improve readability:

-        if result.role == "model":
-            role = "assistant"
-        else:
-            role = result.role
+        role = "assistant" if result.role == "model" else result.role
         return MCPMessageResult(
             role=role,
             content=contents[0],
             model="",
             stopReason=None,
         )
🧰 Tools
🪛 Ruff (0.11.9)

442-445: Use ternary operator role = "assistant" if result.role == "model" else result.role instead of if-else-block

Replace if-else-block with role = "assistant" if result.role == "model" else result.role

(SIM108)

tests/workflows/swarm/test_swarm_anthropic.py (1)

135-144: Extract repeated message string to a constant.

The follow-up message "Please resolve my original request. If it has already been resolved then end turn" is repeated in multiple test assertions (lines 137 and 143). This should be extracted to a module-level constant for better maintainability.

+# At the top of the file, after imports
+FOLLOW_UP_MESSAGE = "Please resolve my original request. If it has already been resolved then end turn"

 # In test assertions, replace the hardcoded strings:
-            assert (
-                second_call_kwargs["message"]
-                == "Please resolve my original request. If it has already been resolved then end turn"
-            )
+            assert second_call_kwargs["message"] == FOLLOW_UP_MESSAGE

-            assert (
-                third_call_kwargs["message"]
-                == "Please resolve my original request. If it has already been resolved then end turn"
-            )
+            assert third_call_kwargs["message"] == FOLLOW_UP_MESSAGE
tests/workflows/llm/test_augmented_llm_ollama.py (1)

94-94: Remove or uncomment the commented code.

There's a commented line that should either be removed if it's not needed or uncommented with proper justification.

         mock_openai_config_for_custom.api_key = "test_api_key"
-        # mock_openai_config_for_custom.reasoning_effort = "medium"

If this line is needed, uncomment it. If not, remove it entirely to keep the code clean.

tests/mcp/test_mcp_connection_manager.py (1)

126-149: Address the commented out test with TODO.

There's a commented out test for initialization failure with a TODO comment from "jerron". This should either be fixed and uncommented, or removed if it's no longer needed.

The test appears to be checking for proper error handling when server initialization fails. If you need help implementing this test or determining whether it's still needed, I can assist with:

  1. Analyzing why the test might be failing
  2. Providing an alternative implementation approach
  3. Helping determine if this test case is still relevant

Would you like me to help resolve this TODO item?
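For reference, a minimal sketch of what the re-enabled test might look like; connection_manager, _create_client, and the error type are hypothetical names, not the project's confirmed API:

import pytest
from unittest.mock import AsyncMock, patch


@pytest.mark.asyncio
async def test_server_initialization_failure(connection_manager):
    # Hypothetical sketch: force the client handshake to fail and assert the
    # manager surfaces the error instead of swallowing it.
    failing_client = AsyncMock()
    failing_client.initialize.side_effect = RuntimeError("handshake failed")

    with patch.object(connection_manager, "_create_client", return_value=failing_client):
        with pytest.raises(RuntimeError, match="handshake failed"):
            await connection_manager.get_server("broken_server")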

tests/executor/temporal/test_workflow_registry.py (2)

18-26: Good test coverage but consider adding edge case validation.

The test correctly verifies registration and retrieval functionality. However, consider adding assertions to verify the internal state consistency beyond just the workflow retrieval.

 async def test_register_and_get_workflow(registry):
     mock_workflow = MagicMock(name="test_workflow")
     run_id = "run-id"
     workflow_id = "workflow-id"
     await registry.register(mock_workflow, run_id, workflow_id)
     workflow = await registry.get_workflow(run_id)
     assert workflow == mock_workflow
     assert registry._workflow_ids[workflow_id] == [run_id]
+    # Verify the workflow is in local workflows
+    assert run_id in registry._local_workflows
+    assert registry._local_workflows[run_id] == mock_workflow

74-77: Test method name doesn't match the actual test behavior.

The test name suggests it's testing error status, but it's actually testing the case where a workflow doesn't exist. Consider renaming for clarity.

-async def test_get_workflow_status_error(registry, mock_executor):
+async def test_get_workflow_status_nonexistent_workflow(registry, mock_executor):
     # Should return error status if workflow_id is missing
     result = await registry.get_workflow_status("nonexistent")
     assert result is False
tests/workflows/router/test_router_base.py (1)

74-84: Good validation test but consider more specific error messages.

The test correctly validates that empty inputs raise a ValueError. Consider checking the specific error message to ensure it's meaningful for users.

 def test_initialization_with_empty_inputs(self, mock_context):
     """Tests initialization fails when no routing targets are provided."""
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match="At least one of server_names, agents, or functions must be provided"):
         # Initialize with empty inputs
         _ = TestRouter(
             server_names=[],
             agents=[],
             functions=[],
             context=mock_context,
         )
tests/utils/test_pydantic_type_serializer.py (2)

213-213: Consider using model_config property consistently.

The config access pattern using getattr with defaults is defensive, but ensure this pattern is consistent with the actual Pydantic v2 ConfigDict structure.

 # Test config is preserved
-assert getattr(ComplexModelReconstructed.model_config, "validate_assignment", True)
+assert ComplexModelReconstructed.model_config.get("validate_assignment", True)
-assert getattr(
-    ComplexModelReconstructed.model_config, "arbitrary_types_allowed", True
-)
+assert ComplexModelReconstructed.model_config.get("arbitrary_types_allowed", True)

Also applies to: 215-215


252-253: Address the commented-out tests to improve coverage.

The TODO comments indicate these tests need to be fixed. Consider prioritizing the resolution of these test issues to maintain comprehensive test coverage.

These tests appear to be related to Annotated fields and import cycles. Would you like me to help investigate the specific issues preventing these tests from passing and suggest potential fixes?

Also applies to: 296-297
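As a starting point, a hedged sketch of a round-trip test for the Annotated case; serialize_model_type/deserialize_model_type are placeholder names for whatever this serializer module actually exports:

from typing import Annotated

from pydantic import BaseModel, Field


class AnnotatedModel(BaseModel):
    # Annotated metadata (here a Field constraint) is the case the TODO flags.
    score: Annotated[float, Field(ge=0.0, le=1.0)] = 0.5


def test_annotated_field_round_trip():
    serialized = serialize_model_type(AnnotatedModel)  # placeholder helper name
    Rebuilt = deserialize_model_type(serialized)  # placeholder helper name
    assert Rebuilt(score=0.7).score == 0.7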

tests/workflows/parallel/test_fan_out.py (2)

39-61: Fixture mis‐labels an AugmentedLLM as an “agent”.

fan_out_with_llms passes mock_llm_with_name via the agents parameter:

return FanOut(
    agents=[mock_llm_with_name],   # <-- actually an AugmentedLLM
    context=mock_context,
)

If FanOut differentiates between real Agent instances (which require llm_factory) and raw AugmentedLLM objects, this fixture relies on an implementation detail that “anything with a .generate() method will work”.
Should the public constructor instead expose a dedicated keyword (e.g. llms=[...]) to make the intent explicit? This would avoid confusion for future maintainers and catch mis-use at type-checking time.


290-319: Assertion misses argument verification for the executor.

execute_many is the work-horse that runs all fan-out calls concurrently, but the tests only assert assert_called_once(). Verifying the callable list and its length would give stronger coverage:

mock_context.executor.execute_many.assert_called_once()
calls = mock_context.executor.execute_many.call_args[0][0]
assert len(calls) == 3                     # agent, llm, function

This protects against regressions where items silently drop out of the fan-out collection.

tests/workflows/parallel/test_parallel_llm.py (1)

276-297: MagicMock for fan_in lacks async spec – risk of false positives.

parallel_llm_with_agent.fan_in is set to a bare MagicMock, and later tests monkey-patch fan_in.generate / generate_str with AsyncMock. Because MagicMock does not enforce attribute existence, typos such as genrate would silently pass. Prefer AsyncMock with spec_set to catch such mistakes:

pll.fan_in = AsyncMock(spec_set=['generate', 'generate_str', 'generate_structured'])

This strengthens the tests without additional complexity.

tests/test_app.py (2)

224-231: Combine nested with blocks to reduce indentation noise.

Ruff (SIM117) correctly points out that two sequential with statements can be merged:

with patch("mcp_agent.app.initialize_context", AsyncMock(return_value=mock_context)), \
     patch("mcp_agent.app.cleanup_context", AsyncMock()) as mock_cleanup:
    ...

This small refactor improves readability and avoids an extra indentation level. The same pattern recurs at lines 250-257 and 265-272.

🧰 Tools
🪛 Ruff (0.11.9)

224-227: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)


158-178: Windows loop-policy test leaks a fake asyncio module into sys.modules.

Within the context manager:

import_patch = patch.dict(
    "sys.modules",
    {"asyncio": MagicMock(WindowsProactorEventLoopPolicy=mock_policy_class)},
)

Any import of asyncio inside the body receives this stub rather than the real module – fine for the test – but if MCPApp (or its transitive imports) accesses other asyncio attributes during initialization they will be missing, leading to brittle test failures.

A safer pattern is:

  1. Keep the real asyncio in sys.modules.
  2. Monkey-patch only the missing attribute:
# create=True lets the patch apply on non-Windows hosts, where asyncio
# lacks a WindowsProactorEventLoopPolicy attribute.
with patch.object(
    asyncio, "WindowsProactorEventLoopPolicy", mock_policy_class, create=True
), patch("sys.platform", "win32"), patch(
    "asyncio.set_event_loop_policy"
) as mock_set:
    MCPApp(name="test_app")

This preserves the full asyncio surface while still exercising the Windows branch.

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1)

243-243: Fix unused loop variable.

The loop variable intent_name is not used within the loop body. This should be renamed to indicate it's intentionally unused.

-            for intent_name, intent in classifier.intents.items():
+            for _intent_name, intent in classifier.intents.items():
🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1)

243-243: Fix unused loop variable.

The loop variable intent_name is not used within the loop body. Rename it to indicate it's intentionally unused.

-            for intent_name, intent in classifier.intents.items():
+            for _intent_name, intent in classifier.intents.items():
🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/swarm/conftest.py (1)

23-30: Explicitly set return_value on AsyncMocks to prevent accidental misuse

The fixtures mock_agent and mock_swarm_agent expose call_tool, initialize, and shutdown as AsyncMocks without a return_value.
If a test forgets to await one of these mocks (or inspects the returned value) it will receive another AsyncMock, leading to confusing RuntimeWarning: coroutine was never awaited messages or false-positive assertions.

 agent.call_tool = AsyncMock()
-# …
+# Give the mock a deterministic result that matches the production contract
+agent.call_tool.return_value = None

Apply the same pattern to the other async mocks (initialize, shutdown, and the equivalents on mock_swarm_agent) or, if a specific value is expected in a test, set it there.
This keeps the fixtures predictable and avoids latent test flakiness.

tests/workflows/orchestrator/test_orchestrator.py (2)

174-178: Rename the unused loop variable to underscore to silence Ruff B007

agent is not referenced inside the loop; Ruff correctly flags this as B007.
Renaming it to _ communicates the intent and keeps the static-analysis pipeline green.

-        for agent_name, agent in orchestrator.agents.items():
+        for agent_name, _ in orchestrator.agents.items():
🧰 Tools
🪛 Ruff (0.11.9)

174-174: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)


295-303: Collapse the nested with patch blocks into a single context manager

Ruff SIM117 suggests combining adjacent with statements.
Doing so reduces indentation and improves readability:

-with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):

Apply the same pattern to the similar blocks at 357-365 and 430-437.

🧰 Tools
🪛 Ruff (0.11.9)

295-296: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/router/test_router_llm_openai.py (1)

211-214: Prefer patch.object as a context manager when spying on an instance method

Directly overwriting router._route_with_llm works, but using patch.object keeps the original attribute intact after the test and avoids accidental leakage across tests:

with patch.object(router, "_route_with_llm", wraps=router._route_with_llm) as spy:
    result = await router.route("Test request")
    spy.assert_called_once_with("Test request")

This small change promotes isolation and ensures the router instance is restored to its original state.

tests/workflows/router/test_router_embedding_cohere.py (1)

25-27: Avoid mutating NumPy’s global RNG in tests

np.random.seed(seed) resets the global RNG state on every call, which can introduce cross-test flakiness or brittle ordering-dependencies (especially when the suite grows or runs in parallel).

A safer pattern is to create a local generator per sample so the remainder of the test-suite keeps its own random state untouched.

-            seed = sum(ord(c) for c in data[i])
-            np.random.seed(seed)
-            embeddings[i] = np.random.rand(embedding_dim).astype(np.float32)
+            # Derive a deterministic generator per input so global RNG state is
+            # left unchanged.
+            seed = sum(ord(c) for c in data[i])
+            rng = np.random.default_rng(seed)
+            embeddings[i] = rng.random(embedding_dim, dtype=np.float32)
tests/workflows/parallel/test_fan_in.py (1)

119-125: Patch stack objects with a single context-manager

Static analysis correctly points out that a nested with can be flattened for clarity:

-        with patch("contextlib.AsyncExitStack") as MockAsyncExitStack:
-            # Configure the mock stack
-            mock_stack = AsyncMock()
-            MockAsyncExitStack.return_value = mock_stack
-            mock_stack.__aenter__.return_value = mock_stack
-            mock_stack.enter_async_context.return_value = mock_agent
+        with patch("contextlib.AsyncExitStack") as MockAsyncExitStack:
+            mock_stack = AsyncMock()
+            MockAsyncExitStack.return_value = mock_stack
+            mock_stack.__aenter__.return_value = mock_stack
+            mock_stack.enter_async_context.return_value = mock_agent

Since only one object is patched, the suggestion is mostly stylistic, but flattening keeps the codebase consistent with the linter rule (SIM117) and reduces indentation noise.

tests/agents/test_agent.py (4)

219-223: Collapse nested with statements per SIM117

You can satisfy Ruff’s SIM117 rule and improve readability by combining these two context-managers:

-with patch("uuid.uuid4", return_value="test-uuid"):
-    # Mock the method to return directly
-    with patch.object(
-        Agent, "request_human_input", AsyncMock(return_value="Test user input")
-    ):
-        result = await agent_with_human_input.request_human_input(request)
+with patch("uuid.uuid4", return_value="test-uuid"), patch.object(
+    Agent, "request_human_input", AsyncMock(return_value="Test user input")
+):
+    result = await agent_with_human_input.request_human_input(request)
🧰 Tools
🪛 Ruff (0.11.9)

219-223: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


277-281: Duplicate nested-with pattern — flatten for consistency

The same SIM117 issue appears here; a one-liner context-manager improves legibility:

-with patch(...):  # outer context manager, elided in this diff
-    with patch.object(
-        Agent, "request_human_input", side_effect=Exception("Callback error")
-    ):
+with patch(...), patch.object(
+    Agent, "request_human_input", side_effect=Exception("Callback error")
+):

(The outer with can stay as-is or be merged depending on preference.)

🧰 Tools
🪛 Ruff (0.11.9)

277-281: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


263-270: Dead helper code is never executed

mock_implementation is defined but immediately superseded by the patch.object(...) call that injects a different side-effect. Unless you intend to reuse the richer assertions inside the helper, the function is redundant and can be removed to keep the test focused.

-        async def mock_implementation(self, req):
-            # Simulate the error handling logic from the original method
-            error_message = "Callback error"
-            self.executor.signal.assert_called_once()
-            signal_call = self.executor.signal.call_args[1]
-            assert "payload" in signal_call
-            assert error_message in signal_call["payload"]
-            raise Exception(error_message)

456-462: Small accuracy nit – misleading inline comment

tool_name = test_function.__name__ resolves to the actual function name (function).
The inline comment # Should be "function" not "test_function" is correct, but it can be removed (or re-phrased) to avoid confusion for future maintainers — the code already speaks for itself.

tests/workflows/router/test_router_llm_anthropic.py (1)

219-225: Possible type mismatch for confidence

StructuredResponseCategory.confidence is often typed as float (e.g. 0.83) in the production code.
Asserting against the string "high" (lines 222-225) may break if the implementation was changed to numeric.

Consider asserting on a float threshold instead:

-assert result[0].confidence == "high"
+assert 0.8 <= result[0].confidence <= 1.0
tests/workflows/llm/test_augmented_llm_google.py (1)

469-479: Assert the model actually used in the request payload

You verify that select_model was called, but not that the chosen model is present in the request sent to the executor.
Adding an assertion strengthens the test:

sent_model = mock_llm.executor.execute.call_args[0][1].payload["model"]
assert sent_model == "gemini-1.5-flash"
tests/workflows/router/conftest.py (2)

95-100: Avoid mutating NumPy’s global RNG in a shared test process

Calling np.random.seed(seed) inside the loop resets the global generator every iteration and can leak state into unrelated tests executed in the same Python process.

Instead, create a local generator:

-            seed = sum(ord(c) for c in data[i])
-            np.random.seed(seed)
-            embeddings[i] = np.random.rand(embedding_dim).astype(np.float32)
+            rng = np.random.default_rng(sum(ord(c) for c in data[i]))
+            embeddings[i] = rng.random(embedding_dim, dtype=np.float32)

This removes global side-effects and still gives you deterministic but distinct vectors.


114-118: Name shadowing between fixture and inner function

The fixture is called test_function and it returns a nested function with the same identifier.
Although legal, this can be confusing in stack-traces and when debugging.
Consider renaming the inner callable (e.g. echo_fn) or the fixture (echo_function_fixture) for clarity.
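A sketch of the first option, keeping the fixture name while renaming the inner callable (the echo behavior is assumed for illustration):

import pytest


@pytest.fixture
def test_function():
    def echo_fn(text: str) -> str:
        # A distinct inner name keeps stack traces and debugging unambiguous.
        return f"Echo: {text}"

    return echo_fn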

tests/executor/test_workflow_signal.py (4)

197-205: Raise expectation should reference asyncio.TimeoutError for precision

AsyncioSignalHandler.wait_for_signal() propagates asyncio.TimeoutError.
Catching the base-class TimeoutError works today (since Python 3.11 they are the same object), but being explicit protects you if Python ever diverges again.

-with pytest.raises(TimeoutError):
+with pytest.raises(asyncio.TimeoutError):

285-305: Assertion on run_in_executor arguments is brittle

run_in_executor is called with positional args (None, input, "Enter value: ").
assert "Enter value: " in mock_loop.run_in_executor.call_args[0] passes only because the substring exists in one of the tuple elements.
If the executor signature changes (e.g. kwargs), this test will give a false negative.

A safer check:

executor_args = mock_loop.run_in_executor.call_args.args
assert executor_args[1] is input
assert executor_args[2] == "Enter value: "

This pins the contract clearly.


428-456: Use contextlib.suppress to silence expected timeouts cleanly

Instead of a try/except …: pass block you can rely on contextlib.suppress for fewer lines and clearer intent.

-try:
-    await store.wait_for("test_signal", timeout_seconds=0.1)
-except asyncio.TimeoutError:
-    pass
+from contextlib import suppress
+with suppress(asyncio.TimeoutError):
+    await store.wait_for("test_signal", timeout_seconds=0.1)

This also resolves the Ruff SIM105 hint.

🧰 Tools
🪛 Ruff (0.11.9)

452-455: Use contextlib.suppress(asyncio.TimeoutError) instead of try-except-pass

Replace with contextlib.suppress(asyncio.TimeoutError)

(SIM105)


183-191: asyncio.sleep(0.1) inflates suite time

The sleeps are only meant to yield control to the event-loop.
await asyncio.sleep(0) yields control just the same without slowing the suite.

Refactor all occurrences where the delay is not functionally needed.

Also applies to: 213-221, 545-553

tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (2)

185-188: Float equality assertions should use pytest.approx

Direct equality on floating-point numbers can lead to flaky tests across platforms.
Replace with an approximate comparison:

-assert results[0].p_score == 0.9
+assert results[0].p_score == pytest.approx(0.9)

Apply similarly to the other locations (0.7, 0.95, 0.85, 0.65, …).

Also applies to: 253-259, 371-379


70-80: Repeated patch blocks could be collapsed with a fixture or parametrisation

The with patch(... OpenAIAugmentedLLM ...) boilerplate is repeated many times.
Creating a fixture:

@pytest.fixture(autouse=True)
def patched_llm(monkeypatch):
    monkeypatch.setattr(
        "mcp_agent.workflows.intent_classifier.intent_classifier_llm_openai.OpenAIAugmentedLLM",
        MockOpenAIAugmentedLLM,
    )

would make tests shorter and easier to maintain.

Also applies to: 118-130, 270-279

tests/workflows/orchestrator/conftest.py (1)

64-70: Factory omits context, which many LLM methods expect

factory(agent) instantiates MockAugmentedLLM(agent=agent) but does not forward the context created in mock_context().
If any test relies on llm.context being present (e.g. model-selection logic), a NoneType access will surface.

-    def factory(agent):
-        return MockAugmentedLLM(agent=agent)
+    def factory(agent, context=None):
+        # default to the shared mock_context fixture when not provided
+        return MockAugmentedLLM(agent=agent, context=context)

or pass mock_context explicitly from the caller.
This keeps the fixture self-contained and prevents hidden AttributeErrors.

tests/workflows/intent_classifier/test_intent_classifier_llm_anthropic.py (1)

319-324: Patching initialize with AsyncMock may mask real idempotency logic

Wrapping the real coroutine with AsyncMock(wraps=real_initialize) is handy for call-count checks, but it replaces the bound method on the instance.
If initialize contains self.initialized guards, those still run, yet future refactors that access super().initialize or rely on the original descriptor semantics will be obscured.

Consider:

with patch.object(classifier, "initialize", wraps=classifier.initialize) as init_mock:
    ...

so you avoid permanently altering the instance attribute and keep the patch within a controlled context.

src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)

964-972: Single-element list creation is correct but leaves existing helper unused
You now wrap the part in [mcp_content_to_openai_content_part(...)], which is fine. However the old multi-content helper (mcp_content_to_openai_content) still exists (or did in sibling modules). Consider removing dead code to avoid confusion.

tests/workflows/llm/test_augmented_llm_bedrock.py (3)

130-148: Assertion accesses nested payload incorrectly & assumes request shape
first_call_args = mock_llm.executor.execute.call_args[0][1] retrieves the second positional argument, assuming the signature is execute(task, payload). If execute is later called with keyword arguments (e.g., payload=…), this indexing will break silently.

Prefer:

_, request_obj = mock_llm.executor.execute.call_args.args
payload = request_obj.payload

or access via call_kwargs. This keeps the test stable against minor signature changes.


278-311: custom_side_effect does not await tool-calls sequentially – race condition in test
Because all branches of custom_side_effect share mutable call_count, rapid await/AsyncMock scheduling could interleave, producing nondeterministic counts in rare CI conditions.

Guard with a simple lock or, easier, use side_effect=[resp1, resp2]:

mock_llm.executor.execute = AsyncMock(side_effect=[
    self.create_tool_use_response(...),
    self.create_text_response(...)
])

This is deterministic and easier to read.


501-525: typed_dict_extras tests import via bedrock module – fragile coupling
You import typed_dict_extras from mcp_agent.workflows.llm.augmented_llm_bedrock, but the helper actually lives in mcp_agent.utils.common and is only re-exported indirectly.

Importing from its canonical location avoids accidental breakage if the bedrock module stops re-exporting:

from mcp_agent.utils.common import typed_dict_extras
src/mcp_agent/workflows/llm/augmented_llm_azure.py (1)

377-381: Duplicate broad exception handling – consolidate to avoid masking errors

Two separate except Exception as e: blocks return very similar ToolMessage error payloads. Consolidating them improves readability and prevents subtle flow-control differences.

Consider:

-        except json.JSONDecodeError as e:
-            ...
-        except Exception as e:
-            ...
+        # json.JSONDecodeError subclasses ValueError, so a single Exception
+        # handler covers both paths.
+        except Exception as e:
+            return ToolMessage(
+                tool_call_id=tool_call_id,
+                content=f"Error executing tool '{tool_name}': {str(e)}",
+            )

Also applies to: 400-404

tests/workflows/orchestrator/test_orchestrator_integration.py (1)

37-41: Rename unused loop variable to underscore to satisfy linters

agent is never referenced inside the loop; Ruff flags this as B007.
Renaming improves clarity and keeps CI green.

-for agent_name, agent in orchestrator.agents.items():
+for agent_name, _agent in orchestrator.agents.items():

(Apply the same change in the second loop around line 182.)

Also applies to: 182-186

🧰 Tools
🪛 Ruff (0.11.9)

37-37: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)

tests/workflows/llm/test_augmented_llm_openai.py (1)

149-156: Accessing request payload via positional indices is brittle

first_call_args = mock_llm.executor.execute.call_args_list[0][0] relies on a fixed positional signature of executor.execute(...).
If the executor’s API changes, the test will silently break.
Prefer using call_kwargs = mock_llm.executor.execute.call_args.kwargs (or args, kwargs = ...) and assert via named attributes. This keeps the test stable against parameter re-ordering.
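A sketch of the named-access pattern (the payload keyword and the expected model value are illustrative assumptions):

# Works whether executor.execute() was invoked positionally or by keyword.
call = mock_llm.executor.execute.call_args
request_obj = call.kwargs.get("payload") or call.args[1]
assert request_obj.payload["model"] == "gpt-4o"  # expected model is illustrative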

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c85782 and 4284d6d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (68)
  • pyproject.toml (2 hunks)
  • src/mcp_agent/executor/workflow_signal.py (2 hunks)
  • src/mcp_agent/logging/transport.py (1 hunks)
  • src/mcp_agent/mcp/mcp_aggregator.py (2 hunks)
  • src/mcp_agent/utils/pydantic_type_serializer.py (8 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm.py (1 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_azure.py (6 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (2 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_google.py (1 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_openai.py (4 hunks)
  • src/mcp_agent/workflows/orchestrator/orchestrator.py (2 hunks)
  • src/mcp_agent/workflows/router/router_embedding.py (2 hunks)
  • src/mcp_agent/workflows/swarm/swarm.py (2 hunks)
  • tests/agents/conftest.py (1 hunks)
  • tests/agents/test_agent.py (1 hunks)
  • tests/executor/temporal/test_signal_handler.py (1 hunks)
  • tests/executor/temporal/test_temporal_executor.py (1 hunks)
  • tests/executor/temporal/test_workflow_registry.py (1 hunks)
  • tests/executor/test_workflow.py (1 hunks)
  • tests/executor/test_workflow_signal.py (1 hunks)
  • tests/fixture/README.md (0 hunks)
  • tests/fixture/expected_output.txt (0 hunks)
  • tests/fixture/mcp_basic_agent_20250131_205604.jsonl (0 hunks)
  • tests/mcp/test_mcp_aggregator.py (1 hunks)
  • tests/mcp/test_mcp_connection_manager.py (1 hunks)
  • tests/test_app.py (1 hunks)
  • tests/test_event_progress.py (0 hunks)
  • tests/utils/test_pydantic_type_serializer.py (6 hunks)
  • tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (1 hunks)
  • tests/workflows/intent_classifier/README.md (1 hunks)
  • tests/workflows/intent_classifier/conftest.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_llm_anthropic.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (1 hunks)
  • tests/workflows/llm/README.md (1 hunks)
  • tests/workflows/llm/conftest.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_anthropic.py (8 hunks)
  • tests/workflows/llm/test_augmented_llm_azure.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_bedrock.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_google.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_ollama.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_openai.py (3 hunks)
  • tests/workflows/orchestrator/__init__.py (1 hunks)
  • tests/workflows/orchestrator/conftest.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator_integration.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator_models.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator_prompts.py (1 hunks)
  • tests/workflows/parallel/conftest.py (1 hunks)
  • tests/workflows/parallel/test_fan_in.py (1 hunks)
  • tests/workflows/parallel/test_fan_out.py (1 hunks)
  • tests/workflows/parallel/test_parallel_llm.py (1 hunks)
  • tests/workflows/router/__init__.py (1 hunks)
  • tests/workflows/router/conftest.py (1 hunks)
  • tests/workflows/router/test_router_base.py (1 hunks)
  • tests/workflows/router/test_router_embedding.py (1 hunks)
  • tests/workflows/router/test_router_embedding_cohere.py (1 hunks)
  • tests/workflows/router/test_router_embedding_openai.py (1 hunks)
  • tests/workflows/router/test_router_llm.py (1 hunks)
  • tests/workflows/router/test_router_llm_anthropic.py (1 hunks)
  • tests/workflows/router/test_router_llm_openai.py (1 hunks)
  • tests/workflows/swarm/__init__.py (1 hunks)
  • tests/workflows/swarm/conftest.py (1 hunks)
  • tests/workflows/swarm/test_swarm.py (1 hunks)
  • tests/workflows/swarm/test_swarm_anthropic.py (1 hunks)
  • tests/workflows/swarm/test_swarm_openai.py (1 hunks)
💤 Files with no reviewable changes (4)
  • tests/fixture/expected_output.txt
  • tests/fixture/README.md
  • tests/test_event_progress.py
  • tests/fixture/mcp_basic_agent_20250131_205604.jsonl
🧰 Additional context used
🧬 Code Graph Analysis (17)
src/mcp_agent/executor/workflow_signal.py (1)
src/mcp_agent/executor/executor.py (3)
  • signal (151-176)
  • signal (382-392)
  • uuid (215-222)
src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1)
src/mcp_agent/utils/common.py (1)
  • to_string (32-39)
tests/executor/temporal/test_temporal_executor.py (3)
src/mcp_agent/executor/temporal/__init__.py (5)
  • ensure_client (255-265)
  • wrap_as_activity (85-107)
  • _execute_task_as_async (109-143)
  • start_workflow (267-350)
  • terminate_workflow (367-386)
src/mcp_agent/app.py (1)
  • config (128-129)
src/mcp_agent/server/app_server.py (1)
  • app (69-71)
tests/workflows/parallel/conftest.py (4)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
src/mcp_agent/agents/agent.py (1)
  • Agent (48-750)
src/mcp_agent/workflows/llm/augmented_llm.py (1)
  • AugmentedLLM (208-661)
tests/workflows/router/conftest.py (3)
  • mock_context (17-54)
  • mock_agent (58-69)
  • mock_llm (73-81)
tests/executor/test_workflow_signal.py (1)
src/mcp_agent/executor/workflow_signal.py (8)
  • Signal (13-48)
  • SignalRegistration (51-59)
  • PendingSignal (88-95)
  • BaseSignalHandler (98-157)
  • AsyncioSignalHandler (219-297)
  • ConsoleSignalHandler (160-216)
  • LocalSignalStore (301-336)
  • wait_for (319-336)
tests/workflows/router/test_router_embedding.py (4)
src/mcp_agent/agents/agent.py (1)
  • Agent (48-750)
src/mcp_agent/workflows/router/router_embedding.py (4)
  • EmbeddingRouter (29-240)
  • EmbeddingRouterCategory (22-26)
  • _compute_embedding (233-240)
  • _route_with_embedding (189-231)
tests/workflows/router/conftest.py (5)
  • mock_context (17-54)
  • mock_embedding_model (85-105)
  • mock_agent (58-69)
  • test_function (109-118)
  • test_function (114-116)
src/mcp_agent/workflows/router/router_base.py (1)
  • RouterResult (19-31)
tests/executor/test_workflow.py (1)
src/mcp_agent/executor/workflow.py (8)
  • WorkflowState (33-52)
  • WorkflowResult (55-59)
  • Workflow (62-505)
  • record_error (47-52)
  • id (111-115)
  • run_id (118-123)
  • update_state (447-454)
  • cancel (331-352)
tests/workflows/router/conftest.py (10)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
src/mcp_agent/workflows/embedding/embedding_base.py (1)
  • EmbeddingModel (14-32)
src/mcp_agent/workflows/router/router_base.py (3)
  • RouterCategory (34-49)
  • ServerRouterCategory (52-55)
  • AgentRouterCategory (58-61)
tests/test_app.py (1)
  • mock_context (16-29)
src/mcp_agent/mcp/mcp_server_registry.py (1)
  • get_server_config (362-378)
src/mcp_agent/agents/agent.py (1)
  • Agent (48-750)
tests/workflows/router/test_router_llm_openai.py (3)
  • generate (29-31)
  • generate_str (33-35)
  • generate_structured (37-39)
tests/workflows/router/test_router_llm_anthropic.py (3)
  • generate (29-31)
  • generate_str (33-35)
  • generate_structured (37-39)
tests/workflows/router/test_router_embedding_openai.py (1)
  • embed (19-28)
tests/workflows/router/test_router_embedding_cohere.py (1)
  • embed (19-28)
tests/workflows/llm/test_augmented_llm_ollama.py (3)
src/mcp_agent/config.py (1)
  • OpenAISettings (145-160)
src/mcp_agent/workflows/llm/augmented_llm_ollama.py (1)
  • OllamaAugmentedLLM (17-79)
tests/workflows/llm/conftest.py (1)
  • mock_aggregator (17-29)
tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (7)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
src/mcp_agent/workflows/intent_classifier/intent_classifier_base.py (1)
  • IntentClassificationResult (27-40)
src/mcp_agent/workflows/intent_classifier/intent_classifier_embedding.py (1)
  • EmbeddingIntent (24-30)
src/mcp_agent/workflows/intent_classifier/intent_classifier_embedding_cohere.py (1)
  • CohereEmbeddingIntentClassifier (13-45)
tests/workflows/intent_classifier/conftest.py (4)
  • embed (55-65)
  • embedding_dim (68-69)
  • test_intents (28-46)
  • mock_context (11-24)
tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (2)
  • embed (31-39)
  • embedding_dim (42-43)
tests/workflows/parallel/conftest.py (1)
  • mock_context (10-16)
tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (2)
src/mcp_agent/workflows/evaluator_optimizer/evaluator_optimizer.py (5)
  • EvaluatorOptimizerLLM (48-469)
  • EvaluationResult (33-45)
  • QualityRating (24-30)
  • _build_eval_prompt (425-446)
  • _build_refinement_prompt (448-469)
src/mcp_agent/workflows/llm/augmented_llm.py (1)
  • AugmentedLLM (208-661)
tests/workflows/swarm/test_swarm.py (4)
src/mcp_agent/agents/agent.py (1)
  • CallToolRequest (789-798)
src/mcp_agent/workflows/swarm/swarm.py (8)
  • AgentFunctionResult (142-156)
  • SwarmAgent (74-139)
  • DoneAgent (309-322)
  • create_agent_resource (50-58)
  • create_agent_function_result_resource (61-71)
  • get_tool (213-220)
  • set_agent (274-297)
  • should_continue (299-306)
src/mcp_agent/workflows/swarm/swarm_openai.py (1)
  • OpenAISwarm (9-41)
tests/workflows/swarm/conftest.py (8)
  • mock_context (11-17)
  • mock_swarm_agent (34-46)
  • test_function_result (56-58)
  • test_function_agent_result (62-64)
  • test_function_agent_function_result (68-70)
  • mock_tool_response (80-82)
  • mock_agent (21-30)
  • done_agent (50-52)
tests/workflows/llm/test_augmented_llm_bedrock.py (5)
src/mcp_agent/config.py (1)
  • BedrockSettings (121-132)
src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (4)
  • BedrockAugmentedLLM (48-346)
  • BedrockMCPTypeConverter (434-496)
  • mcp_content_to_bedrock_content (499-531)
  • bedrock_content_to_mcp_content (534-569)
src/mcp_agent/workflows/llm/augmented_llm.py (5)
  • RequestParams (114-153)
  • select_model (302-343)
  • _log_chat_progress (511-521)
  • _log_chat_finished (523-526)
  • get_request_params (345-366)
src/mcp_agent/utils/common.py (1)
  • typed_dict_extras (27-29)
tests/workflows/llm/conftest.py (1)
  • mock_aggregator (17-29)
tests/mcp/test_mcp_aggregator.py (1)
src/mcp_agent/mcp/mcp_aggregator.py (12)
  • MCPAggregator (65-1102)
  • list_servers (536-548)
  • _parse_capability_name (946-991)
  • NamespacedTool (45-52)
  • NamespacedPrompt (55-62)
  • load_server (294-367)
  • load_servers (369-424)
  • MCPCompoundServer (1105-1171)
  • _list_tools (1121-1124)
  • _list_prompts (1141-1144)
  • _call_tool (1126-1139)
  • _get_prompt (1146-1162)
tests/workflows/orchestrator/test_orchestrator_integration.py (4)
src/mcp_agent/workflows/llm/augmented_llm.py (1)
  • RequestParams (114-153)
src/mcp_agent/workflows/orchestrator/orchestrator.py (1)
  • Orchestrator (46-586)
src/mcp_agent/workflows/orchestrator/orchestrator_models.py (5)
  • Plan (46-55)
  • Step (35-43)
  • NextStep (111-116)
  • PlanResult (86-108)
  • AgentTask (27-32)
tests/workflows/orchestrator/conftest.py (5)
  • mock_llm_factory (64-70)
  • mock_agents (74-87)
  • mock_context (42-60)
  • generate_str (34-35)
  • generate (31-32)
tests/workflows/llm/test_augmented_llm_openai.py (4)
src/mcp_agent/config.py (1)
  • OpenAISettings (145-160)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (4)
  • MCPOpenAITypeConverter (920-1026)
  • generate (146-359)
  • to_mcp_message_result (944-953)
  • from_mcp_message_param (956-977)
src/mcp_agent/workflows/llm/augmented_llm.py (3)
  • RequestParams (114-153)
  • select_model (302-343)
  • get_request_params (345-366)
tests/workflows/llm/conftest.py (2)
  • mock_context (8-13)
  • mock_aggregator (17-29)
tests/workflows/llm/test_augmented_llm_anthropic.py (4)
src/mcp_agent/config.py (1)
  • AnthropicSettings (111-118)
src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (8)
  • AnthropicMCPTypeConverter (754-863)
  • mcp_content_to_anthropic_content (866-936)
  • anthropic_content_to_mcp_content (939-982)
  • mcp_stop_reason_to_anthropic_stop_reason (985-997)
  • anthropic_stop_reason_to_mcp_stop_reason (1000-1012)
  • generate (131-332)
  • to_mcp_message_result (779-796)
  • from_mcp_message_param (799-808)
src/mcp_agent/workflows/llm/augmented_llm.py (2)
  • RequestParams (114-153)
  • get_request_params (345-366)
src/mcp_agent/utils/common.py (1)
  • typed_dict_extras (27-29)
🪛 Ruff (0.11.9)
src/mcp_agent/workflows/swarm/swarm.py

87-87: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


88-88: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

src/mcp_agent/workflows/llm/augmented_llm_google.py

442-445: Use ternary operator role = "assistant" if result.role == "model" else result.role instead of if-else-block

Replace if-else-block with role = "assistant" if result.role == "model" else result.role

(SIM108)

tests/executor/test_workflow_signal.py

452-455: Use contextlib.suppress(asyncio.TimeoutError) instead of try-except-pass

Replace with contextlib.suppress(asyncio.TimeoutError)

(SIM105)

tests/test_app.py

224-227: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)


250-251: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


265-266: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/agents/test_agent.py

219-223: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


277-281: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/orchestrator/test_orchestrator.py

174-174: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)


295-296: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


357-358: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


430-431: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/orchestrator/test_orchestrator_integration.py

37-37: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)


182-182: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)

🔇 Additional comments (62)
tests/workflows/router/__init__.py (1)

1-2: LGTM! Standard package marker file.

This is a proper addition of an __init__.py file to mark the directory as a Python package, enabling test discovery. The comments clearly explain the purpose.

pyproject.toml (1)

75-75: LGTM! Appropriate addition of trio for async testing.

Adding trio>=0.30.0 to dev dependencies is appropriate for supporting asynchronous concurrency testing in the new test suites.

src/mcp_agent/workflows/llm/augmented_llm.py (1)

256-261: LGTM! Clean conditional parameter passing.

Good improvement using dictionary unpacking to conditionally pass the instruction parameter only when it's not None. This prevents passing explicit None values and allows the Agent constructor to handle its own default value logic.
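For reference, the pattern looks roughly like this (names are illustrative):

# The "instruction" key only exists when a value was supplied, so the Agent
# constructor's own default applies otherwise.
agent = Agent(
    name=agent_name,
    **({"instruction": instruction} if instruction is not None else {}),
)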

tests/workflows/parallel/conftest.py (4)

9-16: LGTM! Clean and simple mock context fixture.

The mock_context fixture provides a minimal mock Context instance with the essential executor attribute. This is appropriately simplified compared to the router conftest which includes provider configurations not needed for parallel workflow tests.


19-28: LGTM! Proper async context manager mocking.

The mock_agent fixture correctly mocks the Agent class with async context manager support (__aenter__ and __aexit__). This is essential since Agent implements these methods for proper lifecycle management.


31-40: LGTM! Comprehensive LLM method mocking.

The mock_llm fixture properly mocks all the key async generation methods (generate, generate_str, generate_structured) of the AugmentedLLM class, providing good test isolation.


43-48: LGTM! Useful factory pattern for async testing.

The mock_llm_factory fixture provides an async factory function that returns the mocked LLM instance. This pattern is useful for testing components that need to create LLM instances dynamically.

tests/workflows/intent_classifier/conftest.py (2)

10-24: Well-structured mock context fixture.

The mock context fixture properly isolates API configurations for testing and follows pytest best practices.


27-46: Good test intent examples for comprehensive testing.

The test intents provide diverse examples covering greeting, farewell, and help scenarios, which should enable thorough testing of intent classification logic.

tests/workflows/orchestrator/__init__.py (1)

1-1: Proper package initialization for test discovery.

The docstring appropriately marks this directory as a test package for the orchestrator workflow module.

tests/workflows/swarm/__init__.py (1)

1-1: Appropriate test package marker.

The comment clearly identifies this as the test package for swarm workflow components.

src/mcp_agent/logging/transport.py (1)

276-276:

❓ Verification inconclusive

Review the event loop handling in init_queue() before auto-initialization.

While automatically calling init_queue() in the constructor improves usability, it makes the problematic event loop handling in init_queue() more likely to cause issues.

The init_queue() method (lines 278-288) contains this concerning pattern:

try:
    self._loop = asyncio.get_running_loop()
except RuntimeError:
    self._loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._loop)  # This sets the loop globally!

Setting a new event loop globally in a constructor can interfere with existing event loops in the application. Consider one of these safer approaches:

  1. Lazy initialization: Only call init_queue() when actually needed (e.g., in start() method)
  2. Remove global loop setting: Store the loop reference without setting it globally
  3. Constructor parameter: Allow passing the event loop as a constructor parameter

🏁 Script executed:

#!/bin/bash
# Search for other usages of AsyncEventBus constructor to assess impact
rg -A 5 -B 5 "AsyncEventBus\(" --type py

Length of output: 42


Verify global event loop handling in init_queue()

No other references to AsyncEventBus( were found in the codebase—please confirm whether this constructor is used elsewhere (e.g., plugins, downstream services) to avoid unintended side effects.

In src/mcp_agent/logging/transport.py (lines 276–288), init_queue() does:

try:
    self._loop = asyncio.get_running_loop()
except RuntimeError:
    self._loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._loop)  # resets the global loop!

Setting a new event loop globally in a constructor can silently break existing loops in your application. Consider one of these safer approaches:

  • Lazy initialization: defer init_queue() until you actually need to start processing (e.g., in a start() or run() method)
  • Remove global loop setting: keep a private loop reference without calling asyncio.set_event_loop()
  • Constructor parameter: accept an existing loop via __init__(..., loop: asyncio.AbstractEventLoop=None)

Please review all usages of AsyncEventBus in your projects and assess the impact of altering the global loop.
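As an illustration of the constructor-parameter option, a hedged sketch (the class shape is assumed, not taken from the module):

import asyncio
from typing import Optional


class AsyncEventBus:
    """Sketch only: accept a loop instead of mutating global asyncio state."""

    def __init__(self, loop: Optional[asyncio.AbstractEventLoop] = None):
        self._loop = loop
        self._queue: Optional[asyncio.Queue] = None

    def init_queue(self) -> None:
        if self._loop is None:
            # Bind only to a loop that is already running; raising here is
            # preferable to silently replacing the global event loop.
            self._loop = asyncio.get_running_loop()
        self._queue = asyncio.Queue()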

tests/agents/conftest.py (2)

7-17: Well-structured mock context fixture.

The fixture properly mocks the asynchronous methods and provides a realistic context structure for testing agent functionality.


20-27: Good test tool fixture design.

The mock tool provides a realistic schema structure that's suitable for testing tool-related functionality.

tests/workflows/llm/conftest.py (2)

7-13: Clean and focused mock context fixture.

The fixture provides essential mocked components (executor and model_selector) needed for LLM workflow testing.


16-29: Comprehensive mock aggregator fixture.

The fixture properly mocks both synchronous (tools list) and asynchronous (list_tools, call_tool) functionality needed for testing tool-enabled LLM workflows.

tests/workflows/llm/README.md (1)

1-95: Excellent test documentation.

The README provides comprehensive guidance for running, understanding, and extending the LLM provider tests. The structure is clear and includes practical examples for different scenarios.

src/mcp_agent/workflows/orchestrator/orchestrator.py (3)

102-107: LGTM - Typo fix and instruction improvement!

The correction from sythesizer to synthesizer fixes a typo, and the simplified instruction is more concise and clear.


109-113: Good addition of input validation!

Adding validation for the plan_type parameter is excellent defensive programming. This ensures only valid values are accepted and provides clear error messages.
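For context, the guard presumably looks something like the following sketch; the accepted values "full" and "iterative" are inferred from the integration tests, and the exact message may differ:

if plan_type not in ("full", "iterative"):
    raise ValueError(
        f"Invalid plan_type: {plan_type!r}. Expected 'full' or 'iterative'."
    )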


358-358: LGTM - Consistent typo fix!

Good catch fixing the variable name from sythesizer to synthesizer to maintain consistency with the constructor fix.

src/mcp_agent/workflows/router/router_embedding.py (3)

101-101: Critical fix - Properly await async method!

Good catch adding await before self._compute_embedding(). The _compute_embedding method is async and must be awaited.


103-103: Correct data extraction using model_dump()!

Using model_dump() before unpacking ensures proper data extraction from the Pydantic model into the dictionary format expected by EmbeddingRouterCategory.
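i.e., roughly this shape (a sketch; the embedding field name is an assumption):

# model_dump() yields a plain dict, so ** unpacking supplies proper keyword
# arguments rather than a single Pydantic model positional.
category = EmbeddingRouterCategory(
    **base_category.model_dump(),
    embedding=embedding,
)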


209-209: Critical fix - Properly await async method!

Good catch adding await before self._compute_embedding(). This was another missing await for the async embedding computation method.

tests/workflows/orchestrator/test_orchestrator_prompts.py (1)

1-131: Excellent comprehensive test suite for prompt templates!

This test file provides thorough coverage for all orchestrator prompt templates with several strengths:

  1. Complete coverage: Tests all major prompt templates used in the orchestrator workflow
  2. Clear structure: Each test follows a consistent pattern of format → assert
  3. Content validation: Verifies both placeholder substitution and expected content presence
  4. Consistency checks: The test_templates_consistency and test_template_order methods ensure uniform formatting and logical organization
  5. Clear naming: Test method names clearly indicate what's being tested

The test structure is well-organized and provides good confidence in template correctness.

tests/workflows/intent_classifier/README.md (1)

1-70: Excellent test documentation providing comprehensive guidance!

This README effectively documents the intent classifier testing strategy with several strong points:

  1. Clear purpose: Explains what's being tested and why it matters
  2. Mock strategy: Well-explained rationale for using mocks (speed, determinism, no external dependencies)
  3. Practical guidance: Specific commands for running tests at different granularities
  4. Development support: Clear instructions for adding new tests with naming conventions
  5. Test coverage guidelines: Comprehensive list of key test cases to ensure thorough coverage

The documentation will help developers understand the testing approach and maintain consistent test quality across the intent classifier modules.

src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1)

956-976: LGTM! Improved type handling for content conversion.

The conditional logic to handle both Pydantic model instances and plain dictionaries makes the function more robust and defensive. The implementation correctly extracts type and text attributes based on the block's type, which aligns with the broader pattern of content conversion improvements across LLM provider modules.

tests/workflows/orchestrator/test_orchestrator_models.py (1)

1-266: Excellent comprehensive test coverage for orchestrator models.

This test module provides thorough coverage of the orchestrator data models including:

  • Object creation and attribute validation for all model types
  • Relationship handling between steps, tasks, and results
  • Collection operations (add_task_result, add_step_result)
  • Formatting functions with proper status indicators
  • Edge cases like completion state handling

The test structure is clean with descriptive names and clear assertions. This significantly improves the code coverage as mentioned in the PR objectives.

src/mcp_agent/executor/workflow_signal.py (1)

228-228: LGTM! Improved naming consistency and handler tracking.

The renaming from unique_name to unique_signal_name enhances code clarity. The modification to store handlers as [unique_signal_name, wrapped] pairs instead of just the wrapped function enables better handler tracking and management, which supports more robust signal handling capabilities.

These changes maintain backward compatibility while improving internal structure and readability.

Also applies to: 232-232, 260-260, 267-267, 275-277
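
A toy sketch of why storing name/handler pairs helps: handlers can later be looked up and removed by their unique name. The registry shape here is an assumption for illustration, not the actual executor code:

from collections import defaultdict
from typing import Any, Callable, Dict, List

handlers: Dict[str, List[List[Any]]] = defaultdict(list)

def register(signal_name: str, unique_signal_name: str, fn: Callable) -> None:
    handlers[signal_name].append([unique_signal_name, fn])

def unregister(signal_name: str, unique_signal_name: str) -> None:
    # With bare callables, removal-by-name like this would be impossible.
    handlers[signal_name] = [
        pair for pair in handlers[signal_name] if pair[0] != unique_signal_name
    ]

register("done", "done_1", lambda value: value)
unregister("done", "done_1")
assert handlers["done"] == []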

src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (2)

473-473: Critical fix: Wrap single content in list for conversion function.

This fix ensures mcp_content_to_bedrock_content receives a list as expected, preventing potential type errors during content conversion.


541-541: Critical bug fixes: Correct variable references in content loop.

These fixes change incorrect references from content (the entire list) to block (the current iteration item), ensuring proper access to individual content block fields like text, image, toolUse, and document. This prevents erroneous data extraction and ensures correct content conversion.

Also applies to: 546-547, 555-555, 563-564
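
The corrected loop shape, reduced to a runnable toy (field names follow the Bedrock content-block convention mentioned above):

content = [
    {"text": "hello"},
    {"toolUse": {"toolUseId": "t1", "name": "search"}},
]
for block in content:
    # Read fields from the current block, not from the whole `content` list.
    if "text" in block:
        print(block["text"])
    elif "toolUse" in block:
        print(block["toolUse"]["name"])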

src/mcp_agent/mcp/mcp_aggregator.py (2)

190-190: Good improvement for lifecycle state consistency.

Explicitly setting initialized = False when skipping closure ensures consistent state tracking regardless of connection persistence mode. This prevents potential issues where the aggregator might appear initialized when it shouldn't be.


983-985: Excellent change for deterministic behavior.

Iterating over servers in the configured server_names order instead of arbitrary dictionary iteration makes capability name parsing predictable and consistent. This ensures that when multiple servers provide the same capability name, the resolution order follows the intended server priority.
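
A small sketch of the difference; server and capability names are invented for illustration:

server_names = ["alpha", "beta"]  # configured priority order
capabilities = {"beta": {"tool_x"}, "alpha": {"tool_x"}}

def resolve(capability: str):
    for name in server_names:  # deterministic: follows configuration, not dict order
        if capability in capabilities.get(name, set()):
            return name
    return None

assert resolve("tool_x") == "alpha"  # first configured server wins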

tests/workflows/swarm/test_swarm_openai.py (1)

1-192: Excellent comprehensive test suite for OpenAISwarm.

This test file provides thorough coverage of the OpenAISwarm functionality:

  • Proper async testing: Uses pytest.mark.asyncio consistently
  • Good fixture design: Well-structured fixtures for mock objects
  • Comprehensive scenarios: Tests initialization, default/custom parameters, multiple iterations, early termination, and DoneAgent behavior
  • Effective mocking: Proper use of AsyncMock and MagicMock to control test behavior
  • Validation coverage: Verifies both method calls and expected parameters/results

The tests effectively validate iteration control logic and agent state management, which are critical for swarm workflow reliability.

tests/executor/temporal/test_temporal_executor.py (1)

1-105: Well-structured test suite for TemporalExecutor.

This test file provides solid coverage of the TemporalExecutor functionality:

  • Good fixture design: Clear separation of concerns with mock_client, mock_context, and executor fixtures
  • Core functionality coverage: Tests client connection, activity wrapping, and task execution paths
  • Execution context handling: Tests both inside and outside workflow contexts with appropriate patching
  • Workflow management: Covers workflow starting and termination with proper mock verification
  • Async testing patterns: Consistent use of pytest.mark.asyncio and proper async/await handling

The tests effectively validate the Temporal integration patterns and ensure the executor behaves correctly under different execution contexts.

src/mcp_agent/workflows/swarm/swarm.py (1)

255-265: Good alignment with resource handling changes.

The update to check for AgentFunctionResultResource type and access nested attributes through content.result.* properly aligns with the resource wrapper pattern. This ensures consistent handling of agent function results and context variable updates.

tests/workflows/router/test_router_llm.py (1)

1-503: Excellent comprehensive test coverage!

This test suite provides thorough coverage of the LLMRouter functionality including:

  • Initialization scenarios (basic and factory methods)
  • Routing with different parameter combinations
  • Edge cases (empty responses, invalid categories)
  • All routing method variants (route, route_to_server, route_to_agent, route_to_function)
  • Context generation with different inclusion flags

The test structure is clean, well-organized, and follows good async testing patterns with proper mocking.

tests/executor/temporal/test_workflow_registry.py (1)

1-10: LGTM! Good import structure and fixture setup.

The imports are appropriate for async testing with proper mocking utilities. The mock executor fixture correctly sets up the necessary async mock structure.

tests/workflows/router/test_router_base.py (3)

14-49: Excellent concrete implementation for testing abstract base class.

The TestRouter class provides a clean, minimal implementation of the abstract Router methods that's perfect for testing the base class functionality without complex dependencies.


220-231: Good mock setup for tools formatting test.

The test correctly sets up mock tools with string attributes. This approach properly tests the formatting logic without relying on complex tool objects.


135-155:

✅ Verification successful

Verify the test logic for multiple initialize calls.

The test mocks get_server_category after initialization to verify it's not called again. However, this approach might not accurately test the actual implementation behavior.


🏁 Script executed:

#!/bin/bash
# Description: Check if Router.initialize method has proper initialization guards

# Search for the initialize method implementation
ast-grep --pattern $'async def initialize(self):
  $$$
  if self.initialized:
    $$$
  $$$'

Length of output: 9047


🏁 Script executed:

#!/bin/bash
# Find TestRouter definition in tests
rg --with-filename --line-number "class TestRouter" -n tests

Length of output: 202


🏁 Script executed:

#!/bin/bash
# Display TestRouter implementation
sed -n '1,200p' tests/workflows/router/test_router_base.py

Length of output: 7848


Multiple initialize calls guard is correctly tested

The initialize method in router_base.py has an early-return when self.initialized is True, so replacing get_server_category with a mock after the first call and asserting its call_count remains zero on the second invocation accurately verifies that no re-initialization occurs. No changes required.

tests/executor/test_workflow.py (3)

14-23: Excellent error recording test with proper exception handling.

The test correctly verifies that error information is captured with proper type, message, and timestamp fields. This ensures the error recording functionality works as expected.


35-37: Simple but effective MockWorkflow implementation.

The MockWorkflow provides a clean, minimal implementation of the abstract run method that's perfect for testing the base Workflow functionality.


89-94: Good test for executor property validation.

The test correctly verifies that accessing the executor property raises a ValueError when the executor is None. This ensures proper error handling for missing dependencies.

tests/utils/test_pydantic_type_serializer.py (2)

64-65: Correct update to Pydantic v2 field validator syntax.

The field validators are now properly declared as class methods with the @classmethod decorator, which is required in Pydantic v2.

Also applies to: 91-92


179-181: Good update to schema comparison method.

The change from schema() to model_json_schema() is correct for Pydantic v2 compatibility.

tests/workflows/parallel/test_fan_out.py (1)

214-238: generate/generate_str mocks may need to be AsyncMocks.

The LLM/Agent stubs here are patched with:

mock_llm_with_name.generate.return_value = expected_result
mock_llm_with_name.generate_str.return_value = expected_result

These attributes will only behave as awaitable coroutines if mock_llm_with_name.generate was already an AsyncMock in the upstream fixture. If the underlying fixture supplies a plain MagicMock, await llm.generate(...) inside the real FanOut will raise TypeError: object MagicMock can't be used in 'await' expression.

Consider enforcing the async nature explicitly for safety:

mock_llm_with_name.generate = AsyncMock(return_value=expected_result)
mock_llm_with_name.generate_str = AsyncMock(return_value=expected_result)

(or guarantee the upstream fixture uses AsyncMock).
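
A self-contained demonstration of the failure mode and the fix:

import asyncio
from unittest.mock import AsyncMock, MagicMock

async def demo() -> None:
    plain = MagicMock()
    plain.generate.return_value = "result"
    try:
        await plain.generate("query")  # returns a plain value, which is not awaitable
    except TypeError as exc:
        print(f"plain MagicMock: {exc}")

    safe = MagicMock()
    safe.generate = AsyncMock(return_value="result")
    print(await safe.generate("query"))  # "result"

asyncio.run(demo())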

tests/test_app.py (1)

400-417: Potential logger double-patch confusion.

The test first patches get_logger, initializes the context (which internally may call get_logger), then resets basic_app._logger and patches get_logger again. If the internal initialization logic caches the original logger elsewhere (e.g., in the Context), subsequent calls might unexpectedly use the first mock.

Consider making the intention explicit by asserting no hidden second reference exists:

assert basic_app._context.logger is None  # or whatever the contract is

or reuse the same patched get_logger for both phases.

tests/workflows/router/test_router_embedding.py (1)

1-498: Excellent comprehensive test coverage for EmbeddingRouter!

This test file demonstrates excellent testing practices with thorough coverage of the EmbeddingRouter class functionality:

  • Complete lifecycle testing: Covers initialization, factory methods, embedding computation, and all routing variants
  • Proper async testing: Correctly uses pytest.mark.asyncio and proper mocking for async methods
  • Edge case coverage: Tests empty categories, missing embeddings, and similarity scoring
  • Good isolation: Uses mocks effectively to avoid external dependencies
  • Clear documentation: Each test has descriptive docstrings explaining its purpose

The test structure follows best practices with logical grouping and comprehensive assertions that verify both functionality and ordering of results.

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1)

1-285: Well-structured test suite with comprehensive coverage.

This test file provides excellent coverage for the CohereEmbeddingIntentClassifier with:

  • Mock implementation: Clean MockCohereEmbeddingModel that provides deterministic embeddings for testing
  • Complete functionality testing: Covers initialization, factory methods, classification, and edge cases
  • Proper async patterns: Correctly structured async tests with appropriate mocking
  • Error handling: Tests empty intents scenario with proper exception handling

The test structure is consistent with other classifier tests in the codebase, promoting maintainability.

🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1)

1-285: Excellent test coverage for OpenAI embedding-based intent classification.

This test file provides comprehensive testing for the OpenAIEmbeddingIntentClassifier with:

  • Consistent structure: Follows the same testing patterns as other classifier tests, promoting codebase consistency
  • Mock implementation: MockOpenAIEmbeddingModel provides deterministic embeddings suitable for testing
  • Full feature coverage: Tests initialization, factory methods, classification functionality, and error scenarios
  • Proper async handling: Well-structured async tests with appropriate mocking to avoid external API calls

The parallel structure with the Cohere tests makes the test suite easy to understand and maintain.

🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/executor/temporal/test_signal_handler.py (1)

1-99: Comprehensive signal handling test coverage.

This test file provides excellent coverage for the Temporal signal handling functionality:

  • Clean fixture design: Well-organized pytest fixtures that promote test reusability
  • Complete scenario coverage: Tests both in-workflow and out-of-workflow signal handling paths
  • Proper mocking: Appropriately mocks Temporal workflow internals without testing external dependencies
  • Edge case handling: Includes tests for nonexistent signals and idempotent operations
  • Async testing: Correctly structured async tests for temporal operations

The tests effectively verify the signal mailbox functionality, workflow attachment behavior, and both internal and external signaling mechanisms.

tests/workflows/router/test_router_llm_anthropic.py (1)

155-176: Ensure test does not indirectly depend on the real anthropic package

The test patches AnthropicAugmentedLLM, but a hard import of mcp_agent.workflows.router.router_llm_anthropic (line 8) may still execute a top-level import anthropic inside that module.
If anthropic is not listed in the project’s test dependencies, CI will raise ModuleNotFoundError.

Options

  1. Declare anthropic (or a lightweight stub such as anthropic==0.0.0) in the test extras / pyproject.toml.
  2. Lazily import the router inside the patched context, e.g.:
with patch("mcp_agent.workflows.router.router_llm_anthropic.AnthropicAugmentedLLM",
           MockAnthropicAugmentedLLM):
    from mcp_agent.workflows.router.router_llm_anthropic import AnthropicLLMRouter
  3. Or monkey-patch sys.modules["anthropic"] = ModuleType("anthropic") before the first import.

Pick whichever suits the repo’s dependency policy.
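
For option 3, a runnable sketch of stubbing the module before anything imports it (the Anthropic attribute is an assumption about what the router module touches; adjust to the actual imports):

import sys
from types import ModuleType
from unittest.mock import MagicMock

stub = ModuleType("anthropic")
stub.Anthropic = MagicMock()  # assumed attribute name
sys.modules["anthropic"] = stub

import anthropic  # now resolves to the stub, even without the real package
assert anthropic is stub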

tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (1)

448-470: initialize spy counts two calls even when the second is a no-op

Wrapping the real initialize method with AsyncMock(wraps=real_initialize) increments the call-count before the method’s internal early-exit hits, so the assertion verifies the spy, not the optimisation.

If you want to assert the internal short-circuit, introspect a flag instead:

await classifier.initialize()
await classifier.initialize()
assert classifier._already_initialized  # or similar internal flag

Otherwise this test may pass even if the second call performs heavy work again.

tests/workflows/llm/test_augmented_llm_azure.py (1)

293-296: Assertion validates response content inside the request payload – very likely wrong

req.payload["messages"] represents the prompt sent to Azure, yet the test asserts it already contains the assistant’s reply:

assert req.payload["messages"][1].content == "Response without history"

At request-time the assistant message does not exist; only the user (and possibly system/history) messages should be present.
The check should compare against the user query or simply verify that only one user message is present when use_history=False.
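
A sketch of an assertion that checks the prompt instead of the reply; the message shape is a stand-in for the Azure SDK types:

from dataclasses import dataclass

@dataclass
class Msg:
    role: str
    content: str

# What the request payload looks like at request time: no assistant reply yet.
messages = [Msg("system", "instructions"), Msg("user", "Test query")]

user_messages = [m for m in messages if m.role == "user"]
assert len(user_messages) == 1  # with use_history=False, exactly one user turn
assert not any(m.role == "assistant" for m in messages)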

src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)

944-950: model="" may introduce sentinel-value ambiguity
to_mcp_message_result now hard-codes model="" instead of None. Down-stream code that previously relied on None to signal “model unknown/unset” will now receive an empty string, which is truthy in many contexts (e.g., if result.model:).

Unless you have audited all call-sites, prefer keeping None or make the change explicit via typed docs & unit tests.
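
A quick illustration of the difference:

import json

model = ""
assert not model            # truthiness: "" behaves like None here
assert model is not None    # identity: an `is None` guard no longer fires
print(json.dumps({"model": model}))  # {"model": ""} rather than {"model": null}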

src/mcp_agent/workflows/llm/augmented_llm_azure.py (1)

225-227: Token-usage accessor may not match SDK object – please double-check.

response.usage is typically an object exposing attributes such as prompt_tokens / completion_tokens (at least in the GA Azure SDK). Switching to response.usage["prompt_tokens"] and ["completion_tokens"] will raise TypeError: 'Usage' object is not subscriptable if the SDK version still uses attributes.

If you intentionally upgraded to a newer SDK that returns a mapping-style usage, please ignore; otherwise revert to attribute access or add a defensive fallback:

-total_input_tokens  += response.usage["prompt_tokens"]
-total_output_tokens += response.usage["completion_tokens"]
+usage = response.usage
+# Support both mapping-style and attribute-style usage objects.
+if isinstance(usage, dict):
+    total_input_tokens += usage["prompt_tokens"]
+    total_output_tokens += usage["completion_tokens"]
+else:
+    total_input_tokens += usage.prompt_tokens
+    total_output_tokens += usage.completion_tokens
tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (1)

120-122: Enum rendering mismatch – expected numeric vs. name

The test asserts the prompt contains the literal string "Quality Rating: 1".
_build_refinement_prompt interpolates feedback.rating, whose string
representation may be "FAIR" or "QualityRating.FAIR" depending on the enum
definition, not "1". The test could become flaky if the enum __str__
implementation changes.

Please verify the intended format and adjust either the production code or the
assertion:

-    assert "Quality Rating: 1" in prompt
+    assert "Quality Rating: FAIR" in prompt   # or the agreed representation
tests/workflows/llm/test_augmented_llm_anthropic.py (1)

318-334: create_tool_use_side_effect inspects kwargs["messages"] but executor passes messages positionally

Inside side_effect, the check

messages = kwargs.get("messages", [])

will always return [] because in production the request object is passed as a positional arg (e.g., await executor.execute(self, request)), not as a keyword. Consequently the “final iteration” condition may never trigger, making the test logic depend only on call_count.

Consider extracting the request object from args[1], as done elsewhere in the suite, e.g.:

request_obj = args[1]
messages = request_obj.payload["messages"]
src/mcp_agent/utils/pydantic_type_serializer.py (4)

303-314: Good migration to Pydantic v2 validator API

The switch to validator.info.fields is correct and the cast to str
keeps JSON payloads clean.
No further action required.


420-425: 👍 Prefer type_ for private attrs

Using type_ where available brings v2 parity while still falling back
to Any. Looks good.


445-455: Efficient handling of ConfigDict

Treating ConfigDict as a mapping and early-returning prevents an
unnecessary reflection pass—nice improvement.


538-541: Literal deserialization looks correct

Using Literal.__getitem__ with the raw tuple reconstructs the type
accurately. Good job mirroring the v2 typing internals.

Comment on lines +403 to +409
# Setup mock executor to raise an exception
mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))

# Call LLM
responses = await mock_llm.generate("Test query with API error")

# Assertions

⚠️ Potential issue

Same “exception should be raised” issue as in the OpenAI test

The mock returns an Exception object instead of raising it, so the code under test is never put in the error path.

-# Setup mock executor to raise an exception
-mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+# Setup mock executor to raise an exception
+mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# Setup mock executor to raise an exception
-mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+# Setup mock executor to raise an exception
+mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
 # Call LLM
 responses = await mock_llm.generate("Test query with API error")
 # Assertions
🤖 Prompt for AI Agents
In tests/workflows/llm/test_augmented_llm_anthropic.py around lines 403 to 409,
the mock executor is set to return an Exception object instead of raising it, so
the error path is not triggered. Change the AsyncMock setup to raise the
Exception by using side_effect=Exception("API Error") instead of return_value,
ensuring the code under test properly encounters the raised exception.
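
A runnable contrast of the two mock configurations:

import asyncio
from unittest.mock import AsyncMock

async def demo() -> None:
    returns = AsyncMock(return_value=Exception("API Error"))
    result = await returns()       # no exception raised; it is merely returned
    print(type(result).__name__)   # Exception

    raises = AsyncMock(side_effect=Exception("API Error"))
    try:
        await raises()             # side_effect actually raises
    except Exception as exc:
        print(exc)                 # API Error

asyncio.run(demo())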

Comment on lines +719 to +735
# Patch __init__ to ensure private attributes are initialized on instance
private_attrs = getattr(reconstructed_model, "__private_attributes__", {})
if private_attrs:
    orig_init = reconstructed_model.__init__

    def _init_with_private_attrs(self, *args, **kwargs):
        orig_init(self, *args, **kwargs)
        for attr_name, private_attr in private_attrs.items():
            # Only set if not already set
            if not hasattr(self, attr_name):
                default = private_attr.default
                # If default is ... (Ellipsis), treat as None
                if default is ...:
                    default = None
                setattr(self, attr_name, default)

    reconstructed_model.__init__ = _init_with_private_attrs

🛠️ Refactor suggestion

Monkey-patching __init__ may break dataclass-like signatures

Overwriting __init__:

  1. Removes the rich signature Pydantic generates (hurts tooling &
    validation).
  2. Breaks multiple inheritance if models are later subclassed.
  3. Runs for every reconstructed model, even when Pydantic would have
    already initialised private attrs (v1 & v2).

Prefer letting Pydantic handle private attributes:

-        private_attrs = getattr(reconstructed_model, "__private_attributes__", {})
-        if private_attrs:
-            orig_init = reconstructed_model.__init__
-            def _init_with_private_attrs(self, *args, **kwargs):
-                orig_init(self, *args, **kwargs)
-                ...
-            reconstructed_model.__init__ = _init_with_private_attrs
+        # Pydantic takes care of initialising PrivateAttr defaults; no patch needed.

If a specific bug required this hook, add a comment with a minimal
reproduction and scope the patch to that scenario.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/mcp_agent/utils/pydantic_type_serializer.py around lines 719 to 735, the
current approach monkey-patches the __init__ method of reconstructed_model to
initialize private attributes, which removes Pydantic's rich __init__ signature,
breaks multiple inheritance, and runs unnecessarily for all models. To fix this,
remove the __init__ monkey-patching entirely and rely on Pydantic's built-in
handling of private attributes. If a specific bug requires this patch, add a
detailed comment explaining the minimal reproduction and limit the patch only to
that case.
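
A minimal check that Pydantic v2 already initializes private-attribute defaults without any patch:

from pydantic import BaseModel, PrivateAttr

class Model(BaseModel):
    _cache: dict = PrivateAttr(default_factory=dict)

m = Model()
assert m._cache == {}  # default applied by Pydantic itself, no __init__ hook needed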

Comment on lines 683 to 701
                field_definitions[field_name] = (
                    field_type,
                    Field(
-                       default=default if default_factory is None else ...,
+                       default=... if default_factory is not None else default,
                        default_factory=default_factory,
                        **constraints,
                    ),
                )
            else:
-               field_definitions[field_name] = (field_type, default)
+               if is_required:
+                   field_definitions[field_name] = (field_type, Field(default=...))
+               else:
+                   field_definitions[field_name] = (
+                       field_type,
+                       Field(
+                           default=default,
+                       ),
+                   )



🛠️ Refactor suggestion

Avoid passing default_factory=None & reconcile required logic

Field(..., default_factory=None) is semantically meaningless and may
raise in future Pydantic releases.
Also, when is_required=False but default is still ..., the field
remains required.

-                    Field(
-                        default=... if default_factory is not None else default,
-                        default_factory=default_factory,
+                    kwargs = {**constraints}
+                    if default_factory is not None:
+                        kwargs["default_factory"] = default_factory
+                        kwargs["default"] = ...
+                    elif default is not ...:
+                        kwargs["default"] = default
+                    else:
+                        kwargs["default"] = ...
+                    Field(**kwargs),

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/mcp_agent/utils/pydantic_type_serializer.py around lines 683 to 701,
avoid passing default_factory=None to Field as it is meaningless and may cause
errors in future Pydantic versions. Also, ensure that when is_required is False,
the default is not set to ... which makes the field required. Fix this by only
passing default_factory if it is not None and by setting default to a proper
non-required value when is_required is False.

StreetLamb added 16 commits May 25, 2025 14:10
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (6)
tests/workflows/llm/test_augmented_llm_azure.py (1)

402-403: Fix pending: API error test never raises – use side_effect instead of return_value

The mock executor is still returning an Exception object instead of raising it, which means the error handling code path won't be properly tested.

- mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+ mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
src/mcp_agent/utils/pydantic_type_serializer.py (3)

393-393: Consider Pydantic v1 compatibility for broader adoption

While the current approach uses Pydantic v2's is_required() method, maintaining compatibility with v1 would increase the utility's adoption:

- "required": field_info.is_required(),
+ "required": getattr(field_info, "is_required", lambda: getattr(field_info, "required", True))(),

683-701: Avoid passing default_factory=None to prevent future issues

The current code may pass default_factory=None to Field(), which is semantically meaningless and could cause errors in future Pydantic versions:

- Field(
-     default=... if default_factory is not None else default,
-     default_factory=default_factory,
-     **constraints,
- ),
+ kwargs = {**constraints}
+ if default_factory is not None:
+     kwargs["default_factory"] = default_factory
+     kwargs["default"] = ...
+ elif default is not ...:
+     kwargs["default"] = default
+ else:
+     kwargs["default"] = ...
+ Field(**kwargs),

719-735: Consider removing __init__ monkey-patching for better maintainability

The current approach of monkey-patching __init__ can break Pydantic's rich signatures and inheritance patterns. Consider relying on Pydantic's built-in private attribute handling instead, unless this patch addresses a specific documented issue.

tests/workflows/llm/test_augmented_llm_openai.py (1)

382-383: Fix pending: API error test never raises – use side_effect instead of return_value

The mock executor is returning an Exception object instead of raising it, preventing proper testing of error handling logic:

- # Setup mock executor to raise an exception
- mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+ # Setup mock executor to raise an exception
+ mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
tests/workflows/swarm/test_swarm.py (1)

201-220: ⚠️ Potential issue

Fix mocking anti-pattern - the test defeats its own purpose.

This test mocks the call_tool method on mock_swarm_agent and then calls the same mocked method, which doesn't actually test the real SwarmAgent.call_tool logic when a tool is not in _function_tool_map.

The test should verify that SwarmAgent.call_tool falls back to super().call_tool() when the tool name is not found in _function_tool_map. Replace this with a test that uses a real SwarmAgent instance:

-    @pytest.mark.asyncio
-    async def test_call_tool_with_non_function_tool(
-        self, mock_swarm_agent, mock_tool_response
-    ):
-        """Test call_tool with a non-function tool."""
-        # Set up mocks
-        mock_swarm_agent._function_tool_map = {}
-        mock_swarm_agent.initialized = True
-        mock_swarm_agent.call_tool = AsyncMock(return_value=mock_tool_response)
-
-        # Call the method directly without using Agent.call_tool
-        # We're testing that the SwarmAgent's call_tool method works when the tool
-        # is not in the function tool map
-        result = await mock_swarm_agent.call_tool("non_function_tool", {"arg": "value"})
-
-        # Assert the call was made and the result was returned
-        mock_swarm_agent.call_tool.assert_called_once_with(
-            "non_function_tool", {"arg": "value"}
-        )
-        assert result == mock_tool_response
+    @pytest.mark.asyncio
+    async def test_call_tool_with_non_function_tool(self, mock_tool_response):
+        """Test call_tool with a non-function tool falls back to parent class."""
+        # Create a real SwarmAgent instance
+        agent = SwarmAgent(
+            name="test_agent",
+            instruction="Test instruction",
+            server_names=[],
+            functions=[],
+            context=Context(),
+        )
+        agent._function_tool_map = {}  # Ensure tool is not in function map
+        agent.initialized = True
+        
+        # Mock the parent class call_tool method
+        with patch.object(agent.__class__.__bases__[0], 'call_tool', new_callable=AsyncMock) as mock_parent_call_tool:
+            mock_parent_call_tool.return_value = mock_tool_response
+            
+            # Call the method
+            result = await agent.call_tool("non_function_tool", {"arg": "value"})
+            
+            # Assert parent method was called and result returned
+            mock_parent_call_tool.assert_called_once_with("non_function_tool", {"arg": "value"})
+            assert result == mock_tool_response

You'll also need to add the patch import:

+from unittest.mock import AsyncMock, MagicMock, patch
-from unittest.mock import AsyncMock, MagicMock
🧹 Nitpick comments (2)
tests/workflows/llm/test_augmented_llm_ollama.py (1)

186-208: Simplify complex mocking pattern.

The execute_side_effect function and nested mocking are quite complex and brittle. Consider a simpler approach:

-            # Patch executor.execute to forward to the patched structured task
-            async def execute_side_effect(task, request):
-                if (
-                    task is mock_structured_task._mock_wraps
-                    or task is mock_structured_task
-                ):
-                    return await mock_structured_task(request)
-                return MagicMock()
-
-            llm.executor.execute = AsyncMock(side_effect=execute_side_effect)
+            # Mock executor.execute to return a simple result
+            llm.executor.execute = AsyncMock(return_value=TestResponseModel(name="test"))

This approach:

  • Reduces complexity and potential brittleness
  • Makes the test easier to understand
  • Focuses on the essential behavior being tested
tests/workflows/swarm/test_swarm.py (1)

36-36: Consider using the mock_context fixture consistently.

Multiple tests create Context() instances directly instead of using the provided mock_context fixture. While this works, using the fixture would be more consistent and allow for better test isolation.

Consider updating these tests to use the mock_context fixture:

-        context=Context(),
+        context=mock_context,

This ensures all tests use the same mocked context setup and makes the tests more maintainable.

Also applies to: 58-58, 84-84, 93-93, 121-121, 149-149, 184-184

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4284d6d and c12c995.

📒 Files selected for processing (19)
  • pyproject.toml (1 hunks)
  • src/mcp_agent/utils/pydantic_type_serializer.py (8 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_openai.py (4 hunks)
  • src/mcp_agent/workflows/swarm/swarm.py (2 hunks)
  • tests/executor/temporal/test_workflow_registry.py (1 hunks)
  • tests/executor/test_workflow.py (1 hunks)
  • tests/mcp/test_mcp_connection_manager.py (1 hunks)
  • tests/workflows/intent_classifier/conftest.py (1 hunks)
  • tests/workflows/llm/conftest.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_azure.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_bedrock.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_google.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_ollama.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_openai.py (3 hunks)
  • tests/workflows/orchestrator/conftest.py (1 hunks)
  • tests/workflows/parallel/test_parallel_llm.py (1 hunks)
  • tests/workflows/router/conftest.py (1 hunks)
  • tests/workflows/router/test_router_embedding_openai.py (1 hunks)
  • tests/workflows/swarm/test_swarm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (13)
  • tests/workflows/llm/conftest.py
  • tests/workflows/intent_classifier/conftest.py
  • tests/executor/test_workflow.py
  • src/mcp_agent/workflows/swarm/swarm.py
  • tests/mcp/test_mcp_connection_manager.py
  • tests/workflows/parallel/test_parallel_llm.py
  • tests/executor/temporal/test_workflow_registry.py
  • tests/workflows/router/test_router_embedding_openai.py
  • tests/workflows/llm/test_augmented_llm_google.py
  • tests/workflows/router/conftest.py
  • tests/workflows/orchestrator/conftest.py
  • tests/workflows/llm/test_augmented_llm_bedrock.py
  • src/mcp_agent/workflows/llm/augmented_llm_openai.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/workflows/swarm/test_swarm.py (5)
src/mcp_agent/agents/agent.py (1)
  • CallToolRequest (789-798)
src/mcp_agent/workflows/swarm/swarm.py (6)
  • AgentFunctionResult (147-161)
  • SwarmAgent (74-144)
  • DoneAgent (314-327)
  • create_agent_resource (50-58)
  • create_agent_function_result_resource (61-71)
  • should_continue (304-311)
src/mcp_agent/workflows/swarm/swarm_openai.py (1)
  • OpenAISwarm (9-41)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
tests/workflows/swarm/conftest.py (7)
  • mock_context (11-17)
  • test_function_result (56-58)
  • test_function_agent_function_result (68-70)
  • mock_swarm_agent (34-46)
  • mock_tool_response (80-82)
  • mock_agent (21-30)
  • done_agent (50-52)
🔇 Additional comments (13)
src/mcp_agent/utils/pydantic_type_serializer.py (4)

184-191: LGTM! Literal values are now properly serialized

The code now correctly wraps each literal argument with make_serializable() to ensure JSON compatibility, addressing the previous concern about non-JSON-serializable types like bytes.
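
A quick reminder of why the wrapping matters (make_serializable is the project's helper; the bytes example below is illustrative only):

import json

try:
    json.dumps({"literal_args": [b"raw"]})  # bytes are not JSON-serializable
except TypeError as exc:
    print(exc)

print(json.dumps({"literal_args": ["raw"]}))  # after conversion to a JSON-safe type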


303-303: LGTM! Pydantic v2 field validator compatibility

The change from validator.fields to validator.info.fields correctly aligns with Pydantic v2 API structure.


420-424: LGTM! Enhanced private attribute handling for Pydantic v2

The fallback to Any when type_ is not available ensures compatibility while leveraging v2's improved type information when present.


445-455: LGTM! Improved config serialization with proper JSON validation

The new logic correctly handles both dict-based configs (Pydantic v2 ConfigDict) and class-based configs, with appropriate JSON serializability checks.

tests/workflows/llm/test_augmented_llm_openai.py (2)

70-91: LGTM! Clean test response factory method

The static method provides a clean way to create mock OpenAI chat completion responses with proper typing and structure.


94-126: LGTM! Comprehensive tool use response creation

The method correctly constructs tool call responses with proper OpenAI structure and handles the nested function call format appropriately.

tests/workflows/llm/test_augmented_llm_ollama.py (5)

1-17: LGTM: Clean test setup and clear documentation.

The imports are appropriate and the class documentation clearly explains the testing scope, focusing on Ollama-specific functionality while acknowledging inheritance from OpenAIAugmentedLLM.


19-50: Well-designed fixtures with good separation of concerns.

The fixture design is excellent:

  • mock_llm provides a pre-configured instance for common use cases
  • mock_context_factory allows creating clean contexts for specific test scenarios
  • Good use of realistic test data in the OpenAISettings

129-163: LGTM: Well-structured async test for JSON mode.

The test properly:

  • Uses appropriate async/await patterns
  • Mocks dependencies correctly
  • Tests the expected JSON mode behavior
  • Has clear assertions

214-218: LGTM: Clear verification of client initialization.

The assertions properly verify that the correct API key and base URL are passed to the structured completion task, which is the core behavior being tested.


93-93:

✅ Verification successful

Verify if commented line is needed.

There's a commented line for reasoning_effort. Ensure this isn't required for proper test functionality.


🏁 Script executed:

#!/bin/bash
# Check if reasoning_effort is accessed during OpenAIAugmentedLLM initialization
ast-grep --pattern 'reasoning_effort'

Length of output: 907


🏁 Script executed:

#!/bin/bash
# Display lines 1-120 around the commented reasoning_effort in the test file
sed -n '1,120p' tests/workflows/llm/test_augmented_llm_ollama.py

Length of output: 5013


🏁 Script executed:

#!/bin/bash
# Extract the OllamaAugmentedLLM class definition and its init method
ast-grep --pattern $'class OllamaAugmentedLLM { $$$ init($_) { $$$ } $$$ }' --language javascript

Length of output: 287


Drop the commented-out reasoning_effort assignment
The MagicMock in this test is already created with a spec that includes reasoning_effort, so it will automatically return a MagicMock value when accessed—and this attribute isn’t asserted anywhere in test_initialization. You can safely remove the commented line for clarity.

• tests/workflows/llm/test_augmented_llm_ollama.py (line ~93)

tests/workflows/swarm/test_swarm.py (2)

23-512: Comprehensive test coverage with good structure.

The test suite provides excellent coverage of the swarm workflow components with well-structured test cases that cover various scenarios including normal operations, edge cases, and error conditions. The tests properly exercise the real methods rather than mocking them (except for the one identified issue), and the assertions validate the expected behavior correctly.


206-206:

❌ Incorrect review comment

Potential AttributeError with spec-based mock.

Based on the fixture definition in conftest.py, mock_swarm_agent is created with MagicMock(spec=SwarmAgent). Direct assignment to _function_tool_map may cause an AttributeError since this is an instance attribute created in __init__, not a class attribute.

Let me verify if the fixture definition still has this issue:


🏁 Script executed:

#!/bin/bash
# Check the mock_swarm_agent fixture definition to see if it uses spec=SwarmAgent
cat tests/workflows/swarm/conftest.py | grep -A 15 "def mock_swarm_agent"

Length of output: 537


🏁 Script executed:

#!/bin/bash
# Search for the SwarmAgent class definition and its __init__ to locate _function_tool_map
rg -n "class SwarmAgent" -C5
rg -n "def __init__" -A10 $(rg -l "class SwarmAgent" -g '*.py')

Length of output: 2345


🏁 Script executed:

#!/bin/bash
# Check if _function_tool_map is defined in SwarmAgent class
rg -n "_function_tool_map" src/mcp_agent/workflows/swarm/swarm.py

Length of output: 166


🏁 Script executed:

#!/bin/bash
python3 - <<'EOF'
from unittest.mock import MagicMock
class Foo: pass

# Test spec-based mock assignment
m_spec = MagicMock(spec=Foo)
m_spec.new_attr = 5
print("spec assignment:", m_spec.new_attr)

# Test spec_set-based mock assignment
from unittest.mock import MagicMock as MM2
m_spec_set = MM2(spec_set=Foo)
try:
    m_spec_set.other = 6
    print("spec_set assignment succeeded:", m_spec_set.other)
except Exception as e:
    print("spec_set error:", type(e).__name__, e)
EOF

Length of output: 98


No change needed: MagicMock(spec=…) allows new attributes
Assigning mock_swarm_agent._function_tool_map = {} will not raise an AttributeError—only spec_set mocks disallow new attributes. The existing fixture is safe as written.

Likely an incorrect or invalid review comment.

@StreetLamb StreetLamb requested a review from saqadri May 25, 2025 08:41