Improve code coverage of mcp-agent #215

Open · wants to merge 93 commits into base: main

Conversation

@StreetLamb (Collaborator) commented May 17, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of type serialization, model reconstruction, and tool call logic across multiple LLM integrations.
    • Enhanced error handling and robustness in tool execution and message processing for Azure, Bedrock, Google, and OpenAI LLM workflows.
    • Fixed variable naming and data structure consistency for signal handling and workflow orchestration.
  • New Features

    • Added extensive new and refactored unit and integration tests for agents, workflows, LLM providers, orchestrator, router, swarm, and connection management.
    • Introduced new test fixtures and documentation to support comprehensive testing and easier maintenance.
  • Chores

    • Updated and added development dependencies for coverage reporting and asynchronous concurrency.
    • Removed obsolete or redundant test fixtures and documentation files.
  • Style

    • Refined prompt templates and documentation for orchestrator and intent classifier workflows.
    • Corrected naming and formatting issues in orchestrator and signal handler modules.
  • Tests

    • Significantly expanded test coverage for all major components, including LLM integrations, agent logic, workflow orchestration, and routing strategies.
    • Improved test isolation, mocking, and fixture management for reliable and maintainable test suites.

StreetLamb and others added 30 commits May 14, 2025 00:26
- Deleted the `test_event_progress.py` file as it is no longer needed.
- Added a new README.md file in the LLM tests directory to document test coverage and usage.
- Created a `conftest.py` file for shared fixtures used across LLM provider tests.
- Implemented comprehensive tests for the `OpenAIAugmentedLLM` class, covering various functionalities including text generation, structured output, tool usage, error handling, and request parameter merging.
* add support for Azure Identity instead of keys

* make credential_scopes configurable, but with a default to `https://cognitiveservices.azure.com/.default`

---------

Co-authored-by: Brian Caswell <[email protected]>
* Add streamable HTTP support to mcp-agent
* Session ID support for session resumption
* Example to showcase streamable HTTP server support
- Created test package for orchestrator workflow module.
- Implemented unit tests for orchestrator methods, including initialization and execution of plans.
- Added integration tests for full and iterative workflow executions.
- Developed tests for orchestrator data models and prompt templates.
- Ensured comprehensive coverage of task, step, and plan result formatting.
@StreetLamb marked this pull request as ready for review May 25, 2025 04:36
@coderabbitai (bot) left a comment

Actionable comments posted: 27

🔭 Outside diff range comments (2)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)

1029-1051: 🛠️ Refactor suggestion

Limited content-type coverage in mcp_content_to_openai_content_part
The helper downgrades ImageContent and binary EmbeddedResource blocks to plain text ("<mime>:<data>"). The OpenAI SDK already offers ChatCompletionContentPartImageParam; losing the structured form reduces model performance on vision-capable models.

Recommend adding proper image part construction:

@@
     elif isinstance(content, ImageContent):
-        # Best effort to convert an image to text
-        return ChatCompletionContentPartTextParam(
-            type="text", text=f"{content.mimeType}:{content.data}"
-        )
+        return ChatCompletionContentPartImageParam(
+            type="image_url",
+            image_url={"url": f"data:{content.mimeType};base64,{content.data}"},
+        )

…and import ChatCompletionContentPartImageParam.

src/mcp_agent/utils/pydantic_type_serializer.py (1)

648-662: ⚠️ Potential issue

Custom default_factory functions are silently discarded

Only the three built-ins are restored; any user-defined factory is lost,
leading to wrong defaults or validation errors.

-                else:
-                    default_factory = None
+                else:
+                    # Attempt to re-import the factory by name from the original
+                    # module, otherwise keep the string for later resolution.
+                    try:
+                        mod = importlib.import_module(field_info["default_factory_module"])
+                        default_factory = getattr(mod, default_factory)
+                    except Exception:
+                        # Fallback: keep the stored name so consumers can tell a
+                        # factory existed even though it couldn't be re-imported.
+                        pass

You’ll need to store default_factory_module during serialization to
support this restoration.
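For the serialization side, a minimal sketch of what that could look like; the field_info dict shape and the built-in whitelist are assumptions about this serializer, not its confirmed internals:

# Hypothetical sketch -- `field` and `field_info` are assumed serializer locals.
factory = field.default_factory
if factory is not None:
    field_info["default_factory"] = getattr(factory, "__name__", repr(factory))
    if factory not in (list, dict, set):
        # Custom factory: also record where it lives so it can be re-imported.
        field_info["default_factory_module"] = getattr(factory, "__module__", None)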

♻️ Duplicate comments (2)
tests/workflows/orchestrator/test_orchestrator.py (2)

357-365: Same SIM117 issue—combine the with statements

See previous comment for justification.

-with patch.object(Orchestrator, "_get_next_step", mock_get_next_step):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_next_step", mock_get_next_step), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):
🧰 Tools
🪛 Ruff (0.11.9)

357-358: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


430-437: Combine the patch contexts here as well

-with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):
🧰 Tools
🪛 Ruff (0.11.9)

430-431: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🧹 Nitpick comments (45)
src/mcp_agent/workflows/llm/augmented_llm_google.py (1)

442-451: Improve code readability with ternary operator.

The role normalization logic is correct and aligns with MCP compatibility requirements. However, the code can be simplified using a ternary operator as suggested by static analysis.

Apply this diff to improve readability:

-        if result.role == "model":
-            role = "assistant"
-        else:
-            role = result.role
+        role = "assistant" if result.role == "model" else result.role
         return MCPMessageResult(
             role=role,
             content=contents[0],
             model="",
             stopReason=None,
         )
🧰 Tools
🪛 Ruff (0.11.9)

442-445: Use ternary operator role = "assistant" if result.role == "model" else result.role instead of if-else-block

Replace if-else-block with role = "assistant" if result.role == "model" else result.role

(SIM108)

tests/workflows/swarm/test_swarm_anthropic.py (1)

135-144: Extract repeated message string to a constant.

The follow-up message "Please resolve my original request. If it has already been resolved then end turn" is repeated in multiple test assertions (lines 137 and 143). This should be extracted to a module-level constant for better maintainability.

+# At the top of the file, after imports
+FOLLOW_UP_MESSAGE = "Please resolve my original request. If it has already been resolved then end turn"

 # In test assertions, replace the hardcoded strings:
-            assert (
-                second_call_kwargs["message"]
-                == "Please resolve my original request. If it has already been resolved then end turn"
-            )
+            assert second_call_kwargs["message"] == FOLLOW_UP_MESSAGE

-            assert (
-                third_call_kwargs["message"]
-                == "Please resolve my original request. If it has already been resolved then end turn"
-            )
+            assert third_call_kwargs["message"] == FOLLOW_UP_MESSAGE
tests/workflows/llm/test_augmented_llm_ollama.py (1)

94-94: Remove or uncomment the commented code.

There's a commented line that should either be removed if it's not needed or uncommented with proper justification.

         mock_openai_config_for_custom.api_key = "test_api_key"
-        # mock_openai_config_for_custom.reasoning_effort = "medium"

If this line is needed, uncomment it. If not, remove it entirely to keep the code clean.

tests/mcp/test_mcp_connection_manager.py (1)

126-149: Address the commented out test with TODO.

There's a commented out test for initialization failure with a TODO comment from "jerron". This should either be fixed and uncommented, or removed if it's no longer needed.

The test appears to be checking for proper error handling when server initialization fails. If you need help implementing this test or determining whether it's still needed, I can assist with:

  1. Analyzing why the test might be failing
  2. Providing an alternative implementation approach
  3. Helping determine if this test case is still relevant

Would you like me to help resolve this TODO item?
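For reference, a minimal sketch of what the re-enabled test might look like; connection_manager, _create_client, and the error type are hypothetical names, not the project's confirmed API:

import pytest
from unittest.mock import AsyncMock, patch


@pytest.mark.asyncio
async def test_server_initialization_failure(connection_manager):
    # Hypothetical sketch: force the client handshake to fail and assert the
    # manager surfaces the error instead of swallowing it.
    failing_client = AsyncMock()
    failing_client.initialize.side_effect = RuntimeError("handshake failed")

    with patch.object(connection_manager, "_create_client", return_value=failing_client):
        with pytest.raises(RuntimeError, match="handshake failed"):
            await connection_manager.get_server("broken_server")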

tests/executor/temporal/test_workflow_registry.py (2)

18-26: Good test coverage but consider adding edge case validation.

The test correctly verifies registration and retrieval functionality. However, consider adding assertions to verify the internal state consistency beyond just the workflow retrieval.

 async def test_register_and_get_workflow(registry):
     mock_workflow = MagicMock(name="test_workflow")
     run_id = "run-id"
     workflow_id = "workflow-id"
     await registry.register(mock_workflow, run_id, workflow_id)
     workflow = await registry.get_workflow(run_id)
     assert workflow == mock_workflow
     assert registry._workflow_ids[workflow_id] == [run_id]
+    # Verify the workflow is in local workflows
+    assert run_id in registry._local_workflows
+    assert registry._local_workflows[run_id] == mock_workflow

74-77: Test method name doesn't match the actual test behavior.

The test name suggests it's testing error status, but it's actually testing the case where a workflow doesn't exist. Consider renaming for clarity.

-async def test_get_workflow_status_error(registry, mock_executor):
+async def test_get_workflow_status_nonexistent_workflow(registry, mock_executor):
     # Should return error status if workflow_id is missing
     result = await registry.get_workflow_status("nonexistent")
     assert result is False
tests/workflows/router/test_router_base.py (1)

74-84: Good validation test but consider more specific error messages.

The test correctly validates that empty inputs raise a ValueError. Consider checking the specific error message to ensure it's meaningful for users.

 def test_initialization_with_empty_inputs(self, mock_context):
     """Tests initialization fails when no routing targets are provided."""
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match="At least one of server_names, agents, or functions must be provided"):
         # Initialize with empty inputs
         _ = TestRouter(
             server_names=[],
             agents=[],
             functions=[],
             context=mock_context,
         )
tests/utils/test_pydantic_type_serializer.py (2)

213-213: Consider using model_config property consistently.

The config access pattern using getattr with defaults is defensive, but ensure this pattern is consistent with the actual Pydantic v2 ConfigDict structure.

 # Test config is preserved
-assert getattr(ComplexModelReconstructed.model_config, "validate_assignment", True)
+assert ComplexModelReconstructed.model_config.get("validate_assignment", True)
-assert getattr(
-    ComplexModelReconstructed.model_config, "arbitrary_types_allowed", True
-)
+assert ComplexModelReconstructed.model_config.get("arbitrary_types_allowed", True)

Also applies to: 215-215


252-253: Address the commented-out tests to improve coverage.

The TODO comments indicate these tests need to be fixed. Consider prioritizing the resolution of these test issues to maintain comprehensive test coverage.

These tests appear to be related to Annotated fields and import cycles. Would you like me to help investigate the specific issues preventing these tests from passing and suggest potential fixes?

Also applies to: 296-297
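As a starting point, a hedged sketch of a round-trip test for the Annotated case; serialize_model_type/deserialize_model_type are placeholder names for whatever this serializer module actually exports:

from typing import Annotated

from pydantic import BaseModel, Field


class AnnotatedModel(BaseModel):
    # Annotated metadata (here a Field constraint) is the case the TODO flags.
    score: Annotated[float, Field(ge=0.0, le=1.0)] = 0.5


def test_annotated_field_round_trip():
    serialized = serialize_model_type(AnnotatedModel)  # placeholder helper name
    Rebuilt = deserialize_model_type(serialized)  # placeholder helper name
    assert Rebuilt(score=0.7).score == 0.7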

tests/workflows/parallel/test_fan_out.py (2)

39-61: Fixture mis‐labels an AugmentedLLM as an “agent”.

fan_out_with_llms passes mock_llm_with_name via the agents parameter:

return FanOut(
    agents=[mock_llm_with_name],   # <-- actually an AugmentedLLM
    context=mock_context,
)

If FanOut differentiates between real Agent instances (which require llm_factory) and raw AugmentedLLM objects, this fixture relies on an implementation detail that “anything with a .generate() method will work”.
Should the public constructor instead expose a dedicated keyword (e.g. llms=[...]) to make the intent explicit? This would avoid confusion for future maintainers and catch mis-use at type-checking time.


290-319: Assertion misses argument verification for the executor.

execute_many is the work-horse that runs all fan-out calls concurrently, but the tests only assert assert_called_once(). Verifying the callable list and its length would give stronger coverage:

mock_context.executor.execute_many.assert_called_once()
calls = mock_context.executor.execute_many.call_args[0][0]
assert len(calls) == 3                     # agent, llm, function

This protects against regressions where items silently drop out of the fan-out collection.

tests/workflows/parallel/test_parallel_llm.py (1)

276-297: MagicMock for fan_in lacks async spec – risk of false positives.

parallel_llm_with_agent.fan_in is set to a bare MagicMock, and later tests monkey-patch fan_in.generate / generate_str with AsyncMock. Because MagicMock does not enforce attribute existence, typos such as genrate would silently pass. Prefer AsyncMock with spec_set to catch such mistakes:

pll.fan_in = AsyncMock(spec_set=['generate', 'generate_str', 'generate_structured'])

This strengthens the tests without additional complexity.

tests/test_app.py (2)

224-231: Combine nested with blocks to reduce indentation noise.

Ruff (SIM117) correctly points out that two sequential with statements can be merged:

with patch("mcp_agent.app.initialize_context", AsyncMock(return_value=mock_context)), \
     patch("mcp_agent.app.cleanup_context", AsyncMock()) as mock_cleanup:
    ...

This small refactor improves readability and avoids an extra indentation level. The same pattern recurs at lines 250-257 and 265-272.

🧰 Tools
🪛 Ruff (0.11.9)

224-227: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)


158-178: Windows loop-policy test leaks a fake asyncio module into sys.modules.

Within the context manager:

import_patch = patch.dict(
    "sys.modules",
    {"asyncio": MagicMock(WindowsProactorEventLoopPolicy=mock_policy_class)},
)

Any import of asyncio inside the body receives this stub rather than the real module – fine for the test – but if MCPApp (or its transitive imports) accesses other asyncio attributes during initialization they will be missing, leading to brittle test failures.

A safer pattern is:

  1. Keep the real asyncio in sys.modules.
  2. Monkey-patch only the missing attribute:
# create=True lets the patch apply on non-Windows hosts, where asyncio
# lacks a WindowsProactorEventLoopPolicy attribute.
with patch.object(
    asyncio, "WindowsProactorEventLoopPolicy", mock_policy_class, create=True
), patch("sys.platform", "win32"), patch(
    "asyncio.set_event_loop_policy"
) as mock_set:
    MCPApp(name="test_app")

This preserves the full asyncio surface while still exercising the Windows branch.

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1)

243-243: Fix unused loop variable.

The loop variable intent_name is not used within the loop body. This should be renamed to indicate it's intentionally unused.

-            for intent_name, intent in classifier.intents.items():
+            for _intent_name, intent in classifier.intents.items():
🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1)

243-243: Fix unused loop variable.

The loop variable intent_name is not used within the loop body. Rename it to indicate it's intentionally unused.

-            for intent_name, intent in classifier.intents.items():
+            for _intent_name, intent in classifier.intents.items():
🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/swarm/conftest.py (1)

23-30: Explicitly set return_value on AsyncMocks to prevent accidental misuse

The fixtures mock_agent and mock_swarm_agent expose call_tool, initialize, and shutdown as AsyncMocks without a return_value.
If a test forgets to await one of these mocks (or inspects the returned value) it will receive another AsyncMock, leading to confusing RuntimeWarning: coroutine was never awaited messages or false-positive assertions.

 agent.call_tool = AsyncMock()
-# …
+# Give the mock a deterministic result that matches the production contract
+agent.call_tool.return_value = None

Apply the same pattern to the other async mocks (initialize, shutdown, and the equivalents on mock_swarm_agent) or, if a specific value is expected in a test, set it there.
This keeps the fixtures predictable and avoids latent test flakiness.

tests/workflows/orchestrator/test_orchestrator.py (2)

174-178: Rename the unused loop variable to underscore to silence Ruff B007

agent is not referenced inside the loop; Ruff correctly flags this as B007.
Renaming it to _ communicates the intent and keeps the static-analysis pipeline green.

-        for agent_name, agent in orchestrator.agents.items():
+        for agent_name, _ in orchestrator.agents.items():
🧰 Tools
🪛 Ruff (0.11.9)

174-174: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)


295-303: Collapse the nested with patch blocks into a single context manager

Ruff SIM117 suggests combining adjacent with statements.
Doing so reduces indentation and improves readability:

-with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):

Apply the same pattern to the similar blocks at 357-365 and 430-437.

🧰 Tools
🪛 Ruff (0.11.9)

295-296: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/router/test_router_llm_openai.py (1)

211-214: Prefer patch.object as a context manager when spying on an instance method

Directly overwriting router._route_with_llm works, but using patch.object keeps the original attribute intact after the test and avoids accidental leakage across tests:

with patch.object(router, "_route_with_llm", wraps=router._route_with_llm) as spy:
    result = await router.route("Test request")
    spy.assert_called_once_with("Test request")

This small change promotes isolation and ensures the router instance is restored to its original state.

tests/workflows/router/test_router_embedding_cohere.py (1)

25-27: Avoid mutating NumPy’s global RNG in tests

np.random.seed(seed) resets the global RNG state on every call, which can introduce cross-test flakiness or brittle ordering-dependencies (especially when the suite grows or runs in parallel).

A safer pattern is to create a local generator per sample so the remainder of the test-suite keeps its own random state untouched.

-            seed = sum(ord(c) for c in data[i])
-            np.random.seed(seed)
-            embeddings[i] = np.random.rand(embedding_dim).astype(np.float32)
+            # Derive a deterministic generator per input so global RNG state is
+            # left unchanged.
+            seed = sum(ord(c) for c in data[i])
+            rng = np.random.default_rng(seed)
+            embeddings[i] = rng.random(embedding_dim, dtype=np.float32)
tests/workflows/parallel/test_fan_in.py (1)

119-125: Patch stack objects with a single context-manager

Static analysis correctly points out that a nested with can be flattened for clarity:

-        with patch("contextlib.AsyncExitStack") as MockAsyncExitStack:
-            # Configure the mock stack
-            mock_stack = AsyncMock()
-            MockAsyncExitStack.return_value = mock_stack
-            mock_stack.__aenter__.return_value = mock_stack
-            mock_stack.enter_async_context.return_value = mock_agent
+        with patch("contextlib.AsyncExitStack") as MockAsyncExitStack:
+            mock_stack = AsyncMock()
+            MockAsyncExitStack.return_value = mock_stack
+            mock_stack.__aenter__.return_value = mock_stack
+            mock_stack.enter_async_context.return_value = mock_agent

Since only one object is patched, the suggestion is mostly stylistic, but flattening keeps the codebase consistent with the linter rule (SIM117) and reduces indentation noise.

tests/agents/test_agent.py (4)

219-223: Collapse nested with statements per SIM117

You can satisfy Ruff’s SIM117 rule and improve readability by combining these two context-managers:

-with patch("uuid.uuid4", return_value="test-uuid"):
-    # Mock the method to return directly
-    with patch.object(
-        Agent, "request_human_input", AsyncMock(return_value="Test user input")
-    ):
-        result = await agent_with_human_input.request_human_input(request)
+with patch("uuid.uuid4", return_value="test-uuid"), patch.object(
+    Agent, "request_human_input", AsyncMock(return_value="Test user input")
+):
+    result = await agent_with_human_input.request_human_input(request)
🧰 Tools
🪛 Ruff (0.11.9)

219-223: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


277-281: Duplicate nested-with pattern — flatten for consistency

The same SIM117 issue appears here; a one-liner context-manager improves legibility:

-with patch(...):  # outer context manager, elided in this diff
-    with patch.object(
-        Agent, "request_human_input", side_effect=Exception("Callback error")
-    ):
+with patch(...), patch.object(
+    Agent, "request_human_input", side_effect=Exception("Callback error")
+):

(The outer with can stay as-is or be merged depending on preference.)

🧰 Tools
🪛 Ruff (0.11.9)

277-281: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


263-270: Dead helper code is never executed

mock_implementation is defined but immediately superseded by the patch.object(...) call that injects a different side-effect. Unless you intend to reuse the richer assertions inside the helper, the function is redundant and can be removed to keep the test focused.

-        async def mock_implementation(self, req):
-            # Simulate the error handling logic from the original method
-            error_message = "Callback error"
-            self.executor.signal.assert_called_once()
-            signal_call = self.executor.signal.call_args[1]
-            assert "payload" in signal_call
-            assert error_message in signal_call["payload"]
-            raise Exception(error_message)

456-462: Small accuracy nit – misleading inline comment

tool_name = test_function.__name__ resolves to the actual function name (function).
The inline comment # Should be "function" not "test_function" is correct, but it can be removed (or re-phrased) to avoid confusion for future maintainers — the code already speaks for itself.

tests/workflows/router/test_router_llm_anthropic.py (1)

219-225: Possible type mismatch for confidence

StructuredResponseCategory.confidence is often typed as float (e.g. 0.83) in the production code.
Asserting against the string "high" (lines 222-225) may break if the implementation was changed to numeric.

Consider asserting on a float threshold instead:

-assert result[0].confidence == "high"
+assert 0.8 <= result[0].confidence <= 1.0
tests/workflows/llm/test_augmented_llm_google.py (1)

469-479: Assert the model actually used in the request payload

You verify that select_model was called, but not that the chosen model is present in the request sent to the executor.
Adding an assertion strengthens the test:

sent_model = mock_llm.executor.execute.call_args[0][1].payload["model"]
assert sent_model == "gemini-1.5-flash"
tests/workflows/router/conftest.py (2)

95-100: Avoid mutating NumPy’s global RNG in a shared test process

Calling np.random.seed(seed) inside the loop resets the global generator every iteration and can leak state into unrelated tests executed in the same Python process.

Instead, create a local generator:

-            seed = sum(ord(c) for c in data[i])
-            np.random.seed(seed)
-            embeddings[i] = np.random.rand(embedding_dim).astype(np.float32)
+            rng = np.random.default_rng(sum(ord(c) for c in data[i]))
+            embeddings[i] = rng.random(embedding_dim, dtype=np.float32)

This removes global side-effects and still gives you deterministic but distinct vectors.


114-118: Name shadowing between fixture and inner function

The fixture is called test_function and it returns a nested function with the same identifier.
Although legal, this can be confusing in stack-traces and when debugging.
Consider renaming the inner callable (e.g. echo_fn) or the fixture (echo_function_fixture) for clarity.
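A sketch of the first option, keeping the fixture name while renaming the inner callable (the echo behavior is assumed for illustration):

import pytest


@pytest.fixture
def test_function():
    def echo_fn(text: str) -> str:
        # A distinct inner name keeps stack traces and debugging unambiguous.
        return f"Echo: {text}"

    return echo_fn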

tests/executor/test_workflow_signal.py (4)

197-205: Raise expectation should reference asyncio.TimeoutError for precision

AsyncioSignalHandler.wait_for_signal() propagates asyncio.TimeoutError.
Catching the base-class TimeoutError works today (since Python 3.11 they are the same object), but being explicit protects you if Python ever diverges again.

-with pytest.raises(TimeoutError):
+with pytest.raises(asyncio.TimeoutError):

285-305: Assertion on run_in_executor arguments is brittle

run_in_executor is called with positional args (None, input, "Enter value: ").
assert "Enter value: " in mock_loop.run_in_executor.call_args[0] passes only because the substring exists in one of the tuple elements.
If the executor signature changes (e.g. kwargs), this test will give a false negative.

A safer check:

executor_args = mock_loop.run_in_executor.call_args.args
assert executor_args[1] is input
assert executor_args[2] == "Enter value: "

This pins the contract clearly.


428-456: Use contextlib.suppress to silence expected timeouts cleanly

Instead of a try/except …: pass block you can rely on contextlib.suppress for fewer lines and clearer intent.

-try:
-    await store.wait_for("test_signal", timeout_seconds=0.1)
-except asyncio.TimeoutError:
-    pass
+from contextlib import suppress
+with suppress(asyncio.TimeoutError):
+    await store.wait_for("test_signal", timeout_seconds=0.1)

This also resolves the Ruff SIM105 hint.

🧰 Tools
🪛 Ruff (0.11.9)

452-455: Use contextlib.suppress(asyncio.TimeoutError) instead of try-except-pass

Replace with contextlib.suppress(asyncio.TimeoutError)

(SIM105)


183-191: asyncio.sleep(0.1) inflates suite time

The sleeps are only meant to yield control to the event-loop.
await asyncio.sleep(0) yields control just the same without slowing the suite.

Refactor all occurrences where the delay is not functionally needed.

Also applies to: 213-221, 545-553

tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (2)

185-188: Float equality assertions should use pytest.approx

Direct equality on floating-point numbers can lead to flaky tests across platforms.
Replace with an approximate comparison:

-assert results[0].p_score == 0.9
+assert results[0].p_score == pytest.approx(0.9)

Apply similarly to the other locations (0.7, 0.95, 0.85, 0.65, …).

Also applies to: 253-259, 371-379


70-80: Repeated patch blocks could be collapsed with a fixture or parametrisation

The with patch(... OpenAIAugmentedLLM ...) boilerplate is repeated many times.
Creating a fixture:

@pytest.fixture(autouse=True)
def patched_llm(monkeypatch):
    monkeypatch.setattr(
        "mcp_agent.workflows.intent_classifier.intent_classifier_llm_openai.OpenAIAugmentedLLM",
        MockOpenAIAugmentedLLM,
    )

would make tests shorter and easier to maintain.

Also applies to: 118-130, 270-279

tests/workflows/orchestrator/conftest.py (1)

64-70: Factory omits context, which many LLM methods expect

factory(agent) instantiates MockAugmentedLLM(agent=agent) but does not forward the context created in mock_context().
If any test relies on llm.context being present (e.g. model-selection logic), a NoneType access will surface.

-    def factory(agent):
-        return MockAugmentedLLM(agent=agent)
+    def factory(agent, context=None):
+        # default to the shared mock_context fixture when not provided
+        return MockAugmentedLLM(agent=agent, context=context)

or pass mock_context explicitly from the caller.
This keeps the fixture self-contained and prevents hidden AttributeErrors.

tests/workflows/intent_classifier/test_intent_classifier_llm_anthropic.py (1)

319-324: Patching initialize with AsyncMock may mask real idempotency logic

Wrapping the real coroutine with AsyncMock(wraps=real_initialize) is handy for call-count checks, but it replaces the bound method on the instance.
If initialize contains self.initialized guards, those still run, yet future refactors that access super().initialize or rely on the original descriptor semantics will be obscured.

Consider:

with patch.object(classifier, "initialize", wraps=classifier.initialize) as init_mock:
    ...

so you avoid permanently altering the instance attribute and keep the patch within a controlled context.

src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)

964-972: Single-element list creation is correct but leaves existing helper unused
You now wrap the part in [mcp_content_to_openai_content_part(...)], which is fine. However the old multi-content helper (mcp_content_to_openai_content) still exists (or did in sibling modules). Consider removing dead code to avoid confusion.

tests/workflows/llm/test_augmented_llm_bedrock.py (3)

130-148: Assertion accesses nested payload incorrectly & assumes request shape
first_call_args = mock_llm.executor.execute.call_args[0][1] retrieves the second positional argument, assuming the signature is execute(task, payload). If execute is later called with keyword arguments (e.g., payload=…), this indexing will break silently.

Prefer:

_, request_obj = mock_llm.executor.execute.call_args.args
payload = request_obj.payload

or access via call_kwargs. This keeps the test stable against minor signature changes.


278-311: custom_side_effect does not await tool-calls sequentially – race condition in test
Because all branches of custom_side_effect share mutable call_count, rapid await/AsyncMock scheduling could interleave, producing nondeterministic counts in rare CI conditions.

Guard with a simple lock or, easier, use side_effect=[resp1, resp2]:

mock_llm.executor.execute = AsyncMock(side_effect=[
    self.create_tool_use_response(...),
    self.create_text_response(...)
])

This is deterministic and easier to read.


501-525: typed_dict_extras tests import via bedrock module – fragile coupling
You import typed_dict_extras from mcp_agent.workflows.llm.augmented_llm_bedrock, but the helper actually lives in mcp_agent.utils.common and is only re-exported indirectly.

Importing from its canonical location avoids accidental breakage if the bedrock module stops re-exporting:

from mcp_agent.utils.common import typed_dict_extras
src/mcp_agent/workflows/llm/augmented_llm_azure.py (1)

377-381: Duplicate broad exception handling – consolidate to avoid masking errors

Two separate except Exception as e: blocks return very similar ToolMessage error payloads. Consolidating them improves readability and prevents subtle flow-control differences.

Consider:

-        except json.JSONDecodeError as e:
-            ...
-        except Exception as e:
-            ...
+        # json.JSONDecodeError subclasses ValueError, so a single Exception
+        # handler covers both paths.
+        except Exception as e:
+            return ToolMessage(
+                tool_call_id=tool_call_id,
+                content=f"Error executing tool '{tool_name}': {str(e)}",
+            )

Also applies to: 400-404

tests/workflows/orchestrator/test_orchestrator_integration.py (1)

37-41: Rename unused loop variable to underscore to satisfy linters

agent is never referenced inside the loop; Ruff flags this as B007.
Renaming improves clarity and keeps CI green.

-for agent_name, agent in orchestrator.agents.items():
+for agent_name, _agent in orchestrator.agents.items():

(Apply the same change in the second loop around line 182.)

Also applies to: 182-186

🧰 Tools
🪛 Ruff (0.11.9)

37-37: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)

tests/workflows/llm/test_augmented_llm_openai.py (1)

149-156: Accessing request payload via positional indices is brittle

first_call_args = mock_llm.executor.execute.call_args_list[0][0] relies on a fixed positional signature of executor.execute(...).
If the executor’s API changes, the test will silently break.
Prefer using call_kwargs = mock_llm.executor.execute.call_args.kwargs (or args, kwargs = ...) and assert via named attributes. This keeps the test stable against parameter re-ordering.
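A sketch of the named-access pattern (the payload keyword and the expected model value are illustrative assumptions):

# Works whether executor.execute() was invoked positionally or by keyword.
call = mock_llm.executor.execute.call_args
request_obj = call.kwargs.get("payload") or call.args[1]
assert request_obj.payload["model"] == "gpt-4o"  # expected model is illustrative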

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c85782 and 4284d6d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (68)
  • pyproject.toml (2 hunks)
  • src/mcp_agent/executor/workflow_signal.py (2 hunks)
  • src/mcp_agent/logging/transport.py (1 hunks)
  • src/mcp_agent/mcp/mcp_aggregator.py (2 hunks)
  • src/mcp_agent/utils/pydantic_type_serializer.py (8 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm.py (1 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_azure.py (6 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (2 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_google.py (1 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_openai.py (4 hunks)
  • src/mcp_agent/workflows/orchestrator/orchestrator.py (2 hunks)
  • src/mcp_agent/workflows/router/router_embedding.py (2 hunks)
  • src/mcp_agent/workflows/swarm/swarm.py (2 hunks)
  • tests/agents/conftest.py (1 hunks)
  • tests/agents/test_agent.py (1 hunks)
  • tests/executor/temporal/test_signal_handler.py (1 hunks)
  • tests/executor/temporal/test_temporal_executor.py (1 hunks)
  • tests/executor/temporal/test_workflow_registry.py (1 hunks)
  • tests/executor/test_workflow.py (1 hunks)
  • tests/executor/test_workflow_signal.py (1 hunks)
  • tests/fixture/README.md (0 hunks)
  • tests/fixture/expected_output.txt (0 hunks)
  • tests/fixture/mcp_basic_agent_20250131_205604.jsonl (0 hunks)
  • tests/mcp/test_mcp_aggregator.py (1 hunks)
  • tests/mcp/test_mcp_connection_manager.py (1 hunks)
  • tests/test_app.py (1 hunks)
  • tests/test_event_progress.py (0 hunks)
  • tests/utils/test_pydantic_type_serializer.py (6 hunks)
  • tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (1 hunks)
  • tests/workflows/intent_classifier/README.md (1 hunks)
  • tests/workflows/intent_classifier/conftest.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_llm_anthropic.py (1 hunks)
  • tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (1 hunks)
  • tests/workflows/llm/README.md (1 hunks)
  • tests/workflows/llm/conftest.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_anthropic.py (8 hunks)
  • tests/workflows/llm/test_augmented_llm_azure.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_bedrock.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_google.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_ollama.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_openai.py (3 hunks)
  • tests/workflows/orchestrator/__init__.py (1 hunks)
  • tests/workflows/orchestrator/conftest.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator_integration.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator_models.py (1 hunks)
  • tests/workflows/orchestrator/test_orchestrator_prompts.py (1 hunks)
  • tests/workflows/parallel/conftest.py (1 hunks)
  • tests/workflows/parallel/test_fan_in.py (1 hunks)
  • tests/workflows/parallel/test_fan_out.py (1 hunks)
  • tests/workflows/parallel/test_parallel_llm.py (1 hunks)
  • tests/workflows/router/__init__.py (1 hunks)
  • tests/workflows/router/conftest.py (1 hunks)
  • tests/workflows/router/test_router_base.py (1 hunks)
  • tests/workflows/router/test_router_embedding.py (1 hunks)
  • tests/workflows/router/test_router_embedding_cohere.py (1 hunks)
  • tests/workflows/router/test_router_embedding_openai.py (1 hunks)
  • tests/workflows/router/test_router_llm.py (1 hunks)
  • tests/workflows/router/test_router_llm_anthropic.py (1 hunks)
  • tests/workflows/router/test_router_llm_openai.py (1 hunks)
  • tests/workflows/swarm/__init__.py (1 hunks)
  • tests/workflows/swarm/conftest.py (1 hunks)
  • tests/workflows/swarm/test_swarm.py (1 hunks)
  • tests/workflows/swarm/test_swarm_anthropic.py (1 hunks)
  • tests/workflows/swarm/test_swarm_openai.py (1 hunks)
💤 Files with no reviewable changes (4)
  • tests/fixture/expected_output.txt
  • tests/fixture/README.md
  • tests/test_event_progress.py
  • tests/fixture/mcp_basic_agent_20250131_205604.jsonl
🧰 Additional context used
🧬 Code Graph Analysis (17)
src/mcp_agent/executor/workflow_signal.py (1)
src/mcp_agent/executor/executor.py (3)
  • signal (151-176)
  • signal (382-392)
  • uuid (215-222)
src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1)
src/mcp_agent/utils/common.py (1)
  • to_string (32-39)
tests/executor/temporal/test_temporal_executor.py (3)
src/mcp_agent/executor/temporal/__init__.py (5)
  • ensure_client (255-265)
  • wrap_as_activity (85-107)
  • _execute_task_as_async (109-143)
  • start_workflow (267-350)
  • terminate_workflow (367-386)
src/mcp_agent/app.py (1)
  • config (128-129)
src/mcp_agent/server/app_server.py (1)
  • app (69-71)
tests/workflows/parallel/conftest.py (4)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
src/mcp_agent/agents/agent.py (1)
  • Agent (48-750)
src/mcp_agent/workflows/llm/augmented_llm.py (1)
  • AugmentedLLM (208-661)
tests/workflows/router/conftest.py (3)
  • mock_context (17-54)
  • mock_agent (58-69)
  • mock_llm (73-81)
tests/executor/test_workflow_signal.py (1)
src/mcp_agent/executor/workflow_signal.py (8)
  • Signal (13-48)
  • SignalRegistration (51-59)
  • PendingSignal (88-95)
  • BaseSignalHandler (98-157)
  • AsyncioSignalHandler (219-297)
  • ConsoleSignalHandler (160-216)
  • LocalSignalStore (301-336)
  • wait_for (319-336)
tests/workflows/router/test_router_embedding.py (4)
src/mcp_agent/agents/agent.py (1)
  • Agent (48-750)
src/mcp_agent/workflows/router/router_embedding.py (4)
  • EmbeddingRouter (29-240)
  • EmbeddingRouterCategory (22-26)
  • _compute_embedding (233-240)
  • _route_with_embedding (189-231)
tests/workflows/router/conftest.py (5)
  • mock_context (17-54)
  • mock_embedding_model (85-105)
  • mock_agent (58-69)
  • test_function (109-118)
  • test_function (114-116)
src/mcp_agent/workflows/router/router_base.py (1)
  • RouterResult (19-31)
tests/executor/test_workflow.py (1)
src/mcp_agent/executor/workflow.py (8)
  • WorkflowState (33-52)
  • WorkflowResult (55-59)
  • Workflow (62-505)
  • record_error (47-52)
  • id (111-115)
  • run_id (118-123)
  • update_state (447-454)
  • cancel (331-352)
tests/workflows/router/conftest.py (10)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
src/mcp_agent/workflows/embedding/embedding_base.py (1)
  • EmbeddingModel (14-32)
src/mcp_agent/workflows/router/router_base.py (3)
  • RouterCategory (34-49)
  • ServerRouterCategory (52-55)
  • AgentRouterCategory (58-61)
tests/test_app.py (1)
  • mock_context (16-29)
src/mcp_agent/mcp/mcp_server_registry.py (1)
  • get_server_config (362-378)
src/mcp_agent/agents/agent.py (1)
  • Agent (48-750)
tests/workflows/router/test_router_llm_openai.py (3)
  • generate (29-31)
  • generate_str (33-35)
  • generate_structured (37-39)
tests/workflows/router/test_router_llm_anthropic.py (3)
  • generate (29-31)
  • generate_str (33-35)
  • generate_structured (37-39)
tests/workflows/router/test_router_embedding_openai.py (1)
  • embed (19-28)
tests/workflows/router/test_router_embedding_cohere.py (1)
  • embed (19-28)
tests/workflows/llm/test_augmented_llm_ollama.py (3)
src/mcp_agent/config.py (1)
  • OpenAISettings (145-160)
src/mcp_agent/workflows/llm/augmented_llm_ollama.py (1)
  • OllamaAugmentedLLM (17-79)
tests/workflows/llm/conftest.py (1)
  • mock_aggregator (17-29)
tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (7)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
src/mcp_agent/workflows/intent_classifier/intent_classifier_base.py (1)
  • IntentClassificationResult (27-40)
src/mcp_agent/workflows/intent_classifier/intent_classifier_embedding.py (1)
  • EmbeddingIntent (24-30)
src/mcp_agent/workflows/intent_classifier/intent_classifier_embedding_cohere.py (1)
  • CohereEmbeddingIntentClassifier (13-45)
tests/workflows/intent_classifier/conftest.py (4)
  • embed (55-65)
  • embedding_dim (68-69)
  • test_intents (28-46)
  • mock_context (11-24)
tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (2)
  • embed (31-39)
  • embedding_dim (42-43)
tests/workflows/parallel/conftest.py (1)
  • mock_context (10-16)
tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (2)
src/mcp_agent/workflows/evaluator_optimizer/evaluator_optimizer.py (5)
  • EvaluatorOptimizerLLM (48-469)
  • EvaluationResult (33-45)
  • QualityRating (24-30)
  • _build_eval_prompt (425-446)
  • _build_refinement_prompt (448-469)
src/mcp_agent/workflows/llm/augmented_llm.py (1)
  • AugmentedLLM (208-661)
tests/workflows/swarm/test_swarm.py (4)
src/mcp_agent/agents/agent.py (1)
  • CallToolRequest (789-798)
src/mcp_agent/workflows/swarm/swarm.py (8)
  • AgentFunctionResult (142-156)
  • SwarmAgent (74-139)
  • DoneAgent (309-322)
  • create_agent_resource (50-58)
  • create_agent_function_result_resource (61-71)
  • get_tool (213-220)
  • set_agent (274-297)
  • should_continue (299-306)
src/mcp_agent/workflows/swarm/swarm_openai.py (1)
  • OpenAISwarm (9-41)
tests/workflows/swarm/conftest.py (8)
  • mock_context (11-17)
  • mock_swarm_agent (34-46)
  • test_function_result (56-58)
  • test_function_agent_result (62-64)
  • test_function_agent_function_result (68-70)
  • mock_tool_response (80-82)
  • mock_agent (21-30)
  • done_agent (50-52)
tests/workflows/llm/test_augmented_llm_bedrock.py (5)
src/mcp_agent/config.py (1)
  • BedrockSettings (121-132)
src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (4)
  • BedrockAugmentedLLM (48-346)
  • BedrockMCPTypeConverter (434-496)
  • mcp_content_to_bedrock_content (499-531)
  • bedrock_content_to_mcp_content (534-569)
src/mcp_agent/workflows/llm/augmented_llm.py (5)
  • RequestParams (114-153)
  • select_model (302-343)
  • _log_chat_progress (511-521)
  • _log_chat_finished (523-526)
  • get_request_params (345-366)
src/mcp_agent/utils/common.py (1)
  • typed_dict_extras (27-29)
tests/workflows/llm/conftest.py (1)
  • mock_aggregator (17-29)
tests/mcp/test_mcp_aggregator.py (1)
src/mcp_agent/mcp/mcp_aggregator.py (12)
  • MCPAggregator (65-1102)
  • list_servers (536-548)
  • _parse_capability_name (946-991)
  • NamespacedTool (45-52)
  • NamespacedPrompt (55-62)
  • load_server (294-367)
  • load_servers (369-424)
  • MCPCompoundServer (1105-1171)
  • _list_tools (1121-1124)
  • _list_prompts (1141-1144)
  • _call_tool (1126-1139)
  • _get_prompt (1146-1162)
tests/workflows/orchestrator/test_orchestrator_integration.py (4)
src/mcp_agent/workflows/llm/augmented_llm.py (1)
  • RequestParams (114-153)
src/mcp_agent/workflows/orchestrator/orchestrator.py (1)
  • Orchestrator (46-586)
src/mcp_agent/workflows/orchestrator/orchestrator_models.py (5)
  • Plan (46-55)
  • Step (35-43)
  • NextStep (111-116)
  • PlanResult (86-108)
  • AgentTask (27-32)
tests/workflows/orchestrator/conftest.py (5)
  • mock_llm_factory (64-70)
  • mock_agents (74-87)
  • mock_context (42-60)
  • generate_str (34-35)
  • generate (31-32)
tests/workflows/llm/test_augmented_llm_openai.py (4)
src/mcp_agent/config.py (1)
  • OpenAISettings (145-160)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (4)
  • MCPOpenAITypeConverter (920-1026)
  • generate (146-359)
  • to_mcp_message_result (944-953)
  • from_mcp_message_param (956-977)
src/mcp_agent/workflows/llm/augmented_llm.py (3)
  • RequestParams (114-153)
  • select_model (302-343)
  • get_request_params (345-366)
tests/workflows/llm/conftest.py (2)
  • mock_context (8-13)
  • mock_aggregator (17-29)
tests/workflows/llm/test_augmented_llm_anthropic.py (4)
src/mcp_agent/config.py (1)
  • AnthropicSettings (111-118)
src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (8)
  • AnthropicMCPTypeConverter (754-863)
  • mcp_content_to_anthropic_content (866-936)
  • anthropic_content_to_mcp_content (939-982)
  • mcp_stop_reason_to_anthropic_stop_reason (985-997)
  • anthropic_stop_reason_to_mcp_stop_reason (1000-1012)
  • generate (131-332)
  • to_mcp_message_result (779-796)
  • from_mcp_message_param (799-808)
src/mcp_agent/workflows/llm/augmented_llm.py (2)
  • RequestParams (114-153)
  • get_request_params (345-366)
src/mcp_agent/utils/common.py (1)
  • typed_dict_extras (27-29)
🪛 Ruff (0.11.9)
src/mcp_agent/workflows/swarm/swarm.py

87-87: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


88-88: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

src/mcp_agent/workflows/llm/augmented_llm_google.py

442-445: Use ternary operator role = "assistant" if result.role == "model" else result.role instead of if-else-block

Replace if-else-block with role = "assistant" if result.role == "model" else result.role

(SIM108)

tests/executor/test_workflow_signal.py

452-455: Use contextlib.suppress(asyncio.TimeoutError) instead of try-except-pass

Replace with contextlib.suppress(asyncio.TimeoutError)

(SIM105)

tests/test_app.py

224-227: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)


250-251: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


265-266: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/agents/test_agent.py

219-223: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


277-281: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/orchestrator/test_orchestrator.py

174-174: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)


295-296: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


357-358: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


430-431: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/workflows/orchestrator/test_orchestrator_integration.py

37-37: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)


182-182: Loop control variable agent not used within loop body

Rename unused agent to _agent

(B007)

🔇 Additional comments (62)
tests/workflows/router/__init__.py (1)

1-2: LGTM! Standard package marker file.

This is a proper addition of an __init__.py file to mark the directory as a Python package, enabling test discovery. The comments clearly explain the purpose.

pyproject.toml (1)

75-75: LGTM! Appropriate addition of trio for async testing.

Adding trio>=0.30.0 to dev dependencies is appropriate for supporting asynchronous concurrency testing in the new test suites.

src/mcp_agent/workflows/llm/augmented_llm.py (1)

256-261: LGTM! Clean conditional parameter passing.

Good improvement using dictionary unpacking to conditionally pass the instruction parameter only when it's not None. This prevents passing explicit None values and allows the Agent constructor to handle its own default value logic.
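For reference, the pattern looks roughly like this (names are illustrative):

# The "instruction" key only exists when a value was supplied, so the Agent
# constructor's own default applies otherwise.
agent = Agent(
    name=agent_name,
    **({"instruction": instruction} if instruction is not None else {}),
)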

tests/workflows/parallel/conftest.py (4)

9-16: LGTM! Clean and simple mock context fixture.

The mock_context fixture provides a minimal mock Context instance with the essential executor attribute. This is appropriately simplified compared to the router conftest which includes provider configurations not needed for parallel workflow tests.


19-28: LGTM! Proper async context manager mocking.

The mock_agent fixture correctly mocks the Agent class with async context manager support (__aenter__ and __aexit__). This is essential since Agent implements these methods for proper lifecycle management.


31-40: LGTM! Comprehensive LLM method mocking.

The mock_llm fixture properly mocks all the key async generation methods (generate, generate_str, generate_structured) of the AugmentedLLM class, providing good test isolation.


43-48: LGTM! Useful factory pattern for async testing.

The mock_llm_factory fixture provides an async factory function that returns the mocked LLM instance. This pattern is useful for testing components that need to create LLM instances dynamically.

tests/workflows/intent_classifier/conftest.py (2)

10-24: Well-structured mock context fixture.

The mock context fixture properly isolates API configurations for testing and follows pytest best practices.


27-46: Good test intent examples for comprehensive testing.

The test intents provide diverse examples covering greeting, farewell, and help scenarios, which should enable thorough testing of intent classification logic.

tests/workflows/orchestrator/__init__.py (1)

1-1: Proper package initialization for test discovery.

The docstring appropriately marks this directory as a test package for the orchestrator workflow module.

tests/workflows/swarm/__init__.py (1)

1-1: Appropriate test package marker.

The comment clearly identifies this as the test package for swarm workflow components.

src/mcp_agent/logging/transport.py (1)

276-276:

❓ Verification inconclusive

Review the event loop handling in init_queue() before auto-initialization.

While automatically calling init_queue() in the constructor improves usability, it makes the problematic event loop handling in init_queue() more likely to cause issues.

The init_queue() method (lines 278-288) contains this concerning pattern:

try:
    self._loop = asyncio.get_running_loop()
except RuntimeError:
    self._loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._loop)  # This sets the loop globally!

Setting a new event loop globally in a constructor can interfere with existing event loops in the application. Consider one of these safer approaches:

  1. Lazy initialization: Only call init_queue() when actually needed (e.g., in start() method)
  2. Remove global loop setting: Store the loop reference without setting it globally
  3. Constructor parameter: Allow passing the event loop as a constructor parameter

🏁 Script executed:

#!/bin/bash
# Search for other usages of AsyncEventBus constructor to assess impact
rg -A 5 -B 5 "AsyncEventBus\(" --type py

Length of output: 42


Verify global event loop handling in init_queue()

No other references to AsyncEventBus( were found in the codebase—please confirm whether this constructor is used elsewhere (e.g., plugins, downstream services) to avoid unintended side effects.

In src/mcp_agent/logging/transport.py (lines 276–288), init_queue() does:

try:
    self._loop = asyncio.get_running_loop()
except RuntimeError:
    self._loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._loop)  # resets the global loop!

Setting a new event loop globally in a constructor can silently break existing loops in your application. Consider one of these safer approaches:

  • Lazy initialization: defer init_queue() until you actually need to start processing (e.g., in a start() or run() method)
  • Remove global loop setting: keep a private loop reference without calling asyncio.set_event_loop()
  • Constructor parameter: accept an existing loop via __init__(..., loop: asyncio.AbstractEventLoop=None)

Please review all usages of AsyncEventBus in your projects and assess the impact of altering the global loop.
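As an illustration of the constructor-parameter option, a hedged sketch (the class shape is assumed, not taken from the module):

import asyncio
from typing import Optional


class AsyncEventBus:
    """Sketch only: accept a loop instead of mutating global asyncio state."""

    def __init__(self, loop: Optional[asyncio.AbstractEventLoop] = None):
        self._loop = loop
        self._queue: Optional[asyncio.Queue] = None

    def init_queue(self) -> None:
        if self._loop is None:
            # Bind only to a loop that is already running; raising here is
            # preferable to silently replacing the global event loop.
            self._loop = asyncio.get_running_loop()
        self._queue = asyncio.Queue()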

tests/agents/conftest.py (2)

7-17: Well-structured mock context fixture.

The fixture properly mocks the asynchronous methods and provides a realistic context structure for testing agent functionality.


20-27: Good test tool fixture design.

The mock tool provides a realistic schema structure that's suitable for testing tool-related functionality.

tests/workflows/llm/conftest.py (2)

7-13: Clean and focused mock context fixture.

The fixture provides essential mocked components (executor and model_selector) needed for LLM workflow testing.


16-29: Comprehensive mock aggregator fixture.

The fixture properly mocks both synchronous (tools list) and asynchronous (list_tools, call_tool) functionality needed for testing tool-enabled LLM workflows.

tests/workflows/llm/README.md (1)

1-95: Excellent test documentation.

The README provides comprehensive guidance for running, understanding, and extending the LLM provider tests. The structure is clear and includes practical examples for different scenarios.

src/mcp_agent/workflows/orchestrator/orchestrator.py (3)

102-107: LGTM - Typo fix and instruction improvement!

The correction from sythesizer to synthesizer fixes a typo, and the simplified instruction is more concise and clear.


109-113: Good addition of input validation!

Adding validation for the plan_type parameter is excellent defensive programming. This ensures only valid values are accepted and provides clear error messages.
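For context, the guard presumably looks something like the following sketch; the accepted values "full" and "iterative" are inferred from the integration tests, and the exact message may differ:

if plan_type not in ("full", "iterative"):
    raise ValueError(
        f"Invalid plan_type: {plan_type!r}. Expected 'full' or 'iterative'."
    )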


358-358: LGTM - Consistent typo fix!

Good catch fixing the variable name from sythesizer to synthesizer to maintain consistency with the constructor fix.

src/mcp_agent/workflows/router/router_embedding.py (3)

101-101: Critical fix - Properly await async method!

Good catch adding await before self._compute_embedding(). The _compute_embedding method is async and must be awaited.


103-103: Correct data extraction using model_dump()!

Using model_dump() before unpacking ensures proper data extraction from the Pydantic model into the dictionary format expected by EmbeddingRouterCategory.
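i.e., roughly this shape (a sketch; the embedding field name is an assumption):

# model_dump() yields a plain dict, so ** unpacking supplies proper keyword
# arguments rather than a single Pydantic model positional.
category = EmbeddingRouterCategory(
    **base_category.model_dump(),
    embedding=embedding,
)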


209-209: Critical fix - Properly await async method!

Good catch adding await before self._compute_embedding(). This was another missing await for the async embedding computation method.

tests/workflows/orchestrator/test_orchestrator_prompts.py (1)

1-131: Excellent comprehensive test suite for prompt templates!

This test file provides thorough coverage for all orchestrator prompt templates with several strengths:

  1. Complete coverage: Tests all major prompt templates used in the orchestrator workflow
  2. Clear structure: Each test follows a consistent pattern of format → assert
  3. Content validation: Verifies both placeholder substitution and expected content presence
  4. Consistency checks: The test_templates_consistency and test_template_order methods ensure uniform formatting and logical organization
  5. Clear naming: Test method names clearly indicate what's being tested

The test structure is well-organized and provides good confidence in template correctness.

tests/workflows/intent_classifier/README.md (1)

1-70: Excellent test documentation providing comprehensive guidance!

This README effectively documents the intent classifier testing strategy with several strong points:

  1. Clear purpose: Explains what's being tested and why it matters
  2. Mock strategy: Well-explained rationale for using mocks (speed, determinism, no external dependencies)
  3. Practical guidance: Specific commands for running tests at different granularities
  4. Development support: Clear instructions for adding new tests with naming conventions
  5. Test coverage guidelines: Comprehensive list of key test cases to ensure thorough coverage

The documentation will help developers understand the testing approach and maintain consistent test quality across the intent classifier modules.

src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1)

956-976: LGTM! Improved type handling for content conversion.

The conditional logic to handle both Pydantic model instances and plain dictionaries makes the function more robust and defensive. The implementation correctly extracts type and text attributes based on the block's type, which aligns with the broader pattern of content conversion improvements across LLM provider modules.

tests/workflows/orchestrator/test_orchestrator_models.py (1)

1-266: Excellent comprehensive test coverage for orchestrator models.

This test module provides thorough coverage of the orchestrator data models including:

  • Object creation and attribute validation for all model types
  • Relationship handling between steps, tasks, and results
  • Collection operations (add_task_result, add_step_result)
  • Formatting functions with proper status indicators
  • Edge cases like completion state handling

The test structure is clean with descriptive names and clear assertions. This significantly improves the code coverage as mentioned in the PR objectives.

src/mcp_agent/executor/workflow_signal.py (1)

228-228: LGTM! Improved naming consistency and handler tracking.

The renaming from unique_name to unique_signal_name enhances code clarity. The modification to store handlers as [unique_signal_name, wrapped] pairs instead of just the wrapped function enables better handler tracking and management, which supports more robust signal handling capabilities.

These changes maintain backward compatibility while improving internal structure and readability.

Also applies to: 232-232, 260-260, 267-267, 275-277
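
A toy sketch of why storing name/handler pairs helps: handlers can later be looked up and removed by their unique name. The registry shape here is an assumption for illustration, not the actual executor code:

from collections import defaultdict
from typing import Any, Callable, Dict, List

handlers: Dict[str, List[List[Any]]] = defaultdict(list)

def register(signal_name: str, unique_signal_name: str, fn: Callable) -> None:
    handlers[signal_name].append([unique_signal_name, fn])

def unregister(signal_name: str, unique_signal_name: str) -> None:
    # With bare callables, removal-by-name like this would be impossible.
    handlers[signal_name] = [
        pair for pair in handlers[signal_name] if pair[0] != unique_signal_name
    ]

register("done", "done_1", lambda value: value)
unregister("done", "done_1")
assert handlers["done"] == []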

src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (2)

473-473: Critical fix: Wrap single content in list for conversion function.

This fix ensures mcp_content_to_bedrock_content receives a list as expected, preventing potential type errors during content conversion.


541-541: Critical bug fixes: Correct variable references in content loop.

These fixes change incorrect references from content (the entire list) to block (the current iteration item), ensuring proper access to individual content block fields like text, image, toolUse, and document. This prevents erroneous data extraction and ensures correct content conversion.

Also applies to: 546-547, 555-555, 563-564
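
The corrected loop shape, reduced to a runnable toy (field names follow the Bedrock content-block convention mentioned above):

content = [
    {"text": "hello"},
    {"toolUse": {"toolUseId": "t1", "name": "search"}},
]
for block in content:
    # Read fields from the current block, not from the whole `content` list.
    if "text" in block:
        print(block["text"])
    elif "toolUse" in block:
        print(block["toolUse"]["name"])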

src/mcp_agent/mcp/mcp_aggregator.py (2)

190-190: Good improvement for lifecycle state consistency.

Explicitly setting initialized = False when skipping closure ensures consistent state tracking regardless of connection persistence mode. This prevents potential issues where the aggregator might appear initialized when it shouldn't be.


983-985: Excellent change for deterministic behavior.

Iterating over servers in the configured server_names order instead of arbitrary dictionary iteration makes capability name parsing predictable and consistent. This ensures that when multiple servers provide the same capability name, the resolution order follows the intended server priority.
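
A small sketch of the difference; server and capability names are invented for illustration:

server_names = ["alpha", "beta"]  # configured priority order
capabilities = {"beta": {"tool_x"}, "alpha": {"tool_x"}}

def resolve(capability: str):
    for name in server_names:  # deterministic: follows configuration, not dict order
        if capability in capabilities.get(name, set()):
            return name
    return None

assert resolve("tool_x") == "alpha"  # first configured server wins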

tests/workflows/swarm/test_swarm_openai.py (1)

1-192: Excellent comprehensive test suite for OpenAISwarm.

This test file provides thorough coverage of the OpenAISwarm functionality:

  • Proper async testing: Uses pytest.mark.asyncio consistently
  • Good fixture design: Well-structured fixtures for mock objects
  • Comprehensive scenarios: Tests initialization, default/custom parameters, multiple iterations, early termination, and DoneAgent behavior
  • Effective mocking: Proper use of AsyncMock and MagicMock to control test behavior
  • Validation coverage: Verifies both method calls and expected parameters/results

The tests effectively validate iteration control logic and agent state management, which are critical for swarm workflow reliability.

tests/executor/temporal/test_temporal_executor.py (1)

1-105: Well-structured test suite for TemporalExecutor.

This test file provides solid coverage of the TemporalExecutor functionality:

  • Good fixture design: Clear separation of concerns with mock_client, mock_context, and executor fixtures
  • Core functionality coverage: Tests client connection, activity wrapping, and task execution paths
  • Execution context handling: Tests both inside and outside workflow contexts with appropriate patching
  • Workflow management: Covers workflow starting and termination with proper mock verification
  • Async testing patterns: Consistent use of pytest.mark.asyncio and proper async/await handling

The tests effectively validate the Temporal integration patterns and ensure the executor behaves correctly under different execution contexts.

src/mcp_agent/workflows/swarm/swarm.py (1)

255-265: Good alignment with resource handling changes.

The update to check for AgentFunctionResultResource type and access nested attributes through content.result.* properly aligns with the resource wrapper pattern. This ensures consistent handling of agent function results and context variable updates.

tests/workflows/router/test_router_llm.py (1)

1-503: Excellent comprehensive test coverage!

This test suite provides thorough coverage of the LLMRouter functionality including:

  • Initialization scenarios (basic and factory methods)
  • Routing with different parameter combinations
  • Edge cases (empty responses, invalid categories)
  • All routing method variants (route, route_to_server, route_to_agent, route_to_function)
  • Context generation with different inclusion flags

The test structure is clean, well-organized, and follows good async testing patterns with proper mocking.

tests/executor/temporal/test_workflow_registry.py (1)

1-10: LGTM! Good import structure and fixture setup.

The imports are appropriate for async testing with proper mocking utilities. The mock executor fixture correctly sets up the necessary async mock structure.

tests/workflows/router/test_router_base.py (3)

14-49: Excellent concrete implementation for testing abstract base class.

The TestRouter class provides a clean, minimal implementation of the abstract Router methods that's perfect for testing the base class functionality without complex dependencies.


220-231: Good mock setup for tools formatting test.

The test correctly sets up mock tools with string attributes. This approach properly tests the formatting logic without relying on complex tool objects.


135-155:

✅ Verification successful

Verify the test logic for multiple initialize calls.

The test mocks get_server_category after initialization to verify it's not called again. However, this approach might not accurately test the actual implementation behavior.


🏁 Script executed:

#!/bin/bash
# Description: Check if Router.initialize method has proper initialization guards

# Search for the initialize method implementation
ast-grep --pattern $'async def initialize(self):
  $$$
  if self.initialized:
    $$$
  $$$'

Length of output: 9047


🏁 Script executed:

#!/bin/bash
# Find TestRouter definition in tests
rg --with-filename --line-number "class TestRouter" -n tests

Length of output: 202


🏁 Script executed:

#!/bin/bash
# Display TestRouter implementation
sed -n '1,200p' tests/workflows/router/test_router_base.py

Length of output: 7848


Multiple initialize calls guard is correctly tested

The initialize method in router_base.py has an early-return when self.initialized is True, so replacing get_server_category with a mock after the first call and asserting its call_count remains zero on the second invocation accurately verifies that no re-initialization occurs. No changes required.

tests/executor/test_workflow.py (3)

14-23: Excellent error recording test with proper exception handling.

The test correctly verifies that error information is captured with proper type, message, and timestamp fields. This ensures the error recording functionality works as expected.


35-37: Simple but effective MockWorkflow implementation.

The MockWorkflow provides a clean, minimal implementation of the abstract run method that's perfect for testing the base Workflow functionality.


89-94: Good test for executor property validation.

The test correctly verifies that accessing the executor property raises a ValueError when the executor is None. This ensures proper error handling for missing dependencies.

tests/utils/test_pydantic_type_serializer.py (2)

64-65: Correct update to Pydantic v2 field validator syntax.

The field validators are now properly declared as class methods with the @classmethod decorator, which is required in Pydantic v2.

Also applies to: 91-92


179-181: Good update to schema comparison method.

The change from schema() to model_json_schema() is correct for Pydantic v2 compatibility.

tests/workflows/parallel/test_fan_out.py (1)

214-238: generate/generate_str mocks may need to be AsyncMocks.

The LLM/Agent stubs here are patched with:

mock_llm_with_name.generate.return_value = expected_result
mock_llm_with_name.generate_str.return_value = expected_result

These attributes will only behave as awaitable coroutines if mock_llm_with_name.generate was already an AsyncMock in the upstream fixture. If the underlying fixture supplies a plain MagicMock, await llm.generate(...) inside the real FanOut will raise TypeError: object MagicMock can't be used in 'await' expression.

Consider enforcing the async nature explicitly for safety:

mock_llm_with_name.generate = AsyncMock(return_value=expected_result)
mock_llm_with_name.generate_str = AsyncMock(return_value=expected_result)

(or guarantee the upstream fixture uses AsyncMock).
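
A self-contained demonstration of the failure mode and the fix:

import asyncio
from unittest.mock import AsyncMock, MagicMock

async def demo() -> None:
    plain = MagicMock()
    plain.generate.return_value = "result"
    try:
        await plain.generate("query")  # returns a plain value, which is not awaitable
    except TypeError as exc:
        print(f"plain MagicMock: {exc}")

    safe = MagicMock()
    safe.generate = AsyncMock(return_value="result")
    print(await safe.generate("query"))  # "result"

asyncio.run(demo())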

tests/test_app.py (1)

400-417: Potential logger double-patch confusion.

The test first patches get_logger, initializes the context (which internally may call get_logger), then resets basic_app._logger and patches get_logger again. If the internal initialization logic caches the original logger elsewhere (e.g., in the Context), subsequent calls might unexpectedly use the first mock.

Consider making the intention explicit by asserting no hidden second reference exists:

assert basic_app._context.logger is None  # or whatever the contract is

or reuse the same patched get_logger for both phases.

tests/workflows/router/test_router_embedding.py (1)

1-498: Excellent comprehensive test coverage for EmbeddingRouter!

This test file demonstrates excellent testing practices with thorough coverage of the EmbeddingRouter class functionality:

  • Complete lifecycle testing: Covers initialization, factory methods, embedding computation, and all routing variants
  • Proper async testing: Correctly uses pytest.mark.asyncio and proper mocking for async methods
  • Edge case coverage: Tests empty categories, missing embeddings, and similarity scoring
  • Good isolation: Uses mocks effectively to avoid external dependencies
  • Clear documentation: Each test has descriptive docstrings explaining its purpose

The test structure follows best practices with logical grouping and comprehensive assertions that verify both functionality and ordering of results.

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1)

1-285: Well-structured test suite with comprehensive coverage.

This test file provides excellent coverage for the CohereEmbeddingIntentClassifier with:

  • Mock implementation: Clean MockCohereEmbeddingModel that provides deterministic embeddings for testing
  • Complete functionality testing: Covers initialization, factory methods, classification, and edge cases
  • Proper async patterns: Correctly structured async tests with appropriate mocking
  • Error handling: Tests empty intents scenario with proper exception handling

The test structure is consistent with other classifier tests in the codebase, promoting maintainability.

🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1)

1-285: Excellent test coverage for OpenAI embedding-based intent classification.

This test file provides comprehensive testing for the OpenAIEmbeddingIntentClassifier with:

  • Consistent structure: Follows the same testing patterns as other classifier tests, promoting codebase consistency
  • Mock implementation: MockOpenAIEmbeddingModel provides deterministic embeddings suitable for testing
  • Full feature coverage: Tests initialization, factory methods, classification functionality, and error scenarios
  • Proper async handling: Well-structured async tests with appropriate mocking to avoid external API calls

The parallel structure with the Cohere tests makes the test suite easy to understand and maintain.

🧰 Tools
🪛 Ruff (0.11.9)

243-243: Loop control variable intent_name not used within loop body

Rename unused intent_name to _intent_name

(B007)

tests/executor/temporal/test_signal_handler.py (1)

1-99: Comprehensive signal handling test coverage.

This test file provides excellent coverage for the Temporal signal handling functionality:

  • Clean fixture design: Well-organized pytest fixtures that promote test reusability
  • Complete scenario coverage: Tests both in-workflow and out-of-workflow signal handling paths
  • Proper mocking: Appropriately mocks Temporal workflow internals without testing external dependencies
  • Edge case handling: Includes tests for nonexistent signals and idempotent operations
  • Async testing: Correctly structured async tests for temporal operations

The tests effectively verify the signal mailbox functionality, workflow attachment behavior, and both internal and external signaling mechanisms.

tests/workflows/router/test_router_llm_anthropic.py (1)

155-176: Ensure test does not indirectly depend on the real anthropic package

The test patches AnthropicAugmentedLLM, but a hard import of mcp_agent.workflows.router.router_llm_anthropic (line 8) may still execute a top-level import anthropic inside that module.
If anthropic is not listed in the project’s test dependencies, CI will raise ModuleNotFoundError.

Options

  1. Declare anthropic (or a lightweight stub such as anthropic==0.0.0) in the test extras / pyproject.toml.
  2. Lazily import the router inside the patched context, e.g.:
with patch("mcp_agent.workflows.router.router_llm_anthropic.AnthropicAugmentedLLM",
           MockAnthropicAugmentedLLM):
    from mcp_agent.workflows.router.router_llm_anthropic import AnthropicLLMRouter
  3. Or monkey-patch sys.modules["anthropic"] = ModuleType("anthropic") before the first import.

Pick whichever suits the repo’s dependency policy.
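
For option 3, a runnable sketch of stubbing the module before anything imports it (the Anthropic attribute is an assumption about what the router module touches; adjust to the actual imports):

import sys
from types import ModuleType
from unittest.mock import MagicMock

stub = ModuleType("anthropic")
stub.Anthropic = MagicMock()  # assumed attribute name
sys.modules["anthropic"] = stub

import anthropic  # now resolves to the stub, even without the real package
assert anthropic is stub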

tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (1)

448-470: initialize spy counts two calls even when the second is a no-op

Wrapping the real initialize method with AsyncMock(wraps=real_initialize) increments the call-count before the method’s internal early-exit hits, so the assertion verifies the spy, not the optimisation.

If you want to assert the internal short-circuit, introspect a flag instead:

await classifier.initialize()
await classifier.initialize()
assert classifier._already_initialized  # or similar internal flag

Otherwise this test may pass even if the second call performs heavy work again.

tests/workflows/llm/test_augmented_llm_azure.py (1)

293-296: Assertion validates response content inside the request payload – very likely wrong

req.payload["messages"] represents the prompt sent to Azure, yet the test asserts it already contains the assistant’s reply:

assert req.payload["messages"][1].content == "Response without history"

At request-time the assistant message does not exist; only the user (and possibly system/history) messages should be present.
The check should compare against the user query or simply verify that only one user message is present when use_history=False.
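
A sketch of an assertion that checks the prompt instead of the reply; the message shape is a stand-in for the Azure SDK types:

from dataclasses import dataclass

@dataclass
class Msg:
    role: str
    content: str

# What the request payload looks like at request time: no assistant reply yet.
messages = [Msg("system", "instructions"), Msg("user", "Test query")]

user_messages = [m for m in messages if m.role == "user"]
assert len(user_messages) == 1  # with use_history=False, exactly one user turn
assert not any(m.role == "assistant" for m in messages)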

src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)

944-950: model="" may introduce sentinel-value ambiguity
to_mcp_message_result now hard-codes model="" instead of None. Down-stream code that previously relied on None to signal “model unknown/unset” will now receive an empty string, which is truthy in many contexts (e.g., if result.model:).

Unless you have audited all call-sites, prefer keeping None or make the change explicit via typed docs & unit tests.
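
A quick illustration of the difference:

import json

model = ""
assert not model            # truthiness: "" behaves like None here
assert model is not None    # identity: an `is None` guard no longer fires
print(json.dumps({"model": model}))  # {"model": ""} rather than {"model": null}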

src/mcp_agent/workflows/llm/augmented_llm_azure.py (1)

225-227: Token-usage accessor may not match SDK object – please double-check.

response.usage is typically an object exposing attributes such as prompt_tokens / completion_tokens (at least in the GA Azure SDK). Switching to response.usage["prompt_tokens"] and ["completion_tokens"] will raise TypeError: 'Usage' object is not subscriptable if the SDK version still uses attributes.

If you intentionally upgraded to a newer SDK that returns a mapping-style usage, please ignore; otherwise revert to attribute access or add a defensive fallback:

-total_input_tokens  += response.usage["prompt_tokens"]
-total_output_tokens += response.usage["completion_tokens"]
+usage = response.usage
+# Support both mapping-style and attribute-style usage objects.
+if isinstance(usage, dict):
+    total_input_tokens += usage["prompt_tokens"]
+    total_output_tokens += usage["completion_tokens"]
+else:
+    total_input_tokens += usage.prompt_tokens
+    total_output_tokens += usage.completion_tokens
tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (1)

120-122: Enum rendering mismatch – expected numeric vs. name

The test asserts the prompt contains the literal string "Quality Rating: 1".
_build_refinement_prompt interpolates feedback.rating, whose string
representation may be "FAIR" or "QualityRating.FAIR" depending on the enum
definition, not "1". The test could become flaky if the enum __str__
implementation changes.

Please verify the intended format and adjust either the production code or the
assertion:

-    assert "Quality Rating: 1" in prompt
+    assert "Quality Rating: FAIR" in prompt   # or the agreed representation
tests/workflows/llm/test_augmented_llm_anthropic.py (1)

318-334: create_tool_use_side_effect inspects kwargs["messages"] but executor passes messages positionally

Inside side_effect, the check

messages = kwargs.get("messages", [])

will always return [] because in production the request object is passed as a positional arg (e.g., await executor.execute(self, request)), not as a keyword. Consequently the “final iteration” condition may never trigger, making the test logic depend only on call_count.

Consider extracting the request object from args[1], as done elsewhere in the suite, e.g.:

request_obj = args[1]
messages = request_obj.payload["messages"]
src/mcp_agent/utils/pydantic_type_serializer.py (4)

303-314: Good migration to Pydantic v2 validator API

The switch to validator.info.fields is correct and the cast to str
keeps JSON payloads clean.
No further action required.


420-425: 👍 Prefer type_ for private attrs

Using type_ where available brings v2 parity while still falling back
to Any. Looks good.


445-455: Efficient handling of ConfigDict

Treating ConfigDict as a mapping and early-returning prevents an
unnecessary reflection pass—nice improvement.


538-541: Literal deserialization looks correct

Using Literal.__getitem__ with the raw tuple reconstructs the type
accurately. Good job mirroring the v2 typing internals.

Comment on lines +403 to +409
# Setup mock executor to raise an exception
mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))

# Call LLM
responses = await mock_llm.generate("Test query with API error")

# Assertions

⚠️ Potential issue

Same “exception should be raised” issue as in the OpenAI test

The mock returns an Exception object instead of raising it, so the code under test is never put in the error path.

-# Setup mock executor to raise an exception
-mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+# Setup mock executor to raise an exception
+mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# Setup mock executor to raise an exception
-mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+# Setup mock executor to raise an exception
+mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
 # Call LLM
 responses = await mock_llm.generate("Test query with API error")
 # Assertions
🤖 Prompt for AI Agents
In tests/workflows/llm/test_augmented_llm_anthropic.py around lines 403 to 409,
the mock executor is set to return an Exception object instead of raising it, so
the error path is not triggered. Change the AsyncMock setup to raise the
Exception by using side_effect=Exception("API Error") instead of return_value,
ensuring the code under test properly encounters the raised exception.
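
A runnable contrast of the two mock configurations:

import asyncio
from unittest.mock import AsyncMock

async def demo() -> None:
    returns = AsyncMock(return_value=Exception("API Error"))
    result = await returns()       # no exception raised; it is merely returned
    print(type(result).__name__)   # Exception

    raises = AsyncMock(side_effect=Exception("API Error"))
    try:
        await raises()             # side_effect actually raises
    except Exception as exc:
        print(exc)                 # API Error

asyncio.run(demo())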

Comment on lines +719 to +735
# Patch __init__ to ensure private attributes are initialized on instance
private_attrs = getattr(reconstructed_model, "__private_attributes__", {})
if private_attrs:
    orig_init = reconstructed_model.__init__

    def _init_with_private_attrs(self, *args, **kwargs):
        orig_init(self, *args, **kwargs)
        for attr_name, private_attr in private_attrs.items():
            # Only set if not already set
            if not hasattr(self, attr_name):
                default = private_attr.default
                # If default is ... (Ellipsis), treat as None
                if default is ...:
                    default = None
                setattr(self, attr_name, default)

    reconstructed_model.__init__ = _init_with_private_attrs

🛠️ Refactor suggestion

Monkey-patching __init__ may break dataclass-like signatures

Overwriting __init__:

  1. Removes the rich signature Pydantic generates (hurts tooling &
    validation).
  2. Breaks multiple inheritance if models are later subclassed.
  3. Runs for every reconstructed model, even when Pydantic would have
    already initialised private attrs (v1 & v2).

Prefer letting Pydantic handle private attributes:

-        private_attrs = getattr(reconstructed_model, "__private_attributes__", {})
-        if private_attrs:
-            orig_init = reconstructed_model.__init__
-            def _init_with_private_attrs(self, *args, **kwargs):
-                orig_init(self, *args, **kwargs)
-                ...
-            reconstructed_model.__init__ = _init_with_private_attrs
+        # Pydantic takes care of initialising PrivateAttr defaults; no patch needed.

If a specific bug required this hook, add a comment with a minimal
reproduction and scope the patch to that scenario.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/mcp_agent/utils/pydantic_type_serializer.py around lines 719 to 735, the
current approach monkey-patches the __init__ method of reconstructed_model to
initialize private attributes, which removes Pydantic's rich __init__ signature,
breaks multiple inheritance, and runs unnecessarily for all models. To fix this,
remove the __init__ monkey-patching entirely and rely on Pydantic's built-in
handling of private attributes. If a specific bug requires this patch, add a
detailed comment explaining the minimal reproduction and limit the patch only to
that case.
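
A minimal check that Pydantic v2 already initializes private-attribute defaults without any patch:

from pydantic import BaseModel, PrivateAttr

class Model(BaseModel):
    _cache: dict = PrivateAttr(default_factory=dict)

m = Model()
assert m._cache == {}  # default applied by Pydantic itself, no __init__ hook needed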

Comment on lines 683 to 701
                field_definitions[field_name] = (
                    field_type,
                    Field(
-                       default=default if default_factory is None else ...,
+                       default=... if default_factory is not None else default,
                        default_factory=default_factory,
                        **constraints,
                    ),
                )
            else:
-               field_definitions[field_name] = (field_type, default)
+               if is_required:
+                   field_definitions[field_name] = (field_type, Field(default=...))
+               else:
+                   field_definitions[field_name] = (
+                       field_type,
+                       Field(
+                           default=default,
+                       ),
+                   )



🛠️ Refactor suggestion

Avoid passing default_factory=None & reconcile required logic

Field(..., default_factory=None) is semantically meaningless and may
raise in future Pydantic releases.
Also, when is_required=False but default is still ..., the field
remains required.

-                    Field(
-                        default=... if default_factory is not None else default,
-                        default_factory=default_factory,
+                    kwargs = {**constraints}
+                    if default_factory is not None:
+                        kwargs["default_factory"] = default_factory
+                        kwargs["default"] = ...
+                    elif default is not ...:
+                        kwargs["default"] = default
+                    else:
+                        kwargs["default"] = ...
+                    Field(**kwargs),

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/mcp_agent/utils/pydantic_type_serializer.py around lines 683 to 701,
avoid passing default_factory=None to Field as it is meaningless and may cause
errors in future Pydantic versions. Also, ensure that when is_required is False,
the default is not set to ... which makes the field required. Fix this by only
passing default_factory if it is not None and by setting default to a proper
non-required value when is_required is False.

StreetLamb added 16 commits May 25, 2025 14:10
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (6)
tests/workflows/llm/test_augmented_llm_azure.py (1)

402-403: Fix pending: API error test never raises – use side_effect instead of return_value

The mock executor is still returning an Exception object instead of raising it, which means the error handling code path won't be properly tested.

- mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+ mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
src/mcp_agent/utils/pydantic_type_serializer.py (3)

393-393: Consider Pydantic v1 compatibility for broader adoption

While the current approach uses Pydantic v2's is_required() method, maintaining compatibility with v1 would increase the utility's adoption:

- "required": field_info.is_required(),
+ "required": getattr(field_info, "is_required", lambda: getattr(field_info, "required", True))(),

683-701: Avoid passing default_factory=None to prevent future issues

The current code may pass default_factory=None to Field(), which is semantically meaningless and could cause errors in future Pydantic versions:

- Field(
-     default=... if default_factory is not None else default,
-     default_factory=default_factory,
-     **constraints,
- ),
+ kwargs = {**constraints}
+ if default_factory is not None:
+     kwargs["default_factory"] = default_factory
+     kwargs["default"] = ...
+ elif default is not ...:
+     kwargs["default"] = default
+ else:
+     kwargs["default"] = ...
+ Field(**kwargs),

719-735: Consider removing __init__ monkey-patching for better maintainability

The current approach of monkey-patching __init__ can break Pydantic's rich signatures and inheritance patterns. Consider relying on Pydantic's built-in private attribute handling instead, unless this patch addresses a specific documented issue.

tests/workflows/llm/test_augmented_llm_openai.py (1)

382-383: Fix pending: API error test never raises – use side_effect instead of return_value

The mock executor is returning an Exception object instead of raising it, preventing proper testing of error handling logic:

- # Setup mock executor to raise an exception
- mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+ # Setup mock executor to raise an exception
+ mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
tests/workflows/swarm/test_swarm.py (1)

201-220: ⚠️ Potential issue

Fix mocking anti-pattern - the test defeats its own purpose.

This test mocks the call_tool method on mock_swarm_agent and then calls the same mocked method, which doesn't actually test the real SwarmAgent.call_tool logic when a tool is not in _function_tool_map.

The test should verify that SwarmAgent.call_tool falls back to super().call_tool() when the tool name is not found in _function_tool_map. Replace this with a test that uses a real SwarmAgent instance:

-    @pytest.mark.asyncio
-    async def test_call_tool_with_non_function_tool(
-        self, mock_swarm_agent, mock_tool_response
-    ):
-        """Test call_tool with a non-function tool."""
-        # Set up mocks
-        mock_swarm_agent._function_tool_map = {}
-        mock_swarm_agent.initialized = True
-        mock_swarm_agent.call_tool = AsyncMock(return_value=mock_tool_response)
-
-        # Call the method directly without using Agent.call_tool
-        # We're testing that the SwarmAgent's call_tool method works when the tool
-        # is not in the function tool map
-        result = await mock_swarm_agent.call_tool("non_function_tool", {"arg": "value"})
-
-        # Assert the call was made and the result was returned
-        mock_swarm_agent.call_tool.assert_called_once_with(
-            "non_function_tool", {"arg": "value"}
-        )
-        assert result == mock_tool_response
+    @pytest.mark.asyncio
+    async def test_call_tool_with_non_function_tool(self, mock_tool_response):
+        """Test call_tool with a non-function tool falls back to parent class."""
+        # Create a real SwarmAgent instance
+        agent = SwarmAgent(
+            name="test_agent",
+            instruction="Test instruction",
+            server_names=[],
+            functions=[],
+            context=Context(),
+        )
+        agent._function_tool_map = {}  # Ensure tool is not in function map
+        agent.initialized = True
+        
+        # Mock the parent class call_tool method
+        with patch.object(agent.__class__.__bases__[0], 'call_tool', new_callable=AsyncMock) as mock_parent_call_tool:
+            mock_parent_call_tool.return_value = mock_tool_response
+            
+            # Call the method
+            result = await agent.call_tool("non_function_tool", {"arg": "value"})
+            
+            # Assert parent method was called and result returned
+            mock_parent_call_tool.assert_called_once_with("non_function_tool", {"arg": "value"})
+            assert result == mock_tool_response

You'll also need to add the patch import:

+from unittest.mock import AsyncMock, MagicMock, patch
-from unittest.mock import AsyncMock, MagicMock
🧹 Nitpick comments (2)
tests/workflows/llm/test_augmented_llm_ollama.py (1)

186-208: Simplify complex mocking pattern.

The execute_side_effect function and nested mocking are quite complex and brittle. Consider a simpler approach:

-            # Patch executor.execute to forward to the patched structured task
-            async def execute_side_effect(task, request):
-                if (
-                    task is mock_structured_task._mock_wraps
-                    or task is mock_structured_task
-                ):
-                    return await mock_structured_task(request)
-                return MagicMock()
-
-            llm.executor.execute = AsyncMock(side_effect=execute_side_effect)
+            # Mock executor.execute to return a simple result
+            llm.executor.execute = AsyncMock(return_value=TestResponseModel(name="test"))

This approach:

  • Reduces complexity and potential brittleness
  • Makes the test easier to understand
  • Focuses on the essential behavior being tested
tests/workflows/swarm/test_swarm.py (1)

36-36: Consider using the mock_context fixture consistently.

Multiple tests create Context() instances directly instead of using the provided mock_context fixture. While this works, using the fixture would be more consistent and allow for better test isolation.

Consider updating these tests to use the mock_context fixture:

-        context=Context(),
+        context=mock_context,

This ensures all tests use the same mocked context setup and makes the tests more maintainable.

Also applies to: 58-58, 84-84, 93-93, 121-121, 149-149, 184-184

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4284d6d and c12c995.

📒 Files selected for processing (19)
  • pyproject.toml (1 hunks)
  • src/mcp_agent/utils/pydantic_type_serializer.py (8 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_openai.py (4 hunks)
  • src/mcp_agent/workflows/swarm/swarm.py (2 hunks)
  • tests/executor/temporal/test_workflow_registry.py (1 hunks)
  • tests/executor/test_workflow.py (1 hunks)
  • tests/mcp/test_mcp_connection_manager.py (1 hunks)
  • tests/workflows/intent_classifier/conftest.py (1 hunks)
  • tests/workflows/llm/conftest.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_azure.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_bedrock.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_google.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_ollama.py (1 hunks)
  • tests/workflows/llm/test_augmented_llm_openai.py (3 hunks)
  • tests/workflows/orchestrator/conftest.py (1 hunks)
  • tests/workflows/parallel/test_parallel_llm.py (1 hunks)
  • tests/workflows/router/conftest.py (1 hunks)
  • tests/workflows/router/test_router_embedding_openai.py (1 hunks)
  • tests/workflows/swarm/test_swarm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (13)
  • tests/workflows/llm/conftest.py
  • tests/workflows/intent_classifier/conftest.py
  • tests/executor/test_workflow.py
  • src/mcp_agent/workflows/swarm/swarm.py
  • tests/mcp/test_mcp_connection_manager.py
  • tests/workflows/parallel/test_parallel_llm.py
  • tests/executor/temporal/test_workflow_registry.py
  • tests/workflows/router/test_router_embedding_openai.py
  • tests/workflows/llm/test_augmented_llm_google.py
  • tests/workflows/router/conftest.py
  • tests/workflows/orchestrator/conftest.py
  • tests/workflows/llm/test_augmented_llm_bedrock.py
  • src/mcp_agent/workflows/llm/augmented_llm_openai.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/workflows/swarm/test_swarm.py (5)
src/mcp_agent/agents/agent.py (1)
  • CallToolRequest (789-798)
src/mcp_agent/workflows/swarm/swarm.py (6)
  • AgentFunctionResult (147-161)
  • SwarmAgent (74-144)
  • DoneAgent (314-327)
  • create_agent_resource (50-58)
  • create_agent_function_result_resource (61-71)
  • should_continue (304-311)
src/mcp_agent/workflows/swarm/swarm_openai.py (1)
  • OpenAISwarm (9-41)
src/mcp_agent/core/context.py (1)
  • Context (50-79)
tests/workflows/swarm/conftest.py (7)
  • mock_context (11-17)
  • test_function_result (56-58)
  • test_function_agent_function_result (68-70)
  • mock_swarm_agent (34-46)
  • mock_tool_response (80-82)
  • mock_agent (21-30)
  • done_agent (50-52)
🔇 Additional comments (13)
src/mcp_agent/utils/pydantic_type_serializer.py (4)

184-191: LGTM! Literal values are now properly serialized

The code now correctly wraps each literal argument with make_serializable() to ensure JSON compatibility, addressing the previous concern about non-JSON-serializable types like bytes.
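
A quick reminder of why the wrapping matters (make_serializable is the project's helper; the bytes example below is illustrative only):

import json

try:
    json.dumps({"literal_args": [b"raw"]})  # bytes are not JSON-serializable
except TypeError as exc:
    print(exc)

print(json.dumps({"literal_args": ["raw"]}))  # after conversion to a JSON-safe type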


303-303: LGTM! Pydantic v2 field validator compatibility

The change from validator.fields to validator.info.fields correctly aligns with Pydantic v2 API structure.


420-424: LGTM! Enhanced private attribute handling for Pydantic v2

The fallback to Any when type_ is not available ensures compatibility while leveraging v2's improved type information when present.


445-455: LGTM! Improved config serialization with proper JSON validation

The new logic correctly handles both dict-based configs (Pydantic v2 ConfigDict) and class-based configs, with appropriate JSON serializability checks.

tests/workflows/llm/test_augmented_llm_openai.py (2)

70-91: LGTM! Clean test response factory method

The static method provides a clean way to create mock OpenAI chat completion responses with proper typing and structure.


94-126: LGTM! Comprehensive tool use response creation

The method correctly constructs tool call responses with proper OpenAI structure and handles the nested function call format appropriately.

tests/workflows/llm/test_augmented_llm_ollama.py (5)

1-17: LGTM: Clean test setup and clear documentation.

The imports are appropriate and the class documentation clearly explains the testing scope, focusing on Ollama-specific functionality while acknowledging inheritance from OpenAIAugmentedLLM.


19-50: Well-designed fixtures with good separation of concerns.

The fixture design is excellent:

  • mock_llm provides a pre-configured instance for common use cases
  • mock_context_factory allows creating clean contexts for specific test scenarios
  • Good use of realistic test data in the OpenAISettings

129-163: LGTM: Well-structured async test for JSON mode.

The test properly:

  • Uses appropriate async/await patterns
  • Mocks dependencies correctly
  • Tests the expected JSON mode behavior
  • Has clear assertions

214-218: LGTM: Clear verification of client initialization.

The assertions properly verify that the correct API key and base URL are passed to the structured completion task, which is the core behavior being tested.


93-93:

✅ Verification successful

Verify if commented line is needed.

There's a commented line for reasoning_effort. Ensure this isn't required for proper test functionality.


🏁 Script executed:

#!/bin/bash
# Check if reasoning_effort is accessed during OpenAIAugmentedLLM initialization
ast-grep --pattern 'reasoning_effort'

Length of output: 907


🏁 Script executed:

#!/bin/bash
# Display lines 1-120 around the commented reasoning_effort in the test file
sed -n '1,120p' tests/workflows/llm/test_augmented_llm_ollama.py

Length of output: 5013


🏁 Script executed:

#!/bin/bash
# Extract the OllamaAugmentedLLM class definition and its init method
ast-grep --pattern $'class OllamaAugmentedLLM { $$$ init($_) { $$$ } $$$ }' --language javascript

Length of output: 287


Drop the commented-out reasoning_effort assignment
The MagicMock in this test is already created with a spec that includes reasoning_effort, so it will automatically return a MagicMock value when accessed—and this attribute isn’t asserted anywhere in test_initialization. You can safely remove the commented line for clarity.

• tests/workflows/llm/test_augmented_llm_ollama.py (line ~93)

tests/workflows/swarm/test_swarm.py (2)

23-512: Comprehensive test coverage with good structure.

The test suite provides excellent coverage of the swarm workflow components with well-structured test cases that cover various scenarios including normal operations, edge cases, and error conditions. The tests properly exercise the real methods rather than mocking them (except for the one identified issue), and the assertions validate the expected behavior correctly.


206-206:

❌ Incorrect review comment

Potential AttributeError with spec-based mock.

Based on the fixture definition in conftest.py, mock_swarm_agent is created with MagicMock(spec=SwarmAgent). Direct assignment to _function_tool_map may cause an AttributeError since this is an instance attribute created in __init__, not a class attribute.

Let me verify if the fixture definition still has this issue:


🏁 Script executed:

#!/bin/bash
# Check the mock_swarm_agent fixture definition to see if it uses spec=SwarmAgent
cat tests/workflows/swarm/conftest.py | grep -A 15 "def mock_swarm_agent"

Length of output: 537


🏁 Script executed:

#!/bin/bash
# Search for the SwarmAgent class definition and its __init__ to locate _function_tool_map
rg -n "class SwarmAgent" -C5
rg -n "def __init__" -A10 $(rg -l "class SwarmAgent" -g '*.py')

Length of output: 2345


🏁 Script executed:

#!/bin/bash
# Check if _function_tool_map is defined in SwarmAgent class
rg -n "_function_tool_map" src/mcp_agent/workflows/swarm/swarm.py

Length of output: 166


🏁 Script executed:

#!/bin/bash
python3 - <<'EOF'
from unittest.mock import MagicMock
class Foo: pass

# Test spec-based mock assignment
m_spec = MagicMock(spec=Foo)
m_spec.new_attr = 5
print("spec assignment:", m_spec.new_attr)

# Test spec_set-based mock assignment
from unittest.mock import MagicMock as MM2
m_spec_set = MM2(spec_set=Foo)
try:
    m_spec_set.other = 6
    print("spec_set assignment succeeded:", m_spec_set.other)
except Exception as e:
    print("spec_set error:", type(e).__name__, e)
EOF

Length of output: 98


No change needed: MagicMock(spec=…) allows new attributes
Assigning mock_swarm_agent._function_tool_map = {} will not raise an AttributeError—only spec_set mocks disallow new attributes. The existing fixture is safe as written.

Likely an incorrect or invalid review comment.

@StreetLamb StreetLamb requested a review from saqadri May 25, 2025 08:41