Improve code coverage of mcp-agent #215
base: main
Conversation
- Deleted the `test_event_progress.py` file as it is no longer needed.
- Added a new README.md file in the LLM tests directory to document test coverage and usage.
- Created a `conftest.py` file for shared fixtures used across LLM provider tests.
- Implemented comprehensive tests for the `OpenAIAugmentedLLM` class, covering various functionalities including text generation, structured output, tool usage, error handling, and request parameter merging.
* add support for Azure Identity instead of keys
* make credential_scopes configurable, but with a default to `https://cognitiveservices.azure.com/.default`

Co-authored-by: Brian Caswell <[email protected]>
* Add streamable HTTP support to mcp-agent
* Session ID support for session resumption
* Example to showcase streamable HTTP server support
…kMCPTypeConverter
…blocks in anthropic_content_to_mcp_content
…beddingIntentClassifier
…tentClassifier functionality
- Created test package for orchestrator workflow module.
- Implemented unit tests for orchestrator methods, including initialization and execution of plans.
- Added integration tests for full and iterative workflow executions.
- Developed tests for orchestrator data models and prompt templates.
- Ensured comprehensive coverage of task, step, and plan result formatting.
…ither 'full' or 'iterative'
… test_parallel_llm.py
Actionable comments posted: 27
🔭 Outside diff range comments (2)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)
1029-1051: 🛠️ Refactor suggestion: Limited content-type coverage in `mcp_content_to_openai_content_part`

The helper downgrades `ImageContent` and binary `EmbeddedResource` blocks to plain text (`"<mime>:<data>"`). The OpenAI SDK already offers `ChatCompletionContentPartImageParam`; losing the structured form reduces model performance on vision-capable models. Recommend adding proper image part construction:

```diff
@@
 elif isinstance(content, ImageContent):
-    # Best effort to convert an image to text
-    return ChatCompletionContentPartTextParam(
-        type="text", text=f"{content.mimeType}:{content.data}"
-    )
+    return ChatCompletionContentPartImageParam(
+        type="image_url",
+        image_url={"url": f"data:{content.mimeType};base64,{content.data}"},
+    )
```

…and import `ChatCompletionContentPartImageParam`.

src/mcp_agent/utils/pydantic_type_serializer.py (1)
648-662: ⚠️ Potential issue: Custom `default_factory` functions are silently discarded

Only the three built-ins are restored; any user-defined factory is lost, leading to wrong defaults or validation errors.

```diff
-        else:
-            default_factory = None
+        else:
+            # Attempt to re-import the factory by name from the original
+            # module, otherwise keep the string for later resolution.
+            try:
+                mod = importlib.import_module(field_info["default_factory_module"])
+                default_factory = getattr(mod, default_factory)
+            except Exception:
+                # Fallback: preserve the string so consumers know it existed
+                default_factory = default_factory
```

You'll need to store `default_factory_module` during serialization to support this restoration.
♻️ Duplicate comments (2)
tests/workflows/orchestrator/test_orchestrator.py (2)
357-365: Same SIM117 issue: combine the `with` statements

See previous comment for justification.

```diff
-with patch.object(Orchestrator, "_get_next_step", mock_get_next_step):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_next_step", mock_get_next_step), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):
```

🧰 Tools
🪛 Ruff (0.11.9)
357-358: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)

430-437: Combine the patch contexts here as well

```diff
-with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):
```

🧰 Tools
🪛 Ruff (0.11.9)
430-431: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
🧹 Nitpick comments (45)
src/mcp_agent/workflows/llm/augmented_llm_google.py (1)
442-451: Improve code readability with a ternary operator.

The role normalization logic is correct and aligns with MCP compatibility requirements. However, the code can be simplified using a ternary operator as suggested by static analysis.

Apply this diff to improve readability:

```diff
-    if result.role == "model":
-        role = "assistant"
-    else:
-        role = result.role
+    role = "assistant" if result.role == "model" else result.role
     return MCPMessageResult(
         role=role,
         content=contents[0],
         model="",
         stopReason=None,
     )
```

🧰 Tools
🪛 Ruff (0.11.9)
442-445: Use ternary operator `role = "assistant" if result.role == "model" else result.role` instead of `if`-`else`-block (SIM108)
tests/workflows/swarm/test_swarm_anthropic.py (1)
135-144: Extract repeated message string to a constant.

The follow-up message "Please resolve my original request. If it has already been resolved then end turn" is repeated in multiple test assertions (lines 137 and 143). This should be extracted to a module-level constant for better maintainability.

```diff
+# At the top of the file, after imports
+FOLLOW_UP_MESSAGE = "Please resolve my original request. If it has already been resolved then end turn"

 # In test assertions, replace the hardcoded strings:
-    assert (
-        second_call_kwargs["message"]
-        == "Please resolve my original request. If it has already been resolved then end turn"
-    )
+    assert second_call_kwargs["message"] == FOLLOW_UP_MESSAGE

-    assert (
-        third_call_kwargs["message"]
-        == "Please resolve my original request. If it has already been resolved then end turn"
-    )
+    assert third_call_kwargs["message"] == FOLLOW_UP_MESSAGE
```

tests/workflows/llm/test_augmented_llm_ollama.py (1)
94-94: Remove or uncomment the commented code.

There's a commented line that should either be removed if it's not needed or uncommented with proper justification.

```diff
     mock_openai_config_for_custom.api_key = "test_api_key"
-    # mock_openai_config_for_custom.reasoning_effort = "medium"
```

If this line is needed, uncomment it. If not, remove it entirely to keep the code clean.
tests/mcp/test_mcp_connection_manager.py (1)
126-149: Address the commented-out test with TODO.

There's a commented-out test for initialization failure with a TODO comment from "jerron". This should either be fixed and uncommented, or removed if it's no longer needed.
The test appears to be checking for proper error handling when server initialization fails. If you need help implementing this test or determining whether it's still needed, I can assist with:
- Analyzing why the test might be failing
- Providing an alternative implementation approach
- Helping determine if this test case is still relevant
Would you like me to help resolve this TODO item?
tests/executor/temporal/test_workflow_registry.py (2)
18-26: Good test coverage, but consider adding edge-case validation.

The test correctly verifies registration and retrieval functionality. However, consider adding assertions to verify the internal state consistency beyond just the workflow retrieval.

```diff
 async def test_register_and_get_workflow(registry):
     mock_workflow = MagicMock(name="test_workflow")
     run_id = "run-id"
     workflow_id = "workflow-id"
     await registry.register(mock_workflow, run_id, workflow_id)
     workflow = await registry.get_workflow(run_id)
     assert workflow == mock_workflow
     assert registry._workflow_ids[workflow_id] == [run_id]
+    # Verify the workflow is in local workflows
+    assert run_id in registry._local_workflows
+    assert registry._local_workflows[run_id] == mock_workflow
```

74-77: Test method name doesn't match the actual test behavior.

The test name suggests it's testing error status, but it's actually testing the case where a workflow doesn't exist. Consider renaming for clarity.

```diff
-async def test_get_workflow_status_error(registry, mock_executor):
+async def test_get_workflow_status_nonexistent_workflow(registry, mock_executor):
     # Should return error status if workflow_id is missing
     result = await registry.get_workflow_status("nonexistent")
     assert result is False
```

tests/workflows/router/test_router_base.py (1)
74-84: Good validation test, but consider more specific error messages.

The test correctly validates that empty inputs raise a ValueError. Consider checking the specific error message to ensure it's meaningful for users.

```diff
 def test_initialization_with_empty_inputs(self, mock_context):
     """Tests initialization fails when no routing targets are provided."""
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match="At least one of server_names, agents, or functions must be provided"):
         # Initialize with empty inputs
         _ = TestRouter(
             server_names=[],
             agents=[],
             functions=[],
             context=mock_context,
         )
```

tests/utils/test_pydantic_type_serializer.py (2)
213-213: Consider using the model_config property consistently.

The config access pattern using `getattr` with defaults is defensive, but ensure this pattern is consistent with the actual Pydantic v2 ConfigDict structure.

```diff
 # Test config is preserved
-assert getattr(ComplexModelReconstructed.model_config, "validate_assignment", True)
+assert ComplexModelReconstructed.model_config.get("validate_assignment", True)
 assert getattr(
-    ComplexModelReconstructed.model_config, "arbitrary_types_allowed", True
+    ComplexModelReconstructed.model_config.get("arbitrary_types_allowed", True)
 )
```

Also applies to: 215-215
252-253: Address the commented-out tests to improve coverage.

The TODO comments indicate these tests need to be fixed. Consider prioritizing the resolution of these test issues to maintain comprehensive test coverage.
These tests appear to be related to Annotated fields and import cycles. Would you like me to help investigate the specific issues preventing these tests from passing and suggest potential fixes?
Also applies to: 296-297
tests/workflows/parallel/test_fan_out.py (2)
39-61: Fixture mislabels an AugmentedLLM as an "agent".

`fan_out_with_llms` passes `mock_llm_with_name` via the `agents` parameter:

```python
return FanOut(
    agents=[mock_llm_with_name],  # <-- actually an AugmentedLLM
    context=mock_context,
)
```

If `FanOut` differentiates between real `Agent` instances (which require `llm_factory`) and raw `AugmentedLLM` objects, this fixture relies on an implementation detail that "anything with a `.generate()` method will work".

Should the public constructor instead expose a dedicated keyword (e.g. `llms=[...]`) to make the intent explicit? This would avoid confusion for future maintainers and catch misuse at type-checking time.

290-319: Assertion misses argument verification for the executor.

`execute_many` is the workhorse that runs all fan-out calls concurrently, but the tests only assert `assert_called_once()`. Verifying the callable list and its length would give stronger coverage:

```python
mock_context.executor.execute_many.assert_called_once()
calls = mock_context.executor.execute_many.call_args[0][0]
assert len(calls) == 3  # agent, llm, function
```

This protects against regressions where items silently drop out of the fan-out collection.
tests/workflows/parallel/test_parallel_llm.py (1)
276-297: `MagicMock` for `fan_in` lacks async spec: risk of false positives.

`parallel_llm_with_agent.fan_in` is set to a bare `MagicMock`, and later tests monkey-patch `fan_in.generate`/`generate_str` with `AsyncMock`. Because `MagicMock` does not enforce attribute existence, typos such as `genrate` would silently pass. Prefer `AsyncMock` with `spec_set` to catch such mistakes:

```python
pll.fan_in = AsyncMock(spec_set=['generate', 'generate_str', 'generate_structured'])
```

This strengthens the tests without additional complexity.
tests/test_app.py (2)
224-231: Combine nested `with` blocks to reduce indentation noise.

Ruff (SIM117) correctly points out that two sequential `with` statements can be merged:

```python
with patch("mcp_agent.app.initialize_context", AsyncMock(return_value=mock_context)), \
     patch("mcp_agent.app.cleanup_context", AsyncMock()) as mock_cleanup:
    ...
```

This small refactor improves readability and avoids an extra indentation level. The same pattern recurs at lines 250-257 and 265-272.

🧰 Tools
🪛 Ruff (0.11.9)
224-227: Use a single `with` statement with multiple contexts instead of nested `with` statements. Combine `with` statements (SIM117)

158-178: Windows loop-policy test leaks a fake `asyncio` module into `sys.modules`.

Within the context manager:

```python
import_patch = patch.dict(
    "sys.modules",
    {"asyncio": MagicMock(WindowsProactorEventLoopPolicy=mock_policy_class)},
)
```

Any import of `asyncio` inside the body receives this stub rather than the real module. That is fine for the test, but if `MCPApp` (or its transitive imports) accesses other `asyncio` attributes during initialization they will be missing, leading to brittle test failures.

A safer pattern is:

- Keep the real `asyncio` in `sys.modules`.
- Monkey-patch only the missing attribute:

```python
with patch.object(asyncio, "WindowsProactorEventLoopPolicy", mock_policy_class), \
     patch("sys.platform", "win32"), \
     patch("asyncio.set_event_loop_policy") as mock_set:
    MCPApp(name="test_app")
```

This preserves the full `asyncio` surface while still exercising the Windows branch.

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1)
243-243: Fix unused loop variable.

The loop variable `intent_name` is not used within the loop body. This should be renamed to indicate it's intentionally unused.

```diff
-    for intent_name, intent in classifier.intents.items():
+    for _intent_name, intent in classifier.intents.items():
```

🧰 Tools
🪛 Ruff (0.11.9)
243-243: Loop control variable `intent_name` not used within loop body. Rename unused `intent_name` to `_intent_name` (B007)
tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1)
243-243: Fix unused loop variable.

The loop variable `intent_name` is not used within the loop body. Rename it to indicate it's intentionally unused.

```diff
-    for intent_name, intent in classifier.intents.items():
+    for _intent_name, intent in classifier.intents.items():
```

🧰 Tools
🪛 Ruff (0.11.9)
243-243: Loop control variable `intent_name` not used within loop body. Rename unused `intent_name` to `_intent_name` (B007)
tests/workflows/swarm/conftest.py (1)
23-30: Explicitly set `return_value` on `AsyncMock`s to prevent accidental misuse.

The fixtures `mock_agent` and `mock_swarm_agent` expose `call_tool`, `initialize`, and `shutdown` as `AsyncMock`s without a `return_value`. If a test forgets to `await` one of these mocks (or inspects the returned value) it will receive another `AsyncMock`, leading to confusing `RuntimeWarning: coroutine was never awaited` messages or false-positive assertions.

```diff
 agent.call_tool = AsyncMock()
-# …
+# Give the mock a deterministic result that matches the production contract
+agent.call_tool.return_value = None
```

Apply the same pattern to the other async mocks (`initialize`, `shutdown`, and the equivalents on `mock_swarm_agent`) or, if a specific value is expected in a test, set it there. This keeps the fixtures predictable and avoids latent test flakiness.

tests/workflows/orchestrator/test_orchestrator.py (2)
174-178: Rename the unused loop variable to underscore to silence Ruff B007.

`agent` is not referenced inside the loop; Ruff correctly flags this as B007. Renaming it to `_` communicates the intent and keeps the static-analysis pipeline green.

```diff
-    for agent_name, agent in orchestrator.agents.items():
+    for agent_name, _ in orchestrator.agents.items():
```

🧰 Tools
🪛 Ruff (0.11.9)
174-174: Loop control variable `agent` not used within loop body. Rename unused `agent` to `_agent` (B007)

295-303: Collapse the nested `with patch` blocks into a single context manager.

Ruff SIM117 suggests combining adjacent `with` statements. Doing so reduces indentation and improves readability:

```diff
-with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan):
-    with patch.object(Orchestrator, "_execute_step", mock_execute_step):
+with patch.object(Orchestrator, "_get_full_plan", mock_get_full_plan), \
+     patch.object(Orchestrator, "_execute_step", mock_execute_step):
```

Apply the same pattern to the similar blocks at 357-365 and 430-437.

🧰 Tools
🪛 Ruff (0.11.9)
295-296: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
tests/workflows/router/test_router_llm_openai.py (1)
211-214: Prefer `patch.object` as a context manager when spying on an instance method.

Directly overwriting `router._route_with_llm` works, but using `patch.object` keeps the original attribute intact after the test and avoids accidental leakage across tests:

```python
with patch.object(router, "_route_with_llm", wraps=router._route_with_llm) as spy:
    result = await router.route("Test request")
    spy.assert_called_once_with("Test request")
```

This small change promotes isolation and ensures the router instance is restored to its original state.
tests/workflows/router/test_router_embedding_cohere.py (1)
25-27: Avoid mutating NumPy's global RNG in tests.

`np.random.seed(seed)` resets the global RNG state on every call, which can introduce cross-test flakiness or brittle ordering dependencies (especially when the suite grows or runs in parallel). A safer pattern is to create a local generator per sample so the remainder of the test suite keeps its own random state untouched.

```diff
-        seed = sum(ord(c) for c in data[i])
-        np.random.seed(seed)
-        embeddings[i] = np.random.rand(embedding_dim).astype(np.float32)
+        # Derive a deterministic generator per input so global RNG state is
+        # left unchanged.
+        seed = sum(ord(c) for c in data[i])
+        rng = np.random.default_rng(seed)
+        embeddings[i] = rng.random(embedding_dim, dtype=np.float32).astype(
+            np.float32
+        )
```

tests/workflows/parallel/test_fan_in.py (1)
119-125: Patch stack objects with a single context manager.

Static analysis correctly points out that a nested `with` can be flattened for clarity:

```diff
-    with patch("contextlib.AsyncExitStack") as MockAsyncExitStack:
-        # Configure the mock stack
-        mock_stack = AsyncMock()
-        MockAsyncExitStack.return_value = mock_stack
-        mock_stack.__aenter__.return_value = mock_stack
-        mock_stack.enter_async_context.return_value = mock_agent
+    with patch("contextlib.AsyncExitStack") as MockAsyncExitStack:
+        mock_stack = AsyncMock()
+        MockAsyncExitStack.return_value = mock_stack
+        mock_stack.__aenter__.return_value = mock_stack
+        mock_stack.enter_async_context.return_value = mock_agent
```

Since only one object is patched, the suggestion is mostly stylistic, but flattening keeps the codebase consistent with the linter rule (SIM117) and reduces indentation noise.
tests/agents/test_agent.py (4)
219-223: Collapse nested `with` statements per SIM117.

You can satisfy Ruff's SIM117 rule and improve readability by combining these two context managers:

```diff
-with patch("uuid.uuid4", return_value="test-uuid"):
-    # Mock the method to return directly
-    with patch.object(
-        Agent, "request_human_input", AsyncMock(return_value="Test user input")
-    ):
-        result = await agent_with_human_input.request_human_input(request)
+with patch("uuid.uuid4", return_value="test-uuid"), patch.object(
+    Agent, "request_human_input", AsyncMock(return_value="Test user input")
+):
+    result = await agent_with_human_input.request_human_input(request)
```

🧰 Tools
🪛 Ruff (0.11.9)
219-223: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)

277-281: Duplicate nested-`with` pattern: flatten for consistency.

The same SIM117 issue appears here; a one-liner context manager improves legibility:

```diff
-with patch.object(
-    Agent, "request_human_input", side_effect=Exception("Callback error")
-):
+with patch.object(
+    Agent, "request_human_input", side_effect=Exception("Callback error")
+):
```

(The outer `with` can stay as-is or be merged depending on preference.)

🧰 Tools
🪛 Ruff (0.11.9)
277-281: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)

263-270: Dead helper code is never executed.

`mock_implementation` is defined but immediately superseded by the `patch.object(...)` call that injects a different side effect. Unless you intend to reuse the richer assertions inside the helper, the function is redundant and can be removed to keep the test focused.

```diff
-    async def mock_implementation(self, req):
-        # Simulate the error handling logic from the original method
-        error_message = "Callback error"
-        self.executor.signal.assert_called_once()
-        signal_call = self.executor.signal.call_args[1]
-        assert "payload" in signal_call
-        assert error_message in signal_call["payload"]
-        raise Exception(error_message)
```

456-462: Small accuracy nit: misleading inline comment.

`tool_name = test_function.__name__` resolves to the actual function name (`function`). The inline comment `# Should be "function" not "test_function"` is correct, but it can be removed (or rephrased) to avoid confusion for future maintainers; the code already speaks for itself.

tests/workflows/router/test_router_llm_anthropic.py (1)
219-225: Possible type mismatch for `confidence`.

`StructuredResponseCategory.confidence` is often typed as `float` (e.g. 0.83) in the production code. Asserting against the string `"high"` (lines 222-225) may break if the implementation is changed to numeric. Consider asserting on a float threshold instead:

```diff
-assert result[0].confidence == "high"
+assert 0.8 <= result[0].confidence <= 1.0
```

tests/workflows/llm/test_augmented_llm_google.py (1)
469-479: Assert the model actually used in the request payload.

You verify that `select_model` was called, but not that the chosen model is present in the request sent to the executor. Adding an assertion strengthens the test:

```python
sent_model = mock_llm.executor.execute.call_args[0][1].payload["model"]
assert sent_model == "gemini-1.5-flash"
```

tests/workflows/router/conftest.py (2)
95-100: Avoid mutating NumPy's global RNG in a shared test process.

Calling `np.random.seed(seed)` inside the loop resets the global generator every iteration and can leak state into unrelated tests executed in the same Python process. Instead, create a local generator:

```diff
-        seed = sum(ord(c) for c in data[i])
-        np.random.seed(seed)
-        embeddings[i] = np.random.rand(embedding_dim).astype(np.float32)
+        rng = np.random.default_rng(sum(ord(c) for c in data[i]))
+        embeddings[i] = rng.random(embedding_dim, dtype=np.float32)
```

This removes global side effects and still gives you deterministic but distinct vectors.

114-118: Name shadowing between fixture and inner function.

The fixture is called `test_function` and it returns a nested function with the same identifier. Although legal, this can be confusing in stack traces and when debugging. Consider renaming the inner callable (e.g. `echo_fn`) or the fixture (`echo_function_fixture`) for clarity.

tests/executor/test_workflow_signal.py (4)
197-205: Raise expectation should reference `asyncio.TimeoutError` for precision.

`AsyncioSignalHandler.wait_for_signal()` propagates `asyncio.TimeoutError`. Catching the base class `TimeoutError` works today (they are the same object), but being explicit protects you if Python ever diverges again.

```diff
-with pytest.raises(TimeoutError):
+with pytest.raises(asyncio.TimeoutError):
```

285-305: Assertion on `run_in_executor` arguments is brittle.

`run_in_executor` is called with positional args `(None, input, "Enter value: ")`. `assert "Enter value: " in mock_loop.run_in_executor.call_args[0]` passes only because the substring exists in one of the tuple elements. If the executor signature changes (e.g. kwargs), this test will give a false negative. A safer check:

```python
executor_args = mock_loop.run_in_executor.call_args.args
assert executor_args[1] is input
assert executor_args[2] == "Enter value: "
```

This pins the contract clearly.

428-456: Use `contextlib.suppress` to silence expected timeouts cleanly.

Instead of a `try/except …: pass` block you can rely on `contextlib.suppress` for fewer lines and clearer intent.

```diff
-try:
-    await store.wait_for("test_signal", timeout_seconds=0.1)
-except asyncio.TimeoutError:
-    pass
+from contextlib import suppress
+with suppress(asyncio.TimeoutError):
+    await store.wait_for("test_signal", timeout_seconds=0.1)
```

This also resolves the Ruff SIM105 hint.

🧰 Tools
🪛 Ruff (0.11.9)
452-455: Use `contextlib.suppress(asyncio.TimeoutError)` instead of `try`-`except`-`pass`. Replace with `contextlib.suppress(asyncio.TimeoutError)` (SIM105)

183-191: `asyncio.sleep(0.1)` inflates suite time.

The sleeps are only meant to yield control to the event loop. `await asyncio.sleep(0)` (or a `pytest.skip_event_loop()` helper, if available) achieves the same without slowing the suite. Refactor all occurrences where the delay is not functionally needed.
Also applies to: 213-221, 545-553
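To make the yield-vs-delay point concrete, here is a small self-contained example (illustrative only, not from the test suite) showing that `asyncio.sleep(0)` is enough to let a pending task run:

```python
# Illustrative only: asyncio.sleep(0) suspends the current task exactly
# once, letting other ready tasks run, without the wall-clock delay that
# sleep(0.1) adds on every call.
import asyncio


async def main():
    ran = []

    async def background():
        ran.append("background")

    asyncio.ensure_future(background())
    await asyncio.sleep(0)  # yield control; no 100 ms penalty
    assert ran == ["background"]


asyncio.run(main())
```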
tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (2)
185-188: Float equality assertions should use `pytest.approx`.

Direct equality on floating-point numbers can lead to flaky tests across platforms. Replace with an approximate comparison:

```diff
-assert results[0].p_score == 0.9
+assert results[0].p_score == pytest.approx(0.9)
```

Apply similarly to the other locations (`0.7`, `0.95`, `0.85`, `0.65`, …).

Also applies to: 253-259, 371-379

70-80: Repeated patch blocks could be collapsed with a fixture or parametrisation.

The boilerplate `with patch(... OpenAIAugmentedLLM ...):` clutter is repeated many times. Creating a fixture:

```python
@pytest.fixture(autouse=True)
def patched_llm(monkeypatch):
    monkeypatch.setattr(
        "mcp_agent.workflows.intent_classifier.intent_classifier_llm_openai.OpenAIAugmentedLLM",
        MockOpenAIAugmentedLLM,
    )
```

would make tests shorter and easier to maintain.
Also applies to: 118-130, 270-279
tests/workflows/orchestrator/conftest.py (1)
64-70: Factory omits `context`, which many LLM methods expect.

`factory(agent)` instantiates `MockAugmentedLLM(agent=agent)` but does not forward the `context` created in `mock_context()`. If any test relies on `llm.context` being present (e.g. model-selection logic), a `NoneType` access will surface.

```diff
-    def factory(agent):
-        return MockAugmentedLLM(agent=agent)
+    def factory(agent, context=None):
+        # default to the shared mock_context fixture when not provided
+        return MockAugmentedLLM(agent=agent, context=context)
```

or pass `mock_context` explicitly from the caller. This keeps the fixture self-contained and prevents hidden `AttributeError`s.

tests/workflows/intent_classifier/test_intent_classifier_llm_anthropic.py (1)
319-324: Patching `initialize` with `AsyncMock` may mask real idempotency logic.

Wrapping the real coroutine with `AsyncMock(wraps=real_initialize)` is handy for call-count checks, but it replaces the bound method on the instance. If `initialize` contains `self.initialized` guards, those still run, yet future refactors that access `super().initialize` or rely on the original descriptor semantics will be obscured. Consider:

```python
with patch.object(classifier, "initialize", wraps=classifier.initialize) as init_mock:
    ...
```

so you avoid permanently altering the instance attribute and keep the patch within a controlled context.
src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)
964-972: Single-element list creation is correct but leaves an existing helper unused.

You now wrap the part in `[mcp_content_to_openai_content_part(...)]`, which is fine. However, the old multi-content helper (`mcp_content_to_openai_content`) still exists (or did in sibling modules). Consider removing dead code to avoid confusion.

tests/workflows/llm/test_augmented_llm_bedrock.py (3)
130-148: Assertion accesses nested payload incorrectly and assumes request shape.

`first_call_args = mock_llm.executor.execute.call_args[0][1]` retrieves the second positional argument, assuming the signature is `execute(task, payload)`. If `execute` is later called with keyword arguments (e.g., `payload=…`), this indexing will break silently. Prefer:

```python
_, request_obj = mock_llm.executor.execute.call_args.args
payload = request_obj.payload
```

or access via `call_kwargs`. This keeps the test stable against minor signature changes.

278-311: `custom_side_effect` does not await tool calls sequentially: race condition in test.

Because all branches of `custom_side_effect` share mutable `call_count`, rapid `await`/`AsyncMock` scheduling could interleave, producing nondeterministic counts in rare CI conditions. Guard with a simple lock or, easier, use `side_effect=[resp1, resp2]`:

```python
mock_llm.executor.execute = AsyncMock(side_effect=[
    self.create_tool_use_response(...),
    self.create_text_response(...)
])
```

This is deterministic and easier to read.

501-525: `typed_dict_extras` tests import via bedrock module: fragile coupling.

You import `typed_dict_extras` from `mcp_agent.workflows.llm.augmented_llm_bedrock`, but the helper actually lives in `mcp_agent.utils.common` and is only re-exported indirectly. Importing from its canonical location avoids accidental breakage if the bedrock module stops re-exporting:

```python
from mcp_agent.utils.common import typed_dict_extras
```

src/mcp_agent/workflows/llm/augmented_llm_azure.py (1)
377-381: Duplicate broad exception handling: consolidate to avoid masking errors.

Two separate `except Exception as e:` blocks return very similar `ToolMessage` error payloads. Consolidating them improves readability and prevents subtle flow-control differences. Consider:

```diff
-    except json.JSONDecodeError as e:
-        ...
-    except Exception as e:
-        ...
+    except (json.JSONDecodeError, Exception) as e:
+        return ToolMessage(
+            tool_call_id=tool_call_id,
+            content=f"Error executing tool '{tool_name}': {str(e)}",
+        )
```

Also applies to: 400-404
tests/workflows/orchestrator/test_orchestrator_integration.py (1)
37-41: Rename unused loop variable to underscore to satisfy linters.

`agent` is never referenced inside the loop; Ruff flags this as B007. Renaming improves clarity and keeps CI green.

```diff
-for agent_name, agent in orchestrator.agents.items():
+for agent_name, _agent in orchestrator.agents.items():
```

(Apply the same change in the second loop around line 182.)

Also applies to: 182-186

🧰 Tools
🪛 Ruff (0.11.9)
37-37: Loop control variable `agent` not used within loop body. Rename unused `agent` to `_agent` (B007)
tests/workflows/llm/test_augmented_llm_openai.py (1)
149-156: Accessing request payload via positional indices is brittle.

`first_call_args = mock_llm.executor.execute.call_args_list[0][0]` relies on a fixed positional signature of `executor.execute(...)`. If the executor's API changes, the test will silently break. Prefer using `call_kwargs = mock_llm.executor.execute.call_args.kwargs` (or `args, kwargs = ...`) and assert via named attributes. This keeps the test stable against parameter reordering.
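As an illustration of the suggested style (not the test's actual code; the payload shape here is invented), asserting on named arguments with `unittest.mock` looks like this:

```python
# Hypothetical payload shape, for illustration only: the point is that
# named access via call_args.kwargs survives parameter reordering, while
# positional indexing like call_args_list[0][0] does not.
import asyncio
from unittest.mock import AsyncMock


async def main():
    execute = AsyncMock()
    await execute(payload={"model": "example-model", "messages": []})

    call_kwargs = execute.call_args.kwargs
    assert call_kwargs["payload"]["model"] == "example-model"


asyncio.run(main())
```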
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (68)
- pyproject.toml (2 hunks)
- src/mcp_agent/executor/workflow_signal.py (2 hunks)
- src/mcp_agent/logging/transport.py (1 hunks)
- src/mcp_agent/mcp/mcp_aggregator.py (2 hunks)
- src/mcp_agent/utils/pydantic_type_serializer.py (8 hunks)
- src/mcp_agent/workflows/llm/augmented_llm.py (1 hunks)
- src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1 hunks)
- src/mcp_agent/workflows/llm/augmented_llm_azure.py (6 hunks)
- src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (2 hunks)
- src/mcp_agent/workflows/llm/augmented_llm_google.py (1 hunks)
- src/mcp_agent/workflows/llm/augmented_llm_openai.py (4 hunks)
- src/mcp_agent/workflows/orchestrator/orchestrator.py (2 hunks)
- src/mcp_agent/workflows/router/router_embedding.py (2 hunks)
- src/mcp_agent/workflows/swarm/swarm.py (2 hunks)
- tests/agents/conftest.py (1 hunks)
- tests/agents/test_agent.py (1 hunks)
- tests/executor/temporal/test_signal_handler.py (1 hunks)
- tests/executor/temporal/test_temporal_executor.py (1 hunks)
- tests/executor/temporal/test_workflow_registry.py (1 hunks)
- tests/executor/test_workflow.py (1 hunks)
- tests/executor/test_workflow_signal.py (1 hunks)
- tests/fixture/README.md (0 hunks)
- tests/fixture/expected_output.txt (0 hunks)
- tests/fixture/mcp_basic_agent_20250131_205604.jsonl (0 hunks)
- tests/mcp/test_mcp_aggregator.py (1 hunks)
- tests/mcp/test_mcp_connection_manager.py (1 hunks)
- tests/test_app.py (1 hunks)
- tests/test_event_progress.py (0 hunks)
- tests/utils/test_pydantic_type_serializer.py (6 hunks)
- tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (1 hunks)
- tests/workflows/intent_classifier/README.md (1 hunks)
- tests/workflows/intent_classifier/conftest.py (1 hunks)
- tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1 hunks)
- tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1 hunks)
- tests/workflows/intent_classifier/test_intent_classifier_llm_anthropic.py (1 hunks)
- tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (1 hunks)
- tests/workflows/llm/README.md (1 hunks)
- tests/workflows/llm/conftest.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_anthropic.py (8 hunks)
- tests/workflows/llm/test_augmented_llm_azure.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_bedrock.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_google.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_ollama.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_openai.py (3 hunks)
- tests/workflows/orchestrator/__init__.py (1 hunks)
- tests/workflows/orchestrator/conftest.py (1 hunks)
- tests/workflows/orchestrator/test_orchestrator.py (1 hunks)
- tests/workflows/orchestrator/test_orchestrator_integration.py (1 hunks)
- tests/workflows/orchestrator/test_orchestrator_models.py (1 hunks)
- tests/workflows/orchestrator/test_orchestrator_prompts.py (1 hunks)
- tests/workflows/parallel/conftest.py (1 hunks)
- tests/workflows/parallel/test_fan_in.py (1 hunks)
- tests/workflows/parallel/test_fan_out.py (1 hunks)
- tests/workflows/parallel/test_parallel_llm.py (1 hunks)
- tests/workflows/router/__init__.py (1 hunks)
- tests/workflows/router/conftest.py (1 hunks)
- tests/workflows/router/test_router_base.py (1 hunks)
- tests/workflows/router/test_router_embedding.py (1 hunks)
- tests/workflows/router/test_router_embedding_cohere.py (1 hunks)
- tests/workflows/router/test_router_embedding_openai.py (1 hunks)
- tests/workflows/router/test_router_llm.py (1 hunks)
- tests/workflows/router/test_router_llm_anthropic.py (1 hunks)
- tests/workflows/router/test_router_llm_openai.py (1 hunks)
- tests/workflows/swarm/__init__.py (1 hunks)
- tests/workflows/swarm/conftest.py (1 hunks)
- tests/workflows/swarm/test_swarm.py (1 hunks)
- tests/workflows/swarm/test_swarm_anthropic.py (1 hunks)
- tests/workflows/swarm/test_swarm_openai.py (1 hunks)
💤 Files with no reviewable changes (4)
- tests/fixture/expected_output.txt
- tests/fixture/README.md
- tests/test_event_progress.py
- tests/fixture/mcp_basic_agent_20250131_205604.jsonl
🧰 Additional context used
🧬 Code Graph Analysis (17)
src/mcp_agent/executor/workflow_signal.py (1)
- src/mcp_agent/executor/executor.py (3): signal (151-176), signal (382-392), uuid (215-222)

src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1)
- src/mcp_agent/utils/common.py (1): to_string (32-39)

tests/executor/temporal/test_temporal_executor.py (3)
- src/mcp_agent/executor/temporal/__init__.py (5): ensure_client (255-265), wrap_as_activity (85-107), _execute_task_as_async (109-143), start_workflow (267-350), terminate_workflow (367-386)
- src/mcp_agent/app.py (1): config (128-129)
- src/mcp_agent/server/app_server.py (1): app (69-71)

tests/workflows/parallel/conftest.py (4)
- src/mcp_agent/core/context.py (1): Context (50-79)
- src/mcp_agent/agents/agent.py (1): Agent (48-750)
- src/mcp_agent/workflows/llm/augmented_llm.py (1): AugmentedLLM (208-661)
- tests/workflows/router/conftest.py (3): mock_context (17-54), mock_agent (58-69), mock_llm (73-81)

tests/executor/test_workflow_signal.py (1)
- src/mcp_agent/executor/workflow_signal.py (8): Signal (13-48), SignalRegistration (51-59), PendingSignal (88-95), BaseSignalHandler (98-157), AsyncioSignalHandler (219-297), ConsoleSignalHandler (160-216), LocalSignalStore (301-336), wait_for (319-336)

tests/workflows/router/test_router_embedding.py (4)
- src/mcp_agent/agents/agent.py (1): Agent (48-750)
- src/mcp_agent/workflows/router/router_embedding.py (4): EmbeddingRouter (29-240), EmbeddingRouterCategory (22-26), _compute_embedding (233-240), _route_with_embedding (189-231)
- tests/workflows/router/conftest.py (5): mock_context (17-54), mock_embedding_model (85-105), mock_agent (58-69), test_function (109-118), test_function (114-116)
- src/mcp_agent/workflows/router/router_base.py (1): RouterResult (19-31)

tests/executor/test_workflow.py (1)
- src/mcp_agent/executor/workflow.py (8): WorkflowState (33-52), WorkflowResult (55-59), Workflow (62-505), record_error (47-52), id (111-115), run_id (118-123), update_state (447-454), cancel (331-352)

tests/workflows/router/conftest.py (10)
- src/mcp_agent/core/context.py (1): Context (50-79)
- src/mcp_agent/workflows/embedding/embedding_base.py (1): EmbeddingModel (14-32)
- src/mcp_agent/workflows/router/router_base.py (3): RouterCategory (34-49), ServerRouterCategory (52-55), AgentRouterCategory (58-61)
- tests/test_app.py (1): mock_context (16-29)
- src/mcp_agent/mcp/mcp_server_registry.py (1): get_server_config (362-378)
- src/mcp_agent/agents/agent.py (1): Agent (48-750)
- tests/workflows/router/test_router_llm_openai.py (3): generate (29-31), generate_str (33-35), generate_structured (37-39)
- tests/workflows/router/test_router_llm_anthropic.py (3): generate (29-31), generate_str (33-35), generate_structured (37-39)
- tests/workflows/router/test_router_embedding_openai.py (1): embed (19-28)
- tests/workflows/router/test_router_embedding_cohere.py (1): embed (19-28)

tests/workflows/llm/test_augmented_llm_ollama.py (3)
- src/mcp_agent/config.py (1): OpenAISettings (145-160)
- src/mcp_agent/workflows/llm/augmented_llm_ollama.py (1): OllamaAugmentedLLM (17-79)
- tests/workflows/llm/conftest.py (1): mock_aggregator (17-29)

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (7)
- src/mcp_agent/core/context.py (1): Context (50-79)
- src/mcp_agent/workflows/intent_classifier/intent_classifier_base.py (1): IntentClassificationResult (27-40)
- src/mcp_agent/workflows/intent_classifier/intent_classifier_embedding.py (1): EmbeddingIntent (24-30)
- src/mcp_agent/workflows/intent_classifier/intent_classifier_embedding_cohere.py (1): CohereEmbeddingIntentClassifier (13-45)
- tests/workflows/intent_classifier/conftest.py (4): embed (55-65), embedding_dim (68-69), test_intents (28-46), mock_context (11-24)
- tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (2): embed (31-39), embedding_dim (42-43)
- tests/workflows/parallel/conftest.py (1): mock_context (10-16)

tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (2)
- src/mcp_agent/workflows/evaluator_optimizer/evaluator_optimizer.py (5): EvaluatorOptimizerLLM (48-469), EvaluationResult (33-45), QualityRating (24-30), _build_eval_prompt (425-446), _build_refinement_prompt (448-469)
- src/mcp_agent/workflows/llm/augmented_llm.py (1): AugmentedLLM (208-661)

tests/workflows/swarm/test_swarm.py (4)
- src/mcp_agent/agents/agent.py (1): CallToolRequest (789-798)
- src/mcp_agent/workflows/swarm/swarm.py (8): AgentFunctionResult (142-156), SwarmAgent (74-139), DoneAgent (309-322), create_agent_resource (50-58), create_agent_function_result_resource (61-71), get_tool (213-220), set_agent (274-297), should_continue (299-306)
- src/mcp_agent/workflows/swarm/swarm_openai.py (1): OpenAISwarm (9-41)
- tests/workflows/swarm/conftest.py (8): mock_context (11-17), mock_swarm_agent (34-46), test_function_result (56-58), test_function_agent_result (62-64), test_function_agent_function_result (68-70), mock_tool_response (80-82), mock_agent (21-30), done_agent (50-52)

tests/workflows/llm/test_augmented_llm_bedrock.py (5)
- src/mcp_agent/config.py (1): BedrockSettings (121-132)
- src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (4): BedrockAugmentedLLM (48-346), BedrockMCPTypeConverter (434-496), mcp_content_to_bedrock_content (499-531), bedrock_content_to_mcp_content (534-569)
- src/mcp_agent/workflows/llm/augmented_llm.py (5): RequestParams (114-153), select_model (302-343), _log_chat_progress (511-521), _log_chat_finished (523-526), get_request_params (345-366)
- src/mcp_agent/utils/common.py (1): typed_dict_extras (27-29)
- tests/workflows/llm/conftest.py (1): mock_aggregator (17-29)

tests/mcp/test_mcp_aggregator.py (1)
- src/mcp_agent/mcp/mcp_aggregator.py (12): MCPAggregator (65-1102), list_servers (536-548), _parse_capability_name (946-991), NamespacedTool (45-52), NamespacedPrompt (55-62), load_server (294-367), load_servers (369-424), MCPCompoundServer (1105-1171), _list_tools (1121-1124), _list_prompts (1141-1144), _call_tool (1126-1139), _get_prompt (1146-1162)

tests/workflows/orchestrator/test_orchestrator_integration.py (4)
- src/mcp_agent/workflows/llm/augmented_llm.py (1): RequestParams (114-153)
- src/mcp_agent/workflows/orchestrator/orchestrator.py (1): Orchestrator (46-586)
- src/mcp_agent/workflows/orchestrator/orchestrator_models.py (5): Plan (46-55), Step (35-43), NextStep (111-116), PlanResult (86-108), AgentTask (27-32)
- tests/workflows/orchestrator/conftest.py (5): mock_llm_factory (64-70), mock_agents (74-87), mock_context (42-60), generate_str (34-35), generate (31-32)

tests/workflows/llm/test_augmented_llm_openai.py (4)
- src/mcp_agent/config.py (1): OpenAISettings (145-160)
- src/mcp_agent/workflows/llm/augmented_llm_openai.py (4): MCPOpenAITypeConverter (920-1026), generate (146-359), to_mcp_message_result (944-953), from_mcp_message_param (956-977)
- src/mcp_agent/workflows/llm/augmented_llm.py (3): RequestParams (114-153), select_model (302-343), get_request_params (345-366)
- tests/workflows/llm/conftest.py (2): mock_context (8-13), mock_aggregator (17-29)

tests/workflows/llm/test_augmented_llm_anthropic.py (4)
- src/mcp_agent/config.py (1): AnthropicSettings (111-118)
- src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (8): AnthropicMCPTypeConverter (754-863), mcp_content_to_anthropic_content (866-936), anthropic_content_to_mcp_content (939-982), mcp_stop_reason_to_anthropic_stop_reason (985-997), anthropic_stop_reason_to_mcp_stop_reason (1000-1012), generate (131-332), to_mcp_message_result (779-796), from_mcp_message_param (799-808)
- src/mcp_agent/workflows/llm/augmented_llm.py (2): RequestParams (114-153), get_request_params (345-366)
- src/mcp_agent/utils/common.py (1): typed_dict_extras (27-29)
🪛 Ruff (0.11.9)
src/mcp_agent/workflows/swarm/swarm.py
- 87-87: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
- 88-88: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)

src/mcp_agent/workflows/llm/augmented_llm_google.py
- 442-445: Use ternary operator `role = "assistant" if result.role == "model" else result.role` instead of `if`-`else`-block (SIM108)

tests/executor/test_workflow_signal.py
- 452-455: Use `contextlib.suppress(asyncio.TimeoutError)` instead of `try`-`except`-`pass` (SIM105)

tests/test_app.py
- 224-227: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
- 250-251: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
- 265-266: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)

tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py
- 243-243: Loop control variable `intent_name` not used within loop body. Rename unused `intent_name` to `_intent_name` (B007)

tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py
- 243-243: Loop control variable `intent_name` not used within loop body. Rename unused `intent_name` to `_intent_name` (B007)

tests/agents/test_agent.py
- 219-223: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
- 277-281: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)

tests/workflows/orchestrator/test_orchestrator.py
- 174-174: Loop control variable `agent` not used within loop body. Rename unused `agent` to `_agent` (B007)
- 295-296: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
- 357-358: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
- 430-431: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)

tests/workflows/orchestrator/test_orchestrator_integration.py
- 37-37: Loop control variable `agent` not used within loop body. Rename unused `agent` to `_agent` (B007)
- 182-182: Loop control variable `agent` not used within loop body. Rename unused `agent` to `_agent` (B007)
🔇 Additional comments (62)
tests/workflows/router/__init__.py (1)
1-2: LGTM! Standard package marker file.

This is a proper addition of an `__init__.py` file to mark the directory as a Python package, enabling test discovery. The comments clearly explain the purpose.

pyproject.toml (1)

75-75: LGTM! Appropriate addition of trio for async testing.

Adding `trio>=0.30.0` to dev dependencies is appropriate for supporting asynchronous concurrency testing in the new test suites.

src/mcp_agent/workflows/llm/augmented_llm.py (1)

256-261: LGTM! Clean conditional parameter passing.

Good improvement using dictionary unpacking to conditionally pass the `instruction` parameter only when it's not `None`. This prevents passing explicit `None` values and allows the `Agent` constructor to handle its own default value logic.
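To make the pattern concrete, here is a minimal runnable sketch with a stand-in `Agent` class (the default instruction string is illustrative, not the library's):

```python
# Stand-in Agent class; the real constructor lives in mcp_agent and its
# default instruction differs. The **{} unpacking adds the keyword
# argument only when `instruction` is not None.
class Agent:
    def __init__(self, name, instruction="You are a helpful agent."):
        self.name = name
        self.instruction = instruction


def make_agent(name, instruction=None):
    return Agent(
        name=name,
        **({"instruction": instruction} if instruction is not None else {}),
    )


print(make_agent("a").instruction)                 # constructor default applies
print(make_agent("b", "Summarize.").instruction)   # explicit value wins
```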
tests/workflows/parallel/conftest.py (4)

9-16: LGTM! Clean and simple mock context fixture.

The `mock_context` fixture provides a minimal mock `Context` instance with the essential `executor` attribute. This is appropriately simplified compared to the router conftest, which includes provider configurations not needed for parallel workflow tests.

19-28: LGTM! Proper async context manager mocking.

The `mock_agent` fixture correctly mocks the `Agent` class with async context manager support (`__aenter__` and `__aexit__`). This is essential since `Agent` implements these methods for proper lifecycle management.
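For reference, a self-contained sketch of this kind of async context-manager mock (fixture shape assumed, not copied from the conftest):

```python
# Fixture shape assumed, not copied from the conftest: a MagicMock object
# whose __aenter__/__aexit__ are AsyncMocks, so `async with` works.
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_mock_agent():
    agent = MagicMock(name="agent")
    agent.__aenter__ = AsyncMock(return_value=agent)
    agent.__aexit__ = AsyncMock(return_value=None)
    return agent


async def main():
    async with make_mock_agent() as agent:
        agent.do_work()  # any attribute access succeeds on a MagicMock


asyncio.run(main())
```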
31-40: LGTM! Comprehensive LLM method mocking.

The `mock_llm` fixture properly mocks all the key async generation methods (`generate`, `generate_str`, `generate_structured`) of the `AugmentedLLM` class, providing good test isolation.

43-48: LGTM! Useful factory pattern for async testing.

The `mock_llm_factory` fixture provides an async factory function that returns the mocked LLM instance. This pattern is useful for testing components that need to create LLM instances dynamically.

tests/workflows/intent_classifier/conftest.py (2)
10-24: Well-structured mock context fixture.

The mock context fixture properly isolates API configurations for testing and follows pytest best practices.

27-46: Good test intent examples for comprehensive testing.

The test intents provide diverse examples covering greeting, farewell, and help scenarios, which should enable thorough testing of intent classification logic.
tests/workflows/orchestrator/__init__.py (1)
1-1: Proper package initialization for test discovery.

The docstring appropriately marks this directory as a test package for the orchestrator workflow module.
tests/workflows/swarm/__init__.py (1)
1-1: Appropriate test package marker.

The comment clearly identifies this as the test package for swarm workflow components.
src/mcp_agent/logging/transport.py (1)
276-276: ❓ Verification inconclusive

Review the event loop handling in init_queue() before auto-initialization.

While automatically calling `init_queue()` in the constructor improves usability, it makes the problematic event loop handling in `init_queue()` more likely to cause issues. The `init_queue()` method (lines 278-288) contains this concerning pattern:

```python
try:
    self._loop = asyncio.get_running_loop()
except RuntimeError:
    self._loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._loop)  # This sets the loop globally!
```

Setting a new event loop globally in a constructor can interfere with existing event loops in the application. Consider one of these safer approaches:

- Lazy initialization: Only call `init_queue()` when actually needed (e.g., in a `start()` method)
- Remove global loop setting: Store the loop reference without setting it globally
- Constructor parameter: Allow passing the event loop as a constructor parameter (a sketch of this option follows below)
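As a rough sketch of the constructor-parameter option (class and method names are hypothetical, not the project's code):

```python
# Hypothetical class: accept a loop in the constructor, keep a private
# reference, and never call asyncio.set_event_loop().
import asyncio
from typing import Optional


class EventBusSketch:
    def __init__(self, loop: Optional[asyncio.AbstractEventLoop] = None):
        self._loop = loop  # may stay None until first use

    def _get_loop(self) -> asyncio.AbstractEventLoop:
        if self._loop is None:
            # Resolved lazily inside a running coroutine, so global
            # event-loop state is never mutated.
            self._loop = asyncio.get_running_loop()
        return self._loop


async def main():
    bus = EventBusSketch()
    assert bus._get_loop() is asyncio.get_running_loop()


asyncio.run(main())
```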
🏁 Script executed:

```bash
#!/bin/bash
# Search for other usages of AsyncEventBus constructor to assess impact
rg -A 5 -B 5 "AsyncEventBus\(" --type py
```

Length of output: 42

Verify global event loop handling in init_queue()

No other references to `AsyncEventBus(` were found in the codebase. Please confirm whether this constructor is used elsewhere (e.g., plugins, downstream services) to avoid unintended side effects. In `src/mcp_agent/logging/transport.py` (lines 276-288), `init_queue()` does:

```python
try:
    self._loop = asyncio.get_running_loop()
except RuntimeError:
    self._loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._loop)  # resets the global loop!
```

Setting a new event loop globally in a constructor can silently break existing loops in your application. Consider one of these safer approaches:

- Lazy initialization: defer `init_queue()` until you actually need to start processing (e.g., in a `start()` or `run()` method)
- Remove global loop setting: keep a private loop reference without calling `asyncio.set_event_loop()`
- Constructor parameter: accept an existing loop via `__init__(..., loop: asyncio.AbstractEventLoop=None)`

Please review all usages of `AsyncEventBus` in your projects and assess the impact of altering the global loop.

tests/agents/conftest.py (2)
7-17: Well-structured mock context fixture.

The fixture properly mocks the asynchronous methods and provides a realistic context structure for testing agent functionality.

20-27: Good test tool fixture design.

The mock tool provides a realistic schema structure that's suitable for testing tool-related functionality.
tests/workflows/llm/conftest.py (2)
7-13: Clean and focused mock context fixture.

The fixture provides essential mocked components (executor and model_selector) needed for LLM workflow testing.

16-29: Comprehensive mock aggregator fixture.

The fixture properly mocks both synchronous (tools list) and asynchronous (list_tools, call_tool) functionality needed for testing tool-enabled LLM workflows.
tests/workflows/llm/README.md (1)
1-95: Excellent test documentation.

The README provides comprehensive guidance for running, understanding, and extending the LLM provider tests. The structure is clear and includes practical examples for different scenarios.
src/mcp_agent/workflows/orchestrator/orchestrator.py (3)
102-107: LGTM - Typo fix and instruction improvement!

The correction from `sythesizer` to `synthesizer` fixes a typo, and the simplified instruction is more concise and clear.

109-113: Good addition of input validation!

Adding validation for the `plan_type` parameter is excellent defensive programming. This ensures only valid values are accepted and provides clear error messages.
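A minimal sketch of this kind of guard (the error-message wording is assumed, echoing the PR's "either 'full' or 'iterative'" commit note):

```python
# Hypothetical constructor excerpt: reject anything other than the two
# supported planning modes up front, with an actionable message.
class OrchestratorSketch:
    def __init__(self, plan_type: str = "full"):
        if plan_type not in ("full", "iterative"):
            raise ValueError(
                f"Invalid plan type: {plan_type}. Must be 'full' or 'iterative'."
            )
        self.plan_type = plan_type


OrchestratorSketch("full")           # accepted
try:
    OrchestratorSketch("partial")    # rejected with a clear message
except ValueError as exc:
    print(exc)
```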
358-358: LGTM - Consistent typo fix!

Good catch fixing the variable name from `sythesizer` to `synthesizer` to maintain consistency with the constructor fix.

src/mcp_agent/workflows/router/router_embedding.py (3)
101-101: Critical fix - Properly await async method!

Good catch adding `await` before `self._compute_embedding()`. The `_compute_embedding` method is async and must be awaited.

103-103: Correct data extraction using model_dump()!

Using `model_dump()` before unpacking ensures proper data extraction from the Pydantic model into the dictionary format expected by `EmbeddingRouterCategory`.
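A simplified, runnable sketch of the pattern (field names reduced for illustration; the real category models live in router_base.py and router_embedding.py):

```python
# Field names reduced for illustration; the real category models carry
# more attributes. The key move: model_dump() first, then ** unpacking.
from pydantic import BaseModel


class RouterCategory(BaseModel):
    name: str
    description: str


class EmbeddingRouterCategory(RouterCategory):
    score: float


base = RouterCategory(name="docs", description="documentation server")
category = EmbeddingRouterCategory(**base.model_dump(), score=0.92)
print(category)
```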
209-209: Critical fix - Properly await async method!

Good catch adding `await` before `self._compute_embedding()`. This was another missing await for the async embedding computation method.

tests/workflows/orchestrator/test_orchestrator_prompts.py (1)
1-131: Excellent comprehensive test suite for prompt templates!

This test file provides thorough coverage for all orchestrator prompt templates with several strengths:

- Complete coverage: Tests all major prompt templates used in the orchestrator workflow
- Clear structure: Each test follows a consistent pattern of format → assert
- Content validation: Verifies both placeholder substitution and expected content presence
- Consistency checks: The `test_templates_consistency` and `test_template_order` methods ensure uniform formatting and logical organization
- Clear naming: Test method names clearly indicate what's being tested
The test structure is well-organized and provides good confidence in template correctness.
tests/workflows/intent_classifier/README.md (1)
1-70: Excellent test documentation providing comprehensive guidance!

This README effectively documents the intent classifier testing strategy with several strong points:
- Clear purpose: Explains what's being tested and why it matters
- Mock strategy: Well-explained rationale for using mocks (speed, determinism, no external dependencies)
- Practical guidance: Specific commands for running tests at different granularities
- Development support: Clear instructions for adding new tests with naming conventions
- Test coverage guidelines: Comprehensive list of key test cases to ensure thorough coverage
The documentation will help developers understand the testing approach and maintain consistent test quality across the intent classifier modules.
src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1)
956-976: LGTM! Improved type handling for content conversion.

The conditional logic to handle both Pydantic model instances and plain dictionaries makes the function more robust and defensive. The implementation correctly extracts `type` and `text` attributes based on the block's type, which aligns with the broader pattern of content conversion improvements across LLM provider modules.
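A self-contained sketch of the dual-path handling described (block shapes simplified):

```python
# Block shapes simplified: accept either a Pydantic content block or a
# plain dict and extract the same fields from both.
from pydantic import BaseModel


class TextBlock(BaseModel):
    type: str = "text"
    text: str


def extract(block):
    if isinstance(block, dict):
        return block.get("type"), block.get("text")
    return block.type, getattr(block, "text", None)


print(extract(TextBlock(text="hi")))            # ('text', 'hi')
print(extract({"type": "text", "text": "hi"}))  # ('text', 'hi')
```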
1-266
: Excellent comprehensive test coverage for orchestrator models.This test module provides thorough coverage of the orchestrator data models including:
- Object creation and attribute validation for all model types
- Relationship handling between steps, tasks, and results
- Collection operations (add_task_result, add_step_result)
- Formatting functions with proper status indicators
- Edge cases like completion state handling
The test structure is clean with descriptive names and clear assertions. This significantly improves the code coverage as mentioned in the PR objectives.
src/mcp_agent/executor/workflow_signal.py (1)
228-228: LGTM! Improved naming consistency and handler tracking.

The renaming from `unique_name` to `unique_signal_name` enhances code clarity. The modification to store handlers as `[unique_signal_name, wrapped]` tuples instead of just the wrapped function enables better handler tracking and management, which supports more robust signal handling capabilities. These changes maintain backward compatibility while improving internal structure and readability.
Also applies to: 232-232, 260-260, 267-267, 275-277
src/mcp_agent/workflows/llm/augmented_llm_bedrock.py (2)
473-473: Critical fix: Wrap single content in list for conversion function.

This fix ensures `mcp_content_to_bedrock_content` receives a list as expected, preventing potential type errors during content conversion.

541-541: Critical bug fixes: Correct variable references in content loop.

These fixes change incorrect references from `content` (the entire list) to `block` (the current iteration item), ensuring proper access to individual content block fields like `text`, `image`, `toolUse`, and `document`. This prevents erroneous data extraction and ensures correct content conversion.

Also applies to: 546-547, 555-555, 563-564
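The bug class is easy to see in a reduced form (data shapes assumed):

```python
# Data shapes assumed: inside the loop, fields must be read from the
# current `block`, not from the whole `content` list.
content = [{"text": "a"}, {"image": b"..."}, {"text": "b"}]

texts = []
for block in content:
    if "text" in block:          # correct: per-item check
        texts.append(block["text"])
    # the buggy version referenced `content` here, i.e. the entire list,
    # instead of the current block

print(texts)  # ['a', 'b']
```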
src/mcp_agent/mcp/mcp_aggregator.py (2)
190-190: Good improvement for lifecycle state consistency.

Explicitly setting `initialized = False` when skipping closure ensures consistent state tracking regardless of connection persistence mode. This prevents potential issues where the aggregator might appear initialized when it shouldn't be.

983-985: Excellent change for deterministic behavior.

Iterating over servers in the configured `server_names` order instead of arbitrary dictionary iteration makes capability name parsing predictable and consistent. This ensures that when multiple servers provide the same capability name, the resolution order follows the intended server priority.
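A reduced sketch of the deterministic lookup (structures assumed):

```python
# Structures assumed: walk servers in configured order so the first
# configured server providing a capability wins, deterministically.
def resolve(name, server_names, capabilities):
    for server in server_names:  # configured priority order
        if name in capabilities.get(server, set()):
            return server
    return None


servers = ["alpha", "beta"]
caps = {"beta": {"fetch"}, "alpha": {"fetch"}}
print(resolve("fetch", servers, caps))  # alpha: first in configured order
```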
1-192
: Excellent comprehensive test suite for OpenAISwarm.This test file provides thorough coverage of the OpenAISwarm functionality:
- Proper async testing: Uses
pytest.mark.asyncio
consistently- Good fixture design: Well-structured fixtures for mock objects
- Comprehensive scenarios: Tests initialization, default/custom parameters, multiple iterations, early termination, and DoneAgent behavior
- Effective mocking: Proper use of
AsyncMock
andMagicMock
to control test behavior- Validation coverage: Verifies both method calls and expected parameters/results
The tests effectively validate iteration control logic and agent state management, which are critical for swarm workflow reliability.
tests/executor/temporal/test_temporal_executor.py (1)
`1-105`: **Well-structured test suite for TemporalExecutor.**

This test file provides solid coverage of the TemporalExecutor functionality:

- Good fixture design: Clear separation of concerns with `mock_client`, `mock_context`, and `executor` fixtures
- Core functionality coverage: Tests client connection, activity wrapping, and task execution paths
- Execution context handling: Tests both inside and outside workflow contexts with appropriate patching
- Workflow management: Covers workflow starting and termination with proper mock verification
- Async testing patterns: Consistent use of `pytest.mark.asyncio` and proper async/await handling

The tests effectively validate the Temporal integration patterns and ensure the executor behaves correctly under different execution contexts.
src/mcp_agent/workflows/swarm/swarm.py (1)
`255-265`: **Good alignment with resource handling changes.**

The update to check for the `AgentFunctionResultResource` type and access nested attributes through `content.result.*` properly aligns with the resource wrapper pattern. This ensures consistent handling of agent function results and context variable updates.
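Schematically, the wrapper check works like this (the dataclasses are simplified stand-ins for the real resource types in `swarm.py`):

```python
from dataclasses import dataclass, field


@dataclass
class AgentFunctionResult:
    value: str
    context_variables: dict = field(default_factory=dict)


@dataclass
class AgentFunctionResultResource:
    result: AgentFunctionResult


def handle(content: object, context_variables: dict) -> str:
    if isinstance(content, AgentFunctionResultResource):
        # Nested access goes through content.result.*
        context_variables.update(content.result.context_variables)
        return content.result.value
    return str(content)


ctx: dict = {}
resource = AgentFunctionResultResource(AgentFunctionResult("ok", {"user": "a"}))
assert handle(resource, ctx) == "ok"
assert ctx == {"user": "a"}
```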
tests/workflows/router/test_router_llm.py (1)

`1-503`: **Excellent comprehensive test coverage!**

This test suite provides thorough coverage of the LLMRouter functionality including:
- Initialization scenarios (basic and factory methods)
- Routing with different parameter combinations
- Edge cases (empty responses, invalid categories)
- All routing method variants (route, route_to_server, route_to_agent, route_to_function)
- Context generation with different inclusion flags
The test structure is clean, well-organized, and follows good async testing patterns with proper mocking.
tests/executor/temporal/test_workflow_registry.py (1)
`1-10`: **LGTM! Good import structure and fixture setup.**

The imports are appropriate for async testing with proper mocking utilities. The mock executor fixture correctly sets up the necessary async mock structure.
tests/workflows/router/test_router_base.py (3)
`14-49`: **Excellent concrete implementation for testing abstract base class.**

The TestRouter class provides a clean, minimal implementation of the abstract Router methods that's perfect for testing the base class functionality without complex dependencies.
`220-231`: **Good mock setup for tools formatting test.**

The test correctly sets up mock tools with string attributes. This approach properly tests the formatting logic without relying on complex tool objects.
`135-155`: ✅ Verification successful

**Verify the test logic for multiple initialize calls.**

The test mocks `get_server_category` after initialization to verify it's not called again. However, this approach might not accurately test the actual implementation behavior.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check if Router.initialize method has proper initialization guards
# Search for the initialize method implementation
ast-grep --pattern $'async def initialize(self): $$$ if self.initialized: $$$ $$$'
```

Length of output: 9047
🏁 Script executed:
```bash
#!/bin/bash
# Find TestRouter definition in tests
rg --with-filename --line-number "class TestRouter" -n tests
```

Length of output: 202
🏁 Script executed:
```bash
#!/bin/bash
# Display TestRouter implementation
sed -n '1,200p' tests/workflows/router/test_router_base.py
```

Length of output: 7848
**Multiple initialize calls guard is correctly tested.**

The `initialize` method in `router_base.py` has an early return when `self.initialized` is `True`, so replacing `get_server_category` with a mock after the first call and asserting its `call_count` remains zero after the second invocation accurately verifies that no re-initialization occurs. No changes required.
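For reference, the guard being verified is just the standard early-return pattern (a sketch, not the exact `router_base.py` body):

```python
class Router:
    def __init__(self) -> None:
        self.initialized = False

    async def initialize(self) -> None:
        if self.initialized:
            return  # second and later calls are no-ops
        # ...expensive server/category setup would happen here...
        self.initialized = True
```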
tests/executor/test_workflow.py (3)

`14-23`: **Excellent error recording test with proper exception handling.**

The test correctly verifies that error information is captured with proper type, message, and timestamp fields. This ensures the error recording functionality works as expected.
`35-37`: **Simple but effective MockWorkflow implementation.**

The MockWorkflow provides a clean, minimal implementation of the abstract run method that's perfect for testing the base Workflow functionality.
`89-94`: **Good test for executor property validation.**

The test correctly verifies that accessing the executor property raises a ValueError when the executor is None. This ensures proper error handling for missing dependencies.
tests/utils/test_pydantic_type_serializer.py (2)
`64-65`: **Correct update to Pydantic v2 field validator syntax.**

The field validators are now properly declared as class methods with the `@classmethod` decorator, which is required in Pydantic v2.

Also applies to: 91-92
`179-181`: **Good update to schema comparison method.**

The change from `schema()` to `model_json_schema()` is correct for Pydantic v2 compatibility.
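Both idioms together, for quick reference (a standalone Pydantic v2 example, not code from this repo):

```python
from pydantic import BaseModel, field_validator


class User(BaseModel):
    name: str

    @field_validator("name")
    @classmethod  # required in Pydantic v2
    def name_not_empty(cls, v: str) -> str:
        if not v:
            raise ValueError("name must not be empty")
        return v


# model_json_schema() replaces the deprecated schema()
schema = User.model_json_schema()
assert schema["title"] == "User"
```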
214-238
:generate
/generate_str
mocks may need to beAsyncMock
s.The LLM/Agent stubs here are patched with:
mock_llm_with_name.generate.return_value = expected_result mock_llm_with_name.generate_str.return_value = expected_resultThese attributes will only behave as awaitable coroutines if
mock_llm_with_name.generate
was already anAsyncMock
in the upstream fixture. If the underlying fixture supplies a plainMagicMock
,await llm.generate(...)
inside the realFanOut
will raiseTypeError: object MagicMock can't be used in 'await' expression
.Consider enforcing the async nature explicitly for safety:
mock_llm_with_name.generate = AsyncMock(return_value=expected_result) mock_llm_with_name.generate_str = AsyncMock(return_value=expected_result)(or guarantee the upstream fixture uses
AsyncMock
).tests/test_app.py (1)
`400-417`: **Potential logger double-patch confusion.**

The test first patches `get_logger`, initializes the context (which internally may call `get_logger`), then resets `basic_app._logger` and patches `get_logger` again. If the internal initialization logic caches the original logger elsewhere (e.g., in the `Context`), subsequent calls might unexpectedly use the first mock.

Consider making the intention explicit by asserting no hidden second reference exists:

```python
assert basic_app._context.logger is None  # or whatever the contract is
```

or reuse the same patched `get_logger` for both phases.
tests/workflows/router/test_router_embedding.py (1)

`1-498`: **Excellent comprehensive test coverage for EmbeddingRouter!**

This test file demonstrates excellent testing practices with thorough coverage of the `EmbeddingRouter` class functionality:

- Complete lifecycle testing: Covers initialization, factory methods, embedding computation, and all routing variants
- Proper async testing: Correctly uses `pytest.mark.asyncio` and proper mocking for async methods
- Edge case coverage: Tests empty categories, missing embeddings, and similarity scoring
- Good isolation: Uses mocks effectively to avoid external dependencies
- Clear documentation: Each test has descriptive docstrings explaining its purpose
The test structure follows best practices with logical grouping and comprehensive assertions that verify both functionality and ordering of results.
tests/workflows/intent_classifier/test_intent_classifier_embedding_cohere.py (1)
`1-285`: **Well-structured test suite with comprehensive coverage.**

This test file provides excellent coverage for the `CohereEmbeddingIntentClassifier` with:

- Mock implementation: Clean `MockCohereEmbeddingModel` that provides deterministic embeddings for testing
- Complete functionality testing: Covers initialization, factory methods, classification, and edge cases
- Proper async patterns: Correctly structured async tests with appropriate mocking
- Error handling: Tests empty intents scenario with proper exception handling
The test structure is consistent with other classifier tests in the codebase, promoting maintainability.
🧰 Tools
🪛 Ruff (0.11.9)
243-243: Loop control variable `intent_name` not used within loop body

Rename unused `intent_name` to `_intent_name`

(B007)
tests/workflows/intent_classifier/test_intent_classifier_embedding_openai.py (1)
`1-285`: **Excellent test coverage for OpenAI embedding-based intent classification.**

This test file provides comprehensive testing for the `OpenAIEmbeddingIntentClassifier` with:

- Consistent structure: Follows the same testing patterns as other classifier tests, promoting codebase consistency
- Mock implementation: `MockOpenAIEmbeddingModel` provides deterministic embeddings suitable for testing
- Full feature coverage: Tests initialization, factory methods, classification functionality, and error scenarios
- Proper async handling: Well-structured async tests with appropriate mocking to avoid external API calls
The parallel structure with the Cohere tests makes the test suite easy to understand and maintain.
🧰 Tools
🪛 Ruff (0.11.9)
243-243: Loop control variable `intent_name` not used within loop body

Rename unused `intent_name` to `_intent_name`

(B007)
tests/executor/temporal/test_signal_handler.py (1)
`1-99`: **Comprehensive signal handling test coverage.**

This test file provides excellent coverage for the Temporal signal handling functionality:
- Clean fixture design: Well-organized pytest fixtures that promote test reusability
- Complete scenario coverage: Tests both in-workflow and out-of-workflow signal handling paths
- Proper mocking: Appropriately mocks Temporal workflow internals without testing external dependencies
- Edge case handling: Includes tests for nonexistent signals and idempotent operations
- Async testing: Correctly structured async tests for temporal operations
The tests effectively verify the signal mailbox functionality, workflow attachment behavior, and both internal and external signaling mechanisms.
tests/workflows/router/test_router_llm_anthropic.py (1)
`155-176`: **Ensure test does not indirectly depend on the real `anthropic` package.**

The test patches `AnthropicAugmentedLLM`, but a hard import of `mcp_agent.workflows.router.router_llm_anthropic` (line 8) may still execute a top-level `import anthropic` inside that module. If `anthropic` is not listed in the project's test dependencies, CI will raise `ModuleNotFoundError`.

Options:

- Declare `anthropic` (or a lightweight stub such as `anthropic==0.0.0`) in the test extras / `pyproject.toml`.
- Lazily import the router inside the patched context, e.g.:

```python
with patch(
    "mcp_agent.workflows.router.router_llm_anthropic.AnthropicAugmentedLLM",
    MockAnthropicAugmentedLLM,
):
    from mcp_agent.workflows.router.router_llm_anthropic import AnthropicLLMRouter
```

- Or monkey-patch `sys.modules["anthropic"] = ModuleType("anthropic")` before the first import.

Pick whichever suits the repo's dependency policy.
tests/workflows/intent_classifier/test_intent_classifier_llm_openai.py (1)
`448-470`: **`initialize` spy counts two calls even when the second is a no-op.**

Wrapping the real `initialize` method with `AsyncMock(wraps=real_initialize)` increments the call count before the method's internal early exit hits, so the assertion verifies the spy, not the optimisation.

If you want to assert the internal short-circuit, introspect a flag instead:

```python
await classifier.initialize()
await classifier.initialize()
assert classifier._already_initialized  # or similar internal flag
```

Otherwise this test may pass even if the second call performs heavy work again.
tests/workflows/llm/test_augmented_llm_azure.py (1)
`293-296`: **Assertion validates response content inside the request payload – very likely wrong.**

`req.payload["messages"]` represents the prompt sent to Azure, yet the test asserts it already contains the assistant's reply:

```python
assert req.payload["messages"][1].content == "Response without history"
```

At request time the assistant message does not exist; only the user (and possibly system/history) messages should be present. The check should compare against the user query, or simply verify that only one user message is present when `use_history=False`.
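A sketch of a safer assertion, assuming the `req` capture used in that test (the query literal here is hypothetical):

```python
# The outgoing payload should contain the prompt, not the model's reply
user_messages = [
    m for m in req.payload["messages"] if getattr(m, "role", None) == "user"
]
assert len(user_messages) == 1  # exactly one user message when use_history=False
assert user_messages[0].content == "Test query"  # the query, not the reply
```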
src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)

`944-950`: **`model=""` may introduce sentinel-value ambiguity.**

`to_mcp_message_result` now hard-codes `model=""` instead of `None`. Downstream code that previously relied on `None` to signal "model unknown/unset" will now receive an empty string, which slips past identity checks such as `result.model is not None` even though no model was recorded.

Unless you have audited all call sites, prefer keeping `None`, or make the change explicit via typed docs & unit tests.
or make the change explicit via typed docs & unit tests.src/mcp_agent/workflows/llm/augmented_llm_azure.py (1)
225-227
: Token-usage accessor may not match SDK object – please double-check.
response.usage
is typically an object exposing attributes such asprompt_tokens
/completion_tokens
(at least in the GA Azure SDK). Switching toresponse.usage["prompt_tokens"]
and["completion_tokens"]
will raiseTypeError: 'Usage' object is not subscriptable
if the SDK version still uses attributes.If you intentionally upgraded to a newer SDK that returns a mapping-style
usage
, please ignore; otherwise revert to attribute access or add a defensive fallback:-total_input_tokens += response.usage["prompt_tokens"] -total_output_tokens += response.usage["completion_tokens"] +total_input_tokens += getattr(response.usage, "prompt_tokens", response.usage.get("prompt_tokens")) +total_output_tokens += getattr(response.usage, "completion_tokens", response.usage.get("completion_tokens"))tests/workflows/evaluator_optimizer/test_evaluator_optimizer.py (1)
`120-122`: **Enum rendering mismatch – expected numeric vs. name.**

The test asserts the prompt contains the literal string `"Quality Rating: 1"`. `_build_refinement_prompt` interpolates `feedback.rating`, whose string representation may be `"FAIR"` or `"QualityRating.FAIR"` depending on the enum definition, not `"1"`. The test could become flaky if the enum `__str__` implementation changes.

Please verify the intended format and adjust either the production code or the assertion:

```diff
- assert "Quality Rating: 1" in prompt
+ assert "Quality Rating: FAIR" in prompt  # or the agreed representation
```

tests/workflows/llm/test_augmented_llm_anthropic.py (1)
`318-334`: **`create_tool_use_side_effect` inspects `kwargs["messages"]` but the executor passes messages positionally.**

Inside `side_effect`, the check

```python
messages = kwargs.get("messages", [])
```

will always return `[]` because in production the request object is passed as a positional arg (e.g., `await executor.execute(self, request)`), not as a keyword. Consequently the "final iteration" condition may never trigger, making the test logic depend only on `call_count`.

Consider extracting the `request` object from `args[1]`, as done elsewhere in the suite, e.g.:

```python
request_obj = args[1]
messages = request_obj.payload["messages"]
```

src/mcp_agent/utils/pydantic_type_serializer.py (4)
`303-314`: **Good migration to Pydantic v2 validator API.**

The switch to `validator.info.fields` is correct and the cast to `str` keeps JSON payloads clean. No further action required.
`420-425`: 👍 **Prefer `type_` for private attrs.**

Using `type_` where available brings v2 parity while still falling back to `Any`. Looks good.
`445-455`: **Efficient handling of `ConfigDict`.**

Treating `ConfigDict` as a mapping and early-returning prevents an unnecessary reflection pass; nice improvement.
`538-541`: **Literal deserialization looks correct.**

Using `Literal.__getitem__` with the raw tuple reconstructs the type accurately. Good job mirroring the v2 `typing` internals.
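That round-trip in a few lines (standalone example):

```python
from typing import Literal, get_args

args = ("a", "b", 3)             # raw tuple recovered from the serialized form
lit = Literal.__getitem__(args)  # equivalent to Literal["a", "b", 3]
assert get_args(lit) == args
```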
```python
# Setup mock executor to raise an exception
mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))

# Call LLM
responses = await mock_llm.generate("Test query with API error")

# Assertions
```
**Same "exception should be raised" issue as in the OpenAI test.**

The mock returns an `Exception` object instead of raising it, so the code under test is never put in the error path.

```diff
-# Setup mock executor to raise an exception
-mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+# Setup mock executor to raise an exception
+mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Setup mock executor to raise an exception
mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))

# Call LLM
responses = await mock_llm.generate("Test query with API error")

# Assertions
```
🤖 Prompt for AI Agents
In tests/workflows/llm/test_augmented_llm_anthropic.py around lines 403 to 409,
the mock executor is set to return an Exception object instead of raising it, so
the error path is not triggered. Change the AsyncMock setup to raise the
Exception by using side_effect=Exception("API Error") instead of return_value,
ensuring the code under test properly encounters the raised exception.
```python
# Patch __init__ to ensure private attributes are initialized on instance
private_attrs = getattr(reconstructed_model, "__private_attributes__", {})
if private_attrs:
    orig_init = reconstructed_model.__init__

    def _init_with_private_attrs(self, *args, **kwargs):
        orig_init(self, *args, **kwargs)
        for attr_name, private_attr in private_attrs.items():
            # Only set if not already set
            if not hasattr(self, attr_name):
                default = private_attr.default
                # If default is ... (Ellipsis), treat as None
                if default is ...:
                    default = None
                setattr(self, attr_name, default)

    reconstructed_model.__init__ = _init_with_private_attrs
```
🛠️ Refactor suggestion
**Monkey-patching `__init__` may break dataclass-like signatures.**

Overwriting `__init__`:

- Removes the rich signature Pydantic generates (hurts tooling & validation).
- Breaks multiple inheritance if models are later subclassed.
- Runs for every reconstructed model, even when Pydantic would have already initialised private attrs (v1 & v2).

Prefer letting Pydantic handle private attributes:

```diff
-    private_attrs = getattr(reconstructed_model, "__private_attributes__", {})
-    if private_attrs:
-        orig_init = reconstructed_model.__init__
-        def _init_with_private_attrs(self, *args, **kwargs):
-            orig_init(self, *args, **kwargs)
-            ...
-        reconstructed_model.__init__ = _init_with_private_attrs
+    # Pydantic takes care of initialising PrivateAttr defaults; no patch needed.
```
If a specific bug required this hook, add a comment with a minimal
reproduction and scope the patch to that scenario.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/mcp_agent/utils/pydantic_type_serializer.py around lines 719 to 735, the
current approach monkey-patches the __init__ method of reconstructed_model to
initialize private attributes, which removes Pydantic's rich __init__ signature,
breaks multiple inheritance, and runs unnecessarily for all models. To fix this,
remove the __init__ monkey-patching entirely and rely on Pydantic's built-in
handling of private attributes. If a specific bug requires this patch, add a
detailed comment explaining the minimal reproduction and limit the patch only to
that case.
```diff
             field_definitions[field_name] = (
                 field_type,
                 Field(
-                    default=default if default_factory is None else ...,
+                    default=... if default_factory is not None else default,
                     default_factory=default_factory,
                     **constraints,
                 ),
             )
         else:
-            field_definitions[field_name] = (field_type, default)
+            if is_required:
+                field_definitions[field_name] = (field_type, Field(default=...))
+            else:
+                field_definitions[field_name] = (
+                    field_type,
+                    Field(
+                        default=default,
+                    ),
+                )
```
🛠️ Refactor suggestion
**Avoid passing `default_factory=None` & reconcile `required` logic.**

`Field(..., default_factory=None)` is semantically meaningless and may raise in future Pydantic releases. Also, when `is_required=False` but `default` is still `...`, the field remains required.

```diff
-                Field(
-                    default=... if default_factory is not None else default,
-                    default_factory=default_factory,
+                kwargs = {**constraints}
+                if default_factory is not None:
+                    kwargs["default_factory"] = default_factory
+                    kwargs["default"] = ...
+                elif default is not ...:
+                    kwargs["default"] = default
+                else:
+                    kwargs["default"] = ...
+                Field(**kwargs),
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/mcp_agent/utils/pydantic_type_serializer.py around lines 683 to 701,
avoid passing default_factory=None to Field as it is meaningless and may cause
errors in future Pydantic versions. Also, ensure that when is_required is False,
the default is not set to ... which makes the field required. Fix this by only
passing default_factory if it is not None and by setting default to a proper
non-required value when is_required is False.
…mAgent to avoid mutable default arguments
…rtion for result validation
…and improve context handling
…lizer to avoid runtime TypeError
Actionable comments posted: 1
♻️ Duplicate comments (6)
tests/workflows/llm/test_augmented_llm_azure.py (1)
`402-403`: **Fix pending: API error test never raises – use `side_effect` instead of `return_value`.**

The mock executor is still returning an Exception object instead of raising it, which means the error handling code path won't be properly tested.

```diff
-    mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+    mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
```
src/mcp_agent/utils/pydantic_type_serializer.py (3)

`393-393`: **Consider Pydantic v1 compatibility for broader adoption.**

While the current approach uses Pydantic v2's `is_required()` method, maintaining compatibility with v1 would increase the utility's adoption:

```diff
-    "required": field_info.is_required(),
+    "required": getattr(field_info, "is_required", lambda: getattr(field_info, "required", True))(),
```
`683-701`: **Avoid passing `default_factory=None` to prevent future issues.**

The current code may pass `default_factory=None` to `Field()`, which is semantically meaningless and could cause errors in future Pydantic versions:

```diff
-                Field(
-                    default=... if default_factory is not None else default,
-                    default_factory=default_factory,
-                    **constraints,
-                ),
+                kwargs = {**constraints}
+                if default_factory is not None:
+                    kwargs["default_factory"] = default_factory
+                    kwargs["default"] = ...
+                elif default is not ...:
+                    kwargs["default"] = default
+                else:
+                    kwargs["default"] = ...
+                Field(**kwargs),
```
`719-735`: **Consider removing `__init__` monkey-patching for better maintainability.**

The current approach of monkey-patching `__init__` can break Pydantic's rich signatures and inheritance patterns. Consider relying on Pydantic's built-in private attribute handling instead, unless this patch addresses a specific documented issue.
tests/workflows/llm/test_augmented_llm_openai.py (1)

`382-383`: **Fix pending: API error test never raises – use `side_effect` instead of `return_value`.**

The mock executor is returning an Exception object instead of raising it, preventing proper testing of error handling logic:

```diff
-    # Setup mock executor to raise an exception
-    mock_llm.executor.execute = AsyncMock(return_value=Exception("API Error"))
+    # Setup mock executor to raise an exception
+    mock_llm.executor.execute = AsyncMock(side_effect=Exception("API Error"))
```
tests/workflows/swarm/test_swarm.py (1)

`201-220`: ⚠️ Potential issue

**Fix mocking anti-pattern - the test defeats its own purpose.**

This test mocks the `call_tool` method on `mock_swarm_agent` and then calls the same mocked method, which doesn't actually test the real `SwarmAgent.call_tool` logic when a tool is not in `_function_tool_map`.

The test should verify that `SwarmAgent.call_tool` falls back to `super().call_tool()` when the tool name is not found in `_function_tool_map`. Replace this with a test that uses a real `SwarmAgent` instance:

```diff
-    @pytest.mark.asyncio
-    async def test_call_tool_with_non_function_tool(
-        self, mock_swarm_agent, mock_tool_response
-    ):
-        """Test call_tool with a non-function tool."""
-        # Set up mocks
-        mock_swarm_agent._function_tool_map = {}
-        mock_swarm_agent.initialized = True
-        mock_swarm_agent.call_tool = AsyncMock(return_value=mock_tool_response)
-
-        # Call the method directly without using Agent.call_tool
-        # We're testing that the SwarmAgent's call_tool method works when the tool
-        # is not in the function tool map
-        result = await mock_swarm_agent.call_tool("non_function_tool", {"arg": "value"})
-
-        # Assert the call was made and the result was returned
-        mock_swarm_agent.call_tool.assert_called_once_with(
-            "non_function_tool", {"arg": "value"}
-        )
-        assert result == mock_tool_response
+    @pytest.mark.asyncio
+    async def test_call_tool_with_non_function_tool(self, mock_tool_response):
+        """Test call_tool with a non-function tool falls back to parent class."""
+        # Create a real SwarmAgent instance
+        agent = SwarmAgent(
+            name="test_agent",
+            instruction="Test instruction",
+            server_names=[],
+            functions=[],
+            context=Context(),
+        )
+        agent._function_tool_map = {}  # Ensure tool is not in function map
+        agent.initialized = True
+
+        # Mock the parent class call_tool method
+        with patch.object(
+            agent.__class__.__bases__[0], "call_tool", new_callable=AsyncMock
+        ) as mock_parent_call_tool:
+            mock_parent_call_tool.return_value = mock_tool_response
+
+            # Call the method
+            result = await agent.call_tool("non_function_tool", {"arg": "value"})
+
+            # Assert parent method was called and result returned
+            mock_parent_call_tool.assert_called_once_with(
+                "non_function_tool", {"arg": "value"}
+            )
+            assert result == mock_tool_response
```

You'll also need to add the patch import:

```diff
-from unittest.mock import AsyncMock, MagicMock
+from unittest.mock import AsyncMock, MagicMock, patch
```
tests/workflows/llm/test_augmented_llm_ollama.py (1)
186-208
: Simplify complex mocking pattern.The
execute_side_effect
function and nested mocking are quite complex and brittle. Consider a simpler approach:- # Patch executor.execute to forward to the patched structured task - async def execute_side_effect(task, request): - if ( - task is mock_structured_task._mock_wraps - or task is mock_structured_task - ): - return await mock_structured_task(request) - return MagicMock() - - llm.executor.execute = AsyncMock(side_effect=execute_side_effect) + # Mock executor.execute to return a simple result + llm.executor.execute = AsyncMock(return_value=TestResponseModel(name="test"))This approach:
- Reduces complexity and potential brittleness
- Makes the test easier to understand
- Focuses on the essential behavior being tested
tests/workflows/swarm/test_swarm.py (1)
36-36
: Consider using the mock_context fixture consistently.Multiple tests create
Context()
instances directly instead of using the providedmock_context
fixture. While this works, using the fixture would be more consistent and allow for better test isolation.Consider updating these tests to use the
mock_context
fixture:- context=Context(), + context=mock_context,This ensures all tests use the same mocked context setup and makes the tests more maintainable.
Also applies to: 58-58, 84-84, 93-93, 121-121, 149-149, 184-184
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- pyproject.toml (1 hunks)
- src/mcp_agent/utils/pydantic_type_serializer.py (8 hunks)
- src/mcp_agent/workflows/llm/augmented_llm_openai.py (4 hunks)
- src/mcp_agent/workflows/swarm/swarm.py (2 hunks)
- tests/executor/temporal/test_workflow_registry.py (1 hunks)
- tests/executor/test_workflow.py (1 hunks)
- tests/mcp/test_mcp_connection_manager.py (1 hunks)
- tests/workflows/intent_classifier/conftest.py (1 hunks)
- tests/workflows/llm/conftest.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_azure.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_bedrock.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_google.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_ollama.py (1 hunks)
- tests/workflows/llm/test_augmented_llm_openai.py (3 hunks)
- tests/workflows/orchestrator/conftest.py (1 hunks)
- tests/workflows/parallel/test_parallel_llm.py (1 hunks)
- tests/workflows/router/conftest.py (1 hunks)
- tests/workflows/router/test_router_embedding_openai.py (1 hunks)
- tests/workflows/swarm/test_swarm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (13)
- tests/workflows/llm/conftest.py
- tests/workflows/intent_classifier/conftest.py
- tests/executor/test_workflow.py
- src/mcp_agent/workflows/swarm/swarm.py
- tests/mcp/test_mcp_connection_manager.py
- tests/workflows/parallel/test_parallel_llm.py
- tests/executor/temporal/test_workflow_registry.py
- tests/workflows/router/test_router_embedding_openai.py
- tests/workflows/llm/test_augmented_llm_google.py
- tests/workflows/router/conftest.py
- tests/workflows/orchestrator/conftest.py
- tests/workflows/llm/test_augmented_llm_bedrock.py
- src/mcp_agent/workflows/llm/augmented_llm_openai.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/workflows/swarm/test_swarm.py (5)
src/mcp_agent/agents/agent.py (1)

- CallToolRequest (789-798)

src/mcp_agent/workflows/swarm/swarm.py (6)

- AgentFunctionResult (147-161)
- SwarmAgent (74-144)
- DoneAgent (314-327)
- create_agent_resource (50-58)
- create_agent_function_result_resource (61-71)
- should_continue (304-311)

src/mcp_agent/workflows/swarm/swarm_openai.py (1)

- OpenAISwarm (9-41)

src/mcp_agent/core/context.py (1)

- Context (50-79)

tests/workflows/swarm/conftest.py (7)

- mock_context (11-17)
- test_function_result (56-58)
- test_function_agent_function_result (68-70)
- mock_swarm_agent (34-46)
- mock_tool_response (80-82)
- mock_agent (21-30)
- done_agent (50-52)
🔇 Additional comments (13)
src/mcp_agent/utils/pydantic_type_serializer.py (4)
`184-191`: **LGTM! Literal values are now properly serialized.**

The code now correctly wraps each literal argument with `make_serializable()` to ensure JSON compatibility, addressing the previous concern about non-JSON-serializable types like bytes.
`303-303`: **LGTM! Pydantic v2 field validator compatibility.**

The change from `validator.fields` to `validator.info.fields` correctly aligns with the Pydantic v2 API structure.
`420-424`: **LGTM! Enhanced private attribute handling for Pydantic v2.**

The fallback to `Any` when `type_` is not available ensures compatibility while leveraging v2's improved type information when present.
`445-455`: **LGTM! Improved config serialization with proper JSON validation.**

The new logic correctly handles both dict-based configs (Pydantic v2 `ConfigDict`) and class-based configs, with appropriate JSON serializability checks.

tests/workflows/llm/test_augmented_llm_openai.py (2)
`70-91`: **LGTM! Clean test response factory method.**

The static method provides a clean way to create mock OpenAI chat completion responses with proper typing and structure.
`94-126`: **LGTM! Comprehensive tool use response creation.**

The method correctly constructs tool call responses with proper OpenAI structure and handles the nested function call format appropriately.

tests/workflows/llm/test_augmented_llm_ollama.py (5)
tests/workflows/llm/test_augmented_llm_ollama.py (5)
1-17
: LGTM: Clean test setup and clear documentation.The imports are appropriate and the class documentation clearly explains the testing scope, focusing on Ollama-specific functionality while acknowledging inheritance from OpenAIAugmentedLLM.
19-50
: Well-designed fixtures with good separation of concerns.The fixture design is excellent:
mock_llm
provides a pre-configured instance for common use casesmock_context_factory
allows creating clean contexts for specific test scenarios- Good use of realistic test data in the OpenAISettings
`129-163`: **LGTM: Well-structured async test for JSON mode.**

The test properly:
- Uses appropriate async/await patterns
- Mocks dependencies correctly
- Tests the expected JSON mode behavior
- Has clear assertions
`214-218`: **LGTM: Clear verification of client initialization.**

The assertions properly verify that the correct API key and base URL are passed to the structured completion task, which is the core behavior being tested.
`93-93`: ✅ Verification successful

**Verify if commented line is needed.**

There's a commented line for `reasoning_effort`. Ensure this isn't required for proper test functionality.
🏁 Script executed:
```bash
#!/bin/bash
# Check if reasoning_effort is accessed during OpenAIAugmentedLLM initialization
ast-grep --pattern 'reasoning_effort'
```

Length of output: 907
🏁 Script executed:
```bash
#!/bin/bash
# Display lines 1-120 around the commented reasoning_effort in the test file
sed -n '1,120p' tests/workflows/llm/test_augmented_llm_ollama.py
```

Length of output: 5013
🏁 Script executed:
```bash
#!/bin/bash
# Extract the OllamaAugmentedLLM class definition and its init method
ast-grep --pattern $'class OllamaAugmentedLLM { $$$ init($_) { $$$ } $$$ }' --language javascript
```

Length of output: 287
**Drop the commented-out `reasoning_effort` assignment.**

The `MagicMock` in this test is already created with a spec that includes `reasoning_effort`, so it will automatically return a `MagicMock` value when accessed, and this attribute isn't asserted anywhere in `test_initialization`. You can safely remove the commented line for clarity.

• tests/workflows/llm/test_augmented_llm_ollama.py (line ~93)
tests/workflows/swarm/test_swarm.py (2)
`23-512`: **Comprehensive test coverage with good structure.**

The test suite provides excellent coverage of the swarm workflow components with well-structured test cases that cover various scenarios including normal operations, edge cases, and error conditions. The tests properly exercise the real methods rather than mocking them (except for the one identified issue), and the assertions validate the expected behavior correctly.
`206-206`: ❌ Incorrect review comment

**Potential AttributeError with spec-based mock.**

Based on the fixture definition in `conftest.py`, `mock_swarm_agent` is created with `MagicMock(spec=SwarmAgent)`. Direct assignment to `_function_tool_map` may cause an `AttributeError` since this is an instance attribute created in `__init__`, not a class attribute.

Let me verify if the fixture definition still has this issue:
🏁 Script executed:
```bash
#!/bin/bash
# Check the mock_swarm_agent fixture definition to see if it uses spec=SwarmAgent
cat tests/workflows/swarm/conftest.py | grep -A 15 "def mock_swarm_agent"
```

Length of output: 537
🏁 Script executed:
```bash
#!/bin/bash
# Search for the SwarmAgent class definition and its __init__ to locate _function_tool_map
rg -n "class SwarmAgent" -C5
rg -n "def __init__" -A10 $(rg -l "class SwarmAgent" -g '*.py')
```

Length of output: 2345
🏁 Script executed:
```bash
#!/bin/bash
# Check if _function_tool_map is defined in SwarmAgent class
rg -n "_function_tool_map" src/mcp_agent/workflows/swarm/swarm.py
```

Length of output: 166
🏁 Script executed:
```bash
#!/bin/bash
python3 - <<'EOF'
from unittest.mock import MagicMock

class Foo:
    pass

# Test spec-based mock assignment
m_spec = MagicMock(spec=Foo)
m_spec.new_attr = 5
print("spec assignment:", m_spec.new_attr)

# Test spec_set-based mock assignment
from unittest.mock import MagicMock as MM2
m_spec_set = MM2(spec_set=Foo)
try:
    m_spec_set.other = 6
    print("spec_set assignment succeeded:", m_spec_set.other)
except Exception as e:
    print("spec_set error:", type(e).__name__, e)
EOF
```

Length of output: 98
**No change needed: `MagicMock(spec=…)` allows new attributes.**

Assigning `mock_swarm_agent._function_tool_map = {}` will not raise an `AttributeError`; only `spec_set` mocks disallow new attributes. The existing fixture is safe as written.

Likely an incorrect or invalid review comment.
… field information in PydanticTypeSerializer
…clarity and structure
… handling in tests
Summary by CodeRabbit
Bug Fixes
New Features
Chores
Style
Tests