test: end-to-end testing for chat features #199
Conversation
@claude can you please do a deep review of this pr
Claude finished @kantord's task: Deep Review of PR #199

Summary
This PR adds E2E tests for the AI assistant chat feature using Ollama with a tiny local model (qwen2.5:1.5b), proving that real integration testing of LLM-powered features is feasible in CI without API keys or mocking.
This is an experiment to validate that we can test the chat feature end-to-end with a real (but small) language model, ensuring the full streaming pipeline works correctly.
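For illustration, here is a minimal sketch of how the chat client can be pointed at a local Ollama instance through its OpenAI-compatible endpoint. The environment variable name and client wiring are assumptions for the example, not necessarily how this PR wires things up.

```typescript
// Sketch only: OLLAMA_BASE_URL and the client setup are assumed names,
// not the PR's actual implementation.
import OpenAI from "openai";

// Ollama exposes an OpenAI-compatible API on localhost:11434, so the
// existing chat client can talk to a real model without mocking.
const client = new OpenAI({
  baseURL: process.env.OLLAMA_BASE_URL ?? "http://localhost:11434/v1",
  apiKey: "ollama", // any non-empty string; Ollama ignores the key
});

const stream = await client.chat.completions.create({
  model: "qwen2.5:1.5b",
  messages: [{ role: "user", content: "Say hello" }],
  stream: true, // exercises the same streaming path as production
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```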
Changes
Won't this make tests super slow?
No. With proper caching, Ollama adds minimal overhead:
Total added time: ~10-20s per CI run after the caches are warm. Parallelization is not done in this PR, but these tests should be 100% parallelizable with other test cases, so in practice they should have no harmful effect on CI performance, especially when used sparingly for a few important test cases.
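As one possible way to keep the added time that low, a test global-setup step (hypothetical here, written as if for Playwright) could pull the model only when it is missing from the local Ollama cache:

```typescript
// Assumed setup sketch, not code from this PR: pull qwen2.5:1.5b only when
// it is not already in Ollama's local cache, so warm CI runs skip the
// ~1 GB download entirely.
const OLLAMA = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";

export default async function globalSetup(): Promise<void> {
  const tags = await fetch(`${OLLAMA}/api/tags`).then((r) => r.json());
  const cached = tags.models?.some((m: { name: string }) =>
    m.name.startsWith("qwen2.5:1.5b"),
  );

  if (!cached) {
    // Blocking pull; on a warm cache this branch never runs.
    await fetch(`${OLLAMA}/api/pull`, {
      method: "POST",
      body: JSON.stringify({ model: "qwen2.5:1.5b", stream: false }),
    });
  }
}
```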
Won't this make tests super flaky?
The tests are designed to minimize flakiness:
The model is small but capable enough for simple instruction-following, and the tests don't rely on specific wording: they only check that the model echoes back identifiers we provide.
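A hedged sketch of that echo-identifier pattern, with assumed selectors and UI structure rather than the PR's actual test code:

```typescript
import { test, expect } from "@playwright/test";
import { randomUUID } from "node:crypto";

test("assistant streams back an identifier we provide", async ({ page }) => {
  // A random marker means the assertion never depends on the model's exact
  // wording, only on its ability to repeat a token we control.
  const marker = randomUUID().slice(0, 8);

  await page.goto("/");
  await page
    .getByRole("textbox", { name: "Message" })
    .fill(`Reply with the exact code ${marker} and nothing else.`);
  await page.getByRole("button", { name: "Send" }).click();

  // Generous timeout: the 1.5B model is slow but reliable enough for this
  // kind of simple instruction-following.
  await expect(page.getByTestId("assistant-message")).toContainText(marker, {
    timeout: 60_000,
  });
});
```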
What about testing actual tool calls?
This PR intentionally skips MCP tool fetching in Ollama mode to focus on proving the concept first. However, testing real tool calls is absolutely possible: it should be easy to craft prompts that reliably trigger a tool call without introducing flakiness, while still proving the health of the tool-call workflow.
Once this approach is approved, extending it to test the full MCP integration would be straightforward.
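For concreteness, here is a speculative sketch of what such a tool-call test could look like; the tool name, prompt, and UI hooks below are hypothetical and not part of this PR:

```typescript
import { test, expect } from "@playwright/test";

test("assistant invokes an MCP tool when explicitly asked", async ({ page }) => {
  await page.goto("/");
  await page
    .getByRole("textbox", { name: "Message" })
    // An explicit, unambiguous instruction keeps a small model on the happy path.
    .fill("Use the list_documents tool now and show me its raw output.");
  await page.getByRole("button", { name: "Send" }).click();

  // Assert that a tool call happened at all, not what the model says about it.
  await expect(page.getByTestId("tool-call-indicator")).toBeVisible({
    timeout: 60_000,
  });
});
```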