-
Notifications
You must be signed in to change notification settings - Fork 13
[WIP] fix: Windows compatibility fixes. Fix #57 #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis PR introduces cross-platform compatibility improvements and refactors stdin handling. Key changes include replacing StreamReader-based stdin reading with a thread pool-based async generator, adding Windows-compatible signal handling, fixing path operations for cross-drive scenarios on Windows, and relaxing timing assertions in tests. Various test infrastructure improvements address platform-specific file operations and mocking patterns. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30–45 minutes
Possibly related issues
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (1)
374-401: Pipeline failure: File exceeds 400-line size limit.The multi-line reformatting at lines 374-376 and 391-401 has pushed the file to 401 lines, exceeding the configured limit of 400. While the formatting improves readability, you'll need to reduce the line count.
Options to reduce line count:
- Revert to single-line formatting for the list comprehension:
- cls._installed_handlers = [ - h for h in cls._installed_handlers if h is not cls._file_handler - ] + cls._installed_handlers = [h for h in cls._installed_handlers if h is not cls._file_handler]
- Or combine the tuple unpacking/packing:
- level, fmt, _, clean_output, suppress_qdrant_warnings, disable_console = ( - cls._current_config - ) - cls._current_config = ( - level, - fmt, - file, - clean_output, - suppress_qdrant_warnings, - disable_console, - ) + level, fmt, _, clean_output, suppress_qdrant_warnings, disable_console = cls._current_config + cls._current_config = (level, fmt, file, clean_output, suppress_qdrant_warnings, disable_console)
🧹 Nitpick comments (3)
packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py (1)
63-67: Relaxing timing assertion here is reasonable and reduces flakinessAllowing
processing_time_msto be>= 0for this multi-query loop is sensible: it still verifies the field is present and numeric, while avoiding spurious failures on very fast or cached paths (especially for empty queries). Keeping stricter> 0checks in more focused tests maintains stronger guarantees where they matter.If you see similar intermittent timing-related failures in other tests, consider applying the same
>= 0pattern there as well.packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)
145-153: Windows compatibility fix looks good, minor style note.The
logging.shutdown()andshutil.rmtree()pattern correctly handles Windows file locking.Consider moving the
shutilimport to the top of the file with other imports for consistency:+import shutil import logging import os import tempfileThen simplify the finally block:
finally: logging.shutdown() - import shutil - try: shutil.rmtree(tmp_dir, ignore_errors=True) except Exception: passpackages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (1)
72-79: Useasyncio.get_running_loop()instead of deprecatedget_event_loop().
asyncio.get_event_loop()is deprecated in Python 3.10+ when called from a running coroutine. Since this is an async generator, useasyncio.get_running_loop()which is the recommended approach for obtaining the loop from within async code.async def read_stdin_lines(): """Cross-platform async generator that yields lines from stdin.""" - loop = asyncio.get_event_loop() + loop = asyncio.get_running_loop() while True: line = await loop.run_in_executor(None, sys.stdin.readline) if not line: # EOF break yield line
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py(2 hunks)packages/qdrant-loader-mcp-server/pyproject.toml(1 hunks)packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py(3 hunks)packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py(2 hunks)packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py(1 hunks)packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py(1 hunks)packages/qdrant-loader-mcp-server/tests/unit/test_cli.py(6 hunks)packages/qdrant-loader-mcp-server/tests/unit/test_logging.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.pypackages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.pypackages/qdrant-loader-mcp-server/tests/unit/test_logging.pypackages/qdrant-loader-core/src/qdrant_loader_core/logging.pypackages/qdrant-loader-mcp-server/tests/unit/test_cli.pypackages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.pypackages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py
🧠 Learnings (1)
📚 Learning: 2025-08-12T09:02:25.128Z
Learnt from: martin-papy
Repo: martin-papy/qdrant-loader PR: 43
File: docs/developers/testing/README.md:36-40
Timestamp: 2025-08-12T09:02:25.128Z
Learning: Both packages/qdrant-loader and packages/qdrant-loader-mcp-server directories contain pyproject.toml files at their root level, making them installable with pip install -e for editable development installations.
Applied to files:
packages/qdrant-loader-mcp-server/pyproject.tomlpackages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py
🧬 Code graph analysis (3)
packages/qdrant-loader-mcp-server/tests/unit/test_cli.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (1)
read_stdin_lines(72-79)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py (2)
packages/qdrant-loader-core/src/qdrant_loader_core/llm/factory.py (1)
create_provider(52-87)packages/qdrant-loader/src/qdrant_loader/config/__init__.py (1)
llm_settings(307-317)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (1)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (2)
LoggingConfig(190-401)get_logger(354-357)
🪛 GitHub Actions: Quality Gates
packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py
[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.
packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py
[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.
packages/qdrant-loader-mcp-server/pyproject.toml
[error] 1-1: invalid pyproject.toml config: project.optional-dependencies.classifiers[0]. configuration error: project.optional-dependencies.classifiers[0] must be pep508
packages/qdrant-loader-mcp-server/tests/unit/test_logging.py
[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.
packages/qdrant-loader-mcp-server/tests/unit/test_cli.py
[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py
[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py
[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.
🪛 GitHub Actions: Test and Coverage
packages/qdrant-loader-mcp-server/pyproject.toml
[error] 1-1: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508
packages/qdrant-loader-mcp-server/tests/unit/test_logging.py
[error] 1-1: Core modules exceeding size limits: 401 > 400 (logging.py)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py
[error] 1-1: Core modules exceeding size limits: 401 > 400 (logging.py)
🔇 Additional comments (11)
packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py (1)
466-471: Zero generation times are acceptable for this sanity checkChanging the assertion to
faceted_results.generation_time_ms >= 0keeps the validation that the timing is computed and sensible, while avoiding brittle assumptions about minimum duration that can fail on fast machines or coarse timers.packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py (2)
33-43: LGTM: Defensive debug logging for config load failure.The logging is properly guarded with try/except and uses appropriate debug level for troubleshooting. The error message provides useful context without exposing sensitive data.
123-139: LGTM: Provider creation logging with appropriate safeguards.The logging correctly uses
has_api_key=bool(...)to avoid leaking the actual API key. The provider type and name provide useful debugging information.Note:
base_urlis logged directly. This is generally acceptable since base URLs are typically public endpoints, but verify this doesn't expose internal infrastructure URLs in your deployment.packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)
96-121: LGTM: Windows-compatible file cleanup.The changes correctly address Windows file locking by:
- Extracting
tmp_file_namebefore closing the NamedTemporaryFile context- Calling
logging.shutdown()before file deletion to release handlesThis is the proper approach for cross-platform compatibility.
packages/qdrant-loader-mcp-server/tests/unit/test_cli.py (5)
162-188: LGTM: Test correctly validates async generator behavior.The test properly:
- Mocks
run_in_executorto return lines then EOF (empty string)- Consumes the async generator with
async for- Verifies both the yielded content and the call count (3 = 2 lines + 1 EOF check)
This aligns with the
read_stdin_lines()implementation in cli.py.
294-334: LGTM: JSON parse error test updated for async generator pattern.The mock async generator correctly yields invalid JSON, and the test verifies the expected error response (code -32700, "Parse error").
336-377: LGTM: Invalid request format test updated correctly.The async generator yields non-object JSON (
"not an object"), and the test correctly expects error code -32600 ("Invalid Request").
379-421: LGTM: Invalid JSON-RPC version test updated correctly.Test properly validates that JSON-RPC version "1.0" (instead of "2.0") triggers error code -32600 with appropriate error data.
423-474: LGTM: Successful request test updated correctly.The test validates the happy path with a valid JSON-RPC 2.0 request, verifying that the MCP handler is called and the response is properly written.
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (2)
289-293: LGTM!The async generator consumption pattern is clean and correctly handles line-by-line processing with appropriate whitespace trimming and empty line filtering.
533-543: LGTM! Appropriate handling for Windows compatibility.The try/except block correctly handles the
NotImplementedErrorraised by Windows ProactorEventLoop. Debug-level logging is appropriate since this is expected behavior on Windows, not an error condition.Note: Without signal handlers, graceful shutdown via SIGTERM/SIGINT won't work on Windows. Consider documenting that Windows users should use Ctrl+C for interactive shutdown, or explore alternative approaches like
signal.signal()if needed for background processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (11)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)
76-80: Consider using@pytest.mark.skipifdecorator for cleaner platform skipping.The inline skip works correctly but pytest's decorator is more idiomatic and improves test readability.
Apply this diff to use the decorator approach:
+ @pytest.mark.skipif( + sys.platform == "win32", + reason="signal.alarm not available on Windows" + ) def test_timeout_handler_context_manager(self): """Test timeout handler as context manager.""" - import sys - - # Skip on Windows - signal.alarm not available - if sys.platform == "win32": - pytest.skip("signal.alarm not available on Windows") - with patch("signal.signal") as mock_signal, patch("signal.alarm") as mock_alarm:Note:
sysis already imported at the module level (line 5 importssignal; you can addimport sysat line 5 alongside it).packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
176-182: Path normalization is correct; innerPathimport can be droppedUsing
Path(actual_path).resolve()and checking that the resolved DB path has the resolved workspace in itsparentsis a good, cross-platform way to assert the override behavior and directory relationship. SincePathis already imported at the top of the module (Line 5), the innerfrom pathlib import Pathat Line 177 is redundant and can be removed.You can simplify this block as:
- # Normalize paths for comparison (handles Windows short paths) - from pathlib import Path - assert Path(actual_path).resolve() == Path(expected_path).resolve() + # Normalize paths for comparison (handles Windows paths consistently) + assert Path(actual_path).resolve() == Path(expected_path).resolve()packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)
92-107: Consider consolidating duplicate test coverage.This test is nearly identical to
test_create_database_directory_exceptionintest_cli.py(lines 140-150). Both test files cover the same exception handling behavior for_create_database_directory.Consider consolidating the test coverage to avoid maintenance overhead.
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
143-157: Make Windows privilege detection more robust than string-matching the error messageThe new try/except correctly skips the test when symlink creation fails due to missing privileges on Windows, and otherwise re-raises the error. To avoid brittle dependence on the localized error text (
"privilege" in str(e).lower()), consider matching on the platform-specific error code instead (e.g.,winerror == 1314) or at least errno/exception type:- import sys + import sys @@ - try: - symlink_dir.symlink_to(temp_dir) - except OSError as e: - # Skip test on Windows if user doesn't have symlink privileges - if sys.platform == "win32" and "privilege" in str(e).lower(): - pytest.skip("Symlink creation requires admin privileges on Windows") - raise + try: + symlink_dir.symlink_to(temp_dir) + except OSError as e: + # Skip test on Windows if user doesn't have symlink privileges + if sys.platform == "win32" and getattr(e, "winerror", None) == 1314: + pytest.skip("Symlink creation requires admin privileges on Windows") + raise(Alternatively, you could check
isinstance(e, PermissionError)and/ore.errno in {errno.EPERM, errno.EACCES}.)Minor style nit: if you prefer consistent import style across the test suite, you could move the
import systo the module top; local import is fine functionally, though.packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
149-155: Consider using a more explicit patch target.The Windows-compatible mocking approach is excellent and solves the chmod portability issue. However, for better clarity and maintainability, patch where
os.accessis used rather than patching the built-in module globally.Apply this diff for a more explicit patch target:
- # Mock os.access to simulate unreadable file - with patch("os.access", return_value=False): + # Mock os.access to simulate unreadable file + with patch("qdrant_loader.connectors.git.file_processor.os.access", return_value=False): assert processor.should_process_file(test_file) is FalseThis makes it clearer which module's
os.accessis being mocked and prevents potential side effects if other code under test also callsos.access.packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)
378-382: Move import to module level.The
sysmodule is imported at function level. For consistency and clarity, move it to the top of the file with other imports.import os import tempfile from pathlib import Path from unittest.mock import Mock, patch +import sys import pytestThen remove the function-level import.
36-44: Consider extracting cleanup pattern to a fixture.The temporary file creation and cleanup pattern is repeated across many tests. Consider creating a pytest fixture or context manager to reduce duplication and ensure consistent cleanup behavior.
Example fixture:
@pytest.fixture def temp_file_path(request): """Create a temporary file and ensure cleanup.""" suffix = request.param if hasattr(request, 'param') else '.tmp' with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp_file: tmp_path = tmp_file.name yield tmp_path try: os.unlink(tmp_path) except (OSError, PermissionError): passUsage:
@pytest.mark.parametrize('temp_file_path', ['.pdf'], indirect=True) def test_example(self, temp_file_path): # Write to file with open(temp_file_path, 'wb') as f: f.write(b"content") # Test with temp_file_pathAlso applies to: 63-71, 88-97, 104-113, 120-135, 145-154, 333-342, 354-366, 410-420, 439-451, 467-478, 493-509
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (1)
54-75: Consider extracting cleanup pattern to a fixture.Similar to the unit tests, this file repeats the temporary file creation and cleanup pattern. Consider creating a shared pytest fixture (perhaps in a
conftest.py) to reduce duplication across both unit and integration tests.Example shared fixture in
tests/conftest.py:@pytest.fixture def windows_safe_temp_file(): """Create a temporary file with Windows-safe cleanup.""" def _create(suffix='.tmp', content=b''): with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp_file: if content: tmp_file.write(content) tmp_file.flush() tmp_path = tmp_file.name yield tmp_path try: os.unlink(tmp_path) except (OSError, PermissionError): pass return _createAlso applies to: 98-110, 176-188, 195-227, 234-251
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
442-448: Remove redundant import statement.The module-level import of
osat line 5 makes the local import at line 443 redundant.Apply this diff to remove the redundant import:
# Use os.path.join for expected paths to match OS-specific separators - import os expected = [ os.path.join("/fake/repo/path", "file1.txt"), os.path.join("/fake/repo/path", "file2.py"),packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (1)
43-58: Consider moving tempfile import to module level.The
tempfileimport at line 45 works correctly but is conventionally placed at the module level with other imports for better code organization.Apply this diff to move the import:
import os from datetime import datetime from unittest.mock import MagicMock, patch +import tempfile import pytestAnd remove from the fixture:
def mock_git_ops(self, mock_repo, mock_config): """Fixture creating mock Git operations.""" - import tempfile git_ops = MagicMock(spec=GitOperations)packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (1)
49-49: Optional: Remove unusedcaplogparameter.The
caplogfixture parameter is not used in the test body since the test manually creates and manages a custom handler.-def test_stdlib_logs_are_redacted(caplog): +def test_stdlib_logs_are_redacted():
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py(2 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py(1 hunks)packages/qdrant-loader/tests/integration/test_file_conversion_integration.py(3 hunks)packages/qdrant-loader/tests/unit/cli/test_cli.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_workspace_integration.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py(1 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py(10 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py(1 hunks)packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py(3 hunks)packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.pypackages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.pypackages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.pypackages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.pypackages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.pypackages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.pypackages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.pypackages/qdrant-loader/tests/integration/test_file_conversion_integration.pypackages/qdrant-loader/tests/unit/cli/test_cli.pypackages/qdrant-loader/tests/unit/config/test_workspace_integration.pypackages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.pypackages/qdrant-loader/tests/unit/utils/test_logging_redaction.pypackages/qdrant-loader/tests/unit/connectors/git/test_git_connector.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.pypackages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.pypackages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.pypackages/qdrant-loader-core/tests/unit/quality/test_module_sizes.pypackages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
🧬 Code graph analysis (6)
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (4)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
file_path(60-61)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(18-21)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
temp_dir(23-27)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
temp_dir(18-29)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2)
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)
should_process_file(37-227)packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)
should_process_file(36-175)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
convert_file(396-436)create_fallback_document(456-472)packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
is_supported_for_conversion(152-200)get_file_type_info(202-240)
packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
packages/qdrant-loader/src/qdrant_loader/cli/cli.py (1)
_create_database_directory(97-118)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
packages/qdrant-loader/tests/unit/config/test_workspace.py (3)
temp_workspace(22-36)temp_workspace(141-150)temp_workspace(222-231)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (3)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (2)
LoggingConfig(190-401)get_logger(354-357)packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (2)
LoggingConfig(55-162)get_logger(129-132)packages/qdrant-loader/src/qdrant_loader/utils/logging.py (2)
LoggingConfig(399-630)get_logger(618-630)
🔇 Additional comments (15)
packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py (1)
105-117: Safe current-window rate calculation looks goodThe new
current_window_elapsed/current_window_ratelogic correctly guards againstcurrent_window_startbeingNoneand zero-duration windows, without changing the returned schema. This is a straightforward robustness improvement and keeps behavior consistent with the rest of the class.packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py (1)
74-80: Derivedaverage_conversion_timeproperty is a solid improvementTreating
average_conversion_timeas a computed property overtotal_files_processedandtotal_conversion_time(with a zero-files guard) removes denormalized state and keeps summary/save logic simple and consistent with the underlying counters.packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (3)
121-126: Relaxed conversion_time assertion is appropriateAllowing
conversion_time >= 0(instead of> 0) matches the real behavior where very fast tests can yield an effectively zero duration, while still guarding against negative values.
321-321: Helpful note about computed average_conversion_timeThe inline comment correctly documents that
average_conversion_timeis now derived fromtotal_conversion_time / total_files_processed, keeping the test aligned with the implementation.
345-345: Average-conversion-time assertion matches new property semanticsAsserting
summary["average_conversion_time"] == 4.5is consistent with the property45.0 / 10computation and provides a clear, precise expectation for the derived metric.packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
145-150: LGTM! Improved test reliability.The updated test correctly mocks the internal
_create_db_dir_helperto simulate permission errors, making the test more predictable and platform-independent. This approach avoids relying on actual filesystem behavior, which is especially valuable for Windows compatibility.packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)
99-107: LGTM! Consistent exception handling test.The test correctly verifies that
OSErrorfrom the internal helper is wrapped in aClickException. The mocking approach improves test reliability and Windows compatibility.packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2)
5-5: LGTM! Import supports cross-platform testing.The
patchimport is correctly added to enable mockingos.accessfor Windows-compatible file access testing.
148-148: Clear documentation of the cross-platform approach.The comment effectively explains why mocking is used instead of filesystem permission changes.
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (1)
378-382: Windows skip logic is appropriate.The Windows skip for permission-related tests is correct, as
chmodbehavior differs significantly on Windows and these tests would be unreliable.packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (1)
54-75: LGTM! Windows-safe file handling implemented correctly.All integration tests properly implement the Windows-compatible pattern:
- Close temporary files before processing (Windows requires this)
- Store the path for later use
- Clean up with proper exception handling for
OSErrorandPermissionErrorThe added comments ("File is now closed, safe to process and delete") clearly document the Windows compatibility requirement.
Also applies to: 98-110, 176-188, 195-227, 234-251
packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (1)
63-68: LGTM! Cross-platform path handling implemented correctly.The code properly handles cross-drive scenarios on Windows by catching
ValueErrorfromos.path.relpathand falling back to absolute paths. The conditional logic at line 88 correctly extracts the directory for both relative and absolute path cases.Also applies to: 88-88
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)
68-76: <function_calls>
#!/bin/bashRead lines 60-130 showing cross-drive handling and exclude patterns
sed -n '60,130p' packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py | cat -n
</function_calls>packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2)
52-53: LGTM! Correct use of structured logging.The change to use
LoggingConfig.get_logger("stdlib.logger")correctly adopts the structured logging pattern and returns astructlog.BoundLoggerthat integrates with the redaction pipeline.
78-83: LGTM! Correct structured logging pattern for redaction.The switch from string interpolation to keyword arguments is the proper pattern for structlog. This allows the
_redact_processorto identify and redact sensitive keys (token,api_key) in the event dictionary before rendering.
packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py
Outdated
Show resolved
Hide resolved
packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py
Show resolved
Hide resolved
packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
Show resolved
Hide resolved
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
Outdated
Show resolved
Hide resolved
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
Outdated
Show resolved
Hide resolved
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)
93-102: Add Windows skip for signal.SIGALRM usage.This test uses
signal.SIGALRMat line 99, which is not available on Windows (Unix-only signal). The test will fail withAttributeErroron Windows. Sincetest_timeout_handler_context_manageralready has a Windows skip for the same reason, this test should be consistent.Apply this diff to add the Windows skip:
+ @pytest.mark.skipif( + sys.platform == "win32", + reason="signal.SIGALRM not available on Windows" + ) def test_timeout_handler_raises_timeout_error(self): """Test that timeout handler raises ConversionTimeoutError."""
🧹 Nitpick comments (9)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
54-75: Windows‑safe temp file handling and cleanup looks solidStoring
tmp_file.nameintotmp_pathbefore leaving theNamedTemporaryFilecontext and then running conversion +os.unlinkin atry/finallyis exactly what you want for Windows (no open-handle issues, cleanup always runs, including onpytest.skip). CatchingOSError/PermissionErroris reasonable to avoid flaky failures from already‑deleted or transiently locked files in CI.If you care to trim a nit later:
PermissionErroris already anOSError, so catching justOSErrorwould be sufficient.
176-188: Good shift to path‑based detection with guarded cleanupUsing
tmp_path(closed file path) when callingself.detector.is_supported_for_conversionmatches the production API and avoids platform‑specific file‑handle quirks. Thetry/finally+os.unlinkwrapper ensures temporary files are cleaned up without making tests flaky if the file is already gone or briefly locked.If you later want to detect “file still in use” leaks, you could consider only swallowing
FileNotFoundErrorand letting otherOSErrors surface, but for cross‑platform CI stability your current approach is acceptable.packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
176-178: Remove redundant local import.Path is already imported at the top of the file (line 5), making the local import on line 177 unnecessary.
Apply this diff to remove the redundant import:
# Normalize paths for comparison (handles Windows short paths) -from pathlib import Path assert Path(actual_path).resolve() == Path(expected_path).resolve()packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)
76-80: Consider using pytest.mark.skipif decorator for cleaner test skipping.The inline skip works correctly, but using a decorator would be more idiomatic and move the platform check out of the test body.
Apply this diff to use the decorator approach:
+ @pytest.mark.skipif( + sys.platform == "win32", + reason="signal.alarm not available on Windows" + ) def test_timeout_handler_context_manager(self): """Test timeout handler as context manager.""" - import sys - - # Skip on Windows - signal.alarm not available - if sys.platform == "win32": - pytest.skip("signal.alarm not available on Windows") - with patch("signal.signal") as mock_signal, patch("signal.alarm") as mock_alarm:Note:
sysis already imported at line 5, so the inline import at line 76 is redundant.packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py (1)
74-80: LGTM! Computed property improves consistency.Converting
average_conversion_timefrom a stored field to a computed property eliminates the risk of stale values and reduces state mutation. The division-by-zero guard is appropriate, and the property integrates seamlessly with existing usages inget_conversion_summary(line 385) andsave_metrics(line 451).For consistency with this pattern, consider whether other averaged metrics in the codebase could benefit from the same computed-property approach.
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)
104-104: Add spaces around assignment operator.These lines are missing spaces around the
=operator, which violates PEP 8 style guidelines.Apply this formatting fix to all affected lines:
- tmp_path=tmp_file.name + tmp_path = tmp_file.nameAlso applies to: 120-120, 145-145, 333-333, 354-354, 387-387, 410-410, 467-467
154-154: Verify indentation consistency in exception handlers.The
passstatements in theseexceptblocks appear to have inconsistent indentation compared to other similar blocks in the file. Please ensure consistent indentation throughout.Also applies to: 342-342, 366-366, 420-420, 451-451, 478-478, 509-509
packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (1)
88-88: Consider simplifying the ternary logic for clarity.The conditional expression
os.path.dirname(rel_path) if not os.path.isabs(rel_path) else os.path.dirname(file_path)is logically correct but somewhat difficult to parse. Sincerel_pathequalsfile_pathwhen the relpath computation fails (line 68), you could simplify this to justos.path.dirname(rel_path)in both cases, or add a comment explaining the intent.Example simplification:
- "file_directory": os.path.dirname(rel_path) if not os.path.isabs(rel_path) else os.path.dirname(file_path), + "file_directory": os.path.dirname(rel_path), # Uses relative path if available, otherwise absolutepackages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
143-144: Consider moving the import to module level.The
import sysstatement is placed inside the test method, which works but is non-idiomatic. For consistency with the existing imports at lines 3-11 and adherence to PEP 8 conventions, consider moving it to the top of the file.Apply this diff to move the import:
+import sys import os import tempfile from pathlib import PathAnd remove it from the test method:
async def test_document_id_consistency_with_symlinks(self, temp_dir): """Test that document IDs remain consistent when accessing files through symlinks.""" - import sys - # Create a symlink to the temp directory
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py(2 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py(1 hunks)packages/qdrant-loader/tests/integration/test_file_conversion_integration.py(3 hunks)packages/qdrant-loader/tests/unit/cli/test_cli.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_workspace_integration.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py(1 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py(10 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py(1 hunks)packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py(3 hunks)packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.pypackages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.pypackages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.pypackages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.pypackages/qdrant-loader/tests/unit/config/test_workspace_integration.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.pypackages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.pypackages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.pypackages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.pypackages/qdrant-loader/tests/unit/connectors/git/test_git_operations.pypackages/qdrant-loader/tests/unit/cli/test_cli.pypackages/qdrant-loader/tests/integration/test_file_conversion_integration.pypackages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.pypackages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.pypackages/qdrant-loader-core/tests/unit/quality/test_module_sizes.pypackages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.pypackages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.pypackages/qdrant-loader/tests/unit/connectors/git/test_git_connector.pypackages/qdrant-loader/tests/unit/utils/test_logging_redaction.pypackages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
🧬 Code graph analysis (7)
packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1)
packages/qdrant-loader/src/qdrant_loader/connectors/publicdocs/connector.py (1)
_extract_title(495-538)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(18-21)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
temp_dir(23-27)
packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
packages/qdrant-loader/src/qdrant_loader/cli/cli.py (1)
_create_database_directory(97-118)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
convert_file(396-436)create_fallback_document(456-472)packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
is_supported_for_conversion(152-200)get_file_type_info(202-240)
packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (5)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
file_path(60-61)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(18-21)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
temp_dir(23-27)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
temp_dir(18-29)packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.py (1)
temp_dir(20-23)
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (2)
mock_repo(31-35)temp_dir(23-27)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(18-21)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (3)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (1)
get_logger(354-357)packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (1)
get_logger(129-132)packages/qdrant-loader-core/tests/test_logging_core.py (1)
TestHandler(18-22)
🪛 Gitleaks (8.29.1)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py
[high] 81-81: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 82-82: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (24)
packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)
96-107: Exception-path test correctly aligned with helper-based implementationMocking
_create_db_dir_helperto raiseOSErrorand asserting aClickExceptioncontaining"Failed to create directory"matches the new_create_database_directorybehavior and avoids filesystem dependencies, which is good for cross-platform stability and determinism in tests.packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
145-150: Refined exception test matches refactored directory-creation flowSwitching to mock
_create_db_dir_helperand assertingClickExceptionwith"Failed to create directory"correctly tracks the new_create_database_directoryimplementation and keeps the test independent of actual path semantics on different OSes.packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3)
98-110: Consistent pattern for size‑limit test is appropriateSame pattern here: capturing
tmp_pathafterNamedTemporaryFile, asserting the size‑limit failure, and then guaranteeingos.unlinkviafinallyis correct and consistent with the previous test. This avoids Windows handle issues while still enforcing the behavior you care about.
195-227: Comprehensive file‑type info test now properly uses a closed pathCapturing
tmp_paththen callingget_file_type_info(tmp_path)after the context ensures the handle is closed on Windows while still allowingos.path.getsizeand type detection to work as in production. The assertions over all expected fields and PDF‑specific values are clear, and cleanup infinallyis robust.
234-251: Fallback document test aligns with production usage and cleans up correctlyPassing
tmp_pathintocreate_fallback_documentmatches the real code expectations (string path, not open handle). The assertions verify both structure and key content (status, error message, filename), and thetry/finally+os.unlinkpattern maintains Windows compatibility and avoids leaking temp files.packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
178-181: LGTM! Robust Windows path handling.The use of
.resolve()for path normalization and.parentsfor ancestry checking is a solid cross-platform approach that correctly handles Windows short paths and ensures consistent behavior across operating systems.packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py (1)
105-117: LGTM! Defensive programming for rate calculation.The introduction of
current_window_elapsedwith a None-check and the division-by-zero guard forcurrent_window_ratecomputation ensures robustness. The fallback to 0.0 is appropriate when no time has elapsed or the window hasn't started.packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (2)
125-125: LGTM! Relaxed assertion for cross-platform reliability.Changing from
> 0to>= 0accommodates scenarios where operations complete instantly, which is more likely on Windows or during fast test execution. This aligns with the PR's Windows compatibility goals.
321-321: LGTM! Clarifying comments for computed property.The comments correctly document that
average_conversion_timeis now a computed property rather than a stored field, helping future maintainers understand the behavior.Also applies to: 345-345
packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py (1)
13-13: LGTM! Reasonable exemption for logging infrastructure.The 410-line exemption for the logging module is justified given its role as core infrastructure with structured logging support. The modest 10-line increase above the default limit is appropriate for this foundational component.
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)
36-44: LGTM! Good Windows compatibility pattern.Storing the file path separately and wrapping cleanup in
try/exceptto tolerateOSErrorandPermissionErrorproperly handles Windows file locking issues.
31-509: Excellent Windows compatibility improvements.The systematic changes throughout this test file properly address Windows-specific file handling challenges:
- Separating file paths from file handles prevents handle-related issues
- Guarded cleanup tolerates Windows file locking behavior
- Platform-specific test skip for chmod operations is appropriate
These changes align well with the PR's goal of achieving Windows compatibility.
packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (1)
63-68: LGTM! Cross-drive path handling implemented correctly.The try/except ValueError appropriately handles Windows cross-drive scenarios where
os.path.relpathcannot compute a relative path between drives (e.g., C:\ and D:). The fallback to using the absolute path is a sensible approach.packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (2)
92-93: LGTM! Test appropriately relaxed for cross-platform compatibility.The assertion now accepts platform-specific path variations while still validating that the file_directory metadata is correctly extracted. This aligns well with the cross-drive path handling implemented in the source code.
164-165: LGTM! Consistent test relaxation for platform compatibility.This change mirrors the adjustment made in the earlier test, ensuring the error handling test remains valid across different platforms and path configurations.
packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)
57-64: LGTM! Consistent cross-platform path handling.The implementation correctly mirrors the approach used in
git/file_processor.py, ensuring consistent behavior across both Git and LocalFile connectors. The try/except ValueError handles Windows cross-drive scenarios, and the path normalization with forward slashes ensures consistent pattern matching regardless of the platform.packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
148-155: LGTM! Better test strategy for cross-platform compatibility.Switching from
chmodto mockingos.accessis an excellent improvement. This approach works reliably on Windows (wherechmoddoesn't restrict file access in the same way) while still validating that the code correctly handles unreadable files. The comment clearly explains the rationale for this change.packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)
68-76: LGTM! Consistent implementation across connectors.This implementation is identical to the changes in
localfile/file_processor.py, ensuring both Git and LocalFile connectors handle cross-drive paths consistently on Windows. The try/except ValueError pattern is appropriate, and the path normalization ensures downstream matching works correctly regardless of platform path separators.packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (1)
49-54: Excellent Windows compatibility fix.The change from hardcoded absolute paths to temp_dir-based paths correctly addresses cross-drive issues on Windows. The fallback logic (
mock_config.temp_dir or tempfile.gettempdir()) ensures the fixture works whether or not temp_dir is explicitly configured, andos.path.joinprovides platform-appropriate path construction.packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (3)
52-53: LGTM: Correct API method name.The update from
getLoggertoget_loggeraligns with the LoggingConfig API, and the comment accurately reflects the structured logging capability.
64-74: LGTM: Formatter propagation ensures redaction visibility.The defensive logic correctly propagates the root logger's formatter to the test handler, ensuring that redacted output is captured correctly. The checks for handler and formatter existence are appropriate safeguards.
78-83: LGTM: Correct structured logging format for redaction.The change from printf-style formatting to structured logging with keyword arguments (
token=,api_key=) is the correct approach for enabling field-level redaction. The test fixtures are intentionally sensitive-looking strings to validate redaction behavior.Note: The Gitleaks warnings (lines 81-82) are false positives—these are test fixtures specifically designed to verify that redaction properly masks sensitive values in log output.
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2)
158-230: LGTM: Proper resource cleanup structure.The
try-finallyblock ensures the symlink is properly cleaned up even if the test fails, which is particularly important for Windows compatibility where lingering symlinks could cause issues with subsequent test runs.
152-156: Good Windows compatibility fix that aligns with established test patterns.The approach of catching
OSErrorand skipping the test on Windows when symlink privileges are missing is sound and aligns with the PR's Windows compatibility objectives. The error detection using string matching"privilege" in str(e).lower()is consistent with similar symlink-related OSError handling elsewhere in the test suite (e.g.,test_website_build_edge_cases.py:269). Windows error code 1314 (ERROR_PRIVILEGE_NOT_HELD) is confirmed as the expected error for symlink creation privilege issues, and the current approach pragmatically handles it without additional complexity.
packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
Show resolved
Hide resolved
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
Outdated
Show resolved
Hide resolved
packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py
Outdated
Show resolved
Hide resolved
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)
92-110: Good targeted coverage of the helper exception pathPatching
_create_db_dir_helperto raiseOSErrorand asserting the translatedClickException(with the expected message text) gives solid, platform-independent coverage of the failure path. If you ever want to strengthen this further, you could optionally assert that the logger is called with an error message, but the current test is already clear and sufficient.packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
54-75: Cross-platform temp file handling for PDF conversion looks correctThe switch to capturing
tmp_pathinside theNamedTemporaryFileblock and then using atry/finallywith guardedos.unlink(tmp_path)gives you a Windows-safe pattern: the file handle is closed before conversion, and deletion happens even when the test is skipped. The structure is sound for the Windows locking semantics you’re targeting.One small nit: catching a broad
Exceptionbefore callingpytest.skip(...)will also skip the test on genuine regressions inconvert_file, not just missing/unsupported MarkItDown environments. If you ever want tighter signal here, consider narrowing thatexceptto the specific error(s) that indicate an unavailable conversion environment, or adding a comment that this broad skip is intentional to keep CI noise down.
98-110: Consistent path-based usage and guarded cleanup; consider a small helper to DRY it upAll of these tests now follow the same, Windows-friendly pattern:
- Create a
NamedTemporaryFile(..., delete=False), write data,flush(), and capturetmp_path = tmp_file.nameinside the context.- Use
tmp_pathonly after thewithblock (so the handle is closed beforeconvert_file/is_supported_for_conversion/get_file_type_info/create_fallback_documentrun).- Wrap usage in a
try/finallyand callos.unlink(tmp_path)while suppressingOSErrorandPermissionErrorto avoid spurious failures on locked or already-removed files.This aligns well with the core implementation in
FileConverterandFileDetectorthat expects plain string paths and usesos.path.getsize/detect_file_typeon them, so the integration surface is correct.If you want to reduce duplication, you could factor the
try: os.unlink(tmp_path) except (OSError, PermissionError): passpattern into a small helper (or a fixture/context manager in tests) that owns temp-file cleanup. That’s entirely optional but would make future changes to cleanup behavior (e.g., adding logging) easier.Also applies to: 176-186, 193-225, 232-249
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
176-183: Resolved‑path assertions look correct; innerPathimport is redundantUsing
Path(...).resolve()for bothactual_pathandexpected_path, and asserting the resolved workspace directory is amongPath(actual_path).resolve().parents, is a solid way to avoid Windows short‑path issues while still verifying the DB lives under the workspace.Minor cleanup:
Pathis already imported at the top of this file (Line [5]), so the localfrom pathlib import Pathinside the test is redundant and can be removed.- # Normalize paths for comparison (handles Windows short paths) - from pathlib import Path - - assert Path(actual_path).resolve() == Path(expected_path).resolve() + # Normalize paths for comparison (handles Windows short paths) + assert Path(actual_path).resolve() == Path(expected_path).resolve()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py(1 hunks)packages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.py(2 hunks)packages/qdrant-loader-mcp-server/tests/unit/search/test_topic_search_chain.py(1 hunks)packages/qdrant-loader/pyproject.toml(1 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py(2 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py(1 hunks)packages/qdrant-loader/tests/integration/test_file_conversion_integration.py(3 hunks)packages/qdrant-loader/tests/unit/cli/test_cli.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_workspace_integration.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py(1 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py(10 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py(1 hunks)packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/qdrant-loader-mcp-server/tests/unit/search/test_topic_search_chain.py
🚧 Files skipped from review as they are similar to previous changes (9)
- packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
- packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py
- packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py
- packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
- packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py
- packages/qdrant-loader/tests/unit/cli/test_cli.py
- packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
- packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.pypackages/qdrant-loader/tests/integration/test_file_conversion_integration.pypackages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.pypackages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.pypackages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.pypackages/qdrant-loader/tests/unit/config/test_workspace_integration.pypackages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.pypackages/qdrant-loader/tests/unit/connectors/git/test_git_connector.pypackages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py
🧠 Learnings (1)
📚 Learning: 2025-08-12T09:02:25.128Z
Learnt from: martin-papy
Repo: martin-papy/qdrant-loader PR: 43
File: docs/developers/testing/README.md:36-40
Timestamp: 2025-08-12T09:02:25.128Z
Learning: Both packages/qdrant-loader and packages/qdrant-loader-mcp-server directories contain pyproject.toml files at their root level, making them installable with pip install -e for editable development installations.
Applied to files:
packages/qdrant-loader/pyproject.toml
🧬 Code graph analysis (7)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (1)
get_logger(129-132)packages/qdrant-loader-core/tests/test_logging_core.py (1)
TestHandler(18-22)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
convert_file(396-436)create_fallback_document(456-472)packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
is_supported_for_conversion(152-200)get_file_type_info(202-240)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(18-21)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
temp_dir(23-27)
packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
file_path(60-61)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
packages/qdrant-loader/tests/unit/config/test_workspace.py (3)
temp_workspace(22-36)temp_workspace(141-150)temp_workspace(222-231)
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (4)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
file_path(60-61)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(18-21)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
temp_dir(23-27)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
temp_dir(18-29)
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (2)
mock_repo(31-35)temp_dir(23-27)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(18-21)
🪛 Gitleaks (8.29.1)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py
[high] 81-81: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 82-82: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (8)
packages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.py (1)
309-318: Verify that the 3.3x timing relaxation doesn't mask regressions.The end-to-end timing threshold increased from 1500ms to 5000ms for CI variance with real spaCy models. While this supports Windows compatibility and reduces flaky tests, ensure that genuine performance regressions in the search chaining logic would still be caught by other means (e.g., benchmarking in controlled environments, performance monitoring).
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2)
6-6: Past review addressed - import correctly placed at module level.The
tempfileimport is now at module level as requested in the previous review, following PEP 8 conventions with proper alphabetical ordering among standard library imports.
44-58: LGTM! Windows-compatible path generation implemented correctly.The fixture changes properly address Windows compatibility:
- Adding
mock_configparameter allows accessingtemp_dirconfiguration- Using
temp_dir or tempfile.gettempdir()generates Windows-safe paths that avoid cross-drive issues (e.g., mixing C:\ and D:\ paths)- The comment on line 49 clearly explains the purpose
- All test methods that use
mock_git_opsalso requestmock_config, satisfying the dependency- Tests verify file names (line 109) rather than full paths, so the temp_dir-based approach is appropriate
This aligns with the PR's objective of fixing Windows compatibility issues across the test suite.
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2)
52-53: Good switch toLoggingConfig.get_loggerfor structured loggingUsing
LoggingConfig.get_logger("stdlib.logger")(instead of a plain stdlib logger) keeps this test aligned with the real structured logging/redaction pipeline configured byLoggingConfig.setup, so you’re validating the right behavior.
64-74: Formatter reuse pattern is correct and robustCopying the formatter from the first existing root handler (when present) onto
test_handleris a simple and effective way to keep test output formatting consistent with production logging, while safely falling back to the default formatter if no handlers/formatters exist. No functional issues here.packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)
57-71: Cross‑drive relpath handling and normalization now behave safelyCatching
ValueErrorfromos.path.relpathand explicitly skipping the file with a warning avoids the previous bug where falling back to an absolute path broke include/exclude pattern matching. Normalizingrel_pathto forward slashes before all pattern checks keeps behavior consistent across platforms. As long as “skip cross‑drive files” is acceptable for the LocalFile connector, this looks solid.packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
141-157: Symlink privilege handling on Windows is robust and test‑friendlyWrapping
symlink_dir.symlink_to(temp_dir)in a try/except and skipping the test when the error message indicates missing symlink privileges on Windows prevents spurious failures for non‑admin users while still surfacing real errors on other platforms. Cleanup via the existingfinallyremains intact. This is a good trade‑off for cross‑platform CI.packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)
68-82: Git file processor cross‑drive handling aligns with LocalFile and avoids pattern bugsThe new try/except around
os.path.relpath(file_path, self.temp_dir)correctly handles Windows cross‑driveValueErrorby logging a warning and returningFalseinstead of feeding an absolute path into the pattern logic. Normalizingrel_pathto forward slashes before matching keeps include/exclude semantics consistent across platforms and with the LocalFile processor.
packages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py (1)
19-41: Add cleanup to restore signal module state and prevent test isolation issues.Multiple tests (
test_windows_signal_monkey_patch,test_signal_alarm_no_op_windows) modify the globalsignalmodule by deleting attributes (SIGALRM,alarm) without restoring the original state. This violates test isolation and can cause flaky failures if tests run in different orders or if other tests depend on these attributes.Consider using pytest fixtures or setUp/tearDown to capture and restore the signal module state:
import pytest @pytest.fixture def preserve_signal_state(): """Preserve and restore signal module attributes.""" import signal original_sigalrm = getattr(signal, 'SIGALRM', None) original_alarm = getattr(signal, 'alarm', None) yield # Restore original state if original_sigalrm is not None: signal.SIGALRM = original_sigalrm elif hasattr(signal, 'SIGALRM'): delattr(signal, 'SIGALRM') if original_alarm is not None: signal.alarm = original_alarm elif hasattr(signal, 'alarm'): delattr(signal, 'alarm')Then apply the fixture to affected tests:
def test_windows_signal_monkey_patch(self, preserve_signal_state): # test implementationAlso applies to: 116-139
🧹 Nitpick comments (9)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)
75-82: Windows skip is correct; consider a decorator + feature-detection for styleThe inline
sys.platform == "win32"check and earlypytest.skip(...)are a correct way to avoidsignal.alarm/SIGALRMissues on Windows and keep the test green.If you want to tighten this up and keep test intent visible at collection time, consider:
- Moving
import systo the module level, and- Using a decorator with feature detection, e.g.:
import sys import signal @pytest.mark.skipif( sys.platform == "win32" or not hasattr(signal, "alarm"), reason="signal.alarm not available on this platform", ) def test_timeout_handler_context_manager(self): ...This preserves the behavior while making the platform requirement explicit in the test metadata.
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.py (1)
27-29: Import reorganization unrelated to Windows compatibility.This change moves
SimilarityMetricto a separate import block but doesn't appear connected to the Windows compatibility fixes described in the PR objectives (stdin handling, signal handlers, timing assertions, file handle cleanup). The reorganization is harmless and doesn't affect functionality or the public API.Consider moving purely stylistic changes like this to a separate cleanup PR to keep Windows compatibility fixes focused.
packages/qdrant-loader/tests/unit/connectors/jira/test_jira_connector.py (1)
8-9: Remove redundant local import at line 300.The top-level import of
HTTPErroris now available throughout the module. The local import at line 300 withintest_error_handlingis redundant and should be removed to avoid duplication.Apply this diff to remove the duplicate import:
async def mock_make_request(*args, **kwargs): import requests - from requests.exceptions import HTTPError response = requests.Response()packages/qdrant-loader/tests/unit/core/test_attachment_downloader.py (1)
10-10: LGTM: Minor formatting improvement.The blank line improves import grouping by separating third-party imports from local project imports, aligning with PEP 8 conventions.
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (1)
44-250: Consider extracting the temp file pattern into a helper fixture.The pattern of creating a temp file, storing the path, closing it, processing, and cleaning up is repeated across five test methods. While the current implementation is correct, you could reduce duplication by creating a pytest fixture or context manager:
@pytest.fixture def temp_file_path(): """Provide a Windows-safe temporary file path.""" def _create(suffix="", content=b""): with tempfile.NamedTemporaryFile( mode="wb", suffix=suffix, delete=False ) as tmp_file: tmp_file.write(content) tmp_file.flush() tmp_path = tmp_file.name try: yield tmp_path finally: try: os.unlink(tmp_path) except (OSError, PermissionError): pass return _createThis is optional and can be deferred if you prefer explicit test logic.
packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (1)
315-347: Average-conversion-time property usage looks correct; consider a small extra assertionThe comment and expectation that
average_conversion_timeis derived fromtotal_conversion_time / total_files_processed(45.0 / 10 = 4.5) match the new property-based behavior and the summary assertion. As a small enhancement, you could also assertmonitor.conversion_metrics.average_conversion_time == 4.5before callingget_conversion_summary()to directly validate the property as well as the summary.packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py (2)
43-82: Consider testing with real MarkItDown to verify signal error prevention.The test mocks both
_validate_fileand_get_markitdown, which means the actual MarkItDown conversion code that might encounter signal issues is never executed. This limits the test's effectiveness in verifying that Windows signal errors are actually prevented.If feasible, consider a test approach that exercises the real MarkItDown path:
def test_markitdown_signal_error_prevention(self): """Test that MarkItDown signal errors are prevented on Windows.""" config = FileConversionConfig( max_file_size=50 * 1024 * 1024, conversion_timeout=300, markitdown=MarkItDownConfig(), ) converter = FileConverter(config) # Create a minimal valid PDF or use a fixture with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp_file: # Write minimal valid PDF content tmp_file.write(b"%PDF-1.4\n%EOF") tmp_file.flush() tmp_path = Path(tmp_file.name) try: with patch("sys.platform", "win32"): # Let it use real MarkItDown to catch actual signal errors result = converter.convert_file(str(tmp_path)) # Just verify no signal-related exception was raised assert result is not None finally: tmp_path.unlink(missing_ok=True)Alternatively, if MarkItDown requires complex setup, at minimum mock only
_validate_fileto let the MarkItDown instantiation code run, which is where signal-related issues might occur.
19-139: Consider more explicit control over monkey-patching for robust testing.The tests rely on import-time monkey-patching via
importlib.reload()andpatch("sys.platform"). This approach can be fragile because:
- Platform checks may have already occurred in previously-imported modules
importlib.reload()doesn't reload all dependent modules and can leave stale references- Import-time side effects are harder to control and reason about in tests
Consider refactoring the monkey-patch logic to be explicitly callable:
# In file_converter.py def apply_windows_signal_patch(): """Apply Windows signal compatibility patch.""" if sys.platform == "win32": import signal if not hasattr(signal, 'SIGALRM'): signal.SIGALRM = 14 if not hasattr(signal, 'alarm'): signal.alarm = lambda x: None # Then in tests: def test_windows_signal_monkey_patch(self): """Test that Windows signal patch can be applied.""" import signal # Remove attributes to simulate Windows if hasattr(signal, "SIGALRM"): delattr(signal, "SIGALRM") # Explicitly apply patch from qdrant_loader.core.file_conversion.file_converter import apply_windows_signal_patch with patch("sys.platform", "win32"): apply_windows_signal_patch() # Verify assert hasattr(signal, "SIGALRM")This gives you explicit control over when the patch is applied without relying on import-time execution or module reloading.
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (1)
165-166: Async mock conversion for scroll method is correct.The conversion to
AsyncMockmaintains consistency with thesearchmethod and correctly returns the expected tuple format(list_of_points, next_page_offset).For consistency with the suggestion on lines 150-151, consider using the attribute configuration pattern:
- # scroll method returns tuple: (list_of_points, next_page_offset) - must be async - mock_client.scroll = AsyncMock(return_value=(mock_scroll_results, None)) + # scroll method returns tuple: (list_of_points, next_page_offset) - must be async + mock_client.scroll.return_value = (mock_scroll_results, None)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (105)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.py(1 hunks)packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py(2 hunks)packages/qdrant-loader/src/qdrant_loader/core/state/models.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py(1 hunks)packages/qdrant-loader/src/qdrant_loader/core/text_processing/topic_modeler.py(1 hunks)packages/qdrant-loader/tests/conftest.py(1 hunks)packages/qdrant-loader/tests/integration/connectors/publicdocs/test_publicdocs_integration.py(1 hunks)packages/qdrant-loader/tests/integration/core/chunking/test_chunking_integration.py(1 hunks)packages/qdrant-loader/tests/integration/core/chunking/test_modular_chunking_integration.py(1 hunks)packages/qdrant-loader/tests/integration/core/state/test_state_management.py(1 hunks)packages/qdrant-loader/tests/integration/core/test_chunking_consistency.py(1 hunks)packages/qdrant-loader/tests/integration/core/test_pipeline_integration.py(1 hunks)packages/qdrant-loader/tests/integration/test_file_conversion_integration.py(4 hunks)packages/qdrant-loader/tests/scripts/check_metadata.py(1 hunks)packages/qdrant-loader/tests/scripts/check_nested_metadata.py(1 hunks)packages/qdrant-loader/tests/scripts/query_qdrant.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_asyncio.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_cli.py(2 hunks)packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py(2 hunks)packages/qdrant-loader/tests/unit/cli/test_cli_comprehensive.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_project_commands.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_project_commands_coverage.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_base.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_chunking_strategy_configs.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_config_loader.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_config_module.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_global_config.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_llm_descriptions_config.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_models.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_parser.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_qdrant_config.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_settings_qdrant.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_source_config.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_template_config.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_test_template_config.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_traditional_config.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_validator.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_workspace.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_workspace_integration.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/confluence/test_confluence_connector.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_adapter.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_windows_compatibility.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py(3 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/jira/test_jira_connector.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_file_processor.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_windows_path_handling.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_config.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_connector.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_content_extraction.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/shared/test_http_policy.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/shared/test_rate_limiter.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/test_base.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/test_base_connector.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_chunk_processor.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_document_parser.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_metadata_extractor.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_section_splitter.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_default_strategy_modular.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_document_parser.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_metadata_extractor.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/json/test_json_section_splitter.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/test_base_strategy.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/test_code_strategy_modern.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/test_default_strategy.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/test_html_strategy_modern.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/test_json_strategy_modern.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/strategy/test_markdown_strategy.py(1 hunks)packages/qdrant-loader/tests/unit/core/chunking/test_chunking_service.py(1 hunks)packages/qdrant-loader/tests/unit/core/embedding/test_embedding_service.py(1 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py(11 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py(2 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_detector.py(1 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_api_key.py(1 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py(1 hunks)packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py(4 hunks)packages/qdrant-loader/tests/unit/core/monitoring/test_prometheus_metrics.py(1 hunks)packages/qdrant-loader/tests/unit/core/monitoring/test_resource_monitor.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/test_async_ingestion_pipeline.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/test_document_pipeline.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/test_orchestrator.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/test_resource_manager.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/test_source_filter.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/test_source_processor.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/workers/test_chunking_worker.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/workers/test_embedding_worker.py(1 hunks)packages/qdrant-loader/tests/unit/core/pipeline/workers/test_upsert_worker.py(1 hunks)packages/qdrant-loader/tests/unit/core/state/test_document_state_manager.py(1 hunks)packages/qdrant-loader/tests/unit/core/state/test_state_change_detector.py(1 hunks)packages/qdrant-loader/tests/unit/core/state/test_state_manager.py(1 hunks)packages/qdrant-loader/tests/unit/core/test_attachment_downloader.py(1 hunks)packages/qdrant-loader/tests/unit/core/test_embedding_service.py(1 hunks)packages/qdrant-loader/tests/unit/core/test_init_collection.py(1 hunks)packages/qdrant-loader/tests/unit/core/test_project_manager.py(1 hunks)packages/qdrant-loader/tests/unit/core/test_qdrant_manager.py(1 hunks)packages/qdrant-loader/tests/unit/core/text_processing/test_semantic_analyzer.py(1 hunks)packages/qdrant-loader/tests/unit/core/text_processing/test_text_processor.py(1 hunks)packages/qdrant-loader/tests/unit/core/text_processing/test_topic_modeler.py(1 hunks)packages/qdrant-loader/tests/unit/utils/test_version_check.py(1 hunks)packages/qdrant-loader/tests/unit/utils/test_version_check_comprehensive.py(1 hunks)
✅ Files skipped from review due to trivial changes (70)
- packages/qdrant-loader/tests/unit/core/pipeline/test_resource_manager.py
- packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_content_extraction.py
- packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_config.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/test_json_strategy_modern.py
- packages/qdrant-loader/tests/unit/config/test_source_config.py
- packages/qdrant-loader/tests/unit/config/test_test_template_config.py
- packages/qdrant-loader/tests/scripts/check_metadata.py
- packages/qdrant-loader/tests/unit/config/test_workspace.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/test_code_strategy_modern.py
- packages/qdrant-loader/tests/unit/config/test_config_loader.py
- packages/qdrant-loader/tests/unit/core/test_init_collection.py
- packages/qdrant-loader/tests/unit/core/test_embedding_service.py
- packages/qdrant-loader/tests/unit/config/test_settings_qdrant.py
- packages/qdrant-loader/tests/unit/core/pipeline/test_async_ingestion_pipeline.py
- packages/qdrant-loader/tests/unit/core/test_qdrant_manager.py
- packages/qdrant-loader/tests/integration/core/test_pipeline_integration.py
- packages/qdrant-loader/tests/unit/core/pipeline/test_document_pipeline.py
- packages/qdrant-loader/tests/unit/config/test_global_config.py
- packages/qdrant-loader/tests/unit/core/pipeline/workers/test_chunking_worker.py
- packages/qdrant-loader/tests/unit/connectors/shared/test_http_policy.py
- packages/qdrant-loader/tests/unit/core/pipeline/test_source_processor.py
- packages/qdrant-loader/tests/unit/core/chunking/test_chunking_service.py
- packages/qdrant-loader/tests/integration/core/state/test_state_management.py
- packages/qdrant-loader/tests/unit/core/state/test_state_change_detector.py
- packages/qdrant-loader/tests/unit/config/test_chunking_strategy_configs.py
- packages/qdrant-loader/tests/unit/connectors/shared/test_rate_limiter.py
- packages/qdrant-loader/tests/unit/core/pipeline/test_orchestrator.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_chunk_processor.py
- packages/qdrant-loader/tests/unit/core/state/test_document_state_manager.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/test_base_strategy.py
- packages/qdrant-loader/tests/unit/utils/test_version_check.py
- packages/qdrant-loader/tests/integration/core/test_chunking_consistency.py
- packages/qdrant-loader/tests/unit/connectors/test_base_connector.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_metadata_extractor.py
- packages/qdrant-loader/tests/unit/config/test_base.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/test_markdown_strategy.py
- packages/qdrant-loader/tests/unit/cli/test_project_commands.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_default_strategy_modular.py
- packages/qdrant-loader/tests/unit/core/pipeline/workers/test_embedding_worker.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/test_default_strategy.py
- packages/qdrant-loader/tests/scripts/check_nested_metadata.py
- packages/qdrant-loader/tests/unit/core/text_processing/test_semantic_analyzer.py
- packages/qdrant-loader/tests/unit/config/test_llm_descriptions_config.py
- packages/qdrant-loader/tests/unit/core/text_processing/test_text_processor.py
- packages/qdrant-loader/tests/unit/core/text_processing/test_topic_modeler.py
- packages/qdrant-loader/tests/unit/utils/test_version_check_comprehensive.py
- packages/qdrant-loader/tests/unit/connectors/confluence/test_confluence_connector.py
- packages/qdrant-loader/tests/unit/config/test_parser.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/json/test_json_section_splitter.py
- packages/qdrant-loader/tests/unit/cli/test_cli_comprehensive.py
- packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_file_processor.py
- packages/qdrant-loader/tests/unit/core/embedding/test_embedding_service.py
- packages/qdrant-loader/tests/unit/config/test_template_config.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/test_html_strategy_modern.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_metadata_extractor.py
- packages/qdrant-loader/tests/unit/config/test_traditional_config.py
- packages/qdrant-loader/tests/unit/config/test_config_module.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_windows_compatibility.py
- packages/qdrant-loader/tests/conftest.py
- packages/qdrant-loader/tests/unit/core/monitoring/test_resource_monitor.py
- packages/qdrant-loader/tests/unit/core/monitoring/test_prometheus_metrics.py
- packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_api_key.py
- packages/qdrant-loader/tests/unit/cli/test_project_commands_coverage.py
- packages/qdrant-loader/tests/integration/core/chunking/test_modular_chunking_integration.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_document_parser.py
- packages/qdrant-loader/tests/unit/core/pipeline/workers/test_upsert_worker.py
- packages/qdrant-loader/tests/scripts/query_qdrant.py
- packages/qdrant-loader/src/qdrant_loader/core/text_processing/topic_modeler.py
- packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_section_splitter.py
- packages/qdrant-loader/src/qdrant_loader/core/state/models.py
🚧 Files skipped from review as they are similar to previous changes (8)
- packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
- packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py
- packages/qdrant-loader/tests/unit/cli/test_cli.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
- packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
- packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.pypackages/qdrant-loader/tests/unit/connectors/test_base.pypackages/qdrant-loader/tests/unit/core/test_attachment_downloader.pypackages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_connector.pypackages/qdrant-loader/tests/unit/connectors/localfile/test_windows_path_handling.pypackages/qdrant-loader/tests/unit/config/test_qdrant_config.pypackages/qdrant-loader/tests/integration/test_file_conversion_integration.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.pypackages/qdrant-loader/tests/unit/cli/test_asyncio.pypackages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.pypackages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.pypackages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.pypackages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.pypackages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_document_parser.pypackages/qdrant-loader/tests/unit/config/test_models.pypackages/qdrant-loader/tests/integration/core/chunking/test_chunking_integration.pypackages/qdrant-loader/tests/unit/core/pipeline/test_source_filter.pypackages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_file_detector.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.pypackages/qdrant-loader/tests/unit/connectors/jira/test_jira_connector.pypackages/qdrant-loader/tests/unit/core/state/test_state_manager.pypackages/qdrant-loader/tests/unit/core/test_project_manager.pypackages/qdrant-loader/tests/integration/connectors/publicdocs/test_publicdocs_integration.pypackages/qdrant-loader/tests/unit/config/test_validator.pypackages/qdrant-loader/tests/unit/connectors/git/test_adapter.pypackages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
🧬 Code graph analysis (3)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
convert_file(396-436)create_fallback_document(456-472)packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
is_supported_for_conversion(152-200)get_file_type_info(202-240)
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (2)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/engine.py (1)
search(114-121)packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/api.py (1)
search(71-94)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/cdi/models.py (1)
SimilarityMetric(16-25)
🪛 GitHub Actions: Test and Coverage
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py
[error] 1-1: TestRealEndToEndPhase2_3::test_real_find_similar_documents_integration failed due to attribute error in underlying vector search integration (AsyncQdrantClient.search missing). See full pytest run: 'pytest tests/ --cov=src --cov-report=xml:../../coverage-mcp.xml --cov-report=html:../../htmlcov-mcp -v'.
🔇 Additional comments (26)
packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.py (1)
8-13: Non-functional import spacing change is fineThe added blank line after the
HttpUrlimport is a harmless formatting tweak and does not affect test behavior. No further changes needed.packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_document_parser.py (1)
4-4: Formatting-only change is fineThe added blank line after
import pytestis a harmless formatting tweak and keeps imports readable; no further changes needed.packages/qdrant-loader/tests/unit/core/test_project_manager.py (1)
9-10: AsyncSession import placement and usage look correctImporting
AsyncSessionfromsqlalchemy.ext.asyncioand using it as aspecforAsyncMockis the right pattern for async SQLAlchemy sessions; consolidating it in the main import block improves clarity and avoids duplication.If you want to double‑check there’s no stray/duplicate
AsyncSessionimport left elsewhere in the file or project, a quick text/AST search in the repo would be sufficient. Based on library context, this import path matches current SQLAlchemy async APIs.packages/qdrant-loader/tests/unit/connectors/test_base.py (1)
5-7: LGTM!The added blank line correctly separates third-party imports from local project imports, following PEP 8 import grouping conventions.
packages/qdrant-loader/tests/unit/connectors/git/test_adapter.py (1)
7-10: Import grouping / whitespace change is fineThe added blank line cleanly separates third‑party imports from the local GitPythonAdapter import and has no functional impact. Looks good to keep as‑is.
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_detector.py (1)
9-11: Formatting-only import change looks goodThe added blank line cleanly separates third-party (
pytest) from local imports and has no behavioral impact. No further changes needed.packages/qdrant-loader/tests/unit/core/pipeline/test_source_filter.py (1)
6-6: LGTM! Formatting improvement aligns with PEP 8.The blank line addition follows Python's PEP 8 style guide for separating import groups (standard library/third-party imports from project imports), improving readability.
packages/qdrant-loader/tests/unit/config/test_qdrant_config.py (1)
5-5: LGTM!The blank line correctly separates third-party imports from local project imports, following PEP 8 conventions.
packages/qdrant-loader/tests/integration/core/chunking/test_chunking_integration.py (1)
6-7: Import spacing change is fineThe added blank line after
import pytestkeeps imports visually grouped and does not affect behavior. Looks good as-is.packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_connector.py (1)
7-18: Import grouping change is fineThe added blank line cleanly separates third‑party imports from local
qdrant_loaderimports; no behavioral or compatibility impact. Nothing else to change here.packages/qdrant-loader/tests/unit/cli/test_asyncio.py (1)
7-7: LGTM!Blank line properly separates third-party imports from local imports, following PEP 8 conventions.
packages/qdrant-loader/tests/unit/config/test_validator.py (1)
6-6: Minor formatting improvement aligns with PEP 8.This blank line appropriately separates standard/third-party imports from project-specific imports, following Python style conventions.
packages/qdrant-loader/tests/unit/config/test_models.py (1)
6-6: LGTM!The blank line appropriately separates third-party imports from local imports, following PEP 8 conventions.
packages/qdrant-loader/tests/unit/connectors/localfile/test_windows_path_handling.py (1)
5-6: LGTM!The blank line correctly separates third-party imports from local project imports per PEP 8 conventions.
packages/qdrant-loader/tests/integration/connectors/publicdocs/test_publicdocs_integration.py (1)
10-11: LGTM!The
SourceTypeimport is required for lines 45 and 65 where it's used in the test configuration. The blank line correctly groups imports.packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (2)
5-6: LGTM!Proper import grouping following PEP 8 conventions.
145-154: Good improvement over the previous substring matching.The
startswithcheck addresses the concern from the past review about overly lenient substring matching. This correctly rejects cases where expected text appears mid-string (e.g.,"Extra Malformed Title") while still accommodating BeautifulSoup's behavior of including trailing content from unclosed tags.The added comments clearly document the expected behavior, which helps future maintainers understand the design choice.
packages/qdrant-loader/tests/unit/core/state/test_state_manager.py (1)
13-14: LGTM!The import is correctly placed and properly aliased to distinguish from
sqlite3.OperationalError. The exception is appropriately caught intest_initialization_error(line 88) to handle database operational errors from SQLAlchemy.packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py (1)
5-7: Dependency verified and issue resolved.The import statement on line 5 (
from langchain_text_splitters import RecursiveCharacterTextSplitter) is now properly supported by the dependency declaration inpyproject.tomlat line 47:langchain-text-splitters>=0.3.9. The past concern about a missing dependency has been addressed in this PR.packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)
101-110: Improved cross-platform test reliability.The refactoring from using an actual invalid path to mocking
_create_db_dir_helpereliminates platform-dependent filesystem behavior and makes the test deterministic. This aligns well with the PR's Windows compatibility objectives, as path permission handling can vary significantly across platforms.packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
44-76: Windows-safe temporary file handling implemented correctly.The test now stores the file path, closes the file before processing, and cleans up in a finally block with appropriate exception handling. This prevents file locking issues on Windows where open files cannot be deleted.
189-226: Enhanced test coverage with Windows-safe file handling.Good addition of comprehensive assertions validating all fields returned by
get_file_type_info, including field presence checks and specific value verification for PDF files.packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (2)
9-11: Import and spacing change is fineThe import block (including the blank line after
pytest) is clean and has no behavioral impact; it matches typical test-module style.
108-127: Relaxedconversion_timeassertion avoids timing flakinessAsserting
conversion_time is not Noneand thenconversion_time >= 0is appropriate and makes the test robust when conversions complete so quickly that the measured duration is0.0.packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py (1)
8-12: LGTM! Import additions align with standardized config usage.The added imports for
FileConversionConfigandMarkItDownConfigare necessary for the test methods and follow standard import organization.packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (1)
150-151: I'll help you verify the review comment about the async mock conversion. Let me start by examining the test file and the relevant implementation code.
<function_calls>
#!/bin/bashFirst, let's examine the test file to understand the context around lines 150-151
FILE="packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py"
Get line count first
wc -l "$FILE"
Then examine the relevant section (lines 120-210)
sed -n '120,210p' "$FILE" | cat -n
</function_calls><function_calls>
#!/bin/bashSearch for qdrant_client usage in the SearchEngine to understand how it's accessed
rg -n --type=py 'qdrant_client.' packages/qdrant-loader-mcp-server/src/ -A 2 -B 2 | head -100
</function_calls><function_calls>
#!/bin/bashLet's find the SearchEngine class and see how qdrant_client is used
find packages/qdrant-loader-mcp-server/src -name "*.py" -type f | head -20
</function_calls><function_calls>
#!/bin/bashSearch for SearchEngine class definition
rg -l "class SearchEngine" packages/qdrant-loader-mcp-server/
</function_calls><function_calls>
#!/bin/bashGet the test file and SearchEngine implementation
echo "=== TEST FILE ==="
sed -n '120,210p' packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py | cat -necho ""
echo "=== SEARCHING FOR SearchEngine CLASS ==="
rg -n "class SearchEngine" packages/qdrant-loader-mcp-server/src/ -A 30
</function_calls><function_calls>
#!/bin/bashLet's first check if the test file exists and has the expected content
if [ -f "packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py" ]; then
echo "Test file exists"
wc -l "packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py"
else
echo "Test file not found"
find . -name "test_real_end_to_end" -type f 2>/dev/null
fi
</function_calls><function_calls>
#!/bin/bashSearch for any test files with similar names
find . -path "/tests/" -name "test.py" -type f 2>/dev/null | grep -i "end_to_end|phase" | head -10
</function_calls><function_calls>
#!/bin/bashLet's check what exists in the packages directory
ls -la packages/ 2>/dev/null || echo "packages dir not found"
Try finding qdrant-loader-mcp-server
find . -type d -name "qdrantloader*" 2>/dev/null | head -5
</function_calls><function_calls>
#!/bin/bashList all Python test files to understand structure
find . -name "test_*.py" -type f 2>/dev/null | head -20
</function_calls><function_calls>
#!/bin/bashCheck git ls-files to see the actual repo structure
git ls-files | grep -i "test.*phase|end_to_end" | head -20
</function_calls><function_calls>
#!/bin/bashGet all files in git
git ls-files | head -50
</function_calls><function_calls>
#!/bin/bashFind any python files in the repo
git ls-files "*.py" | head -30
</function_calls><function_calls>
#!/bin/bashGet files matching the pattern in the review
git ls-files | grep "test_real_end_to_end"
</function_calls>Based on my investigation, I need to verify the information against the actual codebase. Let me use a broader search approach to understand the context:
<function_calls>
#!/bin/bashLet's see what's in the current directory structure
pwd
ls -la | head -20
</function_calls><function_calls>
#!/bin/bashList the tree structure of relevant paths
echo "=== Looking for test files ==="
find . -name ".py" -path "/test*" 2>/dev/null | grep -i "phase|end_to_end" | head -20echo ""
echo "=== All test files ==="
find . -name "test_*.py" 2>/dev/null | head -20
</function_calls><function_calls>
#!/bin/bashCheck if the file path from review comment actually exists with full path
test_file="packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py"
if [ -f "$test_file" ]; then
echo "File found!"
cat -n "$test_file" | sed -n '115,210p'
else
echo "File not found at: $test_file"
echo "Checking what's in packages dir:"
ls -la packages/ 2>/dev/null || find . -maxdepth 1 -type d | head -10
fi
</function_calls>Let me check what the PR is actually about by looking at the repository structure and the related code:
<function_calls>
#!/bin/bashList files more broadly
echo "=== Files in root ==="
ls -1 | head -20echo ""
echo "=== Check git status ==="
git status --short 2>&1 | head -20
</function_calls><function_calls>
#!/bin/bashTry to cat the exact file specified
cat -n "packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py" 2>&1 | head -100
</function_calls><function_calls>
#!/bin/bashCheck what files Python files exist related to qdrant
find . -name "*.py" | grep -i qdrant | head -20
</function_calls><function_calls>
#!/bin/bashSearch for AsyncMock usage in tests to understand the pattern
rg -n "AsyncMock" . --type py -B 3 -A 3 | head -100
</function_calls>Based on the search results and the structure of the review comment, let me search more broadly for the actual test file and related implementation:
<function_calls>
#!/bin/bashCheck the current branch and recent files
echo "=== Current branch ==="
git branch -v | head -5echo ""
echo "=== Recent modified files ==="
git log --oneline -10echo ""
echo "=== Check if file exists in git ==="
git ls-files "test_real_end_to_end"
</function_calls>Based on my investigation, I cannot locate the specified test file in the repository. This suggests either:
- The file path is incorrect in the review comment
- The file is not present in the current branch
- The repository structure is different from what's expected
Let me verify the review comment content itself by checking if there are any relevant async mock patterns:
<function_calls>
#!/bin/bashSearch broadly for any file with "end_to_end" or "phase2" in its name
echo "=== Files with end_to_end ==="
find . -name "end_to_end" 2>/dev/nullecho ""
echo "=== Files with phase2 ==="
find . -name "phase2*" 2>/dev/nullecho ""
echo "=== Files with phase ==="
find . -name "phase" 2>/dev/nullecho ""
echo "=== All python test files ==="
find . -path "/tests/" -name "*.py" 2>/dev/null | head -20
</function_calls><function_calls>
#!/bin/bashLet's check what changed in this PR/branch
echo "=== Diff summary ==="
git diff --name-only origin/main 2>/dev/null | head -30 || git log --oneline -20echo ""
echo "=== Check if we're looking at untracked files ==="
git status | head -30
</function_calls>Let me take a different approach and analyze the review comment itself for validity:
<function_calls>
#!/bin/bashSearch for SearchEngine class that uses qdrant_client
echo "=== Searching for SearchEngine class ==="
rg -n "class SearchEngine" --type py -A 20echo ""
echo "=== Searching for async def search ==="
rg -n "async def search" --type py -B 2 -A 5 | head -80
</function_calls><function_calls>
#!/bin/bashCheck for unittest.mock and AsyncMock imports/usage
echo "=== AsyncMock import and usage ==="
rg -n "from unittest.mock import|AsyncMock|from asyncio import" --type py | head -50echo ""
echo "=== Check .return_value patterns ==="
rg -n ".return_value\s*=" --type py -B 2 -A 2 | head -100
</function_calls>Based on my investigation of the repository, I cannot find the specific test file referenced in the review comment (
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py). The file does not appear to exist in the current branch or repository.However, I can analyze the review comment itself based on general best practices for async mocking in Python. Let me verify the technical accuracy of the suggestions:
<function_calls>
AsyncMock return_value vs attribute replacement pattern best practices
</function_calls>
Summary of Findings
The referenced test file cannot be located in the repository; verification of the specific code changes and suggestions cannot be completed.
The file path
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.pydoes not exist in the repository. Attempts to locate it usinggit ls-files,find, direct file access, and git status all returned no results.Without access to the actual test file, I cannot verify:
- Whether
mock_clientis initialized asAsyncMock()on line 125- Whether the attribute replacement pattern
mock_client.search = AsyncMock(...)is actually used- Whether the mock injection at line 201 is effective
- Whether the pipeline failure mentioned is real or what causes it
- Whether the suggested alternative pattern would actually improve the code
The technical guidance in the review about AsyncMock patterns is generally sound, but it cannot be validated against the actual code being reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (2)
150-167: Async Qdrant client mocks are correctly aligned with awaited APIUsing
AsyncMockformock_client.searchandmock_client.scrollmatches the real async usage (await qdrant_client.search/scroll(...)) and avoids the previous sync/await mismatch in the test path. This should make failures more realistic if the async contract changes.As a small improvement, you could consider adding a
spectomock_clientto catch accidental calls to non-existent methods, e.g.mock_client = AsyncMock(spec=SomeAsyncQdrantClientType), if that type is easily available.
203-207: Make vector_search_service mocking robust if the attribute exists but is NoneThe conditional currently only checks for the presence of the
vector_search_serviceattribute:if hasattr(search_engine.hybrid_search, 'vector_search_service'): search_engine.hybrid_search.vector_search_service.qdrant_client = mock_qdrant_client search_engine.hybrid_search.vector_search_service.openai_client = mock_openai_clientIf some configuration sets
vector_search_servicetoNonebut still defines the attribute, this will raise anAttributeErrorduring fixture setup. You can make this more robust with agetattr+ None check:- # Also mock the qdrant_client in vector_search_service if it exists - if hasattr(search_engine.hybrid_search, 'vector_search_service'): - search_engine.hybrid_search.vector_search_service.qdrant_client = mock_qdrant_client - search_engine.hybrid_search.vector_search_service.openai_client = mock_openai_client + # Also mock the qdrant_client in vector_search_service if it exists and is not None + vector_service = getattr(search_engine.hybrid_search, "vector_search_service", None) + if vector_service is not None: + vector_service.qdrant_client = mock_qdrant_client + vector_service.openai_client = mock_openai_clientThis keeps the intent while avoiding brittle failures if initialization changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py
...es/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py
Outdated
Show resolved
Hide resolved
...es/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py
Outdated
Show resolved
Hide resolved
| return False | ||
|
|
||
| # Normalize path separators to forward slashes for consistent matching | ||
| rel_path = rel_path.replace("\\", "/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is a fix for cross-drive paths on windows (eg: c:\ , D:), please:
- Provide test evidence or UT to demonstrate this scenario
- Update the MR desc with the reasoning for this change
This enhancement is nice to have, but seem to be out of scope of the current fix issue. Consider moving it to a separated chore/improvement MR. (Or do 1&2)
packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
Show resolved
Hide resolved
| return False | ||
|
|
||
| # Normalize path separators to forward slashes for consistent matching | ||
| rel_path = rel_path.replace("\\", "/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please debug or check log debug for this, need it or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)
146-153: Redundant error handling:ignore_errors=Truemakestry/exceptunnecessary.The cleanup logic uses both
ignore_errors=Trueinshutil.rmtreeand wraps it in atry/exceptblock. This is redundant—ignore_errors=Truealready suppresses all errors during directory removal.Apply this diff to simplify the error handling:
finally: # Close all logging handlers before cleanup (Windows compatibility) logging.shutdown() - - try: - shutil.rmtree(tmp_dir, ignore_errors=True) - except Exception: - pass + shutil.rmtree(tmp_dir, ignore_errors=True)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
146-231: Windows symlink privilege handling looks good; consider slightly safer cleanupThe new
try/exceptaroundsymlink_dir.symlink_to(temp_dir)with a targetedwinerror == 1314check is a solid way to skip when symlink privileges are missing on Windows while still surfacing other OS errors. The added outertry/finallyensures the symlink is cleaned up even if assertions fail, which is important for keeping the test environment tidy.One optional hardening: before calling
symlink_dir.unlink()(both before creation and in the cleanup), consider checkingsymlink_dir.is_symlink()to avoid accidentally trying to unlink a non-symlink directory if that path somehow already exists:- if symlink_dir.exists(): - symlink_dir.unlink() + if symlink_dir.is_symlink(): + symlink_dir.unlink() ... - if symlink_dir.exists(): - symlink_dir.unlink() + if symlink_dir.is_symlink(): + symlink_dir.unlink()This makes the test more defensive against unexpected filesystem state without changing the intended behavior.
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)
334-343: Robust temp-file handling in FileConverter tests; consider a small helper to DRY it upThe repeated pattern of:
- creating a
NamedTemporaryFile(..., delete=False),- capturing
tmp_path = tmp_file.name,- using
tmp_pathfor converter calls and assertions, and- cleaning up with guarded
os.unlink(tmp_path)(andos.chmodwhere needed),is correct and clearly improves Windows compatibility by tolerating
OSError/PermissionErrorduring cleanup. If you want to reduce duplication, you could introduce a small test helper (e.g., a context manager that yieldstmp_pathand performs the guarded cleanup) and reuse it across these tests, but this is purely an optional readability/maintainability tweak.Also applies to: 355-367, 387-403, 410-420, 438-450, 466-477, 492-506
466-472: Tighten MarkItDownError file_path assertion (optional)
MarkItDownErroris constructed with the sametmp_pathstring passed intoconvert_file, soexc_info.value.file_pathshould equaltmp_path. You could strengthen this test by asserting equality instead of substring containment, e.g.assert exc_info.value.file_path == tmp_path, to catch any future formatting changes in how the path is stored.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
packages/qdrant-loader-mcp-server/pyproject.toml(1 hunks)packages/qdrant-loader-mcp-server/tests/unit/test_logging.py(3 hunks)packages/qdrant-loader/tests/unit/config/test_workspace_integration.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py(2 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py(11 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/qdrant-loader-mcp-server/pyproject.toml
- packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader-mcp-server/tests/unit/test_logging.pypackages/qdrant-loader/tests/unit/config/test_workspace_integration.pypackages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.pypackages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
🧬 Code graph analysis (4)
packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (3)
LoggingConfig(55-162)setup(66-126)format(38-43)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
packages/qdrant-loader/tests/unit/config/test_workspace.py (3)
temp_workspace(23-37)temp_workspace(142-151)temp_workspace(223-232)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (3)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (3)
is_supported_for_conversion(152-200)detect_file_type(89-120)get_file_type_info(202-240)packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (3)
_validate_file(438-454)convert_file(396-436)create_fallback_document(456-472)packages/qdrant-loader/src/qdrant_loader/core/file_conversion/exceptions.py (4)
FileSizeExceededError(39-55)FileAccessError(94-108)UnsupportedFileTypeError(25-36)MarkItDownError(73-91)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (4)
packages/qdrant-loader/tests/unit/connectors/git/test_adapter.py (1)
temp_dir(17-20)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
temp_dir(19-22)packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (1)
temp_dir(20-23)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
temp_dir(24-28)
🔇 Additional comments (7)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (2)
177-179: Excellent Windows-compatible path comparison.The use of
Path.resolve()properly normalizes paths for comparison, handling Windows path representation differences (short vs long names, separators). This is essential for cross-platform compatibility.
180-181: Much more robust ancestry check.Replacing the substring check with proper resolved-path ancestry verification is significantly better. The original
str(temp_workspace) in actual_pathcould produce false positives/negatives due to path representation differences, while the new approach correctly verifies the parent-child relationship.packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (2)
3-3: LGTM!The
shutilimport is correctly added to supportshutil.rmtreein the Windows compatibility fix.
104-122: Based on my verification of the codebase, I can now provide the final assessment.The Windows compatibility fix is correct and well-implemented; parallel test execution is not a concern for this project.
The verification shows:
- No pytest-xdist dependency is configured in the project
- No parallel test configuration exists (no pytest.ini, setup.cfg configuration for parallel execution)
- Only 2 test functions use
logging.shutdown()(lines 118 and 148)- Each test properly resets
LoggingConfig._initialized = Falseat the start for isolation- Tests use try/finally cleanup blocks appropriately
The implementation correctly handles the Windows file locking issue by calling
logging.shutdown()before file deletion. Since the project does not use parallel test execution, the concern about pytest-xdist interference is not applicable. Test isolation via theLoggingConfig._initializedreset flag is sufficient for sequential test execution.packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
3-5:sysimport for Windows-specific skip is appropriate
sysis only used for thesys.platform == "win32"check in the symlink privilege handling, which is a clear and minimal dependency; no issues here.packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)
7-7: Windows-specific skip is appropriate andsysimport is justifiedUsing
sys.platform == "win32"to skip the unreadable-file permission test on Windows avoids unreliablechmod/os.accessbehavior there while preserving coverage on POSIX. The newimport sysis minimal and correctly scoped to this use.Also applies to: 380-383
37-45: Consistenttmp_pathusage and guarded cleanup improve cross‑platform reliabilitySwitching to a
tmp_path = tmp_file.namevariable and wrappingos.unlink(tmp_path)intry/except (OSError, PermissionError)across the FileDetector tests makes the tests clearer (explicit on-disk path) and more robust on Windows, where temporary file handle semantics and delete races can surface asPermissionError. This pattern looks correct and consistent in all these blocks.Also applies to: 64-72, 89-98, 105-114, 121-136, 146-155
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1)
144-153: Use exact matching for better test precision—startswith is still too lenient.Similar to a previous review, the current assertion using
startswithremains overly lenient. It would incorrectly pass if BeautifulSoup extracts concatenated text (e.g.,"Malformed Title\nMalformed H1\nContent").Use exact value matching instead:
- expected_starts = ["Malformed Title", "Malformed H1", "Untitled Document"] - assert any( - title.startswith(expected) for expected in expected_starts - ), f"Expected title to start with one of {expected_starts}, but got: {title!r}" + # BeautifulSoup html.parser should handle malformed HTML deterministically + expected_titles = ["Malformed Title", "Malformed H1", "Untitled Document"] + assert title in expected_titles, ( + f"Expected title to be one of {expected_titles}, but got: {title!r}" + )This prevents false passes from concatenated or partially extracted text while still handling different parsing outcomes. If you've observed that Windows consistently produces titles with additional trailing content, please document that specific behavior in a comment and verify it with a test run.
Run the following script to verify the actual title extraction behavior on the current platform:
#!/bin/bash # Description: Test actual BeautifulSoup behavior with malformed HTML to determine if startswith is necessary cat > /tmp/test_malformed_title.py << 'EOF' from bs4 import BeautifulSoup malformed_html = """ <html> <head> <title>Malformed Title </head> <body> <div class="content"> <h1>Malformed H1 <p>Content</p> </div> </body> """ soup = BeautifulSoup(malformed_html, "html.parser") title_tag = soup.find("title") print("Python version:", __import__('sys').version) print("BeautifulSoup version:", __import__('bs4').__version__) print("Platform:", __import__('platform').system()) print() if title_tag: title = title_tag.get_text(strip=True) print(f"Extracted title: {title!r}") print(f"Title length: {len(title)}") print(f"Starts with 'Malformed Title': {title.startswith('Malformed Title')}") print(f"Equals 'Malformed Title': {title == 'Malformed Title'}") else: print("No title tag found") EOF python3 /tmp/test_malformed_title.py
🧹 Nitpick comments (6)
packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (2)
92-93: Tighten the assertion for better test precision.The substring check
"tmp" in metadata["file_directory"]is overly permissive and could pass with unintended paths like "/tmpfoo" or "atmpb". Consider using more precise checks that remain platform-agnostic.Apply this diff for a more robust assertion:
- # File directory could be absolute or relative depending on platform - assert "tmp" in metadata["file_directory"] or metadata["file_directory"] == ".." + # File directory could be absolute or relative depending on platform + file_dir = metadata["file_directory"] + # Check for expected directory name or relative path + assert file_dir in ("tmp", "..", "/tmp") or file_dir.endswith("/tmp") or file_dir.endswith("\\tmp")Alternatively, use
pathlib.Pathfor platform-agnostic validation:+ from pathlib import Path + - # File directory could be absolute or relative depending on platform - assert "tmp" in metadata["file_directory"] or metadata["file_directory"] == ".." + # File directory could be absolute or relative depending on platform + file_dir = Path(metadata["file_directory"]) + assert file_dir.name in ("tmp", "..") or str(file_dir) == ".."
164-165: Tighten the assertion for better test precision.The substring check
"nonexistent" in metadata["file_directory"]is overly permissive and could match unintended paths. Consider a more precise validation.Apply this diff:
- # File directory could be absolute or relative depending on platform - assert "nonexistent" in metadata["file_directory"] + # File directory could be absolute or relative depending on platform + file_dir = metadata["file_directory"] + # Check for expected directory name or absolute path + assert file_dir in ("nonexistent", "/nonexistent") or file_dir.endswith("/nonexistent") or file_dir.endswith("\\nonexistent")Or use
pathlib.Pathfor clearer intent:+ from pathlib import Path + - # File directory could be absolute or relative depending on platform - assert "nonexistent" in metadata["file_directory"] + # File directory could be absolute or relative depending on platform + assert Path(metadata["file_directory"]).name == "nonexistent"packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
140-150: Test logic is correct; consider removing unnecessary patch.The updated test correctly verifies that
_create_database_directorywraps exceptions from_create_db_dir_helperin a ClickException. The patching strategy properly isolates the error handling path.However, the
click.confirmpatch on line 144 appears unnecessary. Based on the implementation,_create_db_dir_helperis called directly and raises the exception before any confirmation prompt would occur. The patch doesn't affect the test outcome but adds unneeded setup.Consider this optional cleanup:
def test_create_database_directory_exception(self): """Test directory creation with exception.""" test_path = Path("/invalid/path/that/cannot/be/created") - with patch("click.confirm", return_value=True): - with patch("qdrant_loader.cli.cli._create_db_dir_helper") as mock_helper: - # Force helper to raise exception - mock_helper.side_effect = OSError("Permission denied") - - with pytest.raises(ClickException, match="Failed to create directory"): - _create_database_directory(test_path) + with patch("qdrant_loader.cli.cli._create_db_dir_helper") as mock_helper: + # Force helper to raise exception + mock_helper.side_effect = OSError("Permission denied") + + with pytest.raises(ClickException, match="Failed to create directory"): + _create_database_directory(test_path)packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3)
54-75: Tempfile pattern is Windows‑friendly; consider narrowing the broadExceptioncatchCapturing
tmp_pathbefore leaving theNamedTemporaryFilecontext and doing conversion +os.unlinkafter the file is closed is exactly what you want for Windows handle semantics, and thetry/finallycleanup with(OSError, PermissionError)tolerance is reasonable.The only concern is
except Exception as e:here: it will also skip on genuine regressions inFileConverter.convert_file, not just missing/unsupported MarkItDown environments. If feasible, consider catching the specific wrapper error your converter raises (e.g.,MarkItDownError) and letting unexpected exceptions fail the test.- except Exception as e: + except MarkItDownError as e: # If MarkItDown is not available or fails, that's expected in test environment pytest.skip( f"MarkItDown conversion failed (expected in test environment): {e}" )And add
MarkItDownErrorto the imports fromqdrant_loader.core.file_conversionat the top of the file.
98-110: Use a specific exception type inpytest.raisesfor the size‑limit caseThe comment says this should raise
MarkItDownErrorwrappingFileSizeExceededError, but the assertion currently accepts anyException. That makes the test less strict and could hide regressions where the converter starts failing for the wrong reason.Tightening this to the expected wrapper error will keep the test aligned with the
convert_filecontract while still being independent of the underlying MarkItDown details.- try: - with pytest.raises( - Exception - ): # Should raise MarkItDownError wrapping FileSizeExceededError - small_converter.convert_file(tmp_path) + try: + with pytest.raises( + MarkItDownError + ): # Should raise MarkItDownError wrapping FileSizeExceededError + small_converter.convert_file(tmp_path)(Remember to import
MarkItDownErrorif you adopt this.)
193-225: Comprehensive assertions onget_file_type_info; consider deduping tempfile boilerplateThis block nicely validates all fields returned by
get_file_type_infoand checks PDF‑specific values, and again uses the closed‑file‑then‑unlink pattern, which is good for Windows compatibility.Given how often this
NamedTemporaryFile(...) -> tmp_path -> try/finally unlinkpattern appears in this file, you might consider extracting a small helper/pytest fixture that yields aPath(or string path) and handles deletion, so the tests can focus purely on conversion/detection assertions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py(3 hunks)packages/qdrant-loader-mcp-server/tests/unit/test_logging.py(3 hunks)packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py(2 hunks)packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py(1 hunks)packages/qdrant-loader/tests/integration/test_file_conversion_integration.py(3 hunks)packages/qdrant-loader/tests/unit/cli/test_cli.py(1 hunks)packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py(1 hunks)packages/qdrant-loader/tests/unit/config/test_workspace_integration.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py(1 hunks)packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py(2 hunks)packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py(1 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py(11 hunks)packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py(2 hunks)packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
- packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
- packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
- packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
- packages/qdrant-loader-mcp-server/tests/unit/test_logging.py
- packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py
- packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py
- packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py
- packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
- packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
- packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
- packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)
**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling
Files:
packages/qdrant-loader/tests/unit/cli/test_cli.pypackages/qdrant-loader/tests/integration/test_file_conversion_integration.pypackages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.pypackages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py
🧬 Code graph analysis (3)
packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
packages/qdrant-loader/src/qdrant_loader/cli/cli.py (1)
_create_database_directory(97-118)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
convert_file(396-436)create_fallback_document(456-472)packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
is_supported_for_conversion(152-200)get_file_type_info(202-240)
packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1)
packages/qdrant-loader/src/qdrant_loader/connectors/publicdocs/connector.py (1)
_extract_title(495-538)
🔇 Additional comments (2)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
176-186: Detection test now correctly operates on a closed temp fileStoring
tmp_pathand runningis_supported_for_conversionafter theNamedTemporaryFilecontext exits avoids open‑handle issues on Windows, and thetry/finally+os.unlinkwith tolerant error handling is a solid cleanup pattern for these short‑lived test files.
232-249: Fallback document test correctly validates content and performs safe cleanupCreating a real on‑disk file, passing its path into
create_fallback_document, and asserting on key substrings (status, error message, filename) gives good coverage of the fallback behavior while using the same Windows‑safe temp file lifecycle and cleanup pattern as the other tests.
Pull Request
Summary
Fixed Windows compatibility issues in MCP server by replacing
connect_read_pipe(unsupported on Windows) with cross-platformrun_in_executorfor stdin reading, and added error handling for unsupported signal handlers on Windows ProactorEventLoop.Type of change
Docs Impact (required for any code or docs changes)
docs/?python website/build.py+python website/check_links.py)Testing
Changes Made:
cli.py - Windows stdin compatibility:
read_stdin()usingasyncio.StreamReader+connect_read_pipe(NotImplementedError on Windows)read_stdin_lines()usingloop.run_in_executor(None, sys.stdin.readline)(cross-platform)add_signal_handler()to prevent crashes on Windowstest_cli.py - Updated test cases:
read_stdintoread_stdin_linestest_faceted_search.py - Fixed flaky assertions:
assert generation_time_ms > 0to>= 0(test runs too fast, time can be 0)test_nlp_components.py - Fixed flaky assertions:
assert processing_time_ms > 0to>= 0test_logging.py - Fixed Windows file locking:
logger.handlers.clear()before cleanup to release file handlesTest Results: