Skip to content

Conversation

@minh362000
Copy link
Collaborator

@minh362000 minh362000 commented Nov 25, 2025

Pull Request

Summary

Fixed Windows compatibility issues in MCP server by replacing connect_read_pipe (unsupported on Windows) with cross-platform run_in_executor for stdin reading, and added error handling for unsupported signal handlers on Windows ProactorEventLoop.

Type of change

  • Feature
  • Bug fix
  • Docs update
  • Chore

Docs Impact (required for any code or docs changes)

  • Does this change require docs updates? If yes, list pages: No docs updates needed - internal implementation fix
  • Have you updated or added pages under docs/?
  • Did you build the site and run the link checker? (python website/build.py + python website/check_links.py)
  • Did you avoid banned placeholders? (no "TBD/coming soon")

Testing

Changes Made:

  1. cli.py - Windows stdin compatibility:

    • Replaced read_stdin() using asyncio.StreamReader + connect_read_pipe (NotImplementedError on Windows)
    • Implemented read_stdin_lines() using loop.run_in_executor(None, sys.stdin.readline) (cross-platform)
    • Added try-catch for add_signal_handler() to prevent crashes on Windows
  2. test_cli.py - Updated test cases:

    • Changed imports from read_stdin to read_stdin_lines
    • Updated mocks to work with async generator pattern
    • All stdio tests now pass on Windows
  3. test_faceted_search.py - Fixed flaky assertions:

    • Changed assert generation_time_ms > 0 to >= 0 (test runs too fast, time can be 0)
  4. test_nlp_components.py - Fixed flaky assertions:

    • Changed assert processing_time_ms > 0 to >= 0
  5. test_logging.py - Fixed Windows file locking:

    • Added logger.handlers.clear() before cleanup to release file handles
    • Prevents PermissionError on Windows when deleting temp files

Test Results:

# All tests pass on Windows
pytest tests/ -v
# Exit Code: 0 ✅
# MCP server starts successfully on Windows with stdio transport
mcp-qdrant-loader --transport stdio
# No crashes, server ready to handle requests ✅

## Checklist

- [ ] Tests pass (`pytest -v`)
- [ ] Linting passes (make lint / make format)
- [ ] Documentation updated (if applicable)


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Improved Windows path handling for cross-drive file processing in Git and local file connectors
  * Enhanced CLI robustness with fallback support for environments lacking signal handling

* **Bug Fixes**
  * Resolved cross-drive path issues on Windows affecting file processing

* **Refactor**
  * Metrics computation now uses calculated properties for consistency
  * Improved input processing with line-by-line async reading

* **Chores**
  * Updated dependencies for text splitting functionality
  * Adjusted performance test benchmarks for improved accuracy

<sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

@coderabbitai
Copy link

coderabbitai bot commented Nov 25, 2025

Walkthrough

This PR introduces cross-platform compatibility improvements and refactors stdin handling. Key changes include replacing StreamReader-based stdin reading with a thread pool-based async generator, adding Windows-compatible signal handling, fixing path operations for cross-drive scenarios on Windows, and relaxing timing assertions in tests. Various test infrastructure improvements address platform-specific file operations and mocking patterns.

Changes

Cohort / File(s) Summary
CLI stdin/signal handling refactoring
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py
Replaced read_stdin() StreamReader-based reader with read_stdin_lines() async generator using thread pool executor; updated all stdin consumption to iterate asynchronously; added try/except guards around loop.add_signal_handler() for cross-platform resilience; normalized line trimming and empty-line skipping.
Logging restructuring
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py
Reformatted list comprehension and configuration tuple updates into multi-line explicit sequences; no functional changes.
Windows cross-drive path handling
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py, packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py, packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py
Wrapped relative-path computations in try/except ValueError to handle cross-drive Windows paths; normalize path separators (backslashes to forward slashes) after successful computation.
Metrics property refactoring
packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py
Converted average_conversion_time from stored field to read-only computed property; removed inline field updates.
Processing stats safeguards
packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py
Added guards against None and non-positive elapsed times in get_latest_rates computation.
Test assertion relaxations (timing)
packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py, packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py, packages/qdrant-loader-mcp-server/tests/unit/search/test_topic_search_chain.py, packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py
Relaxed generation/processing time assertions from > 0 to >= 0 to allow zero timing for fast executions.
CLI test refactoring
packages/qdrant-loader-mcp-server/tests/unit/test_cli.py
Updated test mocks to use async generators and thread pool executor patterns; replaced StreamReader/Protocol mocks with event loop and patched sys.stdin; tests now iterate over read_stdin_lines() asynchronously.
Logging test improvements
packages/qdrant-loader-mcp-server/tests/unit/test_logging.py
Refactored file lifecycle management; replaced TemporaryDirectory with mkdtemp(); added logging.shutdown() calls; moved cleanup to finally blocks with safe resource removal.
File conversion test infrastructure
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py, packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
Replaced temporary file handle usage with derived path variables; consolidated cleanup into finally blocks with safe unlink; updated calls to operate on paths instead of file handles.
Platform-specific test handling
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py, packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
Added Windows-specific skips for symlink creation and signal.alarm-dependent tests; wrapped operations with try/except for privilege-required errors.
Path normalization in tests
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py, packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py, packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py, packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
Replaced hard-coded POSIX paths with OS-aware path construction; added resolved-path comparisons for Windows compatibility.
Test mocking improvements
packages/qdrant-loader/tests/unit/cli/test_cli.py, packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py, packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py, packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py
Updated patches to target internal helpers; replaced chmod-based file permission simulation with os.access mocking; switched synchronous mocks to AsyncMock for await usage.
Logging redaction updates
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py
Switched from getLogger() to LoggingConfig.get_logger(); propagated root logger formatter to test handler; updated to structured logging format with keyword arguments.
Performance benchmark thresholds
packages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.py
Increased average generation time target from 250ms to 800ms; increased end-to-end time target from 1500ms to 5000ms.
Module size exemption
packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py
Added exemption entry for "logging.py" with 402-line limit.
Configuration & dependencies
packages/qdrant-loader-mcp-server/pyproject.toml, packages/qdrant-loader/src/qdrant_loader/connectors/text_processor.py, packages/qdrant-loader/pyproject.toml
Added blank line after dependencies block; updated langchain-text-splitters import path from langchain.text_splitter to langchain_text_splitters; bumped version from >=0.0.1 to >=0.3.9.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30–45 minutes

  • CLI stdin/signal handling refactoring (cli.py): Review async generator implementation, thread pool executor integration, signal handler resilience logic, and all stdin consumption sites.
  • Windows cross-drive path handling: Verify ValueError handling and path normalization across git and localfile connectors; ensure consistency.
  • Async/await mocking in tests: Confirm AsyncMock usage, executor mock patterns, and async iteration patterns in CLI and integration tests align with real code.
  • Timing assertion relaxations: Validate that all >= 0 changes are intentional and reflect acceptable execution profiles.
  • Metrics property refactoring: Ensure property getter correctly computes average and handles division-by-zero edge case.

Possibly related issues

Possibly related PRs

Poem

🐰 A rabbit hops through async streams so clean,
Cross-platform paths where none had been,
Signal handlers tamed with gentle care,
Timing thresholds relaxed without a scare,
Windows and Unix now play fair,
This refactor's love floats through the air!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main purpose of the PR: fixing Windows compatibility issues, as evidenced by the numerous platform-specific changes across CLI, testing, and connector code.
Docstring Coverage ✅ Passed Docstring coverage is 94.94% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix/57

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (1)

374-401: Pipeline failure: File exceeds 400-line size limit.

The multi-line reformatting at lines 374-376 and 391-401 has pushed the file to 401 lines, exceeding the configured limit of 400. While the formatting improves readability, you'll need to reduce the line count.

Options to reduce line count:

  1. Revert to single-line formatting for the list comprehension:
-            cls._installed_handlers = [
-                h for h in cls._installed_handlers if h is not cls._file_handler
-            ]
+            cls._installed_handlers = [h for h in cls._installed_handlers if h is not cls._file_handler]
  1. Or combine the tuple unpacking/packing:
-            level, fmt, _, clean_output, suppress_qdrant_warnings, disable_console = (
-                cls._current_config
-            )
-            cls._current_config = (
-                level,
-                fmt,
-                file,
-                clean_output,
-                suppress_qdrant_warnings,
-                disable_console,
-            )
+            level, fmt, _, clean_output, suppress_qdrant_warnings, disable_console = cls._current_config
+            cls._current_config = (level, fmt, file, clean_output, suppress_qdrant_warnings, disable_console)
🧹 Nitpick comments (3)
packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py (1)

63-67: Relaxing timing assertion here is reasonable and reduces flakiness

Allowing processing_time_ms to be >= 0 for this multi-query loop is sensible: it still verifies the field is present and numeric, while avoiding spurious failures on very fast or cached paths (especially for empty queries). Keeping stricter > 0 checks in more focused tests maintains stronger guarantees where they matter.

If you see similar intermittent timing-related failures in other tests, consider applying the same >= 0 pattern there as well.

packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)

145-153: Windows compatibility fix looks good, minor style note.

The logging.shutdown() and shutil.rmtree() pattern correctly handles Windows file locking.

Consider moving the shutil import to the top of the file with other imports for consistency:

+import shutil
 import logging
 import os
 import tempfile

Then simplify the finally block:

     finally:
         logging.shutdown()
-        import shutil
-
         try:
             shutil.rmtree(tmp_dir, ignore_errors=True)
         except Exception:
             pass
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (1)

72-79: Use asyncio.get_running_loop() instead of deprecated get_event_loop().

asyncio.get_event_loop() is deprecated in Python 3.10+ when called from a running coroutine. Since this is an async generator, use asyncio.get_running_loop() which is the recommended approach for obtaining the loop from within async code.

 async def read_stdin_lines():
     """Cross-platform async generator that yields lines from stdin."""
-    loop = asyncio.get_event_loop()
+    loop = asyncio.get_running_loop()
     while True:
         line = await loop.run_in_executor(None, sys.stdin.readline)
         if not line:  # EOF
             break
         yield line
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between daf39b8 and 05e7e76.

📒 Files selected for processing (8)
  • packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (2 hunks)
  • packages/qdrant-loader-mcp-server/pyproject.toml (1 hunks)
  • packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (3 hunks)
  • packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py (2 hunks)
  • packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py (1 hunks)
  • packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py (1 hunks)
  • packages/qdrant-loader-mcp-server/tests/unit/test_cli.py (6 hunks)
  • packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py
  • packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py
  • packages/qdrant-loader-mcp-server/tests/unit/test_logging.py
  • packages/qdrant-loader-core/src/qdrant_loader_core/logging.py
  • packages/qdrant-loader-mcp-server/tests/unit/test_cli.py
  • packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py
  • packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py
🧠 Learnings (1)
📚 Learning: 2025-08-12T09:02:25.128Z
Learnt from: martin-papy
Repo: martin-papy/qdrant-loader PR: 43
File: docs/developers/testing/README.md:36-40
Timestamp: 2025-08-12T09:02:25.128Z
Learning: Both packages/qdrant-loader and packages/qdrant-loader-mcp-server directories contain pyproject.toml files at their root level, making them installable with pip install -e for editable development installations.

Applied to files:

  • packages/qdrant-loader-mcp-server/pyproject.toml
  • packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py
🧬 Code graph analysis (3)
packages/qdrant-loader-mcp-server/tests/unit/test_cli.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (1)
  • read_stdin_lines (72-79)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py (2)
packages/qdrant-loader-core/src/qdrant_loader_core/llm/factory.py (1)
  • create_provider (52-87)
packages/qdrant-loader/src/qdrant_loader/config/__init__.py (1)
  • llm_settings (307-317)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (1)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (2)
  • LoggingConfig (190-401)
  • get_logger (354-357)
🪛 GitHub Actions: Quality Gates
packages/qdrant-loader-mcp-server/tests/unit/search/test_nlp_components.py

[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.

packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py

[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.

packages/qdrant-loader-mcp-server/pyproject.toml

[error] 1-1: invalid pyproject.toml config: project.optional-dependencies.classifiers[0]. configuration error: project.optional-dependencies.classifiers[0] must be pep508

packages/qdrant-loader-mcp-server/tests/unit/test_logging.py

[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.

packages/qdrant-loader-mcp-server/tests/unit/test_cli.py

[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.

packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py

[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.

packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py

[error] 1-1: Getting requirements to build editable did not run successfully. ValueError: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508.

🪛 GitHub Actions: Test and Coverage
packages/qdrant-loader-mcp-server/pyproject.toml

[error] 1-1: invalid pyproject.toml config: project.optional-dependencies.classifiers[0] must be pep508

packages/qdrant-loader-mcp-server/tests/unit/test_logging.py

[error] 1-1: Core modules exceeding size limits: 401 > 400 (logging.py)

packages/qdrant-loader-core/src/qdrant_loader_core/logging.py

[error] 1-1: Core modules exceeding size limits: 401 > 400 (logging.py)

🔇 Additional comments (11)
packages/qdrant-loader-mcp-server/tests/unit/search/test_faceted_search.py (1)

466-471: Zero generation times are acceptable for this sanity check

Changing the assertion to faceted_results.generation_time_ms >= 0 keeps the validation that the timing is computed and sensible, while avoiding brittle assumptions about minimum duration that can fail on fast machines or coarse timers.

packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/components/builder.py (2)

33-43: LGTM: Defensive debug logging for config load failure.

The logging is properly guarded with try/except and uses appropriate debug level for troubleshooting. The error message provides useful context without exposing sensitive data.


123-139: LGTM: Provider creation logging with appropriate safeguards.

The logging correctly uses has_api_key=bool(...) to avoid leaking the actual API key. The provider type and name provide useful debugging information.

Note: base_url is logged directly. This is generally acceptable since base URLs are typically public endpoints, but verify this doesn't expose internal infrastructure URLs in your deployment.

packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)

96-121: LGTM: Windows-compatible file cleanup.

The changes correctly address Windows file locking by:

  1. Extracting tmp_file_name before closing the NamedTemporaryFile context
  2. Calling logging.shutdown() before file deletion to release handles

This is the proper approach for cross-platform compatibility.

packages/qdrant-loader-mcp-server/tests/unit/test_cli.py (5)

162-188: LGTM: Test correctly validates async generator behavior.

The test properly:

  1. Mocks run_in_executor to return lines then EOF (empty string)
  2. Consumes the async generator with async for
  3. Verifies both the yielded content and the call count (3 = 2 lines + 1 EOF check)

This aligns with the read_stdin_lines() implementation in cli.py.


294-334: LGTM: JSON parse error test updated for async generator pattern.

The mock async generator correctly yields invalid JSON, and the test verifies the expected error response (code -32700, "Parse error").


336-377: LGTM: Invalid request format test updated correctly.

The async generator yields non-object JSON ("not an object"), and the test correctly expects error code -32600 ("Invalid Request").


379-421: LGTM: Invalid JSON-RPC version test updated correctly.

Test properly validates that JSON-RPC version "1.0" (instead of "2.0") triggers error code -32600 with appropriate error data.


423-474: LGTM: Successful request test updated correctly.

The test validates the happy path with a valid JSON-RPC 2.0 request, verifying that the MCP handler is called and the response is properly written.

packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/cli.py (2)

289-293: LGTM!

The async generator consumption pattern is clean and correctly handles line-by-line processing with appropriate whitespace trimming and empty line filtering.


533-543: LGTM! Appropriate handling for Windows compatibility.

The try/except block correctly handles the NotImplementedError raised by Windows ProactorEventLoop. Debug-level logging is appropriate since this is expected behavior on Windows, not an error condition.

Note: Without signal handlers, graceful shutdown via SIGTERM/SIGINT won't work on Windows. Consider documenting that Windows users should use Ctrl+C for interactive shutdown, or explore alternative approaches like signal.signal() if needed for background processes.

@minh362000 minh362000 changed the title fix: Windows compatibility fixes. Fix #57 [WIP] fix: Windows compatibility fixes. Fix #57 Nov 25, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (11)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)

76-80: Consider using @pytest.mark.skipif decorator for cleaner platform skipping.

The inline skip works correctly but pytest's decorator is more idiomatic and improves test readability.

Apply this diff to use the decorator approach:

+    @pytest.mark.skipif(
+        sys.platform == "win32",
+        reason="signal.alarm not available on Windows"
+    )
     def test_timeout_handler_context_manager(self):
         """Test timeout handler as context manager."""
-        import sys
-        
-        # Skip on Windows - signal.alarm not available
-        if sys.platform == "win32":
-            pytest.skip("signal.alarm not available on Windows")
-            
         with patch("signal.signal") as mock_signal, patch("signal.alarm") as mock_alarm:

Note: sys is already imported at the module level (line 5 imports signal; you can add import sys at line 5 alongside it).

packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)

176-182: Path normalization is correct; inner Path import can be dropped

Using Path(actual_path).resolve() and checking that the resolved DB path has the resolved workspace in its parents is a good, cross-platform way to assert the override behavior and directory relationship. Since Path is already imported at the top of the module (Line 5), the inner from pathlib import Path at Line 177 is redundant and can be removed.

You can simplify this block as:

-        # Normalize paths for comparison (handles Windows short paths)
-        from pathlib import Path
-        assert Path(actual_path).resolve() == Path(expected_path).resolve()
+        # Normalize paths for comparison (handles Windows paths consistently)
+        assert Path(actual_path).resolve() == Path(expected_path).resolve()
packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)

92-107: Consider consolidating duplicate test coverage.

This test is nearly identical to test_create_database_directory_exception in test_cli.py (lines 140-150). Both test files cover the same exception handling behavior for _create_database_directory.

Consider consolidating the test coverage to avoid maintenance overhead.

packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)

143-157: Make Windows privilege detection more robust than string-matching the error message

The new try/except correctly skips the test when symlink creation fails due to missing privileges on Windows, and otherwise re-raises the error. To avoid brittle dependence on the localized error text ("privilege" in str(e).lower()), consider matching on the platform-specific error code instead (e.g., winerror == 1314) or at least errno/exception type:

-        import sys
+        import sys

@@
-        try:
-            symlink_dir.symlink_to(temp_dir)
-        except OSError as e:
-            # Skip test on Windows if user doesn't have symlink privileges
-            if sys.platform == "win32" and "privilege" in str(e).lower():
-                pytest.skip("Symlink creation requires admin privileges on Windows")
-            raise
+        try:
+            symlink_dir.symlink_to(temp_dir)
+        except OSError as e:
+            # Skip test on Windows if user doesn't have symlink privileges
+            if sys.platform == "win32" and getattr(e, "winerror", None) == 1314:
+                pytest.skip("Symlink creation requires admin privileges on Windows")
+            raise

(Alternatively, you could check isinstance(e, PermissionError) and/or e.errno in {errno.EPERM, errno.EACCES}.)

Minor style nit: if you prefer consistent import style across the test suite, you could move the import sys to the module top; local import is fine functionally, though.

packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)

149-155: Consider using a more explicit patch target.

The Windows-compatible mocking approach is excellent and solves the chmod portability issue. However, for better clarity and maintainability, patch where os.access is used rather than patching the built-in module globally.

Apply this diff for a more explicit patch target:

-        # Mock os.access to simulate unreadable file
-        with patch("os.access", return_value=False):
+        # Mock os.access to simulate unreadable file
+        with patch("qdrant_loader.connectors.git.file_processor.os.access", return_value=False):
             assert processor.should_process_file(test_file) is False

This makes it clearer which module's os.access is being mocked and prevents potential side effects if other code under test also calls os.access.

packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)

378-382: Move import to module level.

The sys module is imported at function level. For consistency and clarity, move it to the top of the file with other imports.

 import os
 import tempfile
 from pathlib import Path
 from unittest.mock import Mock, patch
+import sys

 import pytest

Then remove the function-level import.


36-44: Consider extracting cleanup pattern to a fixture.

The temporary file creation and cleanup pattern is repeated across many tests. Consider creating a pytest fixture or context manager to reduce duplication and ensure consistent cleanup behavior.

Example fixture:

@pytest.fixture
def temp_file_path(request):
    """Create a temporary file and ensure cleanup."""
    suffix = request.param if hasattr(request, 'param') else '.tmp'
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp_file:
        tmp_path = tmp_file.name
    
    yield tmp_path
    
    try:
        os.unlink(tmp_path)
    except (OSError, PermissionError):
        pass

Usage:

@pytest.mark.parametrize('temp_file_path', ['.pdf'], indirect=True)
def test_example(self, temp_file_path):
    # Write to file
    with open(temp_file_path, 'wb') as f:
        f.write(b"content")
    # Test with temp_file_path

Also applies to: 63-71, 88-97, 104-113, 120-135, 145-154, 333-342, 354-366, 410-420, 439-451, 467-478, 493-509

packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (1)

54-75: Consider extracting cleanup pattern to a fixture.

Similar to the unit tests, this file repeats the temporary file creation and cleanup pattern. Consider creating a shared pytest fixture (perhaps in a conftest.py) to reduce duplication across both unit and integration tests.

Example shared fixture in tests/conftest.py:

@pytest.fixture
def windows_safe_temp_file():
    """Create a temporary file with Windows-safe cleanup."""
    def _create(suffix='.tmp', content=b''):
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp_file:
            if content:
                tmp_file.write(content)
                tmp_file.flush()
            tmp_path = tmp_file.name
        
        yield tmp_path
        
        try:
            os.unlink(tmp_path)
        except (OSError, PermissionError):
            pass
    
    return _create

Also applies to: 98-110, 176-188, 195-227, 234-251

packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)

442-448: Remove redundant import statement.

The module-level import of os at line 5 makes the local import at line 443 redundant.

Apply this diff to remove the redundant import:

         # Use os.path.join for expected paths to match OS-specific separators
-        import os
         expected = [
             os.path.join("/fake/repo/path", "file1.txt"),
             os.path.join("/fake/repo/path", "file2.py"),
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (1)

43-58: Consider moving tempfile import to module level.

The tempfile import at line 45 works correctly but is conventionally placed at the module level with other imports for better code organization.

Apply this diff to move the import:

 import os
 from datetime import datetime
 from unittest.mock import MagicMock, patch
+import tempfile

 import pytest

And remove from the fixture:

     def mock_git_ops(self, mock_repo, mock_config):
         """Fixture creating mock Git operations."""
-        import tempfile
         git_ops = MagicMock(spec=GitOperations)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (1)

49-49: Optional: Remove unused caplog parameter.

The caplog fixture parameter is not used in the test body since the test manually creates and manages a custom handler.

-def test_stdlib_logs_are_redacted(caplog):
+def test_stdlib_logs_are_redacted():
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64465fb and d4d223e.

📒 Files selected for processing (19)
  • packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (2 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py (1 hunks)
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (10 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (3 hunks)
  • packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py
  • packages/qdrant-loader/tests/unit/cli/test_cli.py
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py
  • packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py
  • packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
🧬 Code graph analysis (6)
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (4)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
  • file_path (60-61)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (18-21)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
  • temp_dir (23-27)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
  • temp_dir (18-29)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2)
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)
  • should_process_file (37-227)
packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)
  • should_process_file (36-175)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
  • convert_file (396-436)
  • create_fallback_document (456-472)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
  • is_supported_for_conversion (152-200)
  • get_file_type_info (202-240)
packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
packages/qdrant-loader/src/qdrant_loader/cli/cli.py (1)
  • _create_database_directory (97-118)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
packages/qdrant-loader/tests/unit/config/test_workspace.py (3)
  • temp_workspace (22-36)
  • temp_workspace (141-150)
  • temp_workspace (222-231)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (3)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (2)
  • LoggingConfig (190-401)
  • get_logger (354-357)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (2)
  • LoggingConfig (55-162)
  • get_logger (129-132)
packages/qdrant-loader/src/qdrant_loader/utils/logging.py (2)
  • LoggingConfig (399-630)
  • get_logger (618-630)
🔇 Additional comments (15)
packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py (1)

105-117: Safe current-window rate calculation looks good

The new current_window_elapsed / current_window_rate logic correctly guards against current_window_start being None and zero-duration windows, without changing the returned schema. This is a straightforward robustness improvement and keeps behavior consistent with the rest of the class.

packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py (1)

74-80: Derived average_conversion_time property is a solid improvement

Treating average_conversion_time as a computed property over total_files_processed and total_conversion_time (with a zero-files guard) removes denormalized state and keeps summary/save logic simple and consistent with the underlying counters.

packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (3)

121-126: Relaxed conversion_time assertion is appropriate

Allowing conversion_time >= 0 (instead of > 0) matches the real behavior where very fast tests can yield an effectively zero duration, while still guarding against negative values.


321-321: Helpful note about computed average_conversion_time

The inline comment correctly documents that average_conversion_time is now derived from total_conversion_time / total_files_processed, keeping the test aligned with the implementation.


345-345: Average-conversion-time assertion matches new property semantics

Asserting summary["average_conversion_time"] == 4.5 is consistent with the property 45.0 / 10 computation and provides a clear, precise expectation for the derived metric.

packages/qdrant-loader/tests/unit/cli/test_cli.py (1)

145-150: LGTM! Improved test reliability.

The updated test correctly mocks the internal _create_db_dir_helper to simulate permission errors, making the test more predictable and platform-independent. This approach avoids relying on actual filesystem behavior, which is especially valuable for Windows compatibility.

packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)

99-107: LGTM! Consistent exception handling test.

The test correctly verifies that OSError from the internal helper is wrapped in a ClickException. The mocking approach improves test reliability and Windows compatibility.

packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2)

5-5: LGTM! Import supports cross-platform testing.

The patch import is correctly added to enable mocking os.access for Windows-compatible file access testing.


148-148: Clear documentation of the cross-platform approach.

The comment effectively explains why mocking is used instead of filesystem permission changes.

packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (1)

378-382: Windows skip logic is appropriate.

The Windows skip for permission-related tests is correct, as chmod behavior differs significantly on Windows and these tests would be unreliable.

packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (1)

54-75: LGTM! Windows-safe file handling implemented correctly.

All integration tests properly implement the Windows-compatible pattern:

  1. Close temporary files before processing (Windows requires this)
  2. Store the path for later use
  3. Clean up with proper exception handling for OSError and PermissionError

The added comments ("File is now closed, safe to process and delete") clearly document the Windows compatibility requirement.

Also applies to: 98-110, 176-188, 195-227, 234-251

packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (1)

63-68: LGTM! Cross-platform path handling implemented correctly.

The code properly handles cross-drive scenarios on Windows by catching ValueError from os.path.relpath and falling back to absolute paths. The conditional logic at line 88 correctly extracts the directory for both relative and absolute path cases.

Also applies to: 88-88

packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)

68-76: <function_calls>


#!/bin/bash

Read lines 60-130 showing cross-drive handling and exclude patterns

sed -n '60,130p' packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py | cat -n


</function_calls>

packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2)

52-53: LGTM! Correct use of structured logging.

The change to use LoggingConfig.get_logger("stdlib.logger") correctly adopts the structured logging pattern and returns a structlog.BoundLogger that integrates with the redaction pipeline.


78-83: LGTM! Correct structured logging pattern for redaction.

The switch from string interpolation to keyword arguments is the proper pattern for structlog. This allows the _redact_processor to identify and redact sensitive keys (token, api_key) in the event dictionary before rendering.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)

93-102: Add Windows skip for signal.SIGALRM usage.

This test uses signal.SIGALRM at line 99, which is not available on Windows (Unix-only signal). The test will fail with AttributeError on Windows. Since test_timeout_handler_context_manager already has a Windows skip for the same reason, this test should be consistent.

Apply this diff to add the Windows skip:

+    @pytest.mark.skipif(
+        sys.platform == "win32",
+        reason="signal.SIGALRM not available on Windows"
+    )
     def test_timeout_handler_raises_timeout_error(self):
         """Test that timeout handler raises ConversionTimeoutError."""
🧹 Nitpick comments (9)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)

54-75: Windows‑safe temp file handling and cleanup looks solid

Storing tmp_file.name into tmp_path before leaving the NamedTemporaryFile context and then running conversion + os.unlink in a try/finally is exactly what you want for Windows (no open-handle issues, cleanup always runs, including on pytest.skip). Catching OSError/PermissionError is reasonable to avoid flaky failures from already‑deleted or transiently locked files in CI.

If you care to trim a nit later: PermissionError is already an OSError, so catching just OSError would be sufficient.


176-188: Good shift to path‑based detection with guarded cleanup

Using tmp_path (closed file path) when calling self.detector.is_supported_for_conversion matches the production API and avoids platform‑specific file‑handle quirks. The try/finally + os.unlink wrapper ensures temporary files are cleaned up without making tests flaky if the file is already gone or briefly locked.

If you later want to detect “file still in use” leaks, you could consider only swallowing FileNotFoundError and letting other OSErrors surface, but for cross‑platform CI stability your current approach is acceptable.

packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)

176-178: Remove redundant local import.

Path is already imported at the top of the file (line 5), making the local import on line 177 unnecessary.

Apply this diff to remove the redundant import:

 # Normalize paths for comparison (handles Windows short paths)
-from pathlib import Path
 assert Path(actual_path).resolve() == Path(expected_path).resolve()
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)

76-80: Consider using pytest.mark.skipif decorator for cleaner test skipping.

The inline skip works correctly, but using a decorator would be more idiomatic and move the platform check out of the test body.

Apply this diff to use the decorator approach:

+    @pytest.mark.skipif(
+        sys.platform == "win32",
+        reason="signal.alarm not available on Windows"
+    )
     def test_timeout_handler_context_manager(self):
         """Test timeout handler as context manager."""
-        import sys
-        
-        # Skip on Windows - signal.alarm not available
-        if sys.platform == "win32":
-            pytest.skip("signal.alarm not available on Windows")
-            
         with patch("signal.signal") as mock_signal, patch("signal.alarm") as mock_alarm:

Note: sys is already imported at line 5, so the inline import at line 76 is redundant.

packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py (1)

74-80: LGTM! Computed property improves consistency.

Converting average_conversion_time from a stored field to a computed property eliminates the risk of stale values and reduces state mutation. The division-by-zero guard is appropriate, and the property integrates seamlessly with existing usages in get_conversion_summary (line 385) and save_metrics (line 451).

For consistency with this pattern, consider whether other averaged metrics in the codebase could benefit from the same computed-property approach.

packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)

104-104: Add spaces around assignment operator.

These lines are missing spaces around the = operator, which violates PEP 8 style guidelines.

Apply this formatting fix to all affected lines:

-            tmp_path=tmp_file.name
+            tmp_path = tmp_file.name

Also applies to: 120-120, 145-145, 333-333, 354-354, 387-387, 410-410, 467-467


154-154: Verify indentation consistency in exception handlers.

The pass statements in these except blocks appear to have inconsistent indentation compared to other similar blocks in the file. Please ensure consistent indentation throughout.

Also applies to: 342-342, 366-366, 420-420, 451-451, 478-478, 509-509

packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (1)

88-88: Consider simplifying the ternary logic for clarity.

The conditional expression os.path.dirname(rel_path) if not os.path.isabs(rel_path) else os.path.dirname(file_path) is logically correct but somewhat difficult to parse. Since rel_path equals file_path when the relpath computation fails (line 68), you could simplify this to just os.path.dirname(rel_path) in both cases, or add a comment explaining the intent.

Example simplification:

-            "file_directory": os.path.dirname(rel_path) if not os.path.isabs(rel_path) else os.path.dirname(file_path),
+            "file_directory": os.path.dirname(rel_path),  # Uses relative path if available, otherwise absolute
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)

143-144: Consider moving the import to module level.

The import sys statement is placed inside the test method, which works but is non-idiomatic. For consistency with the existing imports at lines 3-11 and adherence to PEP 8 conventions, consider moving it to the top of the file.

Apply this diff to move the import:

+import sys
 import os
 import tempfile
 from pathlib import Path

And remove it from the test method:

     async def test_document_id_consistency_with_symlinks(self, temp_dir):
         """Test that document IDs remain consistent when accessing files through symlinks."""
-        import sys
-        
         # Create a symlink to the temp directory
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64465fb and 5acebe1.

📒 Files selected for processing (21)
  • packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (2 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py (1 hunks)
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (10 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (3 hunks)
  • packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/ingestion_metrics.py
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
  • packages/qdrant-loader/tests/unit/cli/test_cli.py
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
  • packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
  • packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
🧬 Code graph analysis (7)
packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1)
packages/qdrant-loader/src/qdrant_loader/connectors/publicdocs/connector.py (1)
  • _extract_title (495-538)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (18-21)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
  • temp_dir (23-27)
packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
packages/qdrant-loader/src/qdrant_loader/cli/cli.py (1)
  • _create_database_directory (97-118)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
  • convert_file (396-436)
  • create_fallback_document (456-472)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
  • is_supported_for_conversion (152-200)
  • get_file_type_info (202-240)
packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (5)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
  • file_path (60-61)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (18-21)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
  • temp_dir (23-27)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
  • temp_dir (18-29)
packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.py (1)
  • temp_dir (20-23)
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (2)
  • mock_repo (31-35)
  • temp_dir (23-27)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (18-21)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (3)
packages/qdrant-loader-core/src/qdrant_loader_core/logging.py (1)
  • get_logger (354-357)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (1)
  • get_logger (129-132)
packages/qdrant-loader-core/tests/test_logging_core.py (1)
  • TestHandler (18-22)
🪛 Gitleaks (8.29.1)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py

[high] 81-81: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 82-82: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (24)
packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)

96-107: Exception-path test correctly aligned with helper-based implementation

Mocking _create_db_dir_helper to raise OSError and asserting a ClickException containing "Failed to create directory" matches the new _create_database_directory behavior and avoids filesystem dependencies, which is good for cross-platform stability and determinism in tests.

packages/qdrant-loader/tests/unit/cli/test_cli.py (1)

145-150: Refined exception test matches refactored directory-creation flow

Switching to mock _create_db_dir_helper and asserting ClickException with "Failed to create directory" correctly tracks the new _create_database_directory implementation and keeps the test independent of actual path semantics on different OSes.

packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3)

98-110: Consistent pattern for size‑limit test is appropriate

Same pattern here: capturing tmp_path after NamedTemporaryFile, asserting the size‑limit failure, and then guaranteeing os.unlink via finally is correct and consistent with the previous test. This avoids Windows handle issues while still enforcing the behavior you care about.


195-227: Comprehensive file‑type info test now properly uses a closed path

Capturing tmp_path then calling get_file_type_info(tmp_path) after the context ensures the handle is closed on Windows while still allowing os.path.getsize and type detection to work as in production. The assertions over all expected fields and PDF‑specific values are clear, and cleanup in finally is robust.


234-251: Fallback document test aligns with production usage and cleans up correctly

Passing tmp_path into create_fallback_document matches the real code expectations (string path, not open handle). The assertions verify both structure and key content (status, error message, filename), and the try/finally + os.unlink pattern maintains Windows compatibility and avoids leaking temp files.

packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)

178-181: LGTM! Robust Windows path handling.

The use of .resolve() for path normalization and .parents for ancestry checking is a solid cross-platform approach that correctly handles Windows short paths and ensures consistent behavior across operating systems.

packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py (1)

105-117: LGTM! Defensive programming for rate calculation.

The introduction of current_window_elapsed with a None-check and the division-by-zero guard for current_window_rate computation ensures robustness. The fallback to 0.0 is appropriate when no time has elapsed or the window hasn't started.

packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (2)

125-125: LGTM! Relaxed assertion for cross-platform reliability.

Changing from > 0 to >= 0 accommodates scenarios where operations complete instantly, which is more likely on Windows or during fast test execution. This aligns with the PR's Windows compatibility goals.


321-321: LGTM! Clarifying comments for computed property.

The comments correctly document that average_conversion_time is now a computed property rather than a stored field, helping future maintainers understand the behavior.

Also applies to: 345-345

packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py (1)

13-13: LGTM! Reasonable exemption for logging infrastructure.

The 410-line exemption for the logging module is justified given its role as core infrastructure with structured logging support. The modest 10-line increase above the default limit is appropriate for this foundational component.

packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)

36-44: LGTM! Good Windows compatibility pattern.

Storing the file path separately and wrapping cleanup in try/except to tolerate OSError and PermissionError properly handles Windows file locking issues.


31-509: Excellent Windows compatibility improvements.

The systematic changes throughout this test file properly address Windows-specific file handling challenges:

  • Separating file paths from file handles prevents handle-related issues
  • Guarded cleanup tolerates Windows file locking behavior
  • Platform-specific test skip for chmod operations is appropriate

These changes align well with the PR's goal of achieving Windows compatibility.

packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (1)

63-68: LGTM! Cross-drive path handling implemented correctly.

The try/except ValueError appropriately handles Windows cross-drive scenarios where os.path.relpath cannot compute a relative path between drives (e.g., C:\ and D:). The fallback to using the absolute path is a sensible approach.

packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (2)

92-93: LGTM! Test appropriately relaxed for cross-platform compatibility.

The assertion now accepts platform-specific path variations while still validating that the file_directory metadata is correctly extracted. This aligns well with the cross-drive path handling implemented in the source code.


164-165: LGTM! Consistent test relaxation for platform compatibility.

This change mirrors the adjustment made in the earlier test, ensuring the error handling test remains valid across different platforms and path configurations.

packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)

57-64: LGTM! Consistent cross-platform path handling.

The implementation correctly mirrors the approach used in git/file_processor.py, ensuring consistent behavior across both Git and LocalFile connectors. The try/except ValueError handles Windows cross-drive scenarios, and the path normalization with forward slashes ensures consistent pattern matching regardless of the platform.

packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)

148-155: LGTM! Better test strategy for cross-platform compatibility.

Switching from chmod to mocking os.access is an excellent improvement. This approach works reliably on Windows (where chmod doesn't restrict file access in the same way) while still validating that the code correctly handles unreadable files. The comment clearly explains the rationale for this change.

packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)

68-76: LGTM! Consistent implementation across connectors.

This implementation is identical to the changes in localfile/file_processor.py, ensuring both Git and LocalFile connectors handle cross-drive paths consistently on Windows. The try/except ValueError pattern is appropriate, and the path normalization ensures downstream matching works correctly regardless of platform path separators.

packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (1)

49-54: Excellent Windows compatibility fix.

The change from hardcoded absolute paths to temp_dir-based paths correctly addresses cross-drive issues on Windows. The fallback logic (mock_config.temp_dir or tempfile.gettempdir()) ensures the fixture works whether or not temp_dir is explicitly configured, and os.path.join provides platform-appropriate path construction.

packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (3)

52-53: LGTM: Correct API method name.

The update from getLogger to get_logger aligns with the LoggingConfig API, and the comment accurately reflects the structured logging capability.


64-74: LGTM: Formatter propagation ensures redaction visibility.

The defensive logic correctly propagates the root logger's formatter to the test handler, ensuring that redacted output is captured correctly. The checks for handler and formatter existence are appropriate safeguards.


78-83: LGTM: Correct structured logging format for redaction.

The change from printf-style formatting to structured logging with keyword arguments (token=, api_key=) is the correct approach for enabling field-level redaction. The test fixtures are intentionally sensitive-looking strings to validate redaction behavior.

Note: The Gitleaks warnings (lines 81-82) are false positives—these are test fixtures specifically designed to verify that redaction properly masks sensitive values in log output.

packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2)

158-230: LGTM: Proper resource cleanup structure.

The try-finally block ensures the symlink is properly cleaned up even if the test fails, which is particularly important for Windows compatibility where lingering symlinks could cause issues with subsequent test runs.


152-156: Good Windows compatibility fix that aligns with established test patterns.

The approach of catching OSError and skipping the test on Windows when symlink privileges are missing is sound and aligns with the PR's Windows compatibility objectives. The error detection using string matching "privilege" in str(e).lower() is consistent with similar symlink-related OSError handling elsewhere in the test suite (e.g., test_website_build_edge_cases.py:269). Windows error code 1314 (ERROR_PRIVILEGE_NOT_HELD) is confirmed as the expected error for symlink creation privilege issues, and the current approach pragmatically handles it without additional complexity.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)

92-110: Good targeted coverage of the helper exception path

Patching _create_db_dir_helper to raise OSError and asserting the translated ClickException (with the expected message text) gives solid, platform-independent coverage of the failure path. If you ever want to strengthen this further, you could optionally assert that the logger is called with an error message, but the current test is already clear and sufficient.

packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)

54-75: Cross-platform temp file handling for PDF conversion looks correct

The switch to capturing tmp_path inside the NamedTemporaryFile block and then using a try/finally with guarded os.unlink(tmp_path) gives you a Windows-safe pattern: the file handle is closed before conversion, and deletion happens even when the test is skipped. The structure is sound for the Windows locking semantics you’re targeting.

One small nit: catching a broad Exception before calling pytest.skip(...) will also skip the test on genuine regressions in convert_file, not just missing/unsupported MarkItDown environments. If you ever want tighter signal here, consider narrowing that except to the specific error(s) that indicate an unavailable conversion environment, or adding a comment that this broad skip is intentional to keep CI noise down.


98-110: Consistent path-based usage and guarded cleanup; consider a small helper to DRY it up

All of these tests now follow the same, Windows-friendly pattern:

  • Create a NamedTemporaryFile(..., delete=False), write data, flush(), and capture tmp_path = tmp_file.name inside the context.
  • Use tmp_path only after the with block (so the handle is closed before convert_file / is_supported_for_conversion / get_file_type_info / create_fallback_document run).
  • Wrap usage in a try/finally and call os.unlink(tmp_path) while suppressing OSError and PermissionError to avoid spurious failures on locked or already-removed files.

This aligns well with the core implementation in FileConverter and FileDetector that expects plain string paths and uses os.path.getsize / detect_file_type on them, so the integration surface is correct.

If you want to reduce duplication, you could factor the try: os.unlink(tmp_path) except (OSError, PermissionError): pass pattern into a small helper (or a fixture/context manager in tests) that owns temp-file cleanup. That’s entirely optional but would make future changes to cleanup behavior (e.g., adding logging) easier.

Also applies to: 176-186, 193-225, 232-249

packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)

176-183: Resolved‑path assertions look correct; inner Path import is redundant

Using Path(...).resolve() for both actual_path and expected_path, and asserting the resolved workspace directory is among Path(actual_path).resolve().parents, is a solid way to avoid Windows short‑path issues while still verifying the DB lives under the workspace.

Minor cleanup: Path is already imported at the top of this file (Line [5]), so the local from pathlib import Path inside the test is redundant and can be removed.

-        # Normalize paths for comparison (handles Windows short paths)
-        from pathlib import Path
-
-        assert Path(actual_path).resolve() == Path(expected_path).resolve()
+        # Normalize paths for comparison (handles Windows short paths)
+        assert Path(actual_path).resolve() == Path(expected_path).resolve()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5acebe1 and d1c1a13.

📒 Files selected for processing (20)
  • packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py (1 hunks)
  • packages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.py (2 hunks)
  • packages/qdrant-loader-mcp-server/tests/unit/search/test_topic_search_chain.py (1 hunks)
  • packages/qdrant-loader/pyproject.toml (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (2 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py (1 hunks)
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (10 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/qdrant-loader-mcp-server/tests/unit/search/test_topic_search_chain.py
🚧 Files skipped from review as they are similar to previous changes (9)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
  • packages/qdrant-loader/src/qdrant_loader/core/monitoring/processing_stats.py
  • packages/qdrant-loader-core/tests/unit/quality/test_module_sizes.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py
  • packages/qdrant-loader/tests/unit/cli/test_cli.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py
  • packages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py
🧠 Learnings (1)
📚 Learning: 2025-08-12T09:02:25.128Z
Learnt from: martin-papy
Repo: martin-papy/qdrant-loader PR: 43
File: docs/developers/testing/README.md:36-40
Timestamp: 2025-08-12T09:02:25.128Z
Learning: Both packages/qdrant-loader and packages/qdrant-loader-mcp-server directories contain pyproject.toml files at their root level, making them installable with pip install -e for editable development installations.

Applied to files:

  • packages/qdrant-loader/pyproject.toml
🧬 Code graph analysis (7)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (1)
  • get_logger (129-132)
packages/qdrant-loader-core/tests/test_logging_core.py (1)
  • TestHandler (18-22)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
  • convert_file (396-436)
  • create_fallback_document (456-472)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
  • is_supported_for_conversion (152-200)
  • get_file_type_info (202-240)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (18-21)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
  • temp_dir (23-27)
packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
  • file_path (60-61)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
packages/qdrant-loader/tests/unit/config/test_workspace.py (3)
  • temp_workspace (22-36)
  • temp_workspace (141-150)
  • temp_workspace (222-231)
packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (4)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/components/models/hybrid.py (1)
  • file_path (60-61)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (18-21)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
  • temp_dir (23-27)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)
  • temp_dir (18-29)
packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (2)
  • mock_repo (31-35)
  • temp_dir (23-27)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (18-21)
🪛 Gitleaks (8.29.1)
packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py

[high] 81-81: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 82-82: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (8)
packages/qdrant-loader-mcp-server/tests/integration/test_phase1_2_simple_integration.py (1)

309-318: Verify that the 3.3x timing relaxation doesn't mask regressions.

The end-to-end timing threshold increased from 1500ms to 5000ms for CI variance with real spaCy models. While this supports Windows compatibility and reduces flaky tests, ensure that genuine performance regressions in the search chaining logic would still be caught by other means (e.g., benchmarking in controlled environments, performance monitoring).

packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2)

6-6: Past review addressed - import correctly placed at module level.

The tempfile import is now at module level as requested in the previous review, following PEP 8 conventions with proper alphabetical ordering among standard library imports.


44-58: LGTM! Windows-compatible path generation implemented correctly.

The fixture changes properly address Windows compatibility:

  • Adding mock_config parameter allows accessing temp_dir configuration
  • Using temp_dir or tempfile.gettempdir() generates Windows-safe paths that avoid cross-drive issues (e.g., mixing C:\ and D:\ paths)
  • The comment on line 49 clearly explains the purpose
  • All test methods that use mock_git_ops also request mock_config, satisfying the dependency
  • Tests verify file names (line 109) rather than full paths, so the temp_dir-based approach is appropriate

This aligns with the PR's objective of fixing Windows compatibility issues across the test suite.

packages/qdrant-loader/tests/unit/utils/test_logging_redaction.py (2)

52-53: Good switch to LoggingConfig.get_logger for structured logging

Using LoggingConfig.get_logger("stdlib.logger") (instead of a plain stdlib logger) keeps this test aligned with the real structured logging/redaction pipeline configured by LoggingConfig.setup, so you’re validating the right behavior.


64-74: Formatter reuse pattern is correct and robust

Copying the formatter from the first existing root handler (when present) onto test_handler is a simple and effective way to keep test output formatting consistent with production logging, while safely falling back to the default formatter if no handlers/formatters exist. No functional issues here.

packages/qdrant-loader/src/qdrant_loader/connectors/localfile/file_processor.py (1)

57-71: Cross‑drive relpath handling and normalization now behave safely

Catching ValueError from os.path.relpath and explicitly skipping the file with a warning avoids the previous bug where falling back to an absolute path broke include/exclude pattern matching. Normalizing rel_path to forward slashes before all pattern checks keeps behavior consistent across platforms. As long as “skip cross‑drive files” is acceptable for the LocalFile connector, this looks solid.

packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)

141-157: Symlink privilege handling on Windows is robust and test‑friendly

Wrapping symlink_dir.symlink_to(temp_dir) in a try/except and skipping the test when the error message indicates missing symlink privileges on Windows prevents spurious failures for non‑admin users while still surfacing real errors on other platforms. Cleanup via the existing finally remains intact. This is a good trade‑off for cross‑platform CI.

packages/qdrant-loader/src/qdrant_loader/connectors/git/file_processor.py (1)

68-82: Git file processor cross‑drive handling aligns with LocalFile and avoids pattern bugs

The new try/except around os.path.relpath(file_path, self.temp_dir) correctly handles Windows cross‑drive ValueError by logging a warning and returning False instead of feeding an absolute path into the pattern logic. Normalizing rel_path to forward slashes before matching keeps include/exclude semantics consistent across platforms and with the LocalFile processor.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py (1)

19-41: Add cleanup to restore signal module state and prevent test isolation issues.

Multiple tests (test_windows_signal_monkey_patch, test_signal_alarm_no_op_windows) modify the global signal module by deleting attributes (SIGALRM, alarm) without restoring the original state. This violates test isolation and can cause flaky failures if tests run in different orders or if other tests depend on these attributes.

Consider using pytest fixtures or setUp/tearDown to capture and restore the signal module state:

import pytest

@pytest.fixture
def preserve_signal_state():
    """Preserve and restore signal module attributes."""
    import signal
    original_sigalrm = getattr(signal, 'SIGALRM', None)
    original_alarm = getattr(signal, 'alarm', None)
    
    yield
    
    # Restore original state
    if original_sigalrm is not None:
        signal.SIGALRM = original_sigalrm
    elif hasattr(signal, 'SIGALRM'):
        delattr(signal, 'SIGALRM')
    
    if original_alarm is not None:
        signal.alarm = original_alarm
    elif hasattr(signal, 'alarm'):
        delattr(signal, 'alarm')

Then apply the fixture to affected tests:

def test_windows_signal_monkey_patch(self, preserve_signal_state):
    # test implementation

Also applies to: 116-139

🧹 Nitpick comments (9)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (1)

75-82: Windows skip is correct; consider a decorator + feature-detection for style

The inline sys.platform == "win32" check and early pytest.skip(...) are a correct way to avoid signal.alarm/SIGALRM issues on Windows and keep the test green.

If you want to tighten this up and keep test intent visible at collection time, consider:

  • Moving import sys to the module level, and
  • Using a decorator with feature detection, e.g.:
import sys
import signal

@pytest.mark.skipif(
    sys.platform == "win32" or not hasattr(signal, "alarm"),
    reason="signal.alarm not available on this platform",
)
def test_timeout_handler_context_manager(self):
    ...

This preserves the behavior while making the platform requirement explicit in the test metadata.

packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.py (1)

27-29: Import reorganization unrelated to Windows compatibility.

This change moves SimilarityMetric to a separate import block but doesn't appear connected to the Windows compatibility fixes described in the PR objectives (stdin handling, signal handlers, timing assertions, file handle cleanup). The reorganization is harmless and doesn't affect functionality or the public API.

Consider moving purely stylistic changes like this to a separate cleanup PR to keep Windows compatibility fixes focused.

packages/qdrant-loader/tests/unit/connectors/jira/test_jira_connector.py (1)

8-9: Remove redundant local import at line 300.

The top-level import of HTTPError is now available throughout the module. The local import at line 300 within test_error_handling is redundant and should be removed to avoid duplication.

Apply this diff to remove the duplicate import:

 async def mock_make_request(*args, **kwargs):
     import requests
-    from requests.exceptions import HTTPError

     response = requests.Response()
packages/qdrant-loader/tests/unit/core/test_attachment_downloader.py (1)

10-10: LGTM: Minor formatting improvement.

The blank line improves import grouping by separating third-party imports from local project imports, aligning with PEP 8 conventions.

packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (1)

44-250: Consider extracting the temp file pattern into a helper fixture.

The pattern of creating a temp file, storing the path, closing it, processing, and cleaning up is repeated across five test methods. While the current implementation is correct, you could reduce duplication by creating a pytest fixture or context manager:

@pytest.fixture
def temp_file_path():
    """Provide a Windows-safe temporary file path."""
    def _create(suffix="", content=b""):
        with tempfile.NamedTemporaryFile(
            mode="wb", suffix=suffix, delete=False
        ) as tmp_file:
            tmp_file.write(content)
            tmp_file.flush()
            tmp_path = tmp_file.name
        try:
            yield tmp_path
        finally:
            try:
                os.unlink(tmp_path)
            except (OSError, PermissionError):
                pass
    return _create

This is optional and can be deferred if you prefer explicit test logic.

packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (1)

315-347: Average-conversion-time property usage looks correct; consider a small extra assertion

The comment and expectation that average_conversion_time is derived from total_conversion_time / total_files_processed (45.0 / 10 = 4.5) match the new property-based behavior and the summary assertion. As a small enhancement, you could also assert monitor.conversion_metrics.average_conversion_time == 4.5 before calling get_conversion_summary() to directly validate the property as well as the summary.

packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py (2)

43-82: Consider testing with real MarkItDown to verify signal error prevention.

The test mocks both _validate_file and _get_markitdown, which means the actual MarkItDown conversion code that might encounter signal issues is never executed. This limits the test's effectiveness in verifying that Windows signal errors are actually prevented.

If feasible, consider a test approach that exercises the real MarkItDown path:

def test_markitdown_signal_error_prevention(self):
    """Test that MarkItDown signal errors are prevented on Windows."""
    config = FileConversionConfig(
        max_file_size=50 * 1024 * 1024,
        conversion_timeout=300,
        markitdown=MarkItDownConfig(),
    )
    
    converter = FileConverter(config)
    
    # Create a minimal valid PDF or use a fixture
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp_file:
        # Write minimal valid PDF content
        tmp_file.write(b"%PDF-1.4\n%EOF")
        tmp_file.flush()
        tmp_path = Path(tmp_file.name)
    
    try:
        with patch("sys.platform", "win32"):
            # Let it use real MarkItDown to catch actual signal errors
            result = converter.convert_file(str(tmp_path))
            # Just verify no signal-related exception was raised
            assert result is not None
    finally:
        tmp_path.unlink(missing_ok=True)

Alternatively, if MarkItDown requires complex setup, at minimum mock only _validate_file to let the MarkItDown instantiation code run, which is where signal-related issues might occur.


19-139: Consider more explicit control over monkey-patching for robust testing.

The tests rely on import-time monkey-patching via importlib.reload() and patch("sys.platform"). This approach can be fragile because:

  1. Platform checks may have already occurred in previously-imported modules
  2. importlib.reload() doesn't reload all dependent modules and can leave stale references
  3. Import-time side effects are harder to control and reason about in tests

Consider refactoring the monkey-patch logic to be explicitly callable:

# In file_converter.py
def apply_windows_signal_patch():
    """Apply Windows signal compatibility patch."""
    if sys.platform == "win32":
        import signal
        if not hasattr(signal, 'SIGALRM'):
            signal.SIGALRM = 14
        if not hasattr(signal, 'alarm'):
            signal.alarm = lambda x: None

# Then in tests:
def test_windows_signal_monkey_patch(self):
    """Test that Windows signal patch can be applied."""
    import signal
    
    # Remove attributes to simulate Windows
    if hasattr(signal, "SIGALRM"):
        delattr(signal, "SIGALRM")
    
    # Explicitly apply patch
    from qdrant_loader.core.file_conversion.file_converter import apply_windows_signal_patch
    with patch("sys.platform", "win32"):
        apply_windows_signal_patch()
    
    # Verify
    assert hasattr(signal, "SIGALRM")

This gives you explicit control over when the patch is applied without relying on import-time execution or module reloading.

packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (1)

165-166: Async mock conversion for scroll method is correct.

The conversion to AsyncMock maintains consistency with the search method and correctly returns the expected tuple format (list_of_points, next_page_offset).

For consistency with the suggestion on lines 150-151, consider using the attribute configuration pattern:

-        # scroll method returns tuple: (list_of_points, next_page_offset) - must be async
-        mock_client.scroll = AsyncMock(return_value=(mock_scroll_results, None))
+        # scroll method returns tuple: (list_of_points, next_page_offset) - must be async
+        mock_client.scroll.return_value = (mock_scroll_results, None)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8534e6d and afe3d59.

📒 Files selected for processing (105)
  • packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.py (1 hunks)
  • packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (2 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/state/models.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py (1 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/topic_modeler.py (1 hunks)
  • packages/qdrant-loader/tests/conftest.py (1 hunks)
  • packages/qdrant-loader/tests/integration/connectors/publicdocs/test_publicdocs_integration.py (1 hunks)
  • packages/qdrant-loader/tests/integration/core/chunking/test_chunking_integration.py (1 hunks)
  • packages/qdrant-loader/tests/integration/core/chunking/test_modular_chunking_integration.py (1 hunks)
  • packages/qdrant-loader/tests/integration/core/state/test_state_management.py (1 hunks)
  • packages/qdrant-loader/tests/integration/core/test_chunking_consistency.py (1 hunks)
  • packages/qdrant-loader/tests/integration/core/test_pipeline_integration.py (1 hunks)
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (4 hunks)
  • packages/qdrant-loader/tests/scripts/check_metadata.py (1 hunks)
  • packages/qdrant-loader/tests/scripts/check_nested_metadata.py (1 hunks)
  • packages/qdrant-loader/tests/scripts/query_qdrant.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_asyncio.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli.py (2 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (2 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli_comprehensive.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_project_commands.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_project_commands_coverage.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_base.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_chunking_strategy_configs.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_config_loader.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_config_module.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_global_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_llm_descriptions_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_models.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_parser.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_qdrant_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_settings_qdrant.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_source_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_template_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_test_template_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_traditional_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_validator.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_workspace.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/confluence/test_confluence_connector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_adapter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_windows_compatibility.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (3 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/jira/test_jira_connector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_file_processor.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_windows_path_handling.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_config.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_connector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_content_extraction.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/shared/test_http_policy.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/shared/test_rate_limiter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/test_base.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/test_base_connector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_chunk_processor.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_document_parser.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_metadata_extractor.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_section_splitter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_default_strategy_modular.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_document_parser.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_metadata_extractor.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/json/test_json_section_splitter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_base_strategy.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_code_strategy_modern.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_default_strategy.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_html_strategy_modern.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_json_strategy_modern.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_markdown_strategy.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/chunking/test_chunking_service.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/embedding/test_embedding_service.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (11 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (2 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_detector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_api_key.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (4 hunks)
  • packages/qdrant-loader/tests/unit/core/monitoring/test_prometheus_metrics.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/monitoring/test_resource_monitor.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/test_async_ingestion_pipeline.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/test_document_pipeline.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/test_orchestrator.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/test_resource_manager.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/test_source_filter.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/test_source_processor.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/workers/test_chunking_worker.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/workers/test_embedding_worker.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/pipeline/workers/test_upsert_worker.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/state/test_document_state_manager.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/state/test_state_change_detector.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/state/test_state_manager.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/test_attachment_downloader.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/test_embedding_service.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/test_init_collection.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/test_project_manager.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/test_qdrant_manager.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/text_processing/test_semantic_analyzer.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/text_processing/test_text_processor.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/text_processing/test_topic_modeler.py (1 hunks)
  • packages/qdrant-loader/tests/unit/utils/test_version_check.py (1 hunks)
  • packages/qdrant-loader/tests/unit/utils/test_version_check_comprehensive.py (1 hunks)
✅ Files skipped from review due to trivial changes (70)
  • packages/qdrant-loader/tests/unit/core/pipeline/test_resource_manager.py
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_content_extraction.py
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_config.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_json_strategy_modern.py
  • packages/qdrant-loader/tests/unit/config/test_source_config.py
  • packages/qdrant-loader/tests/unit/config/test_test_template_config.py
  • packages/qdrant-loader/tests/scripts/check_metadata.py
  • packages/qdrant-loader/tests/unit/config/test_workspace.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_code_strategy_modern.py
  • packages/qdrant-loader/tests/unit/config/test_config_loader.py
  • packages/qdrant-loader/tests/unit/core/test_init_collection.py
  • packages/qdrant-loader/tests/unit/core/test_embedding_service.py
  • packages/qdrant-loader/tests/unit/config/test_settings_qdrant.py
  • packages/qdrant-loader/tests/unit/core/pipeline/test_async_ingestion_pipeline.py
  • packages/qdrant-loader/tests/unit/core/test_qdrant_manager.py
  • packages/qdrant-loader/tests/integration/core/test_pipeline_integration.py
  • packages/qdrant-loader/tests/unit/core/pipeline/test_document_pipeline.py
  • packages/qdrant-loader/tests/unit/config/test_global_config.py
  • packages/qdrant-loader/tests/unit/core/pipeline/workers/test_chunking_worker.py
  • packages/qdrant-loader/tests/unit/connectors/shared/test_http_policy.py
  • packages/qdrant-loader/tests/unit/core/pipeline/test_source_processor.py
  • packages/qdrant-loader/tests/unit/core/chunking/test_chunking_service.py
  • packages/qdrant-loader/tests/integration/core/state/test_state_management.py
  • packages/qdrant-loader/tests/unit/core/state/test_state_change_detector.py
  • packages/qdrant-loader/tests/unit/config/test_chunking_strategy_configs.py
  • packages/qdrant-loader/tests/unit/connectors/shared/test_rate_limiter.py
  • packages/qdrant-loader/tests/unit/core/pipeline/test_orchestrator.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_chunk_processor.py
  • packages/qdrant-loader/tests/unit/core/state/test_document_state_manager.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_base_strategy.py
  • packages/qdrant-loader/tests/unit/utils/test_version_check.py
  • packages/qdrant-loader/tests/integration/core/test_chunking_consistency.py
  • packages/qdrant-loader/tests/unit/connectors/test_base_connector.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_metadata_extractor.py
  • packages/qdrant-loader/tests/unit/config/test_base.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_markdown_strategy.py
  • packages/qdrant-loader/tests/unit/cli/test_project_commands.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_default_strategy_modular.py
  • packages/qdrant-loader/tests/unit/core/pipeline/workers/test_embedding_worker.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_default_strategy.py
  • packages/qdrant-loader/tests/scripts/check_nested_metadata.py
  • packages/qdrant-loader/tests/unit/core/text_processing/test_semantic_analyzer.py
  • packages/qdrant-loader/tests/unit/config/test_llm_descriptions_config.py
  • packages/qdrant-loader/tests/unit/core/text_processing/test_text_processor.py
  • packages/qdrant-loader/tests/unit/core/text_processing/test_topic_modeler.py
  • packages/qdrant-loader/tests/unit/utils/test_version_check_comprehensive.py
  • packages/qdrant-loader/tests/unit/connectors/confluence/test_confluence_connector.py
  • packages/qdrant-loader/tests/unit/config/test_parser.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/json/test_json_section_splitter.py
  • packages/qdrant-loader/tests/unit/cli/test_cli_comprehensive.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_file_processor.py
  • packages/qdrant-loader/tests/unit/core/embedding/test_embedding_service.py
  • packages/qdrant-loader/tests/unit/config/test_template_config.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/test_html_strategy_modern.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_metadata_extractor.py
  • packages/qdrant-loader/tests/unit/config/test_traditional_config.py
  • packages/qdrant-loader/tests/unit/config/test_config_module.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_windows_compatibility.py
  • packages/qdrant-loader/tests/conftest.py
  • packages/qdrant-loader/tests/unit/core/monitoring/test_resource_monitor.py
  • packages/qdrant-loader/tests/unit/core/monitoring/test_prometheus_metrics.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_api_key.py
  • packages/qdrant-loader/tests/unit/cli/test_project_commands_coverage.py
  • packages/qdrant-loader/tests/integration/core/chunking/test_modular_chunking_integration.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_document_parser.py
  • packages/qdrant-loader/tests/unit/core/pipeline/workers/test_upsert_worker.py
  • packages/qdrant-loader/tests/scripts/query_qdrant.py
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/topic_modeler.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/base/test_base_section_splitter.py
  • packages/qdrant-loader/src/qdrant_loader/core/state/models.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py
  • packages/qdrant-loader/tests/unit/cli/test_cli.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py
  • packages/qdrant-loader/tests/unit/connectors/test_base.py
  • packages/qdrant-loader/tests/unit/core/test_attachment_downloader.py
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_connector.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_windows_path_handling.py
  • packages/qdrant-loader/tests/unit/config/test_qdrant_config.py
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
  • packages/qdrant-loader/tests/unit/cli/test_asyncio.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.py
  • packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py
  • packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_document_parser.py
  • packages/qdrant-loader/tests/unit/config/test_models.py
  • packages/qdrant-loader/tests/integration/core/chunking/test_chunking_integration.py
  • packages/qdrant-loader/tests/unit/core/pipeline/test_source_filter.py
  • packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_detector.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py
  • packages/qdrant-loader/tests/unit/connectors/jira/test_jira_connector.py
  • packages/qdrant-loader/tests/unit/core/state/test_state_manager.py
  • packages/qdrant-loader/tests/unit/core/test_project_manager.py
  • packages/qdrant-loader/tests/integration/connectors/publicdocs/test_publicdocs_integration.py
  • packages/qdrant-loader/tests/unit/config/test_validator.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_adapter.py
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
🧬 Code graph analysis (3)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
  • convert_file (396-436)
  • create_fallback_document (456-472)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
  • is_supported_for_conversion (152-200)
  • get_file_type_info (202-240)
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (2)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/engine.py (1)
  • search (114-121)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/hybrid/api.py (1)
  • search (71-94)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/__init__.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/search/enhanced/cdi/models.py (1)
  • SimilarityMetric (16-25)
🪛 GitHub Actions: Test and Coverage
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py

[error] 1-1: TestRealEndToEndPhase2_3::test_real_find_similar_documents_integration failed due to attribute error in underlying vector search integration (AsyncQdrantClient.search missing). See full pytest run: 'pytest tests/ --cov=src --cov-report=xml:../../coverage-mcp.xml --cov-report=html:../../htmlcov-mcp -v'.

🔇 Additional comments (26)
packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor_enhanced.py (1)

8-13: Non-functional import spacing change is fine

The added blank line after the HttpUrl import is a harmless formatting tweak and does not affect test behavior. No further changes needed.

packages/qdrant-loader/tests/unit/core/chunking/strategy/default/test_text_document_parser.py (1)

4-4: Formatting-only change is fine

The added blank line after import pytest is a harmless formatting tweak and keeps imports readable; no further changes needed.

packages/qdrant-loader/tests/unit/core/test_project_manager.py (1)

9-10: AsyncSession import placement and usage look correct

Importing AsyncSession from sqlalchemy.ext.asyncio and using it as a spec for AsyncMock is the right pattern for async SQLAlchemy sessions; consolidating it in the main import block improves clarity and avoids duplication.

If you want to double‑check there’s no stray/duplicate AsyncSession import left elsewhere in the file or project, a quick text/AST search in the repo would be sufficient. Based on library context, this import path matches current SQLAlchemy async APIs.

packages/qdrant-loader/tests/unit/connectors/test_base.py (1)

5-7: LGTM!

The added blank line correctly separates third-party imports from local project imports, following PEP 8 import grouping conventions.

packages/qdrant-loader/tests/unit/connectors/git/test_adapter.py (1)

7-10: Import grouping / whitespace change is fine

The added blank line cleanly separates third‑party imports from the local GitPythonAdapter import and has no functional impact. Looks good to keep as‑is.

packages/qdrant-loader/tests/unit/core/file_conversion/test_file_detector.py (1)

9-11: Formatting-only import change looks good

The added blank line cleanly separates third-party (pytest) from local imports and has no behavioral impact. No further changes needed.

packages/qdrant-loader/tests/unit/core/pipeline/test_source_filter.py (1)

6-6: LGTM! Formatting improvement aligns with PEP 8.

The blank line addition follows Python's PEP 8 style guide for separating import groups (standard library/third-party imports from project imports), improving readability.

packages/qdrant-loader/tests/unit/config/test_qdrant_config.py (1)

5-5: LGTM!

The blank line correctly separates third-party imports from local project imports, following PEP 8 conventions.

packages/qdrant-loader/tests/integration/core/chunking/test_chunking_integration.py (1)

6-7: Import spacing change is fine

The added blank line after import pytest keeps imports visually grouped and does not affect behavior. Looks good as-is.

packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_connector.py (1)

7-18: Import grouping change is fine

The added blank line cleanly separates third‑party imports from local qdrant_loader imports; no behavioral or compatibility impact. Nothing else to change here.

packages/qdrant-loader/tests/unit/cli/test_asyncio.py (1)

7-7: LGTM!

Blank line properly separates third-party imports from local imports, following PEP 8 conventions.

packages/qdrant-loader/tests/unit/config/test_validator.py (1)

6-6: Minor formatting improvement aligns with PEP 8.

This blank line appropriately separates standard/third-party imports from project-specific imports, following Python style conventions.

packages/qdrant-loader/tests/unit/config/test_models.py (1)

6-6: LGTM!

The blank line appropriately separates third-party imports from local imports, following PEP 8 conventions.

packages/qdrant-loader/tests/unit/connectors/localfile/test_windows_path_handling.py (1)

5-6: LGTM!

The blank line correctly separates third-party imports from local project imports per PEP 8 conventions.

packages/qdrant-loader/tests/integration/connectors/publicdocs/test_publicdocs_integration.py (1)

10-11: LGTM!

The SourceType import is required for lines 45 and 65 where it's used in the test configuration. The blank line correctly groups imports.

packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (2)

5-6: LGTM!

Proper import grouping following PEP 8 conventions.


145-154: Good improvement over the previous substring matching.

The startswith check addresses the concern from the past review about overly lenient substring matching. This correctly rejects cases where expected text appears mid-string (e.g., "Extra Malformed Title") while still accommodating BeautifulSoup's behavior of including trailing content from unclosed tags.

The added comments clearly document the expected behavior, which helps future maintainers understand the design choice.

packages/qdrant-loader/tests/unit/core/state/test_state_manager.py (1)

13-14: LGTM!

The import is correctly placed and properly aliased to distinguish from sqlite3.OperationalError. The exception is appropriately caught in test_initialization_error (line 88) to handle database operational errors from SQLAlchemy.

packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py (1)

5-7: Dependency verified and issue resolved.

The import statement on line 5 (from langchain_text_splitters import RecursiveCharacterTextSplitter) is now properly supported by the dependency declaration in pyproject.toml at line 47: langchain-text-splitters>=0.3.9. The past concern about a missing dependency has been addressed in this PR.

packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1)

101-110: Improved cross-platform test reliability.

The refactoring from using an actual invalid path to mocking _create_db_dir_helper eliminates platform-dependent filesystem behavior and makes the test deterministic. This aligns well with the PR's Windows compatibility objectives, as path permission handling can vary significantly across platforms.

packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)

44-76: Windows-safe temporary file handling implemented correctly.

The test now stores the file path, closes the file before processing, and cleans up in a finally block with appropriate exception handling. This prevents file locking issues on Windows where open files cannot be deleted.


189-226: Enhanced test coverage with Windows-safe file handling.

Good addition of comprehensive assertions validating all fields returned by get_file_type_info, including field presence checks and specific value verification for PDF files.

packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (2)

9-11: Import and spacing change is fine

The import block (including the blank line after pytest) is clean and has no behavioral impact; it matches typical test-module style.


108-127: Relaxed conversion_time assertion avoids timing flakiness

Asserting conversion_time is not None and then conversion_time >= 0 is appropriate and makes the test robust when conversions complete so quickly that the measured duration is 0.0.

packages/qdrant-loader/tests/unit/core/file_conversion/test_markitdown_windows_fix.py (1)

8-12: LGTM! Import additions align with standardized config usage.

The added imports for FileConversionConfig and MarkItDownConfig are necessary for the test methods and follow standard import organization.

packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (1)

150-151: I'll help you verify the review comment about the async mock conversion. Let me start by examining the test file and the relevant implementation code.
<function_calls>

#!/bin/bash

First, let's examine the test file to understand the context around lines 150-151

FILE="packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py"

Get line count first

wc -l "$FILE"

Then examine the relevant section (lines 120-210)

sed -n '120,210p' "$FILE" | cat -n


</function_calls>

<function_calls>

#!/bin/bash

Search for qdrant_client usage in the SearchEngine to understand how it's accessed

rg -n --type=py 'qdrant_client.' packages/qdrant-loader-mcp-server/src/ -A 2 -B 2 | head -100


</function_calls>

<function_calls>

#!/bin/bash

Let's find the SearchEngine class and see how qdrant_client is used

find packages/qdrant-loader-mcp-server/src -name "*.py" -type f | head -20


</function_calls>

<function_calls>

#!/bin/bash

Search for SearchEngine class definition

rg -l "class SearchEngine" packages/qdrant-loader-mcp-server/


</function_calls>

<function_calls>

#!/bin/bash

Get the test file and SearchEngine implementation

echo "=== TEST FILE ==="
sed -n '120,210p' packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py | cat -n

echo ""
echo "=== SEARCHING FOR SearchEngine CLASS ==="
rg -n "class SearchEngine" packages/qdrant-loader-mcp-server/src/ -A 30


</function_calls>

<function_calls>

#!/bin/bash

Let's first check if the test file exists and has the expected content

if [ -f "packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py" ]; then
echo "Test file exists"
wc -l "packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py"
else
echo "Test file not found"
find . -name "test_real_end_to_end" -type f 2>/dev/null
fi


</function_calls>

<function_calls>

#!/bin/bash

Search for any test files with similar names

find . -path "/tests/" -name "test.py" -type f 2>/dev/null | grep -i "end_to_end|phase" | head -10


</function_calls>

<function_calls>

#!/bin/bash

Let's check what exists in the packages directory

ls -la packages/ 2>/dev/null || echo "packages dir not found"

Try finding qdrant-loader-mcp-server

find . -type d -name "qdrantloader*" 2>/dev/null | head -5


</function_calls>

<function_calls>

#!/bin/bash

List all Python test files to understand structure

find . -name "test_*.py" -type f 2>/dev/null | head -20


</function_calls>

<function_calls>

#!/bin/bash

Check git ls-files to see the actual repo structure

git ls-files | grep -i "test.*phase|end_to_end" | head -20


</function_calls>

<function_calls>

#!/bin/bash

Get all files in git

git ls-files | head -50


</function_calls>

<function_calls>

#!/bin/bash

Find any python files in the repo

git ls-files "*.py" | head -30


</function_calls>

<function_calls>

#!/bin/bash

Get files matching the pattern in the review

git ls-files | grep "test_real_end_to_end"


</function_calls>

Based on my investigation, I need to verify the information against the actual codebase. Let me use a broader search approach to understand the context:
<function_calls>

#!/bin/bash

Let's see what's in the current directory structure

pwd
ls -la | head -20


</function_calls>

<function_calls>

#!/bin/bash

List the tree structure of relevant paths

echo "=== Looking for test files ==="
find . -name ".py" -path "/test*" 2>/dev/null | grep -i "phase|end_to_end" | head -20

echo ""
echo "=== All test files ==="
find . -name "test_*.py" 2>/dev/null | head -20


</function_calls>

<function_calls>

#!/bin/bash

Check if the file path from review comment actually exists with full path

test_file="packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py"

if [ -f "$test_file" ]; then
echo "File found!"
cat -n "$test_file" | sed -n '115,210p'
else
echo "File not found at: $test_file"
echo "Checking what's in packages dir:"
ls -la packages/ 2>/dev/null || find . -maxdepth 1 -type d | head -10
fi


</function_calls>

Let me check what the PR is actually about by looking at the repository structure and the related code:
<function_calls>

#!/bin/bash

List files more broadly

echo "=== Files in root ==="
ls -1 | head -20

echo ""
echo "=== Check git status ==="
git status --short 2>&1 | head -20


</function_calls>

<function_calls>

#!/bin/bash

Try to cat the exact file specified

cat -n "packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py" 2>&1 | head -100


</function_calls>

<function_calls>

#!/bin/bash

Check what files Python files exist related to qdrant

find . -name "*.py" | grep -i qdrant | head -20


</function_calls>

<function_calls>

#!/bin/bash

Search for AsyncMock usage in tests to understand the pattern

rg -n "AsyncMock" . --type py -B 3 -A 3 | head -100


</function_calls>

Based on the search results and the structure of the review comment, let me search more broadly for the actual test file and related implementation:
<function_calls>

#!/bin/bash

Check the current branch and recent files

echo "=== Current branch ==="
git branch -v | head -5

echo ""
echo "=== Recent modified files ==="
git log --oneline -10

echo ""
echo "=== Check if file exists in git ==="
git ls-files "test_real_end_to_end"


</function_calls>

Based on my investigation, I cannot locate the specified test file in the repository. This suggests either:

  1. The file path is incorrect in the review comment
  2. The file is not present in the current branch
  3. The repository structure is different from what's expected

Let me verify the review comment content itself by checking if there are any relevant async mock patterns:
<function_calls>

#!/bin/bash

Search broadly for any file with "end_to_end" or "phase2" in its name

echo "=== Files with end_to_end ==="
find . -name "end_to_end" 2>/dev/null

echo ""
echo "=== Files with phase2 ==="
find . -name "phase2*" 2>/dev/null

echo ""
echo "=== Files with phase ==="
find . -name "phase" 2>/dev/null

echo ""
echo "=== All python test files ==="
find . -path "/tests/" -name "*.py" 2>/dev/null | head -20


</function_calls>

<function_calls>

#!/bin/bash

Let's check what changed in this PR/branch

echo "=== Diff summary ==="
git diff --name-only origin/main 2>/dev/null | head -30 || git log --oneline -20

echo ""
echo "=== Check if we're looking at untracked files ==="
git status | head -30


</function_calls>

Let me take a different approach and analyze the review comment itself for validity:
<function_calls>

#!/bin/bash

Search for SearchEngine class that uses qdrant_client

echo "=== Searching for SearchEngine class ==="
rg -n "class SearchEngine" --type py -A 20

echo ""
echo "=== Searching for async def search ==="
rg -n "async def search" --type py -B 2 -A 5 | head -80


</function_calls>

<function_calls>

#!/bin/bash

Check for unittest.mock and AsyncMock imports/usage

echo "=== AsyncMock import and usage ==="
rg -n "from unittest.mock import|AsyncMock|from asyncio import" --type py | head -50

echo ""
echo "=== Check .return_value patterns ==="
rg -n ".return_value\s*=" --type py -B 2 -A 2 | head -100


</function_calls>

Based on my investigation of the repository, I cannot find the specific test file referenced in the review comment (packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py). The file does not appear to exist in the current branch or repository.

However, I can analyze the review comment itself based on general best practices for async mocking in Python. Let me verify the technical accuracy of the suggestions:
<function_calls>

AsyncMock return_value vs attribute replacement pattern best practices
</function_calls>


Summary of Findings

The referenced test file cannot be located in the repository; verification of the specific code changes and suggestions cannot be completed.

The file path packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py does not exist in the repository. Attempts to locate it using git ls-files, find, direct file access, and git status all returned no results.

Without access to the actual test file, I cannot verify:

  • Whether mock_client is initialized as AsyncMock() on line 125
  • Whether the attribute replacement pattern mock_client.search = AsyncMock(...) is actually used
  • Whether the mock injection at line 201 is effective
  • Whether the pipeline failure mentioned is real or what causes it
  • Whether the suggested alternative pattern would actually improve the code

The technical guidance in the review about AsyncMock patterns is generally sound, but it cannot be validated against the actual code being reviewed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (2)

150-167: Async Qdrant client mocks are correctly aligned with awaited API

Using AsyncMock for mock_client.search and mock_client.scroll matches the real async usage (await qdrant_client.search/scroll(...)) and avoids the previous sync/await mismatch in the test path. This should make failures more realistic if the async contract changes.

As a small improvement, you could consider adding a spec to mock_client to catch accidental calls to non-existent methods, e.g. mock_client = AsyncMock(spec=SomeAsyncQdrantClientType), if that type is easily available.


203-207: Make vector_search_service mocking robust if the attribute exists but is None

The conditional currently only checks for the presence of the vector_search_service attribute:

if hasattr(search_engine.hybrid_search, 'vector_search_service'):
    search_engine.hybrid_search.vector_search_service.qdrant_client = mock_qdrant_client
    search_engine.hybrid_search.vector_search_service.openai_client = mock_openai_client

If some configuration sets vector_search_service to None but still defines the attribute, this will raise an AttributeError during fixture setup. You can make this more robust with a getattr + None check:

-        # Also mock the qdrant_client in vector_search_service if it exists
-        if hasattr(search_engine.hybrid_search, 'vector_search_service'):
-            search_engine.hybrid_search.vector_search_service.qdrant_client = mock_qdrant_client
-            search_engine.hybrid_search.vector_search_service.openai_client = mock_openai_client
+        # Also mock the qdrant_client in vector_search_service if it exists and is not None
+        vector_service = getattr(search_engine.hybrid_search, "vector_search_service", None)
+        if vector_service is not None:
+            vector_service.qdrant_client = mock_qdrant_client
+            vector_service.openai_client = mock_openai_client

This keeps the intent while avoiding brittle failures if initialization changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afe3d59 and 2f4d538.

📒 Files selected for processing (1)
  • packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py

return False

# Normalize path separators to forward slashes for consistent matching
rel_path = rel_path.replace("\\", "/")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a fix for cross-drive paths on windows (eg: c:\ , D:), please:

  1. Provide test evidence or UT to demonstrate this scenario
  2. Update the MR desc with the reasoning for this change
    This enhancement is nice to have, but seem to be out of scope of the current fix issue. Consider moving it to a separated chore/improvement MR. (Or do 1&2)

return False

# Normalize path separators to forward slashes for consistent matching
rel_path = rel_path.replace("\\", "/")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please debug or check log debug for this, need it or not

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)

146-153: Redundant error handling: ignore_errors=True makes try/except unnecessary.

The cleanup logic uses both ignore_errors=True in shutil.rmtree and wraps it in a try/except block. This is redundant—ignore_errors=True already suppresses all errors during directory removal.

Apply this diff to simplify the error handling:

     finally:
         # Close all logging handlers before cleanup (Windows compatibility)
         logging.shutdown()
-
-        try:
-            shutil.rmtree(tmp_dir, ignore_errors=True)
-        except Exception:
-            pass
+        shutil.rmtree(tmp_dir, ignore_errors=True)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)

146-231: Windows symlink privilege handling looks good; consider slightly safer cleanup

The new try/except around symlink_dir.symlink_to(temp_dir) with a targeted winerror == 1314 check is a solid way to skip when symlink privileges are missing on Windows while still surfacing other OS errors. The added outer try/finally ensures the symlink is cleaned up even if assertions fail, which is important for keeping the test environment tidy.

One optional hardening: before calling symlink_dir.unlink() (both before creation and in the cleanup), consider checking symlink_dir.is_symlink() to avoid accidentally trying to unlink a non-symlink directory if that path somehow already exists:

-        if symlink_dir.exists():
-            symlink_dir.unlink()
+        if symlink_dir.is_symlink():
+            symlink_dir.unlink()
...
-            if symlink_dir.exists():
-                symlink_dir.unlink()
+            if symlink_dir.is_symlink():
+                symlink_dir.unlink()

This makes the test more defensive against unexpected filesystem state without changing the intended behavior.

packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)

334-343: Robust temp-file handling in FileConverter tests; consider a small helper to DRY it up

The repeated pattern of:

  • creating a NamedTemporaryFile(..., delete=False),
  • capturing tmp_path = tmp_file.name,
  • using tmp_path for converter calls and assertions, and
  • cleaning up with guarded os.unlink(tmp_path) (and os.chmod where needed),

is correct and clearly improves Windows compatibility by tolerating OSError/PermissionError during cleanup. If you want to reduce duplication, you could introduce a small test helper (e.g., a context manager that yields tmp_path and performs the guarded cleanup) and reuse it across these tests, but this is purely an optional readability/maintainability tweak.

Also applies to: 355-367, 387-403, 410-420, 438-450, 466-477, 492-506


466-472: Tighten MarkItDownError file_path assertion (optional)

MarkItDownError is constructed with the same tmp_path string passed into convert_file, so exc_info.value.file_path should equal tmp_path. You could strengthen this test by asserting equality instead of substring containment, e.g. assert exc_info.value.file_path == tmp_path, to catch any future formatting changes in how the path is stored.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f4d538 and 141aa0a.

📒 Files selected for processing (6)
  • packages/qdrant-loader-mcp-server/pyproject.toml (1 hunks)
  • packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (3 hunks)
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (11 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/qdrant-loader-mcp-server/pyproject.toml
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader-mcp-server/tests/unit/test_logging.py
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
🧬 Code graph analysis (4)
packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (1)
packages/qdrant-loader-mcp-server/src/qdrant_loader_mcp_server/utils/logging.py (3)
  • LoggingConfig (55-162)
  • setup (66-126)
  • format (38-43)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1)
packages/qdrant-loader/tests/unit/config/test_workspace.py (3)
  • temp_workspace (23-37)
  • temp_workspace (142-151)
  • temp_workspace (223-232)
packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (3)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (3)
  • is_supported_for_conversion (152-200)
  • detect_file_type (89-120)
  • get_file_type_info (202-240)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (3)
  • _validate_file (438-454)
  • convert_file (396-436)
  • create_fallback_document (456-472)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/exceptions.py (4)
  • FileSizeExceededError (39-55)
  • FileAccessError (94-108)
  • UnsupportedFileTypeError (25-36)
  • MarkItDownError (73-91)
packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (4)
packages/qdrant-loader/tests/unit/connectors/git/test_adapter.py (1)
  • temp_dir (17-20)
packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (1)
  • temp_dir (19-22)
packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (1)
  • temp_dir (20-23)
packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1)
  • temp_dir (24-28)
🔇 Additional comments (7)
packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (2)

177-179: Excellent Windows-compatible path comparison.

The use of Path.resolve() properly normalizes paths for comparison, handling Windows path representation differences (short vs long names, separators). This is essential for cross-platform compatibility.


180-181: Much more robust ancestry check.

Replacing the substring check with proper resolved-path ancestry verification is significantly better. The original str(temp_workspace) in actual_path could produce false positives/negatives due to path representation differences, while the new approach correctly verifies the parent-child relationship.

packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (2)

3-3: LGTM!

The shutil import is correctly added to support shutil.rmtree in the Windows compatibility fix.


104-122: Based on my verification of the codebase, I can now provide the final assessment.

The Windows compatibility fix is correct and well-implemented; parallel test execution is not a concern for this project.

The verification shows:

  • No pytest-xdist dependency is configured in the project
  • No parallel test configuration exists (no pytest.ini, setup.cfg configuration for parallel execution)
  • Only 2 test functions use logging.shutdown() (lines 118 and 148)
  • Each test properly resets LoggingConfig._initialized = False at the start for isolation
  • Tests use try/finally cleanup blocks appropriately

The implementation correctly handles the Windows file locking issue by calling logging.shutdown() before file deletion. Since the project does not use parallel test execution, the concern about pytest-xdist interference is not applicable. Test isolation via the LoggingConfig._initialized reset flag is sufficient for sequential test execution.

packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (1)

3-5: sys import for Windows-specific skip is appropriate

sys is only used for the sys.platform == "win32" check in the symlink privilege handling, which is a clear and minimal dependency; no issues here.

packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (2)

7-7: Windows-specific skip is appropriate and sys import is justified

Using sys.platform == "win32" to skip the unreadable-file permission test on Windows avoids unreliable chmod/os.access behavior there while preserving coverage on POSIX. The new import sys is minimal and correctly scoped to this use.

Also applies to: 380-383


37-45: Consistent tmp_path usage and guarded cleanup improve cross‑platform reliability

Switching to a tmp_path = tmp_file.name variable and wrapping os.unlink(tmp_path) in try/except (OSError, PermissionError) across the FileDetector tests makes the tests clearer (explicit on-disk path) and more robust on Windows, where temporary file handle semantics and delete races can surface as PermissionError. This pattern looks correct and consistent in all these blocks.

Also applies to: 64-72, 89-98, 105-114, 121-136, 146-155

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1)

144-153: Use exact matching for better test precision—startswith is still too lenient.

Similar to a previous review, the current assertion using startswith remains overly lenient. It would incorrectly pass if BeautifulSoup extracts concatenated text (e.g., "Malformed Title\nMalformed H1\nContent").

Use exact value matching instead:

-        expected_starts = ["Malformed Title", "Malformed H1", "Untitled Document"]
-        assert any(
-            title.startswith(expected) for expected in expected_starts
-        ), f"Expected title to start with one of {expected_starts}, but got: {title!r}"
+        # BeautifulSoup html.parser should handle malformed HTML deterministically
+        expected_titles = ["Malformed Title", "Malformed H1", "Untitled Document"]
+        assert title in expected_titles, (
+            f"Expected title to be one of {expected_titles}, but got: {title!r}"
+        )

This prevents false passes from concatenated or partially extracted text while still handling different parsing outcomes. If you've observed that Windows consistently produces titles with additional trailing content, please document that specific behavior in a comment and verify it with a test run.

Run the following script to verify the actual title extraction behavior on the current platform:

#!/bin/bash
# Description: Test actual BeautifulSoup behavior with malformed HTML to determine if startswith is necessary

cat > /tmp/test_malformed_title.py << 'EOF'
from bs4 import BeautifulSoup

malformed_html = """
<html>
    <head>
        <title>Malformed Title
    </head>
    <body>
        <div class="content">
            <h1>Malformed H1
            <p>Content</p>
        </div>
    </body>
"""

soup = BeautifulSoup(malformed_html, "html.parser")
title_tag = soup.find("title")

print("Python version:", __import__('sys').version)
print("BeautifulSoup version:", __import__('bs4').__version__)
print("Platform:", __import__('platform').system())
print()
if title_tag:
    title = title_tag.get_text(strip=True)
    print(f"Extracted title: {title!r}")
    print(f"Title length: {len(title)}")
    print(f"Starts with 'Malformed Title': {title.startswith('Malformed Title')}")
    print(f"Equals 'Malformed Title': {title == 'Malformed Title'}")
else:
    print("No title tag found")
EOF

python3 /tmp/test_malformed_title.py
🧹 Nitpick comments (6)
packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (2)

92-93: Tighten the assertion for better test precision.

The substring check "tmp" in metadata["file_directory"] is overly permissive and could pass with unintended paths like "/tmpfoo" or "atmpb". Consider using more precise checks that remain platform-agnostic.

Apply this diff for a more robust assertion:

-        # File directory could be absolute or relative depending on platform
-        assert "tmp" in metadata["file_directory"] or metadata["file_directory"] == ".."
+        # File directory could be absolute or relative depending on platform
+        file_dir = metadata["file_directory"]
+        # Check for expected directory name or relative path
+        assert file_dir in ("tmp", "..", "/tmp") or file_dir.endswith("/tmp") or file_dir.endswith("\\tmp")

Alternatively, use pathlib.Path for platform-agnostic validation:

+        from pathlib import Path
+        
-        # File directory could be absolute or relative depending on platform
-        assert "tmp" in metadata["file_directory"] or metadata["file_directory"] == ".."
+        # File directory could be absolute or relative depending on platform
+        file_dir = Path(metadata["file_directory"])
+        assert file_dir.name in ("tmp", "..") or str(file_dir) == ".."

164-165: Tighten the assertion for better test precision.

The substring check "nonexistent" in metadata["file_directory"] is overly permissive and could match unintended paths. Consider a more precise validation.

Apply this diff:

-        # File directory could be absolute or relative depending on platform
-        assert "nonexistent" in metadata["file_directory"]
+        # File directory could be absolute or relative depending on platform
+        file_dir = metadata["file_directory"]
+        # Check for expected directory name or absolute path
+        assert file_dir in ("nonexistent", "/nonexistent") or file_dir.endswith("/nonexistent") or file_dir.endswith("\\nonexistent")

Or use pathlib.Path for clearer intent:

+        from pathlib import Path
+        
-        # File directory could be absolute or relative depending on platform
-        assert "nonexistent" in metadata["file_directory"]
+        # File directory could be absolute or relative depending on platform
+        assert Path(metadata["file_directory"]).name == "nonexistent"
packages/qdrant-loader/tests/unit/cli/test_cli.py (1)

140-150: Test logic is correct; consider removing unnecessary patch.

The updated test correctly verifies that _create_database_directory wraps exceptions from _create_db_dir_helper in a ClickException. The patching strategy properly isolates the error handling path.

However, the click.confirm patch on line 144 appears unnecessary. Based on the implementation, _create_db_dir_helper is called directly and raises the exception before any confirmation prompt would occur. The patch doesn't affect the test outcome but adds unneeded setup.

Consider this optional cleanup:

    def test_create_database_directory_exception(self):
        """Test directory creation with exception."""
        test_path = Path("/invalid/path/that/cannot/be/created")

-       with patch("click.confirm", return_value=True):
-           with patch("qdrant_loader.cli.cli._create_db_dir_helper") as mock_helper:
-               # Force helper to raise exception
-               mock_helper.side_effect = OSError("Permission denied")
-
-               with pytest.raises(ClickException, match="Failed to create directory"):
-                   _create_database_directory(test_path)
+       with patch("qdrant_loader.cli.cli._create_db_dir_helper") as mock_helper:
+           # Force helper to raise exception
+           mock_helper.side_effect = OSError("Permission denied")
+
+           with pytest.raises(ClickException, match="Failed to create directory"):
+               _create_database_directory(test_path)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3)

54-75: Tempfile pattern is Windows‑friendly; consider narrowing the broad Exception catch

Capturing tmp_path before leaving the NamedTemporaryFile context and doing conversion + os.unlink after the file is closed is exactly what you want for Windows handle semantics, and the try/finally cleanup with (OSError, PermissionError) tolerance is reasonable.

The only concern is except Exception as e: here: it will also skip on genuine regressions in FileConverter.convert_file, not just missing/unsupported MarkItDown environments. If feasible, consider catching the specific wrapper error your converter raises (e.g., MarkItDownError) and letting unexpected exceptions fail the test.

-            except Exception as e:
+            except MarkItDownError as e:
                 # If MarkItDown is not available or fails, that's expected in test environment
                 pytest.skip(
                     f"MarkItDown conversion failed (expected in test environment): {e}"
                 )

And add MarkItDownError to the imports from qdrant_loader.core.file_conversion at the top of the file.


98-110: Use a specific exception type in pytest.raises for the size‑limit case

The comment says this should raise MarkItDownError wrapping FileSizeExceededError, but the assertion currently accepts any Exception. That makes the test less strict and could hide regressions where the converter starts failing for the wrong reason.

Tightening this to the expected wrapper error will keep the test aligned with the convert_file contract while still being independent of the underlying MarkItDown details.

-        try:
-            with pytest.raises(
-                Exception
-            ):  # Should raise MarkItDownError wrapping FileSizeExceededError
-                small_converter.convert_file(tmp_path)
+        try:
+            with pytest.raises(
+                MarkItDownError
+            ):  # Should raise MarkItDownError wrapping FileSizeExceededError
+                small_converter.convert_file(tmp_path)

(Remember to import MarkItDownError if you adopt this.)


193-225: Comprehensive assertions on get_file_type_info; consider deduping tempfile boilerplate

This block nicely validates all fields returned by get_file_type_info and checks PDF‑specific values, and again uses the closed‑file‑then‑unlink pattern, which is good for Windows compatibility.

Given how often this NamedTemporaryFile(...) -> tmp_path -> try/finally unlink pattern appears in this file, you might consider extracting a small helper/pytest fixture that yields a Path (or string path) and handles deletion, so the tests can focus purely on conversion/detection assertions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 141aa0a and f119d5a.

📒 Files selected for processing (17)
  • packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py (3 hunks)
  • packages/qdrant-loader-mcp-server/tests/unit/test_logging.py (3 hunks)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py (2 hunks)
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py (1 hunks)
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (3 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli.py (1 hunks)
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py (1 hunks)
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py (1 hunks)
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py (2 hunks)
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py (11 hunks)
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py (2 hunks)
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
  • packages/qdrant-loader/src/qdrant_loader/connectors/git/metadata_extractor.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_converter.py
  • packages/qdrant-loader/tests/unit/config/test_workspace_integration.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_file_processor.py
  • packages/qdrant-loader-mcp-server/tests/unit/test_logging.py
  • packages/qdrant-loader/tests/unit/cli/test_cli_commands_enhanced.py
  • packages/qdrant-loader/tests/unit/core/monitoring/test_monitoring.py
  • packages/qdrant-loader-mcp-server/tests/integration/test_real_end_to_end_phase2_3.py
  • packages/qdrant-loader/tests/unit/connectors/localfile/test_localfile_id_consistency.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_operations.py
  • packages/qdrant-loader/tests/unit/core/file_conversion/test_file_conversion.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_git_connector.py
  • packages/qdrant-loader/src/qdrant_loader/core/text_processing/text_processor.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/back-end-dev.mdc)

**/*.py: Use Python 3.x for back-end implementation and code examples
When building LLM/AI workflows, prefer and follow best practices of PydanticAI or LangChain
Use Qdrant as the vector database with performant indexing and query patterns
Design and implement RESTful endpoints with clear resource modeling and proper status codes
Implement authentication/authorization using OAuth2 or JWT; follow secure session and token handling

Files:

  • packages/qdrant-loader/tests/unit/cli/test_cli.py
  • packages/qdrant-loader/tests/integration/test_file_conversion_integration.py
  • packages/qdrant-loader/tests/unit/connectors/git/test_metadata_extractor.py
  • packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py
🧬 Code graph analysis (3)
packages/qdrant-loader/tests/unit/cli/test_cli.py (1)
packages/qdrant-loader/src/qdrant_loader/cli/cli.py (1)
  • _create_database_directory (97-118)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_converter.py (2)
  • convert_file (396-436)
  • create_fallback_document (456-472)
packages/qdrant-loader/src/qdrant_loader/core/file_conversion/file_detector.py (2)
  • is_supported_for_conversion (152-200)
  • get_file_type_info (202-240)
packages/qdrant-loader/tests/unit/connectors/publicdocs/test_publicdocs_title_extraction.py (1)
packages/qdrant-loader/src/qdrant_loader/connectors/publicdocs/connector.py (1)
  • _extract_title (495-538)
🔇 Additional comments (2)
packages/qdrant-loader/tests/integration/test_file_conversion_integration.py (2)

176-186: Detection test now correctly operates on a closed temp file

Storing tmp_path and running is_supported_for_conversion after the NamedTemporaryFile context exits avoids open‑handle issues on Windows, and the try/finally + os.unlink with tolerant error handling is a solid cleanup pattern for these short‑lived test files.


232-249: Fallback document test correctly validates content and performs safe cleanup

Creating a real on‑disk file, passing its path into create_fallback_document, and asserting on key substrings (status, error message, filename) gives good coverage of the fallback behavior while using the same Windows‑safe temp file lifecycle and cleanup pattern as the other tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants