@dreadatour dreadatour commented Oct 29, 2025

Update fixtures and tests to enable distributed tests.
See related SaaS PR for more details.

Summary by Sourcery

Enable and streamline distributed test support by adjusting fixtures, environment variable handling, and test behaviors.

Enhancements:

  • Introduce _create_job helper and automatically set DATACHAIN_JOB_ID in metastore fixtures and datachain_job_id fixture
  • Refactor run_datachain_worker fixture to generate dynamic Celery queue names and disable redundant distributed flags
  • Convert SQLiteMetastore and SQLiteWarehouse fixtures to accept string file paths for db_file parameters
  • Elevate cloud_server_credentials fixture to session scope and simplify AWS credential setup via os.environ

Tests:

  • Add skip-if markers to checkpoint tests when distributed mode is enabled
  • Unify exception assertions in functional UDF tests to always expect RuntimeError
  • Update job management unit tests to clear DATACHAIN_JOB_ID via monkeypatch before creation
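The dynamic queue naming described for run_datachain_worker can be illustrated with a minimal sketch. It is not the fixture from this PR: the helper names, the "test-queue" prefix, and the stand-in worker command (a Python process that just sleeps) are assumptions made for illustration. It shows only the pattern of one uuid.uuid4()-based queue name per worker, with the Popen handles collected in a typed list for teardown.

```python
import subprocess
import sys
import uuid


def make_queue_name(prefix: str = "test-queue") -> str:
    """Return a queue name that is unique per call (the prefix is hypothetical)."""
    return f"{prefix}-{uuid.uuid4().hex}"


def start_workers(count: int) -> tuple[list[str], list[subprocess.Popen]]:
    """Start `count` stand-in worker processes, one per unique queue name."""
    queues: list[str] = []
    workers: list[subprocess.Popen] = []
    for _ in range(count):
        queue = make_queue_name()
        # Stand-in command: a real fixture would launch the actual worker
        # bound to `queue` here instead of a sleeping Python process.
        cmd = [sys.executable, "-c", "import time; time.sleep(60)", queue]
        workers.append(subprocess.Popen(cmd))
        queues.append(queue)
    return queues, workers


def stop_workers(workers: list[subprocess.Popen]) -> None:
    """Terminate every worker, then wait so no process outlives the test run."""
    for proc in workers:
        proc.terminate()
    for proc in workers:
        proc.wait(timeout=10)
```

Unique names keep concurrent test runs from consuming each other's tasks, and holding the process handles in one list makes teardown a single loop.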

sourcery-ai bot commented Oct 29, 2025

Reviewer's Guide

Enable distributed test support by centralizing job creation, standardizing environment variable handling, refining worker startup logic, casting SQLite database paths to strings, simplifying test assertions, and skipping checkpoint tests under distributed mode.

File-Level Changes

Change: Centralize job creation and environment variable setup in fixtures
Details:
  • Add private _create_job helper in conftest.py
  • Invoke _create_job and set DATACHAIN_JOB_ID in metastore and metastore_tmpfile fixtures
  • Replace datachain_job_id fixture with yield from _create_job
Files: tests/conftest.py

Change: Ensure SQLite fixtures accept string file paths
Details:
  • Wrap tmp_path/db_file arguments in str() for SQLiteMetastore
  • Wrap tmp_path/db_file arguments in str() for SQLiteWarehouse
Files: tests/conftest.py

Change: Standardize environment variable manipulation using os.environ
Details:
  • Switch from monkeypatch.delenv to os.environ.pop for AWS_PROFILE
  • Switch from monkeypatch.setenv to os.environ assignments for AWS credentials
  • Adjust cloud_server_credentials fixture scope to session
Files: tests/conftest.py

Change: Refactor run_datachain_worker fixture to support dynamic queues
Details:
  • Remove static queue and concurrency setup
  • Clear DATACHAIN_DISTRIBUTED_DISABLED via monkeypatch
  • Generate unique queue names with uuid.uuid4() in loop
  • Collect worker Popen processes in typed list
Files: tests/conftest.py

Change: Add monkeypatch fixture to job management unit tests
Details:
  • Include monkeypatch argument in test signatures to clear DATACHAIN_JOB_ID
  • Call monkeypatch.delenv in tests for get_or_create and finalize flows
Files: tests/unit/test_job_management.py

Change: Unify exception assertions in UDF functional tests
Details:
  • Remove conditional branches on DATACHAIN_DISTRIBUTED in test_udf
  • Always expect RuntimeError with UDF Execution Failed message
Files: tests/func/test_udf.py

Change: Skip checkpoint tests under distributed mode
Details:
  • Add pytest.mark.skipif for DATACHAIN_DISTRIBUTED in unit checkpoint tests
  • Add pytest.mark.skipif for DATACHAIN_DISTRIBUTED in functional checkpoint tests
Files: tests/unit/lib/test_checkpoints.py, tests/func/test_checkpoints.py

Change: Inject environment cleanup in get_distributed_class unit test
Details:
  • Add monkeypatch fixture and call delenv for DATACHAIN_DISTRIBUTED_DISABLED
  • Verify get_udf_distributor_class returns None under distributed flag
Files: tests/unit/test_catalog_loader.py

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Consider using the pytest monkeypatch fixture for all environment variable changes instead of modifying os.environ directly (e.g., in cloud_server_credentials) to ensure state is properly isolated and restored between tests.
  • There’s a lot of repeated DATACHAIN_JOB_ID setup in multiple fixtures—factor that logic into a single helper fixture to reduce duplication and make future updates easier.
  • The query string in _create_job looks like it’s missing a closing bracket/parenthesis around the call to save()—double-check that expression to avoid silent syntax errors.
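The first point above is about restoring environment state. pytest's monkeypatch fixture is function-scoped, so a session-scoped fixture cannot request it directly, which is presumably why the PR switched to plain os.environ writes. One way to keep the restore-on-exit behavior anyway is sketched below using only the standard library. The helper name and the credential values are hypothetical, and this is not the code from the PR. pytest also provides pytest.MonkeyPatch.context() for the same purpose.

```python
import os
from contextlib import contextmanager
from typing import Iterator
from unittest import mock


@contextmanager
def temporary_aws_env(access_key: str, secret_key: str) -> Iterator[None]:
    """Set AWS credential variables and restore os.environ on exit."""
    # patch.dict snapshots os.environ on entry and restores the snapshot on
    # exit, so both the assignments and the pop below are undone.
    credentials = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }
    with mock.patch.dict(os.environ, credentials):
        # Drop any profile so it cannot override the explicit credentials.
        os.environ.pop("AWS_PROFILE", None)
        yield
```

A session-scoped fixture could wrap its yield in this context manager, so the variables are visible for the whole session and the original environment comes back at teardown.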
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `tests/conftest.py:494-495` </location>
<code_context>

-@pytest.fixture
-def cloud_server_credentials(cloud_server, monkeypatch):
+@pytest.fixture(scope="session")
+def cloud_server_credentials(cloud_server):
     if cloud_server.kind == "s3":
         cfg = cloud_server.src.fs.client_kwargs
</code_context>

<issue_to_address>
**question (testing):** Changed cloud_server_credentials fixture scope to session.

Ensure that no tests modify environment variables set by this fixture, as session scope will share state across all tests.
</issue_to_address>

Comment on lines +14 to +17
@pytest.mark.skipif(
    "os.environ.get('DATACHAIN_DISTRIBUTED')",
    reason="Checkpoints test skipped in distributed mode",
)
Contributor Author

Checkpoint tests fail with distributed mode enabled. They need to be fixed later; that is out of scope for this PR.
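A note on the skip condition quoted above: pytest evaluates a string condition as a Python expression, with os available in its namespace, so the tests are skipped whenever os.environ.get('DATACHAIN_DISTRIBUTED') is truthy. The sketch below mirrors that truthiness check with the standard library only. The function name is hypothetical, and the point is that any non-empty value, including "0", counts as enabled.

```python
import os
from typing import Mapping, Optional


def distributed_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the truthiness of os.environ.get('DATACHAIN_DISTRIBUTED')."""
    env = os.environ if env is None else env
    # .get() returns None when unset; None and "" are falsy,
    # every other string (even "0" or "false") is truthy.
    return bool(env.get("DATACHAIN_DISTRIBUTED"))


print(distributed_enabled({}))                               # False (unset)
print(distributed_enabled({"DATACHAIN_DISTRIBUTED": ""}))    # False (empty)
print(distributed_enabled({"DATACHAIN_DISTRIBUTED": "1"}))   # True
print(distributed_enabled({"DATACHAIN_DISTRIBUTED": "0"}))   # True (non-empty)
```

If "0" should mean disabled, the condition would need an explicit comparison instead of relying on truthiness.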

Copilot AI left a comment

Pull Request Overview

This PR refactors distributed testing configuration to improve test isolation and flexibility. The main changes standardize environment variable handling, add conditional test skipping for distributed mode, and modify exception handling expectations.

  • Adds monkeypatch.delenv("DATACHAIN_JOB_ID", raising=False) to job management tests to ensure clean test state
  • Introduces @pytest.mark.skipif decorators to skip checkpoint tests in distributed mode
  • Standardizes exception handling in UDF tests to expect RuntimeError instead of conditional logic
  • Refactors cloud_server_credentials fixture to session scope and use direct os.environ manipulation
  • Updates run_datachain_worker fixture to support dynamic queue names and disable distributed mode guard
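The unified RuntimeError expectation can be pictured with a small sketch. Everything here is hypothetical: run_udf and failing_udf are illustrative names, and the wrapping shown is an assumption about how a runner might surface failures, not datachain's implementation. Only the "UDF Execution Failed" message comes from the review summary. The idea is that a test can assert one exception type and message without branching on the execution mode.

```python
from typing import Any, Callable, Iterable, List


def run_udf(func: Callable[[Any], Any], rows: Iterable[Any]) -> List[Any]:
    """Hypothetical runner: apply func to each row, wrapping any failure."""
    try:
        return [func(row) for row in rows]
    except Exception as exc:
        # Same exception type and message regardless of execution mode;
        # the original error stays reachable through __cause__.
        raise RuntimeError("UDF Execution Failed") from exc


def failing_udf(value: int) -> int:
    """Stand-in UDF that always fails."""
    raise ValueError(f"bad value: {value}")


def assert_udf_fails() -> None:
    """Single assertion path, with no branch on a distributed flag."""
    try:
        run_udf(failing_udf, [1, 2, 3])
    except RuntimeError as err:
        assert "UDF Execution Failed" in str(err)
        assert isinstance(err.__cause__, ValueError)
    else:
        raise AssertionError("expected RuntimeError")
```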

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/unit/test_job_management.py | Adds monkeypatch fixture and clears DATACHAIN_JOB_ID env var to ensure test isolation |
| tests/unit/test_catalog_loader.py | Adds environment variable cleanup for distributed mode testing |
| tests/unit/lib/test_checkpoints.py | Adds skip conditions for checkpoint tests in distributed mode |
| tests/func/test_checkpoints.py | Adds skip condition for checkpoint test in distributed mode |
| tests/func/test_udf.py | Simplifies exception handling by removing conditional logic for distributed vs local mode |
| tests/conftest.py | Refactors fixtures: extracts job creation helper, changes fixture scopes, adds dynamic queue generation |

codecov bot commented Oct 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.96%. Comparing base (176b7cb) to head (2b8278d).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1438   +/-   ##
=======================================
  Coverage   87.96%   87.96%           
=======================================
  Files         160      160           
  Lines       15377    15377           
  Branches     2224     2224           
=======================================
  Hits        13527    13527           
  Misses       1336     1336           
  Partials      514      514           
| Flag | Coverage Δ |
| --- | --- |
| datachain | 87.92% <ø> (ø) |

cloudflare-workers-and-pages bot commented Oct 29, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 2b8278d
Status: ✅  Deploy successful!
Preview URL: https://e5caa63f.datachain-documentation.pages.dev
Branch Preview URL: https://metastore-update.datachain-documentation.pages.dev

@dreadatour dreadatour mentioned this pull request Nov 6, 2025
@dreadatour
Contributor Author

Closing this, see new version: #1451

@dreadatour dreadatour closed this Nov 6, 2025
4 participants