Fix distributed tests #1438

dreadatour · 2025-10-29T16:40:37Z

Update fixtures and tests to enable distributed tests.
See related SaaS PR for more details.

Summary by Sourcery

Enable and streamline distributed test support by adjusting fixtures, environment variable handling, and test behaviors.

Enhancements:

Introduce _create_job helper and automatically set DATACHAIN_JOB_ID in metastore fixtures and datachain_job_id fixture
Refactor run_datachain_worker fixture to generate dynamic Celery queue names and disable redundant distributed flags
Convert SQLiteMetastore and SQLiteWarehouse fixtures to accept string file paths for db_file parameters
Elevate cloud_server_credentials fixture to session scope and simplify AWS credential setup via os.environ

Tests:

Add skip-if markers to checkpoint tests when distributed mode is enabled
Unify exception assertions in functional UDF tests to always expect RuntimeError
Update job management unit tests to clear DATACHAIN_JOB_ID via monkeypatch before creation

sourcery-ai · 2025-10-29T16:40:43Z

Reviewer's Guide

Enable distributed test support by centralizing job creation, standardizing environment variable handling, refining worker startup logic, casting path arguments to strings, simplifying test assertions, and skipping checkpoint tests under distributed mode.

File-Level Changes

Change	Details	Files
Centralize job creation and environment variable setup in fixtures	Add private _create_job helper in conftest.py; invoke _create_job and set DATACHAIN_JOB_ID in metastore and metastore_tmpfile fixtures; replace datachain_job_id fixture with yield from _create_job	`tests/conftest.py`
Ensure SQLite fixtures accept string file paths	Wrap tmp_path/db_file arguments in str() for SQLiteMetastore; wrap tmp_path/db_file arguments in str() for SQLiteWarehouse	`tests/conftest.py`
Standardize environment variable manipulation using os.environ	Switch from monkeypatch.delenv to os.environ.pop for AWS_PROFILE; switch from monkeypatch.setenv to os.environ assignments for AWS credentials; adjust cloud_server_credentials fixture scope to session	`tests/conftest.py`
Refactor run_datachain_worker fixture to support dynamic queues	Remove static queue and concurrency setup; clear DATACHAIN_DISTRIBUTED_DISABLED via monkeypatch; generate unique queue names with uuid.uuid4() in loop; collect worker Popen processes in typed list	`tests/conftest.py`
Add monkeypatch fixture to job management unit tests	Include monkeypatch argument in test signatures to clear DATACHAIN_JOB_ID; call monkeypatch.delenv in tests for get_or_create and finalize flows	`tests/unit/test_job_management.py`
Unify exception assertions in UDF functional tests	Remove conditional branches on DATACHAIN_DISTRIBUTED in test_udf; always expect RuntimeError with UDF Execution Failed message	`tests/func/test_udf.py`
Skip checkpoint tests under distributed mode	Add pytest.mark.skipif for DATACHAIN_DISTRIBUTED in unit checkpoint tests; add pytest.mark.skipif for DATACHAIN_DISTRIBUTED in functional checkpoint tests	`tests/unit/lib/test_checkpoints.py` `tests/func/test_checkpoints.py`
Inject environment cleanup in get_distributed_class unit test	Add monkeypatch fixture and call delenv for DATACHAIN_DISTRIBUTED_DISABLED; verify get_udf_distributor_class returns None under distributed flag	`tests/unit/test_catalog_loader.py`
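
The dynamic queue handling described for run_datachain_worker could look roughly like the sketch below. This is an illustration only, not the fixture's actual code: the helper names (`unique_queue_name`, `start_workers`, `stop_workers`), the `test-worker` prefix, and the stand-in sleep command are all assumptions. The real fixture would presumably launch a Celery worker bound to each queue instead.

```python
import subprocess
import sys
import uuid


def unique_queue_name(prefix: str = "test-worker") -> str:
    """Return a queue name that will not collide with other test runs."""
    return f"{prefix}-{uuid.uuid4().hex}"


def start_workers(count: int) -> tuple[list[str], list[subprocess.Popen]]:
    """Start `count` stand-in worker processes, one per unique queue."""
    queues: list[str] = []
    workers: list[subprocess.Popen] = []
    for _ in range(count):
        queue = unique_queue_name()
        # Hypothetical stand-in command: a real fixture would start a
        # worker process that consumes from `queue`.
        cmd = [sys.executable, "-c", "import time; time.sleep(30)", queue]
        workers.append(subprocess.Popen(cmd))
        queues.append(queue)
    return queues, workers


def stop_workers(workers: list[subprocess.Popen]) -> None:
    """Terminate every worker and wait for it to exit."""
    for worker in workers:
        worker.terminate()
    for worker in workers:
        worker.wait(timeout=10)
```

Generating a fresh name per worker keeps parallel or repeated test sessions from consuming each other's tasks, and keeping the processes in one typed list makes teardown a single loop.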

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider using the pytest monkeypatch fixture for all environment variable changes instead of modifying os.environ directly (e.g., in cloud_server_credentials) to ensure state is properly isolated and restored between tests.
There’s a lot of repeated DATACHAIN_JOB_ID setup in multiple fixtures—factor that logic into a single helper fixture to reduce duplication and make future updates easier.
The query string in _create_job looks like it’s missing a closing bracket/parenthesis around the call to save()—double-check that expression to avoid silent syntax errors.
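
As a rough sketch of the deduplication suggested above, a single generator helper can create the job, export DATACHAIN_JOB_ID, and restore the previous value on teardown, so fixtures can delegate to it with `yield from`. The name `job_id_env` and the `create_job` callable are hypothetical; the real `_create_job` signature is not shown in this PR.

```python
import os
from collections.abc import Callable, Iterator


def job_id_env(create_job: Callable[[], str]) -> Iterator[str]:
    """Create a job, expose its id via DATACHAIN_JOB_ID, then restore the env.

    `create_job` is a hypothetical stand-in for whatever call the metastore
    uses to create a job and return its id.
    """
    job_id = create_job()
    previous = os.environ.get("DATACHAIN_JOB_ID")
    os.environ["DATACHAIN_JOB_ID"] = job_id
    try:
        yield job_id
    finally:
        # Put the variable back exactly as it was before the helper ran.
        if previous is None:
            os.environ.pop("DATACHAIN_JOB_ID", None)
        else:
            os.environ["DATACHAIN_JOB_ID"] = previous
```

A fixture body would then reduce to a single `yield from job_id_env(...)` line, with the cleanup running when the fixture is torn down.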

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `tests/conftest.py:494-495` </location>
<code_context>

-@pytest.fixture
-def cloud_server_credentials(cloud_server, monkeypatch):
+@pytest.fixture(scope="session")
+def cloud_server_credentials(cloud_server):
     if cloud_server.kind == "s3":
         cfg = cloud_server.src.fs.client_kwargs
</code_context>

<issue_to_address>
**question (testing):** Changed cloud_server_credentials fixture scope to session.

Ensure that no tests modify environment variables set by this fixture, as session scope will share state across all tests.
</issue_to_address>

tests/conftest.py

dreadatour · 2025-10-29T16:41:55Z

tests/func/test_checkpoints.py

+@pytest.mark.skipif(
+    "os.environ.get('DATACHAIN_DISTRIBUTED')",
+    reason="Checkpoints test skipped in distributed mode",
+)


Checkpoint tests fail with distributed mode enabled. They need to be fixed later; that is out of scope for this PR.
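
For reference, pytest evaluates a string condition passed to skipif as a Python expression, so the marker above skips whenever DATACHAIN_DISTRIBUTED holds any non-empty value. The small helper below (the name `skip_condition` is hypothetical) mirrors that truthiness check to make the edge cases explicit.

```python
from collections.abc import Mapping


def skip_condition(environ: Mapping[str, str]) -> bool:
    """Mirror the truthiness of os.environ.get('DATACHAIN_DISTRIBUTED')."""
    # Any non-empty string is truthy, so values such as "0" also skip.
    return bool(environ.get("DATACHAIN_DISTRIBUTED"))
```

In particular, an unset or empty variable leaves the tests enabled, while values such as "0" or "false" would still skip them.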

Copilot

Pull Request Overview

This PR refactors distributed testing configuration to improve test isolation and flexibility. The main changes standardize environment variable handling, add conditional test skipping for distributed mode, and modify exception handling expectations.

Adds monkeypatch.delenv("DATACHAIN_JOB_ID", raising=False) to job management tests to ensure clean test state
Introduces @pytest.mark.skipif decorators to skip checkpoint tests in distributed mode
Standardizes exception handling in UDF tests to expect RuntimeError instead of conditional logic
Refactors cloud_server_credentials fixture to session scope and use direct os.environ manipulation
Updates run_datachain_worker fixture to support dynamic queue names and disable distributed mode guard

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File	Description
tests/unit/test_job_management.py	Adds `monkeypatch` fixture and clears `DATACHAIN_JOB_ID` env var to ensure test isolation
tests/unit/test_catalog_loader.py	Adds environment variable cleanup for distributed mode testing
tests/unit/lib/test_checkpoints.py	Adds skip conditions for checkpoint tests in distributed mode
tests/func/test_checkpoints.py	Adds skip condition for checkpoint test in distributed mode
tests/func/test_udf.py	Simplifies exception handling by removing conditional logic for distributed vs local mode
tests/conftest.py	Refactors fixtures: extracts job creation helper, changes fixture scopes, adds dynamic queue generation

tests/conftest.py

codecov · 2025-10-29T16:50:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.96%. Comparing base (176b7cb) to head (2b8278d).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1438   +/-   ##
=======================================
  Coverage   87.96%   87.96%           
=======================================
  Files         160      160           
  Lines       15377    15377           
  Branches     2224     2224           
=======================================
  Hits        13527    13527           
  Misses       1336     1336           
  Partials      514      514

Flag	Coverage Δ
datachain	`87.92% <ø> (ø)`

cloudflare-workers-and-pages · 2025-10-29T16:57:40Z

Deploying datachain-documentation with Cloudflare Pages

Latest commit:	`2b8278d`
Status:	✅ Deploy successful!
Preview URL:	https://e5caa63f.datachain-documentation.pages.dev
Branch Preview URL:	https://metastore-update.datachain-documentation.pages.dev

Co-authored-by: Copilot <[email protected]>

dreadatour · 2025-11-06T14:32:26Z

Closing this, see new version: #1451

dreadatour added 3 commits October 29, 2025 20:01

Update tests fixtures

34ab1f5

Update tests fixtures

274b682

Fix distributed tests

5e8726f

dreadatour requested review from amritghimire, Copilot, ilongin and shcheklein October 29, 2025 16:40

dreadatour self-assigned this Oct 29, 2025

sourcery-ai bot reviewed Oct 29, 2025

dreadatour commented Oct 29, 2025

Copilot AI reviewed Oct 29, 2025

shcheklein approved these changes Oct 29, 2025

Fix distributed tests

4cf3930

amritghimire approved these changes Oct 29, 2025

dreadatour and others added 9 commits October 30, 2025 00:05

Update tests/conftest.py

799c84b

Co-authored-by: Copilot <[email protected]>

Set datachain worker cwd in tests

ab07e11

Initialize datachain venv in Studio tests

5a8307f

Initialize datachain venv in Studio tests

ac712ca

Initialize datachain venv in Studio tests

fac8675

Initialize datachain venv in Studio tests

70227c2

Initialize datachain venv in Studio tests

ffcde54

Initialize datachain venv in Studio tests

57f6214

Merge branch 'main' into metastore-update

2b8278d

dreadatour mentioned this pull request Nov 6, 2025

Fix distributed tests #1451

Merged

dreadatour closed this Nov 6, 2025

4 participants
