
Prevent threaded Dask workers in DistRDF backend #22393

Open
JAGANNATHANJP wants to merge 3 commits into root-project:master from JAGANNATHANJP:fix-dask-thread-validation


@JAGANNATHANJP

This PR prevents unsupported threaded Dask workers in DistRDF.

Running distributed RDataFrame on threaded Dask workers may lead to crashes, and it offers no advantage because of Python GIL limitations. This change validates the worker configuration at backend initialization and raises a RuntimeError when threaded workers are detected.

Suggested configuration:

  • processes=True
  • threads_per_worker=1
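
The validation described above could look roughly like the following. This is a minimal sketch and not the code in this PR: the function names `find_threaded_workers` and `validate_workers` are hypothetical, and it assumes that each entry of `scheduler_info()["workers"]` is a dict with an `nthreads` count. It works on a plain dict so it runs without a Dask cluster.

```python
def find_threaded_workers(workers):
    """Return the sorted addresses of workers reporting more than one thread.

    `workers` stands in for client.scheduler_info()["workers"]; the
    "nthreads" key is an assumption about that mapping's layout.
    """
    return sorted(
        address for address, info in workers.items()
        if info.get("nthreads", 1) > 1
    )


def validate_workers(workers):
    """Raise RuntimeError if any worker is configured with several threads."""
    threaded = find_threaded_workers(workers)
    if threaded:
        raise RuntimeError(
            "Threaded Dask workers are not supported: "
            + ", ".join(threaded)
            + ". Use processes=True and threads_per_worker=1."
        )


# Made-up sample data: one single-threaded worker, one threaded worker.
sample_workers = {
    "tcp://127.0.0.1:40001": {"nthreads": 1},
    "tcp://127.0.0.1:40002": {"nthreads": 4},
}

try:
    validate_workers(sample_workers)
except RuntimeError as error:
    print(error)
```

A cluster created with the suggested configuration would report `nthreads` equal to 1 for every worker, so `validate_workers` would return without raising.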

@github-actions

Test Results

    22 files      22 suites   3d 12h 10m 36s ⏱️
 3 862 tests  3 862 ✅ 0 💤 0 ❌
76 200 runs  76 200 ✅ 0 💤 0 ❌

Results for commit 5bb11cb.

@JAGANNATHANJP force-pushed the fix-dask-thread-validation branch from 5bb11cb to edb506a on May 25, 2026 06:37
self.client = (daskclient if daskclient is not None else
               Client(LocalCluster(n_workers=os.cpu_count(), threads_per_worker=1, processes=True)))

workers = self.client.scheduler_info()["workers"]
Member


Suggested change
- workers = self.client.scheduler_info()["workers"]
+ workers = self.client.scheduler_info().get("workers", None)
+ if workers is None:
+     return

Author


Thanks for the suggestion. I updated the code to safely handle the case where scheduler_info() does not contain worker information yet.
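
The guarded lookup from the suggestion can be illustrated in isolation. This is a sketch under assumptions: `get_workers` is a hypothetical helper, and the two dicts are made-up stand-ins for what `scheduler_info()` might return, not real scheduler output.

```python
def get_workers(scheduler_info):
    """Return the worker mapping, or None if the "workers" key is absent."""
    return scheduler_info.get("workers", None)


# Made-up stand-ins for scheduler_info() results.
info_without_workers = {"type": "Scheduler"}
info_with_workers = {
    "type": "Scheduler",
    "workers": {"tcp://127.0.0.1:40001": {"nthreads": 1}},
}

# Indexing with ["workers"] would raise KeyError on the first dict;
# .get returns None instead, so the caller can skip validation.
missing = get_workers(info_without_workers)
present = get_workers(info_with_workers)
```

Note that an empty mapping is not `None`: if the key exists but no workers are listed, the check would still run and find nothing to reject.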

@JAGANNATHANJP force-pushed the fix-dask-thread-validation branch 2 times, most recently from e938571 to 2eda199 on May 27, 2026 08:50
@JAGANNATHANJP force-pushed the fix-dask-thread-validation branch from 2eda199 to 0c91ac5 on May 27, 2026 08:54
2 participants