fix(rust/sedona-spatial-join): wrap probe-side repartition in ProbeShuffleExec to prevent optimizer stripping#677

Open
Kontinuation wants to merge 3 commits into apache:main from Kontinuation:fix/probe-shuffle-exec

@Kontinuation Kontinuation commented Mar 3, 2026

Problem

We tested release candidate 0.3.0 and found that the probe-side auto-repartitioning introduced by #610 did not work as expected. This is because DataFusion's EnforceDistribution optimization pass recognizes RepartitionExec nodes and can strip or replace them. When the spatial join planner inserts a RepartitionExec(RoundRobinBatch) on the probe side to balance the workload and eliminate data skew, the optimizer can remove it, causing the probe side to lose its intended repartitioning.

This problem does not affect the correctness of query results; it affects how well spatial join workloads are balanced across worker threads, and therefore CPU utilization.

Solution

ProbeShuffleExec wraps a RepartitionExec internally but presents itself as a different ExecutionPlan type. Since the optimizer passes only look for RepartitionExec nodes by type, they leave ProbeShuffleExec untouched. The wrapper delegates all ExecutionPlan trait methods to the inner RepartitionExec.
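The wrapper idea can be sketched in isolation. The example below is a simplified model, not the code in this PR: `Plan` is a reduced, hypothetical stand-in for DataFusion's `ExecutionPlan` trait, and `Scan`, `Repartition`, `ProbeShuffle`, and `strip_repartitions` are illustrative names. It only shows why a pass that matches nodes by concrete type skips a wrapper that delegates to the node it hides.

```rust
use std::any::Any;
use std::sync::Arc;

/// Reduced, hypothetical stand-in for DataFusion's `ExecutionPlan` trait.
pub trait Plan {
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &'static str;
    fn output_partitions(&self) -> usize;
    fn children(&self) -> Vec<Arc<dyn Plan>>;
    fn with_new_children(&self, children: Vec<Arc<dyn Plan>>) -> Arc<dyn Plan>;
}

/// Leaf node with a fixed partition count.
pub struct Scan {
    pub partitions: usize,
}

impl Plan for Scan {
    fn as_any(&self) -> &dyn Any { self }
    fn name(&self) -> &'static str { "Scan" }
    fn output_partitions(&self) -> usize { self.partitions }
    fn children(&self) -> Vec<Arc<dyn Plan>> { vec![] }
    fn with_new_children(&self, _children: Vec<Arc<dyn Plan>>) -> Arc<dyn Plan> {
        Arc::new(Scan { partitions: self.partitions })
    }
}

/// Illustrative counterpart of a repartition node.
pub struct Repartition {
    pub input: Arc<dyn Plan>,
    pub partitions: usize,
}

impl Plan for Repartition {
    fn as_any(&self) -> &dyn Any { self }
    fn name(&self) -> &'static str { "Repartition" }
    fn output_partitions(&self) -> usize { self.partitions }
    fn children(&self) -> Vec<Arc<dyn Plan>> { vec![Arc::clone(&self.input)] }
    fn with_new_children(&self, mut children: Vec<Arc<dyn Plan>>) -> Arc<dyn Plan> {
        Arc::new(Repartition { input: children.remove(0), partitions: self.partitions })
    }
}

/// Wrapper that owns a `Repartition` but is a distinct concrete type.
pub struct ProbeShuffle {
    inner: Repartition,
}

impl ProbeShuffle {
    pub fn new(input: Arc<dyn Plan>, partitions: usize) -> Self {
        Self { inner: Repartition { input, partitions } }
    }
}

impl Plan for ProbeShuffle {
    // Downcasting sees `ProbeShuffle`, never the inner `Repartition`.
    fn as_any(&self) -> &dyn Any { self }
    fn name(&self) -> &'static str { "ProbeShuffle" }
    // Everything else is delegated to the wrapped node.
    fn output_partitions(&self) -> usize { self.inner.output_partitions() }
    fn children(&self) -> Vec<Arc<dyn Plan>> { self.inner.children() }
    fn with_new_children(&self, mut children: Vec<Arc<dyn Plan>>) -> Arc<dyn Plan> {
        Arc::new(ProbeShuffle::new(children.remove(0), self.inner.partitions))
    }
}

/// Toy optimizer pass: removes every node whose concrete type is `Repartition`.
pub fn strip_repartitions(plan: Arc<dyn Plan>) -> Arc<dyn Plan> {
    if let Some(r) = plan.as_any().downcast_ref::<Repartition>() {
        return strip_repartitions(Arc::clone(&r.input));
    }
    let children: Vec<Arc<dyn Plan>> =
        plan.children().into_iter().map(strip_repartitions).collect();
    if children.is_empty() { plan } else { plan.with_new_children(children) }
}
```

In this model, running `strip_repartitions` over a bare `Repartition` with 8 partitions on top of a one-partition `Scan` returns the scan, while a `ProbeShuffle` built from the same input survives the pass and still reports 8 output partitions.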

Changes

  • Introduce ProbeShuffleExec, a thin wrapper around RepartitionExec that is opaque to DataFusion's EnforceDistribution / EnforceSorting optimizer passes, preventing them from stripping the round-robin repartitioning that the spatial join planner inserts on the probe side.
  • Move the ProbeShuffleExec presence assertion into run_spatial_join_query so it runs on every optimized spatial join integration test (~174 cases) rather than only in standalone tests.

… prevent optimizer stripping

DataFusion's EnforceDistribution and EnforceSorting optimizer passes recognize
and can strip RepartitionExec nodes that the spatial join planner inserts for
round-robin repartitioning of the probe side. This causes the probe side to
lose its intended parallelism.

Introduce ProbeShuffleExec, a thin wrapper around RepartitionExec that is
opaque to DataFusion's optimizer passes, preserving the repartitioning.
@Kontinuation Kontinuation requested a review from Copilot March 3, 2026 13:16
@Kontinuation Kontinuation marked this pull request as ready for review March 3, 2026 13:16

Copilot AI left a comment

Pull request overview

This PR addresses a performance regression in probe-side auto-repartitioning for spatial joins by making the probe-side round-robin shuffle opaque to DataFusion’s EnforceDistribution / EnforceSorting optimizer passes, so it can’t be stripped during optimization.

Changes:

  • Introduces ProbeShuffleExec, a wrapper around an internal RepartitionExec used to preserve probe-side round-robin repartitioning through physical optimization.
  • Updates the spatial join physical planner to insert ProbeShuffleExec instead of a bare RepartitionExec on the probe side.
  • Extends integration tests to assert ProbeShuffleExec presence on optimized spatial join probe sides across the integration suite.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| rust/sedona-spatial-join/src/planner/probe_shuffle_exec.rs | Adds the new ProbeShuffleExec wrapper implementation delegating to an internal RepartitionExec. |
| rust/sedona-spatial-join/src/planner/physical_planner.rs | Switches probe-side auto-repartition insertion to use ProbeShuffleExec. |
| rust/sedona-spatial-join/src/planner.rs | Exposes the new planner submodule for ProbeShuffleExec. |
| rust/sedona-spatial-join/src/lib.rs | Re-exports ProbeShuffleExec for external visibility (tests/other crates). |
| rust/sedona-spatial-join/tests/spatial_join_integration.rs | Adds assertions to verify probe-side shuffling via ProbeShuffleExec when enabled. |

@Kontinuation Kontinuation changed the title from "fix(spatial-join): wrap probe-side repartition in ProbeShuffleExec to prevent optimizer stripping" to "fix(rust/sedona-spatial-join): wrap probe-side repartition in ProbeShuffleExec to prevent optimizer stripping" Mar 3, 2026

@paleolimbot paleolimbot left a comment

Thank you!

Is there a specific test case or benchmark that you used to determine this was an issue? (And if so, can/should it be added to either the tests or benchmarks in this repo?)

Kontinuation commented Mar 4, 2026

It applies to all the test cases. The original implementation didn't work, and I had not added assertions to verify that RepartitionExec is present on the probe side. This patch updates the common test method for running spatial joins to verify that ProbeShuffleExec is present in every spatial join query.

ProbeShuffleExec has a significant impact on the CPU utilization of SpatialBench Q10 and Q11. I see a 10% performance improvement for non-spilled runs and 70%+ for spilled runs.

I have also observed an increase in network bandwidth utilization for some spatial join queries using OvertureMaps data. This is because RepartitionExec uses channels to parallelize the execution of downstream and upstream executors. The network I/O can be scheduled and completed in the background while the tokio worker is busy computing inside SpatialJoinExec.
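The overlap described above can be illustrated with a small, self-contained sketch. It is only an analogy: it uses OS threads and `std::sync::mpsc` rather than tokio tasks and DataFusion's internal channels, and `fetch_batch`, `process_batch`, and `run_pipelined` are hypothetical names. The point is that a bounded channel lets the upstream side keep producing batches while the downstream side is busy computing.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for an upstream operator that fetches a batch
/// (for example, over the network).
fn fetch_batch(i: u64) -> Vec<u64> {
    (0..4).map(|k| i * 4 + k).collect()
}

/// Hypothetical stand-in for CPU-heavy downstream work on one batch.
fn process_batch(batch: &[u64]) -> u64 {
    batch.iter().map(|v| v * v).sum()
}

/// Runs the producer and the consumer concurrently, connected by a channel.
pub fn run_pipelined(num_batches: u64) -> u64 {
    // Bounded channel: the producer may run ahead by up to 2 batches.
    let (tx, rx) = mpsc::sync_channel::<Vec<u64>>(2);

    let producer = thread::spawn(move || {
        for i in 0..num_batches {
            // Stop early if the consumer has gone away.
            if tx.send(fetch_batch(i)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which closes the channel.
    });

    // The consumer computes while the producer keeps fetching in the background.
    let total: u64 = rx.iter().map(|batch| process_batch(&batch)).sum();
    producer.join().expect("producer thread panicked");
    total
}
```

Without the channel, each fetch would have to wait for the previous batch to be fully processed; with it, fetching and computing proceed side by side, which is consistent with the higher bandwidth utilization reported above.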
