feat(rust/sedona-spatial-join): Integrate libgpuspatial into sedona-spatial-join by pwrliang · Pull Request #722 · apache/sedona-db

pwrliang · 2026-03-17T06:54:53Z

This PR integrates libgpuspatial by providing a GPUSpatialIndex and GPUSpatialIndexBuilder. The PR should pass the CI after its predecessor PRs (#717, #718, #719, #721) are merged.

pwrliang · 2026-03-17T06:55:34Z

@paleolimbot @Kontinuation Could you take a look when you are available?

Kontinuation · 2026-03-18T16:58:27Z

rust/sedona-spatial-join/src/index/gpu_spatial_index_builder.rs

+    fn merge_stats(&mut self, stats: GeoStatistics) -> &mut Self {
+        self.stats.merge(&stats);
+        self
+    }


The GPU join does not make use of geo statistics. We can remove merge_stats and the stats field altogether.

Kontinuation · 2026-03-18T17:15:58Z

rust/sedona-spatial-join/src/index/gpu_spatial_index_builder.rs

+                arrow::compute::concat_batches(&schema, all_record_batches).map_err(|e| {
+                    DataFusionError::Execution(format!("Failed to concatenate left batches: {}", e))
+                })?;


There is no need to call .map_err for ArrowError; the ? operator converts it to DataFusionError::ArrowError automatically.
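A minimal sketch of the conversion this comment relies on. The two enums and concat_batches below are simplified stand-ins written for illustration, not the real arrow or datafusion types; the only assumption carried over from the real crates is that a From<ArrowError> impl exists for DataFusionError, which is what lets ? do the conversion.

```rust
// Simplified stand-in for arrow's error type (illustration only).
#[derive(Debug)]
enum ArrowError {
    ComputeError(String),
}

// Simplified stand-in for DataFusion's error type (illustration only).
#[derive(Debug)]
enum DataFusionError {
    ArrowError(ArrowError),
}

// The conversion that `?` picks up implicitly.
impl From<ArrowError> for DataFusionError {
    fn from(e: ArrowError) -> Self {
        DataFusionError::ArrowError(e)
    }
}

// Hypothetical stand-in for a fallible Arrow compute call.
fn concat_batches(batches: &[Vec<i32>]) -> Result<Vec<i32>, ArrowError> {
    if batches.is_empty() {
        return Err(ArrowError::ComputeError("no batches to concatenate".into()));
    }
    Ok(batches.concat())
}

// No `.map_err` needed: `?` calls `From::from` on the error value.
fn build(batches: &[Vec<i32>]) -> Result<Vec<i32>, DataFusionError> {
    let merged = concat_batches(batches)?;
    Ok(merged)
}
```

Calling build(&[]) in this sketch returns the ArrowError wrapped in DataFusionError::ArrowError, so the original error is preserved rather than flattened into a string.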

Kontinuation · 2026-03-18T17:27:53Z

rust/sedona-spatial-join/src/index/gpu_spatial_index_builder.rs

+
+        let sedona_type = self.indexed_batches[0].geom_array.sedona_type.clone();
+
+        if self.options.gpu.concat_build {


Do we have tests covering the gpu.concat_build: false case?

Kontinuation · 2026-03-18T17:30:59Z

rust/sedona-spatial-join/src/operand_evaluator.rs

+                        if wkb.geometry_type() == GeometryType::Point {
+                            Some(Rect::new(
+                                coord!(x: min.x as f32, y: min.y as f32),
+                                coord!(x: max.x as f32, y: max.y as f32),
+                            ))


Will this introduce false negatives since the resulting bounding box is not guaranteed to cover the original bounding box?
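To illustrate the concern: an `as f32` cast rounds to the nearest representable value, so a minimum can move up and a maximum can move down. One possible remedy, sketched below under the assumption that a slightly larger box is acceptable, is to round minimums toward negative infinity and maximums toward positive infinity. The helper names are hypothetical and are not part of the PR.

```rust
/// Next representable f32 above `x`, written by hand so the sketch does not
/// depend on a recent toolchain.
fn next_up(x: f32) -> f32 {
    if x.is_nan() || x == f32::INFINITY {
        return x;
    }
    if x == 0.0 {
        return f32::from_bits(1); // smallest positive subnormal
    }
    let bits = x.to_bits();
    if x > 0.0 {
        f32::from_bits(bits + 1) // larger magnitude, positive
    } else {
        f32::from_bits(bits - 1) // smaller magnitude, negative
    }
}

/// Next representable f32 below `x`.
fn next_down(x: f32) -> f32 {
    -next_up(-x)
}

/// Largest f32 that is <= `v`; use for min.x / min.y.
fn f32_lower_bound(v: f64) -> f32 {
    let r = v as f32; // round to nearest
    if (r as f64) > v { next_down(r) } else { r }
}

/// Smallest f32 that is >= `v`; use for max.x / max.y.
fn f32_upper_bound(v: f64) -> f32 {
    let r = v as f32; // round to nearest
    if (r as f64) < v { next_up(r) } else { r }
}
```

With these helpers the f32 box always covers the f64 box, at the cost of being at most one unit in the last place larger on each side.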

paleolimbot

Thank you for working on this!

My main concern with this PR is that the GPU details are very entangled with the non-GPU details. How about:

sedona-spatial-join-common (defines the traits you nicely separated in the last PR and a SedonaSpatialJoinExtension trait)
sedona-spatial-join-gpu (implements the traits in sedona-spatial-join-common)
sedona-spatial-join (uses an Arc<dyn SedonaSpatialJoinExtension> object to decide if an extension join should be used in place of the normal join). The hook to register a join extension can be a global function for now and you can use a mock join extension built from the default implementation index/refiner to test it.
sedona (when built with the gpu feature, registers the Gpu join with sedona-spatial-join)

There are a few things I think this will help with:

We may want to add other join accelerators for other hardware (e.g., Apple GPU) or other data types (Geography x H3 cell join), or possibly Raster/Vector join.
It will make it easier for you to have clear ownership over a subdirectory so you can iterate faster.
It makes the sedona crate the single place where features are assembled (to avoid two levels of feature flags, the consequences of which you've started to identify here with the "all except GPU" feature).

I do think it's important to get this right early while we are all here and have time to dedicate to this!

pwrliang · 2026-03-21T02:42:19Z


Thanks for the suggestions. If I understand correctly, this design implies that sedona-spatial-join-gpu will execute end-to-end spatial joins. This requires several components to work together, such as the optimizer, planner, and SpatialJoinStream. We would either need to export all these components to the GPU module or copy/paste the code. My concern is that this design could introduce a lot of redundant code. Since libgpuspatial functions as a refinement backend (similar to GEOS, GEO, or TG), I'm not sure why adding one more backend requires such significant architectural changes.

paleolimbot · 2026-03-21T04:05:52Z

Thanks for the suggestions. If I understand correctly, this design implies that sedona-spatial-join-gpu will execute end-to-end spatial joins.

I don't think we need to (although we can if it is easier). I had in mind that sedona-spatial-join would expose something like fn register_join_extension(extension: Arc<dyn SpatialJoinExtension>), where SpatialJoinExtension has members that instantiate the builder and refiner.

It may be worth trying to separate that all within sedona-spatial-join first to see what it will look like, then move whatever has to be moved to sedona-spatial-join-common, and then move whatever has to be moved into sedona-spatial-join-gpu.
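A rough sketch of what such a hook could look like, using only the standard library. Only the register_join_extension name and the Arc<dyn SpatialJoinExtension> argument come from the comment above; the trait methods, the builder and refiner traits, the mock types, and the choose_index_name helper are assumptions made up for illustration.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the index builder and refiner traits.
trait SpatialIndexBuilder: Send {
    fn name(&self) -> &'static str;
}
trait Refiner: Send {
    fn name(&self) -> &'static str;
}

// An extension knows when it applies and how to create its parts.
trait SpatialJoinExtension: Send + Sync {
    fn can_handle(&self, gpu_enabled: bool) -> bool;
    fn index_builder(&self) -> Box<dyn SpatialIndexBuilder>;
    fn refiner(&self) -> Box<dyn Refiner>;
}

// Global slot for the registered extension ("a global function for now").
static EXTENSION: Mutex<Option<Arc<dyn SpatialJoinExtension>>> = Mutex::new(None);

fn register_join_extension(extension: Arc<dyn SpatialJoinExtension>) {
    *EXTENSION.lock().unwrap() = Some(extension);
}

fn registered_extension() -> Option<Arc<dyn SpatialJoinExtension>> {
    EXTENSION.lock().unwrap().clone()
}

// Mock extension, e.g. wrapping the default index/refiner in tests.
struct MockBuilder;
impl SpatialIndexBuilder for MockBuilder {
    fn name(&self) -> &'static str { "mock-index" }
}
struct MockRefiner;
impl Refiner for MockRefiner {
    fn name(&self) -> &'static str { "mock-refiner" }
}
struct MockExtension;
impl SpatialJoinExtension for MockExtension {
    fn can_handle(&self, gpu_enabled: bool) -> bool { gpu_enabled }
    fn index_builder(&self) -> Box<dyn SpatialIndexBuilder> { Box::new(MockBuilder) }
    fn refiner(&self) -> Box<dyn Refiner> { Box::new(MockRefiner) }
}

// The join picks the extension's index if one is registered and applicable,
// otherwise it falls back to the built-in one.
fn choose_index_name(gpu_enabled: bool) -> &'static str {
    match registered_extension() {
        Some(ext) if ext.can_handle(gpu_enabled) => ext.index_builder().name(),
        _ => "default-index",
    }
}
```

In this arrangement the join crate never names a GPU type; the top-level crate would call register_join_extension once at startup when built with the gpu feature.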

paleolimbot · 2026-03-21T04:15:19Z

Since libgpuspatial functions as a refinement backend (similar to GEOS, GEO, or TG), I'm not sure why adding one more backend requires such significant architectural changes.

I think the main difference is that (1) the index is now configurable where it wasn't before and (2) tg, geo, and geos are trivial dependencies to satisfy.

I'm happy to give this a go on Monday!

Kontinuation reviewed Mar 18, 2026


pwrliang and others added 2 commits March 19, 2026 00:55

Integrate libgpuspatial into sedona-db

8b342c5

Fix codereview issues and add tests

41b97c8

pwrliang force-pushed the integrate/libgpuspatial branch from 1d3b96b to 41b97c8 · March 19, 2026 11:59

pwrliang added 2 commits March 20, 2026 07:44

Fix CI

8e5a9a4

Fix wrong feature name

78a3d20

paleolimbot reviewed Mar 20, 2026

pwrliang wants to merge 4 commits into apache:main from pwrliang:integrate/libgpuspatial
