[GH-2230] Implement GeoSeries.is_ccw and GeoSeries.minimum_clearance#2772
[GH-2230] Implement GeoSeries.is_ccw and GeoSeries.minimum_clearance#2772piyushka-ally wants to merge 1 commit intoapache:masterfrom
Conversation
jbampton
left a comment
There was a problem hiding this comment.
Seem pre-commit is failing and it reformatted python/sedona/spark/geopandas/geoseries.py
We are not using Python 3.14
https://github.com/apache/sedona/actions/runs/23374281851/job/68015517846?pr=2772
171e20c to
5caea51
Compare
petern48
left a comment
There was a problem hiding this comment.
Thank you for working on this! Overall, the changes look good 👏 . Except that I would prefer skipping the implementation of is_ccw() for now (see my comments)
| Notes | ||
| ----- | ||
| Unlike geopandas, which also supports LinearRing inputs, this | ||
| implementation uses Sedona's ``ST_IsPolygonCCW`` which only recognises | ||
| Polygon and MultiPolygon geometries. All other geometry types return | ||
| ``False``. |
There was a problem hiding this comment.
We actually had a previous contribution attempt for is_ccw(), and encountered this problem, and we agreed not to implement it for now (#2386 (comment)), given it's not easy to replicate the full desired behavior. We're not just missing proper behavior for LinearRings (which are rare), but we would have the incorrect behavior for LineStrings, which are very common. As you can see in the code snippet below, LineString can evaluate to True here too, but our existing ST_IsPolygonCCW() isn't capable of determining that at the moment.
import shapely
from shapely import LineString
from geopandas import GeoSeries
ls = LineString([[0, 0], [1, 0], [1, 1], [2, 2]])
gs = GeoSeries(ls)
print(gs.is_ccw)In order to implement proper support for is_ccw(), I think we would need to investigate doing some more changes in the actual Java code and potentially look into the upstream JTS library. Given that complexity, I would rather skip implementing it entirely for now (or at least in a separate PR if you really want to investigate it, though it would be harder). Could we remove it for now?
There was a problem hiding this comment.
Yes please.
No worries, I will try to pick some more functions I can implement. Thanks for reviewing my PR.
piyushka-ally
left a comment
There was a problem hiding this comment.
Please approve. Thanks.
You haven't addressed the review feedback that I left (I don't see any new commits pushed). Do you plan to fulfill my request? I don't think this PR is ready to merge without it. |
|
Note that all Python CIs are currently failing due to #2774 We cannot merge this PR without all green python CIs. Meanwhile, please fix what Peter suggested. |
|
Hey sorry, I thought this was my other PR. I am working on this one, will push shortly. |
Co-authored-by: Isaac
5caea51 to
b6140e2
Compare
|
Pre-commit hooks pass locally, and I have removed the is_ccw() implementation. Thanks. |
[GH-2230] Implement GeoSeries.is_ccw and GeoSeries.minimum_clearance
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Part of [EPIC] Implement More Geopandas Functions #2230What changes were proposed in this PR?
Implements two missing
GeoSeriesmethods from the GH-2230 EPIC to expand Sedona's geopandas-compatible API:GeoSeries.is_ccw(property)TrueforPolygon/MultiPolygongeometries whose exterior ring is counter-clockwise oriented,Falsefor all other geometry types.ST_IsPolygonCCW.LinearRing), this implementation operates on polygon geometry types. The behavioral difference is documented in the docstring.GeoSeries.minimum_clearance()(method)float64Series — the smallest distance by which any vertex could be moved to produce an invalid geometry.ST_MinimumClearance.Double.MAX_VALUEfor degenerate geometries (e.g.Point, empty); this is converted tofloat('inf')to match geopandas/shapely behaviour.Both methods are implemented in
geoseries.py(Spark SQL logic) andbase.py(GeoDataFrame delegation + docstring with examples).How was this patch tested?
tests/geopandas/test_geoseries.py— hardcoded expected-value tests for both functions, including GeoDataFrame delegation:test_is_ccw: CW polygon →False, CCW polygon →True,Point→Falsetest_minimum_clearance: unit square →1.0, half-unit square →0.5tests/geopandas/test_match_geopandas_series.py— comparison against geopandas:test_is_ccw: compares empty polygon (the only case where Sedona and geopandas semantics agree)test_minimum_clearance: iterates all geometry fixture types (self.geoms) and compares against geopandas resultsAll 4 tests pass locally.
Did this PR include necessary documentation updates?