[GH-2657] Upgrade proj4sedona to 0.0.4 and adopt UrlCRSProvider#2658
- Bump proj4sedona.version from 0.0.3 to 0.0.4
- Add 3 new Spark configs: spark.sedona.crs.url.base, spark.sedona.crs.url.pathTemplate, spark.sedona.crs.url.format
- Add registerUrlCrsProvider() in FunctionsProj4 with thread-safe idempotent registration (AtomicReference, priority 50)
- Wire ST_Transform to capture URL CRS config on driver and register provider on executors via companion object readConfig()
- Add tests: 6 unit tests (FunctionsProj4Test), 6 config tests (SedonaConfTest), 4 integration tests (CRSTransformProj4Test) using local HTTP server with fake EPSG:990001
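The integration tests in this commit rely on a local HTTP server that serves a fake EPSG:990001 definition. The following is a minimal sketch of that testing idea using only the JDK's built-in `com.sun.net.httpserver.HttpServer`; it is not the PR's test code. The class name `LocalCrsServerSketch`, the placeholder response body, and the request path `/EPSG/990001.json` (derived from the default path template, assuming the authority is substituted verbatim) are all assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: serves one fake CRS definition over local HTTP. */
final class LocalCrsServerSketch {
  // Placeholder payload only; a real test would serve a valid CRS definition.
  static final String FAKE_BODY = "{\"placeholder\": \"definition for EPSG:990001\"}";

  /** Starts a server on an ephemeral loopback port and registers the fake path. */
  static HttpServer start() throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    server.createContext("/EPSG/990001.json", exchange -> {
      byte[] body = FAKE_BODY.getBytes(StandardCharsets.UTF_8);
      exchange.getResponseHeaders().add("Content-Type", "application/json");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    return server;
  }

  /** Fetches a URL and returns the response body decoded as UTF-8 (Java 9+). */
  static String fetch(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
    try (InputStream in = conn.getInputStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } finally {
      conn.disconnect();
    }
  }

  /** Starts the server, fetches the fake definition once, then stops the server. */
  static String roundTrip() {
    try {
      HttpServer server = start();
      try {
        int port = server.getAddress().getPort();
        return fetch("http://127.0.0.1:" + port + "/EPSG/990001.json");
      } finally {
        server.stop(0);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In a real test, the base URL `http://127.0.0.1:<port>` would presumably be passed as spark.sedona.crs.url.base so that the provider under test resolves the custom code against this server.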
1. Move URL CRS provider registration from ST_Transform class body into lazy val f, so it only executes on executors during row evaluation, never on the driver during query planning.
2. Wrap registerUrlCrsProvider's remove-register-set sequence in a synchronized block with double-checked locking. The fast path (already registered) is lock-free (volatile read + String.equals).
3. Add 16-thread concurrency test verifying no duplicate providers are registered under contention.
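The double-checked locking described in item 2 can be sketched roughly as follows. This is an illustration of the pattern, not the code in FunctionsProj4.java: the class `UrlProviderRegistrar`, the in-memory `PROVIDERS` list, and the string key built from the three settings are hypothetical stand-ins for the real proj4sedona registry calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the registration logic; not the proj4sedona API. */
final class UrlProviderRegistrar {
  // Stand-in for the provider registry: one entry per registered provider.
  private static final List<String> PROVIDERS = new ArrayList<>();

  // Key of the currently registered config. Volatile so the fast path
  // can read it without taking the lock.
  private static volatile String registeredKey = null;

  static void register(String baseUrl, String pathTemplate, String format) {
    if (baseUrl == null || baseUrl.isEmpty()) {
      return; // feature disabled: no-op
    }
    // Simple composite key, adequate for a sketch.
    String key = baseUrl + "|" + pathTemplate + "|" + format;

    // First check, lock-free: volatile read + String.equals.
    if (key.equals(registeredKey)) {
      return;
    }
    synchronized (UrlProviderRegistrar.class) {
      // Second check under the lock: another thread may have won the race.
      if (key.equals(registeredKey)) {
        return;
      }
      // Remove-register-set sequence, atomic with respect to other callers.
      if (registeredKey != null) {
        PROVIDERS.remove(registeredKey);
      }
      PROVIDERS.add(key);
      registeredKey = key;
    }
  }

  static int providerCount() {
    synchronized (UrlProviderRegistrar.class) {
      return PROVIDERS.size();
    }
  }
}
```

With this shape, repeated calls with the same settings return on the first check, and a changed setting replaces the previous entry instead of adding a second one.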
- Parameter.md: document spark.sedona.crs.url.base, pathTemplate, format
- CRS-Transformation.md: add URL CRS Provider section with hosting guidance (GitHub repo, S3), supported formats table, GitHub raw URL example, self-hosted server example, custom authority codes example (MYORG:1001), and instructions for disabling
- All examples use Python SedonaContext.builder().config() style
Pull request overview
This pull request upgrades proj4sedona from version 0.0.3 to 0.0.4 and introduces a new URL-based CRS Provider feature that enables users to resolve custom CRS definitions from remote HTTP servers (such as GitHub repositories or S3 buckets) before falling back to built-in definitions. The feature is designed to address use cases where users need custom or internal coordinate reference system definitions not included in standard CRS databases, particularly relevant for specialized transformations as described in issue #1397.
Changes:
- Upgraded proj4sedona dependency from 0.0.3 to 0.0.4 in pom.xml
- Added three new Spark configuration parameters for URL-based CRS resolution: spark.sedona.crs.url.base, spark.sedona.crs.url.pathTemplate, and spark.sedona.crs.url.format
- Implemented thread-safe URL CRS provider registration logic in FunctionsProj4.java using double-checked locking pattern
- Integrated URL provider registration into ST_Transform expression with lazy evaluation on executors
- Added comprehensive test coverage including unit tests, integration tests, and concurrency tests
- Documented the new feature with multiple usage examples in CRS-Transformation.md and Parameter.md
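To make the three configuration parameters concrete, here is a rough sketch of how a base URL and a path template with {authority} and {code} placeholders could combine into a lookup URL. This is not proj4sedona's implementation: the class `CrsUrlTemplate`, the verbatim (case-preserving) substitution of the authority, and the trailing-slash handling are assumptions made for illustration.

```java
/** Hypothetical helper showing placeholder substitution in a CRS path template. */
final class CrsUrlTemplate {
  /**
   * Expands {authority} and {code} in the template and appends it to the base URL.
   * Assumption: values are substituted verbatim, with no case folding or escaping.
   */
  static String expand(String baseUrl, String pathTemplate, String authority, String code) {
    String path = pathTemplate
        .replace("{authority}", authority)
        .replace("{code}", code);
    // Drop one trailing slash from the base so "base/" + "/path" does not double up.
    String base = baseUrl.endsWith("/")
        ? baseUrl.substring(0, baseUrl.length() - 1)
        : baseUrl;
    return base + path;
  }
}
```

Under these assumptions, the default template /{authority}/{code}.json with a base of https://crs.example.com (a made-up host) would turn EPSG:990001 into https://crs.example.com/EPSG/990001.json.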
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pom.xml | Bumped proj4sedona version from 0.0.3 to 0.0.4 |
| spark/common/src/main/java/org/apache/sedona/core/utils/SedonaConf.java | Added three new configuration fields and getter methods for URL CRS provider settings |
| spark/common/src/test/java/org/apache/sedona/core/utils/SedonaConfTest.java | Added 6 new unit tests validating default values and custom configurations for URL CRS provider settings |
| common/src/main/java/org/apache/sedona/common/FunctionsProj4.java | Implemented thread-safe registerUrlCrsProvider() method with double-checked locking and parseCrsFormat() helper |
| common/src/test/java/org/apache/sedona/common/FunctionsProj4Test.java | Added 7 comprehensive unit tests including thread safety, idempotency, config changes, format parsing, and local HTTP server integration |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala | Modified ST_Transform to capture and serialize URL provider config from driver to executors, with lazy registration in executor evaluation |
| spark/common/src/test/scala/org/apache/sedona/sql/CRSTransformProj4Test.scala | Added 4 integration tests covering default behavior, fallback scenarios, provider registration verification, and end-to-end custom CRS transformation |
| docs/api/sql/Parameter.md | Documented the three new configuration parameters with descriptions, defaults, examples, and supported values |
| docs/api/sql/CRS-Transformation.md | Added comprehensive "URL CRS Provider" section with hosting guidance, configuration instructions, format table, and 5 practical examples |
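The parseCrsFormat() helper listed in the table maps the format setting to an enum. A minimal sketch of such a mapping is shown here, based only on the behavior this PR describes (four supported values, case-insensitive matching, PROJJSON as the default for null, empty, or unknown input). The nested `Format` enum is a hypothetical stand-in for CRSResult.Format, whose actual constant names are not shown on this page.

```java
import java.util.Locale;

/** Hypothetical sketch of a config-string-to-enum mapping with a safe default. */
final class CrsFormatParser {
  // Stand-in for CRSResult.Format; the real constant names are an assumption.
  enum Format { PROJJSON, PROJ, WKT1, WKT2 }

  static Format parse(String value) {
    if (value == null) {
      return Format.PROJJSON; // default when the setting is absent
    }
    // Locale.ROOT keeps lowercasing independent of the JVM's default locale.
    switch (value.toLowerCase(Locale.ROOT)) {
      case "proj":
        return Format.PROJ;
      case "wkt1":
        return Format.WKT1;
      case "wkt2":
        return Format.WKT2;
      default:
        // Covers "projjson" as well as empty and unknown values.
        return Format.PROJJSON;
    }
  }
}
```

Falling back to a default instead of throwing means a misspelled format value does not fail the query at parse time; whether that trade-off is desirable depends on how the fetched definition is validated later.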
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
common/src/main/java/org/apache/sedona/common/FunctionsProj4.java (outdated, resolved)
spark/common/src/test/scala/org/apache/sedona/sql/CRSTransformProj4Test.scala (outdated, resolved)
…, StandardCharsets.UTF_8
…test-only reset method
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes #2657 (Upgrade proj4sedona to 0.0.4 version and adopt the UrlCRSprovider) and #1397 (Specify custom transformation parameters/wkt string from CoordinateSystem A til CoordinateSystem B).
What changes were proposed in this PR?
Upgrade proj4sedona from 0.0.3 to 0.0.4 and adopt the new UrlCRSProvider API, allowing users to resolve CRS definitions from a remote HTTP server (e.g., a GitHub repo or S3 bucket) before falling back to built-in definitions.
Changes
Dependency upgrade
- pom.xml: bump proj4sedona.version from 0.0.3 to 0.0.4
New Spark configuration keys (in SedonaConf.java)
- spark.sedona.crs.url.base (default: empty string, disabled): Base URL of the CRS definition server
- spark.sedona.crs.url.pathTemplate (default: /{authority}/{code}.json): URL path template with {authority} and {code} placeholders
- spark.sedona.crs.url.format (default: projjson): Response format: projjson, proj, wkt1, or wkt2
Registration logic (in FunctionsProj4.java)
- registerUrlCrsProvider(baseUrl, pathTemplate, format): registers a UrlCRSProvider with proj4sedona Defs registry at priority 50 (before built-in at 100)
- parseCrsFormat(String): maps config string to CRSResult.Format enum
ST_Transform integration (in Functions.scala)
- ST_Transform captures the 3 new config values on the driver via SedonaConf and serializes them to executors
- Provider registration runs in lazy val f on executors during row evaluation
- readConfig() consolidates all config reading
Documentation (in CRS-Transformation.md and Parameter.md)
How was this patch tested?
Unit tests (FunctionsProj4Test.java, 42 tests)
- testRegisterUrlCrsProviderNoOpOnNullOrEmpty: null/empty baseUrl is a no-op
- testRegisterUrlCrsProviderRegistersAndIsIdempotent: single registration, no duplicates on repeat call
- testRegisterUrlCrsProviderReRegistersOnConfigChange: config change triggers re-registration
- testParseCrsFormatAllMappings: all format strings map correctly
- testParseCrsFormatDefaultsAndCaseInsensitive: null/empty/unknown/uppercase default to PROJJSON
- testTransformWithLocalUrlCrsProvider: local HTTP server serves fake EPSG:990001, verifies URL provider resolves custom code
- testRegisterUrlCrsProviderConcurrentThreadSafety: 16 threads race into registration via CyclicBarrier, asserts exactly 1 provider registered
Integration tests (CRSTransformProj4Test.scala, 36 tests, 4 new)
- should still transform correctly when URL provider is not configured
- should fall back to built-in when URL provider returns nothing
- should register URL CRS provider when config is set
- should transform using local HTTP URL CRS provider with custom CRS
Config tests (SedonaConfTest.java, 9 tests, 6 new)
Run commands:
Did this PR include necessary documentation updates?