Fix Iceberg package-private access after shim isolation #14866
gerashegalov wants to merge 8 commits into
Add root-layout helpers for Iceberg package-private APIs used by the Iceberg integration, and bridge clustered/fanout writers through root classes so the GPU writers continue to use Iceberg's writer implementations without copying them. Add an ICEBERG_EXTRA_CLASSPATH integration-test mode so local Iceberg runtime jars can be loaded through driver/executor extraClassPath instead of spark.jars/spark.jars.packages. Add a root-safe Iceberg version accessor for the pytest version-detection path.
Greptile Summary
This PR fixes Iceberg package-private access failures that occur when Iceberg runtime jars are placed on
Confidence Score: 5/5
Safe to merge; the change narrows a real classloader boundary violation without altering Iceberg write/read semantics. All package-private access is now correctly routed through root-layout Java classes in the same Iceberg packages. The writer bridge pattern preserves the ClusteredWriter/FanoutWriter contract with no behavioral change. GpuBatchAppend.commit correctly delegates to the CPU batch write after GpuSparkWriteAccess.taskCommit creates proper SparkWrite.TaskCommit messages. The binary-dedupe.sh promotion logic has appropriate guards. No logic, data-correctness, or resource-management regressions were identified. No files require special attention.
Reviews (4): Last reviewed commit: "Update Iceberg write copyright header"
Delegate GPU batch write commit, abort, and commit-coordinator handling back to the Iceberg CPU BatchWrite instances. This keeps Iceberg's private commitOperation and abort methods out of GpuSparkWriteAccess while preserving the GPU writer factory. Private Iceberg write state still requires field reflection because SparkWrite and SparkPositionDeltaWrite keep the needed members private and the nested Context class itself is private.
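The two ideas in this commit message, delegating commit/abort to a wrapped CPU write and reading private state through field reflection, can be sketched roughly as follows. This is a minimal illustration only: `SimpleBatchWrite`, `CpuWrite`, `GpuWrite`, and `PrivateFieldReader` are hypothetical stand-ins invented for the example, and they are not Spark's `BatchWrite`, Iceberg's `SparkWrite`, or any class from this PR.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for a batch-write contract.
// It is NOT Spark's BatchWrite interface.
interface SimpleBatchWrite {
    String createWriterFactory();
    void commit(List<String> messages);
    void abort(List<String> messages);
}

// Hypothetical CPU implementation that keeps some state private.
final class CpuWrite implements SimpleBatchWrite {
    private final String tableName;                 // private state, no getter
    final List<String> committed = new ArrayList<>();
    boolean aborted = false;

    CpuWrite(String tableName) {
        this.tableName = tableName;
    }

    public String createWriterFactory() { return "cpu-factory"; }
    public void commit(List<String> messages) { committed.addAll(messages); }
    public void abort(List<String> messages) { aborted = true; }
}

// Hypothetical GPU wrapper: it supplies its own writer factory but hands
// commit and abort back to the wrapped CPU write unchanged.
final class GpuWrite implements SimpleBatchWrite {
    private final SimpleBatchWrite cpu;

    GpuWrite(SimpleBatchWrite cpu) {
        this.cpu = cpu;
    }

    public String createWriterFactory() { return "gpu-factory"; }
    public void commit(List<String> messages) { cpu.commit(messages); }
    public void abort(List<String> messages) { cpu.abort(messages); }
}

// Hypothetical helper that reads a private field by name via reflection.
final class PrivateFieldReader {
    static Object read(Object target, String fieldName)
            throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true);   // bypass the private modifier
        return field.get(target);
    }
}
```

For example, `new GpuWrite(new CpuWrite("db.tbl"))` reports its own factory while any commit lands on the wrapped CPU instance, and `PrivateFieldReader.read(cpu, "tableName")` retrieves the private field without a getter. Reflection of this kind is brittle against renames in the wrapped class, which is presumably why the remaining uses are flagged for later cleanup.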
Review follow-up:
Broader removal of the remaining private-field reflection in the Iceberg write shim is intentionally deferred to a follow-up PR so this packaging/classloader fix can stay narrow.
build
NOTE: release/26.06 has been created from main. Please retarget your PR to release/26.06 if it should be included in the release.
res-life left a comment
LGTM
Checked by AI; this does not change any original behavior.
The issue was created for 26.06. Should we retarget this one to release/26.06?
Yes.
Reran all Iceberg tests with Iceberg on the system classpath after retargeting.
Fixes #14726.
Description
This PR narrows the SQL plugin shim isolation work to the Iceberg package-private access failures. When Iceberg runtime jars are placed on driver/executor extraClassPath, Spark can load root classes through the app loader while shim classes load from Spark's mutable URL classloader. That breaks access to package-private Iceberg classes and methods unless the accessing code is also root-loadable.
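The JVM grants package-private access only within one run-time package, meaning the same package name and the same defining classloader. A rough sketch of the accessor idea follows, under loud assumptions: `HiddenInputFile` and `HiddenInputFileAccess` are hypothetical names that do not come from Iceberg or this PR, and the default package is used only so the example fits in a single file.

```java
// Hypothetical stand-in for a library class that is package-private
// (no `public` modifier), similar in spirit to the Iceberg classes above.
final class HiddenInputFile {
    private final String location;

    HiddenInputFile(String location) {
        this.location = location;
    }

    // Package-private method: callable only from the same run-time package.
    String location() {
        return location;
    }
}

// Hypothetical accessor declared in the same package as HiddenInputFile.
// In a real multi-package layout this class would be declared public; it is
// left package-private here only so the sketch compiles from one file of
// any name. The package-private call happens inside the shared package,
// and callers see only public static methods.
final class HiddenInputFileAccess {
    private HiddenInputFileAccess() {
    }

    // Example-only factory so callers can obtain an instance as Object.
    public static Object open(String location) {
        return new HiddenInputFile(location);
    }

    // Accepts Object so callers never need to name the hidden type.
    public static String locationOf(Object file) {
        if (file instanceof HiddenInputFile) {
            return ((HiddenInputFile) file).location();
        }
        throw new IllegalArgumentException(
                "unsupported file type: " + file.getClass().getName());
    }
}
```

The accessor only helps if it is defined by the same classloader as the class it wraps. That is the reason for placing such helpers in a root layout rather than in shim classes that a different loader defines.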
Changes:
- Add `IcebergS3InputFileAccess` for `BaseS3File`/`S3URI`.
- Add `GpuParquetIOAccess` for `ParquetIO.file`.
- Add `IcebergProviderAccess` for version detection from pytests.
- Bridge `ClusteredWriter` and `FanoutWriter` by introducing root-layout Java bridge classes, avoiding copied implementations in the shimmed Scala classes.
- Make `GpuTypeToSparkType` root-loadable by removing its dependency on `SchemaUtils`.
- Add `ICEBERG_EXTRA_CLASSPATH` to `integration_tests/run_pyspark_from_build.sh` and document it so Iceberg integration pytests can run with local Iceberg runtime jars on driver/executor extraClassPath instead of `--jars`/`--packages`.
Validation:
- `./build/buildall --profile=353,357`
- `mvn -B -P release357 -pl integration_tests -am -DskipTests package`
- `bash -n integration_tests/run_pyspark_from_build.sh`
- `INSERT INTO local.db.smoke VALUES (1, 'Alice'), (2, 'Bob')`
- `TESTS=iceberg/iceberg_version_detection_test.py ./integration_tests/run_pyspark_from_build.sh -m iceberg --iceberg -k test_iceberg_version_detection` with `ICEBERG_EXTRA_CLASSPATH=~/downloads/iceberg-spark-runtime-3.5_2.12-1.10.1.jar`: 1 passed
- `TESTS=iceberg/iceberg_append_test.py ./integration_tests/run_pyspark_from_build.sh -m iceberg --iceberg -k test_insert_into_unpartitioned_table_values` with the same `ICEBERG_EXTRA_CLASSPATH`: 2 passed
Checklists
Documentation
Testing
Existing tests run:
`iceberg_version_detection_test.py::test_iceberg_version_detection` and `iceberg_append_test.py::test_insert_into_unpartitioned_table_values`.
Performance