Skip to content

Fix Iceberg package-private access after shim isolation#14866

Open
gerashegalov wants to merge 8 commits into
NVIDIA:release/26.06from
gerashegalov:codex/iceberg-package-private-access
Open

Fix Iceberg package-private access after shim isolation#14866
gerashegalov wants to merge 8 commits into
NVIDIA:release/26.06from
gerashegalov:codex/iceberg-package-private-access

Conversation

@gerashegalov
Copy link
Copy Markdown
Collaborator

@gerashegalov gerashegalov commented May 23, 2026

Fixes #14726.

Description

This PR narrows the SQL plugin shim isolation work to the Iceberg package-private access failures. When Iceberg runtime jars are placed on driver/executor extraClassPath, Spark can load root classes through the app loader while shim classes load from Spark's mutable URL classloader. That breaks access to package-private Iceberg classes and methods unless the accessing code is also root-loadable.

Changes:

  • Add root-layout accessors for Iceberg package-private APIs used by the GPU integration:
    • IcebergS3InputFileAccess for BaseS3File/S3URI.
    • GpuParquetIOAccess for ParquetIO.file.
    • IcebergProviderAccess for version detection from pytests.
  • Keep the GPU clustered/fanout write path extending Iceberg ClusteredWriter and FanoutWriter by introducing root-layout Java bridge classes, avoiding copied implementations in the shimmed Scala classes.
  • Make GpuTypeToSparkType root-loadable by removing its dependency on SchemaUtils.
  • Add the new root-layout classes to dist unshimmed allowlists.
  • Add ICEBERG_EXTRA_CLASSPATH to integration_tests/run_pyspark_from_build.sh and document it so Iceberg integration pytests can run with local Iceberg runtime jars on driver/executor extraClassPath instead of --jars/--packages.

Validation:

  • ./build/buildall --profile=353,357
  • mvn -B -P release357 -pl integration_tests -am -DskipTests package
  • bash -n integration_tests/run_pyspark_from_build.sh
  • Spark 3.5.7 Iceberg write smoke using extraClassPath: INSERT INTO local.db.smoke VALUES (1, 'Alice'), (2, 'Bob')
  • TESTS=iceberg/iceberg_version_detection_test.py ./integration_tests/run_pyspark_from_build.sh -m iceberg --iceberg -k test_iceberg_version_detection with ICEBERG_EXTRA_CLASSPATH=~/downloads/iceberg-spark-runtime-3.5_2.12-1.10.1.jar: 1 passed
  • TESTS=iceberg/iceberg_append_test.py ./integration_tests/run_pyspark_from_build.sh -m iceberg --iceberg -k test_insert_into_unpartitioned_table_values with the same ICEBERG_EXTRA_CLASSPATH: 2 passed

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    Existing tests run: iceberg_version_detection_test.py::test_iceberg_version_detection and iceberg_append_test.py::test_insert_into_unpartitioned_table_values.
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

gerashegalov and others added 4 commits May 22, 2026 16:31
Add root-layout helpers for Iceberg package-private APIs used by the Iceberg integration, and bridge clustered/fanout writers through root classes so the GPU writers continue to use Iceberg's writer implementations without copying them.

Add an ICEBERG_EXTRA_CLASSPATH integration-test mode so local Iceberg runtime jars can be loaded through driver/executor extraClassPath instead of spark.jars/spark.jars.packages. Add a root-safe Iceberg version accessor for the pytest version-detection path.
@gerashegalov gerashegalov requested a review from a team as a code owner May 23, 2026 06:05
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 23, 2026

Greptile Summary

This PR fixes Iceberg package-private access failures that occur when Iceberg runtime jars are placed on driver/executor extraClassPath, causing Iceberg classes to be loaded by the root/app classloader while GPU shim classes are loaded by Spark's mutable URL classloader — a situation where package-private access checks fail across loader boundaries.

  • Introduces root-layout Java accessor classes (GpuSparkWriteAccess, GpuSparkScanAccess, GpuSparkReadConfAccess, GpuParquetIOAccess, IcebergS3InputFileAccess, IcebergProviderAccess) in the same packages as the Iceberg classes they access, so JVM package-private checks pass without cross-loader violations.
  • Replaces direct ClusteredWriter/FanoutWriter inheritance in shimmed Scala classes with an abstract Scala intermediary (GpuClusteredWriter/GpuFanoutWriter) that delegates to root-layout Java bridge classes (GpuClusteredWriterBridge/GpuFanoutWriterBridge), preserving the Iceberg writer contract without shim-classloader access to package-private Iceberg types.
  • Adds ICEBERG_EXTRA_CLASSPATH support to run_pyspark_from_build.sh and updates binary-dedupe.sh to promote the new root-layout classes from spark-shared to the root jar layout during the dist build.

Confidence Score: 5/5

Safe to merge; the change narrows a real classloader boundary violation without altering Iceberg write/read semantics.

All package-private access is now correctly routed through root-layout Java classes in the same Iceberg packages. The writer bridge pattern preserves the ClusteredWriter/FanoutWriter contract with no behavioral change. GpuBatchAppend.commit correctly delegates to the CPU batch write after GpuSparkWriteAccess.taskCommit creates proper SparkWrite.TaskCommit messages. The binary-dedupe.sh promotion logic has appropriate guards. No logic, data-correctness, or resource-management regressions were identified.

No files require special attention.

Important Files Changed

Filename Overview
iceberg/common/src/main/java/org/apache/iceberg/io/GpuClusteredWriterBridge.java New root-layout bridge extending package-private ClusteredWriter; delegates all abstract methods to caller-provided lambdas; cleanly enables shimmed Scala subclasses to avoid direct inheritance of the Iceberg package-private class.
iceberg/common/src/main/java/org/apache/iceberg/io/GpuFanoutWriterBridge.java Parallel to GpuClusteredWriterBridge for FanoutWriter; structurally identical, no issues.
iceberg/common/src/main/java/org/apache/iceberg/spark/source/GpuSparkWriteAccess.java Root-layout accessor class for SparkWrite/SparkPositionDeltaWrite package-private fields; uses the same field-reflection pattern (findField walks class hierarchy) as the Scala code it replaces; correctly constructs TaskCommit/DeltaTaskCommit in the same package as Iceberg.
iceberg/common/src/main/java/org/apache/iceberg/spark/source/GpuSparkScanAccess.java Root-layout accessor for SparkScan, SparkBatch, SparkInputPartition package-private classes; correct use of direct casts and reflection to expose required fields and type-checks.
iceberg/common/src/main/scala/org/apache/iceberg/io/clustered.scala Introduces abstract GpuClusteredWriter that wraps a GpuClusteredWriterBridge via lambda callbacks; concrete writer classes now extend the abstract Scala class instead of the Iceberg package-private ClusteredWriter directly; delegation chain is correct.
iceberg/common/src/main/scala/org/apache/iceberg/io/fanout.scala Parallel restructuring to clustered.scala for FanoutWriter; no issues found.
iceberg/common/src/main/scala/org/apache/iceberg/spark/source/GpuSparkWrite.scala GpuSparkWrite now accepts Write (not SparkWrite) and delegates field reads to GpuSparkWriteAccess; files()/commitOperation() helpers removed; abort delegated to cpuBatchWrite directly. Clean refactor.
iceberg/common/src/main/scala/org/apache/iceberg/spark/source/write.scala GpuBaseBatchWrite now carries cpuBatchWrite reference; abort and useCommitCoordinator delegate to CPU; GpuBatchAppend.commit simplified to cpuBatchWrite.commit(messages), removing duplicated append logic.
iceberg/common/src/main/scala/org/apache/iceberg/spark/source/GpuSparkPositionDeltaWrite.scala cpu parameter widened from SparkPositionDeltaWrite to DeltaWrite; field reads via GpuSparkWriteAccess; DeltaTaskCommit construction moved to accessor; abort(messages) helper removed as it is now handled by cpuBatchWrite.abort directly.
dist/scripts/binary-dedupe.sh New copy_unshimmed_from_spark_shared function promotes allowlisted spark-shared files to root layout; correctly guards with -s check before rsync; env variable passed from build-parallel-worlds.xml.

Reviews (4): Last reviewed commit: "Update Iceberg write copyright header" | Re-trigger Greptile

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Delegate GPU batch write commit, abort, and commit-coordinator handling back to the Iceberg CPU BatchWrite instances. This keeps Iceberg's private commitOperation and abort methods out of GpuSparkWriteAccess while preserving the GPU writer factory.

Private Iceberg write state still requires field reflection because SparkWrite and SparkPositionDeltaWrite keep the needed members private and the nested Context class itself is private.
@gerashegalov
Copy link
Copy Markdown
Collaborator Author

Review follow-up:

  • The method-reflection comment is obsolete after dba0e2d: Spark write commit/abort handling now delegates to the CPU BatchWrite instead of reflectively invoking Iceberg methods.
  • The widened InputPartition cast comment is addressed in 6cf50b2 by validating that the Iceberg partition implements HasPartitionKey up front and reusing the typed reference.

Broader removal of the remaining private-field reflection in the Iceberg write shim is intentionally deferred to a follow-up PR so this packaging/classloader fix can stay narrow.

@gerashegalov gerashegalov self-assigned this May 23, 2026
@gerashegalov gerashegalov added the bug Something isn't working label May 23, 2026
@gerashegalov
Copy link
Copy Markdown
Collaborator Author

build

@nvauto
Copy link
Copy Markdown
Collaborator

nvauto commented May 25, 2026

NOTE: release/26.06 has been created from main. Please retarget your PR to release/26.06 if it should be included in the release.

res-life
res-life previously approved these changes May 29, 2026
Copy link
Copy Markdown
Collaborator

@res-life res-life left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Checked by AI, does not change any original behavior.

@pxLi
Copy link
Copy Markdown
Member

pxLi commented May 29, 2026

The issue was created for 26.06, should we retarget this one to release/26.06?

@sameerz
Copy link
Copy Markdown
Collaborator

sameerz commented May 29, 2026

The issue was created for 26.06, should we retarget this one to release/26.06?

Yes.

@gerashegalov gerashegalov changed the base branch from main to release/26.06 May 29, 2026 23:18
@gerashegalov gerashegalov dismissed res-life’s stale review May 29, 2026 23:18

The base branch was changed.

@gerashegalov
Copy link
Copy Markdown
Collaborator Author

Reran all iceberg tests with Iceberg on the system classpath after retargetting

$ SPARK_HOME=$HOME/dist/spark-3.5.7-bin-hadoop3 TEST_PARALLEL=0 ICEBERG_EXTRA_CLASSPATH=$HOME/.m2/repository/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.10.1/iceberg-spark-runtime-3.5_2.12-1.10.1.jar PYSP_TEST_spark_sql_extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions PYSP_TEST_spark_sql_catalog_spark__catalog=org.apache.iceberg.spark.SparkSessionCatalog PYSP_TEST_spark_sql_catalog_spark__catalog_type=hadoop PYSP_TEST_spark_sql_catalog_spark__catalog_warehouse=/tmp/spark-warehouse-pr14866-spark357-$RANDOM ./integration_tests/run_pyspark_from_build.sh -m iceberg --iceberg

...
= 852 passed, 16 skipped, 39094 deselected, 5 xpassed, 14 warnings in 22241.81s (6:10:41) =

@NvTimLiu NvTimLiu self-requested a review June 1, 2026 01:37
Copy link
Copy Markdown
Collaborator

@NvTimLiu NvTimLiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Iceberg 1.10.1 SparkWrite class loader issue when jars in $SPARK_HOME/jars

6 participants