
[FLINK-39245][Iceberg] Add AWS Glue Catalog support for Iceberg pipeline connector #4314

Open
norrishuang wants to merge 9 commits into apache:master from norrishuang:feature/glue-catalog-support


@norrishuang commented Mar 12, 2026

This closes FLINK-39245

Summary

Add AWS Glue Catalog support for the Iceberg pipeline sink connector. Previously, the Iceberg pipeline sink supported only the hadoop and hive catalog types. This PR lets users use the AWS Glue Data Catalog as the Iceberg catalog by setting catalog.properties.type: glue.

Changes

New Configuration Options

  • catalog.properties.type (String): Now supports glue in addition to hadoop and hive.
  • catalog.properties.catalog-impl (String): Custom catalog implementation class (e.g. org.apache.iceberg.aws.glue.GlueCatalog).
  • catalog.properties.io-impl (String): Custom FileIO implementation (e.g. org.apache.iceberg.aws.s3.S3FileIO).
  • catalog.properties.glue.id (String): Glue Catalog ID (AWS account ID) for cross-account access.
  • catalog.properties.glue.skip-archive (Boolean): Skip archiving older table versions in Glue (default: true).
  • catalog.properties.glue.skip-name-validation (Boolean): Skip name validation for the Glue catalog (default: false).
  • catalog.properties.client.region (String): AWS region for the Glue catalog client.
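All of the options above share the catalog.properties. prefix. The following is a minimal sketch of that pass-through idea, assuming every sink option starting with catalog.properties. is forwarded to Iceberg with the prefix stripped (so catalog.properties.glue.id arrives as glue.id). CatalogPropertiesExtractor and extract() are hypothetical names for illustration, not the connector's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: forwards catalog.properties.* options with the prefix stripped. */
public class CatalogPropertiesExtractor {

    static final String PREFIX = "catalog.properties.";

    /** Returns a new map containing only the prefixed keys, with the prefix removed. */
    public static Map<String, String> extract(Map<String, String> sinkOptions) {
        Map<String, String> catalogProps = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sinkOptions.entrySet()) {
            String key = e.getKey();
            // Skip unrelated sink options and a bare "catalog.properties." key.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                catalogProps.put(key.substring(PREFIX.length()), e.getValue());
            }
        }
        return catalogProps;
    }

    public static void main(String[] args) {
        Map<String, String> sinkOptions = new LinkedHashMap<>();
        sinkOptions.put("type", "iceberg"); // the sink type, not a catalog property
        sinkOptions.put("catalog.properties.type", "glue");
        sinkOptions.put("catalog.properties.warehouse", "s3://my-bucket/warehouse/");
        sinkOptions.put("catalog.properties.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        sinkOptions.put("catalog.properties.client.region", "us-east-1");
        // Prints {type=glue, warehouse=s3://my-bucket/warehouse/, io-impl=org.apache.iceberg.aws.s3.S3FileIO, client.region=us-east-1}
        System.out.println(extract(sinkOptions));
    }
}
```

Note how the sink-level type: iceberg is dropped while catalog.properties.type: glue becomes the catalog's type, which is why the two keys do not collide.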

Files Modified

  • IcebergDataSinkOptions.java — Added Glue-related config options, updated TYPE and WAREHOUSE descriptions
  • IcebergDataSinkFactory.java — Registered new optional config options
  • IcebergDataSinkFactoryTest.java — Added 2 test cases for Glue catalog creation (via type=glue and catalog-impl)
  • pom.xml — Dependency and shade plugin adjustments

Usage Example

sink:
  type: iceberg
  catalog.properties.type: glue
  catalog.properties.warehouse: s3://my-bucket/warehouse/
  catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
  catalog.properties.client.region: us-east-1

How It Works

Iceberg's CatalogUtil.buildIcebergCatalog() natively supports type=glue and automatically loads org.apache.iceberg.aws.glue.GlueCatalog. This PR exposes the necessary configuration options through the Flink CDC pipeline config layer and ensures the Glue-related catalog properties are correctly passed through via the catalog.properties.* prefix.
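A simplified sketch of the selection rule, assuming it matches what Iceberg's CatalogUtil.buildIcebergCatalog() does: an explicit catalog-impl wins, setting both catalog-impl and type is rejected, and otherwise type (defaulting to hive) is mapped to a built-in class. CatalogImplResolver is a hypothetical name; only hive, hadoop and glue are mapped here, and the real method also instantiates and initializes the catalog, which this sketch does not.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical, simplified model of how Iceberg picks a catalog class from the
 * "type" and "catalog-impl" properties. Only hive, hadoop and glue are mapped.
 */
public class CatalogImplResolver {

    public static String resolve(Map<String, String> props) {
        String catalogImpl = props.get("catalog-impl");
        String type = props.get("type");
        if (catalogImpl != null) {
            // Assumed to mirror Iceberg: both properties together are rejected.
            if (type != null) {
                throw new IllegalArgumentException(
                        "Both type and catalog-impl are set: type=" + type
                                + ", catalog-impl=" + catalogImpl);
            }
            return catalogImpl;
        }
        // Assumed to mirror Iceberg's default of the Hive catalog when no type is given.
        String effectiveType = type == null ? "hive" : type.toLowerCase(Locale.ROOT);
        switch (effectiveType) {
            case "hive":
                return "org.apache.iceberg.hive.HiveCatalog";
            case "hadoop":
                return "org.apache.iceberg.hadoop.HadoopCatalog";
            case "glue":
                return "org.apache.iceberg.aws.glue.GlueCatalog";
            default:
                throw new UnsupportedOperationException("Unknown catalog type: " + type);
        }
    }

    public static void main(String[] args) {
        // Both calls print org.apache.iceberg.aws.glue.GlueCatalog.
        System.out.println(resolve(Map.of("type", "glue")));
        System.out.println(resolve(Map.of("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")));
    }
}
```

This is also why the two test paths in this PR (type=glue and catalog-impl) should both end up at GlueCatalog, and why a pipeline config should set one of them, not both.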

Testing

Unit tests pass (6/6 in IcebergDataSinkFactoryTest)
Verified end-to-end on Amazon EMR (Flink 1.20, Iceberg 1.10.0-amzn) with MySQL CDC → Iceberg (Glue Catalog + S3)

- Add iceberg-aws and iceberg-aws-bundle dependencies to pom.xml
- Add Glue catalog config options: type=glue, catalog-impl, io-impl,
  glue.id, glue.skip-archive, glue.skip-name-validation, client.region
- Register new options in IcebergDataSinkFactory
- Update TYPE and WAREHOUSE descriptions to include glue catalog
- Add unit tests for Glue catalog DataSink creation
- Include software.amazon.awssdk in shade plugin
…h EMR runtime

The EMR-bundled iceberg-flink-runtime already includes GlueCatalog and the AWS SDK.
Shading iceberg-aws into our jar causes a 'does not implement Catalog' error,
because the Catalog interface is then loaded twice, by different classloaders.
On EMR, iceberg-flink-runtime is already in Flink's lib/ dir (app classloader).
Shading it into our connector jar causes the Catalog interface to be loaded
from two different classloaders, resulting in ClassCastException:
  GlueCatalog (ChildFirstClassLoader) cannot be cast to Catalog (app loader)

Solution: all Iceberg dependencies now use provided scope, so the connector jar is 67 KB (our code only).
…onments

Keep the original compile scope and shade behavior for iceberg-flink-runtime
so that hadoop/hive catalog users on non-EMR environments are not affected.
Only iceberg-aws is added, with provided scope.

Both iceberg-flink-runtime and iceberg-aws use provided scope.
The Iceberg runtime must be supplied by the deployment environment
(e.g. EMR ships it in Flink's lib/ directory; non-EMR users add it manually).
This avoids ClassLoader conflicts entirely.
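Because the Iceberg runtime is provided scope, a missing or duplicated copy only surfaces when the job starts. A small diagnostic such as the following (a hypothetical helper, not part of this PR) reports whether Iceberg's Catalog interface and GlueCatalog are visible to a classloader and which loader defined each. If org.apache.iceberg.catalog.Catalog turns out to be defined by the job's child-first classloader rather than the loader holding Flink's lib/ directory, a second copy has probably been shaded into the connector jar.

```java
/**
 * Hypothetical preflight check: is a class visible to a classloader, and which
 * loader actually defined it?
 */
public class CatalogClasspathCheck {

    /** True if the class can be loaded (without running static initializers). */
    public static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Human-readable description of where the class comes from; never throws. */
    public static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: locate and load the class only.
            Class<?> clazz = Class.forName(className, false, loader);
            ClassLoader definingLoader = clazz.getClassLoader();
            return className + " defined by "
                    + (definingLoader == null ? "bootstrap loader" : definingLoader.toString());
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class that is found but whose dependencies are missing.
            return className + " not visible (" + e + ")";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(describe("org.apache.iceberg.catalog.Catalog", loader));
        System.out.println(describe("org.apache.iceberg.aws.glue.GlueCatalog", loader));
    }
}
```

Run outside a Flink job with no Iceberg jars on the classpath, both lines simply report the classes as not visible; the output is only meaningful when run with the same classpath as the job.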
@norrishuang changed the title from [FLINK-CDC][Iceberg] Add AWS Glue Catalog support for Iceberg pipeline connector to [FLINK-39245][Iceberg] Add AWS Glue Catalog support for Iceberg pipeline connector on Mar 12, 2026
@lvyanquan requested a review from Copilot on March 13, 2026 07:07

Copilot AI left a comment

Pull request overview

Adds AWS Glue Catalog support to the Flink CDC Iceberg pipeline sink by exposing Glue-related Iceberg catalog properties through catalog.properties.*.

Changes:

  • Extend Iceberg sink options to document/configure Glue catalog (and related catalog/io impl properties).
  • Register the new options in the Iceberg sink factory.
  • Add unit tests for creating an Iceberg sink configuration targeting Glue.
  • Adjust Iceberg connector module dependencies and shade plugin configuration.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergDataSinkOptions.java: Adds Glue-related config options and updates option descriptions.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergDataSinkFactory.java: Registers additional (Glue-related) options as optional.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/test/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergDataSinkFactoryTest.java: Adds tests for Glue configuration paths (type=glue and catalog-impl).
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/pom.xml: Adds the Iceberg AWS dependency and modifies scopes / shading configuration.


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@lvyanquan (Contributor)

Hi @norrishuang, would you like to update the related docs for this? They should also list which additional jars need to be added.

@norrishuang (Author)


The doc was updated.

@github-actions bot added the docs (Improvements or additions to documentation) label on Mar 13, 2026

Labels

docs (Improvements or additions to documentation), iceberg-pipeline-connector

Projects

None yet


3 participants