[FLINK-38844][pipeline-connector][postgres] Add metadata column support #4202
tchivs wants to merge 2 commits into apache:master
This commit adds metadata column support for the PostgreSQL Pipeline Connector, enabling users to access metadata information in their data pipelines.
Changes:
- Add OpTsMetadataColumn for operation timestamp
- Add DatabaseNameMetadataColumn for database name
- Add SchemaNameMetadataColumn for schema name
- Add TableNameMetadataColumn for table name
- Update PostgresDataSource to support metadata columns
- Add comprehensive E2E test testAllMetadataColumns()
- Update documentation (English and Chinese)
yuxiqian left a comment
Thanks for @tchivs' contribution.
I wonder if we need individual metadata columns for database, schema, and table, since they're always available in Transform expressions (only after FLINK-38840 got closed).
Thanks for the review @yuxiqian! You raise an important point about the overlap with Transform metadata fields. You're right that namespace_name, schema_name, and table_name are already available in Transform expressions. Let me clarify the design rationale:
I see two perspectives here: Argument for keeping them:
Argument for removing them:
My suggestion:
What's your preference? I'm happy to adjust the PR based on the team's direction.
I think it's OK to polish the documentation in this PR, leaving the metadata definitions as they are.
…relationship with Transform expressions
Thanks @yuxiqian for the feedback! I've polished the documentation to clarify the relationship between metadata columns and Transform expressions. Changes made:
The metadata definitions remain unchanged as you suggested.
What is the purpose of the pull request
This PR adds metadata column support for the PostgreSQL Pipeline Connector, enabling users to access metadata information such as operation timestamp, database name, schema name, and table name in their data pipelines.
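For readers unfamiliar with the idea, the following is a minimal, self-contained sketch of how a set of metadata columns might be resolved from a change event's string metadata map. It is not the code in this PR: the `MetadataColumn` interface, the `MetadataColumnSketch` class, and the key names `database_name`, `schema_name`, and `table_name` are assumptions made for illustration, and the real classes build on the Flink CDC source API rather than these stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: every type here is a hypothetical stand-in, not the connector's API. */
class MetadataColumnSketch {

    /** Hypothetical contract: a named column whose value is read from event metadata. */
    interface MetadataColumn {
        String name();

        Object read(Map<String, String> metadata);
    }

    /** Returns the entry for the key, failing loudly when it is absent. */
    static String require(Map<String, String> metadata, String key) {
        String value = metadata.get(key);
        if (value == null) {
            throw new IllegalArgumentException(key + " is missing from metadata: " + metadata);
        }
        return value;
    }

    /** Operation timestamp, parsed from its string form into a long. */
    static final class OpTsColumn implements MetadataColumn {
        @Override
        public String name() {
            return "op_ts";
        }

        @Override
        public Object read(Map<String, String> metadata) {
            return Long.parseLong(require(metadata, name()));
        }
    }

    /** Plain string column; the key names passed in below are assumed. */
    static final class NameColumn implements MetadataColumn {
        private final String key;

        NameColumn(String key) {
            this.key = key;
        }

        @Override
        public String name() {
            return key;
        }

        @Override
        public Object read(Map<String, String> metadata) {
            return require(metadata, key);
        }
    }

    /** Loosely mirrors the idea of a source listing the columns it supports. */
    static MetadataColumn[] supportedColumns() {
        return new MetadataColumn[] {
            new OpTsColumn(),
            new NameColumn("database_name"),
            new NameColumn("schema_name"),
            new NameColumn("table_name")
        };
    }

    /** Resolves every supported column against one event's metadata, in declaration order. */
    static Map<String, Object> readAll(Map<String, String> metadata) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (MetadataColumn column : supportedColumns()) {
            row.put(column.name(), column.read(metadata));
        }
        return row;
    }
}
```

Keeping each column as its own small object means a source can advertise exactly the columns it is able to populate, and a missing entry fails at read time instead of silently producing a null.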
Brief change log
- OpTsMetadataColumn: Operation timestamp metadata
- DatabaseNameMetadataColumn: Database name metadata
- SchemaNameMetadataColumn: Schema name metadata
- TableNameMetadataColumn: Table name metadata
- Updated PostgresDataSource to support metadata columns via supportedMetadataColumns() method
- Added E2E test testAllMetadataColumns() in PostgresFullTypesITCase
Verifying this change
This change added tests and can be verified as follows:
- testAllMetadataColumns() E2E test in PostgresFullTypesITCase
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): no
Documentation