[FLINK-37604] Generate static UIDs for pipeline operators #3977

morozov · 2025-04-02T16:09:14Z

This PR adds a new configuration parameter operator.uid.prefix¹. Once it's specified, the pipeline composer will generate static UIDs for all its operators using this prefix. This way, for a given job, all operator UIDs will be preserved if the job graph changes.

By default, the operator UID prefix is not set. In this case, the pipeline composer does not generate operator UIDs. This preserves backward compatibility for existing jobs whose state references operator UIDs generated by Flink.
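As a rough illustration of the behavior described above, here is a minimal sketch. The class name `OperatorUidSketch`, the `-` separator, and the operator role names are all assumptions made for this example; they are not taken from the PR's actual implementation.

```java
import java.util.Optional;

/** Hypothetical sketch only; not the class added by this PR. */
final class OperatorUidSketch {
    // null models "operator.uid.prefix is not set"
    private final String prefix;

    OperatorUidSketch(String prefix) {
        this.prefix = prefix;
    }

    /**
     * Returns a static UID when a prefix is configured. Returns empty otherwise,
     * meaning no UID is assigned and Flink generates one itself.
     */
    Optional<String> uidFor(String operatorRole) {
        if (prefix == null || prefix.isEmpty()) {
            return Optional.empty();
        }
        // The "-" separator is an assumption of this sketch.
        return Optional.of(prefix + "-" + operatorRole);
    }
}
```

With a prefix of `my-job`, the sketch yields `my-job-schema-operator` for the role `schema-operator`. Without a prefix it yields nothing, so existing jobs keep the UIDs Flink generated for them.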

We should recommend setting this parameter, since Flink itself recommends assigning explicit operator UIDs.

Please let me know if this is directionally the right change, and I will update the documentation.

¹ A similar approach is used in Apache Iceberg's FlinkSink.Builder (reference).

yuxiqian · 2025-04-03T04:23:34Z

It seems we already have the pipeline option schema.operator.uid to configure UID, but only for Schema Operators. So maybe the description of operator.uid.prefix isn't precise as it doesn't apply to “all pipeline operators”.

I agree that keeping all operator UIDs fixed (not only those of schema operators) is the right thing, so perhaps we can deprecate schema.operator.uid and favor operator.uid.prefix? We may keep state backwards compatibility with extra checking:

| | `schema.operator.uid` set | `schema.operator.uid` not set |
|---|---|---|
| `operator.uid.prefix` set | Incompatible configurations. Throw exceptions (?) | Set fixed UID for all operators (including schema operators) |
| `operator.uid.prefix` not set | Only set UID for schema operators (behavior unchanged for state compatibility) | Only set UID for schema operators with the default value of `schema.operator.uid` (behavior unchanged for state compatibility) |

and remove schema.operator.uid as a breaking change later. WDYT?
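The proposed compatibility check could be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: the class, enum, and method names are hypothetical, `null` stands for "not set", and throwing on the conflicting combination is only the option marked with "(?)" above.

```java
/** Hypothetical sketch of the proposed configuration check. */
final class UidCompatibilitySketch {
    enum Mode {
        /** Fixed UIDs for all operators, including schema operators. */
        ALL_OPERATORS_FROM_PREFIX,
        /** UID only for schema operators (explicit or default schema.operator.uid). */
        SCHEMA_OPERATOR_ONLY
    }

    /** A null argument models an option that is not set. */
    static Mode resolve(String operatorUidPrefix, String schemaOperatorUid) {
        boolean prefixSet = operatorUidPrefix != null;
        boolean schemaUidSet = schemaOperatorUid != null;
        if (prefixSet && schemaUidSet) {
            // Rejecting this combination is the tentative choice from the proposal.
            throw new IllegalArgumentException(
                    "operator.uid.prefix and schema.operator.uid must not both be set");
        }
        return prefixSet ? Mode.ALL_OPERATORS_FROM_PREFIX : Mode.SCHEMA_OPERATOR_ONLY;
    }
}
```

The two "not set" rows collapse into one mode here because in both cases only the schema operator receives a UID, which leaves the behavior of existing jobs unchanged.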

morozov · 2025-04-03T14:37:46Z

@yuxiqian that sounds like a plan. I will work on making these changes.

lvyanquan · 2025-04-18T02:47:55Z

Hi @yuxiqian please take a look at this if you have time.

yuxiqian · 2025-04-22T02:41:21Z

Thanks for @morozov's quick response. Do you think we need an IT case to verify if operator UIDs are correctly set by examining Flink execution plan JSON (like #3887)?

morozov · 2025-04-23T01:31:40Z

> Do you think we need an IT case to verify if operator UIDs are correctly set by examining Flink execution plan JSON (like #3887)?

@yuxiqian, it doesn't look like the execution plan JSON contains operator UIDs. It contains IDs (the numeric identifiers that start with 1 and increment by 1). For example (from the test you referenced):

{
  "nodes" : [ {
    "id" : 1,
    "type" : "Source: Distributed Source",
    "pact" : "Data Source",
    "contents" : "Source: Distributed Source",
    "parallelism" : 9
  }, {
    "id" : 2,
    "type" : "Partitioning",
    "pact" : "Operator",
    "contents" : "Partitioning",
    "parallelism" : 4,
    "predecessors" : [ {
      "id" : 1,
      "ship_strategy" : "REBALANCE",
      "side" : "second"
    } ]
  }, {
    "id" : 4,
    "type" : "SchemaMapper",
    "pact" : "Operator",
    "contents" : "SchemaMapper",
    "parallelism" : 4,
    "predecessors" : [ {
      "id" : 2,
      "ship_strategy" : "CUSTOM",
      "side" : "second"
    } ]
  }, {
    "id" : 5,
    "type" : "Sink: Sink Writer: Value Sink",
    "pact" : "Data Sink",
    "contents" : "Sink: Sink Writer: Value Sink",
    "parallelism" : 10,
    "predecessors" : [ {
      "id" : 4,
      "ship_strategy" : "REBALANCE",
      "side" : "second"
    } ]
  } ]
}

yuxiqian · 2025-04-23T02:00:14Z

Thanks for double-checking this. It seems we can only get the hashed VertexId from the JobGraph, so we cannot reliably verify this in unit tests. It should not be a blocker.

morozov · 2025-04-23T16:46:25Z

It looks like there's some active development happening in the code modified by this PR. Recently, I resolved conflicts with the changes from #3812, and then with the ones from #3986.

@yuxiqian, is there anything else I can do before the merge?

I added the new parameter to the English documentation and can update the Chinese version once I know what to put there.

yuxiqian · 2025-04-24T02:19:41Z

Thanks for @morozov's nice work, LGTM. Would @lvyanquan like to take a further look?

Suggested Chinese translation:

Index: docs/content.zh/docs/core-concept/data-pipeline.md
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/docs/content.zh/docs/core-concept/data-pipeline.md b/docs/content.zh/docs/core-concept/data-pipeline.md
--- a/docs/content.zh/docs/core-concept/data-pipeline.md	(revision 54768703ec976b712e62f4184b7b24c8319f8e69)
+++ b/docs/content.zh/docs/core-concept/data-pipeline.md	(date 1745460782234)
@@ -111,9 +111,10 @@
 # Pipeline 配置
 下面 是 Data Pipeline 的一些可选配置：
 
-| 参数                      | 含义                                                  | optional/required |
-|-------------------------|-----------------------------------------------------|-------------------|
-| name                    | 这个 pipeline 的名称，会用在 Flink 集群中作为作业的名称。               | optional          |
-| parallelism             | pipeline的全局并发度，默认值是1。                               | optional          |
-| local-time-zone         | 作业级别的本地时区。                                          | optional          |
-| execution.runtime-mode  | pipeline 的运行模式，包含 STREAMING 和 BATCH，默认值是 STREAMING。 | optional          |
\ No newline at end of file
+| 参数                     | 含义                                                  | optional/required |
+|------------------------|-----------------------------------------------------|-------------------|
+| name                   | 这个 Pipeline 的名称，会用在 Flink 集群中作为作业的名称。               | optional          |
+| parallelism            | Pipeline 的全局并发度，默认值是1。                              | optional          |
+| local-time-zone        | 作业级别的本地时区。                                          | optional          |
+| execution.runtime-mode | Pipeline 的运行模式，包含 STREAMING 和 BATCH，默认值是 STREAMING。 | optional          |
+| operator.uid.prefix    | Pipeline 中算子 UID 的前缀。如果不设置，Flink 会为每个算子生成唯一的 UID。   | optional          |
\ No newline at end of file

lvyanquan · 2025-05-27T03:59:55Z

flink-cdc-common/src/main/java/org/apache/flink/cdc/common/pipeline/PipelineOptions.java

+                    .noDefaultValue()
+                    .withDescription(
+                            "The prefix to use for all pipeline operator UIDs. If not set, all pipeline operator UIDs will be generated by Flink.");
+
    public static final ConfigOption<String> PIPELINE_SCHEMA_OPERATOR_UID =


Please add the @Deprecated annotation to this option.
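For illustration, deprecating an option in Java usually combines the `@Deprecated` annotation with a `@deprecated` Javadoc tag that names the replacement. The sketch below uses plain `String` constants in a hypothetical class so that it stays self-contained. The real options in `PipelineOptions` are `ConfigOption<String>` fields, and the wording of the Javadoc is an assumption.

```java
/** Hypothetical, simplified stand-in for the option keys discussed in this PR. */
final class PipelineOptionKeysSketch {
    /** Prefix used for all pipeline operator UIDs. */
    static final String OPERATOR_UID_PREFIX = "operator.uid.prefix";

    /**
     * UID of the schema operator.
     *
     * @deprecated use {@link #OPERATOR_UID_PREFIX} instead.
     */
    @Deprecated
    static final String SCHEMA_OPERATOR_UID = "schema.operator.uid";

    private PipelineOptionKeysSketch() {}
}
```

The annotation makes the compiler warn at call sites, while the Javadoc tag tells readers which option to migrate to.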

lvyanquan · 2025-05-27T04:13:17Z

docs/content/docs/core-concept/data-pipeline.md

 | `schema-operator.rpc-timeout` | The timeout time for SchemaOperator to wait downstream SchemaChangeEvent applying finished, the default value is 3 minutes.                                                                                                                                                                                                                                                                                                                                                                               | optional          |
+| `operator.uid.prefix`         | The prefix to use for all pipeline operator UIDs. If not set, all pipeline operator UIDs will be generated by Flink.                                                                                                                                                                                                                                                                                                                                                                                      | optional          |


We should clarify that this is a recommended parameter to set and explain the reason behind it.

......
It is recommended to set this parameter to ensure stable and recognizable operator UIDs, which can help with stateful upgrades, troubleshooting, and Flink UI diagnostics.

lvyanquan · 2025-05-27T10:07:15Z

docs/content/docs/core-concept/data-pipeline.md

 | `schema-operator.rpc-timeout` | The timeout time for SchemaOperator to wait downstream SchemaChangeEvent applying finished, the default value is 3 minutes.                                                                                                                                                                                                                                                                                                                                                                               | optional          |
+| `operator.uid.prefix`         | The prefix to use for all pipeline operator UIDs. If not set, all pipeline operator UIDs will be generated by Flink.                                                                                                                                                                                                                                                                                                                                                                                      | optional          |


......
It is recommended to set this parameter to ensure stable and recognizable operator UIDs, which can help with stateful upgrades, troubleshooting, and Flink UI diagnostics.

lvyanquan · 2025-05-27T10:09:06Z

docs/content.zh/docs/core-concept/data-pipeline.md

+| local-time-zone        | 作业级别的本地时区。                                          | optional          |
+| execution.runtime-mode | pipeline 的运行模式，包含 STREAMING 和 BATCH，默认值是 STREAMING。 | optional          |
+| operator.uid.prefix    | Pipeline 中算子 UID 的前缀。如果不设置，Flink 会为每个算子生成唯一的 UID。   | optional          |


......
建议设置这个参数以提供稳定和可识别的算子 ID，这有助于有状态升级、问题排查和在 Flink UI 上的诊断。

lvyanquan

+1

github-actions bot added composer common doris-pipeline-connector starrocks-pipeline-connector labels Apr 2, 2025

morozov force-pushed the FLINK-37604-operator-uid-prefix branch from e9b6879 to 05dd868 Compare April 2, 2025 16:10

morozov force-pushed the FLINK-37604-operator-uid-prefix branch from aeb5798 to 481dccb Compare April 23, 2025 01:31

github-actions bot added the build label Apr 23, 2025

morozov force-pushed the FLINK-37604-operator-uid-prefix branch from 481dccb to 0343c5a Compare April 23, 2025 01:38

github-actions bot removed the build label Apr 23, 2025

morozov force-pushed the FLINK-37604-operator-uid-prefix branch from 0343c5a to 26410c2 Compare April 23, 2025 16:42

github-actions bot added the docs Improvements or additions to documentation label Apr 23, 2025

morozov force-pushed the FLINK-37604-operator-uid-prefix branch from 26410c2 to 105c741 Compare April 24, 2025 17:10

morozov added 4 commits April 29, 2025 09:38

[FLINK-37604] Generate static UIDs for pipeline operators

c13e02c

[FLINK-37604] Use OperatorUidGenerator to generate schema operator UID

f5ced4a

[FLINK-37604] Deprecate schema.operator.uid

3c9eb87

[FLINK-37604] Document operator.uid.prefix

3ed00f4

morozov force-pushed the FLINK-37604-operator-uid-prefix branch from 105c741 to 3ed00f4 Compare April 29, 2025 16:41

lvyanquan self-assigned this May 27, 2025

lvyanquan reviewed May 27, 2025


[FLINK-37604] Update documentation

726e569

lvyanquan reviewed May 27, 2025


[FLINK-37604] Update documentation

3ba523c

lvyanquan approved these changes May 29, 2025


github-actions bot added approved reviewed labels May 29, 2025

lvyanquan merged commit e8f9ff0 into apache:master May 30, 2025
23 checks passed

morozov deleted the FLINK-37604-operator-uid-prefix branch May 30, 2025 06:04
