[FLINK-39209][doris] Fix time data type serialization when sink to doris with pipeline connector #4312

Open
chengcongchina wants to merge 5 commits into apache:master from chengcongchina:FLINK-39209

chengcongchina commented Mar 11, 2026

This closes FLINK-39209.

What is the purpose of the change

This PR fixes an issue where sinking a TIME column (e.g., time(0) from MySQL) to Doris through the Flink CDC Pipeline connector threw a serialization exception: Java 8 date/time type java.time.LocalTime not supported by default.

Previously, the DorisRowConverter returned raw LocalTime objects for TIME_WITHOUT_TIME_ZONE types. Since the default Jackson ObjectMapper configuration in DorisEventSerializer does not support Java 8 time types without the jackson-datatype-jsr310 module, this caused runtime failures. This PR introduces a dedicated TIME_FORMATTER (pattern: HH:mm:ss.SSS) to serialize TIME data into formatted strings before passing them to the Jackson serializer.
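The formatting step described above can be sketched in isolation with only the JDK. This is a minimal illustration, not the connector's code: the class name TimeFormatSketch and the method formatTime are hypothetical, and only the pattern HH:mm:ss.SSS is taken from the description.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical stand-alone demo; not the connector's actual code. */
final class TimeFormatSketch {

    // Same pattern the PR description gives for TIME_FORMATTER.
    static final DateTimeFormatter TIME_FORMATTER =
            DateTimeFormatter.ofPattern("HH:mm:ss.SSS");

    /** Converts a LocalTime into a plain String so the JSON layer only sees text. */
    static String formatTime(LocalTime time) {
        return time.format(TIME_FORMATTER);
    }

    public static void main(String[] args) {
        // A value without fractional seconds still gets three fraction digits.
        System.out.println(formatTime(LocalTime.of(9, 5, 7)));                 // 09:05:07.000
        // A value with millisecond precision keeps its fraction.
        System.out.println(formatTime(LocalTime.of(23, 59, 59, 123_000_000))); // 23:59:59.123
    }
}
```

Because the value handed to the serializer is already a String, no Java 8 time module is needed on the ObjectMapper for this column type.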

Brief change log

  • Update DorisEventSerializer to include a TIME_FORMATTER constant with the pattern HH:mm:ss.SSS.
  • Update DorisRowConverter to format TIME_WITHOUT_TIME_ZONE data using DorisEventSerializer.TIME_FORMATTER instead of returning raw LocalTime objects.
  • Update DorisEventSerializerTest to include a new test case testDataChangeEventWithTimeDataType covering TIME(0) and TIME(3) types.
  • Update MySqlToDorisE2eITCase and data_types_test.sql to include TIME columns in the end-to-end test scenario.

Verifying this change

This change is verified by the following tests:

  • DorisEventSerializerTest#testDataChangeEventWithTimeDataType: Verifies that TIME values with different precisions are serialized to the expected JSON string format without exceptions.
  • MySqlToDorisE2eITCase: Verifies the end-to-end data synchronization correctness for TIME type columns from MySQL to Doris.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: yes
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Contributor

Copilot AI left a comment

Pull request overview

Fixes Doris sink JSON serialization for TIME columns in the Flink CDC Pipeline connector by converting TIME_WITHOUT_TIME_ZONE values into formatted strings before Jackson serialization.

Changes:

  • Add a TIME_FORMATTER to standardize TIME serialization.
  • Convert TIME_WITHOUT_TIME_ZONE from LocalTime to formatted String in DorisRowConverter.
  • Extend unit and e2e tests to cover TIME(0) / TIME(3) columns end-to-end.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • flink-cdc-e2e-tests/flink-cdc-pipeline-e2e-tests/src/test/resources/ddl/data_types_test.sql: Adds TIME columns and values to the e2e DDL/data set.
  • flink-cdc-e2e-tests/flink-cdc-pipeline-e2e-tests/src/test/java/org/apache/flink/cdc/pipeline/tests/MySqlToDorisE2eITCase.java: Updates expected schema/rows to include serialized TIME fields.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/test/java/org/apache/flink/cdc/connectors/doris/sink/DorisEventSerializerTest.java: Adds a focused unit test asserting TIME serialization output.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/main/java/org/apache/flink/cdc/connectors/doris/sink/DorisRowConverter.java: Formats TIME_WITHOUT_TIME_ZONE to a String during conversion.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-doris/src/main/java/org/apache/flink/cdc/connectors/doris/sink/DorisEventSerializer.java: Introduces a shared TIME_FORMATTER constant.

Comment on lines +120 to +123
return (index, val) ->
        val.getTime(index)
                .toLocalTime()
                .format(DorisEventSerializer.TIME_FORMATTER);

Copilot AI Mar 11, 2026

The conversion currently forces a millisecond suffix for all TIME values via HH:mm:ss.SSS, so TIME(0) becomes HH:mm:ss.000. This changes the textual representation compared to typical TIME(0) formatting and ignores the declared TIME precision. Consider formatting based on the TimeType precision (e.g., HH:mm:ss for precision 0; fractional seconds only when precision > 0, padded to the precision), or use an optional-fraction formatter to avoid emitting .000 when not needed.
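One way the precision-based suggestion could look is sketched below with JDK classes only. It is an assumption about a possible approach, not the change the author actually made: the class PrecisionAwareTimeFormat and the method forPrecision are hypothetical names, and the precision argument stands in for the value a TimeType would declare.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

/** Hypothetical helper; illustrates precision-aware TIME formatting. */
final class PrecisionAwareTimeFormat {

    /**
     * Builds a formatter that prints HH:mm:ss for precision 0 and appends
     * exactly {@code precision} fraction digits (zero-padded) otherwise.
     */
    static DateTimeFormatter forPrecision(int precision) {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException("precision must be in [0, 9]: " + precision);
        }
        DateTimeFormatterBuilder builder =
                new DateTimeFormatterBuilder().appendPattern("HH:mm:ss");
        if (precision > 0) {
            // minWidth == maxWidth gives a fixed-width fraction with a leading '.'.
            builder.appendFraction(ChronoField.NANO_OF_SECOND, precision, precision, true);
        }
        return builder.toFormatter();
    }

    public static void main(String[] args) {
        LocalTime t = LocalTime.of(9, 5, 7, 120_000_000);
        System.out.println(t.format(forPrecision(0))); // 09:05:07
        System.out.println(t.format(forPrecision(3))); // 09:05:07.120
    }
}
```

In a real converter the formatter would more likely be built once per column rather than per record, since the precision is fixed by the schema.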

Author

Resolved.

return (index, val) ->
        val.getTime(index)
                .toLocalTime()
                .format(DorisEventSerializer.TIME_FORMATTER);

Copilot AI Mar 11, 2026

DorisRowConverter depending on DorisEventSerializer.TIME_FORMATTER couples conversion logic to the serializer class. To reduce cross-class coupling, consider moving TIME formatting constants/utilities into a dedicated shared utility (or into DorisRowConverter if only used there), keeping DorisEventSerializer focused on event serialization.

Author

No need to do that; keeping it here is consistent with other parts of the code.

Comment on lines +20 to +23
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.cdc.common.configuration.Configuration;
import org.apache.flink.cdc.common.data.RecordData;
import org.apache.flink.cdc.common.data.TimeData;

Copilot AI Mar 11, 2026

TimeData appears to be imported only for the Javadoc link. Many build setups enforce no-unused-imports via Checkstyle/Spotless, and Javadoc references typically don’t count as usage. To avoid a style failure, remove the import and reference the class in Javadoc with its fully-qualified name (or otherwise ensure the import is used in code).
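The fully-qualified Javadoc form mentioned above might look like the following sketch. The class JavadocLinkSketch and its method are hypothetical and exist only to carry the comment; the TimeData package name is taken from the import shown in the diff.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical test helper. TIME values are represented as
 * {@link org.apache.flink.cdc.common.data.TimeData}; writing the link with the
 * fully-qualified name means this file needs no import for that class.
 */
final class JavadocLinkSketch {

    /** Returns the text expected for a TIME value under the HH:mm:ss.SSS pattern. */
    static String expectedTimeText(LocalTime time) {
        return time.format(DateTimeFormatter.ofPattern("HH:mm:ss.SSS"));
    }
}
```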

Author

Resolved.
