[FLINK-30935][connector/kafka] Add kafka serializers version check when using SimpleVersionedSerializer #10

Open
wants to merge 1 commit into
base: main

chucheng92
Member

@chucheng92 chucheng92 commented Mar 22, 2023

What is the purpose of the change

Add a deserialization version check to the Kafka SimpleVersionedSerializer implementations, as other SimpleVersionedSerializer implementations already have, so that incompatible or corrupt state is detected when restoring from a checkpoint.

Brief change log

Add deserialization version check logic to the Kafka SimpleVersionedSerializer implementations.
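A minimal, standalone sketch of what such a version check could look like. It does not depend on Flink: `VersionCheckSketch`, `Committable`, `CommittableSerializer`, the field layout, and the error message are hypothetical stand-ins chosen for illustration, not the connector's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Standalone sketch: the class names and field layout are hypothetical
// stand-ins for the connector's types, not the real implementation.
final class VersionCheckSketch {

    /** Hypothetical committable: producer id, epoch, transactional id. */
    static final class Committable {
        final long producerId;
        final short epoch;
        final String transactionalId;

        Committable(long producerId, short epoch, String transactionalId) {
            this.producerId = producerId;
            this.epoch = epoch;
            this.transactionalId = transactionalId;
        }
    }

    /** Mirrors the getVersion/serialize/deserialize shape of a versioned serializer. */
    static final class CommittableSerializer {
        static final int CURRENT_VERSION = 1;

        int getVersion() {
            return CURRENT_VERSION;
        }

        byte[] serialize(Committable c) throws IOException {
            try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
                    DataOutputStream out = new DataOutputStream(baos)) {
                out.writeShort(c.epoch);
                out.writeLong(c.producerId);
                out.writeUTF(c.transactionalId);
                out.flush();
                return baos.toByteArray();
            }
        }

        Committable deserialize(int version, byte[] serialized) throws IOException {
            switch (version) {
                case 1:
                    return deserializeV1(serialized);
                default:
                    // Fail fast instead of decoding bytes with the wrong schema.
                    throw new IOException("Unrecognized version or corrupt state: " + version);
            }
        }

        private Committable deserializeV1(byte[] serialized) throws IOException {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
                // Read fields in exactly the order serialize() wrote them.
                short epoch = in.readShort();
                long producerId = in.readLong();
                String transactionalId = in.readUTF();
                return new Committable(producerId, epoch, transactionalId);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        CommittableSerializer serializer = new CommittableSerializer();
        byte[] bytes = serializer.serialize(new Committable(42L, (short) 3, "txn-0"));

        // Restoring with the version the bytes were written with succeeds.
        Committable restored = serializer.deserialize(serializer.getVersion(), bytes);
        System.out.println(restored.transactionalId);

        // Restoring with an unknown version is rejected with an IOException.
        try {
            serializer.deserialize(0, bytes);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the `default` branch is that an unknown version surfaces as an explicit error at restore time rather than as silently misread fields.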

Verifying this change

Add test cases to KafkaCommittableSerializerTest, KafkaWriterStateSerializerTest, and KafkaPartitionSplitSerializerTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: yes
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@chucheng92 chucheng92 changed the title [FLINK-30935] Add Kafka serializers version check when using SimpleVersionedSerializer [FLINK-30935] Add kafka serializers version check when using SimpleVersionedSerializer Mar 22, 2023
@chucheng92 chucheng92 changed the title [FLINK-30935] Add kafka serializers version check when using SimpleVersionedSerializer [FLINK-30935][connector/kafka] Add kafka serializers version check when using SimpleVersionedSerializer Mar 22, 2023
@chucheng92
Member Author

@leonardBang @PatrickRen sorry to bother you.
Could one of you take a look, please?

@chucheng92
Member Author

@tzulitai Hi Gordon, could you please take a look?

@tzulitai tzulitai self-requested a review March 24, 2023 18:35
Contributor

@tzulitai tzulitai left a comment

Thanks for the PR @chucheng92. The changes make sense, but I left a comment to make sure we do our due diligence in documenting historical schema versions. This would allow us to more confidently merge this change.

@@ -46,6 +46,15 @@ public byte[] serialize(KafkaCommittable state) throws IOException {

    @Override
    public KafkaCommittable deserialize(int version, byte[] serialized) throws IOException {
        switch (version) {
            case 1:
Contributor

@tzulitai tzulitai Mar 29, 2023

How do we know there weren't historical versions where the version number was 0 or some other number?

Are there existing migration ITs that verify that, with this change, we can still safely restore from previous versions?

In general, it looks like we're missing documentation that tracks which versions have existed before and what schema each of them used. I think a class-level Javadoc documenting this would already be enough.

@chucheng92 do you think you'd be able to do this as part of this PR contribution? Basically, look at historical changes to confirm that this was indeed the only used version number + document its schema in the class-level Javadoc.
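One possible shape for such a class-level Javadoc, as a standalone sketch with no Flink dependency. `WriterStateSerializerSketch`, the single-string state, and the version 1 layout described in the comment are assumptions made for illustration; the real schema would have to be confirmed from the connector's history.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Standalone sketch of a versioned serializer for a writer state that is
 * reduced here to one String, the transactional id prefix.
 *
 * <p>Serialization version history (hypothetical layout, for illustration):
 *
 * <ul>
 *   <li>Version 1: one string written with {@code DataOutputStream#writeUTF},
 *       holding the transactional id prefix.
 * </ul>
 *
 * <p>When the format changes incompatibly: bump {@code CURRENT_VERSION}, add an
 * entry to the list above, and keep the old {@code case} so old state still restores.
 */
final class WriterStateSerializerSketch {
    static final int CURRENT_VERSION = 1;

    int getVersion() {
        return CURRENT_VERSION;
    }

    byte[] serialize(String transactionalIdPrefix) throws IOException {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(baos)) {
            out.writeUTF(transactionalIdPrefix);
            out.flush();
            return baos.toByteArray();
        }
    }

    String deserialize(int version, byte[] serialized) throws IOException {
        switch (version) {
            case 1:
                // Schema documented under "Version 1" in the class Javadoc.
                try (DataInputStream in =
                        new DataInputStream(new ByteArrayInputStream(serialized))) {
                    return in.readUTF();
                }
            default:
                throw new IOException("Unrecognized version or corrupt state: " + version);
        }
    }

    public static void main(String[] args) throws IOException {
        WriterStateSerializerSketch serializer = new WriterStateSerializerSketch();
        byte[] bytes = serializer.serialize("my-prefix");
        System.out.println(serializer.deserialize(serializer.getVersion(), bytes));
    }
}
```

Keeping the history next to the `switch` means each `case` label can be checked against a documented schema without digging through old pull requests.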

Member Author

@tzulitai Hi Gordon, thanks for the review! You can see that the version starts at 1 in https://github.com/apache/flink/pull/16676/files#diff-decc6e90291b6f7c78efae7bd3223d218a1aba00bbbf53e4c696d63506e907cf. However, we can't always dig through PRs to find out whether an older version or a version change exists, so I agree that we should add Javadocs that record how the serialized versions evolve. I've added them in this PR. WDYT?

@chucheng92
Member Author

chucheng92 commented Jul 4, 2023

@tzulitai Hi Gordon, it's been a while. Please let me know if you have any concerns and I will address them ASAP. Thanks.

@MartijnVisser
Contributor

@chucheng92 Can you please rebase your PR?

@chucheng92
Member Author

> @chucheng92 Can you please rebase your PR?

Yes, I have rebased it and CI passes. If you have time, please review it again. Thanks a lot.

@MartijnVisser
Contributor

@mas-chen Do you want to do a review?

Contributor

@mas-chen mas-chen left a comment

@chucheng92 is this truly a problem? There is only one version of the Kafka state. We only make version changes if there is a backward-incompatible change to the state serializers.
