Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-32416] Fix flaky tests by ensuring test utilities produce records w… #79

Merged
merged 1 commit into from
Jan 17, 2024

Conversation

mas-chen
Copy link
Contributor

@mas-chen mas-chen commented Jan 17, 2024

…ith consistency and cleanup notify no more splits to ensure it is sent

I have spent the last few days troubleshooting this and found that most issues are related to the bounded mode tests. These problems don't surface locally and I suspect that they are more likely to occur with smaller/busy CI machines. Issues:

  1. Test records were not being produced to Kafka. I added the Kafka settings and refactored to the API to ensure all records are sent to all brokers.
  2. When I fixed this, I ran into some tests that would exceed the 50 minute CI. This is because the the no more splits signal was not sent properly. I refactored the code a bit to make it clearer.

At this commit, the tests are quite stable. 6/7 CI runs pass.
Screenshot 2024-01-16 at 10 13 43 PM

In one CI run that failed, I did see that there was a Flink error, indicating a lost source event. In separate commits, there were other instances that exceeded 50 minutes but this can be solved separately, see https://issues.apache.org/jira/browse/FLINK-34127.

Original issue that Max found after #44 was merged: https://github.com/apache/flink-connector-kafka/actions/runs/7504290046/job/20433204351

cc: @mxm @tzulitai @MartijnVisser

@@ -212,32 +214,27 @@ public void start() {

private void handleNoMoreSplits() {
if (Boundedness.BOUNDED.equals(boundedness)) {
enumContext.runInCoordinatorThread(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use this API. The code already runs in coordinator thread

@MartijnVisser
Copy link
Contributor

The only thing I'm wondering about is that we actually should file a Jira for this; otherwise LGTM

@mxm
Copy link
Contributor

mxm commented Jan 17, 2024

Can we put this under the original JIRA issue? I think that should be ok since we haven't release in the meantime.

@MartijnVisser
Copy link
Contributor

Can we put this under the original JIRA issue? I think that should be ok since we haven't release in the meantime.

Yeah that makes sense for me too, as a follow-up to actually complete the Jira

Copy link
Contributor

@MartijnVisser MartijnVisser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: I'll fix the commit message when merging and add it to the original Jira
Edit: I see that @mas-chen still wants to fix one thing, so I'll postpone merging this until he's done that himself :)

@mas-chen
Copy link
Contributor Author

mas-chen commented Jan 17, 2024

@MartijnVisser which thing to fix are you referring to? I think this PR is ready as-is if it blocking your other work. The ticket I'll need a few days to look into, probably get to it at the end of the week. Edit: If it wasn't clear, the ticket also affects the KafkaSource tests (not just DynamicKafkaSource ones).

I modified the commit since I had it open already!

…rds with consistency and cleanup notify no more splits to ensure it is sent
@mas-chen mas-chen changed the title [hotfix] Fix flaky tests by ensuring test utilities produce records w… [FLINK-32416] Fix flaky tests by ensuring test utilities produce records w… Jan 17, 2024
@MartijnVisser
Copy link
Contributor

which thing to fix are you referring to?

I misread #79 (comment) as that you still wanted to fix something 🙈

I'll merge it after this CI run, thanks :)

@MartijnVisser MartijnVisser merged commit cdfa328 into apache:main Jan 17, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants