KAFKA-17904: Flaky testMultiConsumerSessionTimeoutOnClose #17789
Here are some of my conclusions about this flaky test.
First of all, the test fails because of a timeout: `AbstractConsumerTest#validateGroupAssignment` times out after waiting 10 seconds. I was able to reproduce this on my own machine.

`AbstractConsumerTest#validateGroupAssignment` checks that every consumer's assignment matches the expected one. The exception is as below:

After adding some logging, I ran this JUnit test many times locally and found that the timeouts happen in `GroupProtocol.CONSUMER` mode. In CONSUMER mode, a consumer may interact with the GroupCoordinator multiple times before reconciliation completes.
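To illustrate why several coordinator round trips can exhaust a fixed 10-second wait, here is a minimal, self-contained sketch of a poll-until-deadline check. It is hypothetical: `waitForCondition`, `reconciledAfter`, the simulated clock, and the assumed number of round trips (3) are illustrations only, not Kafka's actual test utilities. Each round trip is assumed to take one fixed interval, so reconciliation completes at `roundTrips * roundTripSpacingMs`.

```java
import java.util.function.LongPredicate;

public class WaitForConditionSketch {

    /**
     * Hypothetical helper: evaluates the condition every pollMs of simulated time
     * until it holds or timeoutMs has elapsed. Returns true if it held in time.
     */
    static boolean waitForCondition(LongPredicate conditionAtTimeMs, long timeoutMs, long pollMs) {
        for (long now = 0; now <= timeoutMs; now += pollMs) {
            if (conditionAtTimeMs.test(now)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Simulated reconciliation (assumption): the assignment converges only after
     * the given number of round trips, each one fixed interval apart.
     */
    static LongPredicate reconciledAfter(int roundTrips, long roundTripSpacingMs) {
        long convergeAtMs = roundTrips * roundTripSpacingMs;
        return now -> now >= convergeAtMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 10_000; // validateGroupAssignment waits 10 seconds
        long pollMs = 100;       // assumed polling granularity
        int roundTrips = 3;      // assumed round trips before reconciliation completes

        System.out.println("5000 ms spacing: "
                + waitForCondition(reconciledAfter(roundTrips, 5_000), timeoutMs, pollMs));
        System.out.println("1000 ms spacing: "
                + waitForCondition(reconciledAfter(roundTrips, 1_000), timeoutMs, pollMs));
    }
}
```

Under these assumptions, the 5000 ms spacing needs 15 s and misses the deadline (`false`), while the 1000 ms spacing converges after 3 s (`true`).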
The frequency of these interactions is controlled by the configuration `group.consumer.heartbeat.interval.ms`, whose default value is 5000 ms. The successful runs of this test take at least 5 seconds to complete, so we may be able to reduce the heartbeat interval. After I set `group.consumer.heartbeat.interval.ms` to 1000 ms, the problem has not occurred again on my machine, and the test now runs faster.
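A rough sketch of the override and the arithmetic behind it follows. It only builds a `java.util.Properties` copy with the key set to 1000 ms. How the test harness passes these server properties to the broker is assumed and not shown here. The helper names `withHeartbeatInterval` and `heartbeatsWithinTimeout` are hypothetical.

```java
import java.util.Properties;

public class HeartbeatIntervalOverrideSketch {

    // Config key named in the PR description; its default is 5000 ms.
    static final String HEARTBEAT_INTERVAL_KEY = "group.consumer.heartbeat.interval.ms";

    /** Hypothetical helper: copies the given properties and overrides the heartbeat interval. */
    static Properties withHeartbeatInterval(Properties base, long intervalMs) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(HEARTBEAT_INTERVAL_KEY, Long.toString(intervalMs));
        return props;
    }

    /** Hypothetical helper: number of full heartbeat intervals that fit into the wait timeout. */
    static long heartbeatsWithinTimeout(long timeoutMs, long intervalMs) {
        return timeoutMs / intervalMs;
    }

    public static void main(String[] args) {
        Properties serverProps = withHeartbeatInterval(new Properties(), 1_000);
        long intervalMs = Long.parseLong(serverProps.getProperty(HEARTBEAT_INTERVAL_KEY));
        System.out.println(HEARTBEAT_INTERVAL_KEY + "=" + intervalMs);
        // 10 s validation budget: 2 intervals at the 5000 ms default vs 10 at 1000 ms.
        System.out.println("default: " + heartbeatsWithinTimeout(10_000, 5_000));
        System.out.println("override: " + heartbeatsWithinTimeout(10_000, intervalMs));
    }
}
```

Running it prints `group.consumer.heartbeat.interval.ms=1000`, then `default: 2` and `override: 10`. Within the same 10-second wait, the shorter interval leaves room for about five times as many coordinator interactions.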