KAFKA-19767 - Send Share-Fetch one-node at a time for record_limit mode #20855
What
After KIP-1206 introduced record_limit mode, we ideally want to fetch no
more records than the maxRecords field in ShareFetchRequest allows.
Currently, the client broadcasts share fetch requests to all nodes
that host the leaders of the partitions it is subscribed to.
The application thread is woken up when the first response arrives,
but meanwhile the responses from the other nodes can each bring in
up to that many records again. Those records wait in the buffer, which
wastes their acquisition locks while they sit unprocessed.
Instead, we want to send the next request only when the application
polls again.
This PR sends the request to only one node at a time in record_limit
mode.
We are using partition-rotation on each poll so that no partition is
starved.
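The combination of "one node per poll" and partition rotation can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class SingleNodeFetchPlanner, its Plan type, and the string partition names are all hypothetical, and it assumes every subscribed partition has a known leader. On each poll, the starting partition advances by one, its leader becomes the only target node, and all partitions led by that node are included in the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; names are illustrative and not taken from the Kafka code base. */
final class SingleNodeFetchPlanner {

    /** Result of planning one poll: the single node to contact and the partitions to fetch from it. */
    static final class Plan {
        final int nodeId;
        final List<String> partitions;

        Plan(int nodeId, List<String> partitions) {
            this.nodeId = nodeId;
            this.partitions = partitions;
        }
    }

    private final List<String> partitions;       // subscribed partitions, e.g. "t-0"
    private final Map<String, Integer> leaderOf; // partition -> leader node id (assumed complete)
    private int rotation = 0;                    // index of the partition that goes first on the next poll

    SingleNodeFetchPlanner(List<String> partitions, Map<String, Integer> leaderOf) {
        this.partitions = List.copyOf(partitions);
        this.leaderOf = Map.copyOf(leaderOf);
    }

    /** Picks exactly one node for this poll, then rotates the starting partition. */
    Optional<Plan> nextPlan() {
        if (partitions.isEmpty()) {
            return Optional.empty();
        }
        int n = partitions.size();
        int start = rotation;
        rotation = (rotation + 1) % n; // the next poll starts one partition later

        // The leader of the first partition in rotation order is the only node contacted.
        int node = leaderOf.get(partitions.get(start));

        // Include every partition led by that node, preserving rotation order.
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String p = partitions.get((start + i) % n);
            if (leaderOf.get(p) == node) {
                chosen.add(p);
            }
        }
        return Optional.of(new Plan(node, chosen));
    }
}
```

Because every partition takes the first position once per full rotation, its leader is guaranteed to be contacted within as many polls as there are partitions, which is the sense in which no partition is starved in this sketch.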
There were NCSS checkstyle errors in ShareConsumeRequestManagerTest,
so a few refactors were added there to reduce its length.
Performance
of data happens in a partition), then performance is almost the same
as with the current approach. But when there are fewer consumers than
partitions, we see a performance regression, because the client waits
for a node to return a response before it can send the next request.
record_limit mode for now; future work will be done to improve this area.