Support data limit when reading a batch with TopicReaderSync #431

JasonRammoray wants to merge 1 commit into ydb-platform:main
Conversation
… a sync topic reader
```python
    max_messages: typing.Union[int, None] = None,
    max_bytes: typing.Union[int, None] = None,
) -> Union[PublicBatch, None]:
    all_amount = float("inf")
```
Why do you need `all_amount` as a float constant?

The rationale is that by default there are no limits on the data flow.
I'm not sure the UInt64 max value would be sufficient, so I chose infinity, which happens to be a float.
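A minimal sketch of the sentinel idea discussed above (only `all_amount` comes from the PR diff; the helper name is illustrative). `float("inf")` compares greater than any int, so limit checks need no special case for "no limit":

```python
all_amount = float("inf")

def effective_limit(max_messages=None):
    # None means "unlimited"; replace it with the infinite sentinel
    return all_amount if max_messages is None else max_messages

assert 10 < effective_limit(None)      # no limit: any count passes
assert not (10 < effective_limit(5))   # an explicit limit still applies
```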
```python
max_bytes = all_amount

is_batch_set = batch is not None
is_msg_limit_set = max_messages < all_amount
```
Why do you need `all_amount` here instead of checking `max_messages is not None`?

Because `max_messages` is set to `all_amount` above when it hasn't been provided (i.e. it is None).
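A minimal sketch of the flow described in this reply (the `normalize` helper is illustrative, not SDK code): None limits are normalized to the infinite sentinel, so a later `< all_amount` comparison is equivalent to "a finite limit was provided".

```python
all_amount = float("inf")

def normalize(max_messages, max_bytes):
    # Replace missing limits with the infinite sentinel
    if max_messages is None:
        max_messages = all_amount
    if max_bytes is None:
        max_bytes = all_amount
    return max_messages, max_bytes

msgs, bts = normalize(None, 1024)
is_msg_limit_set = msgs < all_amount   # False: no message limit was given
is_bytes_limit_set = bts < all_amount  # True: a byte limit was given
```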
```python
return self._caller.unsafe_call_with_future(self._async_reader.wait_message())

def _make_batch_slice(
```
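A hypothetical sketch of what a batch-slicing helper along these lines might do; this is not the SDK's actual `_make_batch_slice`. Following the PR's "Decisions made", it copies the `_partition_session` reference from the original batch into the new, sliced one and trims messages to the requested limits:

```python
from dataclasses import dataclass

@dataclass
class Batch:                      # stand-in for PublicBatch
    messages: list
    _partition_session: object = None

def make_batch_slice(batch, max_messages, max_bytes):
    # Trim to at most max_messages messages and max_bytes total payload.
    # Assumes bytes-like messages and numeric (possibly infinite) limits.
    sliced = []
    budget = max_bytes
    for msg in batch.messages:
        size = len(msg)
        if len(sliced) >= max_messages or size > budget:
            break
        sliced.append(msg)
        budget -= size
    # Copy the partition session reference into the sliced batch
    return Batch(messages=sliced, _partition_session=batch._partition_session)
```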
IMPORTANT
After applying the function, a caller will lose the messages that have been trimmed from the batch and will not see them again in the read session. The server does not allow skipping messages during commit. This can cause problems:
- If the caller commits messages with ack, the software will hang forever, because the server will wait for the skipped messages before acking the commit.
- If the caller commits messages without ack, then after reconnecting, all messages after the last successful commit (the first batch with cut messages) will be re-read. A lot of extra work is required to re-read these messages, and real progress will be very slow.
- If progress is saved on the user's side and messages are not committed through the SDK, they will be lost and cannot be recovered.
Ok, I see, though I'm not sure I fully understand the path to a solution.
What was the expected approach?
already implemented
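One way to avoid the message-loss problem raised in the review is to keep the trimmed remainder buffered instead of discarding it, so the next read returns it and commit offsets stay contiguous. A local sketch of that idea (not the SDK's implementation; all names here are illustrative):

```python
class ReaderWithRemainder:
    def __init__(self, fetch):
        self._fetch = fetch       # callable returning the next full batch (a list)
        self._remainder = []      # messages trimmed from the previous batch

    def receive(self, max_messages):
        # Serve the buffered remainder first so no message is ever skipped
        batch = self._remainder or self._fetch()
        head, self._remainder = batch[:max_messages], batch[max_messages:]
        return head
```

Because every message is eventually handed to the caller in order, committing each returned slice leaves no gaps for the server to wait on.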
Allow a client to control the amount of data it receives when reading a batch through TopicReaderSync.

Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
TopicReaderSync.receive_batch ignores the max_messages and max_bytes parameters, which means a client has no control over the amount of received data.

Issue Number: 365
What is the new behavior?
TopicReaderSync.receive_batch now takes max_messages and max_bytes into account.

Other information
Decisions made:
- … `_commit_get_partition_session`, and rather copy `_partition_session` from the batch to a new (sliced) batch.
- … `max_messages`, nor `max_bytes` were provided.
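To illustrate the behavior change, here is a local before/after simulation (plain Python, not the SDK itself): the old `receive_batch` returned the buffered batch regardless of limits, while the new one applies `max_messages` and `max_bytes`.

```python
buffered = [b"x" * 100 for _ in range(50)]   # 50 messages of 100 bytes each

def receive_batch_old(max_messages=None, max_bytes=None):
    return list(buffered)                     # limits ignored

def receive_batch_new(max_messages=None, max_bytes=None):
    # None means "no limit", via the infinite sentinel from the diff
    max_messages = float("inf") if max_messages is None else max_messages
    max_bytes = float("inf") if max_bytes is None else max_bytes
    out, used = [], 0
    for m in buffered:
        if len(out) >= max_messages or used + len(m) > max_bytes:
            break
        out.append(m)
        used += len(m)
    return out

assert len(receive_batch_old(max_messages=10)) == 50  # old: limit ignored
assert len(receive_batch_new(max_messages=10)) == 10  # new: limit applied
assert len(receive_batch_new(max_bytes=250)) == 2     # new: byte budget applied
```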