DRIVERS-3446 - Add prose tests to verify correct retry behavior when a mix of overload and non-overload errors are encountered#1924
Conversation
…a mix of overload and non-overload errors are encountered DRIVERS-3427: Remove token bucket disclaimer from backpressure speci… (mongodb#1923) DRIVERS-3436 - Refine `withTransaction` timeout error wrapping semantics and label propagation in spec and prose tests (mongodb#1920) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com> DRIVERS-3325 allow keyAltName in encryptedFields spec test (mongodb#1853)
| two runs. | ||
|
|
||
| #### Test 3: Overload Errors are Retried a Maximum of MAX_RETRIES times | ||
| #### Test 2: Overload Errors are Retried a Maximum of MAX_RETRIES times |
There was a problem hiding this comment.
We should not change the test numbers. They were, I believe, on purpose left out unchanged when one of the tests was previously removed - drivers may refer to the prose tests either by their numbers, or by linking to a specific test.
Changing the number or the test name breaks those references. It is especially bad when a test is referred to by its number - the old test 3 is now 2, but there is still a test with the number 3, and if a driver refers to the 3rd test, it is not obvious that the reference is broken.
The same applies to the test below.
|
|
||
| - When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error. | ||
| - If an overload error is encountered: | ||
| - If an overload error is encountered at any point during an operation's retry loop: |
There was a problem hiding this comment.
It's not operations but commands that the read/write retry policies, or the new overload retry policy prescribes the drivers to retry. Previously, the effort was made to make sure that client-backpressure.md correctly refers to either command or operation, given that they are not the same thing.
Let's replace "operation" with "command" here, because that's the correct term here and below in "Why override existing maximum number of retry attempt defaults for retryable reads and writes if an overload error is received at any point during an operation's retry loop?".
|
|
||
| - When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error. | ||
| - If an overload error is encountered: | ||
| - If an overload error is encountered at any point during an operation's retry loop: |
There was a problem hiding this comment.
What is the "at any point during an operation's retry loop" part intended to clarify? Nowhere else in client-backpressure.md the "at any point of the command retry loop" qualification is made, because it is implied and seems obvious. What is specific about this item that requires the qualification?
Hopefully, another reviewer may see this question as useful, get to the answer, and then act on it however it seems appropriate.
| - If an overload error is encountered at any point during an operation's retry loop: | ||
| - Regardless of whether CSOT is enabled or not, the maximum number of retries for any retry policy becomes | ||
| `MAX_RETRIES`. | ||
| `MAX_RETRIES`. This includes retryable non-overload errors following or preceding the encountered overload error. |
There was a problem hiding this comment.
MAX_RETRIES limits attempts ("the execution attempt number (starting with 0)"). attempt is currently documented in the "Overload retry policy" section as
includes retries for errors that are not overload errors (this might include attempts under other retry policies, see Interactions with Other Retry Policies).
It appears that the addition made here tries to duplicate the above information, but the way it is done does not seem like correct English to me (though I am not a native speaker): the subject in the new sentence is "this", but I fail to find anything in the previous sentence that could act as the thing referred to by "this". Maybe the proposed sentence could be reformulated as
Note that
MAX_RETRIESlimitsattempts, andattemptsincludes retries for errors that are not overload errors.
There was a problem hiding this comment.
Another idea:
When reviewing the PR that introduced the requirements under the "If an overload error is encountered" condition, I suggested to formulate the two them by explicitly saying something like "extends the retry attempt limit..." and "forbids retry attempts after...". The current wording was chosen because it was much shorter than what I envisioned, and was still correct. However, given that it turned out to be practically difficult to understand, we may consider reformulating these requirements using the aforementioned wordings (if we do that, great care should be taken to formulate them correctly, which, historically, is difficult to do).
DRIVERS-3446
Please complete the following before merging:
clusters).