Skip to content

DRIVERS-3446 - Add prose tests to verify correct retry behavior when a mix of overload and non-overload errors are encountered#1924

Open
NoahStapp wants to merge 4 commits intomongodb:masterfrom
NoahStapp:DRIVERS-3446
Open

DRIVERS-3446 - Add prose tests to verify correct retry behavior when a mix of overload and non-overload errors are encountered#1924
NoahStapp wants to merge 4 commits intomongodb:masterfrom
NoahStapp:DRIVERS-3446

Conversation

@NoahStapp
Copy link
Copy Markdown
Contributor

@NoahStapp NoahStapp commented Apr 14, 2026

DRIVERS-3446

Please complete the following before merging:

  • Is the relevant DRIVERS ticket in the PR title?
  • Update changelog.
  • Test changes in at least one language driver. Python.
  • Test these changes against all server versions and topologies (including standalone, replica set, and sharded
    clusters).

…a mix of overload and non-overload errors are encountered

DRIVERS-3427: Remove token bucket disclaimer from backpressure speci… (mongodb#1923)

DRIVERS-3436 - Refine `withTransaction` timeout error wrapping semantics and label propagation in spec and prose tests (mongodb#1920)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

DRIVERS-3325 allow keyAltName in encryptedFields spec test (mongodb#1853)
@NoahStapp NoahStapp requested review from a team as code owners April 14, 2026 16:20
@NoahStapp NoahStapp requested review from comandeo-mongo, isabelatkinson, katcharov and papafe and removed request for a team April 14, 2026 16:20
@katcharov katcharov requested review from stIncMale and removed request for katcharov April 14, 2026 17:11
@NoahStapp NoahStapp removed the request for review from isabelatkinson April 14, 2026 17:44
@katcharov katcharov removed the request for review from stIncMale April 14, 2026 22:49
Copy link
Copy Markdown
Member

@stIncMale stIncMale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be able to come back to this PR review, which means, my comments/suggestions should not block the PR. But I hope both the author and the other reviewers will take them into account.

two runs.

#### Test 3: Overload Errors are Retried a Maximum of MAX_RETRIES times
#### Test 2: Overload Errors are Retried a Maximum of MAX_RETRIES times
Copy link
Copy Markdown
Member

@stIncMale stIncMale Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not change the test numbers. They were, I believe, on purpose left out unchanged when one of the tests was previously removed - drivers may refer to the prose tests either by their numbers, or by linking to a specific test.

Changing the number or the test name breaks those references. It is especially bad when a test is referred to by its number - the old test 3 is now 2, but there is still a test with the number 3, and if a driver refers to the 3rd test, it is not obvious that the reference is broken.

The same applies to the test below.


- When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error.
- If an overload error is encountered:
- If an overload error is encountered at any point during an operation's retry loop:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not operations but commands that the read/write retry policies, or the new overload retry policy prescribes the drivers to retry. Previously, the effort was made to make sure that client-backpressure.md correctly refers to either command or operation, given that they are not the same thing.

Let's replace "operation" with "command" here, because that's the correct term here and below in "Why override existing maximum number of retry attempt defaults for retryable reads and writes if an overload error is received at any point during an operation's retry loop?".


- When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error.
- If an overload error is encountered:
- If an overload error is encountered at any point during an operation's retry loop:
Copy link
Copy Markdown
Member

@stIncMale stIncMale Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "at any point during an operation's retry loop" part intended to clarify? Nowhere else in client-backpressure.md the "at any point of the command retry loop" qualification is made, because it is implied and seems obvious. What is specific about this item that requires the qualification?

Hopefully, another reviewer may see this question as useful, get to the answer, and then act on it however it seems appropriate.

- If an overload error is encountered at any point during an operation's retry loop:
- Regardless of whether CSOT is enabled or not, the maximum number of retries for any retry policy becomes
`MAX_RETRIES`.
`MAX_RETRIES`. This includes retryable non-overload errors following or preceding the encountered overload error.
Copy link
Copy Markdown
Member

@stIncMale stIncMale Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAX_RETRIES limits attempts ("the execution attempt number (starting with 0)"). attempt is currently documented in the "Overload retry policy" section as

includes retries for errors that are not overload errors (this might include attempts under other retry policies, see Interactions with Other Retry Policies).

It appears that the addition made here tries to duplicate the above information, but the way it is done does not seem like correct English to me (though I am not a native speaker): the subject in the new sentence is "this", but I fail to find anything in the previous sentence that could act as the thing referred to by "this". Maybe the proposed sentence could be reformulated as

Note that MAX_RETRIES limits attempts, and attempts includes retries for errors that are not overload errors.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea:

When reviewing the PR that introduced the requirements under the "If an overload error is encountered" condition, I suggested to formulate the two them by explicitly saying something like "extends the retry attempt limit..." and "forbids retry attempts after...". The current wording was chosen because it was much shorter than what I envisioned, and was still correct. However, given that it turned out to be practically difficult to understand, we may consider reformulating these requirements using the aforementioned wordings (if we do that, great care should be taken to formulate them correctly, which, historically, is difficult to do).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants