Skip to content

DRIVERS-2884 Avoid connection churn when operations timeout #1675

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 45 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
0f12706
DRIVERS-2884 Add connection churn spec tests
prestonvasquez Oct 14, 2024
fe18120
DRIVERS-2884 Update json
prestonvasquez Oct 14, 2024
05cc88b
DRIVERS-2884 Clean up spec tests
prestonvasquez Oct 30, 2024
98c2a73
Update CMAP to include foreground read
prestonvasquez Oct 30, 2024
4827995
Update changelog
prestonvasquez Oct 30, 2024
234b729
Add justification for CMAP update
prestonvasquez Oct 30, 2024
ccfbcf1
Remove unecessary example
prestonvasquez Oct 30, 2024
fed567b
Use consistent keys
prestonvasquez Oct 30, 2024
8840be4
Update timeouts
prestonvasquez Nov 7, 2024
c1bee3b
DRIVERS-2884 Resolve merge conflicts
prestonvasquez Apr 22, 2025
c0e5aee
DRIVERS-2884 Update pending response unified spec tests
prestonvasquez Apr 22, 2025
dde9e22
DRIVERS-2884 Add UML and update wording
prestonvasquez Apr 23, 2025
5e0305a
DRIVERS-2884 Remove uneeded text from code snippet
prestonvasquez Apr 23, 2025
496724c
DRIVERS-2884 Add prose tests
prestonvasquez Apr 23, 2025
258edf8
DRIVERS-2884 Clean up presentation
prestonvasquez Apr 23, 2025
d217d10
DRIVERS-2884 Add logs and events
prestonvasquez Apr 24, 2025
cc8aec0
DRIVERS-2884 Add log part
prestonvasquez Apr 24, 2025
3d98039
DRIVERS-2884 Add Q&A section
prestonvasquez Apr 25, 2025
07e75bd
DRIVERS-2884 Add changelog
prestonvasquez Apr 25, 2025
8d9e71b
DRIVERS-2884 Fix Markdown failures
prestonvasquez Apr 25, 2025
5c68f77
DRIVERS-2884 Update schema
prestonvasquez Apr 25, 2025
e2653cb
DRIVERS-2884 Update schema w/ new connection events
prestonvasquez Apr 25, 2025
00aa620
DRIVERS-2884 Remove additional properties
prestonvasquez Apr 25, 2025
b29d6cc
DRIVERS-2884 Remove ignoring extra events
prestonvasquez Apr 25, 2025
b04b340
DRIVERS-2884 Clean up tests
prestonvasquez Apr 25, 2025
40b302c
DRIVERS-2884 Uncapitalize D in ID
prestonvasquez Apr 25, 2025
4348b44
DRIVERS-2884 Another ID cleanup
prestonvasquez Apr 25, 2025
dd0dbe9
DRIVERS-2884 Remove the word write
prestonvasquez Apr 25, 2025
e43c466
DRIVERS-2884 Remove the word write
prestonvasquez Apr 25, 2025
89754ce
Add punctuation
prestonvasquez Apr 28, 2025
bc893c6
Merge branch 'master' into DRIVERS-2884
prestonvasquez Apr 30, 2025
a3c00a4
DRIVERS-2884 Update schema latest
prestonvasquez Apr 30, 2025
6a663eb
DRIVERS-2884 Clarify schema bump
prestonvasquez Apr 30, 2025
9500fd5
DRIVERS-2884 Add pending response state
prestonvasquez May 5, 2025
5f3726b
DRIVERS-2884 Move logging tests to csot-specific file
prestonvasquez May 5, 2025
d286786
DRIVERS-2884 Generate connection-logging-csot.json
prestonvasquez May 5, 2025
9c5b33a
DRIVERS-2884 Fix bug; add test super section
prestonvasquez May 5, 2025
7bd5c00
DRIVERS-2884 Add commandName: ping
prestonvasquez May 8, 2025
cc9ec5c
DRIVERS-2884 Clarify behavior for exhaust cursors
prestonvasquez May 8, 2025
a397306
DRIVERS-2884 Account for both pull and push i/o patterns
prestonvasquez May 9, 2025
ef75645
DRIVERS-2884 Update duration commentary
prestonvasquez May 9, 2025
b5c1202
DRIVERS-2884 Ensure all branches are tested
prestonvasquez May 9, 2025
c785d0a
DRIVERS-2884 Make Q&A read/receive agnostic
prestonvasquez May 9, 2025
1097f63
DRIVERS-2884 Update retryable reads and writes to retry for network e…
prestonvasquez May 28, 2025
49b5508
DRIVERS-2884 Update 1.24 with open/close schema udpates
prestonvasquez May 29, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
777 changes: 777 additions & 0 deletions source/client-side-operations-timeout/tests/pending-response.json

Large diffs are not rendered by default.

411 changes: 411 additions & 0 deletions source/client-side-operations-timeout/tests/pending-response.yml

Large diffs are not rendered by default.

Large diffs are not rendered by default.

58 changes: 58 additions & 0 deletions source/connection-monitoring-and-pooling/tests/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,64 @@ The following tests have not yet been automated, but MUST still be tested:
5. When a check out attempt fails because connection set up throws an error, assert that a ConnectionCheckOutFailedEvent
with reason="connectionError" is emitted.

### Pending Response

If a connection with a pending response is idle for > 3 seconds, then drivers are expected to perform an aliveness check
by attempting a non-blocking read of 1 byte from the inbound TCP buffer. The following two cases test both a successful
read and a failed one.

#### Connection Aliveness Check Fails

1. Initialize a TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to the
connection to prevent size-related errors (e.g. `0x01, 0x02, 0x03, 0x04, 0x05, 0xFF`). The response should be
delayed by 2x the size of the socket timeout.
2. Implement a monitoring mechanism to capture the `ConnectionPendingResponseStarted`,
`ConnectionPendingResponseFailed`, and `ConnectionClosed` events.
3. Instantiate a connection pool using the mock listener’s address, ensuring readiness without error. Attach the event
monitor to observe the connection’s state.
4. Check out a connection from the pool and initiate a read operation with an appropriate socket timeout (e.g, 10ms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is related to my other comment about a less socket-api specific implementation. Is it possible to write these tests in a way that doesn't require explicit use of a read API? Node's connection layer doesn't expose a read method method, we only expose command() which performs write+read on the underlying socket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Node only has access to a round-trip API? Even in the non-public API? Could you not just discard the write half in the mock listener?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpreted "mock listener" as "mock server" but reading over this test more closely, that doesn't seem to be what you're asking for. Are you asking for some way of stubbing a socket and pushing chunks into the sockets buffer, instead of a mock server that the driver opens real TCP sockets to? That's possible in Node but I'd check with other drivers (I don't know if something like this exists yet in the specs, it may not be possible in other drivers).

Regardless, I'm not sure if these tests would work for the Node implementation I mentioned here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the word "mock" is redundant here. We just need a way to simulate the response portion of the round trip using a TCP listener. Do you have suggestions on how we can accommodate theses tests for Node? No other drivers indicated this would be an issue: https://mongodb.slack.com/archives/C72LB5RPV/p1746558337106509

that will trigger a timeout due to the artificial delay of 2x the socket timeout (step 1). Ensure that the read
operation returns a timeout error.
5. Check the connection back into the pool and sleep for 3 seconds so that the pending response state timestamp exceeds
the pending response timeout, forcing an aliveness check.
6. Check the connection out. The aliveness check should fail since no additional bytes were added after the delay in
step 1.
7. Verify that one event for each `ConnectionPendingResponseStarted` and `ConnectionPendingResponseFailed` was emitted.
Also verify that the fields were correctly set for each event. Verify that a `ConnectionClosed` event was emitted.

#### Connection Aliveness Check Succeeds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that demonstrates that multiple aliveness checks might be required to fully read the response from the socket? I'm imagining the mock server emitting chunks of data every second for longer than 6s (2 * the static timeout). Each checkout should fail, but we'll continue to read

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s unclear to me what the goal of this would be. Are you just wanting to make sure the driver does not pre-maturely close the connection if the aliveness check succeeds? We could use event monitoring for that instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I want a test somewhere that demonstrates that drivers might need to span multiple checkout + pending responses, and that that process can take longer than the static timeout, before reading the full response from the connection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we defer this to the unified spec test and not further complicate the prose aliveness checks. Validating that a successful aliveness check does not result in a closed connection would imply that a subsequent check out would either perform an aliveness check or continue attempting to drain the buffer. LMK what you think.


1. Initialize a TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to the
connection to prevent size-related errors (e.g. `0x01, 0x02, 0x03, 0x04, 0x05, 0xFF`). The response should be delayed
by 2x the size of the socket timeout. Write at least 1 additional byte after the delay so that the aliveness
check succeeds (e.g. `0xAA`).
2. Implement a monitoring mechanism to capture the `ConnectionPendingResponseStarted`,
`ConnectionPendingResponseSucceeded`, and `ConnectionClosed` events.
3. Instantiate a connection pool using the mock listener’s address, ensuring readiness without error. Attach the event
monitor to observe the connection’s state.
4. Check out a connection from the pool and initiate a read operation with an appropriate socket timeout (e.g, 10ms)
that will trigger a timeout due to the artificial delay of 2x the socket timeout (step 1). Ensure that the read
operation returns a timeout error.
5. Check the connection back into the pool and sleep for 3 seconds so that the pending response state timestamp exceeds
the pending response timeout, forcing an aliveness check.
6. Check the connection out. The aliveness check should succeed since no additional bytes were added after the delay in
step 1.
7. Verify that one event for each `ConnectionPendingResponseStarted` and `ConnectionPendingResponseSucceeded` was
emitted. Also verify that the fields were correctly set for each event. Verify that a `ConnectionClosed` event was
not emitted.

#### Exhaust Cursors

Drivers that support the `exhaustAllowed` `OP_MSG` bit flag must ensure that responses which contain `moreToCome` will
not result in a connection being put into a "pending response" state. Drivers that don't support this behavior can
skip this prose test.

1. Configure a failpoint to block `getMore` for 500ms.
2. Insert > 2 records into the collection.
2. Create an exhaust cursor using `find` and iterate one `getMore` using `batchSize=1`.
3. Call a subsequent `getMore` on the exhaust cursor with a client-side timeout of 100ms.
4. Ensure that the `ConnectionClosed` event is emitted due to timeout.

## Logging Tests

Tests for connection pool logging can be found in the `/logging` subdirectory and are written in the
Expand Down
Loading
Loading