-
Notifications
You must be signed in to change notification settings - Fork 244
DRIVERS-2884 Avoid connection churn when operations timeout #1675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
0f12706
fe18120
05cc88b
98c2a73
4827995
234b729
ccfbcf1
fed567b
8840be4
c1bee3b
c0e5aee
dde9e22
5e0305a
496724c
258edf8
d217d10
cc8aec0
3d98039
07e75bd
8d9e71b
5c68f77
e2653cb
00aa620
b29d6cc
b04b340
40b302c
4348b44
dd0dbe9
e43c466
89754ce
bc893c6
a3c00a4
6a663eb
9500fd5
5f3726b
d286786
9c5b33a
7bd5c00
cc9ec5c
a397306
ef75645
b5c1202
c785d0a
1097f63
49b5508
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,64 @@ The following tests have not yet been automated, but MUST still be tested: | |
5. When a check out attempt fails because connection set up throws an error, assert that a ConnectionCheckOutFailedEvent | ||
with reason="connectionError" is emitted. | ||
|
||
### Pending Response | ||
|
||
If a connection with a pending response is idle for > 3 seconds, then drivers are expected to perform an aliveness check | ||
by attempting a non-blocking read of 1 byte from the inbound TCP buffer. The following two cases test both a successful | ||
read and a failed one. | ||
|
||
#### Connection Aliveness Check Fails | ||
|
||
1. Initialize a TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to the | ||
connection to prevent size-related errors (e.g. `0x01, 0x02, 0x03, 0x04, 0x05, 0xFF`). The response should be | ||
delayed by 2x the size of the socket timeout. | ||
2. Implement a monitoring mechanism to capture the `ConnectionPendingResponseStarted`, | ||
`ConnectionPendingResponseFailed`, and `ConnectionClosed` events. | ||
3. Instantiate a connection pool using the mock listener’s address, ensuring readiness without error. Attach the event | ||
monitor to observe the connection’s state. | ||
4. Check out a connection from the pool and initiate a read operation with an appropriate socket timeout (e.g, 10ms) | ||
that will trigger a timeout due to the artificial delay of 2x the socket timeout (step 1). Ensure that the read | ||
operation returns a timeout error. | ||
5. Check the connection back into the pool and sleep for 3 seconds so that the pending response state timestamp exceeds | ||
the pending response timeout, forcing an aliveness check. | ||
6. Check the connection out. The aliveness check should fail since no additional bytes were added after the delay in | ||
step 1. | ||
7. Verify that one event for each `ConnectionPendingResponseStarted` and `ConnectionPendingResponseFailed` was emitted. | ||
Also verify that the fields were correctly set for each event. Verify that a `ConnectionClosed` event was emitted. | ||
|
||
#### Connection Aliveness Check Succeeds | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we add a test that demonstrates that multiple aliveness checks might be required to fully read the response from the socket? I'm imagining the mock server emitting chunks of data every second for longer than 6s (2 * the static timeout). Each checkout should fail, but we'll continue to read There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It’s unclear to me what the goal of this would be. Are you just wanting to make sure the driver does not pre-maturely close the connection if the aliveness check succeeds? We could use event monitoring for that instead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes - I want a test somewhere that demonstrates that drivers might need to span multiple checkout + pending responses, and that that process can take longer than the static timeout, before reading the full response from the connection. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest we defer this to the unified spec test and not further complicate the prose aliveness checks. Validating that a successful aliveness check does not result in a closed connection would imply that a subsequent check out would either perform an aliveness check or continue attempting to drain the buffer. LMK what you think. |
||
|
||
1. Initialize a TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to the | ||
connection to prevent size-related errors (e.g. `0x01, 0x02, 0x03, 0x04, 0x05, 0xFF`). The response should be delayed | ||
by 2x the size of the socket timeout. Write at least 1 additional byte after the delay so that the aliveness | ||
check succeeds (e.g. `0xAA`). | ||
2. Implement a monitoring mechanism to capture the `ConnectionPendingResponseStarted`, | ||
`ConnectionPendingResponseSucceeded`, and `ConnectionClosed` events. | ||
3. Instantiate a connection pool using the mock listener’s address, ensuring readiness without error. Attach the event | ||
monitor to observe the connection’s state. | ||
4. Check out a connection from the pool and initiate a read operation with an appropriate socket timeout (e.g, 10ms) | ||
that will trigger a timeout due to the artificial delay of 2x the socket timeout (step 1). Ensure that the read | ||
operation returns a timeout error. | ||
5. Check the connection back into the pool and sleep for 3 seconds so that the pending response state timestamp exceeds | ||
the pending response timeout, forcing an aliveness check. | ||
6. Check the connection out. The aliveness check should succeed since no additional bytes were added after the delay in | ||
step 1. | ||
7. Verify that one event for each `ConnectionPendingResponseStarted` and `ConnectionPendingResponseSucceeded` was | ||
emitted. Also verify that the fields were correctly set for each event. Verify that a `ConnectionClosed` event was | ||
not emitted. | ||
|
||
#### Exhaust Cursors | ||
|
||
Drivers that support the `exhaustAllowed` `OP_MSG` bit flag must ensure that responses which contain `moreToCome` will | ||
not result in a connection being put into a "pending response" state. Drivers that don't support this behavior can | ||
skip this prose test. | ||
|
||
1. Configure a failpoint to block `getMore` for 500ms. | ||
2. Insert > 2 records into the collection. | ||
2. Create an exhaust cursor using `find` and iterate one `getMore` using `batchSize=1`. | ||
3. Call a subsequent `getMore` on the exhaust cursor with a client-side timeout of 100ms. | ||
4. Ensure that the `ConnectionClosed` event is emitted due to timeout. | ||
|
||
## Logging Tests | ||
|
||
Tests for connection pool logging can be found in the `/logging` subdirectory and are written in the | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is related to my other comment about a less socket-api specific implementation. Is it possible to write these tests in a way that doesn't require explicit use of a
read
API? Node's connection layer doesn't expose aread
method method, we only exposecommand()
which performs write+read on the underlying socket.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Node only has access to a round-trip API? Even in the non-public API? Could you not just discard the write half in the mock listener?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I interpreted "mock listener" as "mock server" but reading over this test more closely, that doesn't seem to be what you're asking for. Are you asking for some way of stubbing a socket and pushing chunks into the sockets buffer, instead of a mock server that the driver opens real TCP sockets to? That's possible in Node but I'd check with other drivers (I don't know if something like this exists yet in the specs, it may not be possible in other drivers).
Regardless, I'm not sure if these tests would work for the Node implementation I mentioned here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the word "mock" is redundant here. We just need a way to simulate the response portion of the round trip using a TCP listener. Do you have suggestions on how we can accommodate theses tests for Node? No other drivers indicated this would be an issue: https://mongodb.slack.com/archives/C72LB5RPV/p1746558337106509