
Health-check of connection on check-in #29

Closed
tailhook opened this issue May 28, 2020 · 10 comments

Comments

@tailhook

tailhook commented May 28, 2020

This library health-checks a connection on check-out from the pool. That solves the problem of connections being closed by the server due to inactivity, but it is suboptimal for handling connections that are already broken when checked back in to the pool.

In Rust, Drop of a mobc::Connection can happen at any time. So, for example, if you have a timeout on queries, the following may happen (a sketch of this sequence follows the list):

  1. redis.get sends GET\n to the server.
  2. The timeout fires before the response is received, which drops the future holding the mobc::Connection (the response to GET may then arrive in the input queue shortly afterwards).
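
A minimal sketch of this failure mode (the `Conn` type and its `get` method are hypothetical stand-ins for a pooled redis connection):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical stand-in for a pooled connection; in the real scenario this
// would be a mobc::Connection wrapping a redis-rs connection.
struct Conn;

impl Conn {
    async fn get(&mut self, _key: &str) -> std::io::Result<String> {
        // Imagine: writes "GET key\n" to the socket, then awaits the reply.
        // If this future is dropped between the write and the read, the
        // reply stays queued in the socket's input buffer.
        Ok("value".into())
    }
}

async fn get_with_timeout(conn: &mut Conn) -> Option<String> {
    match timeout(Duration::from_millis(50), conn.get("key")).await {
        Ok(reply) => reply.ok(),
        // The timeout drops the in-flight `get` future after GET has been
        // sent; its eventual reply will be read as the answer to whatever
        // command runs on this connection next.
        Err(_elapsed) => None,
    }
}
```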

Usually this is fixed by setting a flag in the connection structure and dropping the connection on check-in.
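
For illustration, the flag technique could look roughly like this (all names hypothetical, reusing `Conn` from the sketch above):

```rust
struct TrackedConn {
    inner: Conn,
    // Raised before a request is written, lowered only after its full
    // response has been read. A future dropped mid-request leaves the flag
    // raised, marking the connection as poisoned.
    waiting_for_response: bool,
}

impl TrackedConn {
    async fn get(&mut self, key: &str) -> std::io::Result<String> {
        self.waiting_for_response = true;
        let reply = self.inner.get(key).await?;
        self.waiting_for_response = false;
        Ok(reply)
    }
}
```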

Of course, we could drop the connection in Connection::check if the health check on check-out is enabled, but as far as I understand that check is specifically intended for sending a ping to the server rather than a simple check of a flag on the connection. So in some cases users have an incentive to disable these health checks.

But given the scenario above, disabling the health check can turn into a security issue.

What's more, even the current approach of sending PING and receiving PONG (for redis) is not enough: technically there can be a pipeline that sends PING, GET (for whatever reason) and then times out. In that case the health check's PING still receives a PONG (the stale one from the pipeline), leaving response_for_get and the fresh PONG in the input queue. So the next query returns the wrong result, which is a security issue. Presumably this is the cause of #28, and the current health check only makes it rarer, it does not solve it.

So the connection has to raise a flag such as self.waiting_for_response, and the connection pool should drop all such connections on check-in (actually waiting for the pending responses, on either check-in or check-out, would be a performance problem and should not be done).
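
The pool side of the proposal would then be roughly this (a sketch, not mobc's actual code):

```rust
// On check-in, discard any connection that is still owed a response;
// reusing it would pair the stale reply with the next query.
fn check_in(idle: &mut Vec<TrackedConn>, conn: TrackedConn) {
    if conn.waiting_for_response {
        drop(conn); // close the socket instead of returning it to the pool
    } else {
        idle.push(conn);
    }
}
```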

@importcjj
Owner

After check-out of a pooled connection, users interact with the DB connection directly through the Deref mechanism, so it's hard for me to trace and mark what state the original connection is in.
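
For context, a typical call site looks roughly like this (a sketch assuming a redis-backed manager such as the one in the mobc-redis crate): once the caller derefs to the raw connection, the pool has no visibility into the commands in flight.

```rust
use mobc::{Manager, Pool};

// `M` is any manager whose pooled connection is a raw redis-rs async
// connection (as in the mobc-redis crate).
async fn read_key<M>(pool: &Pool<M>) -> Option<String>
where
    M: Manager<Connection = redis::aio::Connection>,
{
    let mut conn = pool.get().await.ok()?;
    // `&mut *conn` derefs straight through mobc::Connection to the raw
    // redis connection; the pool cannot observe this command or whether
    // its response was fully consumed.
    redis::cmd("GET").arg("key").query_async(&mut *conn).await.ok()
}
```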

I am also turning to the maintainers of redis-rs for help.

@tailhook
Author

Yes, the point of this issue is for mobc to provide an appropriate hook to discard connections on check-in.

@tailhook
Author

We also have to investigate how this works for postgres. Without the proposed changes, it either has the same bug as redis or errors on the first request after checking out a connection (the latter is better, as it isn't a security issue, but it's still suboptimal).

@importcjj
Owner

OK, I added a new hook to do some cleanup work on check-in, but it is disabled by default.

@tailhook
Author

tailhook commented Jun 1, 2020

Thanks. I don't understand the motivation for making it opt-in, though. I believe it should always be enabled (i.e. there should be no configuration to disable it). This check should be cheap, and skipping it only degrades the performance and stability of the application. (Compare the health check on check-out, which can be slow, since it sends data over the network, and is safe to skip if your connections are never idle for too long.)

@importcjj
Owner

In the end I chose to use an async hook, but I don't want to pay the cost of the async transform. Do you think it is necessary to support async in this hook? Or should it just return a boolean, like r2d2 does?
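
For reference, the r2d2 hook referred to here is a plain synchronous method on its ManageConnection trait:

```rust
// From r2d2's ManageConnection trait: called synchronously when a
// connection is returned to the pool; returning true discards the
// connection instead of reusing it.
fn has_broken(&self, conn: &mut Self::Connection) -> bool;
```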

@tailhook
Author

tailhook commented Jun 1, 2020

I like the idea of restricting it to sync code. Async here is a big monitoring problem: it's harder to account for tasks spawned by a destructor if they get stuck for some reason.

@importcjj
Owner

Hey! Sorry for the late reply.

I have made some changes to the code; please take a look:

#33

Suggestions for revision are welcome.

@tailhook
Author

tailhook commented Jun 2, 2020

Suggested a different name in the PR. Other than that, it looks good. Thanks!

@importcjj
Owner

Thank you as well.
