
test: fix server shutdown logic in IntegrationTestServerImpl destructor#44010

Open
kralicky wants to merge 1 commit into envoyproxy:main from kralicky:kralicky/fix-integration-server-shutdown

Conversation

@kralicky
Contributor

Commit Message: test: fix server shutdown logic in IntegrationTestServerImpl destructor

Additional Description:
This fixes the logic in ~IntegrationTestServerImpl() that determines whether the test server needs to be shut down during test cleanup. Previously, it was assumed that if the server was started, it would not have been shut down during the test; however, this is not always the case.

For example, I needed to write an integration test to exercise a callback registered to the ServerLifecycleNotifier's ShutdownExit stage. In the body of the integration test, the following code would trigger a manual server shutdown:

```cpp
test_server_->server().dispatcher().post([this] {
  test_server_->server().shutdown();
});
```

When the server is shut down during the test, createAndRunEnvoyServer(), running in a background thread, exits after server.run() returns, leaving IntegrationTestServerImpl::server_ pointing to garbage. It also signals the server_gone_ notification.

When the test completes and is torn down, ~IntegrationTestServerImpl() does not check whether server_gone_ was already notified during the test, and crashes when it attempts to call server_->dispatcher().

Risk Level: moderate (this code path is probably reached in every integration test)

Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: Joe Kralicky <joekralicky@gmail.com>
@kralicky
Contributor Author

Now that I think about it, this might still be prone to a data race: observing server_gone_.HasBeenNotified() == false does not imply that the server isn't already destroyed. For example, the background thread could be in the process of destroying the server between server.run() returning and server_gone_.Notify() being called after the scope above is exited. Maybe the best fix here would be to refactor so that we avoid calling server_->dispatcher() from within the destructor at all.
