Leaked RPC authentication error in polykey agent #376
Comments
I'm guessing it's a race condition where both sides have authenticated but one hasn't updated its state yet. This is very likely crashing the seed nodes right now, though the seed nodes should auto-restart when they crash like this. @aryanjassal, work with Brynley to go over the seed node logs and see if this error is being thrown before the crash. Also work with her to find out why the seed nodes don't auto-restart when they fail.
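If the cause is the race described above, one way to close the window is to have restricted calls wait briefly for the local authentication state to flip instead of failing immediately. Below is a minimal sketch of that idea, assuming a single authentication flag per connection. `AuthGate`, `markAuthenticated` and `waitUntilAuthenticated` are hypothetical names, not part of Polykey's RPC API.

```typescript
// Hypothetical sketch: these names are illustrative, not Polykey's RPC API.
// Restricted calls await this gate instead of throwing as soon as they see
// an unauthenticated state, which covers the window where the remote side
// has finished authenticating but the local side has not updated its state.
class AuthGate {
  private authenticated = false;
  private waiters: Array<() => void> = [];

  // Called by the (hypothetical) handshake handler once authentication succeeds.
  markAuthenticated(): void {
    this.authenticated = true;
    // Wake every call that was waiting on the state change.
    for (const resolve of this.waiters) resolve();
    this.waiters = [];
  }

  isAuthenticated(): boolean {
    return this.authenticated;
  }

  // Resolves once authenticated; rejects if that does not happen within timeoutMs.
  waitUntilAuthenticated(timeoutMs: number): Promise<void> {
    if (this.authenticated) return Promise.resolve();
    return new Promise<void>((resolve, reject) => {
      const onAuth = () => {
        clearTimeout(timer);
        resolve();
      };
      const timer = setTimeout(() => {
        // Drop this waiter so a late markAuthenticated() does not touch it.
        this.waiters = this.waiters.filter((w) => w !== onAuth);
        reject(new Error(`not authenticated after ${timeoutMs} ms`));
      }, timeoutMs);
      this.waiters.push(onAuth);
    });
  }
}
```

A restricted handler would `await gate.waitUntilAuthenticated(...)` before doing any work; the timeout keeps a connection that never authenticates from hanging forever, and the resulting rejection still has to be handled by the caller.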
Could these leaked errors be the cause of the agent shutting down and not restarting properly? The container logs might shed more light on my conjecture.
Oops, I didn't see the previous message before posting mine. After investigating this with Brynley: the seed nodes were working yesterday, and searching the logs for this one line would be a needle-in-a-haystack problem, so I will go ahead and fix this issue on the assumption that it is a cause of the seed node failure.
The seed nodes are not working. One of them is still down, and I had to restart one of them on Monday.
After a quick discussion with Brian, we found that the error originates from the RPC middleware layer. During authentication, each agent makes its own authentication call, and the restricted commands can only be used once both sides have authenticated. If an RPC call is made while the connection is still unauthenticated, this error is thrown, and it crashes the program. We have a few options to deal with this.
I have discarded the first option and will investigate the viability of the second and third options. The third option is more elegant but would take more work to implement. The second option is easier to implement because it leverages existing work, but it is not as efficient as the third. Once I have settled on a solution, I will start implementing it.
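Whichever option is chosen, the middleware can keep an unauthenticated call from crashing the agent by answering it with an error response rather than throwing. A minimal sketch, assuming a simple request/response shape; `RpcRequest`, `RpcResponse`, `authMiddleware`, `PUBLIC_METHODS` and the error codes are all hypothetical and do not mirror Polykey's actual middleware types.

```typescript
// Hypothetical sketch: types, names and error codes are illustrative only.
type RpcRequest = { method: string; params: unknown };
type RpcResponse =
  | { ok: true; result: unknown }
  | { ok: false; error: { code: string; message: string } };
type Handler = (req: RpcRequest) => Promise<unknown>;

// Methods allowed before the handshake completes (assumed set).
const PUBLIC_METHODS = new Set(["authenticate"]);

function authMiddleware(
  isAuthenticated: () => boolean,
  handler: Handler,
): (req: RpcRequest) => Promise<RpcResponse> {
  return async (req) => {
    if (!PUBLIC_METHODS.has(req.method) && !isAuthenticated()) {
      // Reply to the caller instead of throwing, so the failure stays
      // scoped to this one call and cannot take down the whole agent.
      return {
        ok: false,
        error: {
          code: "ERR_UNAUTHENTICATED",
          message: `${req.method} requires authentication`,
        },
      };
    }
    try {
      return { ok: true, result: await handler(req) };
    } catch (e) {
      // Errors from the handler itself are also turned into a response.
      const message = e instanceof Error ? e.message : String(e);
      return { ok: false, error: { code: "ERR_HANDLER", message } };
    }
  };
}
```

The caller then decides whether to retry after authentication completes, so the error surfaces as data on one call rather than as an uncaught exception.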
Describe the bug
After running my agent for a while, it eventually crashed with the following log message.
This indicates that errors are leaking from their context and crashing the entire agent, which shouldn't happen.
To Reproduce
Expected behavior
The agent shouldn't crash. The error should either be handled internally or printed to stdout/stderr as a log message.
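A minimal sketch of "handled internally or logged": detached async work is wrapped so that any rejection is routed to a logger instead of escaping as an unhandled rejection, which Node.js treats as fatal by default since v15. `runDetached` and the `Logger` shape are hypothetical, not taken from the Polykey codebase.

```typescript
// Hypothetical helper, not from the Polykey codebase.
// Runs a detached (not awaited) async task and logs any failure, so the
// rejection never reaches Node's unhandled-rejection handler.
type Logger = { error: (msg: string) => void };

function runDetached(task: () => Promise<void>, logger: Logger, label: string): void {
  // Starting from Promise.resolve().then(task) also catches synchronous
  // throws from task(), not only rejected promises.
  Promise.resolve()
    .then(task)
    .catch((e: unknown) => {
      const detail = e instanceof Error ? `${e.name}: ${e.message}` : String(e);
      logger.error(`${label} failed: ${detail}`);
    });
}
```

This only contains the failure; the underlying authentication race still needs fixing, but a stray error would show up as a log line rather than taking the agent down.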
Screenshots
Platform
Additional context
Notify maintainers
@tegefaulkes