You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
bugfix: Do not lock EventLoop::mutex after EventLoop is done
Currently, there are two cases where code may attempt to lock `EventLoop::mutex`
after the `EventLoop::loop()` method has finished running and the `EventLoop` object
is potentially destroyed. Both cases happen due to uses of the `Unlock()` utility
function which unlocks a mutex temporarily, runs a callback function and relocks
it.
The first case happens in `EventLoop::removeClient` when the `EventLoop::done()`
condition is reach and it calls `Unlock()` in order to be able to call
`write(post_fd, ...)` without blocking and wake the EventLoop thread. In this
case, since `EventLoop` execution is done there is not really any point to using
`Unlock()` and relocking the mutex after calling `write()` so the code is
updated to just use a simple `lock.unlock()` call and permanently let go of the
lock and try to reacquire it.
The second case happens in `EventLoop::post` where `Unlock()` is also used in a
similar way, and depending on thread scheduling (if the post thread is delayed
for a long time before relocking) can result in relocking `EventLoop::m_mutex`
after calling `write()` to fail. In this case, since relocking the mutex is
actually necessary for the code that follows, the fix is different: new
`addClient`/`removeClient` calls are just added to this code, so the
`EventLoop::loop()` function will just not exit while the `post()` function is
waiting.
These two changes are being labeled as a bugfix even though there is not
technically a bug here in libmultiprocess code or API. The `EventLoop` object
itself was perfectly safe before these changes as long as outside code was
waited for `EventLoop` methods to finish executing before deleting the
`EventLoop` object. Originally the multiprocess code added in
bitcoin/bitcoin#10102 did this, but later as more
features were added to binding and connecting to unix sockets, and unit tests
were added which would immediately delete the `EventLoop` object after
`EventLoop::loop()` returned it meant these two methods could start failing due
to their uses of `Unlock()` depending on thread scheduling.
A previous attempt was made to fix this bug in
bitcoin/bitcoin#31815 outside of libmultiprocess code.
But it only addressed the first case of a failing `Unlock()` in
`EventLoop::removeClient`, not the second case in `EventLoop::post`.
This first case described above was not actually observed but is just
theoretically possible. The second case happens intermittently on macos and was
reported bitcoin-core#154
0 commit comments