You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Merge #159: bugfix: Do not lock EventLoop::mutex after EventLoop is done
48d01bc bugfix: Do not lock EventLoop::mutex after EventLoop is done (Ryan Ofsky)
Pull request description:
Currently, there are two cases where code may attempt to lock `EventLoop::mutex` after the `EventLoop::loop()` method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the `EventLoop` object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the `Unlock()` utility function which unlocks a mutex temporarily, runs a callback function and relocks it.
The first case happens in `EventLoop::removeClient` when the `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be able to call `write(post_fd, ...)` without blocking and wake the EventLoop thread. In this case, since `EventLoop` execution is done there is not really any point to using `Unlock()` and relocking, so the code is changed to use a simple `lock.unlock()` call and not try to relock.
The second case happens in `EventLoop::post` where `Unlock()` is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new `addClient`/`removeClient` calls are added to this code, so the `EventLoop::loop()` function will never exit while the `post()` function is waiting.
These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The `EventLoop` object itself was safe before these changes as long as outside code waited for `EventLoop` methods to finish executing before deleting the `EventLoop` object. Originally the multiprocess code added in
bitcoin/bitcoin#10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the `EventLoop` object after `EventLoop::loop()` returned, it meant this code could now start failing to relock after unlocking.
A previous attempt was made to fix this bug in
bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing `Unlock()` in `EventLoop::removeClient()`, not the second case in `EventLoop::post()`.
This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported #154
Top commit has no ACKs.
Tree-SHA512: 378d360de84b40f0ddcf2ebc5e11fb74adf6d133e94f148ebb7e0c9926836f276ac79ed1d8cb1a559a7aa756939071e8d252d4f03fe6816760807857fba646cb
0 commit comments