Skip to content

Commit 26b9f3d

Browse files
committed
Merge #159: bugfix: Do not lock EventLoop::mutex after EventLoop is done
48d01bc bugfix: Do not lock EventLoop::mutex after EventLoop is done (Ryan Ofsky) Pull request description: Currently, there are two cases where code may attempt to lock `EventLoop::mutex` after the `EventLoop::loop()` method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the `EventLoop` object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the `Unlock()` utility function which unlocks a mutex temporarily, runs a callback function and relocks it. The first case happens in `EventLoop::removeClient` when the `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be able to call `write(post_fd, ...)` without blocking and wake the EventLoop thread. In this case, since `EventLoop` execution is done there is not really any point to using `Unlock()` and relocking, so the code is changed to use a simple `lock.unlock()` call and not try to relock. The second case happens in `EventLoop::post` where `Unlock()` is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new `addClient`/`removeClient` calls are added to this code, so the `EventLoop::loop()` function will never exit while the `post()` function is waiting. These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The `EventLoop` object itself was safe before these changes as long as outside code waited for `EventLoop` methods to finish executing before deleting the `EventLoop` object. Originally the multiprocess code added in bitcoin/bitcoin#10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the `EventLoop` object after `EventLoop::loop()` returned, it meant this code could now start failing to relock after unlocking. A previous attempt was made to fix this bug in bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing `Unlock()` in `EventLoop::removeClient()`, not the second case in `EventLoop::post()`. This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported #154 Top commit has no ACKs. Tree-SHA512: 378d360de84b40f0ddcf2ebc5e11fb74adf6d133e94f148ebb7e0c9926836f276ac79ed1d8cb1a559a7aa756939071e8d252d4f03fe6816760807857fba646cb
2 parents 4089907 + 48d01bc commit 26b9f3d

File tree

2 files changed

+10
-7
lines changed

2 files changed

+10
-7
lines changed

include/mp/proxy-io.h

+1-1
Original file line numberDiff line numberDiff line change
@@ -170,7 +170,7 @@ class EventLoop
170170

171171
//! Add/remove remote client reference counts.
172172
void addClient(std::unique_lock<std::mutex>& lock);
173-
void removeClient(std::unique_lock<std::mutex>& lock);
173+
bool removeClient(std::unique_lock<std::mutex>& lock);
174174
//! Check if loop should exit.
175175
bool done(std::unique_lock<std::mutex>& lock);
176176

src/mp/proxy.cpp

+9-6
Original file line numberDiff line numberDiff line change
@@ -225,6 +225,7 @@ void EventLoop::post(const std::function<void()>& fn)
225225
return;
226226
}
227227
std::unique_lock<std::mutex> lock(m_mutex);
228+
addClient(lock);
228229
m_cv.wait(lock, [this] { return m_post_fn == nullptr; });
229230
m_post_fn = &fn;
230231
int post_fd{m_post_fd};
@@ -233,21 +234,23 @@ void EventLoop::post(const std::function<void()>& fn)
233234
KJ_SYSCALL(write(post_fd, &buffer, 1));
234235
});
235236
m_cv.wait(lock, [this, &fn] { return m_post_fn != &fn; });
237+
removeClient(lock);
236238
}
237239

238240
void EventLoop::addClient(std::unique_lock<std::mutex>& lock) { m_num_clients += 1; }
239241

240-
void EventLoop::removeClient(std::unique_lock<std::mutex>& lock)
242+
bool EventLoop::removeClient(std::unique_lock<std::mutex>& lock)
241243
{
242244
m_num_clients -= 1;
243245
if (done(lock)) {
244246
m_cv.notify_all();
245247
int post_fd{m_post_fd};
246-
Unlock(lock, [&] {
247-
char buffer = 0;
248-
KJ_SYSCALL(write(post_fd, &buffer, 1)); // NOLINT(bugprone-suspicious-semicolon)
249-
});
248+
lock.unlock();
249+
char buffer = 0;
250+
KJ_SYSCALL(write(post_fd, &buffer, 1)); // NOLINT(bugprone-suspicious-semicolon)
251+
return true;
250252
}
253+
return false;
251254
}
252255

253256
void EventLoop::startAsyncThread(std::unique_lock<std::mutex>& lock)
@@ -263,7 +266,7 @@ void EventLoop::startAsyncThread(std::unique_lock<std::mutex>& lock)
263266
const std::function<void()> fn = std::move(m_async_fns.front());
264267
m_async_fns.pop_front();
265268
Unlock(lock, fn);
266-
removeClient(lock);
269+
if (removeClient(lock)) break;
267270
continue;
268271
} else if (m_num_clients == 0) {
269272
break;

0 commit comments

Comments
 (0)