
Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races. #129

Merged
merged 1 commit into bitcoin-core:master on Jan 27, 2025

Conversation

ryanofsky
Collaborator

@ryanofsky ryanofsky commented Jan 23, 2025

The EventLoop shutdown sequence has race conditions that could cause it to shut down right before a removeClient write(m_post_fd, ...) call is about to happen, if threads run in an unexpected order, and cause the write to fail.

Cases where this can happen are described in bitcoin/bitcoin#31151 (comment). The possible causes are that (1) EventLoop::m_mutex is not used to protect some EventLoop member variables that are accessed from multiple threads, particularly m_num_clients and m_async_fns, and (2) the removeClient method can make unnecessary write(m_post_fd, ...) calls before the loop is supposed to exit because it does not check the m_async_fns.empty() condition, and these extra write calls can make the event loop exit early and cause the final write() call to fail. In practice, only the second cause seems to actually trigger this bug, but the PR fixes both possible causes.

Fixes bitcoin/bitcoin#31151
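To make the fix concrete, here is a rough sketch of the removeClient shutdown path after this change, reconstructed from the done() helper and the diff hunks quoted in the review below; it is an illustration, not the literal patch:

m_num_clients -= 1;
if (done(lock)) {               // i.e. m_num_clients == 0 && m_async_fns.empty()
    int post_fd{m_post_fd};     // copy while m_mutex is still held
    Unlock(lock, [&] {
        char buffer = 0;
        KJ_SYSCALL(write(post_fd, &buffer, 1)); // single final wakeup of the loop
    });
}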

…op shutdown races.

The EventLoop shutdown sequence has race conditions that could cause it to shut
down right before a `removeClient` `write(m_post_fd, ...)` call is about to
happen, if threads run in an unexpected order, and cause the write to fail.

Cases where this can happen are described in
bitcoin/bitcoin#31151 (comment) and the
possible causes are that (1) `EventLoop::m_mutex` is not used to protect some
EventLoop member variables that are accessed from multiple threads,
particularly `m_num_clients` and `m_async_fns`, and (2) the `removeClient`
method can make unnecessary `write(m_post_fd, ...)` calls before the loop is
supposed to exit because it does not check the `m_async_fns.empty()`
condition, and these extra write calls can make the event loop exit early
and cause the final `write()` call to fail. In practice, only the second cause
seems to actually trigger this bug, but the PR fixes both possible causes.

Fixes bitcoin/bitcoin#31151

Co-authored-by: Vasil Dimov <[email protected]>
@ryanofsky
Collaborator Author

I've been testing this by running echoipc and stop_node in a loop with this fix and seeing if bitcoin/bitcoin#31151 errors or any others are triggered. I ran with a TSAN build and a normal non-TSAN build and so far haven't seen any errors, so this appears to be working.

@Sjors
Member

Sjors commented Jan 24, 2025

Sjors added a commit to Sjors/bitcoin that referenced this pull request Jan 24, 2025
bitcoin-core/libmultiprocess#129: Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races.

Using rebased commit from: https://github.com/Sjors/libmultiprocess/tree/2025/01/disconnected
Contributor

@vasild vasild left a comment


From the OP:

the possible causes are that (1) EventLoop::m_mutex is not used to protect some EventLoop member variables that are accessed from multiple threads

Which variables?

if (m_post_fn) {
    Unlock(lock, *m_post_fn);
    m_post_fn = nullptr;
    m_cv.notify_all();
Contributor


Previously m_cv.notify_all() would have been called regardless of whether m_post_fn was nullptr or not. Now it will only be called if m_post_fn is not nullptr. Is this intended?

Collaborator Author


re: #129 (comment)

Previously m_cv.notify_all() would have been called regardless of whether m_post_fn was nullptr or not. Now it will only be called if m_post_fn is not nullptr. Is this intended?

Yep, that's intended. The only reason to notify a condition variable is after updating some shared state, like changing a reference count or adding or removing something from a queue. The notify_all() call here is made right after setting m_post_fn, to notify waiting threads that it changed. In the alternate case, no state is changing and there's nothing to notify about.
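For illustration, a generic sketch of the point (not code from this PR): a waiter re-checks a predicate over shared state, so notify_all() only matters immediately after that state changes.

#include <condition_variable>
#include <functional>
#include <mutex>

std::mutex m;
std::condition_variable cv;
std::function<void()> pending; // plays the role of m_post_fn here

void set_pending(std::function<void()> fn)
{
    std::unique_lock<std::mutex> lock(m);
    pending = std::move(fn); // shared state changes...
    cv.notify_all();         // ...so wake waiters to re-check their predicate
}

void wait_for_pending()
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return static_cast<bool>(pending); }); // nothing to do until state changes
}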

src/mp/proxy.cpp Outdated
Comment on lines 270 to 280
bool EventLoop::done(std::unique_lock<std::mutex>& lock)
{
    assert(m_num_clients >= 0);
    return m_num_clients == 0 && m_async_fns.empty();
}
Contributor

@vasild vasild Jan 24, 2025


What is the point of passing lock when it is not going to be used? Looks weird. Is this to show that m_mutex must be locked when calling this? If yes, then engaging the clang annotations would be more clear. Because now lock could be anything, it may not own a mutex or may own any mutex other than m_mutex. Maybe add these:

  bool EventLoop::done(std::unique_lock<std::mutex>& lock)
  {
      assert(m_num_clients >= 0);
+     assert(lock.mutex() == &m_mutex);
+     assert(lock.owns_lock());
      return m_num_clients == 0 && m_async_fns.empty();
  }

Collaborator Author


re: #129 (comment)

Maybe add these:

Thanks, added.

You are right that the code is written in this style to provide similar benefits to clang annotations, so it is a compile error for outside callers to access shared state without the needed lock. A few existing EventLoop methods already take lock parameters, and this new method follows that pattern.

For background, the EventLoop code is basically some of the first code I wrote in bitcoin/bitcoin#10102 and I'm pretty sure it was written before annotations were used much in bitcoin core. It probably would be a very good idea to add clang annotations here too, so that could be a followup, but in the meantime this convention prevents the common bug of forgetting to lock something before accessing shared state. (I think a bug where you would somehow find a different mutex and lock the wrong mutex would be more rare.)
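For readers unfamiliar with the convention, a minimal sketch of the idea (hypothetical class, not code from this repository), with the same asserts that were added in this PR:

#include <cassert>
#include <mutex>

class Counter
{
public:
    void Add(int n)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        AddLocked(lock, n);
    }

private:
    // Callers must pass a lock on m_mutex, so it is a compile error to reach
    // the shared state without at least constructing the right lock object.
    void AddLocked(std::unique_lock<std::mutex>& lock, int n)
    {
        assert(lock.mutex() == &m_mutex);
        assert(lock.owns_lock());
        m_value += n;
    }

    std::mutex m_mutex;
    int m_value{0};
};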

@@ -188,29 +188,27 @@ void EventLoop::loop()

Contributor


(posting in a random place coz github wouldn't allow me to comment on the destructor)

Not related to this PR, just an observation. The destructor is:

EventLoop::~EventLoop()
{
    if (m_async_thread.joinable()) m_async_thread.join();
    std::lock_guard<std::mutex> lock(m_mutex);
    KJ_ASSERT(m_post_fn == nullptr);
    KJ_ASSERT(m_async_fns.empty());
    KJ_ASSERT(m_wait_fd == -1);
    KJ_ASSERT(m_post_fd == -1);
    KJ_ASSERT(m_num_clients == 0);

locking m_mutex shouldn't be necessary because there cannot be two threads destroying the same object concurrently. Or if there are then this is a serious bug elsewhere that will cause a double free bug after the destructor completes. Also, there cannot be one thread destroying the object while another one is calling a method of that object.

Collaborator Author


re: #129 (comment)

locking m_mutex shouldn't be necessary because there cannot be two threads destroying the same object concurrently. Or if there are then this is a serious bug elsewhere that will cause a double free bug after the destructor completes. Also, there cannot be one thread destroying the object while another one is calling a method of that object.

I think that is basically true and the lock is probably unnecessary here.

But there might be scenarios where threadsanitizer will throw errors without a lock, if it sees a variable being written here after being read from another thread with no synchronization in between. I am also not sure if clang thread safety annotations would complain without a lock here.

Also to be pedantic though I think "there cannot be one thread destroying the object while another one is calling a method of that object" is not true. The state in the object is still valid while the destructor method is running, and it should be fine for other threads to access it. The destructor method is not different from other methods in this respect.

Contributor


Clang thread safety is aware of this:

#include <mutex>

class A
{
public:
    A()
    {
        x = 1;
    }

    ~A()
    {
        x = 2;
    }

    void f()
    {
        x = 3;
    }

    std::mutex m;
    int x __attribute__((guarded_by(m)));
};

Generates only a warning for f():

t.cc:194:9: warning: writing variable 'x' requires holding mutex 'm' exclusively [-Wthread-safety-analysis]
  194 |         x = 3;
      |         ^
1 warning generated.

I assume the thread sanitizer is the same even though I couldn't demonstrate this easily.

The state in the object is still valid while the destructor method is running, and it should be fine for other threads to access it.

True, but then the problem arises that after the destructor finishes, without further synchronization, the memory that contains the object is freed. So it would be a serious read-after-free bug if another method is being executed while the destructor is being called because it could happen that the destructor finishes and the memory freed while the other method is still executing. Any mutexes within the object locked inside the destructor will be unlocked when the destructor finishes and before the memory is freed, so they cannot be used for synchronization with the other method.

Collaborator Author

@ryanofsky ryanofsky Jan 27, 2025


re: #129 (comment)

Clang thread safety is aware of this:

Thanks, this is interesting. I'm a little unsure how clang knows that no mutex is required to access the variable in the destructor. It doesn't seem like a totally safe assumption to make because I could easily think of cases where locking the mutex in a destructor would be required, and then clang wouldn't diagnose any problem. But maybe it is just assuming that typical destructors aren't blocking or doing anything complicated, and the mutex is about to be destroyed anyway, and that is enough proof of ownership.

I assume the thread sanitizer is the same even though I couldn't demonstrate this easily.

This could also be the case, but I wouldn't assume it. Thread sanitizer (as far as I know) does not treat destructors differently than other methods, and is operating at a lower level, just looking at what mutexes are held during reads and writes and what synchronization events happen between reads and writes.

It's definitely possible you are right and dropping locks from this code is perfectly ok. So just for the sake of this PR I don't want to expand it by dropping an already existing lock, but, that could be a good change to make in a followup.

The state in the object is still valid while the destructor method is running, and it should be fine for other threads to access it.

True, but then the problem arises that after the destructor finishes, without further synchronization, the memory that contains the object is freed. So it would be a serious read-after-free bug if another method is being executed while the destructor is being called because it could happen that the destructor finishes and the memory freed while the other method is still executing. Any mutexes within the object locked inside the destructor will be unlocked when the destructor finishes and before the memory is freed, so they cannot be used for synchronization with the other method.

This is all true but it is ok for a destructor to block and wait for other events and other threads. And as long as it is waiting, it is ok for other threads to access the object and call its methods.

@@ -188,29 +188,27 @@ void EventLoop::loop()

kj::Own<kj::AsyncIoStream> wait_stream{
    m_io_context.lowLevelProvider->wrapSocketFd(m_wait_fd, kj::LowLevelAsyncIoProvider::TAKE_OWNERSHIP)};
int post_fd{m_post_fd};
Contributor


Why make a copy of m_post_fd? This code here is the only one that can change it. That is, for sure, at the end where close(post_fd) is called, post_fd will be equal to m_post_fd, right?

Collaborator Author


re: #129 (comment)

Why make a copy of m_post_fd? This code here is the only one that can change it. That is, for sure, at the end where close(post_fd) is called, post_fd will be equal to m_post_fd, right?

This is just saving a copy of the variable while the lock is held. There are three methods (loop, post, removeClient) where m_post_fd is saved to a temporary variable before the lock is released and the descriptor is used.

As you say, this should never be necessary because the code that changes this variable can only run after the code that reads it, but at least when I started debugging this issue, thread sanitizer complained that this variable was being read and written to without synchronization, so I stopped accessing it without a lock to prevent this.

Going forward it might also be clearer to access m_post_fd with locks, and this might make it easier to add clang annotations too (not sure).

I wouldn't object to a followup accessing m_post_fd directly without locks, but I would want to make sure this is compatible with the tools we use (tsan, thread safety annotations) given their limitations.
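For reference, the pattern being discussed looks roughly like this (free-standing sketch with made-up names, not the EventLoop code): copy the descriptor under the mutex, then drop the lock before the potentially blocking write().

#include <mutex>
#include <unistd.h>

std::mutex g_mutex;
int g_post_fd = -1; // written by the loop thread, read by posting threads

void wake_loop()
{
    int post_fd;
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        post_fd = g_post_fd; // read the shared variable only while the lock is held
    }
    char buffer = 0;
    (void)write(post_fd, &buffer, 1); // blocking syscall made without holding the lock
}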

Contributor


This is just saving a copy of the variable while the lock is held.

Hmm, but the lock is not held when making the copy at line 191?

https://github.com/chaincodelabs/libmultiprocess/blob/0e4f88d3f9c66c678cd56154e11827d5c9ec9f9e/src/mp/proxy.cpp#L191-L214

191 int post_fd{m_post_fd};
...
213 KJ_SYSCALL(::close(post_fd));
214 std::unique_lock<std::mutex> lock(m_mutex);

Collaborator Author


re: #129 (comment)

Hmm, but the lock is not held when making the copy at line 191?

Oh, that is a good point, I didn't notice that. For consistency it would be good to just acquire the lock wherever the variable is used, even when it is not needed, or have some other more consistent rule. I think a good followup would be to add clang annotations, add the lock everywhere the annotations say it is required, and remove it in places where it isn't necessary and the annotations don't say it is required. This would add an unnecessary lock here as you are suggesting, and remove an unnecessary lock in the destructor as you have also suggested.

Contributor


Would it be possible to use std::atomic_int for m_post_fd? If it does not have to be in sync with other variables...

Collaborator Author


re: #129 (comment)

Would it be possible to use std::atomic_int for m_post_fd? If it does not have to be in sync with other variables...

Definitely possible. I think it wouldn't make a performance difference because in practice whenever m_post_fd is used some state update is being posted, and a lock is needed anyway to update the shared state. But there could be other reasons for preferring an atomic, like style. It's very possible that writing this code in a different style could make it clearer. I think at the very least it should be using clang thread safety annotations, which would make the intent more clear.

Contributor


Yeah, I think std::atomic wouldn't make a difference wrt performance but would make the code more clear.
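A sketch of what that alternative could look like (hypothetical, not what this PR does): the descriptor itself no longer needs m_mutex, though anything that must stay consistent with other shared state still would.

#include <atomic>
#include <unistd.h>

std::atomic_int g_post_fd{-1}; // safe to read and write from any thread without a mutex

void wake_loop()
{
    char buffer = 0;
    (void)write(g_post_fd.load(), &buffer, 1); // no lock needed just to read the fd
}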

Comment on lines +223 to +230
+ int post_fd{m_post_fd};
  Unlock(lock, [&] {
      char buffer = 0;
-     KJ_SYSCALL(write(m_post_fd, &buffer, 1));
+     KJ_SYSCALL(write(post_fd, &buffer, 1));
Contributor


Am I right that the problem with this code was that the loop() method may finish just before the write() call and set m_post_fd to -1, and this may end up calling write(-1, ...)? If yes, then this change is not sufficient - if the same happens, then it will call write(n, ...) where n is not -1 but is some number, a stale file descriptor, that has already been closed. Even worse, n may represent a newly opened file if the file descriptor numbers are reused by the OS.

Collaborator Author


re: #129 (comment)

Am I right that the problem with this code was [...]

This was actually not the problem. The problem wasn't that the code was trying to write(-1, ...), although that could have been a possible outcome of the bug. The problem was not even that it was trying to write to a closed file descriptor, although that could have been another possible outcome of the bug. Instead, the actual problem, whenever it happened, was that the write() call failed because it was writing to a pipe that was closed on the reading side.

The cause of all these problems (theoretical and actual) was that before this PR removeClient was only checking the m_num_clients == 0 condition to see if it should write to the pipe, instead of checking the m_num_clients == 0 && m_async_fns.empty() condition like it does after this PR. So previously, it could write to the pipe multiple times before the loop was actually supposed to exit, and depending on thread scheduling this could occasionally cause the loop to exit slightly too early, immediately before the final write() call was made, causing the write() call to fail.

Contributor


Ok, so there is another problem in master which is not addressed in this PR:

  1. EventLoop::post() is executing in thread T1
  2. EventLoop::loop() is executing in thread T2
  3. Let's say m_post_fd is 5
  4. post() / T1 makes a copy of m_post_fd, post_fd = 5 and unlocks m_mutex
  5. loop() / T2 does close(5) and assigns -1 to m_post_fd
  6. post() / T1 calls write(post_fd, ...) with post_fd being 5. At this point 5 does not refer to an opened file so the call fails, or worse, another file has been opened by the process and the number 5 has been reused for it and now 5 refers to an opened file which is different than the file associated with 5 when the copy post_fd was made (in 4. above)

Right? Or do I miss some high level context here (I am new to the internals of this library)?

Collaborator Author


re: #129 (comment)

Right? Or do I miss some high level context here (I am new to the internals of this library)?

Yes, the high level context that prevents what you described from happening is the addClient/removeClient reference counting. If you call post() without calling addClient(), the race condition you described will be present, but that is a higher level bug. The EventLoop code assumes that addClient and removeClient are being used whenever post() is used.

The bug being fixed in this PR is a lower-level bug that only happens during shutdown, when the reference count is 0 and, at the very last second, right as the very last removeClient call in the program is about to call write(m_post_fd), the event loop wakes up because of previous spurious writes and closes the read end of the pipe (before closing m_post_fd) as well.

The cause of the spurious writes is that previously removeClient was only checking the m_num_clients == 0 condition so it was doing spurious write(m_post_fd) calls which could cause the event loop to wake up too early and exit slightly before it was supposed to exit. Now removeClient is checking m_num_clients == 0 && m_async_fns.empty() so it will not do spurious writes.
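To illustrate the higher-level contract described here, a self-contained toy (hypothetical class, not the EventLoop API): posting work is only legal while the caller holds a client reference, and exit is only requested once the last reference is gone and no work remains.

#include <cassert>
#include <functional>
#include <mutex>
#include <queue>

class MiniLoop
{
public:
    void addClient() { std::lock_guard<std::mutex> lock(m_mutex); ++m_clients; }

    void removeClient()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        --m_clients;
        // Analogous to the fixed check: only request exit when nothing is left to run.
        if (m_clients == 0 && m_work.empty()) m_exit_requested = true;
    }

    void post(std::function<void()> fn)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        assert(m_clients > 0); // callers must hold a reference while posting
        m_work.push(std::move(fn));
    }

private:
    std::mutex m_mutex;
    int m_clients{0};
    bool m_exit_requested{false};
    std::queue<std::function<void()>> m_work;
};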

Contributor


Ok, let's conclude this. I have another question. Posting here because it is related:

This pattern, used in post() and removeClient():

  1. lock mutex that protects m_post_fd
  2. make a local copy of m_post_fd in post_fd
  3. unlock
  4. use the local copy: write(post_fd)

looks odd to me. Assuming the usual semantics that the mutex is protecting the variable - it is not allowed to change while holding the mutex and can change when not holding it. This means that at the time of 4. m_post_fd could have changed after the unlock in 3. Above you said that higher level logic implies that loop(), which is the only one that changes m_post_fd, will not finish while post() is executing. Ok, but then that means m_post_fd does not need any protection. Is this just trying to silence some false alarm from the thread sanitizer?

Collaborator Author

@ryanofsky ryanofsky Jan 28, 2025


re: #129 (comment)

Is this just trying to silence some false alarm by the thread sanitizer?

Not exactly. I think you keep basically asking the question again and again "Why is m_post_fd accessed with a lock? It doesn't need to be locked." And sorry I haven't given a more direct answer. It just has to do with behavior of thread sanitizer with earlier versions of this code. When I first enabled thread sanitizer to debug this issue, I fixed immediate tsan errors that happened running the code by restructuring the loop in the way it is currently written, but without the post_fd variable. But then when I ran the test in a loop, after around 20-30 minutes eventually there were tsan errors complaining both about the m_post_fd variable and the underlying file descriptor having conflicting read/write accesses from multiple threads without synchronization. So I added the int post_fd = m_post_fd variables, and that fixed the errors about the m_post_fd variable, but not the errors about the underlying file descriptor. Eventually I figured out the underlying problem which was spurious writes in removeClient and fixed that issue too, but I did not go back and remove the post_fd = m_post_fd variables.

I did not feel any need to remove these variables because I was thinking of m_post_fd, m_num_clients, and m_async_fns as a unit and wanted to follow a simple rule of using a single lock to protect them. You're completely right though that a lock is probably not needed to access m_post_fd, although I still think clang thread annotations might complain if it is accessed without a lock outside of the destructor. I am not 100% sure that tsan would not complain if I removed the post_fd variables, but I am 99% sure it would be ok, so that change could definitely be a followup.

Thanks for asking this and feel free to ask about anything else that isn't clear or could be improved.

Collaborator Author

@ryanofsky ryanofsky left a comment


Thanks for the detailed review!

Updated 5a6c9e0 -> 0e4f88d (pr/disconnected.1 -> pr/disconnected.2, compare) adding suggested asserts and updating some comments.


re: #129 (review)

From the OP:

the possible causes are that (1) EventLoop::m_mutex is not used to protect some EventLoop member variables that are accessed from multiple threads

Which variables?

Updated the description with this information, but checking the previous version of the code, it looks like the variables accessed from multiple threads and used without a lock were m_num_clients, m_async_fns, and m_post_fd.

Also, to be clear, the intermittent failure that's being fixed here was not caused by lack of locking as far as I can tell. The failure was caused by a race between the startAsyncThread and removeClient calls at the end of the ~Connection destructor as described in bitcoin/bitcoin#31151 (comment)


Sjors added a commit to Sjors/bitcoin that referenced this pull request Jan 27, 2025
bitcoin-core/libmultiprocess#129: Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races.

Using rebased commit from: https://github.com/Sjors/libmultiprocess/tree/2025/01/disconnected
Collaborator Author

@ryanofsky ryanofsky left a comment


Thanks for reviewing and keeping discussion going on this. The bug is pretty complicated and hard to explain, but I want to try to make everything as clear as possible so the change is understood.

I am probably about to merge this PR shortly, because the bug is not straightforward to reproduce and I am pretty confident this change fixes it, and 100% confident this change improves locking in general and fixes a bunch of immediate -fsanitize=thread bugs. I also want to bump the libmultiprocess version in bitcoin core to be able to support building libmultiprocess as a git subtree, and would like this fix to be included to avoid any potential issues in CI in that PR and other multiprocess bitcoin core PRs.

I would like to continue to improve and discuss this code here or in a followup PR that adds clang thread safety annotations, which could be another place to make related improvements.


@Sjors
Member

Sjors commented Jan 27, 2025

If you do need to make further changes here, can you rebase it? That makes it easier to update bitcoin/bitcoin#30975.

@ryanofsky ryanofsky merged commit 72f6669 into bitcoin-core:master Jan 27, 2025
ryanofsky added a commit that referenced this pull request Jan 27, 2025
063ff18 fix startAsyncThread comment (Ryan Ofsky)

Pull request description:

  While implementing #129 I noticed the startAsyncThread comment wasn't totally accurate, so this should fix that.

Top commit has no ACKs.

Tree-SHA512: b3ff8c53752c65c3d58af8dddd13bc718d18c230adb0b40c674b381e47629ed0780650e6bb017a3238093c4e12a7b5b02244ad90871b005a9a562330b5ea4ba7
Sjors added a commit to Sjors/bitcoin that referenced this pull request Jan 27, 2025
bitcoin-core/libmultiprocess#121: ProxyClientBase: avoid static_cast to partially constructed object
bitcoin-core/libmultiprocess#120: proxy-types.h: add static_assert to detect int/enum size mismatch
bitcoin-core/libmultiprocess#127: ProxyClientBase: avoid static_cast to partially destructed object
bitcoin-core/libmultiprocess#129: Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races.
bitcoin-core/libmultiprocess#130: refactor: Add CleanupRun function to dedup clean list code
Sjors added a commit to Sjors/bitcoin that referenced this pull request Jan 27, 2025
bitcoin-core/libmultiprocess#121: ProxyClientBase: avoid static_cast to partially constructed object
bitcoin-core/libmultiprocess#120: proxy-types.h: add static_assert to detect int/enum size mismatch
bitcoin-core/libmultiprocess#127: ProxyClientBase: avoid static_cast to partially destructed object
bitcoin-core/libmultiprocess#129: Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races.
bitcoin-core/libmultiprocess#130: refactor: Add CleanupRun function to dedup clean list code
bitcoin-core/libmultiprocess#131: doc: fix startAsyncThread comment
bitcoin-core/libmultiprocess#133: Fix debian "libatomic not found" error in downstream builds
@vasild
Contributor

vasild commented Jan 27, 2025

Post merge ACK, the change looks good and the discussions above drifted to things that are out of the scope of this PR.

ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 27, 2025
This should be the final update to the libmultiprocess package via the depends
system. It brings in the libmultiprocess cmake changes from
bitcoin-core/libmultiprocess#136 needed to support
building as subtree. After this, a followup PR will add libmultiprocess as a
git subtree and depends will just use the git subtree instead of hardcoding its
own version hash.

Since there have been libmultiprocess API changes since the last update, this
commit also updates bitcoin code to be compatible with them.

This update brings in the following changes:

bitcoin-core/libmultiprocess#121 ProxyClientBase: avoid static_cast to partially constructed object
bitcoin-core/libmultiprocess#120 proxy-types.h: add static_assert to detect int/enum size mismatch
bitcoin-core/libmultiprocess#127 ProxyClientBase: avoid static_cast to partially destructed object
bitcoin-core/libmultiprocess#129 Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races.
bitcoin-core/libmultiprocess#130 refactor: Add CleanupRun function to dedup clean list code
bitcoin-core/libmultiprocess#131 doc: fix startAsyncThread comment
bitcoin-core/libmultiprocess#133 Fix debian "libatomic not found" error in downstream builds
bitcoin-core/libmultiprocess#94 c++ 20 cleanups
bitcoin-core/libmultiprocess#135 refactor: proxy-types.h API cleanup
bitcoin-core/libmultiprocess#136 cmake: Support being included with add_subdirectory
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 27, 2025
This should be the final update to the libmultiprocess package via the depends
system. It brings in the libmultiprocess cmake changes from
bitcoin-core/libmultiprocess#136 needed to support
building as subtree. After this, a followup PR will add libmultiprocess as a
git subtree and depends will just use the git subtree instead of hardcoding its
own version hash.

Since there have been libmultiprocess API changes since the last update, this
commit also updates bitcoin code to be compatible with them.

This update brings in the following changes:

bitcoin-core/libmultiprocess#121 ProxyClientBase: avoid static_cast to partially constructed object
bitcoin-core/libmultiprocess#120 proxy-types.h: add static_assert to detect int/enum size mismatch
bitcoin-core/libmultiprocess#127 ProxyClientBase: avoid static_cast to partially destructed object
bitcoin-core/libmultiprocess#129 Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races.
bitcoin-core/libmultiprocess#130 refactor: Add CleanupRun function to dedup clean list code
bitcoin-core/libmultiprocess#131 doc: fix startAsyncThread comment
bitcoin-core/libmultiprocess#133 Fix debian "libatomic not found" error in downstream builds
bitcoin-core/libmultiprocess#94 c++ 20 cleanups
bitcoin-core/libmultiprocess#135 refactor: proxy-types.h API cleanup
bitcoin-core/libmultiprocess#136 cmake: Support being included with add_subdirectory
bitcoin-core/libmultiprocess#137 doc: Fix broken markdown links
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request Jan 28, 2025
fanquake added a commit to bitcoin/bitcoin that referenced this pull request Jan 29, 2025
…ng to subtree

4e0aa18 test: Add test for IPC serialization bug (Ryan Ofsky)
2221c88 depends: Update libmultiprocess library before converting to subtree (Ryan Ofsky)

Pull request description:

  This should be the final update to the libmultiprocess package via the depends system. It brings in the libmultiprocess cmake changes from bitcoin-core/libmultiprocess#136 needed to support building as subtree. After this, followup PR #31741 will add libmultiprocess as a git subtree and depends will just use the git subtree instead of hardcoding its own version hash.

  Since there have been libmultiprocess API changes since the last update, this commit also updates bitcoin code to be compatible with them.

  This update has the following new changes since previous update #31105:

  bitcoin-core/libmultiprocess#121 ProxyClientBase: avoid static_cast to partially constructed object
  bitcoin-core/libmultiprocess#120 proxy-types.h: add static_assert to detect int/enum size mismatch
  bitcoin-core/libmultiprocess#127 ProxyClientBase: avoid static_cast to partially destructed object
  bitcoin-core/libmultiprocess#129 Fix "disconnected: write(m_post_fd, &buffer, 1): Broken pipe" EventLoop shutdown races.
  bitcoin-core/libmultiprocess#130 refactor: Add CleanupRun function to dedup clean list code
  bitcoin-core/libmultiprocess#131 doc: fix startAsyncThread comment
  bitcoin-core/libmultiprocess#133 Fix debian "libatomic not found" error in downstream builds
  bitcoin-core/libmultiprocess#94 c++ 20 cleanups
  bitcoin-core/libmultiprocess#135 refactor: proxy-types.h API cleanup
  bitcoin-core/libmultiprocess#136 cmake: Support being included with add_subdirectory
  bitcoin-core/libmultiprocess#137 doc: Fix broken markdown links

ACKs for top commit:
  Sjors:
    ACK 4e0aa18
  vasild:
    ACK 4e0aa18
  TheCharlatan:
    ACK 4e0aa18

Tree-SHA512: 6d81cdf7f44762c7f476212295f6224054fd0a61315bb54786bc7758a2b33e5a2fce925c71e36f7bda320049aa14e7218a458ceb03dacbb869632c466c4789b0
ryanofsky added a commit to ryanofsky/libmultiprocess that referenced this pull request Feb 10, 2025
Add basic thread safety annotations to EventLoop. Use can be expanded and deepened, but this puts basic functionality in place.

Use of annotations was discussed in
bitcoin-core#129 (comment)
@ryanofsky
Collaborator Author

To follow up on all the discussion about clang annotations above, some basic ones are now added in #160
