RFC 2: Add Fiber::ExecutionContext::MultiThreaded #15517


Conversation

@ysbaddaden ysbaddaden commented Feb 25, 2025

Introduces the LAST EC scheduler, one that runs in multiple threads, with work stealing, so any thread can resume any runnable fiber in the context (no more starving threads).

Unlike the ST scheduler, the MT scheduler needs to actively park threads since only one thread in the context can run the event loop (no parallel runs).

Having a single event loop for the whole context, instead of one per thread, avoids situations where fibers wait in an event loop but aren't processed because that particular thread happens to be busy, causing delays. With a single event loop, as soon as a thread is starving it can check the event loop and enqueue the runnable fibers, which can be resumed (and stolen) immediately.
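For illustration, usage could look roughly like this. This is a hypothetical sketch: the exact constructor signature, arguments, and the compile-time flags needed to enable execution contexts are assumptions and may differ from what gets merged.

```crystal
# Hypothetical usage sketch: the constructor signature and required preview
# flags are assumptions, not necessarily what ends up merged.
mt = Fiber::ExecutionContext::MultiThreaded.new("workers", 4)

done = Channel(Int32).new

8.times do |i|
  mt.spawn do
    # Runs on one of the context's threads; if that thread is busy, the
    # fiber can be stolen and resumed by another thread of the same context.
    done.send(i * i)
  end
end

8.times { puts done.receive }
```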

NOTE: we can start running the specs in this context, though they sometimes segfault. Maybe because some spec helpers used to assume that fibers wouldn't switch, or maybe because of issues in the stdlib for the same reason (libxml, for example).

Kept in draft until #15511 and #15513 are merged.

refs #15342

Introduces the second EC scheduler that runs in multiple threads. Uses
the thread-safe queues (Runnables, GlobalQueue).

Contrary to the ST scheduler, the MT scheduler needs to actively park
the thread in addition to waiting on the event loop, because only one
thread is allowed to run the event loop.
@ysbaddaden ysbaddaden force-pushed the feature/execution-context-multithreaded branch from f5e466e to 7bcff33 Compare March 25, 2025 09:23
@ysbaddaden ysbaddaden marked this pull request as ready for review March 25, 2025 09:24
Comment on lines 89 to 93
```crystal
private def start_schedulers(hijack)
  @size.end.times do |index|
    @schedulers << Scheduler.new(self, "#{@name}-#{index}")
  end
end
```
Member

question: Why do we initialize all schedulers from the start? We only start the minimal number of threads, and we might never actually need all the schedulers if the number of threads never reaches the maximum.
Could we initialize them lazily?

Contributor Author

To avoid parallelism issues? For example in relation to stealing: we can expect the schedulers to exist, whether or not they're used. Maybe we could initialize lazily; that will need some experimentation.

@straight-shoota straight-shoota (Member) Mar 28, 2025

Steal is bounded by @schedulers.size and the array should never shrink, so accessing it should be fine. The worst that can happen is missing out on stealing from a scheduler that was added just after we took the size, but that should not be an issue.

Either way, we should document why the code works like it does so we can be reminded the next time we look at it.
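To illustrate the point, here is a toy sketch of a steal loop that snapshots the scheduler count before iterating. The names and types are made up for the example; this is not the scheduler code from the PR.

```crystal
# Toy illustration only: the array of schedulers is append-only, so taking a
# snapshot of its size and only indexing below it is safe; a scheduler
# appended after the snapshot is simply skipped for this round.
class SchedulerStub
  getter queue = Deque(String).new
end

def steal_once?(schedulers : Array(SchedulerStub), myself : SchedulerStub) : String?
  count = schedulers.size           # snapshot; the array never shrinks
  offset = rand(count)
  count.times do |i|
    victim = schedulers[(offset + i) % count]
    next if victim.same?(myself)
    if fiber = victim.queue.shift?  # stand-in for the real steal operation
      return fiber
    end
  end
  nil
end
```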

Contributor Author

We'd only be mutating the array during #initialize and, after that, inside a mutex, so it's safe.

But we're accessing @schedulers outside of a mutex when stealing (for obvious reasons), so there's no telling how the compiler or a weakly-ordered CPU like ARM might reorder the writes when pushing to the array (the size could be updated before the value is written). For example:

  • T1: mutex.lock (ensures following writes can't happen before)
  • T1: schedulers.@size += 1 (happens first)
  • T2: execution_context.steal
  • T2: schedulers.last => 💥
  • T1: schedulers.@buffer[size] = scheduler (too late)
  • T1: mutex.unlock (ensures above writes did happen)

Yes, I'm paranoid, but if something can happen, it will happen. The above shouldn't happen on x86 unless the LLVM optimizers decide to reorder, but it will happen on ARM.

To be safe, we'd need an array-like object that uses an atomic or fence to update the size after the value has been set. It could be used for both @schedulers and @threads.
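For illustration, a minimal sketch of such an array-like object (hypothetical, not what this PR implements): the element is written before the size is published with release semantics, so a reader that loads the size with acquire semantics never sees an uninitialized slot.

```crystal
# Hypothetical sketch of an append-only, fixed-capacity list whose size is
# published with release/acquire ordering, so readers can iterate without
# taking the mutex. Not the actual implementation.
class AppendOnlyList(T)
  def initialize(@capacity : Int32)
    @buffer = Pointer(T).malloc(@capacity)
    @size = Atomic(Int32).new(0)
  end

  # Writers are expected to be serialized by an external mutex.
  def push(value : T) : Nil
    size = @size.get(:relaxed)
    raise IndexError.new("capacity exceeded") if size >= @capacity
    @buffer[size] = value          # write the slot first...
    @size.set(size + 1, :release)  # ...then publish the new size
  end

  # Safe to call without the mutex (e.g. from a stealing scheduler).
  def each(& : T ->) : Nil
    size = @size.get(:acquire)
    size.times { |i| yield @buffer[i] }
  end
end
```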

Contributor Author

Note: @threads isn't affected because calls to @threads.size are purely informational; we don't use it to iterate over or access a thread, and when we do that, we always hold the mutex.

@ysbaddaden ysbaddaden (Contributor Author)

I sanitized the range handling, fixed the exclusive range/capacity handling, and wrote a few specs to assert the behavior. I also dropped a number of unused args.

@ysbaddaden ysbaddaden (Contributor Author)

Now the fixed range normalization and capacity checks should be correct 🤞
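For context, the kind of normalization in question could look roughly like this (hypothetical standalone sketch; names, signature, and error messages are illustrative, not the PR's actual code):

```crystal
# Hypothetical illustration: turn a size given as an Int or a Range
# (possibly exclusive) into a {minimum, maximum} pair and validate it.
def normalize_size(size : Int32 | Range(Int32, Int32)) : {Int32, Int32}
  case size
  in Int32
    minimum, maximum = 1, size
  in Range(Int32, Int32)
    minimum = size.begin
    maximum = size.exclusive? ? size.end - 1 : size.end
  end
  raise ArgumentError.new("needs at least one scheduler") if maximum < 1
  raise ArgumentError.new("invalid range") if minimum > maximum
  {minimum, maximum}
end

normalize_size(4)     # => {1, 4}
normalize_size(2..8)  # => {2, 8}
normalize_size(2...8) # => {2, 7}
```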

@straight-shoota straight-shoota added this to the 1.16.0 milestone Mar 31, 2025
@ysbaddaden ysbaddaden moved this from Review to Approved in Multi-threading Mar 31, 2025
@straight-shoota straight-shoota merged commit b83439c into crystal-lang:master Apr 3, 2025
35 checks passed
@github-project-automation github-project-automation bot moved this from Approved to Done in Multi-threading Apr 3, 2025
@ysbaddaden ysbaddaden deleted the feature/execution-context-multithreaded branch April 3, 2025 10:37