Conversation

@kpamnany
Member

A Julia thread runs Julia's scheduler in the context of the switching task. If no task is found to switch to, the thread will sleep while holding onto the (possibly completed) task, preventing the task from being garbage collected. This recent [Discourse post](https://discourse.julialang.org/t/weird-behaviour-of-gc-with-multithreaded-array-access/125433) illustrates precisely this problem.
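
For illustration, a minimal sketch of the failure mode (an assumed scenario modeled on the linked Discourse post, not its exact code):

```julia
using Base.Threads

function demo()
    # Each task's result is a large array, kept alive through the task object.
    for _ in 1:nthreads()
        t = @spawn zeros(UInt8, 100 * 1024 * 1024)  # task result: ~100 MB
        wait(t)
    end
    GC.gc()
    # Resident memory can remain high here: an idle thread sleeps while still
    # holding a reference to the last (completed) task it ran, so that task's
    # result cannot be collected.
end

demo()
```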

A solution to this would be for an idle Julia thread to switch to a "scheduler" task, thereby freeing the old task.

For every Julia-started (non-GC) thread other than thread 1, the root task essentially ends immediately -- we call `jl_finish_task` at the end of `jl_threadfun`.

This PR uses root tasks (on all but thread 1) as scheduler tasks. This solves the problem for all but thread 1. We could do the same for thread 1 also, but it would require special-casing as we cannot use thread 1's root task for this purpose.

@kpamnany kpamnany requested review from gbaraldi and vtjnash February 19, 2025 15:36
@JeffBezanson
Member

We could do the same for thread 1 also, but it would require special-casing as we cannot use thread 1's root task for this purpose.

Why is this?

@gbaraldi
Member

Thread 1's root task actually does stuff like run the REPL/toplevel code, so it's not a dead task.

@kpamnany
Member Author

I considered creating a new task in `julia_init` and making that the root task, but the thread calling `julia_init` also needs to be wrapped in a task. Alternatively, we could add a `sched_task` to the TLS alongside `root_task` and use that uniformly on all threads instead. I chose the simplest option, but am open to any suggestions for improvement.

@vchuravy
Member

Could we make the root task on thread 1 do nothing but launch a task that does the work it used to do, so that it becomes the scheduler task?

@kpamnany
Member Author

kpamnany commented Feb 21, 2025

Could we make the root task on thread 1 do nothing but launch a task that does the work it used to do, so that it becomes the scheduler task?

We can. It gets complicated though, because we have to return from `julia_init`, so we need the caller's stack, which is the standard process stack (i.e. larger than our typical task stack). Also, if we make the root task on thread 1 the scheduler task, then what's the task wrapping the `julia_init` caller? Do we root that somewhere?

Am I understanding your suggestion correctly?

@kpamnany
Member Author

@vtjnash suggested a couple of alternative ways to do this. Both have pros and cons, so I'd like to invite discussion.

With the first alternative, the scheduler uses a new field in `ptls` to instruct the task-switching code (`jl_switch`) not to complete the switch to the root task, but to get another task from the scheduler and switch to that instead.

Pros:

  • Uniform behavior for all threads (as opposed to this PR, which excludes thread 1).

Cons:

  • Complicated.
  • Additional coupling between the scheduler and the task-switching code.

The second alternative uses a `OncePerThread{Task}` instead of the root task, and moves the logic into Julia code.

Pros:

  • Uniform behavior for all threads.
  • Adding new code in Julia rather than C.

Cons:

  • The task switch to the "scheduler" task happens almost immediately rather than after the thread sleep interval, since that interval is handled in C. This introduces additional latency due to the extra task switch.

I don't much like the first alternative -- it's tricky, hard to understand and explain, and it complicates the scheduler interface, which will make it harder to switch schedulers.

I like the second alternative, but feel that the "early" switch to the scheduler task is problematic; this switch should happen only after the thread sleep interval. IMO the second alternative would be best once the thread sleep logic moves to Julia.

So I feel the current PR is still the right way to do this, but would like to hear other folks' opinions.

@kpamnany
Member Author

kpamnany commented Mar 5, 2025

Closing this in favor of #57544:

  • It is simpler.
  • It is uniform across all threads (it does not exclude thread 1, as this PR does).
  • It is pure Julia -- no added C.

@kpamnany kpamnany closed this Mar 5, 2025
@kpamnany kpamnany deleted the kp-sched-task branch March 25, 2025 21:08
kpamnany added a commit that referenced this pull request Mar 27, 2025
A Julia thread runs Julia's scheduler in the context of the switching
task. If no task is found to switch to, the thread will sleep while
holding onto the (possibly completed) task, preventing the task from
being garbage collected. This recent [Discourse
post](https://discourse.julialang.org/t/weird-behaviour-of-gc-with-multithreaded-array-access/125433)
illustrates precisely this problem.

A solution to this would be for an idle Julia thread to switch to a
"scheduler" task, thereby freeing the old task.

This PR uses `OncePerThread` to create a "scheduler" task (that does
nothing but run `wait()` in a loop) and switches to that task when the
thread finds itself idle.

Other approaches considered and discarded in favor of this one:
#57465 and
#57543.
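
For concreteness, here is a hedged sketch of the pattern the commit message above describes -- a per-thread task whose only job is to run the scheduler loop. This is an illustration under assumptions (the name `sched_task` and the `yieldto` wiring are mine, not the PR's actual code), and it requires a Julia version that provides `Base.OncePerThread`:

```julia
# One lazily created scheduler task per thread: its only job is to run the
# scheduler loop, so an idle thread can park in it instead of sleeping while
# still referencing the last user task it ran.
const sched_task = Base.OncePerThread{Task}() do
    Task() do
        while true
            wait()  # run the scheduler; sleeps the thread when nothing is runnable
        end
    end
end
```

An idle thread would then switch to `sched_task()` (e.g. via `yieldto`) rather than sleeping in its current task, so the previous task becomes unreachable and can be garbage collected.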
KristofferC pushed a commit that referenced this pull request Apr 16, 2025

(cherry picked from commit 0d4d6d9)