
Conversation

kboyarinov
Contributor

Description

Add an RFC for new functions that allow waiting for individual tasks in task_group:

namespace oneapi {
namespace tbb {
enum class task_status {
   not_complete,
   complete,
   canceled
};

class task_group {
    task_status wait_for(task_completion_handle&);
    task_status run_and_wait_for(task_handle&&);
};

class task_arena {
   task_status wait_for(task_completion_handle&);
};

} // namespace tbb
} // namespace oneapi
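
For illustration, here is a minimal usage sketch of the proposed API (hedged: it relies on the experimental ``task_completion_handle`` from the dynamic task dependencies proposal and on the ``task_status`` semantics proposed above; exact spellings may change):

```cpp
#include <oneapi/tbb/task_group.h>

void example() {
    tbb::task_group tg;

    tbb::task_handle t = tg.defer([] { /* single task body */ });
    tbb::task_completion_handle ch = t;   // track completion of this particular task
    tg.run(std::move(t));

    // Wait only for the tracked task, not for the whole group
    tbb::task_status status = tg.wait_for(ch);
    if (status == tbb::task_status::complete) {
        // the tracked task (or the task its completion was transferred to) has completed
    }

    tg.wait(); // waiting for the rest of the group is still required
}
```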

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information


This solution does not restrict the thread that is notified by another thread of the task completion from entering the bypass loop of the currently
Contributor

This line is hard to follow.

Contributor Author

Rephrased and added more details.


As an initial step, it makes sense to isolate the waiting thread so that it only executes tasks related to the same ``task_group`` as the awaited task - similar to the improvement described in the [RFC for another overload of ``task_arena::wait_for``](../task_arena_waiting/task_group_interop.md).

## Exit Criteria and Open Questions
Contributor

Is the intention to go through an experimental phase, or straight to production?

Contributor Author

Initially to experimental, since the proposed API depends on the experimental task_completion_handle.

Since the new waiting functions track the progress of a single task, returning a ``task_group_status`` may be misleading.
If the group execution is cancelled, the tracked task may still execute, and returning ``canceled`` does not accurately reflect the task's
completion status.
If execution is not cancelled, the function would need to track whether other tasks remain in the group and return ``not_complete`` if any are still pending.
Contributor

Will transfer of a task_completion lead to any confusion around cancellation if, for example, the original task has executed but the task that the completion was transferred to is canceled? The user will see canceled as a status even though the initial task did complete. I think it's fine and will just need to be well documented in transfer.

Contributor Author

I think that if the completion of the task was transferred to another task, the task status after the wait is also transferred. I agree that it should be clearly documented. Added explicit mention in the RFC.
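
For example, a hedged sketch of the transfer case (assuming ``tg`` is a ``task_group`` and using ``transfer_this_task_completion_to`` from the dynamic dependencies proposal):

```cpp
tbb::task_handle first = tg.defer([&tg] {
    tbb::task_handle continuation = tg.defer([] { /* remaining work */ });
    // After this call, waiters on `first` are released only when `continuation` completes
    tbb::task_group::transfer_this_task_completion_to(continuation);
    tg.run(std::move(continuation));
});

tbb::task_completion_handle ch = first;
tg.run(std::move(first));

// wait_for reports the status of the task that finally owns the completion:
// if the group is canceled before `continuation` runs, the result is
// task_status::canceled even though `first` itself was executed
tbb::task_status status = tg.wait_for(ch);
```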

Comment on lines +117 to +118
To address this, a new enum ``task_status`` is proposed to track the status of the awaited task. ``task_status::complete`` indicates that the tracked
task was executed and completed, while ``task_status::canceled`` signifies that the task was not executed due to group cancellation.
Contributor

I would consider extending the existing enum with one more value, named e.g. task_complete, which would be returned by the single-task waiting functions instead of complete.

Contributor Author

When other waiting functions return task_group_status::canceled, they rely on the cancellation of the group (i.e. cancellation of the associated task_group_context). Individual tasks can be either executed or canceled.
For single-task waiting functions, this flag would mean a different thing - that the task we are waiting for was canceled.

And even if the task group execution was canceled, the single-task wait can return task_complete. And the returned task_group_status would then say nothing about the status of the task group.

I think separating the statuses would be more obvious for users.

Contributor
@akukanov akukanov Oct 6, 2025

> For single-task waiting functions, this flag would mean a different thing - that the task we are waiting for was canceled.

No, it would mean the same thing - that the task group whose task we are waiting for was cancelled. There is no way to cancel a single task.

> And even if the task group execution was canceled, the single-task wait can return task_complete. And the returned task_group_status would then say nothing about the status of the task group.

Sure, and I see no problem with that. The task was complete, while the status of the whole group is unknown. On the other hand, if the task was cancelled, then the whole group was cancelled.

Contributor Author

For canceled, I agree: for run_and_wait_task it means the awaited task was not executed because of the cancellation of the task_group execution.
For complete, I find it a bit misleading to have a task_group status that says nothing about the actual status of the group and reports the status of the task instead.

Contributor

Talking with @kboyarinov earlier today, we discussed that for an individual task, a good set of status values would be "executed" and "canceled", instead of complete and canceled. A task might finish before a task group is canceled. Or a long-running task could discover while executing, by querying its task_group_context, that its group has been canceled and short-cut its execution. So for a specific task, "executed" simply means that the scheduler executed the task but does not imply anything about completion of the work or of other work in the task group. And then "canceled" means the task never started to execute.
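
For instance (a sketch; ``is_current_task_group_canceling`` is the existing oneTBB query, while ``num_chunks`` and ``process_chunk`` are hypothetical):

```cpp
tg.run([] {
    for (int i = 0; i < num_chunks; ++i) {
        if (tbb::is_current_task_group_canceling())
            return; // the task still counts as "executed", even though its work is incomplete
        process_chunk(i);
    }
});
```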

Contributor
@akukanov akukanov Oct 14, 2025

There can be a semantic difference between a task being executed and a task being complete, if the completion was transferred. For me, "executed" means just "the task body has been run" while "complete" means logical completion, or essentially "task waiters and successors receive a signal". Since wait_for has the latter semantics, I really prefer "complete" or better "task_complete". It is also consistent with transfer_this_task_completion_to.

> For complete, I find it a bit misleading to have a task_group status that says nothing about the actual status of the group and reports the status of the task instead.

This is why I suggest naming it task_group_status::task_complete, to indicate this is about a task and not the whole group.
And I still see no reason to have a separate enum class when an extra enum value seems to work just as well. Yet another reason for the single enum is the consistency of the task_arena::wait_for overloads.
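
Roughly, a sketch of the suggested extension (not a final spelling):

```cpp
enum task_group_status {
    not_complete,  // not cancelled and not complete
    complete,      // the whole group is complete
    canceled,      // the group received a cancellation request
    task_complete  // returned only by the single-task waiting functions
};
```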


class task_group {
task_status wait_for(task_completion_handle& comp_handle);
task_status run_and_wait_for(task_handle&& handle);
Contributor

Should there be run_and_wait_for(function) as well?

Contributor

If it's a function, then there cannot be predecessors since no task_completion_handle was created. So you'd execute that function as a task and wait for just that task (which has no dependencies). So why run and wait for it in the task_group at all? Why not just call the function? But then maybe it makes sense if the completion is transferred to work created inside of the task?

Contributor Author

If the function does not transfer the completion, this overload does not make much sense - the function would likely be executed by the calling thread, and the call would return right after it.

But if the function transfers the completion, having such an overload may be useful, e.g. if the function implements the root of a divide-and-conquer reduce and there are several reductions in the same task_group:

auto function = [&] {
    auto left_leaf = tg.defer(...);
    auto right_leaf = tg.defer(...);
    auto join = tg.defer(...);

    tbb::task_group::set_task_order(left_leaf, join);
    tbb::task_group::set_task_order(right_leaf, join);
    // waiters on this function's task are released only when `join` completes
    tbb::task_group::transfer_this_task_completion_to(join);

    tg.run(std::move(left_leaf));
    tg.run(std::move(right_leaf));
    tg.run(std::move(join));
};

tg.run_and_wait_for(function);

I propose to add it to the future enhancements & open questions sections since I don't know whether mixing several reductions (or other similar computations) in one task_group makes more sense than separating them into several groups.

Contributor

I am fine to keep it as an open question for the future.

Contributor Author

Added as an open question

Contributor

I see it now described as a future enhancement, but not yet added to the list of open questions.

Comment on lines 170 to 177
```cpp
task_status wait_for(task_completion_handle& comp_handle);
```

Waits for the completion of the task represented by ``comp_handle``.
If completion was transferred to another task using ``tbb::task_group::transfer_completion_to``, the function waits for completion of that task.

This is semantically equivalent to: ``execute([&] { tg.wait_for(comp_handle); })``.
Contributor

Are we able to get the task group out of a completion handle? If not, are we able to implement the function without requiring a task group to be also provided by the caller?

Contributor Author

With our current implementation, task_group is not required for single-task waiting. All we need is the task_dynamic_state and the associated task_group_context; both can be obtained from the task object.

In theory, we can implement task_group::wait_task (and even run_and_wait_task if we omit the "same task group" check) as a static function. But I have proposed member functions for consistency with other waiting functions.

I am not sure if implementing task_arena::wait_for without the task group is possible for any other TBB implementation. From the perspective of future inclusion into the oneTBB specification, it may make sense to add a task_group argument to this function and keep it unused in our implementation. What do you think?
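
Something like the following hypothetical signature (the extra argument would be ignored by our implementation):

```cpp
class task_arena {
public:
    // Hypothetical overload: the task_group argument is for implementations that
    // cannot recover the owning group from the completion handle alone
    task_status wait_for(task_group& tg, task_completion_handle& comp_handle);
};
```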

Contributor

Sure, it is possible for any implementation, under the assumption that a task_completion_handle (and perhaps also a task_handle) is always "bound" to a certain task group (that is, can keep a pointer/reference to the group). We just need to be clear about that, as well as about the point at which the binding happens (I guess that is at task creation, not at submission). Otherwise, it is a mystery where tg comes from in the "equivalent" expression.

Contributor Author

I have added the clarification about what tg is in this case.

I agree that this can be approached by binding the handle to the task_group. But I think that we should not specify how binding should be implemented (to ensure the validity of our implementation).

Contributor
@akukanov akukanov Oct 14, 2025

True, we should not specify how the binding is to be implemented. When I said a task handle "can keep a pointer/reference", I did not mean it should, just that an implementation that takes this approach is valid and safe.

Specifying when binding happens is different, as it affects which task group to wait for. Waiting on the wrong task group could result in a program hang.

An alternative approach to address this limitation is to implement a general mechanism within the scheduler that forces the thread to exit the
bypass loop and spawn the returned task if further execution should not be continued (i.e., ``waiter.continue_execution()`` returns ``false``).

## Alternative Implementation Approaches
Contributor

What are benefits and downsides of the alternative approaches to the recommended one?

Contributor Author

The main benefit of the recommended implementation approach is that waiting for completion is guaranteed to use a single r1::wait or r1::run_and_wait call, even when the completion is transferred.

For both alternative approaches, we would need to switch to another wait when the completion is transferred:

task_dynamic_state* state = comp_handle.get_dynamic_state();

r1::wait(state->get_wait_context());

while (state->was_transferred()) {
    state = state->get_new_completion_point();
    r1::wait(state->get_wait_context());
}

With the recommended approach, in case of transferring, the wait context pointer migrates between tasks, hence we don't need to double-check whether the completion was transferred. If the wait context was released, the completion is guaranteed to have happened (regardless of which task in the transfer chain was the final one).
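
In pseudo-code, the recommended approach boils down to a single wait (``get_or_create_wait_context`` is an illustrative name, not the actual internal API):

```cpp
task_dynamic_state* state = comp_handle.get_dynamic_state();

// The wait context is created lazily by the waiter; its pointer is stored in the
// task's dynamic state and migrates to the new task on transfer, so this single
// wait covers the whole transfer chain
r1::wait(state->get_or_create_wait_context());
```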

Another benefit is that the wait context is created only when a wait is requested (which is not true for the first alternative approach).

I will add more details on the benefits and downsides into the RFC.

Contributor Author

Updated each section with the pros and cons.

Contributor
@akukanov akukanov Oct 16, 2025

So, are the alternative implementation approaches considered and rejected, or is a further discussion necessary?

Contributor
@akukanov akukanov left a comment

It's a well-elaborated proposal. I still have some questions and suggestions, though.

Consider an example using an out-of-order SYCL queue.

```cpp
sycl::queue q{sycl::property::queue::in_order{false}};
Contributor

I do not think the in_order property can be constructed as false; see https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#api:property-queue-in-order.
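
Since SYCL 2020 queues are out-of-order unless the ``in_order`` property is passed, the example could simply use a default-constructed queue:

```cpp
sycl::queue q; // out-of-order by default; pass sycl::property::queue::in_order{} only to request in-order semantics
```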

h.single_task(task3_body);
});

task3.wait(); // wait for event
Contributor

Suggested change
task3.wait(); // wait for event
task3.wait(); // wait for a single task

tbb::task_handle task = m_task_group.defer(body);
tbb::task_completion_handle comp_handle = task;
m_task_group.run(std::move(task));
return {comp_handle};
Contributor

Suggested change
return {comp_handle};
return comp_handle;


Ideally, the waiting thread should only execute tasks that contribute to the completion of the awaited task. In such a model, all tasks
within the same subgraph would need to share a common isolation tag. In theory, a successor could inherit the isolation tag from
its predecessor. However, multiple predecessors will have different tags.
Contributor

Suggested change
its predecessor. However, multiple predecessors will have different tags.
its predecessor. However, multiple predecessors may have different tags.


The following questions should be resolved before promoting the feature out of the ``experimental`` stage.

* Performance targets for this feature should be clearly defined.
Contributor

Suggested change
* Performance targets for this feature should be clearly defined.
* Performance targets for this feature should be clearly defined and met.



However, since the current thread notifies the ``wait_context`` it is waiting on via the corresponding waiter node, it is more appropriate to avoid
bypassing the next task. Instead, the task should be spawned, and ``run_and_wait_task`` should exit after executing ``middle_task``.

For the initial implementation, it is proposed to completely avoid bypassing the task returned from the notification list and to spawn it.
Contributor

Suggested change
For the initial implementation, it is proposed to completely avoid bypassing the task returned from the notification list and to spawn it.
For the initial implementation, it is proposed to completely avoid bypassing the task returned from the notification list and to spawn it instead.


For the initial implementation, it is proposed to completely avoid bypassing the task returned from the notification list and to spawn it.

There are several approaches that can be implemented in the future to improve this approach.
Contributor

Suggested change
There are several approaches that can be implemented in the future to improve this approach.
There are several ways to improve the implementation in the future.

