Changes from 23 commits
Commits
37 commits
74a8a22
Draft documentation for task_group dependencies
kboyarinov Oct 1, 2025
5c1ee1c
Improve wording and formatting
kboyarinov Oct 3, 2025
5252568
Improve examples and formatting
kboyarinov Oct 3, 2025
2c83040
Fix examples
kboyarinov Oct 3, 2025
c973e8e
Fix example
kboyarinov Oct 3, 2025
7fc8ba2
Fix reduction example
kboyarinov Oct 3, 2025
0a8aa07
Fix submission in reduction sample
kboyarinov Oct 3, 2025
0b81e65
Fix returned handle check
kboyarinov Oct 3, 2025
8012e8a
Separate the documentation
kboyarinov Oct 3, 2025
e642c63
Fix including the example
kboyarinov Oct 3, 2025
f864a64
Fix newlines at the end of file
kboyarinov Oct 3, 2025
0a77400
Update doc/main/reference/task_group_bypass_support.rst
kboyarinov Oct 10, 2025
7f4d219
Update doc/main/reference/task_group_dynamic_dependencies.rst
kboyarinov Oct 10, 2025
b70a323
Update doc/main/reference/task_group_dynamic_dependencies.rst
kboyarinov Oct 10, 2025
f363917
Update doc/main/reference/task_group_dynamic_dependencies.rst
kboyarinov Oct 10, 2025
e513f87
Update doc/main/reference/task_group_dynamic_dependencies.rst
kboyarinov Oct 10, 2025
d49ae72
Remove redundant local file
kboyarinov Oct 10, 2025
73d721a
Update doc/main/reference/examples/task_group_extensions_bypassing.cpp
kboyarinov Oct 10, 2025
63112a4
Update doc/main/reference/examples/task_group_extensions_reduction.cpp
kboyarinov Oct 10, 2025
5f40d8e
Update doc/main/reference/examples/task_group_extensions_reduction.cpp
kboyarinov Oct 10, 2025
d5a0fa7
Apply comments to the examples
kboyarinov Oct 10, 2025
0845311
Fix bypass example
kboyarinov Oct 10, 2025
19d54d6
Fix typo
kboyarinov Oct 10, 2025
0027f75
Update doc/main/reference/task_group_dynamic_dependencies.rst
kboyarinov Oct 16, 2025
95eac21
Hide serial threshold for reduction example
kboyarinov Oct 16, 2025
7768cac
Merge branch 'dev/kboyarinov/task-group-docs' of https://github.com/o…
kboyarinov Oct 16, 2025
a3a0500
Update copyrights to UXL
kboyarinov Oct 16, 2025
3796fb9
Merge remote-tracking branch 'origin/master' into dev/kboyarinov/task…
kboyarinov Oct 16, 2025
73855db
Update doc/main/reference/task_group_bypass_support.rst
kboyarinov Oct 16, 2025
59b4dc0
Update doc/main/reference/examples/task_group_extensions_bypassing.cpp
kboyarinov Oct 16, 2025
6d8a33b
Update doc/main/reference/examples/task_group_extensions_bypassing.cpp
kboyarinov Oct 16, 2025
a7d90c8
Update doc/main/reference/task_group_dynamic_dependencies.rst
kboyarinov Oct 16, 2025
3e44174
Update doc/main/reference/task_group_dynamic_dependencies.rst
kboyarinov Oct 16, 2025
33e1b16
Address review comments
kboyarinov Oct 16, 2025
3cfe4e9
Minor changes
kboyarinov Oct 16, 2025
3715c6f
Change the example
kboyarinov Oct 16, 2025
a8e9580
Minor fix in the example
kboyarinov Oct 16, 2025
81 changes: 81 additions & 0 deletions doc/main/reference/examples/task_group_extensions_bypassing.cpp
@@ -0,0 +1,81 @@
/*
Copyright (c) 2025 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

#include <cstddef>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

static constexpr std::size_t serial_threshold = 16;

/*begin_task_group_extensions_bypassing_example*/
#define TBB_PREVIEW_TASK_GROUP_EXTENSIONS 1
#include "oneapi/tbb/task_group.h"

template <typename Iterator, typename Function>
struct for_task {
    tbb::task_handle operator()() const {
        tbb::task_handle next_task;

        auto size = std::distance(begin, end);
        // std::distance returns a signed type; cast to avoid a signed/unsigned comparison
        if (static_cast<std::size_t>(size) < serial_threshold) {
            // Execute the work serially
            for (Iterator it = begin; it != end; ++it) {
                f(*it);
            }
        } else {
            // Enough work to split the range
            Iterator middle = begin + size / 2;

            // Submit the right subtask for execution
            tg.run(for_task<Iterator, Function>{middle, end, f, tg});

            // Bypass the left subtask
            next_task = tg.defer(for_task<Iterator, Function>{begin, middle, f, tg});
        }
        return next_task;
    }

    Iterator begin;
    Iterator end;
    Function f;
    tbb::task_group& tg;
}; // struct for_task

// Function accepts std::iterator_traits<RandomAccessIterator>::reference argument
template <typename RandomAccessIterator, typename Function>
void parallel_for(RandomAccessIterator begin, RandomAccessIterator end, Function f) {
Contributor

Suggested change
- template <typename RandomAccessIterator, typename Function>
- void parallel_for(RandomAccessIterator begin, RandomAccessIterator end, Function f) {
+ template <typename RandomAccessIterator, typename Function>
+ void for_each(RandomAccessIterator begin, RandomAccessIterator end, Function f) {

Contributor Author

Renamed to parallel_for_each since with the name for_each it is challenging to test the function because of the ambiguity with std::for_each.

Contributor

@akukanov akukanov Oct 16, 2025

Do you think that ambiguity with tbb::parallel_for_each is better? :)
Do not do using namespace std;, and the testing should be good, no? Or is the ambiguity caused by ADL?

Contributor Author

tbb::parallel_for_each does not create an issue here, and using namespace std is not used. The ambiguity arises because the test for_each is defined in the global namespace, while its arguments are std::vector iterators, which are defined in namespace std. Because of ADL, std::for_each becomes visible, and the two versions of the function create an ambiguity.
While explaining this, I realized that it should be possible to solve this by using raw pointers instead of vector iterators in testing.
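
For reference, a minimal sketch of the ambiguity described in this thread and two ways around it. The global `for_each` below is a hypothetical serial stand-in for the function under test, not the code from this PR, and the helper names are made up for illustration. The ambiguous call is left commented out because it is expected not to compile on common standard library implementations.

```cpp
#include <algorithm>  // declares std::for_each, the competing candidate
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for the tested function, defined in the global namespace.
template <typename RandomAccessIterator, typename Function>
void for_each(RandomAccessIterator begin, RandomAccessIterator end, Function f) {
    for (; begin != end; ++begin) f(*begin);
}

// Workaround 1: pass raw pointers. Pointers to std::size_t have no associated
// namespaces, so ADL does not add std::for_each to the overload set.
std::vector<std::size_t> fill_via_pointers(std::size_t n) {
    std::vector<std::size_t> v(n, 0);
    // for_each(v.begin(), v.end(), ...);  // ambiguous: ADL also finds std::for_each
    for_each(v.data(), v.data() + v.size(), [](std::size_t& item) { item = 42; });
    assert(v.empty() || v.front() == 42);
    return v;
}

// Workaround 2: qualify the call. A qualified name suppresses ADL entirely.
std::vector<std::size_t> fill_via_qualified_call(std::size_t n) {
    std::vector<std::size_t> v(n, 0);
    ::for_each(v.begin(), v.end(), [](std::size_t& item) { item = 42; });
    assert(v.empty() || v.back() == 42);
    return v;
}
```

Either workaround keeps the short name while avoiding a clash with `std::for_each`; which one fits depends on whether the test or the call site is easier to change.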

    tbb::task_group tg;
    // Run the root task
    tg.run_and_wait(for_task<RandomAccessIterator, Function>{begin, end, std::move(f), tg});
}
/*end_task_group_extensions_bypassing_example*/

int main() {
    constexpr std::size_t N = 10000;

    std::vector<std::size_t> v(N, 0);

    parallel_for(v.begin(), v.end(), [](std::size_t& item) {
        item = 42;
    });

    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] != 42) {
            std::cerr << "Error at index " << i << std::endl;
            return 1;
        }
    }
}
97 changes: 97 additions & 0 deletions doc/main/reference/examples/task_group_extensions_reduction.cpp
@@ -0,0 +1,97 @@
/*
Copyright (c) 2025 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

#include <cstddef>
#include <iostream>
#include <memory>

/*begin_task_group_extensions_reduction_example*/
#define TBB_PREVIEW_TASK_GROUP_EXTENSIONS 1
#include "oneapi/tbb/task_group.h"

struct reduce_task {
    static constexpr std::size_t serial_threshold = 16;

    struct join_task {
        void operator()() const {
            *result = *left + *right;
        }

        std::shared_ptr<std::size_t> left;
        std::shared_ptr<std::size_t> right;
        std::shared_ptr<std::size_t> result;
    };

    tbb::task_handle operator()() const {
        tbb::task_handle next_task;

        std::size_t size = end - begin;
        if (size < serial_threshold) {
            // Perform serial reduction
            for (std::size_t i = begin; i < end; ++i) {
                *result += i;
            }
        } else {
            // The range is too large to process directly
            // Divide it into smaller segments for parallel execution
            std::size_t middle = begin + size / 2;

            std::shared_ptr<std::size_t> left_result = std::make_shared<std::size_t>(0);
            tbb::task_handle left_leaf = tg.defer(reduce_task{begin, middle, left_result, tg});

            std::shared_ptr<std::size_t> right_result = std::make_shared<std::size_t>(0);
            tbb::task_handle right_leaf = tg.defer(reduce_task{middle, end, right_result, tg});

            tbb::task_handle join = tg.defer(join_task{left_result, right_result, result});

            tbb::task_group::set_task_order(left_leaf, join);
            tbb::task_group::set_task_order(right_leaf, join);

            tbb::task_group::transfer_this_task_completion_to(join);

            // Save the left leaf for further bypassing
            next_task = std::move(left_leaf);

            tg.run(std::move(right_leaf));
            tg.run(std::move(join));
        }

        return next_task;
    }

    std::size_t begin;
    std::size_t end;
    std::shared_ptr<std::size_t> result;
    tbb::task_group& tg;
};

std::size_t calculate_parallel_sum(std::size_t begin, std::size_t end) {
    tbb::task_group tg;

    std::shared_ptr<std::size_t> reduce_result = std::make_shared<std::size_t>(0);
    reduce_task root_reduce_task{begin, end, reduce_result, tg};
    tg.run_and_wait(root_reduce_task);

    return *reduce_result;
}
/*end_task_group_extensions_reduction_example*/

int main() {
    constexpr std::size_t N = 10000;
    std::size_t serial_sum = N * (N - 1) / 2;
    std::size_t parallel_sum = calculate_parallel_sum(0, N);

    if (serial_sum != parallel_sum) {
        std::cerr << "Incorrect reduction result" << std::endl;
        return 1;
    }
}
94 changes: 94 additions & 0 deletions doc/main/reference/task_group_bypass_support.rst
@@ -0,0 +1,94 @@
.. _task_group_bypass_support:

Task Bypass Support for ``task_group``
======================================

.. note::
    To enable this extension, define the ``TBB_PREVIEW_TASK_GROUP_EXTENSIONS`` macro with a value of ``1``.

.. contents::
    :local:
    :depth: 2

Description
***********

The |full_name| implementation extends the requirements for user-provided function objects from the
`tbb::task_group specification <https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onetbb/source/task_scheduler/task_group/task_group_cls>`_
to allow them to return a ``task_handle`` object.
Contributor

@vossmjp vossmjp Oct 3, 2025

It seems that the task_group specification is silent on the return type of the user-provided function object, so "extends the requirements" doesn't seem right.

Contributor Author

As far as I understand named requirements, if the specification says nothing about the return type, any type can be returned, but the result is ignored. So "extending" the named requirement means that the implementation reacts when a handle is returned.
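
To make the distinction in this thread concrete, here is a small library-free model of "ignore the result" versus "react to a returned handle". It is only a sketch: `toy_handle`, `toy_run`, `invoke_body`, and `toy_demo` are hypothetical names and do not reflect how oneTBB implements the extension. In particular, the model runs the returned task immediately on the same thread, which the documentation under review explicitly does not guarantee.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for tbb::task_handle: owns a deferred body and may be empty.
struct toy_handle {
    std::function<void()> body;
};

// Selected when the body returns toy_handle: the runner reacts to the result.
template <typename F>
void invoke_body(F& f, std::true_type) {
    toy_handle next = f();
    if (next.body) next.body();  // non-empty handle: run the owned task next
}

// Selected for any other return type, including void: the result is ignored.
template <typename F>
void invoke_body(F& f, std::false_type) {
    f();
}

template <typename F>
void toy_run(F f) {
    using result_t = decltype(f());
    invoke_body(f, std::is_same<result_t, toy_handle>{});
}

// Runs one body of each kind and records the order of execution.
std::vector<std::string> toy_demo() {
    std::vector<std::string> log;
    toy_run([&log]() -> toy_handle {
        log.push_back("body");
        return toy_handle{[&log] { log.push_back("next"); }};
    });
    toy_run([&log]() -> int {
        log.push_back("plain");
        return 7;  // silently discarded by the runner
    });
    assert(log.size() == 3);
    return log;
}
```

In this model, a body that returns `int` behaves exactly as before, while a body that returns a handle gets extra behavior. That matches the reading that existing code stays valid and only the handling of one return type changes.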


`Task Bypassing <../tbb_userguide/Task_Scheduler_Bypass.html>`_ allows developers to reduce task scheduling overhead by providing a hint about
which task should be executed next.

Execution of the deferred task owned by a returned ``task_handle`` is not guaranteed to occur immediately, nor to be performed by the same thread.

.. code:: cpp

    tbb::task_handle task_body() {
        tbb::task_handle next_task = group.defer(next_task_body);
        return next_task;
    }

API
***

Header
------

.. code:: cpp

    #define TBB_PREVIEW_TASK_GROUP_EXTENSIONS 1
    #include <oneapi/tbb/task_group.h>

Synopsis
--------

.. code:: cpp

    namespace oneapi {
    namespace tbb {
        class task_group {
        public:
            // Only the requirements for the return type of function F are changed
            template <typename F>
            task_handle defer(F&& f);

            // Only the requirements for the return type of function F are changed
            template <typename F>
            task_group_status run_and_wait(const F& f);

            // Only the requirements for the return type of function F are changed
            template <typename F>
            void run(F&& f);
        }; // class task_group
    } // namespace tbb
    } // namespace oneapi

Member Functions
----------------

.. code:: cpp

    template <typename F>
    task_handle defer(F&& f);

    template <typename F>
    task_group_status run_and_wait(const F& f);

    template <typename F>
    void run(F&& f);

The function object ``F`` may return a ``task_handle`` object. If the returned handle is non-empty and owns a task without dependencies, it serves as an optimization
hint for a task that could be executed next.

If the returned handle was created by a ``task_group`` other than ``*this``, the behavior is undefined.

Example
-------

The example below demonstrates how to implement a parallel for loop using ``task_group`` and the divide-and-conquer pattern.

.. literalinclude:: ./examples/task_group_extensions_bypassing.cpp
    :language: c++
    :start-after: /*begin_task_group_extensions_bypassing_example*/
    :end-before: /*end_task_group_extensions_bypassing_example*/
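
The splitting logic of the example can also be studied without a oneTBB build. The sketch below is a single-threaded model, not the library's scheduler: a ``std::vector`` used as a LIFO pool stands in for ``tg.run``, and continuing directly with the left half stands in for returning its ``task_handle``. All names (``range``, ``split_stats``, ``model_for``, ``run_model``) are hypothetical, and the threshold of 16 is assumed to match the example. The real scheduler treats the returned handle only as a hint, so actual execution order and thread placement may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct range {
    std::size_t begin;
    std::size_t end;
};

struct split_stats {
    std::size_t queued = 0;    // right halves handed to the pool, like tg.run(...)
    std::size_t bypassed = 0;  // left halves continued directly, like a returned task_handle
    std::size_t leaves = 0;    // subranges small enough to process serially
};

template <typename Function>
split_stats model_for(std::vector<int>& data, Function f, std::size_t serial_threshold = 16) {
    split_stats stats;
    std::vector<range> pool;  // LIFO stand-in for the scheduler's task pool
    pool.push_back(range{0, data.size()});

    while (!pool.empty()) {
        range current = pool.back();
        pool.pop_back();

        for (;;) {
            std::size_t size = current.end - current.begin;
            if (size < serial_threshold) {
                // Execute the work serially
                for (std::size_t i = current.begin; i < current.end; ++i) f(data[i]);
                ++stats.leaves;
                break;  // like returning an empty task_handle
            }
            std::size_t middle = current.begin + size / 2;
            pool.push_back(range{middle, current.end});  // submit the right half
            ++stats.queued;
            current.end = middle;  // keep going with the left half, skipping the pool
            ++stats.bypassed;
        }
    }
    return stats;
}

// Runs the model over n elements and checks that each one is visited exactly once.
split_stats run_model(std::size_t n) {
    std::vector<int> data(n, 0);
    split_stats stats = model_for(data, [](int& item) { item += 1; });
    for (int item : data) assert(item == 1);
    return stats;
}
```

For 64 elements the model performs 7 splits and produces 8 serial leaves of 8 elements each. Every split sends one half through the pool and continues with the other, so roughly half of the subtasks avoid a trip through the pool, which is the overhead that bypassing aims to reduce.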