This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Commit 3cd5684

Add 2.0.1 and 2.1.0 changelogs.
1 parent 12dba29 commit 3cd5684

2 files changed: +46 -9 lines changed

CHANGELOG.md

+45-8
@@ -1,5 +1,42 @@
 # Changelog
 
+## Thrust 2.1.0
+
+### New Features
+
+- NVIDIA/thrust#1805: Add default constructors to `transform_output_iterator`
+and `transform_input_output_iterator`. Thanks to Mark Harris (@harrism) for this contribution.
+- NVIDIA/thrust#1836: Enable constructions of vectors from `std::initializer_list`.
+
+### Bug Fixes
+
+- NVIDIA/thrust#1768: Fix type conversion warning in the `thrust::complex` utilities. Thanks to
+Zishi Wu (@zishiwu123) for this contribution.
+- NVIDIA/thrust#1809: Fix some warnings about usage of `__host__` functions in `__device__` code.
+- NVIDIA/thrust#1825: Fix Thrust's CMake install rules. Thanks to Robert Maynard (@robertmaynard)
+for this contribution.
+- NVIDIA/thrust#1827: Fix `thrust::reduce_by_key` when using non-default-initializable iterators.
+- NVIDIA/thrust#1832: Fix bug in device-side CDP `thrust::reduce` when using a large number of
+inputs.
+
+### Other Enhancements
+
+- NVIDIA/thrust#1815: Update Thrust's libcu++ git submodule to version 1.8.1.
+- NVIDIA/thrust#1841: Fix invalid code in execution policy documentation example. Thanks to Raphaël
+Frantz (@Eren121) for this contribution.
+- NVIDIA/thrust#1848: Improve error messages when attempting to launch a kernel on a device that is
+not supported by compiled PTX versions. Thanks to Zahra Khatami (@zkhatami) for this contribution.
+- NVIDIA/thrust#1855: Remove usage of deprecated CUDA error codes.
+
+## Thrust 2.0.1
+
+### Other Enhancements
+
+- Disable CDP parallelization of device-side invocations of Thrust algorithms on SM90+. The removal
+of device-side synchronization support in recent architectures makes Thrust's fork-join model
+unimplementable on device, so a serial implementation will be used instead. Host-side invocations
+of Thrust algorithms are not affected.
+
 ## Thrust 2.0.0
 
 ### Summary
@@ -26,7 +63,7 @@ several minor bugfixes and cleanups.
 - `THRUST_INCLUDE_HOST_CODE`: Replace with `NV_IF_TARGET`.
 - `THRUST_INCLUDE_DEVICE_CODE`: Replace with `NV_IF_TARGET`.
 - `THRUST_DEVICE_CODE`: Replace with `NV_IF_TARGET`.
-- NVIDIA/thrust#1661: Thrusts CUDA Runtime support macros have been updated to
+- NVIDIA/thrust#1661: Thrust's CUDA Runtime support macros have been updated to
 support `NV_IF_TARGET`. They are now defined consistently across all
 host/device compilation passes. This should not affect most usages of these
 macros, but may require changes for some edge cases.
@@ -59,7 +96,7 @@ several minor bugfixes and cleanups.
 - CMake builds that use the Thrust packages via CPM, `add_subdirectory`,
 or `find_package` are not affected.
 - NVIDIA/thrust#1760: A compile-time error is now emitted when a `__device__`
--only lambdas return type is queried from host code (requires libcu++ ≥
+-only lambda's return type is queried from host code (requires libcu++ ≥
 1.9.0).
 - Due to limitations in the CUDA programming model, the result of this query
 is unreliable, and will silently return an incorrect result. This leads to
@@ -83,7 +120,7 @@ several minor bugfixes and cleanups.
 to `thrust::make_zip_function`. Thanks to @mfbalin for this contribution.
 - NVIDIA/thrust#1722: Remove CUDA-specific error handler from code that may be
 executed on non-CUDA backends. Thanks to @dkolsen-pgi for this contribution.
-- NVIDIA/thrust#1756: Fix `copy_if` for output iterators that dont support copy
+- NVIDIA/thrust#1756: Fix `copy_if` for output iterators that don't support copy
 assignment. Thanks for @mfbalin for this contribution.
 
 ### Other Enhancements
@@ -157,7 +194,7 @@ numerous bugfixes and stability improvements.
 
 #### New `thrust::cuda::par_nosync` Execution Policy
 
-Most of Thrusts parallel algorithms are fully synchronous and will block the
+Most of Thrust's parallel algorithms are fully synchronous and will block the
 calling CPU thread until all work is completed. This design avoids many pitfalls
 associated with asynchronous GPU programming, resulting in simpler and
 less-error prone usage for new CUDA developers. Unfortunately, this improvement
@@ -222,12 +259,12 @@ on the calling GPU thread instead of launching a device-wide kernel.
 
 ### Enhancements
 
-- NVIDIA/thrust#1511: Use CUBs new `DeviceMergeSort` API and remove Thrusts
+- NVIDIA/thrust#1511: Use CUB's new `DeviceMergeSort` API and remove Thrust's
 internal implementation.
 - NVIDIA/thrust#1566: Improved performance of `thrust::shuffle`. Thanks to
 @djns99 for this contribution.
 - NVIDIA/thrust#1584: Support user-defined `CMAKE_INSTALL_INCLUDEDIR` values in
-Thrusts CMake install rules. Thanks to @robertmaynard for this contribution.
+Thrust's CMake install rules. Thanks to @robertmaynard for this contribution.
 
 ### Bug Fixes
 
@@ -239,7 +276,7 @@ on the calling GPU thread instead of launching a device-wide kernel.
 - NVIDIA/thrust#1597: Fix some collisions with the `small` macro defined
 in `windows.h`.
 - NVIDIA/thrust#1599, NVIDIA/thrust#1603: Fix some issues with version handling
-in Thrusts CMake packages.
+in Thrust's CMake packages.
 - NVIDIA/thrust#1614: Clarify that scan algorithm results are non-deterministic
 for pseudo-associative operators (e.g. floating-point addition).
 
@@ -752,7 +789,7 @@ Starting with the upcoming 1.10.0 release, C++03 support will be dropped
 passing a size.
 This was necessary to enable usage of Thrust caching MR allocators with
 synchronous Thrust algorithms.
-This change has allowed NVC++s C++17 Parallel Algorithms implementation to
+This change has allowed NVC++'s C++17 Parallel Algorithms implementation to
 switch to use Thrust caching MR allocators for device temporary storage,
 which gives a 2x speedup on large multi-GPU systems such as V100 and A100
 DGX where `cudaMalloc` is very slow.

dependencies/cub

Submodule cub updated 1 file
