Thrust 1.16.0 #1616
Great pre-release, thank you! Is there a way to pass a stream to |
Summary
Thrust 1.16.0 provides a new “nosync” hint for the CUDA backend, as well as numerous bugfixes and stability improvements.
New thrust::cuda::par_nosync Execution Policy
Most of Thrust’s parallel algorithms are fully synchronous and will block the calling CPU thread until all work is completed. This design avoids many pitfalls associated with asynchronous GPU programming, resulting in simpler and less error-prone usage for new CUDA developers. Unfortunately, this improvement in user experience comes at a performance cost that often frustrates more experienced CUDA programmers.
Prior to this release, the only synchronous-to-asynchronous migration path for existing Thrust codebases involved significant refactoring, replacing calls to thrust algorithms with a limited set of future-based thrust::async algorithms or lower-level CUB kernels. The new thrust::cuda::par_nosync execution policy provides a new, less-invasive entry point for asynchronous computation.
par_nosync is a hint to the Thrust execution engine that any non-essential internal synchronizations should be skipped and that an explicit synchronization will be performed by the caller before accessing results.
While some Thrust algorithms require internal synchronization to safely compute their results, many do not. For example, multiple thrust::for_each invocations can be launched without waiting for earlier calls to complete.
Thanks to @fkallen for this contribution.
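The for_each pattern described above can be sketched roughly as follows. This is an illustration rather than code taken from the release, and it assumes a CUDA build against Thrust 1.16.0 or newer. The names AddOne and run_nosync_example are hypothetical and are not part of Thrust.

```cuda
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/host_vector.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cuda_runtime.h>

// Hypothetical functor for illustration: increments each element in place.
struct AddOne
{
  __host__ __device__ void operator()(int& x) const { x += 1; }
};

// Hypothetical helper: queues three for_each calls with par_nosync, then
// synchronizes explicitly before reading any results back on the host.
inline bool run_nosync_example()
{
  thrust::device_vector<int> vec1(1000, 1);
  thrust::device_vector<int> vec2(1000, 2);
  thrust::device_vector<int> vec3(1000, 3);

  // par_nosync is a hint that non-essential internal synchronization may be
  // skipped, so these calls can return before the GPU work has finished.
  thrust::for_each(thrust::cuda::par_nosync, vec1.begin(), vec1.end(), AddOne{});
  thrust::for_each(thrust::cuda::par_nosync, vec2.begin(), vec2.end(), AddOne{});
  thrust::for_each(thrust::cuda::par_nosync, vec3.begin(), vec3.end(), AddOne{});

  // Other host-side work could run here while the kernels execute.

  // The caller must synchronize explicitly before accessing the results.
  if (cudaDeviceSynchronize() != cudaSuccess)
  {
    return false;
  }

  thrust::host_vector<int> h1 = vec1;
  thrust::host_vector<int> h2 = vec2;
  thrust::host_vector<int> h3 = vec3;
  return h1[0] == 2 && h2[0] == 3 && h3[0] == 4;
}
```

Calling run_nosync_example() from main should return true on a machine with a working CUDA device. The explicit cudaDeviceSynchronize() is the part that the synchronous policies would otherwise handle internally.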
Deprecation Notices
CUDA Dynamic Parallelism Support
A future version of Thrust will remove support for CUDA Dynamic Parallelism (CDP).
This will only affect calls to Thrust algorithms made from CUDA device-side code that currently launch a kernel; such calls will execute sequentially on the calling GPU thread instead of launching a device-wide kernel.
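Device-side code that does not depend on CDP can already request sequential execution explicitly with the thrust::seq policy. The sketch below is one possible illustration of that, not a migration recipe from the Thrust maintainers. The names sum_kernel and sum_on_one_gpu_thread are hypothetical, and error checking of the CUDA runtime calls is omitted for brevity.

```cuda
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: a single GPU thread reduces the whole range.
// thrust::seq runs the algorithm sequentially on the calling thread,
// so no child kernel is launched from device code.
__global__ void sum_kernel(const int* data, int n, int* out)
{
  if (blockIdx.x == 0 && threadIdx.x == 0)
  {
    *out = thrust::reduce(thrust::seq, data, data + n, 0);
  }
}

// Hypothetical host helper: copies the input to the device, runs the kernel
// with one thread, and copies the sum back. Error checks are omitted.
inline int sum_on_one_gpu_thread(const std::vector<int>& host)
{
  const int n = static_cast<int>(host.size());
  int* d_data = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_data, n * sizeof(int));
  cudaMalloc(&d_out, sizeof(int));
  cudaMemcpy(d_data, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

  sum_kernel<<<1, 1>>>(d_data, n, d_out);

  // This blocking copy on the default stream waits for the kernel to finish.
  int result = 0;
  cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

  cudaFree(d_data);
  cudaFree(d_out);
  return result;
}
```

A sequential reduction on one GPU thread is slow for large inputs, so this style mainly suits small per-thread ranges inside a larger kernel.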
Breaking Changes
cub namespace to thrust::cub. This has caused issues with ambiguous namespaces for projects that declare using namespace thrust; from the global namespace. We recommend against this practice.
New Features
#1568: Add thrust::cuda::par_nosync policy. Thanks to @fkallen for this contribution.
Enhancements
DeviceMergeSort API and remove Thrust’s internal implementation.
thrust::shuffle. Thanks to @djns99 for this contribution.
CMAKE_INSTALL_INCLUDEDIR values in Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.
Bug Fixes
icc builds.
min/max macros defined in windows.h.
nvc++.
small macro defined in windows.h.
This discussion was created from the release Thrust 1.16.0.