[FEA] Expose runtime configurability of default stream behavior

**Is your feature request related to a problem? Please describe.**
[Currently, all of libcudf operates on the default stream (stream 0) by default, or on `cudaStreamPerThread` if compiled with `CUDF_USE_PER_THREAD_DEFAULT_STREAM`](https://github.com/rapidsai/cudf/blob/branch-25.02/cpp/src/utilities/default_stream.cpp#L23). Some consumers of libcudf want to use the [per-thread default stream](https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html#stream-sync-behavior) (PTDS) instead, for reasons such as improved performance. Historically, we have supported this by compiling with `CUDA_API_PER_THREAD_DEFAULT_STREAM` and `CUDF_USE_PER_THREAD_DEFAULT_STREAM`, because compile-time control was the only reasonable way to achieve it, and consumers like spark-rapids rely on this. However, as https://github.com/rapidsai/cudf/issues/13744 comes to a close, we will have a fully stream-ordered API whose tests verify that streams are passed through everywhere, so that nothing unintentionally runs on the default stream when the user provides a stream. This gives us additional options for enabling PTDS behavior.
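
For contrast, the current compile-time selection can be sketched roughly as follows. This is a simplified, hypothetical illustration, not the actual contents of `default_stream.cpp`: the `sketch_compile_time` namespace and the `default_stream_mode` enum are made-up names, and the enum stands in for the rmm stream handles so the example builds without CUDA. The point it shows is that the preprocessor fixes the choice when the library is built, so switching to PTDS requires a rebuild.

```cpp
namespace sketch_compile_time {

// Hypothetical stand-in for rmm::cuda_stream_legacy / rmm::cuda_stream_per_thread.
enum class default_stream_mode { legacy, per_thread };

// The branch is selected by the preprocessor at build time;
// nothing at runtime can change which stream is returned.
inline default_stream_mode get_default_stream()
{
#if defined(CUDF_USE_PER_THREAD_DEFAULT_STREAM)
  return default_stream_mode::per_thread;
#else
  return default_stream_mode::legacy;
#endif
}

}  // namespace sketch_compile_time
```
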

**Describe the solution you'd like**
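We should modify `get_default_stream` so that its behavior can be configured at runtime to return the per-thread default stream, instead of having every thread run on the default stream. This can easily be done in a thread-safe manner using a function-local static variable, whose initialization C++11 guarantees to run exactly once even when the function is first called concurrently from several threads: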
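```c++
#include <cstdlib>

#include <rmm/cuda_stream_view.hpp>

rmm::cuda_stream_view const get_default_stream()
{
  // Initialized once, on the first call; C++11 makes this initialization thread-safe.
  static rmm::cuda_stream_view const default_stream = []() {
    // "PTDS" is an illustrative variable name; any value (even "0") enables PTDS here.
    if (std::getenv("PTDS") != nullptr) {
      return rmm::cuda_stream_per_thread;
    } else {
      return rmm::cuda_stream_legacy;
    }
  }();
  return default_stream;
}
```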

The above uses an environment variable, but we could just as easily expose a public API for setting this configuration, which would have to be called before the first call to `get_default_stream`. Either way, default stream behavior would be controlled entirely at runtime, without needing to build separate binaries to support PTDS. This would let us support newer higher-level APIs, such as pylibcudf, while still meeting Spark's needs.
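
A minimal, self-contained sketch of the public-API variant might look like the following. Everything in it is hypothetical: the `sketch` namespace, `default_stream_mode`, and `set_default_stream_mode` are illustrative names, and the enum stands in for `rmm::cuda_stream_legacy` / `rmm::cuda_stream_per_thread` so the example compiles without CUDA. It shows one way to enforce the "call before first use" rule: once `get_default_stream` has latched a value, the setter throws.

```cpp
#include <mutex>
#include <stdexcept>

namespace sketch {

// Hypothetical stand-in for the two rmm stream handles a real implementation
// would return (rmm::cuda_stream_legacy / rmm::cuda_stream_per_thread).
enum class default_stream_mode { legacy, per_thread };

namespace detail {
// Shared configuration; both fields are guarded by config_mutex (C++17 inline variables).
inline std::mutex config_mutex;
inline default_stream_mode requested_mode = default_stream_mode::legacy;
inline bool mode_latched                  = false;
}  // namespace detail

// Hypothetical public API: must be called before the first get_default_stream().
inline void set_default_stream_mode(default_stream_mode mode)
{
  std::lock_guard<std::mutex> lock(detail::config_mutex);
  if (detail::mode_latched) {
    throw std::logic_error("default stream mode is already fixed");
  }
  detail::requested_mode = mode;
}

// Reads the configuration exactly once; every later call returns the cached value.
inline default_stream_mode get_default_stream()
{
  static default_stream_mode const mode = [] {
    std::lock_guard<std::mutex> lock(detail::config_mutex);
    detail::mode_latched = true;  // any later set_default_stream_mode() now throws
    return detail::requested_mode;
  }();
  return mode;
}

}  // namespace sketch
```

Because the setter and the one-time initializer take the same mutex, a setter racing with the first `get_default_stream` call either takes effect or throws; it is never silently ignored. Throwing is just one design choice here; returning a status or logging a warning would also work.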

**Describe alternatives you've considered**
We could ship separate binaries compiled for PTDS, or change our default to always build with PTDS. The former has generally been rejected because it would require double the resources, while [the latter was previously attempted but rejected because PTDS builds of cudf are not safe drop-in replacements for non-PTDS builds](https://github.com/rapidsai/cudf/pull/11281): work on the legacy default stream implicitly synchronizes with other blocking streams, whereas the per-thread default streams of different threads do not synchronize with each other, so PTDS allows race conditions that would not be possible with non-PTDS builds.

[FEA] Expose runtime configurability of default stream behavior #17626