[FEA] Expose runtime configurability of default stream behavior #17626

@vyasr

Description

Is your feature request related to a problem? Please describe.
Currently all of libcudf operates on the default stream (stream 0) by default, and on cudaStreamPerThread if compiled with CUDF_USE_PER_THREAD_DEFAULT_STREAM. Some consumers of libcudf wish to use the per-thread default stream (PTDS) instead, for reasons such as improved performance. Historically, we have supported this by compiling with CUDA_API_PER_THREAD_DEFAULT_STREAM and CUDF_USE_PER_THREAD_DEFAULT_STREAM, because compile-time control was the only reasonable way to achieve it, and consumers like spark-rapids rely on this. However, as #13744 comes to a close we will have a fully stream-ordered API that is also completely tested to ensure that streams are passed through everywhere, so that nothing unintentionally runs on the default stream when the user provides one. This affords us some additional options for enabling PTDS behavior.

Describe the solution you'd like
We should modify get_default_stream so that its behavior is configurable at runtime, allowing it to return the PTDS stream instead of having every thread run on the legacy default stream. This could easily be done in a thread-safe manner using a function-local static:

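#include <cstdlib>
#include <rmm/cuda_stream_view.hpp>

rmm::cuda_stream_view const get_default_stream() {
  // A function-local static is initialized exactly once, thread-safely (C++11 and later).
  static rmm::cuda_stream_view const default_stream = []() {
    // "PTDS" is an illustrative environment variable name.
    if (std::getenv("PTDS") != nullptr) {
      return rmm::cuda_stream_per_thread;
    } else {
      return rmm::cuda_stream_legacy;
    }
  }();
  return default_stream;
}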

The above uses an environment variable, but we could just as easily expose a public API that sets the configuration, with the requirement that it be called before the first call to get_default_stream. The end result would be that we control the default stream behavior entirely at runtime, without needing to build separate binaries to support PTDS. This would allow us to support newer higher-level APIs, such as pylibcudf, while still supporting Spark's needs.
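
For illustration, the public-API variant might look roughly like the sketch below. This is not an existing libcudf interface: the setter name set_per_thread_default_stream, the detail helpers, and the stream_kind enum (a stand-in for rmm::cuda_stream_legacy and rmm::cuda_stream_per_thread so the sketch compiles without CUDA) are all hypothetical. It only shows one way to enforce the "must be called before the first call to get_default_stream" requirement.

```cpp
#include <atomic>
#include <stdexcept>

// Stand-in for the two rmm stream views (assumption: lets the sketch build without CUDA).
enum class stream_kind { legacy, per_thread };

namespace detail {
// Requested mode; defaults to the legacy default stream.
inline std::atomic<bool>& ptds_requested() {
  static std::atomic<bool> value{false};
  return value;
}
// Becomes true once get_default_stream() has fixed its answer.
inline std::atomic<bool>& stream_resolved() {
  static std::atomic<bool> value{false};
  return value;
}
}  // namespace detail

// Hypothetical setter: only valid before the first get_default_stream() call.
void set_per_thread_default_stream(bool enable) {
  if (detail::stream_resolved().load()) {
    throw std::logic_error("default stream already resolved; configure it earlier");
  }
  detail::ptds_requested().store(enable);
}

stream_kind get_default_stream() {
  // The function-local static is initialized once, so the choice is frozen here.
  static stream_kind const resolved = [] {
    detail::stream_resolved().store(true);
    return detail::ptds_requested().load() ? stream_kind::per_thread
                                           : stream_kind::legacy;
  }();
  return resolved;
}
```

A caller would invoke set_per_thread_default_stream(true) once during startup; any later attempt to change the setting throws instead of being silently ignored. Note that this sketch does not make a setter call that races with the very first get_default_stream call well defined, so it assumes configuration happens before worker threads start.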

Describe alternatives you've considered
We could ship separate binaries compiled for PTDS, or change our default to always build with PTDS. The former has generally been rejected because it would require double the resources. The latter was previously attempted but rejected because PTDS builds of cudf are not safe drop-in replacements for non-PTDS builds: PTDS allows race conditions that are not possible with non-PTDS builds.

Labels: feature request, libcudf, pylibcudf
