
Conversation

JigaoLuo (Contributor)

Description

This is an initial draft for Issue #1959 that adds a host memory resource for a pinned bounce buffer into rmm::device_scalar.

It’s an early attempt and still incomplete, particularly in its constructor coverage and unit tests. I expect this PR draft will spark some discussion, and I subscribe to the idea that “perfect is the enemy of good.” I hope it’s okay to use this draft as a starting point for conversation; if any major changes are needed, I’m happy to revisit it or even drop the current approach for a better one.
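
The core idea, very roughly, is the following minimal sketch (hypothetical names, not this PR’s actual code): stage the device value in a pinned host buffer so the device-to-host copy stays asynchronous and only the owning stream needs to be synchronized.

#include <cuda_runtime_api.h>

// Minimal sketch of the idea (hypothetical names; not the PR's actual code).
// A pinned (page-locked) host allocation lets cudaMemcpyAsync remain truly
// asynchronous; with pageable host memory the driver may fall back to a
// synchronizing copy.
template <typename T>
class pinned_scalar_sketch {
 public:
  pinned_scalar_sketch()
  {
    cudaMalloc(reinterpret_cast<void**>(&device_ptr_), sizeof(T));
    cudaMallocHost(reinterpret_cast<void**>(&bounce_), sizeof(T));  // pinned bounce buffer
  }
  ~pinned_scalar_sketch()
  {
    cudaFreeHost(bounce_);
    cudaFree(device_ptr_);
  }

  T value(cudaStream_t stream) const
  {
    // Device -> pinned host: stays asynchronous, so only this stream needs
    // to be synchronized before the value can be read.
    cudaMemcpyAsync(bounce_, device_ptr_, sizeof(T), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    return *bounce_;
  }

 private:
  T* device_ptr_{};
  T* bounce_{};
};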

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Jigao Luo <[email protected]>
@JigaoLuo JigaoLuo requested a review from a team as a code owner July 12, 2025 11:18
@JigaoLuo JigaoLuo requested review from davidwendt and lamarrr July 12, 2025 11:18
copy-pr-bot bot commented Jul 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@JigaoLuo changed the title to [🚧 Draft]: Adding host-mr for pinned bounce buffer to rmm::device_scalar Jul 12, 2025
@JigaoLuo JigaoLuo marked this pull request as draft July 12, 2025 11:20
}
}

// Disallow passing literals to set_value to avoid race conditions where the memory holding the
@devavret (Contributor)
I believe you can do this too now? Copy to the bounce buffer immediately and then do the copy async.
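
For concreteness, that suggestion amounts to roughly this sketch (an illustrative free function with hypothetical names, rather than the PR’s actual member function):

// Sketch of the suggested fix (illustrative names): stage the literal in the
// pinned bounce buffer with a synchronous host-side copy; the temporary then
// no longer needs to outlive the asynchronous device copy.
template <typename T>
void set_value_sketch(T const& value, T* bounce, T* device_ptr, cudaStream_t stream)
{
  *bounce = value;  // host copy completes immediately, so a temporary is safe
  cudaMemcpyAsync(device_ptr, bounce, sizeof(T), cudaMemcpyHostToDevice, stream);
  // Caveat: a second set_value must not overwrite `bounce` before this copy
  // finishes, so back-to-back calls still need a stream sync or a second buffer.
}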

@JigaoLuo (Contributor Author) Jul 17, 2025

Hi @devavret, thanks!
Sorry, I wasn’t sure I understood you correctly.
These three lines refer to the else case, where no host-pinned bounce buffer is allocated; that’s actually the only behavior we currently have in RMM.
Do you mean we should just allocate a buffer on every call of this function?

Member

I think @devavret is referring to the deleted function below.

@JigaoLuo (Contributor Author)

I get it now. Let’s first discuss where the buffer should be placed, as mentioned above. Once that’s settled, I can address this detail.

// Case: the value was already staged in the host-pinned bounce buffer.
return **_host_bounce_buffer;
} else {
// Case: Copying with pageable host memory, which may trigger an implicit synchronization.
return _storage.front_element(stream);
@harrism (Member)

This makes me wonder if the bounce-buffer copying support should be in device_buffer instead.

@JigaoLuo (Contributor Author) Jul 18, 2025

Hi @harrism, thanks! Yes, that’s also my main concern, and it’s why I prepared only a draft rather than a polished pull request: there’s a chance it will need to be rewritten again, just as I already did in cuDF.

We can discuss where the buffer should live. cuDF places the buffer in its scalar header, and that’s the behavior I’d like to mimic.
Alternatively, storing it in the RMM device_uvector header is also possible, since the element() call results in a copy as well. The twin issue is here: #1955
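
For reference, device_uvector already returns elements by value, so every element() call performs a device-to-host copy that could route through the same bounce buffer:

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

// Each element() call copies one value device -> host into a local variable
// (pageable stack memory today), which is exactly where a pinned bounce
// buffer could help.
void element_copy_example(rmm::cuda_stream_view stream)
{
  rmm::device_uvector<int> vec(16, stream);  // 16 uninitialized device ints
  int first = vec.element(0, stream);        // returns by value: one D2H copy
  (void)first;
}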


@bdice (Contributor) left a comment

This is a nice starting point. I think I agree with the suggestion that this should be implemented in the device buffer, and then exposed in the device scalar. Small vectors and other containers based on the device buffer would benefit from this, too.
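
A rough sketch of that layering (illustrative names only; not RMM’s or the PR’s actual API) could look like this, with the buffer owning the pinned staging area and the scalar simply forwarding to it:

#include <cuda_runtime_api.h>
#include <cstddef>
#include <cstring>

// Hypothetical layering: the buffer owns a pinned staging area of the same
// size as the device allocation, and every container built on top of it
// reuses the same fast device-to-host path.
class buffer_sketch {
 public:
  explicit buffer_sketch(std::size_t bytes)
  {
    cudaMalloc(&data_, bytes);
    cudaMallocHost(&bounce_, bytes);  // pinned, same size as the device buffer
  }
  ~buffer_sketch()
  {
    cudaFreeHost(bounce_);
    cudaFree(data_);
  }

  // Stage device bytes in pinned memory, sync only this stream, copy out.
  void copy_to_host(void* dst, std::size_t offset, std::size_t bytes, cudaStream_t stream)
  {
    cudaMemcpyAsync(bounce_, static_cast<char*>(data_) + offset, bytes,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    std::memcpy(dst, bounce_, bytes);
  }

 private:
  void* data_{};
  void* bounce_{};
};

// The scalar then forwards instead of owning its own bounce buffer.
template <typename T>
T scalar_value_sketch(buffer_sketch& storage, cudaStream_t stream)
{
  T out{};
  storage.copy_to_host(&out, 0, sizeof(T), stream);
  return out;
}

Sizing the pinned area to match the device allocation, as in this sketch, is one plausible answer to the equal-size follow-up question below, since it mirrors cuDF’s design.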

@github-actions github-actions bot added the CMake label Jul 18, 2025
@JigaoLuo (Contributor Author) commented Jul 18, 2025

> This is a nice starting point. I think I agree with the suggestion that this should be implemented in the device buffer, and then exposed in the device scalar. Small vectors and other containers based on the device buffer would benefit from this, too.

Hi @bdice, thanks for the code review! I’m happy to rewrite it again if needed. One follow-up question: should the device buffer follow cuDF’s design, where the host and device buffers have the same size?

@JigaoLuo JigaoLuo force-pushed the adding-buffer-device_scalar branch from 9e10849 to ad09350 Compare July 19, 2025 06:59
@github-actions github-actions bot removed the CMake label Jul 19, 2025
@JigaoLuo (Contributor Author) commented Jul 23, 2025

Hi all reviewers,

I’d like to close this draft and move to a new one, #1996, where I’ll be rewriting device_buffer. If you think this draft should be reopened, feel free to do so or just ping me. Thanks, and let’s continue the discussion in the new PR draft!
