Use pageable pool in HostAlloc when pinned pool is exhausted#14828
Draft
zpuller wants to merge 6 commits into
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>
NOTE: release/26.06 has been created from main. Please retarget your PR to release/26.06 if it should be included in the release.
Fixes #14840
Depends on NVIDIA/spark-rapids-jni#4593
Description
Adds a pageable pool to HostAlloc. It is used when an alloc request is made with preferPinned=true and the pinned pool is exhausted. The pageable pool uses a newly added pool memory resource type (see the jni PR linked above) that allocates regular malloc-based host memory and pre-writes to it on alloc to force it to be paged into memory. Because it is a pool, this alloc happens at init time. DtoH copies to this type of memory are significantly faster than copies to regular host-allocated memory (roughly 80% of pinned DtoH speeds vs. 10%).

This unblocks enabling gpu kudo writes by default, which is also included in this PR.
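To illustrate the fallback order described above, here is a minimal toy model in Python. It is not the HostAlloc code: `Pool`, `alloc_prefer_pinned`, and `pretouch` are hypothetical names, the pools only count bytes, the 4096-byte page size is an assumption, and what the real allocator does when both pools are exhausted is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Toy fixed-capacity pool that only tracks bytes in use (hypothetical)."""
    name: str
    capacity: int
    used: int = 0

    def try_alloc(self, size: int) -> bool:
        # Succeed only if the request fits in the remaining capacity.
        if size <= self.capacity - self.used:
            self.used += size
            return True
        return False


def alloc_prefer_pinned(pinned: Pool, pageable: Pool, size: int) -> Optional[str]:
    """Return the name of the pool that served the request, or None."""
    # First choice: the pinned pool.
    if pinned.try_alloc(size):
        return pinned.name
    # Pinned pool exhausted: fall back to the pre-touched pageable pool.
    if pageable.try_alloc(size):
        return pageable.name
    # Both pools exhausted; the real behavior here is not modeled.
    return None


def pretouch(buf: bytearray, page_size: int = 4096) -> int:
    """Write one byte per (assumed) page so the OS backs the buffer with
    physical memory up front; returns the number of pages touched."""
    touched = 0
    for offset in range(0, len(buf), page_size):
        buf[offset] = 0
        touched += 1
    return touched


pinned = Pool("pinned", capacity=100)
pageable = Pool("pageable", capacity=200)
served = [alloc_prefer_pinned(pinned, pageable, 60) for _ in range(5)]
print(served)
```

In this model the first request is served from the pinned pool, later requests spill into the pageable pool, and the last one fails once both are full.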
Checklists
Documentation
Testing
(Please provide the names of the existing tests in the PR description.)
Performance
The power run shows an overall 3% improvement. It also shows regressions in q93 and q50, but running those queries in isolation does not reproduce them, so I believe they are a red herring.
power run
isolated runs
Name = query50
Means = 13084.4, 12401.0
Time diff = 683.3999999999996
Speedup = 1.0551084589952422
T-Test (test statistic, p value, df) = 2.453029748373329, 0.03974847145961769, 8.0
T-Test Confidence Interval = 40.9604914520238, 1325.8395085479756
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
Speedup results
query50: Previous (13084.4 ms) vs Current (12401.0 ms) Diff 683 E2E 1.06x
query93: Previous (15595.2 ms) vs Current (14936.0 ms) Diff 659 E2E 1.04x
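The diff and speedup figures in the isolated runs follow directly from the reported mean times. The sketch below recomputes them in Python; `speedup_line` is a hypothetical helper, not part of the benchmark tooling, and it uses only the means quoted above.

```python
def speedup_line(name: str, prev_ms: float, cur_ms: float) -> str:
    """Format a comparison line from two mean runtimes in milliseconds."""
    diff = prev_ms - cur_ms      # positive means the current run is faster
    speedup = prev_ms / cur_ms   # greater than 1.0 means an improvement
    return (f"{name}: Previous ({prev_ms} ms) vs Current ({cur_ms} ms) "
            f"Diff {round(diff)} E2E {speedup:.2f}x")


# Mean times taken from the isolated runs reported in this PR.
print(speedup_line("query50", 13084.4, 12401.0))
print(speedup_line("query93", 15595.2, 14936.0))
```

A speedup above 1.0 on both queries is what supports reading the power-run regressions as noise.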