Use pageable pool in HostAlloc when pinned pool is exhausted #14828

Draft
zpuller wants to merge 6 commits into NVIDIA:main from zpuller:pageable_pool_sr

zpuller commented May 19, 2026

Fixes #14840

Depends on NVIDIA/spark-rapids-jni#4593

Description

Adds a pageable pool to HostAlloc. It is used when an alloc request is made with preferPinned=true and the pinned pool is exhausted. The pageable pool uses a newly added pool memory resource type (see the JNI PR linked above) that allocates regular malloc-based host memory and pre-writes to it on alloc to force it to get paged into memory. Because it is a pool, this alloc happens once at init time. DtoH copies into this type of memory are significantly faster than copies into regular host-allocated memory: roughly 80% of pinned DtoH speed, versus roughly 10% for regular host memory.

This unblocks enabling GPU kudo writes by default, a change that is also included in this PR.
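The fallback order described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the HostAlloc or spark-rapids-jni implementation: `Pool`, `host_alloc`, and the `"unpooled"` result are hypothetical names, both pools are backed by anonymous `mmap` regions (Python cannot pin memory, so "pinned" is just a label here), and the behavior for requests that do not prefer pinned memory, or that fit in neither pool, is assumed to be a plain allocation.

```python
import mmap

PAGE = mmap.PAGESIZE


class Pool:
    """Toy bump allocator over one anonymous mapping (hypothetical, not HostAlloc)."""

    def __init__(self, name, size_bytes, pre_touch=False):
        self.name = name
        self.size = size_bytes
        self.used = 0
        # Anonymous mapping: ordinary pageable host memory. Real pinned
        # memory needs the CUDA runtime, so "pinned" here is only a label.
        self.buf = mmap.mmap(-1, size_bytes)
        if pre_touch:
            # Write one byte per page at init so every page is backed
            # before the first allocation is handed out.
            for offset in range(0, size_bytes, PAGE):
                self.buf[offset] = 0

    def try_alloc(self, nbytes):
        """Return the offset of a new nbytes region, or None if the pool is full."""
        if self.used + nbytes > self.size:
            return None
        offset = self.used
        self.used += nbytes
        return offset


def host_alloc(nbytes, prefer_pinned, pinned_pool, pageable_pool):
    """Return the name of the source that served the request."""
    if prefer_pinned:
        # Pinned pool first, then the pre-touched pageable pool.
        for pool in (pinned_pool, pageable_pool):
            if pool.try_alloc(nbytes) is not None:
                return pool.name
    # Assumption: everything else falls through to a plain allocation.
    return "unpooled"
```

With a two-page pool of each kind, a `prefer_pinned=True` request for two pages is served from `"pinned"`, and the next one-page request is served from `"pageable"` because the pinned pool is now full. The cost of touching every page is paid once in the `Pool` constructor, which mirrors the point that the pre-write happens at init time.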

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

The power run shows an overall 3% improvement (benchmark total 205600.0 ms vs. 199400.0 ms). It also flags regressions in q93 and q50, but running those two queries in isolation does not show a regression (see the isolated runs below), so I believe it's a red herring.

power run
query1: Previous (1475.2 ms) vs Current (1152.6 ms) Diff 322 E2E 1.28x
query2: Previous (1230.6 ms) vs Current (1212.0 ms) Diff 18 E2E 1.02x
query3: Previous (371.4 ms) vs Current (360.8 ms) Diff 10 E2E 1.03x
query4: Previous (3621.6 ms) vs Current (3651.4 ms) Diff -29 E2E 0.99x
query5: Previous (2102.2 ms) vs Current (2075.8 ms) Diff 26 E2E 1.01x
query6: Previous (722.4 ms) vs Current (645.8 ms) Diff 76 E2E 1.12x
query7: Previous (2707.2 ms) vs Current (2402.8 ms) Diff 304 E2E 1.13x
query8: Previous (756.6 ms) vs Current (704.0 ms) Diff 52 E2E 1.07x
query9: Previous (1806.8 ms) vs Current (2506.0 ms) Diff -699 E2E 0.72x
query10: Previous (1279.4 ms) vs Current (1029.6 ms) Diff 249 E2E 1.24x
query11: Previous (2204.8 ms) vs Current (2224.8 ms) Diff -20 E2E 0.99x
query12: Previous (506.2 ms) vs Current (426.4 ms) Diff 79 E2E 1.19x
query13: Previous (1428.6 ms) vs Current (1420.2 ms) Diff 8 E2E 1.01x
query14_part1: Previous (4690.4 ms) vs Current (4371.8 ms) Diff 318 E2E 1.07x
query14_part2: Previous (4243.4 ms) vs Current (3905.0 ms) Diff 338 E2E 1.09x
query15: Previous (934.0 ms) vs Current (811.8 ms) Diff 122 E2E 1.15x
query16: Previous (1405.0 ms) vs Current (1517.8 ms) Diff -112 E2E 0.93x
query17: Previous (1362.0 ms) vs Current (1301.8 ms) Diff 60 E2E 1.05x
query18: Previous (1866.6 ms) vs Current (1491.4 ms) Diff 375 E2E 1.25x
query19: Previous (957.6 ms) vs Current (912.8 ms) Diff 44 E2E 1.05x
query20: Previous (490.0 ms) vs Current (432.6 ms) Diff 57 E2E 1.13x
query21: Previous (489.6 ms) vs Current (451.2 ms) Diff 38 E2E 1.09x
query22: Previous (935.8 ms) vs Current (921.0 ms) Diff 14 E2E 1.02x
query23_part1: Previous (5199.8 ms) vs Current (5052.2 ms) Diff 147 E2E 1.03x
query23_part2: Previous (5721.6 ms) vs Current (5614.4 ms) Diff 107 E2E 1.02x
query24_part1: Previous (5770.0 ms) vs Current (5906.6 ms) Diff -136 E2E 0.98x
query24_part2: Previous (5853.2 ms) vs Current (5570.2 ms) Diff 283 E2E 1.05x
query25: Previous (1346.0 ms) vs Current (1284.0 ms) Diff 62 E2E 1.05x
query26: Previous (651.8 ms) vs Current (589.6 ms) Diff 62 E2E 1.11x
query27: Previous (898.2 ms) vs Current (767.0 ms) Diff 131 E2E 1.17x
query28: Previous (3884.6 ms) vs Current (4075.4 ms) Diff -190 E2E 0.95x
query29: Previous (2613.0 ms) vs Current (2494.4 ms) Diff 118 E2E 1.05x
query30: Previous (1469.0 ms) vs Current (1384.2 ms) Diff 84 E2E 1.06x
query31: Previous (1386.0 ms) vs Current (1365.6 ms) Diff 20 E2E 1.01x
query32: Previous (1052.2 ms) vs Current (1213.2 ms) Diff -161 E2E 0.87x
query33: Previous (870.2 ms) vs Current (827.4 ms) Diff 42 E2E 1.05x
query34: Previous (1659.8 ms) vs Current (1564.0 ms) Diff 95 E2E 1.06x
query35: Previous (1301.8 ms) vs Current (1198.8 ms) Diff 103 E2E 1.09x
query36: Previous (1263.0 ms) vs Current (989.0 ms) Diff 274 E2E 1.28x
query37: Previous (458.6 ms) vs Current (444.0 ms) Diff 14 E2E 1.03x
query38: Previous (1545.6 ms) vs Current (1453.6 ms) Diff 92 E2E 1.06x
query39_part1: Previous (1480.2 ms) vs Current (1427.8 ms) Diff 52 E2E 1.04x
query39_part2: Previous (1177.6 ms) vs Current (1021.8 ms) Diff 155 E2E 1.15x
query40: Previous (1030.6 ms) vs Current (951.4 ms) Diff 79 E2E 1.08x
query41: Previous (259.2 ms) vs Current (252.2 ms) Diff 7 E2E 1.03x
query42: Previous (288.6 ms) vs Current (268.0 ms) Diff 20 E2E 1.08x
query43: Previous (684.2 ms) vs Current (622.2 ms) Diff 62 E2E 1.10x
query44: Previous (739.2 ms) vs Current (645.2 ms) Diff 94 E2E 1.15x
query45: Previous (1067.6 ms) vs Current (904.6 ms) Diff 162 E2E 1.18x
query46: Previous (1093.2 ms) vs Current (1013.2 ms) Diff 80 E2E 1.08x
query47: Previous (1458.2 ms) vs Current (1333.4 ms) Diff 124 E2E 1.09x
query48: Previous (936.8 ms) vs Current (719.2 ms) Diff 217 E2E 1.30x
query49: Previous (1468.4 ms) vs Current (1647.0 ms) Diff -178 E2E 0.89x
query50: Previous (7432.0 ms) vs Current (8175.0 ms) Diff -743 E2E 0.91x
query51: Previous (1443.8 ms) vs Current (1379.0 ms) Diff 64 E2E 1.05x
query52: Previous (404.6 ms) vs Current (373.4 ms) Diff 31 E2E 1.08x
query53: Previous (509.8 ms) vs Current (503.6 ms) Diff 6 E2E 1.01x
query54: Previous (1169.0 ms) vs Current (1076.0 ms) Diff 93 E2E 1.09x
query55: Previous (330.6 ms) vs Current (312.0 ms) Diff 18 E2E 1.06x
query56: Previous (700.2 ms) vs Current (646.6 ms) Diff 53 E2E 1.08x
query57: Previous (1166.4 ms) vs Current (973.4 ms) Diff 193 E2E 1.20x
query58: Previous (798.6 ms) vs Current (627.2 ms) Diff 171 E2E 1.27x
query59: Previous (1434.8 ms) vs Current (1373.4 ms) Diff 61 E2E 1.04x
query60: Previous (911.6 ms) vs Current (848.8 ms) Diff 62 E2E 1.07x
query61: Previous (1153.4 ms) vs Current (959.0 ms) Diff 194 E2E 1.20x
query62: Previous (947.2 ms) vs Current (951.8 ms) Diff -4 E2E 1.00x
query63: Previous (727.8 ms) vs Current (646.6 ms) Diff 81 E2E 1.13x
query64: Previous (8177.8 ms) vs Current (7678.2 ms) Diff 499 E2E 1.07x
query65: Previous (2704.2 ms) vs Current (2706.4 ms) Diff -2 E2E 1.00x
query66: Previous (2260.0 ms) vs Current (2028.2 ms) Diff 231 E2E 1.11x
query67: Previous (8598.2 ms) vs Current (8933.2 ms) Diff -335 E2E 0.96x
query68: Previous (1052.4 ms) vs Current (965.6 ms) Diff 86 E2E 1.09x
query69: Previous (1047.6 ms) vs Current (891.6 ms) Diff 155 E2E 1.17x
query70: Previous (1253.6 ms) vs Current (1223.6 ms) Diff 30 E2E 1.02x
query71: Previous (2837.0 ms) vs Current (2773.8 ms) Diff 63 E2E 1.02x
query72: Previous (1990.0 ms) vs Current (1985.6 ms) Diff 4 E2E 1.00x
query73: Previous (830.8 ms) vs Current (825.4 ms) Diff 5 E2E 1.01x
query74: Previous (1767.8 ms) vs Current (1751.0 ms) Diff 16 E2E 1.01x
query75: Previous (5636.4 ms) vs Current (5347.6 ms) Diff 288 E2E 1.05x
query76: Previous (1506.8 ms) vs Current (1562.8 ms) Diff -56 E2E 0.96x
query77: Previous (879.4 ms) vs Current (759.2 ms) Diff 120 E2E 1.16x
query78: Previous (7633.0 ms) vs Current (7434.4 ms) Diff 198 E2E 1.03x
query79: Previous (865.8 ms) vs Current (739.2 ms) Diff 126 E2E 1.17x
query80: Previous (3545.6 ms) vs Current (3405.8 ms) Diff 139 E2E 1.04x
query81: Previous (1765.2 ms) vs Current (1651.0 ms) Diff 114 E2E 1.07x
query82: Previous (609.0 ms) vs Current (615.6 ms) Diff -6 E2E 0.99x
query83: Previous (571.4 ms) vs Current (520.4 ms) Diff 51 E2E 1.10x
query84: Previous (632.6 ms) vs Current (820.2 ms) Diff -187 E2E 0.77x
query85: Previous (1253.4 ms) vs Current (1230.2 ms) Diff 23 E2E 1.02x
query86: Previous (1058.6 ms) vs Current (853.0 ms) Diff 205 E2E 1.24x
query87: Previous (1505.8 ms) vs Current (1408.4 ms) Diff 97 E2E 1.07x
query88: Previous (3004.8 ms) vs Current (2959.0 ms) Diff 45 E2E 1.02x
query89: Previous (823.8 ms) vs Current (785.0 ms) Diff 38 E2E 1.05x
query90: Previous (623.6 ms) vs Current (648.2 ms) Diff -24 E2E 0.96x
query91: Previous (1007.8 ms) vs Current (798.2 ms) Diff 209 E2E 1.26x
query92: Previous (478.2 ms) vs Current (466.8 ms) Diff 11 E2E 1.02x
query93: Previous (9983.8 ms) vs Current (12322.6 ms) Diff -2338 E2E 0.81x
--------------------------------------------------------------------
Name = query50
Means = 7432.0, 8175.0
Time diff = -743.0
Speedup = 0.9091131498470948
T-Test (test statistic, p value, df) = -4.056845888499365, 0.0036493721317598132, 8.0
T-Test Confidence Interval = -1165.338220269558, -320.6617797304421
ALERT: significant change has been detected (p-value < 0.05)
ALERT: regression in performance has been observed
--------------------------------------------------------------------
Name = query93
Means = 9983.8, 12322.6
Time diff = -2338.800000000001
Speedup = 0.8102023923522632
T-Test (test statistic, p value, df) = -3.684714382976494, 0.006177048192725054, 8.0
T-Test Confidence Interval = -3802.490780575194, -875.1092194248081
ALERT: significant change has been detected (p-value < 0.05)
ALERT: regression in performance has been observed
query94: Previous (2326.6 ms) vs Current (2144.4 ms) Diff 182 E2E 1.08x
query95: Previous (3772.6 ms) vs Current (3760.6 ms) Diff 12 E2E 1.00x
query96: Previous (7207.6 ms) vs Current (5031.4 ms) Diff 2176 E2E 1.43x
--------------------------------------------------------------------
Name = query96
Means = 7207.6, 5031.4
Time diff = 2176.2000000000007
Speedup = 1.4325237508446955
T-Test (test statistic, p value, df) = 12.795660914170304, 1.3125065749762358e-06, 8.0
T-Test Confidence Interval = 1784.010316896265, 2568.3896831037364
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
query97: Previous (1879.0 ms) vs Current (1780.8 ms) Diff 98 E2E 1.06x
query98: Previous (1080.0 ms) vs Current (1072.0 ms) Diff 8 E2E 1.01x
query99: Previous (1292.4 ms) vs Current (1253.4 ms) Diff 39 E2E 1.03x
benchmark: Previous (205600.0 ms) vs Current (199400.0 ms) Diff 6200 E2E 1.03x
--------------------------------------------------------------------
Name = benchmark
Means = 205600.0, 199400.0
Time diff = 6200.0
Speedup = 1.0310932798395185
T-Test (test statistic, p value, df) = 3.3046111035119616, 0.010784464814920392, 8.0
T-Test Confidence Interval = 1873.552744808153, 10526.447255191848
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed

isolated runs
Name = query50
Means = 13084.4, 12401.0
Time diff = 683.3999999999996
Speedup = 1.0551084589952422
T-Test (test statistic, p value, df) = 2.453029748373329, 0.03974847145961769, 8.0
T-Test Confidence Interval = 40.9604914520238, 1325.8395085479756
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed

Speedup results

query50: Previous (13084.4 ms) vs Current (12401.0 ms) Diff 683 E2E 1.06x

query93: Previous (15595.2 ms) vs Current (14936.0 ms) Diff 659 E2E 1.04x
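For readers checking the report figures, the sketch below recomputes the time diff, speedup, and confidence interval for one query from the reported means and t statistic. It assumes the interval is a two-sided 95% interval on the difference of means, which is consistent with the printed numbers but is not stated in the PR. `summarize` is a hypothetical helper, and the critical value is hardcoded for the 8 degrees of freedom shown in the report.

```python
# Two-sided 95% critical value of Student's t for 8 degrees of freedom.
T_CRIT_95_DF8 = 2.306004


def summarize(prev_mean_ms, curr_mean_ms, t_stat):
    """Recompute time diff, speedup and the 95% CI from reported figures."""
    time_diff = prev_mean_ms - curr_mean_ms   # negative means slower
    speedup = prev_mean_ms / curr_mean_ms     # below 1.0 means slower
    # t = diff / standard error, so the reported t implies the standard error.
    std_err = time_diff / t_stat
    half_width = T_CRIT_95_DF8 * abs(std_err)
    return time_diff, speedup, (time_diff - half_width, time_diff + half_width)


# query50 in the power run, using the means and t statistic from the report.
diff, speedup, ci = summarize(7432.0, 8175.0, -4.056845888499365)
print(f"diff={diff:.1f} ms speedup={speedup:.4f} ci=({ci[0]:.1f}, {ci[1]:.1f})")
# prints: diff=-743.0 ms speedup=0.9091 ci=(-1165.3, -320.7)
```

The same call with the isolated-run figures for query50 (13084.4, 12401.0, t = 2.453029748373329) gives a speedup above 1.0 and an interval that lies entirely above zero, which is why the isolated run is reported as an improvement while the power run is reported as a regression.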

abellina and others added 6 commits May 1, 2026 16:01
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>

nvauto commented May 25, 2026

NOTE: release/26.06 has been created from main. Please retarget your PR to release/26.06 if it should be included in the release.

sameerz added the task label (Work required that improves the product but is not user facing) May 26, 2026
sameerz requested a review from revans2 May 26, 2026 20:13

Development

Successfully merging this pull request may close these issues.

[FEA] Enable gpu kudo writes by default