Add standalone C++/HIP examples investigating hipMemAllocationTypeUncached for P2P atomics by Copilot · Pull Request #419 · ROCm/iris

Copilot · 2026-03-04T02:15:59Z

hipMemCreate accepts hipMemAllocationTypeUncached (0x40000000), but CLR still hardcodes the coarse-grained GPU memory pool, so P2P atomics crash exactly as they do with hipMemAllocationTypePinned. The standalone examples below make this reproducible and debuggable on any multi-GPU machine, without iris.

iris/hip.py

Add hipMemAllocationTypeUncached = 0x40000000 constant
Add alloc_type parameter to mem_create() (default: hipMemAllocationTypePinned, backward-compatible)
Thread alloc_type through _HIPVMemP2P and _run_p2p_atomics_hip

tests/unittests/test_hsa_vmem_reproducible.py

test_hip_vmem_uncached_alloc_type: probes hipMemCreate with the uncached type and skips if the driver rejects it; otherwise it tests only a local allocation and a single-rank atomic. Cross-rank access is not tested: empirically it still causes a SIGSEGV (same as pinned), which indicates that CLR ignores the type.
Replace magic tolerances 0.01 / 0.5 with named constants _ATOMIC_EXACT_TOL / _ATOMIC_COUNT_TOL

csrc/standalone/ (new)

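Two minimal, self-contained C++ programs with no MPI or torchrun dependency. Each spawns two ranks via fork + execl("/proc/self/exe"). Re-executing the binary avoids corrupting HSA's internal threads, because a plain fork copies the runtime's state into the child but not its helper threads. The ranks coordinate over a socketpair, passing file descriptors with SCM_RIGHTS.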
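The SCM_RIGHTS handoff is what lets one rank give its exported FD to the other. What follows is a minimal sketch of that step only, assuming Linux/glibc. `send_fd` and `recv_fd` are hypothetical helper names, not code from csrc/standalone/, and the fork + execl self-exec is left out. In the real programs the FD being passed would be the dma-buf FD from `hsa_amd_vmem_export_shareable_handle`, but the mechanism works for any open FD.

```cpp
// Sketch (Linux/glibc): hand an open file descriptor to the other end of a
// Unix-domain socketpair via SCM_RIGHTS. send_fd/recv_fd are hypothetical names.
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Send `fd` over `sock`. Returns 0 on success, -1 on error.
int send_fd(int sock, int fd) {
    char byte = 'F';  // sendmsg must carry at least one byte of ordinary data
    iovec iov{};
    iov.iov_base = &byte;
    iov.iov_len = 1;

    alignas(cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))];
    std::memset(ctrl, 0, sizeof(ctrl));

    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;  // receiver gets a new fd for the same open file
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

// Receive one descriptor from `sock`. Returns the new fd, or -1 on error.
int recv_fd(int sock) {
    char byte = 0;
    iovec iov{};
    iov.iov_base = &byte;
    iov.iov_len = 1;

    alignas(cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))];
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    if (recvmsg(sock, &msg, 0) != 1) return -1;
    cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == nullptr || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd = -1;
    std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A quick self-check is to send the write end of a `pipe()` across a `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`, write through the received descriptor, and read the bytes back from the pipe's read end. The received descriptor has a different number but refers to the same open file.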

p2p_atomics_hsa.cpp — Path 3 (HSA fine-grained, correct)

```cpp
// Allocate from the fine-grained pool directly; CLR never reaches this path
hsa_amd_vmem_alloc_handle_t handle;
hsa_amd_vmem_handle_create(fine_grained_pool, size, MEMORY_TYPE_NONE, 0, &handle);
int dmabuf_fd = -1;
hsa_amd_vmem_export_shareable_handle(&dmabuf_fd, handle, 0);
// ... SCM_RIGHTS exchange ...
hsa_amd_vmem_alloc_handle_t peer_handle;
hsa_amd_vmem_import_shareable_handle(peer_fd, &peer_handle);
// P2P sys-scope atomics → 0/N failures
```

p2p_atomics_hip.cpp — Path 2 (HIP VMem, always coarse-grained)

```sh
./p2p_atomics_hip                # safe: P2P non-atomic read only
./p2p_atomics_hip --uncached     # same setup with hipMemAllocationTypeUncached
./p2p_atomics_hip --atomics      # WARNING: GPU page fault (SIGSEGV); demonstrates the bug
```

CLR routes both --pinned and --uncached through the coarse-grained pool (CoarseGrain=1 in KFD). P2P atomics crash identically for both.

Implementation notes:

hipMemset/hipMemcpy(D2H) silently fail on HSA VMem VAs (not registered in HIP's pointer tables); init and readback use k_zero / k_copy kernels with a hipMalloc bounce buffer
k_copy calls __threadfence_system() before the read for cross-GPU write visibility
Agent-scope atomic results are informational only; sys-scope drives PASS/FAIL
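The verdict rule in these notes (sys-scope failures decide the result, agent-scope counts are only reported) can be sketched host-side as follows. This is an illustration, not code from p2p_atomics_hsa.cpp or p2p_atomics_hip.cpp. `ScopeTally`, `tally`, and `verdict` are hypothetical names. The counter protocol is also an assumption: each rank adds 1 to one shared counter `iters` times, so a correct trial ends at `ranks * iters`. The `observed` values stand in for what the k_copy readback through the bounce buffer would return.

```cpp
// Host-side sketch of the PASS/FAIL bookkeeping only; no GPU work happens here.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical per-scope tally: how many trials ended with a wrong counter value.
struct ScopeTally {
    int trials;
    int failures;
};

// Assumed protocol: each of `ranks` ranks performs `iters` atomic +1 adds on one
// shared counter, so a correct trial must end at exactly ranks * iters.
ScopeTally tally(const std::vector<std::uint64_t>& observed, int ranks, std::uint64_t iters) {
    const std::uint64_t expected = static_cast<std::uint64_t>(ranks) * iters;
    ScopeTally t{static_cast<int>(observed.size()), 0};
    for (std::uint64_t v : observed)
        if (v != expected) ++t.failures;
    return t;
}

// Agent-scope results are printed for information; only sys scope decides PASS/FAIL.
bool verdict(const ScopeTally& sys, const ScopeTally& agent) {
    std::printf("agent-scope: %d/%d failures (informational)\n", agent.failures, agent.trials);
    std::printf("sys-scope:   %d/%d failures\n", sys.failures, sys.trials);
    return sys.failures == 0;
}
```

A `true` result corresponds to the "0/N failures" outcome reported for sys-scope atomics on HSA fine-grained memory. Any nonzero sys-scope count is a FAIL, whatever the agent-scope tally says.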

Original prompt

This section details the original issue you should resolve.

<issue_title>VMem allocator and atomics</issue_title>
<issue_description>From previous tests, only the runner machine had issues running some atomics (all-reduce, for example) when using the vmem allocator. It could be a bad initialization flag. I am not able to reproduce the bug on other GPUs. Please use the vmem allocator, run all unit tests, and figure out a fix. For reference, consider shallow cloning https://github.com/ROCm/rocm-systems to understand how the low-level APIs work. Your outcomes should be two things: (1) a minimal repro that doesn't include any iris code and shows the bug, and (2) ideally a bug-fix patch (I suspect the code uses a wrong flag when allocating memory, or something similar).</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes VMem allocator and atomics #418

…with malloc_fine_grained

The root cause of intermittent P2P atomic failures with VMemAllocator was that hipMemCreate creates coarse-grained GPU memory, which does not support correct cross-GPU atomic operations for scope=cta or scope=gpu.

Changes:
- iris/allocators/vmem_allocator.py: Complete rewrite to use malloc_fine_grained (hipExtMallocWithFlags with hipDeviceMallocFinegrained) for physical memory. Add establish_peer_access() using import_dmabuf_handle (hipImportExternalMemory), which preserves fine-grained memory type for cross-GPU atomic correctness. Remove get_allocation_segments() (no longer needed for single allocation model). Change import_external_tensor() to copy semantics (same as TorchAllocator).
- iris/symmetric_heap.py: Update as_symmetric() docstring to reflect copy semantics for both torch and vmem allocators.
- tests/unittests/test_atomic_add_triton.py: Add test_atomic_add_vmem() covering all scope/sem combinations for VMemAllocator.
- tests/unittests/test_vmem_allocator.py: Update test_vmem_import_external_tensor to reflect copy semantics.
- tests/unittests/test_pytorch_import_mechanism.py: Update test_as_symmetric_basic to reflect copy semantics (no shared memory with external tensor).

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Initial plan

f5f4518

Copilot AI assigned Copilot and mawad-amd Mar 4, 2026

Copilot started work on behalf of mawad-amd March 4, 2026 02:16

Copilot stopped work on behalf of mawad-amd due to an error March 4, 2026 03:22
Copilot has encountered an internal error. If the problem persists, please contact GitHub Support, including the request ID `D7B8:FA51:11A8205:1442095:69A7A58C`. To retry, leave a comment on this pull...

Copilot started work on behalf of mawad-amd March 4, 2026 04:19

Copilot stopped work on behalf of mawad-amd due to an error March 4, 2026 05:23
Copilot has encountered an internal error. If the problem persists, please contact GitHub Support, including the request ID `345B:30BAD7:18BF256:1C30B2C:69A7C1BB`. To retry, leave a comment on this pu...

Copilot started work on behalf of mawad-amd March 4, 2026 08:14

Copilot AI and others added 2 commits March 4, 2026 08:53

Address code review: use Any type annotation, rename peer_heap to peer_heap_base

c20f920

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Fix vmem allocator and atomics issue in runner machine" to "Fix VMemAllocator P2P atomic failures by switching from coarse-grained to fine-grained memory" Mar 4, 2026

Copilot finished work on behalf of mawad-amd March 4, 2026 08:58

Copilot started work on behalf of mawad-amd March 4, 2026 13:04

Add HSA VMem Path 3: APIs, minimal P2P repro, and HsaVMemAllocator

c5581cd

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI changed the title from "Fix VMemAllocator P2P atomic failures by switching from coarse-grained to fine-grained memory" to "Add HSA VMem (Path 3): direct fine-grained VMem APIs, minimal P2P repro, and HsaVMemAllocator" Mar 4, 2026

Copilot finished work on behalf of mawad-amd March 4, 2026 13:45

Copilot started work on behalf of mawad-amd March 4, 2026 14:12

Confirm HSA VMem P2P atomic correctness vs HIP VMem bug at repro level

31e7633

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI changed the title from "Add HSA VMem (Path 3): direct fine-grained VMem APIs, minimal P2P repro, and HsaVMemAllocator" to "Add HSA VMem (Path 3) as fine-grained alternative to broken HIP VMem for P2P atomics" Mar 4, 2026

Copilot finished work on behalf of mawad-amd March 4, 2026 14:34

Copilot started work on behalf of mawad-amd March 4, 2026 14:38

Try hipMemAllocationTypeUncached: add constant, alloc_type param, and probe test

cde093a

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI changed the title from "Add HSA VMem (Path 3) as fine-grained alternative to broken HIP VMem for P2P atomics" to "Confirm HSA VMem (Path 3) fixes P2P atomic coherency; probe hipMemAllocationTypeUncached" Mar 4, 2026

Copilot finished work on behalf of mawad-amd March 4, 2026 14:51

Copilot started work on behalf of mawad-amd March 4, 2026 15:01

Add standalone C++/HIP examples: P2P VMem atomics (HSA fine-grained vs HIP coarse-grained)

79d8680

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI changed the title from "Confirm HSA VMem (Path 3) fixes P2P atomic coherency; probe hipMemAllocationTypeUncached" to "Add standalone C++/HIP examples investigating hipMemAllocationTypeUncached for P2P atomics" Mar 4, 2026

Copilot finished work on behalf of mawad-amd March 4, 2026 15:45

Copilot wants to merge 7 commits into main from copilot/fix-vmem-allocator-atomics