Conversation
Pull Request Overview
This PR adds experimental InfiniBand RDMA (Remote Direct Memory Access) support to Iris for multi-node GPU communication. The implementation provides a symmetric heap model with RDMA operations (put/get/atomics) accessible from Triton kernels, using PyTorch distributed for bootstrapping and InfiniBand for high-performance inter-node communication.
Key changes:
- RDMA backend with InfiniBand support (optional build via CMake)
- CPU-GPU work queue for asynchronous RDMA operations (a sketch of this pattern follows this list)
- Triton device APIs for RDMA put/get/atomic operations with symmetric heap addressing
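For orientation, here is a minimal sketch of the CPU-GPU work-queue pattern described above: producers enqueue operation descriptors into a ring buffer that a host proxy thread drains and posts as RDMA verbs. All type names and fields are illustrative assumptions, not the PR's actual implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Illustrative work descriptor: one RDMA operation requested by a GPU thread.
struct WorkItem {
  uint32_t opcode;       // e.g. PUT, GET, ATOMIC (hypothetical encoding)
  uint64_t local_addr;   // offset into the local symmetric heap
  uint64_t remote_addr;  // offset into the peer's symmetric heap
  uint32_t size;         // payload size in bytes
  uint32_t peer;         // destination rank
};

// Single-producer/single-consumer ring buffer. In the real design the
// producer side would live in GPU-visible memory and be written from
// Triton kernels; both sides are host code here for simplicity.
template <size_t N>
struct WorkQueue {
  WorkItem items[N];
  std::atomic<uint64_t> head{0};  // next slot to fill (producer)
  std::atomic<uint64_t> tail{0};  // next slot to drain (proxy thread)

  bool push(const WorkItem& w) {
    uint64_t h = head.load(std::memory_order_relaxed);
    if (h - tail.load(std::memory_order_acquire) == N) return false;  // full
    items[h % N] = w;
    head.store(h + 1, std::memory_order_release);
    return true;
  }

  // Called by the proxy thread, which would turn drained items into
  // ibverbs work requests (ibv_post_send and friends).
  bool pop(WorkItem& out) {
    uint64_t t = tail.load(std::memory_order_relaxed);
    if (t == head.load(std::memory_order_acquire)) return false;  // empty
    out = items[t % N];
    tail.store(t + 1, std::memory_order_release);
    return true;
  }
};
```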
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| setup.py | Adds CMake build system for optional RDMA C++ extension with InfiniBand detection |
| iris/experimental/iris_rdma.py | Main Python API providing RDMA context, symmetric heap, and Triton device APIs |
| iris/experimental/iris_rdma/python/bindings.cpp | PyBind11 bindings exposing C++ RDMA backend to Python |
| iris/experimental/iris_rdma/src/*.hpp | C++ implementation: network backend, queue pairs, work queue, proxy thread, logging |
| iris/experimental/__init__.py | Exports iris_rdma module with optional import handling |
| examples/22-24_rdma_* | Example programs demonstrating producer-consumer, GET, and atomic operations |
| docker/* | Updated Dockerfile and scripts with InfiniBand device support |
| run.sh, rebuild.sh | Helper scripts for running and rebuilding |
Comments suppressed due to low confidence (1)
iris/experimental/iris_rdma/src/iris_manager.hpp:1
- Corrected spelling of 'its' to 'it's' in comment.
```cpp
// SPDX-License-Identifier: MIT
```
```cpp
void dump_cq_info() const {
  LOG_DEBUG("cq: %p", cq_);
  LOG_DEBUG("handle: %u", cq_->channel);
  LOG_DEBUG("cq_context: %p", cq_->cq_context);
  LOG_DEBUG("context: %p", cq_->context);
  LOG_DEBUG("cqe: %u", cq_->cqe);
  LOG_DEBUG("comp_events_completed: %u", cq_->comp_events_completed);
  LOG_DEBUG("async_events_completed: %u", cq_->async_events_completed);
}
```
The dump_cq_info() function appears to be a debugging utility that is called in production code (see network_backend.hpp line 489). Consider removing this call from the hot path (poll_cq) or guarding it behind a debug flag to avoid performance overhead in production.
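A guarded call could look like this minimal sketch, assuming a compile-time IRIS_DEBUG flag (hypothetical name; a runtime log-level check would work equally well):

```cpp
// Hypothetical guard: emit CQ diagnostics only in debug builds, keeping
// dump_cq_info() out of the poll_cq() hot path in release builds.
#ifdef IRIS_DEBUG
  dump_cq_info();
#endif
```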
```cpp
int sq_length = 64; // Send queue length // TODO: FIX THAT
```
The TODO comment "FIX THAT" is vague and doesn't explain what needs to be fixed. Consider clarifying what specific issue needs to be addressed (e.g., "TODO: Make queue length configurable" or "TODO: Calculate optimal queue length based on workload").
Suggested change:
```cpp
int sq_length = 64; // Send queue length
// TODO: Make send queue length (sq_length) configurable or calculate based on workload/device capabilities
```
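One possible shape for the fix, as a sketch: read the length from an IRIS_RDMA_SQ_LENGTH environment variable (hypothetical name), keeping the current value as the fallback.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: read the send queue length from the environment,
// keeping the current hard-coded 64 as the default.
static int send_queue_length() {
  if (const char* env = std::getenv("IRIS_RDMA_SQ_LENGTH")) {
    try {
      return std::stoi(env);
    } catch (...) {
      // Malformed value: fall through to the default.
    }
  }
  return 64;
}
```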
```cpp
  std::this_thread::sleep_for(std::chrono::microseconds(10));
}
if (n <= 0) {
  LOG_DEBUG("Warning: PUT completion not polled (may be OK if async)");
```
Using LOG_DEBUG for a warning message is inconsistent. Consider using LOG_WARN for warning messages to maintain proper log level semantics.
| LOG_DEBUG("Warning: PUT completion not polled (may be OK if async)"); | |
| LOG_WARN("Warning: PUT completion not polled (may be OK if async)"); |
```cpp
  std::this_thread::sleep_for(std::chrono::microseconds(10));
}
if (n <= 0) {
  LOG_DEBUG("Warning: GET completion not polled (may be OK if async)");
```
Similar to Comment 4, these warning messages use LOG_DEBUG instead of LOG_WARN. For consistency and proper log level semantics, warnings should use LOG_WARN.
| LOG_DEBUG("Warning: GET completion not polled (may be OK if async)"); | |
| LOG_WARN("Warning: GET completion not polled (may be OK if async)"); |
```cpp
  std::this_thread::sleep_for(std::chrono::microseconds(10));
}
if (n <= 0) {
  LOG_DEBUG("Warning: ATOMIC_EXCH completion not polled (may be OK if async)");
```
Similar to Comment 4, these warning messages use LOG_DEBUG instead of LOG_WARN. For consistency and proper log level semantics, warnings should use LOG_WARN.
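For context, a minimal sketch of the threshold semantics these macros usually imply (the logger internals here are illustrative, not the PR's actual logging header):

```cpp
#include <cstdio>

// Illustrative threshold-based logging: messages below the configured level
// are filtered out, so a warning logged via LOG_DEBUG is silently dropped
// in a typical production configuration where the threshold is Warn.
enum class LogLevel { Debug = 0, Info = 1, Warn = 2, Error = 3 };

static LogLevel g_log_level = LogLevel::Warn;

#define LOG_AT(level, ...)                      \
  do {                                          \
    if ((level) >= g_log_level) {               \
      std::fprintf(stderr, __VA_ARGS__);        \
      std::fprintf(stderr, "\n");               \
    }                                           \
  } while (0)

#define LOG_DEBUG(...) LOG_AT(LogLevel::Debug, __VA_ARGS__)
#define LOG_WARN(...)  LOG_AT(LogLevel::Warn, __VA_ARGS__)
```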
```python
# Extract source address (min of pointer block where data is stored)
src_ptr_u64 = src_ptr.to(tl.uint64)
src_ptr_val = tl.min(src_ptr_u64, axis=0)
max_src_ptr = tl.max(src_ptr_u64, axis=0)
```
Variable max_src_ptr is not used.
Suggested change: delete the unused `max_src_ptr = tl.max(src_ptr_u64, axis=0)` line.
```python
def build_extension(self, ext):
    if not isinstance(ext, CMakeExtension):
        return super().build_extension(ext)
```
Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
Suggested change:
```python
super().build_extension(ext)
```
```python
import triton
import triton.language as tl
import numpy as np
import sys
```
Import of 'sys' is not used.
Suggested change: remove `import sys`.
```python
import torch.distributed as dist
import triton
import triton.language as tl
import time
```
Import of 'time' is not used.
Suggested change: remove `import time`.
```python
import torch.distributed as dist
import triton
import triton.language as tl
import time
```
Import of 'time' is not used.
Suggested change: remove `import time`.
Motivation
Add an RDMA + proxy-thread backend to Iris.
Technical Details
It is not yet clear how to merge this backend into the Iris RMA backend, but the goal is a single backend for both.
Test Plan
Test Result
Submission Checklist