
Ensure to use correct GPU device in RunSince when it's invoked by new thread #24192

Open
chilo-ms wants to merge 12 commits into main
Conversation

@chilo-ms (Contributor) commented Mar 26, 2025

Running a CUDA kernel on the incorrect GPU device results in `CUDA error: invalid resource handle`.

CUDA EP and TRT EP both have this issue when ExecutionMode::ORT_PARALLEL is enabled.

Repro code:

provider = [
    [
        ('TensorrtExecutionProvider', {'device_id': 0}),
    ],
    [
        ('TensorrtExecutionProvider', {'device_id': 1}),
    ],
]

class ThreadObj:
    def __init__(self, model_path: str, iterations: int, idx: int):
        ...
        sess_opt = ort.SessionOptions()
        sess_opt.execution_mode = ort.ExecutionMode.ORT_PARALLEL
        self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2])

    def warmup(self):
        self.inference_session.run(None, self.input)

    def run(self, thread_times, threads_complete):
        for _ in range(self.iterations):
            self.inference_session.run(None, self.input)

def thread_target(obj, thread_times, threads_complete):
    obj.run(thread_times, threads_complete)

...

iterations = 500
num_threads = 13
t_obj_list = []
thread_list = []

for tidx in range(num_threads):
    obj = ThreadObj(model_path, iterations, tidx)
    t_obj_list.append(obj)
    obj.warmup()
    
for t_obj in t_obj_list:
    thread = threading.Thread(target=thread_target, daemon=True, args=(t_obj, thread_times, threads_complete))
    thread.start()
    thread_list.append(thread)

...

The reason is that when the inference session is initialized, it can be bound to a device other than 0, whereas RunSince can be invoked by a new thread, and new threads default to device 0. When that happens, kernels run on the wrong GPU device and the error above is hit.
This PR provides a general fix for both the CUDA EP and the TRT EP: call cudaSetDevice in RunSince.
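The failure mode can be illustrated without a GPU. The following is a minimal sketch (not real CUDA calls) that mimics the CUDA runtime's per-thread current-device state with `threading.local`; `fake_cuda_set_device`, `fake_cuda_get_device`, and `run_since` are hypothetical stand-ins for `cudaSetDevice`, `cudaGetDevice`, and the session's RunSince entry point.

```python
import threading

# Toy stand-in for CUDA's per-thread "current device" state: the CUDA
# runtime tracks the current device per host thread, and a freshly
# spawned thread starts on device 0.
_tls = threading.local()

def fake_cuda_set_device(device):
    _tls.device = device

def fake_cuda_get_device():
    # A thread that never called set_device sees the default device 0.
    return getattr(_tls, "device", 0)

def run_since(session_device, apply_fix, result):
    # Simulates RunSince entered from a new thread: without the explicit
    # set-device call, work lands on device 0 even though the session was
    # initialized on another GPU.
    if apply_fix:
        fake_cuda_set_device(session_device)
    result.append(fake_cuda_get_device())

def observed_device(apply_fix, session_device=1):
    result = []
    t = threading.Thread(target=run_since, args=(session_device, apply_fix, result))
    t.start()
    t.join()
    return result[0]

print(observed_device(apply_fix=False))  # 0: the bug (thread default)
print(observed_device(apply_fix=True))   # 1: the fix pins the device
```

This mirrors why the fix belongs at the top of RunSince rather than at session-initialization time: the device binding is thread-local, so it must be re-established in whichever thread actually executes the run.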

@chilo-ms chilo-ms requested review from tianleiwu and jywu-msft March 26, 2025 19:26
@jywu-msft (Member)
Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

@chilo-ms (Contributor, Author)

> Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

I thought about adding a test, but it needs multiple GPUs to run.
I'm checking whether our CI has multiple GPUs.
