
Ensure to use correct GPU device in RunSince when it's invoked by new thread #24192

Open
chilo-ms wants to merge 12 commits into main
Conversation

@chilo-ms (Contributor) commented Mar 26, 2025

Running a CUDA kernel on the incorrect GPU device results in `CUDA error: invalid resource handle`.

CUDA EP and TRT EP both have this issue when ExecutionMode::ORT_PARALLEL is enabled.

Repro code:

provider = [
    [
        ('TensorrtExecutionProvider', {'device_id': 0}),
    ],
    [
        ('TensorrtExecutionProvider', {'device_id': 1}),
    ],
]

class ThreadObj:
    def __init__(self, model_path: str, iterations: int, idx: int):
        ...
        sess_opt = ort.SessionOptions()
        sess_opt.execution_mode = ort.ExecutionMode.ORT_PARALLEL
        self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2])

    def warmup(self):
        self.inference_session.run(None, self.input)

    def run(self, thread_times, threads_complete):
        for _ in range(self.iterations):
            self.inference_session.run(None, self.input)

def thread_target(obj, thread_times, threads_complete):
    obj.run(thread_times, threads_complete)

...

iterations = 500
num_threads = 13
t_obj_list = []
thread_list = []

for tidx in range(num_threads):
    obj = ThreadObj(model_path, iterations, tidx)
    t_obj_list.append(obj)
    obj.warmup()
    
for t_obj in t_obj_list:
    thread = threading.Thread(target=thread_target, daemon=True, args=(t_obj, thread_times, threads_complete))
    thread.start()
    thread_list.append(thread)

...

The reason is that when the inference session is initialized, it can be bound to a device other than 0, whereas RunSince can be invoked by a new thread, and new threads default to device 0. When that happens, kernels run on the wrong GPU device and the error above is hit.
This PR provides a general fix for both the CUDA EP and the TRT EP: call cudaSetDevice in RunSince.
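The failure mode can be illustrated without a GPU. The following is a minimal sketch (not real CUDA calls) that mimics the CUDA runtime's per-thread current-device state with `threading.local`; `fake_cuda_set_device`, `fake_cuda_get_device`, and `run_since` are hypothetical stand-ins for `cudaSetDevice`, `cudaGetDevice`, and the session's RunSince entry point.

```python
import threading

# Toy stand-in for CUDA's per-thread "current device" state: the CUDA
# runtime tracks the current device per host thread, and a freshly
# spawned thread starts on device 0.
_tls = threading.local()

def fake_cuda_set_device(device):
    _tls.device = device

def fake_cuda_get_device():
    # A thread that never called set_device sees the default device 0.
    return getattr(_tls, "device", 0)

def run_since(session_device, apply_fix, result):
    # Simulates RunSince entered from a new thread: without the explicit
    # set-device call, work lands on device 0 even though the session was
    # initialized on another GPU.
    if apply_fix:
        fake_cuda_set_device(session_device)
    result.append(fake_cuda_get_device())

def observed_device(apply_fix, session_device=1):
    result = []
    t = threading.Thread(target=run_since, args=(session_device, apply_fix, result))
    t.start()
    t.join()
    return result[0]

print(observed_device(apply_fix=False))  # 0: the bug (thread default)
print(observed_device(apply_fix=True))   # 1: the fix pins the device
```

This mirrors why the fix belongs at the top of RunSince rather than at session-initialization time: the device binding is thread-local, so it must be re-established in whichever thread actually executes the run.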

@chilo-ms chilo-ms requested review from tianleiwu and jywu-msft March 26, 2025 19:26
@jywu-msft (Member)
Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

@chilo-ms (Contributor, Author)

> Can a test be added for CUDA EP/TRT EP to stress this? (or existing test enhanced)

I thought about adding a test, but it needs multiple GPUs to run.
I'm checking whether our CI has multiple GPUs.
