
TensorRT backend does not work when device_id is greater than 0 #19467

Closed
jax11235 opened this issue Feb 8, 2024 · 15 comments
Assignees
Labels
ep:TensorRT issues related to TensorRT execution provider

Comments

@jax11235

jax11235 commented Feb 8, 2024

Describe the issue

The CUDA backend runs fine on multiple GPUs, but TensorRT fails when device_id is specified as a value greater than 0.
The only successful workaround I have found so far is to use multiple processes, each with a different CUDA_VISIBLE_DEVICES=device_id environment variable.

To reproduce

...

Urgency

Urgent: the project is organized into multiple threads, and no workaround works.

Platform

Linux

OS Version

20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.3 and 1.17.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

TensorRT 8.6

@tianleiwu tianleiwu added the ep:TensorRT issues related to TensorRT execution provider label Feb 9, 2024
@chilo-ms
Contributor

chilo-ms commented Feb 9, 2024

There is some related conversation here:
#16274

@chilo-ms
Contributor

chilo-ms commented Feb 9, 2024

Can you share your C++ code of how you set device id?

We suggest using session_options.AppendExecutionProvider_TensorRT_V2(tensorrt_options);
please see here for reference:
https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#click-below-for-c-api-example

@jax11235
Author

This is the part of my code that appends the TensorRT backend to sess_options:

    OrtApi const& ortApi = Ort::GetApi(); // Uses ORT_API_VERSION
    OrtTensorRTProviderOptionsV2* tensorrt_options = nullptr;
    Ort::ThrowOnError(ortApi.CreateTensorRTProviderOptions(&tensorrt_options));
    std::vector<const char*> keys{"device_id", "trt_max_workspace_size", "trt_fp16_enable", "trt_engine_cache_enable", "trt_engine_cache_path", "trt_timing_cache_enable"};
    std::vector<const char*> values{device_id.c_str(), memory_limit.c_str(), "1", "1", tensorrt_engine_cache_path.c_str(), "1"};
    Ort::ThrowOnError(ortApi.UpdateTensorRTProviderOptions(tensorrt_options, keys.data(), values.data(), keys.size()));
    Ort::ThrowOnError(ortApi.SessionOptionsAppendExecutionProvider_TensorRT_V2(so, tensorrt_options));
    ortApi.ReleaseTensorRTProviderOptions(tensorrt_options);

@chilo-ms
Contributor

chilo-ms commented Feb 10, 2024

Thanks, the code to append TRT EP to the session options looks okay.

Could you also share

  • the application code where it creates multiple sessions and specifies a different device_id to initialize each session with TRT EP, and where you set CUDA_VISIBLE_DEVICES?
  • the error message

Please note that users can only specify device_id at session initialization time, and it is one GPU per session.
Once the session is created, you can't change the device. That means when you call session.Run(), TRT EP always uses that device to perform inference; I think even if you set CUDA_VISIBLE_DEVICES=device_id before session.Run(), it won't have any effect.

I'm still not sure what your application code looks like, but we tested on our side: TRT EP can run on a different device within one session through the provider option "device_id":
onnxruntime_perf_test -e tensorrt -r 1 -i "device_id|2" model.onnx

@chilo-ms
Contributor

Replying to the other discussion thread here:

> What I mean is that different threads use different sessions, but if the first thread initializes its session with the environment variable CUDA_VISIBLE_DEVICES, subsequent threads cannot change the device by modifying CUDA_VISIBLE_DEVICES; they use the same device as the first thread.

"Subsequent threads cannot change the device by modifying CUDA_VISIBLE_DEVICES": do the other threads try to change the device at session creation, or at session.Run()?

@jax11235
Author

> Could you also share the application code where it creates multiple sessions and specifies different device_id ... and the error message

Without setting CUDA_VISIBLE_DEVICES, the way I create sessions is equivalent to creating an array of sessions, each with a different device_id appended to its own session_options.
The error message is "TensorRT EP execution context enqueue failed", the same as in #16274.

@jax11235
Author

> "subsequent threads can not change the device by modifying CUDA_VISIBLE_DEVICES" ... do the other threads try to change the device at session creation, or at session.Run()?

My previous description was inaccurate; I create these sessions sequentially in a single thread, like:

for (int i...) {
    setenv("CUDA_VISIBLE_DEVICES", device_ids[i], 1);
    sessions[i] = ...; // create a new session with a different session_options, all with device_id=0
    sessions[i].run(...); // warmup
}

The above code can use a device with device_id > 0, but all sessions end up on the same device, device_ids[0].

@chilo-ms
Contributor

chilo-ms commented Feb 13, 2024

> Without setting CUDA_VISIBLE_DEVICES, the way I create sessions is equivalent to creating a session array, each session with a different device_id appended to a new session_option. The error message is TensorRT EP execution context enqueue failed, the same as #16274.

That's a bit strange; I can't repro on my side with device_id specified by provider option:

for (int i...) {
    std::vector<const char*> keys{"device_id"};
    std::vector<const char*> values{device_id[i].c_str()};
    Ort::ThrowOnError(ortApi.UpdateTensorRTProviderOptions(tensorrt_options, keys.data(), values.data(), keys.size()));
    sessions[i] = ...; // create a new session with a different session_options with different device id
    sessions[i].run(...); // warmup
}

> The error message is TensorRT EP execution context enqueue failed, the same as #16274.

Is that all of the error message?
The linked issue has another error message: Error Code 1: Cuda Runtime (invalid resource handle)

Could you enable the verbose log by adding the line of code below, so we can see TRT EP's full log?

Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "test");

You will see the device_id configured by the TRT EP provider options (you should see a different device_id for each session):

...
2024-02-13 18:45:22.629147053 [V:onnxruntime:test, tensorrt_execution_provider.cc:1700 TensorrtExecutionProvider] [TensorRT EP] TensorRT provider options: device_id: 2, trt_max_partition_iterations: 1000, trt_min_subgraph_size: 1, trt_max_workspace_size: 1073741824, trt_fp16_enable: 0, trt_int8_enable: 0, trt_int8_calibration_cache_name: , int8_calibration_cache_available: 0, trt_int8_use_native_tensorrt_calibration_table: 0, trt_dla_enable: 0, trt_dla_core: 0, trt_dump_subgraphs: 0, trt_engine_cache_enable: 0, trt_cache_path: , trt_global_cache_path: , trt_engine_decryption_enable: 0, trt_engine_decryption_lib_path: , trt_force_sequential_engine_build: 0, trt_context_memory_sharing_enable: 0, trt_layer_norm_fp32_fallback: 0, trt_build_heuristics_enable: 0, trt_sparsity_enable: 0, trt_builder_optimization_level: 3, trt_auxiliary_streams: -1, trt_tactic_sources: , trt_profile_min_shapes: , trt_profile_max_shapes: , trt_profile_opt_shapes: , trt_cuda_graph_enable: 0, trt_dump_ep_context_model: 0, trt_ep_context_file_path: , trt_ep_context_embed_mode: 0, trt_cache_prefix:
...

@chilo-ms
Contributor

chilo-ms commented Feb 13, 2024

> My previous description was inaccurate, I create these sessions sequentially in a single thread ... all sessions use the same device with device_id=device_ids[0].

I can't repro on my side using CUDA_VISIBLE_DEVICES either.

@chilo-ms
Contributor

chilo-ms commented Feb 13, 2024

Follow-up questions:

  1. How do you check which device_id each session is using?
  2. Can you create one session with device_id > 0 via TRT EP provider options and run it successfully?
     Or can the session only run successfully with device_id = 0?

@jax11235
Author

> That's a bit strange; I can't repro on my side with device_id specified by provider option ... Could you enable the verbose log ... to see TRT EP's full log?

Thanks for your reply. I have switched to the multi-process version, and it runs fine on multiple CUDA devices.
Since you can't reproduce it, I currently think it may be a non-code issue; I will do more tests when I have time.

@jax11235
Author

> 1. How do you check which device_id each session is using?
> 2. Can you create one session with device_id > 0 via TRT EP provider options and run it successfully? Or can the session only run successfully with device_id = 0?

1. The CUDA EP works on the correct device with the same code; I just switch the EP to TensorRT.
2. No. If I set CUDA_VISIBLE_DEVICES, I can use a device with device_id > 0, but the device_id specified in session_options is still 0.

@jax11235
Author

I've decided to close this issue and will reopen it when there is new progress.

@chilo-ms
Contributor

chilo-ms commented Mar 14, 2025

@jax11235
We recently root-caused a multithreading issue that involves running on a GPU device > 0 and fixed it for TRT EP:
#24010
It seems to be the issue you encountered here.

Could you please try this fix? It should resolve your problem.

@jax11235
Author

Good job! I plan to test it soon.
