Description
The generated TRT engine has significantly higher computational overhead than expected.
The network I'm currently using is a test network. I'm trying to develop an algorithm with branch switching, and this test network serves as groundwork for my subsequent development. Its structure is very simple: it takes a 1x3x224x224 tensor as input, makes a simple if-else decision, and then routes the tensor through either a resnet18 branch or a resnet101 branch depending on the result.
According to the speed test results, the engine takes about 3.8 ms no matter which branch the network takes, with little variation. As a control experiment, I also tested networks without branch switching, using only resnet18 or only resnet101; these took about 2.9 ms and 0.8 ms respectively, which is normal and meets expectations.
I have uploaded the code for generating the ONNX file and the code for converting it to a TRT engine.
I can't figure out what I might have done wrong. Even with such a simple test model, I can't achieve the expected results. I'd appreciate a reply as soon as possible.
Environment
TensorRT Version: 10.7.0.3
NVIDIA GPU: NVIDIA GeForce RTX 4070 Laptop
NVIDIA Driver Version: 556.12
CUDA Version: 12.5
CUDNN Version: 8.4.0
Operating System: Windows 11
Python Version (if applicable): 3.9.21
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.12.0+cu116
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Please include:
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (
polygraphy run <model.onnx> --onnxrt
):