
The generated TRT engine has significantly higher computational overhead than expected. #4402

Open
e271828184 opened this issue Mar 27, 2025 · 0 comments


e271828184 commented Mar 27, 2025

Description

The generated TRT engine has significantly higher computational overhead than expected.

The network I'm currently using is a test network. I'm trying to develop an algorithm with branch switching, and this test network serves as groundwork for that development. Its structure is very simple: it takes a 1×3×224×224 tensor as input, makes a simple if-else decision, and then routes the tensor through either a resnet18 or a resnet101 branch based on the decision result.

According to the speed test results, no matter which branch the network takes, the engine takes about 3.8 ms, with little difference between the two cases. As a control experiment, I also tested networks without branch switching, using only resnet18 or only resnet101; these took about 0.8 ms and 2.9 ms respectively, which is normal and meets expectations. Notably, 3.8 ms is close to the sum of the two single-branch times, as if the engine were executing both branches.
I have uploaded the code for generating the ONNX file and the code for converting it to a TRT engine.
I can't figure out what I might have done wrong; even with such a simple test model, I can't achieve the expected results. I hope to receive a reply as soon as possible.

Environment

TensorRT Version: 10.7.0.3

NVIDIA GPU: NVIDIA GeForce RTX 4070 Laptop

NVIDIA Driver Version: 556.12

CUDA Version: 12.5

CUDNN Version: 8.4.0

Operating System: Windows 11

Python Version (if applicable): 3.9.21

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 1.12.0+cu116

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

import torch
import torch.onnx
import torchvision


class MyScriptModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Trace both backbones so they can be embedded in the scripted module.
        self.res18 = torch.jit.trace(torchvision.models.resnet18().eval(),
                                     torch.rand(1, 3, 224, 224))
        self.res101 = torch.jit.trace(torchvision.models.resnet101().eval(),
                                      torch.rand(1, 3, 224, 224))

    def forward(self, x):
        # Data-dependent condition: scripting keeps it as real control flow,
        # which the ONNX exporter turns into an If node.
        max_val = torch.max(x)
        temp = x * 2
        if max_val > 9999:
            result = self.res18(temp)
            result = torch.sin(result)
        else:
            result = self.res101(temp)
            result = torch.abs(result) + 0.1
            result = torch.log(result)
        return result


my_script_module = torch.jit.script(MyScriptModule().eval())


def export_to_onnx():
    items = torch.rand(1, 3, 224, 224)
    torch.onnx.export(my_script_module, items, "resnet_example.onnx",
                      input_names=["input"], verbose=False,
                      do_constant_folding=True)


export_to_onnx()
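Before building the engine, it may be worth confirming that the exported graph actually contains a conditional If node rather than a single folded branch. A minimal sketch using the onnx package (node names are whatever the exporter generated):

import onnx

# Load the exported model and list any control-flow (If) nodes.
model = onnx.load("resnet_example.onnx")
for node in model.graph.node:
    if node.op_type == "If":
        # An If node carries then_branch/else_branch subgraph attributes.
        print("Found If node:", node.name,
              [attr.name for attr in node.attribute])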

trtexec --onnx=C:/D/mmd_torch112/resnet_example.onnx --saveEngine=resnet_example.engine --noCompilationCache --profilingVerbosity=detailed --precisionConstraints=obey
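To time each branch in isolation once the engine is built, one option is to feed trtexec fixed inputs that force the condition each way via --loadInputs (a sketch; the tensor name "input" matches the export call above, and the file names are placeholders):

import numpy as np

# Values above the 9999 threshold select the resnet18 branch;
# random values in [0, 1) select the resnet101 branch.
np.full((1, 3, 224, 224), 10000.0, dtype=np.float32).tofile("big.bin")
np.random.rand(1, 3, 224, 224).astype(np.float32).tofile("small.bin")

trtexec --loadEngine=resnet_example.engine --loadInputs=input:big.bin
trtexec --loadEngine=resnet_example.engine --loadInputs=input:small.bin

If both runs report roughly the same latency, that would point at both branches being executed inside the engine.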


Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
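As a concrete version of that check, a minimal ONNX Runtime sketch (the tensor name "input" matches the export call above; the large fill value should force the resnet18 branch):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet_example.onnx",
                            providers=["CPUExecutionProvider"])
# An input above the threshold should take the resnet18 branch.
x = np.full((1, 3, 224, 224), 10000.0, dtype=np.float32)
(result,) = sess.run(None, {"input": x})
print(result.shape)  # expected: (1, 1000)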
