
The generated TRT engine has significantly higher computational overhead than expected. #4402

Open
e271828184 opened this issue Mar 27, 2025 · 0 comments


e271828184 commented Mar 27, 2025

Description

The generated TRT engine has significantly higher computational overhead than expected.

The network I'm currently using is a test network. I'm trying to develop an algorithm with branch switching, and this test network serves as groundwork for that development. Its structure is very simple: it takes a 1×3×224×224 tensor as input, makes a simple if-else decision, and then routes the tensor through either a resnet18 or a resnet101 branch based on the decision result.

According to the speed test results, no matter which branch the network takes, the engine takes about 3.8 ms, with little difference between the two cases. As a control experiment, I also tested networks without branch switching, using only resnet18 or only resnet101; these took about 0.8 ms and 2.9 ms respectively, which is normal and meets expectations. Notably, 3.8 ms is close to the sum of the two single-branch times, as if the engine were executing both branches.
I have uploaded the code for generating the ONNX file and the code for converting it to a TRT engine.
I can't figure out what I might have done wrong; even with such a simple test model, I can't achieve the expected results. I hope to receive a reply as soon as possible.

Environment

TensorRT Version: 10.7.0.3

NVIDIA GPU: NVIDIA GeForce RTX 4070 Laptop

NVIDIA Driver Version: 556.12

CUDA Version: 12.5

CUDNN Version: 8.4.0

Operating System: Windows 11

Python Version (if applicable): 3.9.21

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 1.12.0+cu116

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

import torch
import torch.onnx
import torchvision


class MyScriptModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Trace both backbones so they can be embedded in the scripted module.
        self.res18 = torch.jit.trace(torchvision.models.resnet18().eval(),
                                     torch.rand(1, 3, 224, 224))
        self.res101 = torch.jit.trace(torchvision.models.resnet101().eval(),
                                      torch.rand(1, 3, 224, 224))

    def forward(self, x):
        # Data-dependent condition: scripting keeps it as real control flow,
        # which the ONNX exporter turns into an If node.
        max_val = torch.max(x)
        temp = x * 2
        if max_val > 9999:
            result = self.res18(temp)
            result = torch.sin(result)
        else:
            result = self.res101(temp)
            result = torch.abs(result) + 0.1
            result = torch.log(result)
        return result


my_script_module = torch.jit.script(MyScriptModule().eval())


def export_to_onnx():
    items = torch.rand(1, 3, 224, 224)
    torch.onnx.export(my_script_module, items, "resnet_example.onnx",
                      input_names=["input"], verbose=False,
                      do_constant_folding=True)


export_to_onnx()
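Before building the engine, it may be worth confirming that the exported graph actually contains a conditional If node rather than a single folded branch. A minimal sketch using the onnx package (node names are whatever the exporter generated):

import onnx

# Load the exported model and list any control-flow (If) nodes.
model = onnx.load("resnet_example.onnx")
for node in model.graph.node:
    if node.op_type == "If":
        # An If node carries then_branch/else_branch subgraph attributes.
        print("Found If node:", node.name,
              [attr.name for attr in node.attribute])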

trtexec --onnx=C:/D/mmd_torch112/resnet_example.onnx --saveEngine=resnet_example.engine --noCompilationCache --profilingVerbosity=detailed --precisionConstraints=obey
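To time each branch in isolation once the engine is built, one option is to feed trtexec fixed inputs that force the condition each way via --loadInputs (a sketch; the tensor name "input" matches the export call above, and the file names are placeholders):

import numpy as np

# Values above the 9999 threshold select the resnet18 branch;
# random values in [0, 1) select the resnet101 branch.
np.full((1, 3, 224, 224), 10000.0, dtype=np.float32).tofile("big.bin")
np.random.rand(1, 3, 224, 224).astype(np.float32).tofile("small.bin")

trtexec --loadEngine=resnet_example.engine --loadInputs=input:big.bin
trtexec --loadEngine=resnet_example.engine --loadInputs=input:small.bin

If both runs report roughly the same latency, that would point at both branches being executed inside the engine.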


Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
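As a concrete version of that check, a minimal ONNX Runtime sketch (the tensor name "input" matches the export call above; the large fill value should force the resnet18 branch):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet_example.onnx",
                            providers=["CPUExecutionProvider"])
# An input above the threshold should take the resnet18 branch.
x = np.full((1, 3, 224, 224), 10000.0, dtype=np.float32)
(result,) = sess.run(None, {"input": x})
print(result.shape)  # expected: (1, 1000)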
