
After upgrading from 8.6 to 10.8 or 10.9, TensorRT results are inconsistent with ONNX Runtime #4400

Open
2730gf opened this issue Mar 27, 2025 · 8 comments

Comments


2730gf commented Mar 27, 2025

Description

After upgrading TensorRT to 10.8, the model's accuracy decreased.
After marking all nodes of the model as outputs, the accuracy matched again, so we suspect that a fusion strategy introduced by the upgrade caused the accuracy problem.
Finally, we used the polygraphy tool to find an ONNX subgraph that reproduces the problem.
Note that the model runs entirely in FP32; neither FP16 nor INT8 is used.
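The reduction itself was done with polygraphy; for reference, the general idea of cutting out a reproducing subgraph can also be sketched with the onnx Python package. The input tensor name below is a placeholder, not the real name in our model; only the compared output name comes from the repro command further down.

```python
# Sketch only: extract a subgraph that reproduces the diff.
# "input_tensor_name" is a placeholder; we actually used polygraphy for the reduction.
import onnx.utils

onnx.utils.extract_model(
    input_path="full_model.onnx",       # placeholder path for the original model
    output_path="mini_graph.onnx",
    input_names=["input_tensor_name"],  # placeholder
    output_names=["p2o.Concat.125"],    # the tensor compared in the repro command below
)
```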

Environment

TensorRT Version: 10.8 & 10.9

NVIDIA GPU: 3090

NVIDIA Driver Version: 550.67

CUDA Version: 12.2

Relevant Files

The ONNX file is posted here:
Model link: https://github.com/2730gf/issues/blob/main/trt_inconsistent/mini_graph.onnx

Steps To Reproduce

Commands or scripts:
polygraphy run mini_graph.onnx -v -v -v -v -v --pool-limit workspace:20G --onnxrt --trt --validate --atol 1e-4 --rtol 1e-3 --onnx-outputs p2o.Concat.125 --trt-outputs p2o.Concat.125
[Screenshot: polygraphy accuracy-comparison output]
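Equivalently, the same comparison can be set up with polygraphy's Python API. This is a sketch rather than the exact script we used (the workspace pool limit from the CLI command is omitted here):

```python
# Compare ONNX Runtime vs. TensorRT outputs on the same randomly generated inputs.
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator, CompareFunc

model = "mini_graph.onnx"

runners = [
    OnnxrtRunner(SessionFromOnnx(model)),
    TrtRunner(EngineFromNetwork(NetworkFromOnnxPath(model))),
]

# Run both backends, then compare with the same tolerances as the CLI command
# (--atol 1e-4 --rtol 1e-3).
results = Comparator.run(runners)
passed = Comparator.compare_accuracy(
    results, compare_func=CompareFunc.simple(atol=1e-4, rtol=1e-3)
)
print("Outputs match:", bool(passed))
```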


lix19937 commented Apr 2, 2025

@2730gf Your output distributions look very similar; can you upload the full screenshot?


2730gf commented Apr 2, 2025

@lix19937 Thanks for your reply. I have updated the screenshot.


lix19937 commented Apr 2, 2025

From your screenshot, the maximum difference occurs in only one place, and most differences fall in [0, 0.0316]. On different GPU architectures the chosen kernel implementation tactics differ. BTW, you can evaluate the impact using accuracy metrics.


2730gf commented Apr 2, 2025

@lix19937 This model is FP32, and generally speaking FP32 inference should not produce such a large diff.
With TRT 8.6 the accuracy is completely aligned, but with TRT 10.8 it is not.

Most importantly, the diff has already degraded the accuracy of the model. That is why we started analyzing the output diffs of the intermediate nodes.


lix19937 commented Apr 2, 2025

Try enabling --noTF32, then re-compare.

BTW, you can use polygraphy to locate which layer the diff starts at, for example roughly as sketched below.
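A sketch with the onnx Python package (the polygraphy CLI can also do this directly with `--onnx-outputs mark all --trt-outputs mark all`):

```python
# Mark every intermediate tensor as a graph output so per-layer values can be
# compared between ONNX Runtime and TensorRT; the first diverging output points
# at the layer (or fused region) where the diff begins.
import onnx
from onnx import shape_inference

model = shape_inference.infer_shapes(onnx.load("mini_graph.onnx"))
inferred = {vi.name: vi for vi in model.graph.value_info}
existing = {out.name for out in model.graph.output}

for node in model.graph.node:
    for name in node.output:
        if name and name not in existing:
            # Reuse inferred ValueInfo where available so new outputs keep type/shape.
            vi = inferred.get(name)
            if vi is None:
                vi = onnx.ValueInfoProto()
                vi.name = name
            model.graph.output.append(vi)
            existing.add(name)

onnx.save(model, "mini_graph_all_outputs.onnx")
```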


2730gf commented Apr 2, 2025

@lix19937
Under polygraphy, TF32 is turned off by default.
[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
Initially, we discovered an accuracy issue in a large-scale model. Currently, the mini_graph.onnx serves as a minimal reproducible example of the problem, containing only dozens of nodes.
We observed that when all intermediate nodes are designated as outputs, the accuracy aligns. Therefore, we suspect that the new kernel fusion is responsible for the issue.
These kernel fusions are not transparent to users, so I hope NVIDIA can help take a look.


lix19937 commented Apr 3, 2025

Sorry, I thought you were using trtexec.
From your ONNX, it looks like a self-attention block; can you tell me the torch implementation corresponding to this subgraph? From historical experience, compiling an attention layer that uses torch's scaled_dot_product_attention into a TRT engine can produce incorrect outputs.
https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/transformer.py#L159
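For illustration, the two variants below are mathematically equivalent, but they trace to different ONNX patterns: the fused scaled_dot_product_attention pattern is the one TensorRT tends to recognize and fuse into its own attention kernels, while the decomposed version stays as plain MatMul/Softmax ops. Whether that changes the fusion TRT applies to this particular subgraph is an assumption to verify; it is just a way to see which pattern your graph corresponds to.

```python
# Illustrative comparison of fused vs. decomposed attention in torch.
import math
import torch
import torch.nn.functional as F

def fused_attention(q, k, v):
    # Fused op; typically exported/fused as a single attention pattern.
    return F.scaled_dot_product_attention(q, k, v)

def decomposed_attention(q, k, v):
    # Plain MatMul / Softmax ops; same result up to floating-point rounding.
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(torch.softmax(scores, dim=-1), v)

q = k = v = torch.randn(1, 8, 64, 32)   # (batch, heads, seq, head_dim)
print(torch.allclose(fused_attention(q, k, v), decomposed_attention(q, k, v), atol=1e-5))
```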


2730gf commented Apr 3, 2025

@lix19937 Hello, it was not trained with torch, but it is indeed a transformer-based model. Is there any way to fix this accuracy issue?
