BERT model split into many layers after int8 quantization #4397
What is the command you used?
First I export the PyTorch model to ONNX using:
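Roughly like this (the checkpoint name, input shapes, and opset version below are illustrative placeholders, not necessarily the exact ones used):

```python
import torch
from transformers import BertModel

# assumed checkpoint for illustration
model = BertModel.from_pretrained("bert-base-uncased").eval()
# return tuples instead of a ModelOutput so tracing/export is straightforward
model.config.return_dict = False

# fixed-shape dummy inputs for tracing (batch=1, seq_len=128 assumed)
input_ids = torch.ones(1, 128, dtype=torch.long)
attention_mask = torch.ones(1, 128, dtype=torch.long)

torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "bert.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state", "pooler_output"],
    opset_version=17,
)
```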
Then I export the ONNX model to a TensorRT engine using the Python TensorRT API:
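The engine build is roughly the following (file names and builder flags are illustrative):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

builder = trt.Builder(TRT_LOGGER)
# explicit-batch flag: a no-op on recent TensorRT, required on older 8.x releases
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("bert.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# for the ModelOpt-quantized ONNX (which already carries Q/DQ nodes),
# int8 precision is enabled in addition:
# config.set_flag(trt.BuilderFlag.INT8)

engine_bytes = builder.build_serialized_network(network, config)
with open("bert.plan", "wb") as f:
    f.write(engine_bytes)
```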
Which flags did you use?
Because that build used FP16 (not int8 quantization), Myelin compiles the shape ops into a single node.
So why can it not compile the int8 quantized model as one node?
There are 2 situations.
OK.
I first posted the issue in NVIDIA/TensorRT-Model-Optimizer#159.
I quantized a PyTorch BERT model using TensorRT-Model-Optimizer (a sketch of the call is at the end of this post).
Before quantization, I exported this model to TensorRT and the engine had only one layer.
But after quantization there are many layers. Why does this happen, and can it be fixed?
(the screenshot shows only part of these layers)
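For reference, the quantization step looks roughly like this (the config choice and the calibration data are placeholders, not exactly what was run):

```python
import torch
import modelopt.torch.quantization as mtq
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased").eval()

# placeholder calibration data: a few random-token batches
calib_batches = [
    torch.randint(0, model.config.vocab_size, (1, 128)) for _ in range(8)
]

def forward_loop(m):
    # run calibration batches through the model to collect activation ranges
    for input_ids in calib_batches:
        m(input_ids)

# insert Q/DQ (quantize/dequantize) nodes and calibrate int8 scales
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# afterwards the model is re-exported to ONNX and rebuilt as a TensorRT engine
```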