
[ONNXRuntimeError] Non-zero status code returned while running SkipLayerNormalization node. #4779

Open · wppply opened this issue Aug 13, 2020 · 9 comments
Labels: model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), stale (issues that have not been addressed in a while; categorized by a bot)

wppply commented Aug 13, 2020

Describe the bug
I am trying to follow this tutorial to convert my 2-layer BERT model to ONNX and optimize it with onnxruntime_tools.
The conversion itself works smoothly when I convert my TF model from .pb to .onnx.


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.6
  • ONNX Runtime installed from (source or binary): binary (pip install --quiet --upgrade onnxruntime==1.4.0)
  • ONNX Runtime version: 1.4.0 (onnxruntime-tools==1.4.0, also installed via pip)
  • Python version: 3.7.7

To Reproduce
I followed this tutorial: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/Tensorflow_Keras_Bert-Squad_OnnxRuntime_CPU.ipynb
It works well for the flow: TF exported model --> export ONNX model --> inference --> export optimized ONNX model.

! python -m tf2onnx.convert --saved-model /Users/mye29/Downloads/tmp_tiny_bert/export/1597187163/ --opset=10 --output=model.onnx

import time

import numpy as np
import onnxruntime

length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)
label_id = np.array([0], dtype=np.int32)  # session.run expects numpy arrays, not plain lists

inputs_onnx = {"input_ids_1:0": input_ids, 
               "input_mask_1:0": input_mask, 
               "segment_ids_1:0": segment_ids, 
               "label_ids_1:0": label_id}

sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession("model.onnx", sess_options, providers=['CPUExecutionProvider'])

total_runs = 1000
start = time.time()
for _ in range(total_runs):
    results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time for sequence length {} (model not optimized): {} ms".format(
    32, format((end - start) * 1000 / total_runs, '.2f')))

However, it no longer works after I run optimize_model:

from onnxruntime_tools import optimizer

optimized_model_path = 'tf_{}_opt_cpu.onnx'.format("model")

optimized_model = optimizer.optimize_model("model.onnx",
                                           model_type='bert_tf',
                                           opt_level=1,
                                           num_heads=2, hidden_size=128)
optimized_model.use_dynamic_axes()
optimized_model.save_model_to_file(optimized_model_path)

The optimization removes the one redundant input, "label_ids_1:0", so the feed now has three entries:

length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)

inputs_onnx = {"input_ids_1:0": input_ids, 
               "input_mask_1:0": input_mask, 
               "segment_ids_1:0": segment_ids}

The following step then gives an error on CPU:

sess_options = onnxruntime.SessionOptions()
# sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
# use one run to warm up a session
session.run(None, inputs_onnx)

# measure the latency.
start = time.time()
for _ in range(total_runs):
    opt_results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time on optimized model: {} ms".format(format((end - start) * 1000 / total_runs, '.2f')))
del session

The warm-up run fails with the following traceback:
      4 session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
      5 # use one run to warm up a session
----> 6 session.run(None, inputs_onnx)
      7 
      8 # measure the latency.

/anaconda3/envs/tf115/lib/python3.7/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
    108             output_names = [output.name for output in self._outputs_meta]
    109         try:
--> 110             return self._sess.run(output_names, input_feed, run_options)
    111         except C.EPFail as err:
    112             if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running SkipLayerNormalization node. Name:'SkipLayerNorm_AddBias_6' Status Message: input is expected to have 3 dimensions, got 2

I uploaded my model here https://drive.google.com/drive/folders/1S7ekooSbXAu6UuyynW5RyGmL1FKtoYqh?usp=sharing

Expected behavior
The optimized model should produce the same outputs (e.g., the loss) as the non-optimized one, just much faster 👍

hariharans29 (Member) commented:

Looks like there is a bug in the optimizer script. @tianleiwu

colourful-tree commented Aug 19, 2020

@wppply @hariharans29

fusion = FusionLayerNormalizationTF(self)

Removing this call ([lines 91-92] in the optimizer source) works for me.

I also changed:

if input.name in bert_graph_inputs:

to
if input.name in ["segment_ids:0", "input_mask:0", "input_ids:0"]:

But I find that the optimized model (model.optimizer.onnx) is not faster than model.onnx.

My TinyBERT model has 2 transformer layers with 12 heads and a hidden size of 120.

tianleiwu (Contributor) commented:

@wppply,

Thanks for reporting the issue.

The cause of the error is a path in the ONNX graph like the following:

SkipLayerNormalization (SkipLayerNorm1) --> Reshape (bert/encoder/Reshape_1) --> SkipLayerNormalization (SkipLayerNorm_AddBias_6)

The correct one:

SkipLayerNormalization (SkipLayerNorm1) --> SkipLayerNormalization (SkipLayerNorm_AddBias_6)

For a normal BERT graph, the Reshape is removed in postprocessing. However, for this model, the optimizer failed to fuse Attention and EmbedLayerNormalization (because the subgraph pattern is different), so the Reshape node was not removed.
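
To check whether a given model contains this pattern, one can walk the graph with the onnx Python API (a minimal sketch; the file and node names are taken from the report above):

import onnx

model = onnx.load("tf_model_opt_cpu.onnx")

# Map each tensor name to the node that produces it.
producers = {output: node for node in model.graph.node for output in node.output}

# Flag any SkipLayerNormalization fed directly by a Reshape, i.e. the
# SkipLayerNorm --> Reshape --> SkipLayerNorm path described above.
for node in model.graph.node:
    if node.op_type == "SkipLayerNormalization":
        for tensor in node.input:
            producer = producers.get(tensor)
            if producer is not None and producer.op_type == "Reshape":
                print(node.name, "consumes the output of Reshape", producer.name)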

wppply (Author) commented Sep 9, 2020

@tianleiwu
Thanks for the reply. Will this issue be fixed in the next release?

GumpCode commented Oct 26, 2020

I met the same error when I used bert-base.

stevewyl commented:

After changing the code @colourful-tree referred to, I still got the same error.
Package versions: tensorflow=1.12.0, onnx=1.8.0, tf2onnx=1.7.2/995bd6, onnxruntime-noopenmp=1.6.0

process_embedding: Create Embedding node
prune_graph: Graph pruned: 0 inputs, 0 outputs and 34 nodes are removed
fuse_mask_2: Failed to fuse mask
apply: Fused SkipLayerNormalization count: 24
prune_graph: Graph pruned: 0 inputs, 0 outputs and 0 nodes are removed
apply: Fused FastGelu(add bias) count: 12
apply: Fused SkipLayerNormalization(add bias) count: 24
optimize: opset verion: 11

wppply (Author) commented Jan 7, 2021

Adding this option to the optimizer.optimize_model call solves the issue:
optimization_options=BertOptimizationOptions("gpt2")
or, alternatively, manually change the option:
enable_skip_layer_norm = False
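
Put together, a minimal sketch of this workaround (assuming onnxruntime-tools 1.4, where BertOptimizationOptions lives in onnxruntime_tools.transformers.onnx_model_bert; the import path may differ in other versions):

from onnxruntime_tools import optimizer
# Assumed import path for onnxruntime-tools 1.4; adjust for your version.
from onnxruntime_tools.transformers.onnx_model_bert import BertOptimizationOptions

# Disable SkipLayerNormalization fusion so the problematic
# SkipLayerNorm --> Reshape --> SkipLayerNorm path is never created.
# (The "gpt2" preset mentioned above does the same thing implicitly.)
opt_options = BertOptimizationOptions('bert_tf')
opt_options.enable_skip_layer_norm = False

optimized_model = optimizer.optimize_model("model.onnx",
                                           model_type='bert_tf',
                                           opt_level=1,
                                           num_heads=2, hidden_size=128,
                                           optimization_options=opt_options)
optimized_model.use_dynamic_axes()
optimized_model.save_model_to_file('tf_model_opt_cpu.onnx')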

stale bot commented Apr 19, 2022

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

KokinSok commented Mar 7, 2024

Same error here - ONNX is becoming stale!
