
[ONNXRuntimeError] Non-zero status code returned while running SkipLayerNormalization node. #4779

Closed as not planned
@wppply

Description


Describe the bug
I am trying to follow this tutorial to convert my 2-layer BERT model to ONNX and optimize it with onnxruntime_tools.
The conversion of my TF model from .pb to .onnx works smoothly.


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.6
  • ONNX Runtime installed from (source or binary): binary (pip install --quiet --upgrade onnxruntime==1.4.0)
  • ONNX Runtime version: 1.4.0 (with onnxruntime-tools==1.4.0)
  • Python version: 3.7.7

To Reproduce
I followed this tutorial: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/Tensorflow_Keras_Bert-Squad_OnnxRuntime_CPU.ipynb
It works well for the flow: TF exported model --> export ONNX model --> inference --> export optimized ONNX model.

! python -m tf2onnx.convert --saved-model /Users/mye29/Downloads/tmp_tiny_bert/export/1597187163/ --opset=10 --output=model.onnx
import time
import numpy as np
import onnxruntime

length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)
label_id = [0]

inputs_onnx = {"input_ids_1:0": input_ids, 
               "input_mask_1:0": input_mask, 
               "segment_ids_1:0": segment_ids, 
               "label_ids_1:0": label_id}

sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession("model.onnx", sess_options, providers=['CPUExecutionProvider'])

total_runs = 1000
start = time.time()
for _ in range(total_runs):
    results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time for sequence length {} (model not optimized): {} ms".format(
    32, format((end - start) * 1000 / total_runs, '.2f')))
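To double-check the feed names and shapes the graph expects, the session metadata can be printed (a quick sketch using the same session as above):

# Sanity-check the graph's declared inputs and outputs
for i in session.get_inputs():
    print("input :", i.name, i.shape, i.type)
for o in session.get_outputs():
    print("output:", o.name, o.shape, o.type)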

However, it doesn't work after I run optimize_model:

optimized_model_path = 'tf_{}_opt_cpu.onnx'.format("model")

from onnxruntime_tools import optimizer
optimized_model = optimizer.optimize_model("model.onnx", 
                                           model_type='bert_tf', 
                                           opt_level=1,
                                           num_heads=2, hidden_size=128)
optimized_model.use_dynamic_axes()
optimized_model.save_model_to_file(optimized_model_path)

The optimization removes one redundant input, "label_ids_1:0".
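A quick way to confirm which inputs survive the optimization (a sketch, assuming the onnx package is installed):

import onnx

m = onnx.load(optimized_model_path)
print([i.name for i in m.graph.input])   # 'label_ids_1:0' should be gone
print({n.op_type for n in m.graph.node if n.domain == "com.microsoft"})  # fused ops

With that input removed, I rebuild the feeds without it: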

length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)

inputs_onnx = {"input_ids_1:0": input_ids, 
               "input_mask_1:0": input_mask, 
               "segment_ids_1:0": segment_ids}

The following step then gives me the error on CPU:

sess_options = onnxruntime.SessionOptions()
# sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
# use one run to warm up a session
session.run(None, inputs_onnx)

# measure the latency.
start = time.time()
for _ in range(total_runs):
    opt_results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time on optimized model: {} ms".format(format((end - start) * 1000 / total_runs, '.2f')))
del session
The traceback:

      4 session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
      5 # use one run to warm up a session
----> 6 session.run(None, inputs_onnx)
      7 
      8 # measure the latency.

/anaconda3/envs/tf115/lib/python3.7/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
    108             output_names = [output.name for output in self._outputs_meta]
    109         try:
--> 110             return self._sess.run(output_names, input_feed, run_options)
    111         except C.EPFail as err:
    112             if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running SkipLayerNormalization node. Name:'SkipLayerNorm_AddBias_6' Status Message: input is expected to have 3 dimensions, got 2
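To see what actually feeds the failing node, something like this should work (a sketch; the node name is taken from the error message above):

import onnx

m = onnx.load(optimized_model_path)
node = next(n for n in m.graph.node if n.name == "SkipLayerNorm_AddBias_6")
print(node.op_type, list(node.input))  # which tensors feed the failing node

If the fusion itself turns out to be the culprit, onnxruntime_tools appears to expose a BertOptimizationOptions object with an enable_skip_layer_norm flag that can be set to False and passed as optimization_options= to optimizer.optimize_model; I haven't verified the exact module path or attribute names against 1.4.0, so treat that as an assumption.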

I uploaded my model here https://drive.google.com/drive/folders/1S7ekooSbXAu6UuyynW5RyGmL1FKtoYqh?usp=sharing

Expected behavior
I expect the optimized model to give me the same loss as the non-optimized one, just much faster 👍
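For reference, this is the parity check I would expect to pass between the two models (a sketch; the file names and feed names are from my repro above, and "tf_model_opt_cpu.onnx" is what optimized_model_path resolves to):

import numpy as np
import onnxruntime

length = 32
feeds = {
    "input_ids_1:0": np.array([[128] * length], dtype=np.int32),
    "input_mask_1:0": np.array([[1] * length], dtype=np.int32),
    "segment_ids_1:0": np.array([[1] * length], dtype=np.int32),
}

base = onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
opt = onnxruntime.InferenceSession("tf_model_opt_cpu.onnx", providers=["CPUExecutionProvider"])

# the original graph still takes label_ids; the optimized one dropped it
ref = base.run(None, {**feeds, "label_ids_1:0": np.array([0], dtype=np.int32)})
out = opt.run(None, feeds)

for r, o in zip(ref, out):
    np.testing.assert_allclose(r, o, rtol=1e-3, atol=1e-4)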
