Describe the bug
I am following this tutorial to convert my 2-layer BERT model to ONNX and optimize it with onnxruntime_tools. The conversion from the TensorFlow .pb model to .onnx works smoothly; the failure appears only after optimization.
Urgency
None.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.6
- ONNX Runtime installed from (source or binary): binary (pip install --quiet --upgrade onnxruntime==1.4.0)
- ONNX Runtime version: 1.4.0 (plus onnxruntime-tools 1.4.0, installed with pip install --quiet --upgrade onnxruntime-tools==1.4.0)
- Python version: 3.7.7
- Visual Studio version (if applicable):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
To Reproduce
I followed this tutorial: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/Tensorflow_Keras_Bert-Squad_OnnxRuntime_CPU.ipynb
It works well through the pipeline: TF exported model --> export ONNX model --> inference --> export optimized ONNX model.
! python -m tf2onnx.convert --saved-model /Users/mye29/Downloads/tmp_tiny_bert/export/1597187163/ --opset=10 --output=model.onnx
import time

import numpy as np
import onnxruntime

length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)
label_id = [0]
inputs_onnx = {"input_ids_1:0": input_ids,
               "input_mask_1:0": input_mask,
               "segment_ids_1:0": segment_ids,
               "label_ids_1:0": label_id}
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession("model.onnx", sess_options, providers=['CPUExecutionProvider'])

total_runs = 1000
start = time.time()
for _ in range(total_runs):
    results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time for sequence length {} (model not optimized): {} ms".format(
    32, format((end - start) * 1000 / total_runs, '.2f')))
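As an aside, the warm-up-then-time pattern I use in both runs can be factored into a small helper. This is a hypothetical `benchmark` function of my own (not part of onnxruntime), shown with a dummy workload so the snippet is self-contained; in practice I would pass `lambda: session.run(None, inputs_onnx)`:

```python
import time

def benchmark(run_fn, total_runs=1000, warmup=1):
    """Return the average latency of run_fn in milliseconds over total_runs calls."""
    for _ in range(warmup):  # warm up caches / lazy initialization before timing
        run_fn()
    start = time.time()
    for _ in range(total_runs):
        run_fn()
    return (time.time() - start) * 1000.0 / total_runs

# Dummy workload standing in for: lambda: session.run(None, inputs_onnx)
avg_ms = benchmark(lambda: sum(range(1000)), total_runs=100)
print("average latency: {:.4f} ms".format(avg_ms))
```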
However, inference fails after I run optimize_model:
from onnxruntime_tools import optimizer

optimized_model_path = 'tf_{}_opt_cpu.onnx'.format("model")
optimized_model = optimizer.optimize_model("model.onnx",
                                           model_type='bert_tf',
                                           opt_level=1,
                                           num_heads=2,
                                           hidden_size=128)
optimized_model.use_dynamic_axes()
optimized_model.save_model_to_file(optimized_model_path)
The optimization removes the redundant input "label_ids_1:0", so I drop it from the input feed:
length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)
inputs_onnx = {"input_ids_1:0": input_ids,
               "input_mask_1:0": input_mask,
               "segment_ids_1:0": segment_ids}
The following step then gives me an error on CPU:
sess_options = onnxruntime.SessionOptions()
# sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
session = onnxruntime.InferenceSession(optimized_model_path, sess_options)

# use one run to warm up a session
session.run(None, inputs_onnx)

# measure the latency
start = time.time()
for _ in range(total_runs):
    opt_results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time on optimized model: {} ms".format(
    format((end - start) * 1000 / total_runs, '.2f')))
del session
4 session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
5 # use one run to warm up a session
----> 6 session.run(None, inputs_onnx)
7
8 # measure the latency.
/anaconda3/envs/tf115/lib/python3.7/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
108 output_names = [output.name for output in self._outputs_meta]
109 try:
--> 110 return self._sess.run(output_names, input_feed, run_options)
111 except C.EPFail as err:
112 if self._enable_fallback:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running SkipLayerNormalization node. Name:'SkipLayerNorm_AddBias_6' Status Message: input is expected to have 3 dimensions, got 2
I uploaded my model here https://drive.google.com/drive/folders/1S7ekooSbXAu6UuyynW5RyGmL1FKtoYqh?usp=sharing
Expected behavior
I expect the optimized model to produce the same output (loss) as the non-optimized one, just faster 👍