Describe the bug
I am following this tutorial to convert my 2-layer BERT model to ONNX and optimize it with onnxruntime_tools. The conversion from the TensorFlow .pb model to .onnx works smoothly; the failure appears only after optimization.
Urgency
None.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.6
- ONNX Runtime installed from (source or binary): binary (pip install --quiet --upgrade onnxruntime==1.4.0)
- ONNX Runtime version: 1.4.0 (plus onnxruntime-tools 1.4.0, installed with pip install --quiet --upgrade onnxruntime-tools==1.4.0)
- Python version: 3.7.7
- Visual Studio version (if applicable):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
To Reproduce
I followed this tutorial: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/Tensorflow_Keras_Bert-Squad_OnnxRuntime_CPU.ipynb
It works well through the pipeline: TF exported model --> export ONNX model --> inference --> export optimized ONNX model.
! python -m tf2onnx.convert --saved-model /Users/mye29/Downloads/tmp_tiny_bert/export/1597187163/ --opset=10 --output=model.onnx
import time

import numpy as np
import onnxruntime

length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)
label_id = [0]
inputs_onnx = {"input_ids_1:0": input_ids,
               "input_mask_1:0": input_mask,
               "segment_ids_1:0": segment_ids,
               "label_ids_1:0": label_id}
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession("model.onnx", sess_options, providers=['CPUExecutionProvider'])

total_runs = 1000
start = time.time()
for _ in range(total_runs):
    results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time for sequence length {} (model not optimized): {} ms".format(
    32, format((end - start) * 1000 / total_runs, '.2f')))
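As an aside, the warm-up-then-time pattern I use in both runs can be factored into a small helper. This is a hypothetical `benchmark` function of my own (not part of onnxruntime), shown with a dummy workload so the snippet is self-contained; in practice I would pass `lambda: session.run(None, inputs_onnx)`:

```python
import time

def benchmark(run_fn, total_runs=1000, warmup=1):
    """Return the average latency of run_fn in milliseconds over total_runs calls."""
    for _ in range(warmup):  # warm up caches / lazy initialization before timing
        run_fn()
    start = time.time()
    for _ in range(total_runs):
        run_fn()
    return (time.time() - start) * 1000.0 / total_runs

# Dummy workload standing in for: lambda: session.run(None, inputs_onnx)
avg_ms = benchmark(lambda: sum(range(1000)), total_runs=100)
print("average latency: {:.4f} ms".format(avg_ms))
```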
However, inference fails after I run optimize_model:
from onnxruntime_tools import optimizer

optimized_model_path = 'tf_{}_opt_cpu.onnx'.format("model")
optimized_model = optimizer.optimize_model("model.onnx",
                                           model_type='bert_tf',
                                           opt_level=1,
                                           num_heads=2,
                                           hidden_size=128)
optimized_model.use_dynamic_axes()
optimized_model.save_model_to_file(optimized_model_path)
The optimization removes the redundant input "label_ids_1:0", so I drop it from the input feed:
length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)
inputs_onnx = {"input_ids_1:0": input_ids,
               "input_mask_1:0": input_mask,
               "segment_ids_1:0": segment_ids}
The following step then gives me an error on CPU:
sess_options = onnxruntime.SessionOptions()
# sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
session = onnxruntime.InferenceSession(optimized_model_path, sess_options)

# use one run to warm up a session
session.run(None, inputs_onnx)

# measure the latency
start = time.time()
for _ in range(total_runs):
    opt_results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time on optimized model: {} ms".format(
    format((end - start) * 1000 / total_runs, '.2f')))
del session
4 session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
5 # use one run to warm up a session
----> 6 session.run(None, inputs_onnx)
7
8 # measure the latency.
/anaconda3/envs/tf115/lib/python3.7/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
108 output_names = [output.name for output in self._outputs_meta]
109 try:
--> 110 return self._sess.run(output_names, input_feed, run_options)
111 except C.EPFail as err:
112 if self._enable_fallback:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running SkipLayerNormalization node. Name:'SkipLayerNorm_AddBias_6' Status Message: input is expected to have 3 dimensions, got 2
I uploaded my model here https://drive.google.com/drive/folders/1S7ekooSbXAu6UuyynW5RyGmL1FKtoYqh?usp=sharing
Expected behavior
I expect the optimized model to produce the same output (loss) as the non-optimized one, just faster 👍