Segmentation error when using graph optimization #5175

NagarajSMurthy · 2020-09-15T12:02:23Z

Describe the bug
When creating an inference session with the session option 'optimized_model_filepath' set to store the optimized model, the program crashes with a segmentation fault.

Urgency
Moderate

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Raspberry Pi 3B+
ONNX Runtime installed from (source or binary): source
ONNX Runtime version: 1.4.0
Python version: 3.8.0
Visual Studio version (if applicable):
GCC/Compiler version (if compiling from source): 6.3.0
CUDA/cuDNN version:
GPU model and memory:

To Reproduce

import onnxruntime

so = onnxruntime.SessionOptions()
so.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_BASIC
so.optimized_model_filepath = '/home/pi/Desktop/AI/optimised_model.onnx'

# path is the location of the ONNX model file to load
ort_session = onnxruntime.InferenceSession(path, sess_options=so)

Expected behavior
An inference session would have started with graph optimizations set.

mrry · 2020-09-15T21:38:20Z

Sorry about that, @NagarajSMurthy! Could you capture a stack trace for the segmentation fault (e.g. by running the Python process under gdb)? That will help us identify which component is causing the problem.

Also, to clarify, does the failure still occur if you do not pass any SessionOptions when creating the InferenceSession?

NagarajSMurthy · 2020-09-16T04:18:12Z

Thanks for the response @mrry
I ran the process under gdb and this is what I got:

`(gdb) run onnx_test.py
Starting program: /usr/local/bin/python3 onnx_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
/home/pi/Desktop/EdgeAI/MNIST/MnistConvnet_new_reg_3.onnx
running

Program received signal SIGSEGV, Segmentation fault.
0x753d5618 in onnxruntime::FreeDimensionOverrideTransformer::FreeDimensionOverrideTransformer(gsl::span<onnxruntime::FreeDimensionOverride const>) ()
from /home/pi/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
`
Also, I do get a segfault when I run without passing any SessionOptions. When I disable all the graph optimizations, it works perfectly fine (#3296 (comment)).
If I disable all graph optimizations and set only the optimized_model_filepath option in SessionOptions, it saves an optimised model to that location.
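
To see which optimization levels crash without losing the calling process to the segfault, each level could be tried in a separate Python process. The following is a minimal sketch, not part of ONNX Runtime: `run_isolated` and `probe_levels` are hypothetical helper names, the child script assumes the `onnxruntime` package is installed, and the signal reporting assumes a POSIX system where a child killed by a signal returns a negative exit code.

```python
import signal
import subprocess
import sys

# Script run in the child process; {level} and {model} are filled in per probe.
CHILD_TEMPLATE = """
import onnxruntime as ort
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.{level}
ort.InferenceSession({model!r}, sess_options=so)
"""

LEVELS = ("ORT_DISABLE_ALL", "ORT_ENABLE_BASIC", "ORT_ENABLE_EXTENDED", "ORT_ENABLE_ALL")


def run_isolated(code, timeout=120.0):
    """Run `code` in a fresh Python process and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX a negative return code is the number of the terminating signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit code {}".format(proc.returncode)


def probe_levels(model_path, levels=LEVELS):
    """Try to create an InferenceSession at each level, one child process per level."""
    return {
        level: run_isolated(CHILD_TEMPLATE.format(level=level, model=model_path))
        for level in levels
    }
```

Calling `probe_levels('/path/to/model.onnx')` (the path is a placeholder) returns a dict keyed by level name. A value of `killed by SIGSEGV` marks a level at which session creation crashed.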

mrry · 2020-09-16T04:39:41Z

Thanks for the update. A couple more questions:

Does the same error occur when loading a different ONNX model (e.g. the simple MNIST model from https://github.com/onnx/models/blob/master/vision/classification/mnist/model/mnist-8.tar.gz)?
Can you include the full stack trace that gdb reports? (It probably won't tell us much more, but it may have some useful information.)

NagarajSMurthy · 2020-09-16T15:35:32Z

Yes, the same error occurs with the model you pointed to. Below is the gdb output:
`gdb python3
GNU gdb (Raspbian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...done.

(gdb) run onnx_test.py
Starting program: /usr/local/bin/python3 onnx_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
/home/pi/Desktop/EdgeAI/MNIST/model.onnx
running

Program received signal SIGSEGV, Segmentation fault.
0x753d5618 in onnxruntime::FreeDimensionOverrideTransformer::FreeDimensionOverrideTransformer(gsl::span<onnxruntime::FreeDimensionOverride const>) ()
from /home/pi/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
(gdb)
`

NagarajSMurthy · 2020-09-18T04:52:22Z

How can I resolve this issue? @mrry

tianleiwu · 2020-09-20T21:38:10Z

@skottmckay, @MaximKalininMS, it looks like a bug in FreeDimensionOverrideTransformer. Please take a look, and suggest whether it needs to be fixed in the 1.5 release.

skottmckay · 2020-09-21T21:19:56Z

I don't see any obvious cause as the free dimension override list should be empty unless specified via session options. If the list is empty the FreeDimensionOverrideTransformer constructor doesn't have much to do.

We would need someone with a Raspberry Pi to do a debug build and see if something is creating a bogus value in SessionOptions.free_dimension_overrides.

NagarajSMurthy · 2020-09-22T07:36:34Z

I have a Raspberry Pi. How do I do a debug build?
Also, I did not find SessionOptions.free_dimension_overrides in the Python documentation.

skottmckay · 2020-09-23T04:01:12Z

You'd need to do a debug build and be familiar with a debugger like gdb to set breakpoints in a few places to see if SessionOptions.free_dimension_overrides gets corrupted, and if so when.

skottmckay · 2020-09-24T08:48:00Z

One other thing you could try is building for arm64. I believe the RaspberryPi 3B+ has an Arm Cortex-A53 chip which is 64-bit. May or may not help.

If you built with the Dockerfile, you could modify it to use these build args to build for arm64:

ARG BUILDARGS="--config ${BUILDTYPE} --arm64"

If you wanted to do a debug build, update it from MinSizeRel to Debug:

ARG BUILDTYPE=Debug

If you are able to get a debug build onto the device you should be able to run gdb as you did previously. When you hit the seg fault you could run a few commands to get a bit more info about the issue.

Print out the value of overrides_to_apply
(gdb) p overrides_to_apply
Print the backtrace
(gdb) bt
Go up a few frames to the InferenceSession class and print session options

(gdb) up 3
(gdb) p session_options_
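
The same gdb steps could also be run non-interactively. The sketch below only assembles the command line. `build_gdb_command` is a hypothetical helper, and `onnx_test.py` is the script name taken from the earlier gdb session. It relies on gdb's `-batch`, `-ex` and `--args` options. The symbol names only resolve in a debug build.

```python
def build_gdb_command(script, python="python3", frames_up=3):
    """Return an argv list that runs `script` under gdb and dumps crash details."""
    commands = [
        "run",                      # start the program; gdb stops at the SIGSEGV
        "p overrides_to_apply",     # value passed to the transformer constructor
        "bt",                       # backtrace at the crash
        "up {}".format(frames_up),  # move up towards the InferenceSession frame
        "p session_options_",       # session options held by the InferenceSession
    ]
    argv = ["gdb", "-batch"]
    for command in commands:
        argv += ["-ex", command]
    # Everything after --args is the program to debug and its arguments.
    argv += ["--args", python, script]
    return argv
```

The resulting list can be passed to `subprocess.run(build_gdb_command("onnx_test.py"))`. The number of frames to go up may need adjusting depending on what the backtrace shows.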

skottmckay · 2020-10-03T01:43:48Z

I can't reproduce on a Raspberry Pi 4. When you built ORT did you build on the device, and what was the command line used?

NagarajSMurthy · 2020-10-07T12:45:01Z

@skottmckay
I built using only the Dockerfile. I'll use this build argument, as you suggested, and report back:
ARG BUILDARGS="--config ${BUILDTYPE} --arm64"

I didn't cross-compile. I built ORT on my Raspberry Pi 3.
ORT is working if I disable the graph optimizations.

skottmckay · 2020-10-08T04:57:53Z

I've since learned that by default the Raspberry Pi OS is 32-bit despite the CPU being 64-bit. There's a beta 64-bit OS but that's not officially released yet. Due to that, --arm is the correct flag to use (unless you use the 64-bit OS for which --arm64 would be correct).

The docker file involves using an emulation layer which shouldn't be necessary if building on device. Not sure if that is problematic. You could try commenting out the lines with RUN [ "cross-build-start" ] and RUN [ "cross-build-end" ].

The docker file is also using an image for the stretch distro of the OS. Do you have that installed or the more recent buster? Can try cat /etc/*release to see.
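
As a rough sketch of the checks above, the following Python reports whether the running userland is 32-bit or 64-bit and which distro release is installed, then maps that to the `--arm` / `--arm64` choice. The helper names are hypothetical. The mapping is an assumption based only on the description in this comment. Reading `VERSION_CODENAME` from `/etc/os-release` assumes the file provides that key.

```python
import platform
import struct


def parse_os_release(text):
    """Parse KEY=VALUE lines in the style of /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def suggest_arch_flag(machine, pointer_bits):
    """Pick the build flag for an ARM host, or None if the host is not ARM.

    pointer_bits describes the userland (the running interpreter), which can
    be 32-bit even when the CPU and kernel are 64-bit.
    """
    machine = machine.lower()
    if machine in ("aarch64", "arm64") and pointer_bits == 64:
        return "--arm64"
    if machine.startswith("arm") or machine == "aarch64":
        return "--arm"
    return None


def describe_host(os_release_path="/etc/os-release"):
    """Collect machine type, userland bitness, distro codename and suggested flag."""
    pointer_bits = struct.calcsize("P") * 8  # pointer size of this interpreter
    try:
        with open(os_release_path) as handle:
            release = parse_os_release(handle.read())
    except OSError:
        release = {}
    return {
        "machine": platform.machine(),
        "pointer_bits": pointer_bits,
        "codename": release.get("VERSION_CODENAME", release.get("VERSION", "unknown")),
        "flag": suggest_arch_flag(platform.machine(), pointer_bits),
    }
```

Running `print(describe_host())` on the device gives a one-line summary that can be pasted into the issue.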

ORT is 'working' if you disable optimizations because the code that breaks isn't run in that case.

skottmckay · 2020-10-09T07:47:33Z

Using https://github.com/microsoft/onnxruntime/blob/master/dockerfiles/Dockerfile.arm32v7 with 2 changes I was able to build ORT and run it successfully on a Raspberry Pi 4B with optimizations enabled.

Update the base image to match the Python version on the Raspberry Pi 4 (3.7)
+#FROM balenalib/raspberrypi3-python:latest-stretch-build
+FROM balenalib/raspberrypi3-python:3.7-stretch-build
Comment out the cross-compilation start/end
-RUN [ "cross-build-start" ]
+# RUN [ "cross-build-start" ]

-RUN [ "cross-build-end" ]
+# RUN [ "cross-build-end" ]

docker build -t ort-stretch-py37-arm32v7 -f Dockerfile.arm32v7 .
docker create -ti --name ort_temp ort-stretch-py37-arm32v7 bash
docker cp ort_temp:/code/onnxruntime/build/Linux/MinSizeRel/dist/onnxruntime-1.5.2-cp37-cp37m-linux_armv7l.whl .
docker cp ort_temp:/code/onnxruntime/build/Linux/MinSizeRel/testdata/ort_github_issue_4031.onnx .
pip3 install -U ./onnxruntime-1.5.2-cp37-cp37m-linux_armv7l.whl

import onnxruntime as ort

model = 'ort_github_issue_4031.onnx'
optimized_model = 'ort_github_issue_4031.opt.onnx'
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = optimized_model
ort_session = ort.InferenceSession(model,sess_options=so)
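
Since the wheel above is tagged `cp37` and the base image was changed to match the device's Python, a quick check that a wheel's Python tag matches the interpreter on the device may save a failed install. This is a minimal sketch with hypothetical helper names. It assumes CPython and the standard wheel filename layout, which ends in `{python tag}-{abi tag}-{platform tag}.whl`.

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag (e.g. 'cp37') from a wheel filename."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) < 5:
        raise ValueError("not a wheel filename: {!r}".format(filename))
    # The last three fields are the python tag, abi tag and platform tag.
    return parts[-3]


def interpreter_tag(version_info=sys.version_info):
    """Return the CPython tag for the given version, e.g. 'cp38' for 3.8."""
    return "cp{}{}".format(version_info[0], version_info[1])


def wheel_matches_interpreter(filename, version_info=sys.version_info):
    """True if the wheel was built for the given CPython major.minor version."""
    return wheel_python_tag(filename) == interpreter_tag(version_info)
```

For example, `wheel_matches_interpreter("onnxruntime-1.5.2-cp37-cp37m-linux_armv7l.whl")` is only true when run under Python 3.7.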

stale · 2020-12-19T08:12:50Z

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

stale · 2022-04-19T08:54:08Z

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

mrry added component:optimizer labels Sep 15, 2020

stale bot added the stale label Dec 19, 2020

faxu removed the type:SegFault/InvalidMemAccess label Feb 23, 2021

stale bot removed the stale label Feb 23, 2021

faxu removed the type:bug label Aug 18, 2021

stale bot added the stale label Apr 19, 2022

sophies927 added core runtime and removed component:optimizer labels Aug 12, 2022
