Segmentation error when using graph optimization #5175

Open
NagarajSMurthy opened this issue Sep 15, 2020 · 16 comments
Labels
core runtime (issues related to core runtime), stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@NagarajSMurthy

Describe the bug
When creating an inference session, the program crashes with a segmentation fault, even after specifying the session option 'optimized_model_filepath' to store the optimized model.

Urgency
Moderate

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Raspberry Pi 3B+
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: 1.4.0
  • Python version: 3.8.0
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source): 6.3.0
  • CUDA/cuDNN version:
  • GPU model and memory:

To Reproduce

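import onnxruntime

path = 'model.onnx'  # placeholder: path to the model being loaded (not shown in the report)
so = onnxruntime.SessionOptions()
so.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_BASIC
so.optimized_model_filepath = '/home/pi/Desktop/AI/optimised_model.onnx'  # where the optimized model is saved

ort_session = onnxruntime.InferenceSession(path, sess_options=so)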

Expected behavior
The inference session should be created with the requested graph optimizations applied.

@mrry
Contributor

mrry commented Sep 15, 2020

Sorry about that, @NagarajSMurthy! Could you capture a stack trace for the segmentation fault (e.g. by running the Python process under gdb)? That will help us identify which component is causing the problem.

Also, to clarify, does the failure still occur if you do not pass any SessionOptions when creating the InferenceSession?
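Because a segmentation fault kills the Python interpreter outright, one way to answer that question for several configurations in one go is to create each session in a child process and inspect how it exited. A minimal sketch, assuming a POSIX system (where `subprocess` reports death by a signal as a negative return code) and a placeholder model path `model.onnx`; `classify` and `session_snippet` are illustrative helper names, not part of onnxruntime.

```python
import signal
import subprocess
import sys
from typing import Optional

MODEL = "model.onnx"  # placeholder: replace with the model that crashes


def session_snippet(model: str, level: Optional[str]) -> str:
    """Python source that creates one InferenceSession.

    level=None passes no SessionOptions at all; otherwise it names a
    GraphOptimizationLevel member such as "ORT_ENABLE_BASIC".
    """
    lines = ["import onnxruntime as ort"]
    if level is None:
        lines.append(f"ort.InferenceSession({model!r})")
    else:
        lines += [
            "so = ort.SessionOptions()",
            f"so.graph_optimization_level = ort.GraphOptimizationLevel.{level}",
            f"ort.InferenceSession({model!r}, sess_options=so)",
        ]
    return "\n".join(lines)


def classify(snippet: str, timeout: float = 300.0) -> str:
    """Run a snippet in a fresh interpreter and report how that process ended."""
    proc = subprocess.run([sys.executable, "-c", snippet],
                          capture_output=True, text=True, timeout=timeout)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:  # killed by SIGSEGV (POSIX only)
        return "segfault"
    return f"error ({proc.returncode})"


if __name__ == "__main__":
    for level in (None, "ORT_DISABLE_ALL", "ORT_ENABLE_BASIC",
                  "ORT_ENABLE_EXTENDED", "ORT_ENABLE_ALL"):
        label = level or "no SessionOptions"
        print(f"{label}: {classify(session_snippet(MODEL, level))}")
```

Each configuration runs in its own process, so a crash in one does not prevent the others from being tested.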

@NagarajSMurthy
Author

Thanks for the response, @mrry.
I ran the process under gdb and this is what I got:

```
(gdb) run onnx_test.py
Starting program: /usr/local/bin/python3 onnx_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
/home/pi/Desktop/EdgeAI/MNIST/MnistConvnet_new_reg_3.onnx
running

Program received signal SIGSEGV, Segmentation fault.
0x753d5618 in onnxruntime::FreeDimensionOverrideTransformer::FreeDimensionOverrideTransformer(gsl::span<onnxruntime::FreeDimensionOverride const>) ()
from /home/pi/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
```
Also, I still get a segfault when I run without passing any SessionOptions. When I disable all graph optimizations, it works fine (#3296 (comment)).
If I disable all graph optimizations and set only the optimized_model_filepath option in SessionOptions, the model is saved to that location.
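The workaround described above can be written as a small helper. A minimal sketch (the name `make_unoptimized_options` is illustrative): it disables graph optimizations, which in this report avoids the crash, while still asking ORT to write the model to `optimized_model_filepath`. onnxruntime is imported inside the function only so the helper can be defined where the package is absent.

```python
def make_unoptimized_options(output_path: str):
    """SessionOptions with graph optimizations off that still save the model."""
    import onnxruntime as ort  # imported lazily, only when the helper is called

    so = ort.SessionOptions()
    # ORT_DISABLE_ALL skips the graph optimization passes entirely.
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    # The model is still serialized to this path when the session is created.
    so.optimized_model_filepath = output_path
    return so


# Usage with placeholder paths:
#   import onnxruntime as ort
#   sess = ort.InferenceSession("model.onnx",
#                               sess_options=make_unoptimized_options("model.saved.onnx"))
```

Note that the saved file is the unoptimized graph, so this only sidesteps the crash; it does not produce an optimized model.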

@mrry
Contributor

mrry commented Sep 16, 2020

Thanks for the update. A couple more questions:

@NagarajSMurthy
Author

Yes, I used the model that you pointed to. Below is the error under gdb:
```
gdb python3
GNU gdb (Raspbian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...done.

(gdb) run onnx_test.py
Starting program: /usr/local/bin/python3 onnx_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
/home/pi/Desktop/EdgeAI/MNIST/model.onnx
running

Program received signal SIGSEGV, Segmentation fault.
0x753d5618 in onnxruntime::FreeDimensionOverrideTransformer::FreeDimensionOverrideTransformer(gsl::span<onnxruntime::FreeDimensionOverride const>) ()
from /home/pi/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
(gdb)
```

@NagarajSMurthy
Author

@mrry, how can I resolve this issue?

@tianleiwu
Contributor

tianleiwu commented Sep 20, 2020

@skottmckay, @MaximKalininMS, it looks like a bug in FreeDimensionOverrideTransformer. Please take a look and advise whether it needs to be fixed in the 1.5 release.

@skottmckay
Contributor

I don't see any obvious cause as the free dimension override list should be empty unless specified via session options. If the list is empty the FreeDimensionOverrideTransformer constructor doesn't have much to do.

We would need someone with a Raspberry Pi to do a debug build and see whether something is creating a bogus value in SessionOptions.free_dimension_overrides.

@NagarajSMurthy
Author

I have a Raspberry Pi. How do I do a debug build?
Also, I could not find SessionOptions.free_dimension_overrides in the Python documentation.
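On the Python side the override list is not exposed as an attribute; in recent onnxruntime releases it is populated through `SessionOptions.add_free_dimension_override_by_name` and `SessionOptions.add_free_dimension_override_by_denotation` (availability in 1.4 specifically has not been checked here). The sketch below, with an illustrative helper name and a hypothetical dimension name `batch`, shows what specifying an override looks like. The reproduction code above never calls either method, which is why the list is expected to be empty.

```python
def options_with_fixed_dim(dim_name: str, size: int):
    """SessionOptions that pin one symbolic input dimension to a fixed size."""
    import onnxruntime as ort  # imported lazily, only when the helper is called

    so = ort.SessionOptions()
    # Adds an entry to the free dimension override list, which the
    # FreeDimensionOverrideTransformer consumes when graph optimizations run.
    so.add_free_dimension_override_by_name(dim_name, size)
    return so


# Usage (hypothetical dimension name, placeholder model path):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("model.onnx", sess_options=options_with_fixed_dim("batch", 1))
```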

@skottmckay
Contributor

You'd need to do a debug build and be familiar with a debugger like gdb to set breakpoints in a few places to see if SessionOptions.free_dimension_overrides gets corrupted, and if so when.

@skottmckay
Contributor

One other thing you could try is building for arm64. I believe the Raspberry Pi 3B+ has an Arm Cortex-A53 chip, which is 64-bit. It may or may not help.

If you built with the Dockerfile, you could modify it to use these build args to build for arm64:

ARG BUILDARGS="--config ${BUILDTYPE} --arm64"

If you want to do a debug build, change BUILDTYPE from MinSizeRel to Debug:

ARG BUILDTYPE=Debug

If you are able to get a debug build onto the device you should be able to run gdb as you did previously. When you hit the seg fault you could run a few commands to get a bit more info about the issue.

  1. Print out the value of overrides_to_apply:
    (gdb) p overrides_to_apply

  2. Print the backtrace:
    (gdb) bt

  3. Go up a few frames to the InferenceSession class and print the session options:
    (gdb) up 3
    (gdb) p session_options_
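To make the output easy to paste into the issue, the same commands can be run non-interactively. A minimal sketch that builds a `gdb -batch` command line using gdb's standard `-batch`, `-ex` and `--args` options; `onnx_test.py` is the reporter's script, `build_gdb_argv` is an illustrative name, and the number of frames for `up` may need adjusting.

```python
import shlex
import sys
from typing import List

# gdb commands from the steps above, in order. The two `p` commands need a
# debug build; in a MinSizeRel build those symbols are likely unavailable.
GDB_COMMANDS = [
    "run",                   # start the script; gdb stops when SIGSEGV is raised
    "p overrides_to_apply",  # value inside the crashing constructor
    "bt",                    # full backtrace
    "up 3",                  # move toward the InferenceSession frame
    "p session_options_",
]


def build_gdb_argv(script: str, python: str = sys.executable) -> List[str]:
    """Argument list for a non-interactive gdb run of `python script`."""
    argv = ["gdb", "-batch"]
    for cmd in GDB_COMMANDS:
        argv += ["-ex", cmd]
    # --args must come last: everything after it is the program and its arguments.
    argv += ["--args", python, script]
    return argv


if __name__ == "__main__":
    # Print a shell-ready command line; run it and paste the output into the issue.
    print(" ".join(shlex.quote(a) for a in build_gdb_argv("onnx_test.py")))
```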

@skottmckay
Contributor

I can't reproduce on a Raspberry Pi 4. When you built ORT did you build on the device, and what was the command line used?

@NagarajSMurthy
Author

@skottmckay
I built using the Dockerfile only. I'll add this build argument as you suggested and report back:
ARG BUILDARGS="--config ${BUILDTYPE} --arm64"

I didn't cross-compile; I built ORT on my Raspberry Pi 3.
ORT works if I disable the graph optimizations.

@skottmckay
Contributor

I've since learned that by default the Raspberry Pi OS is 32-bit despite the CPU being 64-bit. There's a beta 64-bit OS but that's not officially released yet. Due to that, --arm is the correct flag to use (unless you use the 64-bit OS for which --arm64 would be correct).
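Since the flag has to match the userland rather than the CPU, one way to decide is to ask the Python interpreter that will load the wheel. A minimal sketch (the helper name is illustrative) that assumes the interpreter's pointer size reflects the userland bitness; `platform.machine()` alone reports the kernel's architecture, which can be `aarch64` even when the userland is 32-bit.

```python
import platform
import struct


def suggest_arm_flag(machine: str, pointer_bits: int) -> str:
    """Suggest --arm or --arm64 for an ARM host; return '' for other machines."""
    machine = machine.lower()
    if not machine.startswith(("arm", "aarch")):
        return ""
    # A 32-bit interpreter implies a 32-bit userland, even on a 64-bit CPU or kernel.
    return "--arm64" if pointer_bits == 64 else "--arm"


if __name__ == "__main__":
    bits = struct.calcsize("P") * 8  # pointer size of this interpreter, in bits
    print(suggest_arm_flag(platform.machine(), bits) or "not an ARM machine")
```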

The Dockerfile uses an emulation layer, which shouldn't be necessary when building on the device. I'm not sure whether that is a problem. You could try commenting out the RUN [ "cross-build-start" ] and RUN [ "cross-build-end" ] lines.

The Dockerfile also uses an image for the stretch release of the OS. Do you have stretch installed, or the more recent buster? You can run cat /etc/*release to check.
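The same check can be scripted. A minimal sketch (helper name illustrative) that reads the standard `/etc/os-release` file, one of the files `cat /etc/*release` prints, and compares `VERSION_CODENAME` with the `stretch` tag of the Docker base image. On Python 3.10 and later, `platform.freedesktop_os_release()` does the parsing for you.

```python
from pathlib import Path
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=VALUE lines of an os-release file, ignoring comments."""
    info: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")  # drop surrounding quotes
    return info


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        codename = info.get("VERSION_CODENAME", "unknown")
        print(f"{info.get('PRETTY_NAME', 'unknown OS')} (codename: {codename})")
        if codename != "stretch":
            print("The host differs from the stretch-based Docker image.")
```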

ORT only appears to work when optimizations are disabled because, in that case, the code that crashes is never run.

@skottmckay
Contributor

Using https://github.com/microsoft/onnxruntime/blob/master/dockerfiles/Dockerfile.arm32v7 with two changes, I was able to build ORT and run it successfully on a Raspberry Pi 4B with optimizations enabled.

  1. Update the base image to match the Python version on the Raspberry Pi 4 (3.7):
    +#FROM balenalib/raspberrypi3-python:latest-stretch-build
    +FROM balenalib/raspberrypi3-python:3.7-stretch-build

  2. Comment out the cross-compilation start/end lines:
    -RUN [ "cross-build-start" ]
    +# RUN [ "cross-build-start" ]

    -RUN [ "cross-build-end" ]
    +# RUN [ "cross-build-end" ]
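The two Dockerfile edits above can also be applied with a short script. A minimal sketch, assuming the stock `Dockerfile.arm32v7` contains the `FROM balenalib/raspberrypi3-python:latest-stretch-build` line and the two `cross-build` lines exactly as shown in the diff; `patch_dockerfile` is an illustrative name, and the Python tag is a parameter so it can be set to whatever version the device runs.

```python
from pathlib import Path


def patch_dockerfile(text: str, python_tag: str = "3.7") -> str:
    """Pin the base image's Python version and disable the emulation wrapper."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("FROM balenalib/raspberrypi3-python:latest-"):
            out.append("#" + line)  # keep the original line, commented out
            out.append(line.replace(":latest-", f":{python_tag}-", 1))
        elif stripped in ('RUN [ "cross-build-start" ]', 'RUN [ "cross-build-end" ]'):
            out.append("# " + line)  # not needed when building on the device itself
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    dockerfile = Path("Dockerfile.arm32v7")
    if dockerfile.exists():
        dockerfile.write_text(patch_dockerfile(dockerfile.read_text()))
```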

docker build -t ort-stretch-py37-arm32v7 -f Dockerfile.arm32v7 .
docker create -ti --name ort_temp ort-stretch-py37-arm32v7 bash
docker cp ort_temp:/code/onnxruntime/build/Linux/MinSizeRel/dist/onnxruntime-1.5.2-cp37-cp37m-linux_armv7l.whl .
docker cp ort_temp:/code/onnxruntime/build/Linux/MinSizeRel/testdata/ort_github_issue_4031.onnx .
pip3 install -U ./onnxruntime-1.5.2-cp37-cp37m-linux_armv7l.whl

import onnxruntime as ort

model = 'ort_github_issue_4031.onnx'
optimized_model = 'ort_github_issue_4031.opt.onnx'
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = optimized_model
ort_session = ort.InferenceSession(model, sess_options=so)
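The wheel's Python tag has to match the interpreter on the device: the image above was switched to Python 3.7 to match the Pi 4, whereas the original report uses Python 3.8, which needs a `cp38` wheel. A minimal sketch (helper names illustrative) that reads the tags from a wheel filename, assuming the standard PEP 427 layout `name-version[-build]-python-abi-platform.whl` with a single Python tag.

```python
import sys
from typing import Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Return the (python, abi, platform) tags of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: the tags are the last three fields.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3], parts[-2], parts[-1]


def python_tag_matches(filename: str, version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """True if the wheel's CPython tag (e.g. cp37) matches the given version."""
    python_tag, _abi, _platform = wheel_tags(filename)
    return python_tag == f"cp{version[0]}{version[1]}"


if __name__ == "__main__":
    wheel = "onnxruntime-1.5.2-cp37-cp37m-linux_armv7l.whl"
    print(f"{wheel}: matches this interpreter = {python_tag_matches(wheel)}")
```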

@stale

stale bot commented Dec 19, 2020

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the stale label Dec 19, 2020
@stale stale bot removed the stale label Feb 23, 2021
@faxu faxu removed the type:bug label Aug 18, 2021
@stale

stale bot commented Apr 19, 2022

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the stale label Apr 19, 2022
@sophies927 sophies927 added the core runtime label and removed the component:optimizer label Aug 12, 2022

6 participants