Segmentation fault (core dumped) while running Matrix Multiplication #211

Open
AjinkyaBankar opened this issue Jan 30, 2021 · 19 comments

@AjinkyaBankar

I am running a simple matrix multiplication code on Ubuntu 20.04 with CUDA 11. It prints the following line several times:

GPGPU-Sim PTX: CUDA API function "unsigned int __cudaPushCallConfiguration(dim3, dim3, size_t, CUstream_st*)" has been called.

Then it says:

Segmentation fault (core dumped)

Kindly help to solve the problem. Thank you.

@mkhairy
Contributor

mkhairy commented Feb 10, 2021

@tgrogers I think we have fixed that, right?

@PsFreedom

I have this problem as well on Ubuntu 20.04.
I tried several GPGPU-Sim, GCC, and CUDA versions, but the problem persists.

I decided to switch back to Ubuntu 18.04, and the problem disappeared. I tried it there with CUDA 9.1 and the latest 11.3. So you can just switch to Ubuntu 18.04.

@electricSamarth

I am facing the exact same issue on Ubuntu 20.04.2 LTS, and I don't want to switch to 18.04. Any sort of help would be useful.

@mattsinc

mattsinc commented May 4, 2021

Have any of you tried running the debug version of GPGPU-Sim with this and running the application in gdb? Curious what the backtrace shows.

@electricSamarth

Hi @mattsinc. As it turns out, it works if I use the debug version instead of the release version. I tried this after reading your comment, thinking that before running gdb I should just try the debug version first.
Thanks for your time :)

@mattsinc

mattsinc commented May 6, 2021

Well I'm glad that is working, but it does make it difficult to debug. If you run release with gdb, does it give you any information about where the failure is happening?

@electricSamarth

Hi @mattsinc, sorry for the late reply. Here is a screenshot of the program running in GDB with the release build. The documentation says that the debug version is slower, so I need to get the release version working.
[image: screenshot of the GDB session]

Does this have anything to do with me running Ubuntu 20.04?

@mattsinc

Not sure about Ubuntu 20. What does the backtrace show?

Matt

@electricSamarth

[image: screenshot of the backtrace]
This is how the backtrace looks.

@mattsinc

mattsinc commented May 12, 2021

Interesting. Just speculating, my guess is the version of gcc in Ubuntu 20 is using the __my_func__ variable differently than prior versions have assumed. I don't know exactly why that might be, but I will note that the line where your failure is coming from:

announce_call(__my_func__);

is only used when the debug level is >= 3. I'm assuming you are setting PTX_SIM_DEBUG to 3 or higher on the command line? If you don't need that level of debug information, you could reduce the debug level and this problem should, in theory, go away. Alternatively, you could just change the debug level for the prints in cuda_runtime_api if you want debug information elsewhere but don't want these specific prints.

Matt

@electricSamarth

I have not set the environment variable you are talking about. I also tried setting it to 2, and I still get the same error and the same gdb output. Also, here is the end of that backtrace:
[image: screenshot of the end of the backtrace]

@mattsinc

Interesting -- if you didn't set the flag to anything, then the code from the first backtrace shouldn't have been triggered :)

Not sure what else to try, hopefully one of the main maintainers can chime in. @mkhairy above you said this was fixed. Is there a fix to push in for it?

Matt

@electricSamarth

electricSamarth commented May 12, 2021

What confuses me the most is that when I source setup_environment debug, it works perfectly (it gives me the output it should), but I get a SIGSEGV when I source setup_environment with the default, i.e. release.
Also, @mattsinc thanks for your time and advice :)

@electricSamarth

@AjinkyaBankar @tgrogers @mkhairy can you guys help?

@electricSamarth

@mattsinc I used gcc 7 and it worked; the release version works now.

@mkhairy
Contributor

mkhairy commented May 17, 2021 via email

@jiashenC

I have tried this fix, but the problem still exists. As in the other posts, the debug version works fine but the release version fails. It first outputs many lines of

GPGPU-Sim PTX: CUDA API function "unsigned int __cudaPushCallConfiguration(dim3, dim3, size_t, CUstream_st*)" has been called.

and then segfaults.

I am on Ubuntu 16.04 LTS with gcc 8.

@electricSamarth

electricSamarth commented Jan 13, 2022 via email

@jiashenC

I am still getting the same error with gcc 7.5.0.
