Null pointer dereference on concurrent VC_SM_CMA_IOCTL_MEM_IMPORT_DMABUF ioctl #6701
Comments
Investigating. It looks like it fails and is still holding sm_state->lock.
#6703 fixes one issue. Allocating and freeing kernel IDs took the spinlock, but looking up the value didn't. Another thread importing or freeing could therefore corrupt the idr whilst a thread was doing a lookup, resulting in a duff buffer pointer. Your test case ran for just over 100000 iterations (compared to a few thousand before), but still failed.
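For anyone following along, here is a minimal sketch of the locking pattern being described (hypothetical names and state, not the actual vc-sm-cma code): the lookup has to take the same spinlock as allocation and removal, otherwise a concurrent import/free can rearrange the idr underneath it.

```c
#include <linux/idr.h>
#include <linux/spinlock.h>
#include <linux/gfp.h>

struct buffer;                          /* stand-in for the driver's buffer type */

static DEFINE_SPINLOCK(sm_lock);
static DEFINE_IDR(sm_idr);

static int buffer_add(struct buffer *buf)
{
	int id;

	spin_lock(&sm_lock);
	/* GFP_NOWAIT because we must not sleep while holding the spinlock. */
	id = idr_alloc(&sm_idr, buf, 1, 0, GFP_NOWAIT);
	spin_unlock(&sm_lock);

	return id;
}

static struct buffer *buffer_lookup(int id)
{
	struct buffer *buf;

	/*
	 * The lookup previously ran lockless - that is the race being fixed.
	 * A real driver would also take a reference on the buffer before
	 * dropping the lock, so it cannot be freed out from under the caller.
	 */
	spin_lock(&sm_lock);
	buf = idr_find(&sm_idr, id);
	spin_unlock(&sm_lock);

	return buf;
}

static void buffer_remove(int id)
{
	spin_lock(&sm_lock);
	idr_remove(&sm_idr, id);
	spin_unlock(&sm_lock);
}
```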
Definitely timing-related, as adding logging reduces the rate of reproduction :-/
The logging output is leaving me confused at the moment. I've tweaked the test case to have a separate dmabuf with a different size per thread, so that I can identify which log messages are associated with which thread. In the failure case I have the "attempt to import" message, but I don't get the follow-up log line. One to look at further tomorrow.
Not wasting my time at all - more eyes on code is always a good thing.
The error was fairly infrequent (every few days), but when triggered it resulted in all the monitors on a police CCTV system being blanked! They weren't too happy, to say the least. Although in this case I've already been down that path in trying to work out how we can have skipped the vc_sm_add_resource call. Neither of the pr_debug messages from vc_sm_cma_vchi_import failing appears in my logs, and I've added a log message to catch that failure case as well. Regarding threading, the userspace ioctl call drives the import path; on close of the fd, the corresponding release path runs.
I did a bit of debugging and found the following:
I explicitly removed the kfree(buffer), so addresses should be unique and not identical due to reuse. Two things:
Continuing:
I traced further.
I give up for today, after staring at the code for the last two hours and trying one more replacement experiment.
The weird part is that the vpu_event call shouldn't happen until all firmware references to the memory block have been released, and that includes the one that vc-sm-cma has just taken whilst importing the dmabuf. I have just spotted that there is a slightly surprising loop in that path, and I'm adding logging / error handling to check what's going on there.
I guess that's the one mentioned here?
So I've made that switch. With that in place, the bit I still don't understand is how we're getting that callback twice (I think that's more likely than the sequencing getting messed up during allocation). I am also seeing numerous mailbox calls timing out, but checking the VPU logs shows it is maxed out dealing with all these buffer mapping calls, so that's not so surprising.
Something is certainly going wrong and stopping messages from being handled.
So this might not even be a Linux-side bug but something in whatever is handling the mailbox replies on the other end? I would assume that the part responsible for handling the 'import' or 'release' message also doesn't accidentally produce two responses.
It's something in the handling of VCHI. It's most likely that "slots" aren't getting freed under some conditions, and we end up with none available. I'll be talking to pelwell in the morning about it - he's the man who knows how that is all meant to work.
I'm now either being led astray by incorrect debug output, or weirder things are going on. The VPU side is implying that with more than 4 threads running we end up with allocations still mapped in the VPU after the application closes. Those shouldn't have been released on the kernel side until the firmware has acknowledged releasing them, so there's something weird there. That may be the error handling for closing the app whilst ioctls are in progress, but it still isn't nice. I need to reacquaint myself with how this is meant to work too. I do now understand why we get raspberrypi-exp-gpio, raspberrypi-clk, and raspberrypi-firmware timeouts with this test app - for some reason the SM thread on the VPU is running at the absolute highest priority (barring interrupts), so the mailbox thread barely gets a look-in.
So at least some of the leaks are due to interrupting the VCHIQ call.
The VPU side does support a free using the resource address, so we could send that message, but how do we handle the case of no message slots being available at the point that we're aborting, and what if someone imports the same dmabuf (hence same address) multiple times? So many implications!
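To make the failure mode concrete, here is an entirely hypothetical illustration (not the actual driver code) of how aborting a blocking firmware exchange on a signal can leak the VPU-side mapping: the request has already gone out, but the caller gives up before the reply arrives, so nobody ever records or releases the handle the firmware hands back.

```c
#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/types.h>

struct sm_state;                        /* hypothetical per-device state */
struct import_msg;                      /* hypothetical request payload  */

int send_import_request(struct sm_state *st, struct import_msg *msg);
int record_resource(struct sm_state *st, u32 vc_handle);

static int import_buffer(struct sm_state *st, struct import_msg *msg,
			 struct completion *reply_done, u32 *vc_handle)
{
	int ret;

	ret = send_import_request(st, msg);    /* request is now in flight */
	if (ret)
		return ret;

	/*
	 * If the task is signalled (e.g. the test app is killed) while we
	 * wait here, we bail out with -ERESTARTSYS. The firmware will still
	 * complete the import and keep its reference, but nothing on the
	 * kernel side records the handle, so it is never freed.
	 */
	if (wait_for_completion_interruptible(reply_done))
		return -ERESTARTSYS;

	return record_resource(st, *vc_handle);
}
```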
Those resource leaks on program exit are another issue, unrelated to the NULL deref, correct? I've since looked into how to avoid this issue and figured I might use a custom kernel driver that does the physical address lookup, returns a custom file handle, and uses dma_sync_sg_for_cpu while that file handle is open. Once I close the handle, the dma buffer gets released. I don't really need the explicit VPU import, as having the physical address of the dmabuf is sufficient for my use case. I put the kernel driver source code here: https://github.com/info-beamer/dma_phys_addr (Also, I'm only 90% sure that what I'm doing is correct, but the output looks good, and now that I use the synchronization calls it no longer produces glitch art.)
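For reference, the core dma-buf sequence a driver like that would use looks roughly like this. It's a sketch with illustrative names, not code taken from the linked repository; on recent kernels the dma_buf_map_attachment_unlocked()/dma_buf_unmap_attachment_unlocked() importer variants may be required, and dma_buf_begin_cpu_access()/dma_buf_end_cpu_access() is the more canonical way for an importer to sync.

```c
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/err.h>

struct phys_handle {
	struct dma_buf *dmabuf;
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
};

/* Import a dma-buf fd, map it for DMA and report its bus address. */
static int phys_import(struct device *dev, int fd, struct phys_handle *h,
		       dma_addr_t *addr)
{
	h->dmabuf = dma_buf_get(fd);
	if (IS_ERR(h->dmabuf))
		return PTR_ERR(h->dmabuf);

	h->attach = dma_buf_attach(h->dmabuf, dev);
	if (IS_ERR(h->attach)) {
		dma_buf_put(h->dmabuf);
		return PTR_ERR(h->attach);
	}

	h->sgt = dma_buf_map_attachment(h->attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(h->sgt)) {
		dma_buf_detach(h->dmabuf, h->attach);
		dma_buf_put(h->dmabuf);
		return PTR_ERR(h->sgt);
	}

	/* CMA buffers are contiguous, so the first segment is enough. */
	*addr = sg_dma_address(h->sgt->sgl);
	return 0;
}

/* Hand the buffer back to the CPU before reading it (e.g. per frame). */
static void phys_sync_for_cpu(struct device *dev, struct phys_handle *h)
{
	dma_sync_sg_for_cpu(dev, h->sgt->sgl, h->sgt->orig_nents,
			    DMA_FROM_DEVICE);
}

/* Tear everything down when the file handle is closed. */
static void phys_release(struct phys_handle *h)
{
	dma_buf_unmap_attachment(h->attach, h->sgt, DMA_BIDIRECTIONAL);
	dma_buf_detach(h->dmabuf, h->attach);
	dma_buf_put(h->dmabuf);
}
```

The sync before each CPU read is what avoids reading stale cache lines for memory the VPU has just written, which would line up with the glitch art disappearing once the synchronization calls were added.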
Potentially related: Happened on a Raspberry Pi 3 Model B Rev 1.2 (a02082) running the latest 6.12.21-v8+ kernel:
Describe the bug
I've observed kernel null pointer dereferences while using the VC_SM_CMA_IOCTL_MEM_IMPORT_DMABUF ioctl. A traceback might look like this:
Once that happens, other calls interfacing with the hardware may lock up, and in my case the hardware watchdog then resets the CPU. See also the discussion on the Pi forum.
Steps to reproduce the behaviour
On a Pi4, run the code from https://gist.github.com/dividuum/da0a9a7038b592898ea269f19917e438. After a few seconds, the program will stop showing output and the kernel log will likely show a traceback similar to the one above. Using more threads seems to make it crash sooner.
Device(s)
Raspberry Pi 4 Mod. B
System
Tested on
Logs
No response
Additional context
No response