Testing PCIe graphics cards on Pi5 #7072
base: rpi-6.17.y
d352ac5 to
d783ebd
Compare
8b1e2b1 to
fe3d35c
Compare
df370fe to
06b5122
Compare
Using increased bit depth for no reason increases power consumption, and differs from the behaviour prior to the conversion to use the HDMI helper functions. Initialise the state max_bpc and requested_max_bpc to the minimum value supported. This only affects Raspberry Pi, as the other users of the helpers (rockchip/inno_hdmi and sun4i) only support a bit depth of 8. Signed-off-by: Dave Stevenson <[email protected]>
DSI0 and DSI1 have different widths for the command FIFO (24bit vs 32bit), but the driver was assuming the 32bit width of DSI1 in all cases. DSI0 also wants the data packed as 24bit big endian, so the formatting code needs updating. Handle the difference via the variant structure. Signed-off-by: Dave Stevenson <[email protected]>
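The FIFO width difference described above can be sketched in plain C. This is a minimal illustration, not the driver's code: the function names and the `width` parameter are hypothetical, and the little-endian packing used for the 32-bit case is an assumption. Only the 24-bit big-endian packing for DSI0 comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack up to `width` payload bytes into one FIFO word.
 * width == 3 models DSI0 (24-bit, big endian per the commit message).
 * width == 4 models DSI1 (32-bit; little-endian packing is an assumption).
 */
static uint32_t pack_fifo_word(const uint8_t *buf, size_t len, unsigned int width)
{
    uint32_t word = 0;
    size_t i;

    for (i = 0; i < width; i++) {
        uint8_t byte = (i < len) ? buf[i] : 0;  /* zero-pad a short tail */

        if (width == 3)
            word |= (uint32_t)byte << (8 * (2 - i));  /* first byte in bits 23:16 */
        else
            word |= (uint32_t)byte << (8 * i);        /* first byte in bits 7:0 */
    }
    return word;
}

/* Number of FIFO writes needed for a payload of `len` bytes. */
static size_t fifo_words_needed(size_t len, unsigned int width)
{
    return (len + width - 1) / width;
}
```

Keeping the width as a per-variant value mirrors the "variant structure" approach the commit message mentions: the same send path can serve both blocks, and only the packing step differs.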
The Raspberry Pi RP1 chip has the Cadence GEM ethernet controller, so add a compatible string for it. Signed-off-by: Dave Stevenson <[email protected]>
The RP1 chip has the Cadence GEM block, but wants the tx_clock to always run at 125MHz, in the same way as sama7g5. Add the relevant configuration. Signed-off-by: Dave Stevenson <[email protected]>
During normal operations, the cursor position update is done through an asynchronous plane update, which on the vc4 driver basically just modifies the right dlist word to move the plane to the new coordinates.

However, when we have the overscan margins set up, we fall back to a regular commit when we are next to the edges. And since that commit happens to be on a cursor plane, it's considered a legacy cursor update by KMS.

The main difference it makes is that it won't wait for its completion (i.e., the next vblank) before returning. This means if we have multiple commits happening in rapid succession, we can have several of them happening before the next vblank.

In parallel, our dlist allocation is tied to a CRTC state, and each time we do a commit we end up with a new CRTC state, with the previous one being freed. This means that we free our previous dlist entry (but don't clear it though) every time a new one is being committed.

Now, if we were to have two commits happening before the next vblank, we could end up freeing and reusing the same dlist entries before the next vblank.

Indeed, we would start from an initial state taking, for example, the dlist entries 10 to 20, then start a commit taking the entries 20 to 30 and setting the dlist pointer to 20, and freeing the dlist entries 10 to 20. However, since we haven't reached vblank yet, the HVS is still using the entries 10 to 20.

If we were to make a new commit now, chances are the allocator is going to give the 10 to 20 entries back, and we would change their content to match the new state. If vblank hasn't happened yet, we just corrupted the active dlist entries.

A first attempt to solve this was made by creating an intermediate dlist buffer to store the current (i.e., as of the last commit) dlist content, that we would update each time the HVS is done with a frame. However, if the interrupt handler missed the vblank window, we would end up copying our intermediate dlist to the hardware one during the composition, essentially creating the same issue.

Since making sure that our interrupt handler runs within a fixed, constrained time window would require making Linux a real-time kernel, this seems a bit out of scope.

Instead, we can work around our original issue by keeping the dlist slot allocation longer. That way, we won't reuse a dlist slot while it's still in flight. In order to achieve this, instead of freeing the dlist slot when its associated CRTC state is destroyed, we'll queue it in a list.

A naive implementation would free the buffers in that queue when we get our end of frame interrupt. However, there's still a race since, just like in the shadow dlist case, we don't control when the handler for that interrupt is going to run. Thus, we can end up with a commit adding an old dlist allocation to our queue during the window between our actual interrupt and when our handler will run. And since that buffer is still being used for the composition of the current frame, we can't free it right away, exposing us to the original bug.

Fortunately for us, the hardware provides a frame counter that is increased each time the first line of a frame is being generated. Associating the frame counter at which the image is supposed to go away with the allocation, and then only deallocating buffers whose counter is below or equal to the one we see when the deallocation code runs, should prevent the above race from occurring.

Signed-off-by: Maxime Ripard <[email protected]>
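The deferred-free scheme described in this commit message can be sketched outside the kernel as follows. This is a simplified model under stated assumptions, not the vc4 implementation: all type and function names are hypothetical, the queue is a fixed-size array rather than a list, and the frame counter is modelled as a monotonic 64-bit value (a real hardware counter may be narrower and wrap, which this sketch does not handle).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STALE 16

/* A dlist range replaced by a newer commit that may still be scanned out. */
struct stale_alloc {
    unsigned int start, size;   /* dlist range, in words */
    uint64_t release_frame;     /* first frame in which the range is unused */
    bool in_use;
};

struct stale_queue {
    struct stale_alloc slots[MAX_STALE];
};

/* Queue a range instead of freeing it; returns false if the queue is full. */
static bool stale_queue_add(struct stale_queue *q, unsigned int start,
                            unsigned int size, uint64_t current_frame)
{
    for (size_t i = 0; i < MAX_STALE; i++) {
        if (!q->slots[i].in_use) {
            q->slots[i] = (struct stale_alloc){
                .start = start,
                .size = size,
                /* Assumption: the replacement takes effect on the next frame. */
                .release_frame = current_frame + 1,
                .in_use = true,
            };
            return true;
        }
    }
    return false;
}

/* Release every range whose target frame has been reached; returns the count. */
static unsigned int stale_queue_reap(struct stale_queue *q, uint64_t current_frame)
{
    unsigned int freed = 0;

    for (size_t i = 0; i < MAX_STALE; i++) {
        if (q->slots[i].in_use && q->slots[i].release_frame <= current_frame) {
            q->slots[i].in_use = false;  /* a real allocator would reclaim the range */
            freed++;
        }
    }
    return freed;
}
```

The key property is that `stale_queue_reap()` decides based on the counter value it reads when it actually runs, not on when the interrupt fired, so a late handler cannot release a range that the current frame is still reading.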
Users are reporting running out of DLIST memory. Add a debugfs file to dump out all the allocations. Signed-off-by: Dave Stevenson <[email protected]>
We have a read-modify-write race when updating SCALER_DISPCTRL for underrun and end-of-frame interrupts. Ideally it would be fixed via a spinlock or similar, but that will require a reasonable amount of study to ensure we don't get deadlocks. The underrun reporting is only for debug, so disable it for now. Signed-off-by: Dave Stevenson <[email protected]>
The dmabuf import already checks that the backing buffer is contiguous and rejects it if it isn't. vc4 also requires that the buffer is in the bottom 1GB of RAM, and this is all correctly defined via dma-ranges. However the kernel silently uses swiotlb to bounce dma buffers around if they are in the wrong region. This relies on dma sync functions to be called in order to copy the data to/from the bounce buffer. DRM is based on all memory allocations being coherent with the GPU so that any updates to a framebuffer will be acted on without the need for any additional update. This is fairly fundamentally incompatible with needing to call dma_sync_ to handle the bounce buffer copies, and therefore we have to detect and reject mappings that use bounce buffers. Signed-off-by: Dave Stevenson <[email protected]>
DSI0 is misbehaving and needs to action things on vblank to work around it. Add a new hook to call across during vblank. Signed-off-by: Dave Stevenson <[email protected]>
The initialisation sequence differs slightly from the documentation in that the clocks are meant to be running before resets and similar. Signed-off-by: Dave Stevenson <[email protected]>
vc4_dsi_bridge_disable wasn't resetting things during shutdown, so add that in. Signed-off-by: Dave Stevenson <[email protected]>
The block must be enabled for the FIFO resets to be actioned, so ensure this is the case. Signed-off-by: Dave Stevenson <[email protected]>
The pixel to byte FIFO appears to not always reset correctly, which can lead to colour errors and/or horizontal shifts. Reset on every vblank to work around the issue. Signed-off-by: Dave Stevenson <[email protected]>
The TC358762 bridge and panel decodes the mode differently on DSI0 to DSI1 for no obvious reason, and results in a shift off the screen. Whilst it would be possible to change the compatible used for the panel, that then messes up Pi5. As it appears to be restricted to vc4 DSI0, fix up the mode in vc4_dsi. Signed-off-by: Dave Stevenson <[email protected]>
Some DSI peripheral drivers wish to send commands in the post_disable or panel unprepare callback. These are called after the DSI host's disable call, but before the host's post_disable if pre_enable_prev_first is set. Don't reset the block until post_disable to allow these commands to be sent. Signed-off-by: Dave Stevenson <[email protected]>
Enable the TMP117 driver as a module. See: raspberrypi#7077 Signed-off-by: Phil Elwell <[email protected]>
Masquerading Interrupt split transfers as Control puts the transfer into the non-periodic handler in the hub. This stops the hub dropping complete-split data in the microframe after a CSPLIT should have arrived, improving resilience to host IRQ latency. Devices are none the wiser - the handshake tokens are the same. Originally devised by Hans Petter Selasky @ FreeBSD. (v2: dwc2 needs an un-masquerade prior to channel interrupt handling) Signed-off-by: Jonathan Bell <[email protected]>
Upstream series https://lore.kernel.org/linux-media/[email protected]/ The subdev format documentation has a subsection describing how to use the media bus pixel codes for serial buses. While it describes the sampling part well, it doesn't really describe the current convention used for the components order. Let's improve that. Signed-off-by: Maxime Ripard <[email protected]>
Upstream series https://lore.kernel.org/linux-media/[email protected]/ The tc358743 is an HDMI to MIPI-CSI2 bridge. It can output all three HDMI 1.4 video formats: RGB 4:4:4, YCbCr 4:2:2, and YCbCr 4:4:4. RGB 4:4:4 is converted to the MIPI-CSI2 RGB888 video format, and listed in the driver as MEDIA_BUS_FMT_RGB888_1X24. Most CSI2 receiver drivers then map MEDIA_BUS_FMT_RGB888_1X24 to V4L2_PIX_FMT_RGB24. However, V4L2_PIX_FMT_RGB24 is defined as having its color components in the R, G and B order, from left to right. MIPI-CSI2 however defines the RGB888 format with blue first. This essentially means that the R and B will be swapped compared to what V4L2_PIX_FMT_RGB24 defines. The proper MBUS format would be BGR888, so let's use that. Fixes: d32d986 ("[media] Driver for Toshiba TC358743 HDMI to CSI-2 bridge") Signed-off-by: Maxime Ripard <[email protected]>
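To make the component-order mismatch concrete: if blue-first data is labelled as V4L2_PIX_FMT_RGB24, a consumer would have to swap the first and third byte of every pixel to see correct colours. The helper below is a hypothetical illustration of that swap only; the commit itself avoids the need for it by advertising the BGR888 media bus format instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the first and third byte of each 3-byte pixel in place (B,G,R <-> R,G,B). */
static void swap_red_blue_24(uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; i++) {
        uint8_t *p = pixels + 3 * i;
        uint8_t tmp = p[0];

        p[0] = p[2];
        p[2] = tmp;
    }
}
```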
This patch adds support for external FSIN-triggered snapshot mode to the OmniVision OV9282 sensor driver. It enables frame capture synchronized with an external hardware trigger signal. Signed-off-by: Omer Faruk Edemen <[email protected]>
Adds DT property `trigger-mode` to enable FSIN-triggered frame capture. Includes overlay and README update for ov9281_trig. Signed-off-by: Omer Faruk Edemen <[email protected]>
Although the PIO throughput benefits from larger burst sizes, only the first two DMA channels support a burst size of 8 - the others are capped at 4. To avoid misconfiguring the PIO hardware, retrieve the actual max_burst value from the DMA channel's capabilities. Signed-off-by: Phil Elwell <[email protected]>
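The capping logic can be sketched as below. The struct and function names are hypothetical stand-ins, and the fallback of 1 for an unreported limit is an assumption. In the kernel, channel capabilities are reported via `dma_get_slave_caps()`, which fills a `struct dma_slave_caps` that includes a `max_burst` field; whether this commit uses exactly that call is not stated in the message.

```c
#include <stdint.h>

/* Hypothetical stand-in for the capability data a DMA channel reports. */
struct dma_caps_sketch {
    uint32_t max_burst;   /* largest burst the channel supports; 0 if unreported */
};

/* Choose the burst size to program: the preferred value, capped by the channel. */
static uint32_t pick_burst(const struct dma_caps_sketch *caps, uint32_t preferred)
{
    if (caps->max_burst == 0)
        return 1;   /* conservative fallback (assumption) when nothing is reported */
    return preferred < caps->max_burst ? preferred : caps->max_burst;
}
```

With a preferred burst of 8, a channel reporting 8 keeps the full burst while a channel reporting 4 is programmed with 4, which matches the two cases the commit message describes.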
e68b449 to
1d1d7d6
Compare
Signed-off-by: Maxime Ripard <[email protected]>
Replace the use of vcdbg with vclog, and correct the documentation URL. See: raspberrypi#7093 Signed-off-by: Phil Elwell <[email protected]>
See: raspberrypi#7091 Signed-off-by: Phil Elwell <[email protected]>
See: raspberrypi#7091 Signed-off-by: Phil Elwell <[email protected]>
|
Would you be able to rebase this PR, mostly to get a nicer view of this PR against the rebased 6.17.y branch? Also, is there any semi-official word from Pi Towers if this has a chance of merging? It's certainly been stable for me across multiple iterations of testing on multiple generations of AMD and Intel GPUs... |
I don't think we want this to be in the pi tree but not in the upstream tree. Ideally any "correct" commits here should be submitted upstream. |
…P Mode

commit 3776c685ebe5f43e9060af06872661de55e80b9a upstream.

Currently, whenever there is a need to transmit an Action frame, the brcmfmac driver always uses the P2P vif to send the "actframe" IOVAR to firmware. The P2P interfaces were available when wpa_supplicant is managing the wlan interface.

However, the P2P interfaces are not created/initialized when only hostapd is managing the wlan interface. And if hostapd receives an ANQP Query REQ Action frame even from an un-associated STA, the brcmfmac driver tries to use an uninitialized P2P vif pointer for sending the IOVAR to firmware. This NULL pointer dereference triggers a driver crash.

[ 1417.074538] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[...]
[ 1417.075188] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[...]
[ 1417.075653] Call trace:
[ 1417.075662] brcmf_p2p_send_action_frame+0x23c/0xc58 [brcmfmac]
[ 1417.075738] brcmf_cfg80211_mgmt_tx+0x304/0x5c0 [brcmfmac]
[ 1417.075810] cfg80211_mlme_mgmt_tx+0x1b0/0x428 [cfg80211]
[ 1417.076067] nl80211_tx_mgmt+0x238/0x388 [cfg80211]
[ 1417.076281] genl_family_rcv_msg_doit+0xe0/0x158
[ 1417.076302] genl_rcv_msg+0x220/0x2a0
[ 1417.076317] netlink_rcv_skb+0x68/0x140
[ 1417.076330] genl_rcv+0x40/0x60
[ 1417.076343] netlink_unicast+0x330/0x3b8
[ 1417.076357] netlink_sendmsg+0x19c/0x3f8
[ 1417.076370] __sock_sendmsg+0x64/0xc0
[ 1417.076391] ____sys_sendmsg+0x268/0x2a0
[ 1417.076408] ___sys_sendmsg+0xb8/0x118
[ 1417.076427] __sys_sendmsg+0x90/0xf8
[ 1417.076445] __arm64_sys_sendmsg+0x2c/0x40
[ 1417.076465] invoke_syscall+0x50/0x120
[ 1417.076486] el0_svc_common.constprop.0+0x48/0xf0
[ 1417.076506] do_el0_svc+0x24/0x38
[ 1417.076525] el0_svc+0x30/0x100
[ 1417.076548] el0t_64_sync_handler+0x100/0x130
[ 1417.076569] el0t_64_sync+0x190/0x198
[ 1417.076589] Code: f9401e80 aa1603e2 f9403be1 5280e483 (f9400000)

Fix this by always using the vif corresponding to the wdev on which the Action frame transmission request was initiated by userspace. This way, even if the P2P vif is not available, the IOVAR is sent to firmware on the AP vif and the ANQP Query RESP Action frame is transmitted without crashing the driver.

Move init_completion() for "send_af_done" from brcmf_p2p_create_p2pdev() to brcmf_p2p_attach(), because the former function is not executed when only hostapd is managing the wlan interface, and it is not safe to call reinit_completion() later in brcmf_p2p_tx_action_frame() without a prior init_completion().

And in the brcmf_p2p_tx_action_frame() function, the condition check for the P2P Presence Response frame is not needed, since wpa_supplicant properly sends the P2P Presence Response frame on the P2P-GO vif instead of the P2P-Device vif.

Cc: [email protected]
Fixes: 18e2f61 ("brcmfmac: P2P action frame tx")
Signed-off-by: Gokul Sivakumar <[email protected]>
Acked-by: Arend van Spriel <[email protected]>
Link: https://patch.msgid.link/[email protected]
[Cc stable]
Signed-off-by: Johannes Berg <[email protected]>
|
I need to tidy up the branch. AMD GPUs only need 70fe325 and the defconfig update. Module size for amdgpu is 2.5MB. I haven't tested with 16kB page sizes, so it may only apply to bcm2711_defconfig and not bcm2712_defconfig. We could add in radeon for the older cards (module size is 466kB), but it won't work with labwc / wlroots as the DRM driver doesn't support atomic updates. Intel Xe isn't quite there yet, and the places it calls into i915 are annoying. Easiest to drop it for now. The i915 changes were proposed by Mesa devs so may be acceptable to mainline. The BAR resizing patch is already being discussed upstream and has been looked at by P33M. (Module size is 814kB). nouveau is a fair way off, so that can just be dropped for now. (Module size is 632kB). |
…node") We lost a line in the forward port, which meant that it always used /dev/fb0, and complained that the sysfs nodes already existed. Fixes: c91c9f2 ("fbdev: Allow client to request a particular /dev/fbN node") Signed-off-by: Dave Stevenson <[email protected]>
8851BU, 8852BU
There were various points where the loader was using uninitialised data, had the potential to run off the end of an array, or was handling core functions incorrectly. Fix these up. Also handle 24bpp and 32bpp framebuffers. Signed-off-by: Dave Stevenson <[email protected]>
The mappings are the reverse of r8g8b8 and r5g6b5 respectively. Signed-off-by: Dave Stevenson <[email protected]>
Modify the PDAF Datatype of the Arducam 64MP camera from 0x30 to 0x12 so that the Raspberry Pi 5 cfe driver can receive PDAF data. Signed-off-by: Lee Jackson <[email protected]>
Various PCIe controllers on ARM64 platforms don't support cache snooping, which leads to numerous issues when attempting to use PCIe graphics cards. Switching ttm_prot_from_caching to return pgprot_dmacoherent for ttm_cached pages solves the issue, albeit with a performance hit. There is a second check in ttm_prot_from_caching that also needs updating. Signed-off-by: Yang Bo <[email protected]> Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
fe3d35c to
72c2683
Compare
|
Branch cleaned to have the minimal changes (6 lines) required to support AMD gpus (either amdgpu or radeon). Other cards are dropped. I'll try to find a few mins to propose these upstream in the next few days. |
|
@6by9 last night I was thinking about it — and indeed, the Intel changes, while simpler, need more time in the oven. It would be nice to get full AMD support, and it seems like the changes shouldn't be hard to make a case for. Until we get the Xe driver past the weird corrupt/glitchy graphics bug, it's fine having it separate. I'm still working on getting someone on the Intel driver side to take a look, it may be something simple! |
|
BTW I've created https://github.com/6by9/linux/tree/rpi-6.18.y-pcie-gpu with both the AMD and Intel changes in. |
1d77945 to
4b60b95
Compare
|
@6by9 - I was thinking it would be nice to have a PR open on this repo for the convenience of users who want to just do an rpi-update to it (instead of a full kernel recompile). If I create and maintain a PR will it still trigger the same build process as yours? Or would updating it every few weeks or once a month create any undue burden on the RPi CI infrastructure? Just wanting to make it easier for people to test, since recompiling the kernel is a step too far for some. |
Wanted for the CI builds.