Skip to content

Conversation

@6by9
Copy link
Contributor

@6by9 6by9 commented Sep 30, 2025

Wanted for the CI builds.

@geerlingguy
Copy link

👀

6by9 and others added 15 commits October 20, 2025 11:30
Using increased bit depth for no reason increases power
consumption, and differs from the behaviour prior to the
conversion to use the HDMI helper functions.

Initialise the state max_bpc and requested_max_bpc to the
minimum value supported. This only affects Raspberry Pi,
as the other users of the helpers (rockchip/inno_hdmi and
sunx4i) only support a bit depth of 8.

Signed-off-by: Dave Stevenson <[email protected]>
DSI0 and DSI1 have different widths for the command FIFO (24bit
vs 32bit), but the driver was assuming the 32bit width of DSI1
in all cases.
DSI0 also wants the data packed as 24bit big endian, so the
formatting code needs updating.

Handle the difference via the variant structure.

Signed-off-by: Dave Stevenson <[email protected]>
The Raspberry Pi RP1 chip has the Cadence GEM ethernet
controller, so add a compatible string for it.

Signed-off-by: Dave Stevenson <[email protected]>
The RP1 chip has the Cadence GEM block, but wants the tx_clock
to always run at 125MHz, in the same way as sama7g5.
Add the relevant configuration.

Signed-off-by: Dave Stevenson <[email protected]>
During normal operations, the cursor position update is done through an
asynchronous plane update, which on the vc4 driver basically just
modifies the right dlist word to move the plane to the new coordinates.

However, when we have the overscan margins setup, we fall back to a
regular commit when we are next to the edges. And since that commit
happens to be on a cursor plane, it's considered a legacy cursor update
by KMS.

The main difference it makes is that it won't wait for its completion
(ie, next vblank) before returning. This means if we have multiple
commits happening in rapid succession, we can have several of them
happening before the next vblank.

In parallel, our dlist allocation is tied to a CRTC state, and each time
we do a commit we end up with a new CRTC state, with the previous one
being freed. This means that we free our previous dlist entry (but don't
clear it though) every time a new one is being committed.

Now, if we were to have two commits happening before the next vblank, we
could end up freeing reusing the same dlist entries before the next
vblank.

Indeed, we would start from an initial state taking, for example, the
dlist entries 10 to 20, then start a commit taking the entries 20 to 30
and setting the dlist pointer to 20, and freeing the dlist entries 10 to
20. However, since we haven't reach vblank yet, the HVS is still using
the entries 10 to 20.

If we were to make a new commit now, chances are the allocator are going
to give the 10 to 20 entries back, and we would change their content to
match the new state. If vblank hasn't happened yet, we just corrupted
the active dlist entries.

A first attempt to solve this was made by creating an intermediate dlist
buffer to store the current (ie, as of the last commit) dlist content,
that we would update each time the HVS is done with a frame. However, if
the interrupt handler missed the vblank window, we would end up copying
our intermediate dlist to the hardware one during the composition,
essentially creating the same issue.

Since making sure that our interrupt handler runs within a fixed,
constrained, time window would require to make Linux a real-time kernel,
this seems a bit out of scope.

Instead, we can work around our original issue by keeping the dlist
slots allocation longer. That way, we won't reuse a dlist slot while
it's still in flight. In order to achieve this, instead of freeing the
dlist slot when its associated CRTC state is destroyed, we'll queue it
in a list.

A naive implementation would free the buffers in that queue when we get
our end of frame interrupt. However, there's still a race since, just
like in the shadow dlist case, we don't control when the handler for
that interrupt is going to run. Thus, we can end up with a commit adding
an old dlist allocation to our queue during the window between our
actual interrupt and when our handler will run. And since that buffer is
still being used for the composition of the current frame, we can't free
it right away, exposing us to the original bug.

Fortunately for us, the hardware provides a frame counter that is
increased each time the first line of a frame is being generated.
Associating the frame counter the image is supposed to go away to the
allocation, and then only deallocate buffers that have a counter below
or equal to the one we see when the deallocation code should prevent the
above race from occurring.

Signed-off-by: Maxime Ripard <[email protected]>
Users are reporting running out of DLIST memory. Add a
debugfs file to dump out all the allocations.

Signed-off-by: Dave Stevenson <[email protected]>
We have a read-modify-write race when updating SCALER_DISPCTRL for
underrun and end-of-frame interrupts.
Ideally it would be fixed via a spinlock or similar, but that will
require a reasonable amount of study to ensure we don't get deadlocks.

The underrun reporting is only for debug, so disable it for now.

Signed-off-by: Dave Stevenson <[email protected]>
The dmabuf import already checks that the backing buffer is contiguous
and rejects it if it isn't. vc4 also requires that the buffer is
in the bottom 1GB of RAM, and this is all correctly defined via
dma-ranges.

However the kernel silently uses swiotlb to bounce dma buffers
around if they are in the wrong region. This relies on dma sync
functions to be called in order to copy the data to/from the
bounce buffer.

DRM is based on all memory allocations being coherent with the
GPU so that any updates to a framebuffer will be acted on without
the need for any additional update. This is fairly fundamentally
incompatible with needing to call dma_sync_ to handle the bounce
buffer copies, and therefore we have to detect and reject mappings
that use bounce buffers.

Signed-off-by: Dave Stevenson <[email protected]>
DSI0 is misbehaving and needs to action things on vblank to
work around it.
Add a new hook to call across during vblank.

Signed-off-by: Dave Stevenson <[email protected]>
The initialisation sequence differs slightly from the documentation
in that the clocks are meant to be running before resets and
similar.

Signed-off-by: Dave Stevenson <[email protected]>
vc4_dsi_bridge_disable wasn't resetting things during shutdown,
so add that in.

Signed-off-by: Dave Stevenson <[email protected]>
The block must be enabled for the FIFO resets to be actioned,
so ensure this is the case.

Signed-off-by: Dave Stevenson <[email protected]>
The pixel to byte FIFO appears to not always reset correctly,
which can lead to colour errors and/or horizontal shifts.
Reset on every vblank to work around the issue.

Signed-off-by: Dave Stevenson <[email protected]>
The TC358762 bridge and panel decodes the mode differently on
DSI0 to DSI1 for no obvious reason, and results in a shift off
the screen.
Whilst it would be possible to change the compatible used for
the panel, that then messes up Pi5.

As it appears to be restricted to vc4 DSI0, fix up the mode
in vc4_dsi.

Signed-off-by: Dave Stevenson <[email protected]>
Some DSI peripheral drivers wish to send commands in the
post_disable or panel unprepare callback. These are called
after the DSI host's disable call, but before the host's
post_disable if pre_enable_prev_first is set.

Don't reset the block until post_disable to allow these
commands to be sent.

Signed-off-by: Dave Stevenson <[email protected]>
pelwell and others added 7 commits October 20, 2025 11:30
Enable the TMP117 driver as a module.

See: raspberrypi#7077

Signed-off-by: Phil Elwell <[email protected]>
Masquerading Interrupt split transfers as Control puts the transfer into
the non-periodic handler in the hub. This stops the hub dropping
complete-split data in the microframe after a CSPLIT should have
arrived, improving resilience to host IRQ latency. Devices are none
the wiser - the handshake tokens are the same.

Originally devised by Hans Petter Selasky @ FreeBSD.

(v2: dwc2 needs an un-masquerade prior to channel interrupt handling)

Signed-off-by: Jonathan Bell <[email protected]>
Upstream series https://lore.kernel.org/linux-media/[email protected]/

The subdev format documentation has a subsection describing how to use
the media bus pixel codes for serial buses. While it describes the
sampling part well, it doesn't really describe the current convention
used for the components order.

Let's improve that.

Signed-off-by: Maxime Ripard <[email protected]>
Upstream series https://lore.kernel.org/linux-media/[email protected]/

The tc358743 is an HDMI to MIPI-CSI2 bridge. It can output all three
HDMI 1.4 video formats: RGB 4:4:4, YCbCr 4:2:2, and YCbCr 4:4:4.

RGB 4:4:4 is converted to the MIPI-CSI2 RGB888 video format, and listed
in the driver as MEDIA_BUS_FMT_RGB888_1X24.

Most CSI2 receiver drivers then map MEDIA_BUS_FMT_RGB888_1X24 to
V4L2_PIX_FMT_RGB24.

However, V4L2_PIX_FMT_RGB24 is defined as having its color components in
the R, G and B order, from left to right. MIPI-CSI2 however defines the
RGB888 format with blue first.

This essentially means that the R and B will be swapped compared to what
V4L2_PIX_FMT_RGB24 defines.

The proper MBUS format would be BGR888, so let's use that.

Fixes: d32d986 ("[media] Driver for Toshiba TC358743 HDMI to CSI-2 bridge")
Signed-off-by: Maxime Ripard <[email protected]>
This patch adds support for external FSIN-triggered snapshot mode
to the OmniVision OV9282 sensor driver. It enables frame capture
synchronized with an external hardware trigger signal.

Signed-off-by: Omer Faruk Edemen <[email protected]>
Adds DT property `trigger-mode` to enable FSIN-triggered frame capture.
Includes overlay and README update for ov9281_trig.

Signed-off-by: Omer Faruk Edemen <[email protected]>
Although the PIO throughput benefits from larger burst sizes, only the
first two DMA channels support a burst size of 8 - the others are capped
at 4. To avoid misconfiguring the PIO hardware, retrieve the actual
max_burst value from the DMA channel's capabilities.

Signed-off-by: Phil Elwell <[email protected]>
mripard and others added 4 commits October 20, 2025 11:47
Replace the use of vcdbg with vclog, and correct the documentation URL.

See: raspberrypi#7093

Signed-off-by: Phil Elwell <[email protected]>
@geerlingguy
Copy link

Would you be able to rebase this PR, mostly to get a nicer view of this PR against the rebased 6.17.y branch?

Also, is there any semi-official word from Pi Towers if this has a chance of merging? It's certainly been stable for me across multiple iterations of testing on multiple generations of AMD and Intel GPUs...

@popcornmix
Copy link
Collaborator

Also, is there any semi-official word from Pi Towers if this has a chance of merging?

I don't think we want this to be in the pi tree but not in the upstream tree.
That creates a lot of pain with merge conflicts.

Ideally any "correct" commits here should be submitted upstream.
We are generally happy to backport commits that have been accepted upstream to get them sooner.

…P Mode

commit 3776c685ebe5f43e9060af06872661de55e80b9a upstream.

Currently, whenever there is a need to transmit an Action frame,
the brcmfmac driver always uses the P2P vif to send the "actframe" IOVAR to
firmware. The P2P interfaces were available when wpa_supplicant is managing
the wlan interface.

However, the P2P interfaces are not created/initialized when only hostapd
is managing the wlan interface. And if hostapd receives an ANQP Query REQ
Action frame even from an un-associated STA, the brcmfmac driver tries
to use an uninitialized P2P vif pointer for sending the IOVAR to firmware.
This NULL pointer dereferencing triggers a driver crash.

 [ 1417.074538] Unable to handle kernel NULL pointer dereference at virtual
 address 0000000000000000
 [...]
 [ 1417.075188] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
 [...]
 [ 1417.075653] Call trace:
 [ 1417.075662]  brcmf_p2p_send_action_frame+0x23c/0xc58 [brcmfmac]
 [ 1417.075738]  brcmf_cfg80211_mgmt_tx+0x304/0x5c0 [brcmfmac]
 [ 1417.075810]  cfg80211_mlme_mgmt_tx+0x1b0/0x428 [cfg80211]
 [ 1417.076067]  nl80211_tx_mgmt+0x238/0x388 [cfg80211]
 [ 1417.076281]  genl_family_rcv_msg_doit+0xe0/0x158
 [ 1417.076302]  genl_rcv_msg+0x220/0x2a0
 [ 1417.076317]  netlink_rcv_skb+0x68/0x140
 [ 1417.076330]  genl_rcv+0x40/0x60
 [ 1417.076343]  netlink_unicast+0x330/0x3b8
 [ 1417.076357]  netlink_sendmsg+0x19c/0x3f8
 [ 1417.076370]  __sock_sendmsg+0x64/0xc0
 [ 1417.076391]  ____sys_sendmsg+0x268/0x2a0
 [ 1417.076408]  ___sys_sendmsg+0xb8/0x118
 [ 1417.076427]  __sys_sendmsg+0x90/0xf8
 [ 1417.076445]  __arm64_sys_sendmsg+0x2c/0x40
 [ 1417.076465]  invoke_syscall+0x50/0x120
 [ 1417.076486]  el0_svc_common.constprop.0+0x48/0xf0
 [ 1417.076506]  do_el0_svc+0x24/0x38
 [ 1417.076525]  el0_svc+0x30/0x100
 [ 1417.076548]  el0t_64_sync_handler+0x100/0x130
 [ 1417.076569]  el0t_64_sync+0x190/0x198
 [ 1417.076589] Code: f9401e80 aa1603e2 f9403be1 5280e483 (f9400000)

Fix this, by always using the vif corresponding to the wdev on which the
Action frame Transmission request was initiated by the userspace. This way,
even if P2P vif is not available, the IOVAR is sent to firmware on AP vif
and the ANQP Query RESP Action frame is transmitted without crashing the
driver.

Move init_completion() for "send_af_done" from brcmf_p2p_create_p2pdev()
to brcmf_p2p_attach(). Because the former function would not get executed
when only hostapd is managing wlan interface, and it is not safe to do
reinit_completion() later in brcmf_p2p_tx_action_frame(), without any prior
init_completion().

And in the brcmf_p2p_tx_action_frame() function, the condition check for
P2P Presence response frame is not needed, since the wpa_supplicant is
properly sending the P2P Presense Response frame on the P2P-GO vif instead
of the P2P-Device vif.

Cc: [email protected]
Fixes: 18e2f61 ("brcmfmac: P2P action frame tx")
Signed-off-by: Gokul Sivakumar <[email protected]>
Acked-by: Arend van Spriel <[email protected]>
Link: https://patch.msgid.link/[email protected]
[Cc stable]
Signed-off-by: Johannes Berg <[email protected]>
@6by9
Copy link
Contributor Author

6by9 commented Oct 21, 2025

I need to tidy up the branch.

AMD GPUs only need 70fe325 and the defconfig update.
I want to understand that first commit a little more as I'm not getting what the change in ttm_bo_util.c is doing for us. Actually I think it can again be reduced to a #ifdef CONFIG_ARM64 around the first if clause which is going to be less painful. If I understand it, then I can make the case upstream. They may want it under a dedicated Kconfig option rather than just CONFIG_ARM64, but that's still not a deal breaker.

Module size for amdgpu is 2.5MB. I haven't tested with 16kB page sizes, so it may only apply to bcm2711_defconfig and not bcm2712_defconfig.

We could add in radeon for the older cards (module size is 466kB), but it won't work with labwc / wlroots as the DRM driver doesn't support atomic updates.

Intel Xe isn't quite there yet, and the places it calls into i915 are annoying. Easiest to drop it for now. The i915 changes were proposed by Mesa devs so may be acceptable to mainline. The BAR resizing patch is already being discussed upstream and has been looked at by P33M. (Module size is 814kB).

nouveau is a fair way off, so that can just be dropped for now. (Module size is 632kB).

6by9 and others added 7 commits October 21, 2025 20:05
…node")

We lost a line in the forward port, which meant that it always used
/dev/fb0, and complained that the sysfs nodes already existed.

Fixes: c91c9f2 ("fbdev: Allow client to request a particular /dev/fbN node")

Signed-off-by: Dave Stevenson <[email protected]>
There were various points where the loader was using uninitialised
data, had the potential to run off the end of an array, or was
handling core functions incorrectly. Fix these up.

Also handle 24bpp and 32bpp framebuffers.

Signed-off-by: Dave Stevenson <[email protected]>
The mappings are the reverse of r8g8b8 and r5g6b5 respectively

Signed-off-by: Dave Stevenson <[email protected]>
Modify the PDAF Datatype of the Arducam 64MP camera from 0x30 to 0x12
so that the Raspberry Pi 5 cfe driver can receive PDAF data.

Signed-off-by: Lee Jackson <[email protected]>
Various PCIe controllers on ARM64 platforms don't support cache
snooping, which leads to numerous issues when attempting to use
PCIe graphics cards.

Switching ttm_prot_from_caching to return pgprot_dmacoherent for
ttm_cached pages solves the issue, albeit with a performance hit.
There is a second check in ttm_prot_from_caching that also needs
updating.

Signed-off-by: Yang Bo <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
@6by9 6by9 force-pushed the rpi-6.17.y-pcie-gpu branch from fe3d35c to 72c2683 Compare October 22, 2025 09:48
@6by9
Copy link
Contributor Author

6by9 commented Oct 22, 2025

Branch cleaned to have the minimal changes (6 lines) required to support AMD gpus (either amdgpu or radeon). Other cards are dropped.

I'll try to find a few mins to propose these upstream in the next few days.

@6by9 6by9 marked this pull request as ready for review October 22, 2025 09:52
@geerlingguy
Copy link

@6by9 last night I was thinking about it — and indeed, the Intel changes, while simpler, need more time in the oven. It would be nice to get full AMD support, and it seems like the changes shouldn't be hard to make a case for.

Until we get the Xe driver past the weird corrupt/glitchy graphics bug, it's fine having it separate. I'm still working on getting someone on the Intel driver side to take a look, it may be something simple!

@6by9
Copy link
Contributor Author

6by9 commented Oct 22, 2025

BTW I've created https://github.com/6by9/linux/tree/rpi-6.18.y-pcie-gpu with both the AMD and Intel changes in.
6.18 is only on rc2, so pretty early days but most things are working.

@geerlingguy
Copy link

@6by9 - I was thinking it would be nice to have a PR open on this repo for the convenience of users who want to just do an rpi-update to it (instead of a full kernel recompile). If I create and maintain a PR will it still trigger the same build process as yours? Or would updating it every few weeks or once a month create any undue burden on the RPi CI infrastructure?

Just wanting to make it easier for people to test, since recompiling the kernel is a step too far for some.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.