[metal] synchronize acceleration structure builds using fences by jtbirdsell · Pull Request #9645 · gfx-rs/wgpu

jtbirdsell · 2026-06-04T20:03:52Z

Connections

Fixes #9215. Related to #9100 (results below). This is option 1 from the discussion in #9215.

Description

On Metal, place_acceleration_structure_barrier was a no-op, and Metal does not order acceleration structure commands encoded on the same MTLAccelerationStructureCommandEncoder. A TLAS build could therefore consume a BLAS that was still building, which shows up as garbage intersections in the repro and would hang in bigger workloads.

Apple's docs rule out fixing this within one encoder ("Don't update a fence and then wait for the same fence within a pass because it can create a GPU deadlock"), so the fix splits the encoder at sync points instead:

place_acceleration_structure_barrier ends any open AS encoder after encoding updateFence:, and the next AS encoder starts with waitForFence:. It splits unconditionally rather than interpreting the barrier's usage flags. wgpu-core only emits AS barriers where ordering is required, and an encoder is only open when prior AS commands exist, so this never splits needlessly, and it stays correct if core's barrier emission changes later.
read_acceleration_structure_compact_size splits first if the open encoder contains builds. wgpu-core encodes the size query with no barrier after the build of the structure being queried, so without this the compacted size can be read from a still-building BLAS.
One MTLFence is created lazily and reused for the encoder's lifetime. Each pass waits on it first and updates it last, which is the reuse pattern Apple documents. Creating a fence per split would be a use-after-free hazard with commandBufferWithUnretainedReferences, which doesn't retain the fence past encoding.
The pending wait flag is cleared at end_encoding/discard_encoding, so a wait can never land in a different command buffer than its update. A wait that crossed command buffers could deadlock if the buffers were submitted out of order.
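
The splitting behaviour described above can be modelled without Metal. The following is a minimal sketch, not the code in this PR: `MockCommandEncoder`, `MockFence`, `Cmd`, and every field name are hypothetical, and the recorded `Cmd` values only stand in for the real updateFence:/waitForFence: calls. It assumes a barrier is a no-op when no acceleration structure encoder is open, as the description states.

```rust
/// Calls the mock records, standing in for the real Metal encoder calls.
#[derive(Debug, Clone, PartialEq)]
enum Cmd {
    BeginAsEncoder,
    WaitForFence,
    Build(&'static str),
    UpdateFence,
    EndAsEncoder,
}

/// Hypothetical stand-in for an MTLFence handle.
#[derive(Debug)]
struct MockFence;

/// Hypothetical model of the command encoder state (not wgpu-hal's type).
#[derive(Default)]
struct MockCommandEncoder {
    log: Vec<Cmd>,
    as_encoder_open: bool,
    fence: Option<MockFence>,
    fences_created: u32,
    pending_fence_wait: bool,
}

impl MockCommandEncoder {
    /// Open an acceleration structure pass if none is open; a pass that
    /// follows a split begins by waiting on the fence.
    fn ensure_as_encoder(&mut self) {
        if self.as_encoder_open {
            return;
        }
        self.log.push(Cmd::BeginAsEncoder);
        self.as_encoder_open = true;
        if self.pending_fence_wait {
            self.log.push(Cmd::WaitForFence);
            self.pending_fence_wait = false;
        }
    }

    fn build(&mut self, name: &'static str) {
        self.ensure_as_encoder();
        self.log.push(Cmd::Build(name));
    }

    /// Split unconditionally: update the fence, end the pass, and remember
    /// that the next pass must wait. Nothing to do if no pass is open.
    fn place_acceleration_structure_barrier(&mut self) {
        if !self.as_encoder_open {
            return;
        }
        if self.fence.is_none() {
            // Created lazily, then reused for every later split.
            self.fence = Some(MockFence);
            self.fences_created += 1;
        }
        self.log.push(Cmd::UpdateFence);
        self.log.push(Cmd::EndAsEncoder);
        self.as_encoder_open = false;
        self.pending_fence_wait = true;
    }

    /// Clearing the flag here keeps a wait from landing in a different
    /// command buffer than its update.
    fn end_encoding(&mut self) {
        if self.as_encoder_open {
            self.log.push(Cmd::EndAsEncoder);
            self.as_encoder_open = false;
        }
        self.pending_fence_wait = false;
    }
}
```

Recording a BLAS build, a barrier, and a TLAS build with this model yields two passes, where the first ends with the fence update and the second begins with the wait, so the update and the wait never share a pass.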

Both fence methods go back to macOS 11 / iOS 14, the same availability as acceleration structures themselves, so Intel Macs are fine (unlike the Metal 26 barrier(afterQueueStages:beforeStages:) alternative).

Testing

On an M3 MacBook Pro (10 core GPU, hardware RT):

https://github.com/Vecvec/macos-ray-tracing-test pointed at this branch: 20/20 runs failed before this change, 0/100 after. On this hardware the race isn't even intermittent, and only the cases that build BLAS and TLAS in the same submission fail.
cargo xtask test: same results before and after (945 passed; the 4 failures are naga SPV snapshots and Metal shader passthrough, pre-existing on trunk on my machine and unrelated). The ray_tracing group passes 42/42 including the blas_compaction tests, which exercise the compact size path.
cargo xtask cts --backend metal: no regressions. One pre-existing maxStorageBufferBindingSize:validate failure, also present on unpatched trunk.
Re Ray tracing example tests failing on metal #9100: the ray tracing example tests all pass on this M3 both with and without this change at current trunk, so I couldn't reproduce that one. Noted on the issue.

Squash or Rebase?

Squash.

Checklist

I self-reviewed and fully understand this PR.
WebGPU implementations built with wgpu may be affected behaviorally.
Validation and feature gates are in place to confine behavioral changes.
Tests demonstrate the validation and altered logic works.
CHANGELOG.md entries for the user-facing effects of this change are present.
The PR is minimal, and doesn't make sense to land as multiple PRs.
Commits are logically scoped and individually reviewable.
The PR description has enough context to understand the motivation and solution implemented.

Metal does not order acceleration structure commands encoded on the same encoder, so place_acceleration_structure_barrier now splits the encoder: it updates an MTLFence, ends the encoder, and the next acceleration structure encoder waits on the fence before encoding anything. read_acceleration_structure_compact_size does the same when the open encoder contains builds, since wgpu-core encodes the size query without a barrier after the build it depends on. Fixes gfx-rs#9215

Vecvec · 2026-06-09T19:20:52Z

I think that this is a definite improvement, but I am concerned about what happens if the build commands are in separate command encoders. The only guarantee I could find was rather vague:

As much as possible, the perceived order in which Metal executes the commands is the same as the way you order them. Although Metal might reorder some of your commands before processing them, this usually only occurs when there’s a performance gain and no other perceivable impact.

This seems to have been broken anyway, so I'm not sure it can be trusted.

Vecvec

The code looks good, I hope to test this within the next day or two.

Vecvec · 2026-06-11T05:40:56Z

+        // wgpu-core encodes this with no barrier after the build of the
+        // acceleration structure being queried, so if the current encoder
+        // contains builds, split it; otherwise the size could be read from a
+        // still-building acceleration structure.
+        if self.state.acceleration_structure_builder_has_builds {
+            self.split_acceleration_structure_builder();
+        }


Note: This is probably a bug in wgpu-core (probably also #8825). Thanks for noticing this.
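
For readers following the thread, the condition in the quoted hunk can be illustrated with a small standalone model. This is only a sketch under assumptions: `AsBuilderState`, `encoder_open`, `splits`, and `size_queries` are hypothetical names, and only `acceleration_structure_builder_has_builds` and `split_acceleration_structure_builder` are taken from the hunk. The fence update and wait that accompany a real split are left out.

```rust
/// Hypothetical model of the state consulted by the size query.
#[derive(Default)]
struct AsBuilderState {
    encoder_open: bool,
    /// True once the currently open encoder has recorded a build.
    acceleration_structure_builder_has_builds: bool,
    splits: u32,
    size_queries: u32,
}

impl AsBuilderState {
    /// Record a build in the open encoder, opening one if needed.
    fn build(&mut self) {
        self.encoder_open = true;
        self.acceleration_structure_builder_has_builds = true;
    }

    /// End the open encoder; the next one starts with no builds in it.
    fn split_acceleration_structure_builder(&mut self) {
        if self.encoder_open {
            self.encoder_open = false;
            self.acceleration_structure_builder_has_builds = false;
            self.splits += 1;
        }
    }

    /// Split only when the open encoder holds builds the query may depend on,
    /// then record the query in the (possibly new) encoder.
    fn read_acceleration_structure_compact_size(&mut self) {
        if self.acceleration_structure_builder_has_builds {
            self.split_acceleration_structure_builder();
        }
        self.encoder_open = true;
        self.size_queries += 1;
    }
}
```

In this model, back-to-back size queries cost at most one split, because the flag is reset whenever the encoder is split.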

Vecvec · 2026-06-23T22:01:10Z

I've run the tests on my phone (my macbook still doesn't reproduce this), and this does fix the issue. However, there appears to be a race when splitting the encoder. I think it would be ideal if this could also be resolved.

jtbirdsell force-pushed the fix/metal-as-sync branch from 4fd2754 to a9778b0 Compare June 4, 2026 20:04

jtbirdsell mentioned this pull request Jun 4, 2026

Ray tracing example tests failing on metal #9100

Open

inner-daemons self-requested a review June 10, 2026 15:31

inner-daemons assigned inner-daemons and Vecvec Jun 10, 2026

jimblandy requested a review from Vecvec June 10, 2026 15:32

Vecvec reviewed Jun 17, 2026

jtbirdsell wants to merge 1 commit into gfx-rs:trunk from jtbirdsell:fix/metal-as-sync
