-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[hal metal] ray tracing acceleration structures #7660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
[hal metal] ray tracing acceleration structures #7660
Conversation
e30b663
to
f3830cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! Glad there didn't need to be any wgpu-core changes. Largely looks good, but I'm not extremely knowledgeable about metal. One question / comment, but haven't yet checked everything with spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done checks against the Metal spec. It seems this requires MacOS 13.0+ not 11.0+ due to some more recent functions being used. Confusingly, the vertex buffer field suggests that the only format supported is f32x3 so I'm not sure what descriptor.set_vertex_format
does.
@@ -890,6 +890,11 @@ impl super::PrivateCapabilities { | |||
&& (device.supports_family(MTLGPUFamily::Apple7) | |||
|| device.supports_family(MTLGPUFamily::Mac2)), | |||
supports_shared_event: version.at_least((10, 14), (12, 0), os_is_mac), | |||
supports_raytracing: if version.at_least((11, 0), (14, 0), os_is_mac) { | |||
device.supports_raytracing() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think raytracing support needs supportsRaytracingFromRender
due to support of ray queries in fragment shaders (Requires MacOS 12.0+).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function is not exposed in the Rust metal crate. But I did bump the min required versions to macOS 13 and iOS 16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the metal crate is still taking PRs (Idk what state of deprecated they are in) it would probably be a good idea to add this (and the other later ones).
f3830cb
to
234e75b
Compare
} | ||
|
||
unsafe fn destroy_acceleration_structure( | ||
&self, | ||
_acceleration_structure: super::AccelerationStructure, | ||
) { | ||
unimplemented!() | ||
// self.counters.acceleration_structures.sub(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I there a reason not to have HalCounters::acceleration_structures
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at the history I couldn't find a reason, but it's possible it's buried somewhere.
for descriptor in descriptors { | ||
let acceleration_structure_descriptor = | ||
conv::map_acceleration_structure_descriptor(descriptor.entries); | ||
/* The Rust metal crate does not expose metal::MTLAccelerationStructureUsage yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, not exposed in the Rust metal crate.
@@ -35,6 +35,7 @@ var acc_struct: acceleration_structure; | |||
|
|||
struct PushConstants { | |||
light: vec3<f32>, | |||
padding: f32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that metal always sends at least 16 bytes for push constants, even if we only pass in 12 bytes. And then the shader validation complains that the receiver here only expects 12 bytes.
5f1c464
to
2a6d9b6
Compare
Almost, had to remove the I can split those first four commits into a separate PR if that helps with the review. |
I just remembered that structures have minimum versions, and it seems |
38b3de7
to
9442511
Compare
Bumped the min required version even further up. |
I also managed to reduce the issue with the acelleration structure not intersecting any rays to a perfect reproducer and it is wild: See the last commit "Bug reproducer", which modifies the ray_cube_fragment example to generate two BLASes: One with 152 triangles and one with 153 triangles. With Metal on macOS the instances of the BLAS with 152 triangles (16344 bytes This also breaks Vulkan on Linux with a Using an example from metal-rs without wgpu does not reproduce this bug. It seems we are either lacking some validation step or are doing something wrong with our handling of acceleration structures in general. @Vecvec: What testing hardware do you have available? Can you maybe see why Vulkan is failing this too? |
9442511
to
90082ad
Compare
I've got a couple of raytracing supported machines (plus llvmpipe which I will also be testing on). I'll have a look and see if I can get any ideas of what the issue might be. |
Hits a divide by zero on Microsoft Basic Render Driver (though it doesn't seem to be related to the memory used, and only on one of my comuters). Can't get it to fail on the real gpus yet. Was able to reproduce the llvmpipe seg fault (edit: Don't think it's the same problem as the one here), will continue testing. |
Might be that it tries to normalize a zero-length vector. The modified example does simply duplicate triangles so that could cause some vectors to become zero. I narrowed the Metal issue down further and it is indeed caused by Edit: Officially the limits are way higher, see https://developer.apple.com/documentation/metal/mtlaccelerationstructureusage/extendedlimits. |
Most other resources are created with an auto release pool around them, is it possible that that is fixing this issue somehow? |
90082ad
to
982103f
Compare
Added one in I would say we try to land this PR and then open an issue for it to solve that separately. BTW, I noticed the CI runner "Test Mac aarch64" job is not failing. Probably the test runner is too old to support hardware raytracing and skips the relevant tests. |
That's annoying, I wonder what it could be
Yes, though it could be some time before it lands.
I checked and it does skip. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excluding the things that aren't exposed by the metal crate this looks good to me.
.flags | ||
.contains(wgt::AccelerationStructureGeometryFlags::OPAQUE), | ||
); | ||
// wgt::AccelerationStructureGeometryFlags::NO_DUPLICATE_ANY_HIT_INVOCATION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this should set allowDuplicateIntersectionFunctionInvocation
if NO_DUPLICATE_ANY_HIT_INVOCATION
is not set but metal-rs doesn't support this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it to: gfx-rs/metal-rs#361
28e71af
to
5d1c263
Compare
5d1c263
to
d8ad785
Compare
Although setting I think that it might be best if the program just claims to metal that we are using every |
Instead of calling it inside
Maybe you have every model in many LOD levels (to avoid high frequency noise in the distance) or do asset streaming? No idea either, hardware ray tracing is still somewhat new and I haven't seen that much code around it yet. |
I think there would still be issues where you encode the build after a use of the
Yes, I hadn't thought of that. How expensive is the |
Actually I've found something called a |
Interesting.
That is essentially where we are right now with the dependency tracking. We add all indirectly used BLASes to the command buffer via
|
I think I can simplify your counter example further: Imagine we build the same TLAS in two different command buffers, but we never submit (thus discard) the second. And then use that TLAS later in a render pass. Now, the actions in the second build of that TLAS should have no effect. This might already be wrong in other aspects unrelated to this PR, like the validation layer and how it sees the dependencies. |
Yep, this has been pain for me. I've previously reworked lots of the validation due to this problem. It's possible there is more, but I've been working on fixing this.
Except |
About I think I found a bug in the Metal driver. Acceleration structures don't work in headless mode. That is, if I attach a window to the test process (does not even have to have its surface linked to wgpu, nor does the window have to be presented / visible in the compositor), the tests suddenly succeed! |
That's an odd driver bug, I wonder how it's caused... On another, completely unrelated, note, I've been looking at the possible ways to make the
Fwiw I've never used Metal so I'm guessing based on docs alone and have probably missed some cool trick that all other impl.s use. |
Connections
Fixes: #7402
Description
Implements the missing ray tracing acceleration structures in the HAL metal backend.
Testing
The examples ray_scene, ray_shadows, ray_cube_compute, ray_cube_fragment and ray_traced_triangle all work.
That is if invoked via
cargo run --bin wgpu-examples ray_traced_triangle
, but not viacargo xtask test ray_traced_triangle
, still current CI runner is too old to catch that as it does not support hardware ray tracing.Squash or Rebase?
Squash
Checklist
cargo fmt
.taplo format
.cargo clippy --tests
cargo xtask test
to run tests.CHANGELOG.md
entry.