Move fence synchronisation into wgpu-hal#9475
Conversation
|
After rebasing, the code that was causing the deadlock appears to have been refactored, so ignore the comments about a deadlock. Edit: was #9307 |
| // Lock ordering requires that the fence lock be acquired after the snatch lock and | ||
| // before the command index lock. | ||
| let fence = self.device.fence.write(); | ||
| let fence = self.device.fence.read(); |
There was a problem hiding this comment.
I don't particularly like that things are set up this way (or at least, I think it should be documented more clearly what is going on), but I believe that the fence RwLock is being used not just to protect &mut self methods on the fence, but also to ensure mutual exclusion between submit and other things that don't want a submit to happen concurrently. Which will no longer be the case if submit only acquires a read lock.
I also don't particularly like that there are separate locks for the fence and command indices (I feel like the Fence could also have responsibility for giving out command indices), nor do I like that the protection against concurrent submits (see https://github.com/gfx-rs/wgpu/pull/9307/changes#diff-150156a37cf3627465ceb22096ed995ee26ae640007c421e726134bafd499dbeR1679-R1683) is more pessimistic than necessary (the validate_command_buffers processing probably could be done concurrently). But looking for solutions that don't bite off too much refactoring, one strategy might be to switch to using the command indices lock rather than the fence lock to provide mutual exclusion with concurrent submits (and document this, since it's non-obvious). If we do that, then I think we could get rid of the fence lock in wgpu_core entirely and rely on the locking in hal.
There was a problem hiding this comment.
I believe that the fence
RwLockis being used not just to protect&mut selfmethods on the fence, but also to ensure mutual exclusion betweensubmitand other things that don't want a submit to happen concurrently. Which will no longer be the case ifsubmitonly acquires a read lock.
I had assumed (at least for submission) that command indices was held for that purpose. The only thing which appeared to use fences for exclusion was present, which should still exclude due to using a write.
There was a problem hiding this comment.
The ones I'm worried about are:
wgpu/wgpu-core/src/device/resource.rs
Lines 5223 to 5225 in fa37706
Lines 168 to 176 in fa37706
There was a problem hiding this comment.
Do these require no queue work to be submitted? to me it seems that these shouldn't require queue locking -aquire uses it to wait for it's previous usage to be finished and configure talks about guarding against queue submissions:
wgpu/wgpu-core/src/device/resource.rs
Lines 5235 to 5236 in fa37706
I think that this probably requires an issue for going through and documenting or switching to use a RwLock<()> for exclusion.
|
The other way for Vulkan & OpenGl that I can see would probably be having a Fence wrapper that destroyed (or reset on Vulkan) the fence when dropped. It couldn't be clone as I think it would have to be unsafe, but it would remove the need for the lock. However, for Vulkan at least, the free list would have to have an Edit: If unsafe was fine the lock could be forgotten and then force unlocked. That wouldn't require the Arc at least. |
Connections
Fixes #9470
Description
Previously, queue submit wrote to the fence in wgpu-core. However, device.poll read from the same fence, meaning that
queue.submitcould not execute whiledevice.pollwas running. This could be a problem in a multi-threaded environment with long execution times of command buffer. This PR moves the locks into wgpu-hal (where needed) so that they can be dropped when waiting. Also fixes a deadlock exposed because the fence writeOne thing that I don't like is the requirement for
Arc<RwLock<Fence>>s inside aRwLock<Vec<...>>s for vulkan and openGL, but I could not find any way to remove this. The innerRwLocks are to stop the backend from resetting fences that are being waited on (the wait here should be minimal because the fences are already determined to be signalled) and the Arcs are to allow them to be kept while outside of the main lock (such as inwait).The lock on the fence in core was not removed. It seems that present uses it to exclude other queue operations. @inner-daemons, is the only operation this needs to exclude
queue.submit? I'm happy to complete that in this PR, or wait it until another PR.Testing

Existing tests run the same as previously. Solves the issue.
Squash or Rebase?
Squash
Checklist
wgpumay be affected behaviorally. (At least, in terms of speed)CHANGELOG.mdentries for the user-facing effects of this change are present.