Attempt to fix a flaky coroutine-dump-verifying test #4589
Fixes #4418
(unless it keeps happening)
This problem couldn't be reproduced locally, so this fix is purely analytical.
The problematic test attempts to launch a coroutine and then wait until the coroutine suspends.
The way it was doing that before the change is:
- `wait` on the test body side;
- `notify` on the coroutine side right before the suspension point;
- waiting for the coroutine's thread to reach the `TIMED_WAIT` state, indicating that its scheduler worker has finished its piece of work and now waits for new commands, which must mean the suspension point was reached.

The problem is that thread states are not synchronization primitives, and no happens-before is established between the code a thread executes before the state change and the code right after the state change is observed.
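To make the old scheme concrete, here is a minimal JVM-level sketch in plain Java rather than the actual test code. `ThreadStatePolling`, `progress`, and `awaitTimedWaiting` are hypothetical names, and `Thread.sleep` only stands in for an idle scheduler worker. On the JVM the state in question is reported as `Thread.State.TIMED_WAITING`.

```java
class ThreadStatePolling {
    // Plain (non-volatile) field written by the worker before it goes idle.
    static int progress = 0;

    // Polls the worker's state until it reports TIMED_WAITING or the timeout expires.
    // Returns true if the state was observed in time.
    static boolean awaitTimedWaiting(Thread worker, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (worker.getState() == Thread.State.TIMED_WAITING) {
                return true;
            }
            Thread.onSpinWait();
        }
        return false;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            progress = 1; // stands in for the work done before the suspension point
            try {
                Thread.sleep(10_000); // stands in for an idle worker awaiting new tasks
            } catch (InterruptedException ignored) {
                // interrupted by main once the observation is done
            }
        });
        worker.setDaemon(true);
        worker.start();

        boolean idle = awaitTimedWaiting(worker, 5_000);
        // Observing the state is not a synchronization action: the Java memory
        // model does not order the write to `progress` before this read.
        System.out.println("idle=" + idle + ", progress=" + progress);
        worker.interrupt();
    }
}
```

In practice the read will almost always see the write, which is consistent with a failure that shows up only occasionally and not on a local machine.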
With this change, we establish a complete happens-before chain:
- … `resume`d as a coroutine.
- `complete` on a latch happens-before the `resume`.
- Suspension happens-before the `complete`, as suspension and the `complete` are done in the same thread.

With no way to verify the fix, it's unclear if that was the problem, so we can only hope the change helps.
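The kind of happens-before chain described above can be illustrated at the JVM level with a `java.util.concurrent.CountDownLatch`, whose `countDown()` happens-before a successful return from `await()`. This is only a sketch under assumptions: `LatchHandoff` and `runAndObserve` are hypothetical names, and the latch stands in for whatever latch the actual change uses.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchHandoff {
    // Starts a worker, waits on the latch, and returns what the waiting side observes.
    static int runAndObserve() {
        CountDownLatch reached = new CountDownLatch(1);
        int[] progress = {0}; // plain memory, published only through the latch

        Thread worker = new Thread(() -> {
            progress[0] = 1;     // program order: this write precedes countDown()
            reached.countDown(); // happens-before the await() below returning
        });
        worker.start();

        try {
            if (!reached.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not signal in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The chain write -> countDown() -> await() guarantees the write is visible here.
        return progress[0];
    }

    public static void main(String[] args) {
        System.out.println(runAndObserve()); // prints 1
    }
}
```

Unlike polling a thread's state, each link here is either program order within one thread or a documented synchronization edge, so the observation does not depend on timing.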