Debug: implement call injection to invoke debug event handlers at signal-based traps. #11930
Conversation
…roperly in Pulley.

This is a followup to bytecodealliance#11895, where I had disabled a test that failed to emit a debug event for a hostcall-generated trap on a divide-by-zero in Pulley. This PR allows that test to pass and brings Pulley back to parity with native Cranelift in current debug support. This was a bit of a "start to pull the thread and the entire finished mechanism materializes" PR; happy to consider ways to split it up if needed. In short, disabling signal-based traps on a Pulley configuration still relies on Pulley opcodes (e.g., divide) actually trapping, in a way that looks more like a "native ISA trap", so I had to start to build out the actual trap-handling mechanisms. In any case, this will all be needed soon for followup work that will handle traps on native platforms (redirecting from signals by injecting calls), so it is not a distraction.

This PR includes, ranked in decreasing order of "may scare other Wasmtime maintainers" score:

- A raw `NonNull<dyn VMStore>` in the `CallThreadState`, with a long comment about provenance and mut-borrow exclusivity. This is needed right now to allow the interpreter to invoke the debug event handler, but will soon also be needed when injecting hostcalls on signals, because a signal context has no state available from the Wasm code other than what is in TLS. Hence, we need a way to get the store back from the Wasm when we do something that is "morally a hostcall" at a trapping instruction. I do believe this is sound, or at least close to it if not (please scrutinize carefully!); the basic idea is that the Wasm acts as an opaque blob in the middle, and the pointer comes out of it one way or another: the normal way, as the first arg to a hostcall, or the weird way, via TLS and the `CallThreadState` during a trap. Exclusive ownership is still clear at any given point, and only one `&mut` ever exists in the current frame at a time. That said, I haven't tested with miri yet. This does require careful thought about the Wasm compilation, too; we need the moral equivalent of a `&mut self` reborrow as-if we were making a hostcall at each trapping instruction. It turns out that we already treat them as memory-fence instructions, so nothing loaded from the store can be moved or cached across them, and I've added a comment about how this is load-bearing.
- Updates to `CallThreadState`'s "exit state", normally set by the exit trampoline, that we now also set when we invoke a debug event handler during a trap context[^1] so that `Store::debug_frames` properly sees the current activation. This is a little more awkward than it could be because we store the *trampoline* FP, not the last Wasm FP, and there is no trampoline frame in this case, so I've added a flag and some conditionals. I'm happy to refactor instead to go (back) to storing the last Wasm FP, with the extra load in the exit trampoline to compute it.
- A whole bunch of plumbing, creating a large but mechanical diff, in the code translator to actually add debug tags on all traps and calls to `raise`. It turns out that once I got all of the above working in Pulley, the test disagreed about the current Wasm PC between native and Pulley, and Pulley was right; native was getting it wrong because the `raise` libcall was sunk to the bottom in a cold block and, without tags, we scanned backward to pick up the last Wasm PC in the function. This new plumbing and addition of tags in all the appropriate places fixes that.

[^1]: I keep saying "during a trap context" here, but to avoid any signal-safety scares, note that when this is done for native signals in a followup PR, we will inject a hostcall by modifying stack/register state and returning from the actual signal context, so it really is as-if we did a hostcall from a trapping instruction.
…nal-based traps. This repurposes the code from bytecodealliance#11826 to "inject calls": when in a signal handler, we can update the register state to redirect execution upon signal-handler return to a special hand-written trampoline, and this trampoline can save all registers and enter the host, just as if a hostcall had occurred.
I'll note for brainstorming purposes that the current problem in front of me is how to rework macOS' Mach ports-based signal handler to work with this. To recap a bit what the requirements on each platform are, and how this "call injection" works:
The basic need is to inject enough state into the register context, along with redirecting the PC, that a stub can take control, find the Store state, invoke any debug event handler, then restore all context and return to the guest if it's a resumable trap (which this PR doesn't have, but which we will have in a few more PRs for breakpoints). One can see how this is a little tricky.

The approach I've taken, which is at least Windows- and Linux-compatible, is to update only registers, not the stack (because Windows); inject args into the registers; and save off the original register values and PC to TLS. macOS inverts most of the "can do" and "can't do" bits: we can push to the stack (unlike Windows) but we can't read TLS, so we have nowhere to save the state we clobber when redirecting other than to push it to the stack. So probably the best we can do is push the original register values to the guest stack ourselves from the exception-handler thread. Of course this means we need a slightly different stub for macOS (for both x86-64 and aarch64); and we'll need a slightly different stub for Windows/x86-64 too, because of fastcall when we call the host code.

One more thing about the riscv64 stub: it saves all of the V-extension state, because vector registers are separate from float registers, but unlike on our other three architectures, we don't unconditionally assume that vector registers are present. So technically, to run with V disabled and debugging enabled, if we care about that, we need an alternate riscv64 stub that elides that bit. Note that we have to care because we must save everything, not just the ABI callee-saves, since we're "interrupting" with no regalloc cooperation.

All of this to say: I am starting to think that the efficiency advantage of "trap-based implicit hostcalls", with all that entails (breakpoints that are just break instructions we can patch in), may not be worth the complexity and maintenance burden.
The alternative is to go with hostcall-based traps universally. (We still do need the wonky raw `NonNull<dyn VMStore>` either way.) Partly that would make me sad, but on the other hand, it would make me quite happy too: it would mean that we are one PR away from breakpoints if we go with the bitmask scheme, or two if we still patch in a call (self-modifying code, but not trapping). I'm happy to go either way, and these stubs were quite fun to write, but with my "not impossible to maintain" hat on, I think I know the better answer... (cc @alexcrichton and @fitzgen for thoughts)
Quick napkin math on efficiency if we abandon call injection on signals:
The upshot of all that is that it's much more portable and easier to reason about, and the latter at least is in short supply otherwise with everything else we're adding for debugging. One could see this as "hostcalls everywhere" as in debug RFC v1, except with SMC to avoid overhead until patched in.
Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "cranelift", "pulley", "wasmtime:api". Thus the above users have been cc'd because of those labels. To subscribe or unsubscribe from a label, edit the
Closing as this is pushed to "post-MVP debugging" due to all the above complexities; will keep the branch around for mining for the good bits later as needed.
(Stacked on top of #11921)
This repurposes the code from #11826 to "inject calls": when in a signal
handler, we can update the register state to redirect execution upon
signal-handler return to a special hand-written trampoline, and this
trampoline can save all registers and enter the host, just as if a
hostcall had occurred.
As before, this is Linux-only in its current draft; I still need to add macOS and Windows support. Putting this up to show how a few loose ends in #11921 get used.