Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFICallbackTrampoline has unsafe handling of SP/CSP #60285

Open
eseidel opened this issue Mar 8, 2025 · 8 comments
Open

FFICallbackTrampoline has unsafe handling of SP/CSP #60285

eseidel opened this issue Mar 8, 2025 · 8 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi

Comments

@eseidel
Copy link
Contributor

eseidel commented Mar 8, 2025

// Clobbers all volatile registers, including the callback ID in R9.

This BLR calls out int the (generated) _FFICallbackFoo... which sets up Dart SP, marshalls arguments from C, etc.

But then at the end, it unsets SP, reverting back to CSP being controlling and SP just being a normal temp.

Example:

      "name": "[Optimized] _FfiCallbackthrowExceptionDouble",
      "index_in_entries": 3010,
      "offset": 851104,
      "size": 396,
      "disassembly": [
        "mov sp, csp",
        "sub csp, csp, #0x1000",
        "stp fp, lr, [sp, #-16]!",
        "mov fp, sp",
        "stp fp, lr, [sp, #-16]!",
        "mov fp, sp",
...
        "mov sp, fp",
        "ldp fp, lr, [sp], #16 !",
        "mov sp, fp",
        "ldp fp, lr, [sp], #16 !",
        "mov csp, sp",
        "ret",
        "brk #0x0"
      ],

However at the very next line after returning from _FfiCallbackFoo , this code (FfiCallbackTrampolineStub) calls into EnterFullSafepoint, which, (at least in the --use-slow-path) case then calls back into the VM via EnterSafepointStub, after having assumed that SP was set up:

void Assembler::EnterFullSafepoint(Register state) {

https://github.com/dart-lang/sdk/blob/main/runtime/vm/compiler/stub_code_compiler_arm64.cc#L339

My guess is that this code is unsafe in the --use-slow-path case, particularly against signals (e.g. the profiler). Since it seems to assume that SP is set up (by dumping registers to it), but doesn't bump CSP accordingly, and then calls into the VM I'm not entirely sure how the --use-slow-path --stacktrace-every=100 tests are passing?

My recommendation would be that (somewhat unrelatedly) FFICallbackTrampoline should move off of using SP directly and instead stick with CSP (since it's just a normal C function anyway) and then explicitly set up/tear down SP around the call to EnterFullSafepoint. I think the only reason that FFICallbackTrampoline is using SP at all is just because that's what Assembler has easy methods for. If those methods were either forked for CSP handling (or could taken extra argument to control SP/CSP) a bunch of FFICallbackTrampoline would be cleaned up (and maybe this bug would be more obvious).

@eseidel
Copy link
Contributor Author

eseidel commented Mar 8, 2025

We ran into this when we recently refactored our system to instead of running FFICallbackTrampolineStub in the simulator, instead run it on the CPU (so it could do the temporary isolate creation) before transitioning to the simulator to run _FFICallbackFoo (per callback stubs generated by the frontend afaict). But the only things we copy into the simulator are THR (for the case of a temporary isolate), the argument registers, and the top frame from the CSP. The only things we copy out are the argument/return registers. This was then crashing for us in EnterSafePointStub in the --use-slow-path case, since it seemed to be assuming that SP was set up (in our case it was 0 since we hadn't copied it out of the simulator since the simulator has a different stack and typically subroutines don't depend on temp variables from other subroutines as this one is).

@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi labels Mar 8, 2025
@dcharkes
Copy link
Contributor

My recommendation would be that (somewhat unrelatedly) FFICallbackTrampoline should move off of using SP directly and instead stick with CSP (since it's just a normal C function anyway) and then explicitly set up/tear down SP around the call to EnterFullSafepoint.

Conceptually this sounds reasonable. 👍

cc @liamappelbe who last touched the trampolines.

We ran into this when we recently refactored our system to instead of running FFICallbackTrampolineStub in the simulator, instead run it on the CPU (so it could do the temporary isolate creation)

@rmacnak-google All the FFI callback tests pass on simarm64_arm64 that should include the NativeCallable.listener tests. Do you know why didn't we run into the same issue?

@mraleph
Copy link
Member

mraleph commented Mar 10, 2025

I think the only case where this type of code fails is if we go slow case and before we update CSP we hit a signal and the distance between SP value and (stale) CSP is larger than 16 words (which is documented size of the redzone).

So you need to hit the signal precisely here:

  __ EnterFrame(0);
  __ PushRegisters(all_registers);
// vvv
  __ mov(CALLEE_SAVED_TEMP, CSP);
  __ mov(CALLEE_SAVED_TEMP2, SP);
  __ ReserveAlignedFrameSpace(0); 
// ^^^ ...
  __ mov(CSP, SP);

Which I guess almost never happens?

@eseidel
Copy link
Contributor Author

eseidel commented Mar 10, 2025

Wouldn't one be in danger the moment the EnterFrame(0) happens? That's when SP has then moved further than CSP since at the start of this function SP == CSP (at least when called from the FFICallbackTrampoline).

[edit] nm, I guess the "redzone" implies that those 16 bytes are safe, but rather pushing all the registers is what actually gets us in trouble?

@rmacnak-google
Copy link
Contributor

All the FFI callback tests pass on simarm64_arm64 that should include the NativeCallable.listener tests. Do you know why didn't we run into the same issue?

My implementation uses a different stub that does not run under the simulator.

16 words (which is documented size of the redzone).

Where is this documented? The Procedure Call Standard for ARM 64-bit Architecture gives the stack constraint

A process may only access (for reading or writing) the closed interval of the entire stack delimited by
[SP, stack-base – 1].

which I read as specifying there is no redzone.

@mraleph
Copy link
Member

mraleph commented Mar 10, 2025

Where is this documented? The Procedure Call Standard for ARM 64-bit Architecture gives the stack constraint

https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Respect-the-stacks-red-zone

Conflicting documentation is the best form of documentation :)

(Yeah, its only Apple - I should have said that. I have not checked Android or Linux ARM64 or Windows ARM64)

@eseidel
Copy link
Contributor Author

eseidel commented Mar 10, 2025

FWIW https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst is what I use for the abi standard (I welcome links to better docs!)

@rmacnak-google
Copy link
Contributor

signals (e.g. the profiler)

The Dart profiler on Mac/iOS doesn't use SIGPROF anymore (lldb is unreasonably slow handling signals), so the signal would need to be something caused by the embedder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi
Projects
None yet
Development

No branches or pull requests

5 participants