Summary
JIT-compiled code execution on ARM64 macOS (Apple Silicon) fails non-deterministically, with approximately 44% failure rate in multi-threaded scenarios. The failures manifest as SIGBUS, incorrect results, or silent corruption.
Environment
- OS: macOS 14.x (Sonoma) or later
- Architecture: ARM64 (Apple Silicon - M1/M2/M3)
- Crate: `cranelift-jit`
- Rust version: 1.75+ (any recent stable)
Observed Behavior
In a real-world JIT compiler (Rayzor - a Haxe-to-native compiler using cranelift-jit), we observe:
- Without MAP_JIT fix: ~56% success rate (50 runs)
- With MAP_JIT fix: 100% success rate (50+ consecutive runs)
The failures manifest as:
- SIGBUS (Bus error: 10)
- Incorrect computation results
- Segmentation fault
- Silent wrong values
Note on Minimal Reproduction
A simple test case may not reliably reproduce the issue because:
- The failure is non-deterministic and depends on timing, memory layout, and CPU scheduling
- Simple tests may not trigger the problematic code paths
- The issue is more likely with:
- Complex JIT-compiled functions with multiple blocks
- Multiple functions compiled together
- Closures and captured variables passed to threads
- Runtime library functions called from JIT code
Complex Reproduction (Rayzor Compiler)
The issue was observed and fixed in the Rayzor compiler, a Haxe-to-native compiler using cranelift-jit. The compiler:
- Compiles 50+ runtime functions using cranelift-jit
- Spawns threads that execute closures containing JIT function calls
- Uses channels and mutexes with JIT-compiled callback functions
E2E Test Case: compiler/examples/test_rayzor_stdlib_e2e.rs
Commits for testing:
- Before fix (unstable): `0eb9472` - Uses upstream cranelift without MAP_JIT
- After fix (stable): `9a0e80e` - Uses fork with MAP_JIT + pthread_jit_write_protect_np
```bash
# Test BEFORE fix (~56% success rate)
git clone https://github.com/darmie/rayzor
cd rayzor
git checkout 0eb9472
cargo build --release --package compiler --example test_rayzor_stdlib_e2e

# Run stability test
passed=0; failed=0
for i in {1..50}; do
  if timeout 120 ./target/release/examples/test_rayzor_stdlib_e2e 2>&1 | grep -q "All tests passed"; then
    passed=$((passed+1))
  else
    echo "Run $i: FAILED"
    failed=$((failed+1))
  fi
done
echo "Before fix - Passed: $passed/50, Failed: $failed/50"

# Test AFTER fix (100% success rate)
git checkout 9a0e80e
cargo build --release --package compiler --example test_rayzor_stdlib_e2e

passed=0; failed=0
for i in {1..50}; do
  if timeout 120 ./target/release/examples/test_rayzor_stdlib_e2e 2>&1 | grep -q "All tests passed"; then
    passed=$((passed+1))
  else
    echo "Run $i: FAILED"
    failed=$((failed+1))
  fi
done
echo "After fix - Passed: $passed/50, Failed: $failed/50"
```

Results:
| Commit | Configuration | Success Rate |
|---|---|---|
| `0eb9472` | Upstream cranelift (no MAP_JIT) | ~56% (28/50) |
| `9a0e80e` | darmie/wasmtime fix-plt-aarch64 | 100% (50/50) |
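
For anyone who prefers to drive the stability run from Rust rather than the shell, the loop above can be sketched roughly as follows. The function name `count_passes` is hypothetical, and unlike the shell version this sketch applies no 120-second timeout; it only counts runs whose combined output contains the marker string.

```rust
use std::process::Command;

/// Runs `program` with `args` `runs` times and returns how many runs
/// printed `marker` on stdout or stderr.
/// A spawn failure or a crash without the marker counts as a failed run.
fn count_passes(program: &str, args: &[&str], runs: u32, marker: &str) -> u32 {
    (0..runs)
        .filter(|_| {
            Command::new(program)
                .args(args)
                .output()
                .map(|out| {
                    String::from_utf8_lossy(&out.stdout).contains(marker)
                        || String::from_utf8_lossy(&out.stderr).contains(marker)
                })
                .unwrap_or(false)
        })
        .count() as u32
}
```

A call such as `count_passes("./target/release/examples/test_rayzor_stdlib_e2e", &[], 50, "All tests passed")` would then play the role of one 50-run loop.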
Simple Test Case (May Not Reliably Fail)
For reference, here's a minimal test that exercises the same code paths:
```rust
use cranelift::prelude::*;
use cranelift_jit::{JITBuilder, JITModule};
use cranelift_module::{Linkage, Module, FuncId};
use std::thread;

// Defines a two-argument i64 function performing `op` ("add", "sub" or "mul").
fn define_function(module: &mut JITModule, name: &str, op: &str) -> FuncId {
    let mut sig = module.make_signature();
    sig.params.push(AbiParam::new(types::I64));
    sig.params.push(AbiParam::new(types::I64));
    sig.returns.push(AbiParam::new(types::I64));

    let func_id = module.declare_function(name, Linkage::Export, &sig).unwrap();
    let mut ctx = module.make_context();
    ctx.func.signature = sig;

    let mut builder_ctx = FunctionBuilderContext::new();
    {
        let mut builder = FunctionBuilder::new(&mut ctx.func, &mut builder_ctx);
        let block = builder.create_block();
        builder.append_block_params_for_function_params(block);
        builder.switch_to_block(block);
        builder.seal_block(block);

        let a = builder.block_params(block)[0];
        let b = builder.block_params(block)[1];
        let result = match op {
            "add" => builder.ins().iadd(a, b),
            "sub" => builder.ins().isub(a, b),
            "mul" => builder.ins().imul(a, b),
            _ => builder.ins().iadd(a, b),
        };
        builder.ins().return_(&[result]);
        builder.finalize();
    }

    module.define_function(func_id, &mut ctx).unwrap();
    module.clear_context(&mut ctx);
    func_id
}

fn main() {
    let mut flag_builder = settings::builder();
    flag_builder.set("use_colocated_libcalls", "false").unwrap();
    flag_builder.set("is_pic", "false").unwrap();
    let isa_builder = cranelift_native::builder().unwrap();
    let isa = isa_builder.finish(settings::Flags::new(flag_builder)).unwrap();

    let builder = JITBuilder::with_isa(isa, cranelift_module::default_libcall_names());
    let mut module = JITModule::new(builder);

    let add_id = define_function(&mut module, "add", "add");
    let sub_id = define_function(&mut module, "sub", "sub");
    let mul_id = define_function(&mut module, "mul", "mul");
    module.finalize_definitions().unwrap();

    // Turn the finalized code pointers into callable function pointers.
    let add_fn: fn(i64, i64) -> i64 = unsafe {
        std::mem::transmute(module.get_finalized_function(add_id))
    };
    let sub_fn: fn(i64, i64) -> i64 = unsafe {
        std::mem::transmute(module.get_finalized_function(sub_id))
    };
    let mul_fn: fn(i64, i64) -> i64 = unsafe {
        std::mem::transmute(module.get_finalized_function(mul_id))
    };

    // Call the JIT-compiled functions from 20 threads and check every result.
    let handles: Vec<_> = (0..20).map(|thread_id| {
        thread::spawn(move || {
            for i in 0..5000 {
                let a = (thread_id * 1000 + i) as i64;
                let b = (i * 7) as i64;
                assert_eq!(add_fn(a, b), a + b, "add failed");
                assert_eq!(sub_fn(a, b), a - b, "sub failed");
                assert_eq!(mul_fn(a, b), a * b, "mul failed");
            }
        })
    }).collect();

    for handle in handles {
        handle.join().unwrap();
    }
    println!("All tests passed!");
}
```

Cargo.toml:
```toml
[package]
name = "jit_repro"
version = "0.1.0"
edition = "2021"

[dependencies]
cranelift = { version = "0.125", features = ["jit", "module", "native"] }
cranelift-jit = "0.125"
cranelift-module = "0.125"
cranelift-codegen = "0.125"
cranelift-frontend = "0.125"
cranelift-native = "0.125"
```

Root Cause Analysis
Two issues combine to cause this:
1. Missing MAP_JIT flag
cranelift-jit allocates executable memory using the standard allocator (alloc::alloc), which doesn't set the MAP_JIT flag. On Apple Silicon, memory intended for JIT execution must be allocated with:
```c
mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
```

Without MAP_JIT (0x0800), the kernel cannot properly track the memory for W^X enforcement.
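
As a rough illustration of the allocation described here, the sketch below makes the same `mmap` call from Rust without any external crate. Everything in it is an assumption for illustration rather than the code from the fork: the helper names `alloc_jit_region` and `free_jit_region` are hypothetical, the flag values are hand-copied for 64-bit macOS and Linux, `MAP_JIT` is defined as 0 on non-macOS targets so the sketch still builds there, and the `unsafe extern` syntax needs Rust 1.82 or newer (on older toolchains, drop the leading `unsafe`).

```rust
use std::os::raw::{c_int, c_void};

// Hand-written declarations; std already links the C library on Unix.
// Assumes a 64-bit target where off_t is 64 bits wide.
unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, offset: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

const PROT_READ: c_int = 0x1;
const PROT_WRITE: c_int = 0x2;
const MAP_PRIVATE: c_int = 0x2;

#[cfg(target_os = "macos")]
const MAP_ANON: c_int = 0x1000;
#[cfg(not(target_os = "macos"))]
const MAP_ANON: c_int = 0x20; // Linux value

#[cfg(target_os = "macos")]
const MAP_JIT: c_int = 0x0800;
#[cfg(not(target_os = "macos"))]
const MAP_JIT: c_int = 0; // flag does not exist here; keeps the sketch portable

/// Maps `len` bytes of anonymous read/write memory, adding MAP_JIT on macOS.
/// Returns `None` if the kernel refuses the mapping.
fn alloc_jit_region(len: usize) -> Option<*mut u8> {
    let ptr = unsafe {
        mmap(
            std::ptr::null_mut(),
            len,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANON | MAP_JIT,
            -1,
            0,
        )
    };
    // mmap signals failure with MAP_FAILED, i.e. (void *)-1.
    if ptr as usize == usize::MAX { None } else { Some(ptr as *mut u8) }
}

/// Releases a region previously returned by `alloc_jit_region`.
fn free_jit_region(ptr: *mut u8, len: usize) {
    unsafe { munmap(ptr as *mut c_void, len) };
}
```

The sketch covers only the allocation step; making the region executable and switching the per-thread mode are separate steps.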
2. Missing W^X mode switch for spawned threads
Apple Silicon enforces W^X (Write XOR Execute) at the hardware level. Each thread has an independent write/execute mode:
- `pthread_jit_write_protect_np(0)` = write mode (can write JIT memory, cannot execute)
- `pthread_jit_write_protect_np(1)` = execute mode (can execute JIT code, cannot write)
Threads inherit write mode by default. The current implementation doesn't switch spawned threads to execute mode before calling JIT code, causing crashes.
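
One way an embedder could work around this today is to switch each spawned thread to execute mode before it touches JIT code. The sketch below assumes exactly that; `enter_execute_mode` and `spawn_jit_thread` are hypothetical helper names, not part of cranelift-jit, and on targets other than ARM64 macOS the switch is a no-op. The `unsafe extern` syntax needs Rust 1.82 or newer (on older toolchains, drop the leading `unsafe`).

```rust
use std::thread::{self, JoinHandle};

#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
unsafe extern "C" {
    // Provided by the macOS system library; affects only the calling thread.
    fn pthread_jit_write_protect_np(enabled: std::os::raw::c_int);
}

/// Puts the calling thread into execute mode for MAP_JIT memory.
#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
fn enter_execute_mode() {
    unsafe { pthread_jit_write_protect_np(1) };
}

/// No-op on platforms without per-thread JIT write protection.
#[cfg(not(all(target_os = "macos", target_arch = "aarch64")))]
fn enter_execute_mode() {}

/// Spawns a thread that switches to execute mode before running `f`,
/// so `f` may call JIT-compiled functions.
fn spawn_jit_thread<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        enter_execute_mode();
        f()
    })
}
```

In the simple test case, replacing `thread::spawn` with `spawn_jit_thread` would apply the switch to all 20 worker threads.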
Proposed Solution
- Use `mmap` with `MAP_JIT` for memory allocation on ARM64 macOS instead of the standard allocator
- Call `pthread_jit_write_protect_np(1)` after making memory executable to switch to execute mode
- Add memory barriers (DSB SY + ISB SY) for proper icache coherency on Apple Silicon's heterogeneous cores
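
The barrier step could look roughly like the sketch below. The function name `jit_publish_barrier` is hypothetical. On AArch64 it issues the two instructions named above through inline assembly; on other architectures it falls back to a sequentially consistent fence purely so the sketch compiles everywhere. It does not invalidate the instruction cache, which is a separate step, and ISB only affects the core that executes it.

```rust
/// Issues DSB SY followed by ISB SY on AArch64.
#[cfg(target_arch = "aarch64")]
fn jit_publish_barrier() {
    // SAFETY: both instructions only order memory accesses and flush the
    // pipeline; they touch no registers or stack. `nomem` is deliberately
    // omitted so the compiler treats this as a memory barrier too.
    unsafe {
        core::arch::asm!("dsb sy", "isb sy", options(nostack, preserves_flags));
    }
}

/// Fallback for other architectures: a sequentially consistent fence.
#[cfg(not(target_arch = "aarch64"))]
fn jit_publish_barrier() {
    std::sync::atomic::fence(std::sync::atomic::Ordering::SeqCst);
}
```

Under these assumptions, the writer would call `jit_publish_barrier()` after copying code into the JIT region and before any thread jumps to it.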
Technical References
- Apple: Writing ARM64 Code for Apple Platforms
- Porting Just-In-Time Compilers to Apple Silicon
- Apple Silicon has independent instruction caches per core (P-cores and E-cores), requiring explicit barriers
Related Issues
- Support PLT entries in `cranelift-jit` crate on aarch64 #2735
- Cranelift: JIT assertion failure when using `ArgumentPurpose::StructArgument` on macOS (A64) #8852
- Cranelift: JIT relocations depend on system allocator behaviour #4000