Commit 09586ff

vtjnash authored and KristofferC committed
stackwalk: fix heuristic termination (#57801)
When getting stacktraces on non-X86 platforms, the first frame may not have been set up yet, which incorrectly triggers the bad-frame detection logic. This should fix the issue of async unwind failing after getting only 2 frames when the first frame happens to land in the function header. This is not normally an issue on X86 or for non-signal unwinds, but using the same logic there is not expected to cause any issues either.

Fix #52334

After (on arm64-apple-darwin24.3.0):

```
julia> f(1)
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
ERROR: StackOverflowError:
Stacktrace:
     [1] f(x::Int64)
       @ Main ./REPL[3]:1
     [2] g(x::Int64)
       @ Main ./REPL[4]:1
--- the above 2 lines are repeated 39990 more times ---
 [79983] f(x::Int64)
       @ Main ./REPL[3]:1
```

n.b. This will not fix, and is not related to, any issues where profiling gets only a single stack frame while profiling syscalls on Apple AArch64. This fix is specific to the bug where it gets exactly 2 frames.

(cherry picked from commit f82917a)
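The relaxed termination check described above can be sketched in isolation. This is a minimal illustration, assuming the frame counter `n` and the stack pointers `oldsp` and `thesp` have the meaning they have in the changed condition in `src/stackwalk.c`; the helper name `sp_looks_bad` is hypothetical and does not exist in the Julia sources, and the `jl_running_under_rr(0)` part of the real condition is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of src/stackwalk.c): reports whether the
 * stack pointer obtained after a step looks implausible.
 * n     - number of frames collected before this step
 * oldsp - stack pointer before the step
 * thesp - stack pointer after the step */
static bool sp_looks_bad(size_t n, uintptr_t oldsp, uintptr_t thesp)
{
    if (n < 2) {
        /* Early frames: an unchanged stack pointer is tolerated, since the
         * first frame may not have been set up yet. Only a stack pointer
         * that moved to a lower address is rejected. */
        return oldsp > thesp;
    }
    /* Later frames: the stack pointer must strictly increase per step. */
    return oldsp >= thesp;
}
```

With this shape, a step that leaves the stack pointer unchanged ends the walk only from the third frame onward, which matches the "exactly 2 frames" symptom the message describes.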
1 parent 2916acd commit 09586ff

1 file changed, +7 -3 lines changed
src/stackwalk.c

@@ -98,9 +98,13 @@ static int jl_unw_stepn(bt_cursor_t *cursor, jl_bt_element_t *bt_data, size_t *b
         }
         uintptr_t oldsp = thesp;
         have_more_frames = jl_unw_step(cursor, from_signal_handler, &return_ip, &thesp);
-        if (oldsp >= thesp && !jl_running_under_rr(0)) {
-            // The stack pointer is clearly bad, as it must grow downwards.
+        if ((n < 2 ? oldsp > thesp : oldsp >= thesp) && !jl_running_under_rr(0)) {
+            // The stack pointer is clearly bad, as it must grow downwards,
             // But sometimes the external unwinder doesn't check that.
+            // Except for n==0 when there is no oldsp and n==1 on all platforms but i686/x86_64.
+            // (on x86, the platform first pushes the new stack frame, then does the
+            // call, on almost all other platforms, the platform first does the call,
+            // then the user pushes the link register to the frame).
             have_more_frames = 0;
         }
         if (return_ip == 0) {
@@ -132,11 +136,11 @@ static int jl_unw_stepn(bt_cursor_t *cursor, jl_bt_element_t *bt_data, size_t *b
         // * The way that libunwind handles it in `unw_get_proc_name`:
         //   https://lists.nongnu.org/archive/html/libunwind-devel/2014-06/msg00025.html
         uintptr_t call_ip = return_ip;
+        #if defined(_CPU_ARM_)
         // ARM instruction pointer encoding uses the low bit as a flag for
         // thumb mode, which must be cleared before further use. (Note not
         // needed for ARM AArch64.) See
         // https://github.com/libunwind/libunwind/pull/131
-        #ifdef _CPU_ARM_
         call_ip &= ~(uintptr_t)0x1;
         #endif
         // Now there's two main cases to adjust for:
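
The Thumb-mode comment in the second hunk describes a one-bit mask on the instruction pointer. A minimal sketch of that operation follows; the helper name `clear_thumb_bit` is hypothetical, and unlike the real code, which is guarded by `_CPU_ARM_`, this version applies the mask unconditionally for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not part of src/stackwalk.c): clears bit 0 of an
 * instruction pointer. On 32-bit ARM that bit flags Thumb mode and is not
 * part of the address. */
static uintptr_t clear_thumb_bit(uintptr_t ip)
{
    return ip & ~(uintptr_t)0x1;
}
```

An address with the low bit already clear passes through unchanged, so the mask is safe to apply to every return address on that platform.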