
Fix musl TLS init crash with signal blocking#356

Open
jbachorik wants to merge 1 commit into main from jb/interrupted_tls

@jbachorik
Collaborator

What does this PR do?:
Fixes a crash on musl libc systems where profiling signal handlers interrupt thread-local storage (TLS) initialization, causing reentrancy into musl's non-reentrant TLS setup code.

Motivation:
On musl-based systems (Alpine Linux, embedded systems), the profiler crashes with this stack trace:

#0 Recording::recordExecutionSample
#1 FlightRecorder::recordEvent
#2 Profiler::recordSample
#3 CTimer::signalHandler         ← Signal handler runs
#4 ld-musl-x86_64.so.1           ← INSIDE musl's TLS initialization
#5 Java_...initializeContextTls0 ← JNI call accessing context_tls_v1

The issue occurs because:

  1. A thread calls initializeContextTls0(), which accesses context_tls_v1 (a thread_local variable)
  2. The first access to a thread_local variable triggers musl's TLS initialization for that thread
  3. A profiling signal (SIGPROF, SIGVTALRM, or the signal used by CTimer) arrives while musl's TLS setup is still running
  4. The signal handler tries to access the ProfiledThread TLS via pthread_getspecific()
  5. This reenters musl's non-reentrant TLS code, causing a crash or deadlock
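
The sequence above can be modeled without musl. The following is a minimal sketch, not profiler code: `g_in_init`, `on_profiling_signal`, and `init_interrupted_by_signal` are hypothetical names, SIGUSR1 stands in for the profiling signal, and `raise()` is used only to make the timing deterministic. It shows that a handler runs on the interrupted thread's own stack, in the middle of whatever setup that thread was doing.

```cpp
#include <cassert>
#include <signal.h>

// Hypothetical stand-ins for illustration; none of these names come from the PR.
static volatile sig_atomic_t g_in_init = 0;    // 1 while the simulated TLS setup runs
static volatile sig_atomic_t g_reentered = 0;  // 1 if the handler ran during that setup

// Plays the role of the profiler's signal handler (step 4).
extern "C" void on_profiling_signal(int) {
    if (g_in_init) {
        // In the real crash, this is where the handler would touch TLS
        // while musl's setup is only half done.
        g_reentered = 1;
    }
}

// Plays the role of the first thread_local access (steps 1-3).
// Returns true if the handler observed the setup in progress.
bool init_interrupted_by_signal() {
    struct sigaction sa = {};
    struct sigaction old = {};
    sa.sa_handler = on_profiling_signal;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, &old);
    assert(rc == 0);
    (void)rc;

    g_reentered = 0;
    g_in_init = 1;    // non-reentrant setup begins
    raise(SIGUSR1);   // handler runs on this thread before raise() returns
    g_in_init = 0;    // setup ends

    sigaction(SIGUSR1, &old, nullptr);  // restore the previous disposition
    return g_reentered != 0;
}
```

In production the signal arrives asynchronously from a timer, so the window is hit only occasionally, which is consistent with the crash needing a high thread creation rate to reproduce.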

Additional Notes:

  • The fix uses RAII-based signal blocking during the critical TLS initialization window
  • Consolidates CriticalSection and the new SignalBlocker into a unified guards module (guards.h/guards.cpp)
  • Uses proper ACQUIRE/RELEASE memory ordering for thread safety
  • Blocks all profiling signals: SIGPROF, SIGVTALRM, and RT signals (SIGRTMIN to SIGRTMIN+5)
  • All 128 tests pass after the fix
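
For readers unfamiliar with the pattern, here is a minimal sketch of an RAII signal-blocking guard. It is an illustration under assumptions, not the PR's SignalBlocker: the class name `ScopedSignalBlocker` and the helper `is_blocked` are hypothetical, and it assumes a Linux target where SIGRTMIN is available. It blocks the signal set listed above with `pthread_sigmask` and restores the previous mask in the destructor.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical sketch; the actual SignalBlocker in guards.h may differ.
class ScopedSignalBlocker {
public:
    ScopedSignalBlocker() {
        sigset_t block;
        sigemptyset(&block);
        sigaddset(&block, SIGPROF);
        sigaddset(&block, SIGVTALRM);
        // RT signals SIGRTMIN .. SIGRTMIN+5, as listed in the notes above.
        for (int i = 0; i <= 5; ++i) {
            int sig = SIGRTMIN + i;
            if (sig <= SIGRTMAX) {
                sigaddset(&block, sig);
            }
        }
        // Affects only the calling thread; the old mask is saved for the destructor.
        _active = pthread_sigmask(SIG_BLOCK, &block, &_old) == 0;
    }

    ~ScopedSignalBlocker() {
        if (_active) {
            // Restore exactly the mask that was in effect before construction.
            pthread_sigmask(SIG_SETMASK, &_old, nullptr);
        }
    }

    ScopedSignalBlocker(const ScopedSignalBlocker&) = delete;
    ScopedSignalBlocker& operator=(const ScopedSignalBlocker&) = delete;

private:
    sigset_t _old;
    bool _active;
};

// Helper for inspection: is `sig` blocked in the calling thread right now?
bool is_blocked(int sig) {
    sigset_t cur;
    sigemptyset(&cur);
    pthread_sigmask(SIG_SETMASK, nullptr, &cur);  // null set: query only
    return sigismember(&cur, sig) == 1;
}
```

A caller would keep the guard's scope as small as possible, wrapping only the first access to the thread_local variable, so that signals blocked in the meantime stay pending and are delivered once the mask is restored.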

How to test the change?:

  1. Automated tests: All existing tests pass (128/128), including TLS-related tests:

    • gtestDebug_context_sanity_ut - Context TLS tests
    • gtestDebug_test_tlsPriming - TLS priming tests
    • gtestDebug_stress_callTraceStorage - Uses CriticalSection
  2. Manual testing on musl: Deploy on Alpine Linux or another musl-based system and run a workload with a high thread creation rate while profiling is active to reproduce the race condition.

  3. Code review: Focus areas:

    • Thread safety: ACQUIRE/RELEASE memory ordering in guards.cpp
    • Async-signal safety: Signal blocking covers all profiling signals
    • Exception safety: RAII pattern guarantees signal mask restoration
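
As context for the thread-safety focus area, the sketch below shows one common way acquire/release ordering is used in a try-enter guard. This is a generic pattern under assumptions, not the code in guards.cpp: the class name `TryEnterGuard` and the choice of a single `std::atomic<bool>` flag are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch of a non-blocking critical-section guard.
// It never waits, which matters if it can be used from a signal handler.
class TryEnterGuard {
public:
    explicit TryEnterGuard(std::atomic<bool>& flag) : _flag(flag), _entered(false) {
        bool expected = false;
        // ACQUIRE on success: reads inside the section cannot be reordered
        // before the flag is taken.
        _entered = _flag.compare_exchange_strong(
            expected, true,
            std::memory_order_acquire,
            std::memory_order_relaxed);
    }

    ~TryEnterGuard() {
        if (_entered) {
            // RELEASE: writes made inside the section become visible to the
            // next thread that acquires the flag.
            _flag.store(false, std::memory_order_release);
        }
    }

    // True if this guard owns the section; callers must skip the work otherwise.
    bool entered() const { return _entered; }

    TryEnterGuard(const TryEnterGuard&) = delete;
    TryEnterGuard& operator=(const TryEnterGuard&) = delete;

private:
    std::atomic<bool>& _flag;
    bool _entered;
};
```

A second guard constructed while the first is alive reports `entered() == false` instead of blocking, so a signal handler that finds the section busy can drop the sample rather than deadlock.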

For Datadog employees:

  • This PR doesn't touch any of that.
  • JIRA: PROF-13683

🤖 Generated with Claude Code

@jbachorik jbachorik added the AI label Feb 5, 2026

dd-octo-sts bot commented Feb 5, 2026

CI Test Results

Run: #21720936948 | Commit: 15cbf27 | Duration: 0s (longest job)

All 0 test jobs passed

Summary

| Metric | Value |
| --- | --- |
| Total jobs | 0 |
| Passed | 0 |
| Failed | 0 |

Updated: 2026-02-05 17:19:56 UTC


dd-octo-sts bot commented Feb 5, 2026

Scan-Build Report

User: runner@runnervmkj6or
Working Directory: /home/runner/work/java-profiler/java-profiler/ddprof-lib/src/test/make
Command Line: make -j4 all
Clang Version: Ubuntu clang version 18.1.3 (1ubuntu1)
Date: Thu Feb 5 17:06:48 2026

Bug Summary

| Bug Type | Quantity |
| --- | --- |
| All Bugs | 1 |
| Unused code | |
| Dead assignment | 1 |

Reports

| Bug Group | Bug Type | File | Function/Method | Line | Path Length |
| --- | --- | --- | --- | --- | --- |
| Unused code | Dead assignment | libraryPatcher_linux.cpp | patch_library_unlocked | 94 | 1 |

Block profiling signals during context_tls_v1 initialization
to prevent signal handler reentrancy into musl's TLS setup.
Consolidate CriticalSection and SignalBlocker into guards module.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@jbachorik jbachorik marked this pull request as ready for review February 5, 2026 17:05
@jbachorik jbachorik requested a review from a team as a code owner February 5, 2026 17:05