[Linux] Prevent GC from running during process teardown #57832

Open
wants to merge 1 commit into
base: master

@d-netto d-netto commented Mar 19, 2025

Context

We send signal 15 (SIGTERM) to shut down our servers.

We noticed that some of our servers that receive the termination signal are segfaulting in GC, which leads to false alarms in our internal monitors that track GC-related crashes.

Hypothesis

We suspect this pathological case may be happening:

  • The process receives signal 15, which is caught by the signal listener thread.
  • The signal listener initiates process teardown (e.g. through raise).
  • IIRC this operation is not atomic on Linux, i.e. the kernel kills the threads gradually, so it's possible for us to spend a few ms in a state where some of the threads in the process are still alive and others have already been killed (this point needs some confirmation).
  • With part of the process alive and part of it dead, we try to enter a GC, see a bunch of Julia data structures in an intermediate/corrupted state, and crash while running the GC.

Mitigation

Since our main goal is to get rid of the GC crashes that happen around server shutdown, we believe it is sufficient to prevent just the last bullet point. That is, once we're about to kill the process, we prevent the system from starting a new GC, and we wait for any ongoing GC to finish before proceeding with teardown.
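
To illustrate the idea (this is not the code in this PR), here is a minimal Python sketch of such a gate. The class `GCGate` and its methods `try_enter_gc`, `exit_gc`, and `begin_teardown` are hypothetical names made up for this example; the actual runtime is written in C and its mechanism may differ.

```python
import threading


class GCGate:
    """Hypothetical gate: blocks new GCs once teardown begins.

    Illustrative only; none of these names come from the Julia runtime.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._gc_running = 0      # number of collections currently in flight
        self._teardown = False    # set once, just before the process is killed

    def try_enter_gc(self):
        """Return True if a GC may start, False if teardown has begun."""
        with self._cond:
            if self._teardown:
                return False
            self._gc_running += 1
            return True

    def exit_gc(self):
        """Mark one in-flight GC as finished and wake any waiting teardown."""
        with self._cond:
            self._gc_running -= 1
            if self._gc_running == 0:
                self._cond.notify_all()

    def begin_teardown(self):
        """Forbid new GCs, then wait for in-flight GCs to finish."""
        with self._cond:
            self._teardown = True
            while self._gc_running > 0:
                self._cond.wait()
```

In this sketch, the signal listener would call `begin_teardown()` before killing the process, and every GC entry point would call `try_enter_gc()` first and skip the collection if it returns False. The flag and the counter are updated under the same lock, so a GC cannot slip in between setting the flag and checking for in-flight collections.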

Co-debugged with @kpamnany.

@d-netto d-netto requested a review from vtjnash March 19, 2025 21:14
@d-netto d-netto added system:linux Affects only Linux GC Garbage collector labels Mar 19, 2025
@d-netto d-netto force-pushed the dcn-kp-no-gc-on-teardown branch from 6be4221 to 71ba5fa Compare March 19, 2025 23:37
@d-netto d-netto requested a review from gbaraldi March 19, 2025 23:38
@d-netto d-netto force-pushed the dcn-kp-no-gc-on-teardown branch 2 times, most recently from 0bada5d to e1e3d4f Compare March 20, 2025 13:35
@d-netto d-netto force-pushed the dcn-kp-no-gc-on-teardown branch from e1e3d4f to 4d64c18 Compare March 20, 2025 17:58