
Conversation

@jscheffl (Contributor) commented Jan 3, 2026

While I was fixing Edge Worker bugs I noticed that processes launched in the executor/worker always emit warnings of this style:

DeprecationWarning: This process (pid=NNN) is multi-threaded, use of fork() may lead to deadlocks in the child.

I assume we are aware in Airflow that this is not really a deprecation and that the way we fork processes is not in danger of deadlocking. Therefore I propose suppressing the warning, assuming there is no plan to fix the underlying cause.

Note: I am proposing to fix this because the warning also appears in the console for all other executors; I have seen it with the LocalExecutor as well.
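
For illustration, a minimal sketch of what such a suppression could look like (the exact filter in this PR may differ; the message regex is based on the warning text above):

```python
import warnings

# Sketch only: ignore the CPython "fork() in a multi-threaded process"
# DeprecationWarning by matching the start of its message.
warnings.filterwarnings(
    "ignore",
    message=r"This process \(pid=\d+\) is multi-threaded, use of fork\(\)",
    category=DeprecationWarning,
)
```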

@potiuk (Member) left a comment

DeprecationWarning: This process (pid=NNN) is multi-threaded, use of fork() may lead to deadlocks in the child.

Actually this is only on MacOS and this is an indication that there is an issue to solve. We should not start threads before forking on MacOS because it might easily cause SIGSEGV.

So likely we should track down and remove those cases.

However, this might not be easy if we are importing a lot of stuff before doing the fork, and part of the task isolation work is to minimise the number of things that get imported when "import airflow" is performed. Especially since it might involve local_settings.py, which can contain imports from 3rd-party modules that cause thread creation during import.

This is especially probable in the case of remote logging initialization, for example, because logging, metrics and telemetry libraries in general often create threads to do work in the background. Their workloads are I/O-bound, so they opted for threads as they have no problem with the GIL.

I think - if anything - we should turn the warning into an error at some point, when we are close to completing task isolation, and then very deliberately track down and remove those threads that start before forking - and make sure no "custom", potentially thread-creating, code is run before forking as well.

But suppressing it is a bad idea, because it is a real issue that we need to deal with sooner or later.
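
To illustrate the "turn it into an error" idea mentioned above (a sketch, not a concrete plan): the same warnings machinery can escalate instead of hide, so tests fail loudly whenever a thread is alive before fork():

```python
import warnings

# Sketch only: escalate the fork-with-threads DeprecationWarning to an error
# (e.g. in test/CI configuration) instead of suppressing it.
warnings.filterwarnings(
    "error",
    message=r"This process \(pid=\d+\) is multi-threaded, use of fork\(\)",
    category=DeprecationWarning,
)
```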

@jscheffl (Contributor, Author) commented Jan 4, 2026

Actually it is not on MacOS, I am on Linux.

@potiuk (Member) commented Jan 4, 2026

Actually it is not on MacOS, I am on Linux.

Let me clarify. It may fail on MacOS and other platforms, but the CPython developers decided to turn it into a warning for all of them in python/cpython#100229. They did this because the problem is generally non-fixable on MacOS. It is unlikely you will handle it differently for those platforms, so they decided to eventually make it an error when you fork while threads are running (for now it is a deprecation warning) - precisely because it is unreliable on some platforms and there is no way to fix it for them.

On MacOS it started to break in 2018 with the Objective-C runtime changes in High Sierra (original article by Barry Warsaw: https://www.wefearchange.org/2018/11/forkmacos.rst).

Here is a detailed discussion about it:

https://discuss.python.org/t/concerns-regarding-deprecation-of-fork-with-alive-threads/33555/14

But bottom line, the idea is to raise the warning on BOTH Linux and MacOS, as a "soft nudge" for developers (i.e. us) to do something about it, because it is known to cause problems (https://github.com/apache/airflow/issues?q=is%3Aissue%20SIGSEGV -> 15 issues raised so far).

So silencing it is removing the nudge.
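
For reference, a minimal reproduction of the nudge, assuming Python 3.12+ (where python/cpython#100229 landed) on a POSIX platform:

```python
import os
import threading
import time

# Start a background thread, then fork: on Python 3.12+ the fork() call
# emits "DeprecationWarning: This process (pid=NNN) is multi-threaded,
# use of fork() may lead to deadlocks in the child."
t = threading.Thread(target=time.sleep, args=(2.0,), daemon=True)
t.start()

pid = os.fork()  # the warning is emitted here, in the parent
if pid == 0:
    os._exit(0)  # child exits immediately
else:
    os.waitpid(pid, 0)
```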

@potiuk (Member) commented Jan 4, 2026

Just to be constructive: I think the right solution (@ashb @amoghrajesh @kaxil ?) is - after we complete the isolation work - to review all the initialisation code that happens in all our processes and make sure that we:

a) do not start any threads ourselves before forking in any of our components that use forking
b) do not run any custom code before forking (because we are not able to prevent it from running threads)

That will likely require something I've been talking about for a long while (and as far as I remember @ashb acknowledged that we have a goal to do that during/after task isolation): instead of doing a lot of things during "import airflow" (including local settings and conf initialization), we switch to explicit initialization of all components, when needed, when the particular airflow commands start.
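
A rough sketch of what that explicit initialization could look like (all function names below are hypothetical and do not exist in Airflow today):

```python
# Hypothetical sketch only -- none of these helpers exist in Airflow today.
# The idea: "import airflow" has no side effects, and each entrypoint
# initializes exactly what it needs, keeping thread-creating setup
# (logging, telemetry) out of the pre-fork phase.

def initialize_configuration() -> None:
    """Hypothetical: load configuration on demand, not at import time."""

def initialize_local_settings() -> None:
    """Hypothetical: import airflow_local_settings.py only when requested."""

def initialize_logging() -> None:
    """Hypothetical: set up (remote) logging, which may start threads."""

def run_worker(fork_workers) -> None:
    # Explicit, ordered initialization instead of import-time side effects.
    initialize_configuration()
    initialize_local_settings()
    fork_workers()        # fork while the process is still single-threaded
    initialize_logging()  # thread-creating setup happens only after forking
```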

This is also nicely accompanied by our shared components separation - we already have a few components that require implicit initialization, as discussed with @amoghrajesh. We will have to explicitly initialize some of our shared libraries (differently in task.sdk and differently in airflow.core).

We will also likely have to add some checks (prek hooks?) to prevent accidentally adding such imports. I've done a similar (but simpler) check in Breeze, where we only make sure specific imports run for auto-completion - because auto-completion runs outside of Breeze's venv and excessive imports there are a) slow and b) the extra imported libraries might not be on PYTHONPATH - and it saved me numerous times from adding a top-level import that would have caused auto-complete failures.
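
As an illustration of the kind of guard such a hook could provide (a sketch under the assumption that the check is "importing airflow must not start background threads"; the real check could look entirely different):

```python
# Hypothetical guard script -- not an existing hook in the repo.
# Fails if merely importing "airflow" leaves extra threads running.
import sys
import threading

import airflow  # noqa: F401  -- the import whose side effects we want to police

extra = [t for t in threading.enumerate() if t is not threading.main_thread()]
if extra:
    names = ", ".join(t.name for t in extra)
    print(f"'import airflow' started background threads: {names}", file=sys.stderr)
    sys.exit(1)
```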
