-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Suppress Warning in Executor for Supervisor on os.fork() #60087
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Suppress Warning in Executor for Supervisor on os.fork() #60087
Conversation
potiuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeprecationWarning: This process (pid=NNN) is multi-threaded, use of fork() may lead to deadlocks in the child.
Actually this is only on MacOS and this is an indication that there is an issue to solve. We should not start threads before forking on MacOS because it might easily cause SIGSEGV.
So likely we should track down and remove those cases.
However, this might not be easy if we ware importing a lot of stuff before doing fork and part of the task isolation is to minimise the number of things that get imported when "import airflow" is performed. Especially that it might involve local_settings.py -> that can contain some imports from 3rd-party modules that cause Thread creation during import.
This is especially probable in case of remote logging initialization for example - because logging libraries or metrics, and generally telemetric libraries often create threads to do stuff in the background - and their threads are i/o bound, so they opted for threads as they have no problem with GIL.
I think - if anything - we should turn the warning into an error at some point in time when we are close completion of task isolation and then very deliberately track and remove those threads that start before forking - and make sure no "custom" - potentially creating threads - code is run before forking as well.
But suppressing it is a bad idea, because it is a real issue that we need to deal with sooner or later.
|
Actually it is not on MacOS, I am on Linux. |
Let me clarify. It may fail on MacOS and other platforms, but CPython developers decided to turn it into a warning for all of them in python/cpython#100229 . They decided to do it, because the problem is generally non-fixable on MacOS. It's unlikely you will handle it differently for those platform so they decided to eventually make it into an error when you have threads running and fork (but for now it's a deprecation warning) - precisely because it is unreliable on some platforms and there is no way to fix it for those platforms. On MacOS it started to break as of 2018 (original article by Barry Warsaw https://www.wefearchange.org/2018/11/forkmacos.rst ) in Objective-C runtime changes for High Sierra. Here is a detailed discussion about it: https://discuss.python.org/t/concerns-regarding-deprecation-of-fork-with-alive-threads/33555/14 But bottomline, the idea is to raise the warning in BOTH Linux and MacOS, as a "soft nudge" for developers (i.e. us) to do something with it because it is known to cause problems (https://github.com/apache/airflow/issues?q=is%3Aissue%20SIGSEGV -> 15 issues raised so far) So silencing it, is removing the nudge. |
|
Just to be constructive: I think the right solution (@ashb @amoghrajesh @kaxil ?) is - after we complete isolation work - is to review all the intialisation code that happens in all our processes and make sure that we: a) do not start any threads ourselves before forking in any of our components that use forking That will likely require to make something that I've been talking for a long while (and as far as I remember @ashb ackknowledged that we have a goal to do that during/after task isolation) - that instead of doing a lot of things during the This is also nicely accompanied by our We will also likely have to do some tests (prek hooks?) to prevent accidental adding of such imports. I've done similar (but simpler check in Breeze - where we only make sure to run specific imports for auto-completion - because auto-completeion runs outside of venv of Breeze and excessive imports while doing it are a) slow b) imported extra libraries might not be on PYTHONPATH - and it saved me numerous times from adding a top-level import that would have caused auto-complete failures. |
While I was fixing Edge Worker bugs I noticed that launched processes in the executor/worker always emit warnings in style:
I assume we are aware in Airflow that this is not a deprecation and the way we fork processes is not in danger of deadlock. Therefore I propose to suppress the warning. Assuming it is not planned to fix this.
Note proposing to fix it as it also appears in all other executors in console, seen in LocalExecutor as well.