Source cncf provider to use airflow.sdk.configuration.conf #60074
base: main
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Please note that static checks and tests fail.
@shahar1
sunank200
left a comment
Changes look good, but the static checks need fixing.
Not 100% sure that the failing tests are related to the changes from this PR, but let's start with simply fixing the static checks.
And Always Be Rebased @neronsoda
Some of the errors might also happen because your branch is based on a version of main that had a temporary problem already solved in the current main.
I’d like to ask for guidance before opening a PR regarding a provider-side change :) In a previous commit, some provider code paths were modified to use SDK configuration via `from airflow.providers.common.compat.sdk import conf`. Unlike the core configuration, the SDK conf does not load provider-supplied defaults. Because of this, provider modules that access configuration values at import time fail, e.g.:

```python
conf.get("kubernetes_executor", "namespace")
conf.get("local_kubernetes_executor", "kubernetes_queue")
```

I can reproduce this locally. At the moment, I see a few possible ways to address this and wanted to ask which approach is preferred:

1. Explicit fallbacks at each call site:

```python
conf.get("kubernetes_executor", "namespace", fallback="default")
```

2. Catching the configuration exception:

```python
try:
    NAMESPACE = conf.get("kubernetes_executor", "namespace")
except AirflowConfigException:
    NAMESPACE = "default"
```

From a minimal-change perspective, option 1 (explicit fallbacks) seems safest.
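To make the difference between the two options concrete, here is a minimal runnable sketch. It uses the stdlib `ConfigParser` as a stand-in for Airflow's `conf` (an assumption for illustration: Airflow's `conf.get` exposes the same `fallback=` semantics, but raises `AirflowConfigException` rather than `NoOptionError` for missing options):

```python
from configparser import ConfigParser, NoOptionError

# Stand-in for Airflow's conf: the section exists but the options
# (which would normally come from provider defaults) are missing.
conf = ConfigParser()
conf.add_section("kubernetes_executor")

# Option 1: explicit fallback at the call site.
namespace = conf.get("kubernetes_executor", "namespace", fallback="default")

# Option 2: catch the lookup error and substitute the default.
try:
    queue = conf.get("kubernetes_executor", "kubernetes_queue")
except NoOptionError:  # in Airflow this would be AirflowConfigException
    queue = "kubernetes"

print(namespace, queue)  # → default kubernetes
```

Both yield the same value; option 1 is one expression per call site, while option 2 needs a statement block and is awkward at module import time.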
Hi 👋 While running the Kubernetes provider tests I hit these failures. The root cause appears to be that the SDK conf does not include provider-defined defaults. With this change, all Kubernetes provider tests are passing. Before committing, I’d like to confirm whether adding explicit fallbacks in provider code is considered an acceptable approach. Once confirmed, I’ll proceed with the commit. Thanks!
I think for now (cc @amoghrajesh @kaxil @ashb) - the only good way is to make sure those defaults are also loaded by the shared configuration. The issue is that those defaults are coming from providers, and you need to load the providers' configuration before the defaults can be used, so this is an equivalent of pre-fetching defaults "just in case" before provider configuration is loaded. So option 3:
I do not think "minimal changes" is a decision factor. Correctness is more important. The root cause is the same as described in #60087 (comment) -> we currently do not control at all the sequence in which different parts of Airflow are initialized; this depends pretty much on the way we import things -> in this case the problem is that we retrieve the default values before the provider manager is initialized and defaults are retrieved. I think eventually it will be fixed when we switch to explicit initialization of internal Airflow shared functionalities.
The decision not to load provider configurations in the shared library was kind of deliberate, to keep the shared library agnostic of providers, but I guess we can have the SDK conf load the providers' conf, with an aim to stop providers relying on airflow-core and have them consume from the Airflow Task SDK instead?
We can again do the same thing as with other shared libraries. Then it is really a matter of the proper sequence of initialization in each of the components.
This sequence should be done at the very beginning of every component start (and tests). This way conf will not have to know anything about providers. ProvidersManager will need to know the configuration structure (and can use core configuration for whatever is needed) -> but then it would simply inject the new configuration list retrieved from providers into configuration, so that anything else can just read "configuration" without knowing whether it came from providers or core.
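A rough sketch of this injection sequence, again using stdlib `ConfigParser` as a stand-in (the `ProvidersManager` class and `inject_defaults` method below are illustrative only, not Airflow's real API):

```python
from configparser import ConfigParser

# The shared configuration knows nothing about providers.
conf = ConfigParser()

class ProvidersManager:
    """Stand-in: would normally discover config defaults from installed providers."""

    provider_defaults = {
        "kubernetes_executor": {"namespace": "default"},
    }

    def inject_defaults(self, config: ConfigParser) -> None:
        # Push provider-supplied defaults into the shared configuration,
        # without overwriting anything the user has set explicitly.
        for section, options in self.provider_defaults.items():
            if not config.has_section(section):
                config.add_section(section)
            for key, value in options.items():
                if not config.has_option(section, key):
                    config.set(section, key, value)

# Component startup: controlled initialization order, before any reads.
ProvidersManager().inject_defaults(conf)

# Later readers need no fallback: the default is already present.
print(conf.get("kubernetes_executor", "namespace"))  # → default
```

The key property is that readers stay ignorant of where a value came from; the ordering constraint lives in one place (component startup), not at every call site.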
And just to explain a bit more context:
And once we do that now (which should not be that difficult, to be honest) - we might be able to get rid of those pesky fallback defaults. So for example in the scheduler, kubernetes configuration could be read before ProvidersManager() was initialized, and we needed default values for those params in case they were not defined locally. However, if we do the initialization first, in a controlled way, and inject the defaults retrieved from providers, this will not be needed any more. I think one of the critical places there are CLI parameters: some of them read values from config, but they really need to be read before ProvidersManager() is initialized. This might however also change with #59805 of @jason810496 -> because with that change (after performance testing) it turns out that we can initialize ProvidersManager() partially when we are parsing CLI - without initializing all of it, as the providers manager has partial lazy initialization built-in. Once this is done, I think the main reason why we could not have the providers manager initialized relatively early and before all the places we needed it will be gone, and we might get rid of those pesky fallback_defaults.
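The "partial lazy initialization" mentioned above can be sketched as follows (class and attribute names are hypothetical; Airflow's real ProvidersManager differs - this only illustrates the idea that cheap discovery can happen on first access, so CLI parsing never pays for the expensive parts):

```python
import functools

class LazyProvidersManager:
    """Illustrative sketch: each discovery category initializes on first use."""

    @functools.cached_property
    def config_defaults(self) -> dict:
        # Cheap part: in Airflow this would scan provider config schemas;
        # here we return a static mapping.
        return {"kubernetes_executor": {"namespace": "default"}}

    @functools.cached_property
    def hooks(self) -> dict:
        # Heavier discovery that CLI parsing never triggers in this sketch.
        raise RuntimeError("full provider discovery not needed during CLI parsing")

manager = LazyProvidersManager()

# CLI parsing touches only the cheap category; `hooks` is never evaluated.
print(manager.config_defaults["kubernetes_executor"]["namespace"])  # → default
```

Because `cached_property` defers each category independently, config defaults can be available early without a full (and slow) provider scan.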
@potiuk I fully agree with the direction you described. Would you prefer that this work waits until the initialization/injection mechanism is implemented first, or that I proceed now? Happy to proceed in whichever way makes the most sense. (+ follow-up) Given that, would it make sense to proceed with this PR using the same fallback approach for now, and then follow up with a separate issue/PR to introduce the proper initialization/injection mechanism, after which we could remove those fallbacks in a single cleanup?
I think that would be a bit too hasty, and here we will have a lot of fallbacks in a few places if we want to remove it from cncf.kubernetes - I think the edge provider had only a few of those, but there are a lot more here. And I think that was the main reason why the "default.cfg" was introduced - otherwise we would have a LOT of non-DRY duplication across a number of files - the fallbacks would have to be repeated, so putting them together in a single file was a lot more reasonable and maintainable. I think it's difficult to implement a generic mechanism of this sort without having an implementation of it to test, so I think the best idea is to implement it in this PR - at least to make the tests pass - and when they do, we might want to extract the "generic" config part into a separate - prerequisite - part. This is what we often do here - implement a bigger PR so that we know "where we are going" and can test it, and then, when it succeeds, split it up. That seems to be the most effective way. So if you are ok with it - and there are no more comments from others - feel free to proceed this way. This will make this one longer to implement, but then as a follow-up we will apply it to edge as well and see that it's "generic enough".
cc @sunank200 fyi on the comments above, I agree with that direction. |
Thanks for the detailed guidance :) |

This PR migrates the cncf provider to use `airflow.sdk.configuration.conf` instead of `airflow.configuration.conf`. This change maintains backward compatibility through the common compat module.
related: #60000
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
`{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in airflow-core/newsfragments.