-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Description
Apache Airflow Provider(s)
edge3
Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.13.0
apache-airflow-providers-common-compat==1.8.3
apache-airflow-providers-common-io==1.6.4
apache-airflow-providers-common-sql=1.28.2
apache-airflow-providers-fab==3.0.1
apache-airflow-providers-standard==1.9.1
apache-airflow-providers-postgres==6.4.0
apache-airflow-providers-edge3==1.4.0
Apache Airflow version
3.1.2
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Official Apache Airflow Helm Chart
Deployment details
Setup scale: (yes we know this is very large scale)
8 api servers x 64 worker processes each
9 Edge workers attempting to connect continuously (and continuously failing)
etc... (rest are not relevant)
What happened
Upon edge worker startup we get the following error:
405 Client Error: Method not allowed for url: http:///edge_worker/v1/worker/
Possible relevant values in the helm chart values.yaml:
apiSecretKey: <VALUE>
jwtSecret: <VALUE>
config:
edge:
api_enabled: 'True'
api_url: ''http://<AIRFLOW-API-SERVER-URL>/edge_worker/v1/rcpapi'
This behavior seems to come and go - workers can fail on startup and after some time start up normally.
What you think should happen instead
No response
How to reproduce
Reproducing this seems to be tricky. Even for us this behavior sometimes exists and other times not. The best information we can give for reproducing this seems to be the airflow config.
Anything else
405 Method not allowed seems to suggest that the API servers did not start with the necessary portion of the edge plugin.
This is very odd, and is the kind of behavior we'd expect if the config lacked edge: api_enabled: 'True', api_url: '...'
However we double and triple checked that these configurations are set in the api servers (and on the edge worker too, for what it's worth)
Additional note (could be related):
Upon startup of each api-server we get an error related to edge plugin import:
[error] Failed to import plugin edge_executor [airflow.plugins_manager] loc=plugins_manager.py:268
....
psycop2.errors.LockNotAvailable: canceling statement due to lock timeout
We checked the lock timeout on our postgres db and it is set to 0 (unlimited)
We aren't pointing to this as the root cause because previously edge has worked for us while this plugin has been similarly not imported. But this might point in the right direction
We know this isn't related to the amount of api_servers (we tried with 1) but it might be related to the amount of worker processes - we don't know for sure.
Thank you for the time considering our issue!!
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct