Centralized runtime control of Edge Worker concurrency in distributed deployments #62896
dheerajturaga wants to merge 7 commits into apache:main
In distributed Airflow deployments using EdgeExecutor, edge workers run
at remote sites that are often unreachable from the central control
plane — behind firewalls, in air-gapped networks, or on edge computing
nodes. Prior to this change, the worker's concurrency (the number of
tasks it runs in parallel) could only be set at startup via the -c CLI
flag. Adjusting it required direct shell access to the remote machine
and a full worker restart, causing job execution downtime and defeating
the operational value of the edge architecture.
This contribution addresses the problem by introducing a
server-driven concurrency control mechanism that works within the
existing security and connectivity constraints of edge deployments.
Rather than requiring a new communication channel, it repurposes the
worker's existing heartbeat protocol — the only bidirectional link
between the central Airflow API server and the remote worker — to
deliver concurrency updates. An administrator runs:
airflow edge set-worker-concurrency --edge-hostname <worker> --concurrency <n>
This writes the desired concurrency to the central database. On the
worker's next heartbeat (typically within seconds), the API server
returns the new value in its response, and the worker adopts it
immediately — no restart, no remote access, no downtime.
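The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the provider's actual code: `WorkerRow`, `HeartbeatResponse`, `WORKERS`, `set_worker_concurrency`, `handle_heartbeat`, and `Worker` are all hypothetical names, the dictionary stands in for the central database, and the direct function call stands in for the HTTP heartbeat request. It also assumes that an unset value means "keep the startup concurrency".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkerRow:
    """Hypothetical stand-in for the worker's row in the central database."""

    hostname: str
    concurrency: int | None = None  # None means no override has been set


@dataclass
class HeartbeatResponse:
    """Hypothetical response body returned to the worker on each heartbeat."""

    concurrency: int | None


# In-memory stand-in for the central database table.
WORKERS: dict[str, WorkerRow] = {}


def set_worker_concurrency(hostname: str, concurrency: int) -> None:
    """Conceptually what the CLI command does: persist the desired value."""
    if concurrency < 1:
        raise ValueError("concurrency must be a positive integer")
    WORKERS.setdefault(hostname, WorkerRow(hostname)).concurrency = concurrency


def handle_heartbeat(hostname: str) -> HeartbeatResponse:
    """Server side: answer a heartbeat with the stored value, if any."""
    row = WORKERS.setdefault(hostname, WorkerRow(hostname))
    return HeartbeatResponse(concurrency=row.concurrency)


class Worker:
    def __init__(self, hostname: str, startup_concurrency: int) -> None:
        self.hostname = hostname
        self.concurrency = startup_concurrency  # value from the -c flag

    def heartbeat(self) -> None:
        # An HTTP call in a real deployment; a direct call in this sketch.
        response = handle_heartbeat(self.hostname)
        if response.concurrency is not None:
            # Adopt the centrally configured value without restarting.
            self.concurrency = response.concurrency
```

The worker only ever reads the value from the heartbeat response, so it never needs a database connection of its own.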
The design follows the established pattern for queue management
introduced in this provider, but extends it to a runtime-mutable
execution parameter for the first time. It required coordinated changes
across five architectural layers: the database schema (new migration),
the SQLAlchemy model, the FastAPI worker API response contract, the
async worker heartbeat loop, and the CLI command surface. The solution
preserves the core architectural guarantee of the edge executor — that
workers never hold a direct database connection — while making a
previously static configuration parameter dynamically controllable from
the central site.
This capability was made possible by the DB schema versioning introduced in apache#61155.
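On the worker side, one way an async loop could honor a limit that changes at runtime is to recompute the number of free slots before picking up new work. The sketch below is an assumption about how such a loop might behave, not the provider's implementation: `SlotGate`, `run_task`, and `demo` are hypothetical names, and it assumes that lowering the limit does not cancel running tasks but only stops new ones from starting.

```python
import asyncio


class SlotGate:
    """Tracks running tasks against a concurrency limit that can change at runtime."""

    def __init__(self, concurrency: int) -> None:
        self.concurrency = concurrency
        self.running = 0

    def free_slots(self) -> int:
        # Never negative: after a decrease, running may exceed the new limit.
        return max(self.concurrency - self.running, 0)


async def run_task(gate: SlotGate, duration: float) -> None:
    gate.running += 1
    try:
        await asyncio.sleep(duration)  # stand-in for real task execution
    finally:
        gate.running -= 1


async def demo() -> list[int]:
    gate = SlotGate(concurrency=3)
    observed = []
    tasks = [
        asyncio.create_task(run_task(gate, 0.05))
        for _ in range(gate.free_slots())
    ]
    await asyncio.sleep(0)  # yield once so the tasks start running
    observed.append(gate.free_slots())  # all three slots are busy
    gate.concurrency = 1  # value adopted from a heartbeat response
    observed.append(gate.free_slots())  # still busy, nothing is cancelled
    await asyncio.gather(*tasks)
    observed.append(gate.free_slots())  # one slot under the new limit
    return observed
```

Running `asyncio.run(demo())` returns the free-slot counts observed before the change, right after it, and once the running tasks have drained.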
jscheffl
left a comment
Cool improvement! Actually had the exact same demand today with our deployment.
One remark, one question.
And one more: would you also extend the web UI in a follow-up so that concurrency can be adjusted there, similar to queues?
Yes! UI and docs will be updated in a follow-up once this is merged!
…stalls
When edge3 tables existed before alembic tracking was introduced, airflow db migrate would stamp directly to head without applying incremental migrations, leaving the schema out of sync (e.g. missing the concurrency column added in 3.2.0).
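The failure mode in this commit message can be illustrated with a small planning function. This is a sketch under stated assumptions, not the actual fix: `REVISIONS` and `plan_upgrade` are hypothetical, the revision names are invented, and it assumes that an untracked install with existing tables corresponds to the first (baseline) revision.

```python
from __future__ import annotations

# Hypothetical ordered revision list; real revision identifiers will differ.
REVISIONS = ["baseline", "add_concurrency_column"]


def plan_upgrade(
    tables_exist: bool, tracked_revision: str | None
) -> tuple[str | None, list[str]]:
    """Return (revision to stamp first, migrations to run afterwards)."""
    if tracked_revision is not None:
        # Already tracked: run whatever comes after the recorded revision.
        start = REVISIONS.index(tracked_revision) + 1
        return None, REVISIONS[start:]
    if tables_exist:
        # Pre-tracking install: assume the tables match the baseline schema,
        # so stamp the baseline (not head) and still run later migrations.
        return REVISIONS[0], REVISIONS[1:]
    # Fresh install: nothing to stamp, run every migration.
    return None, list(REVISIONS)
```

Stamping head in the second branch would return an empty migration list, which is how a column added by a later revision could end up missing.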
Cool! Then once the CI errors are fixed I can make a second pass review and then LGTM!
@jscheffl done!
Was generative AI tooling used to co-author this PR?
ClaudeCode