Skip to content

Centralized runtime control of Edge Worker concurrency in distributed deployments#62896

Open
dheerajturaga wants to merge 7 commits intoapache:mainfrom
dheerajturaga:feat/edge-cli-concurrency
Open

Centralized runtime control of Edge Worker concurrency in distributed deployments#62896
dheerajturaga wants to merge 7 commits intoapache:mainfrom
dheerajturaga:feat/edge-cli-concurrency

Conversation

@dheerajturaga
Copy link
Member

@dheerajturaga dheerajturaga commented Mar 4, 2026

In distributed Airflow deployments using EdgeExecutor, edge workers run
at remote sites that are often unreachable from the central control
plane — behind firewalls, in air-gapped networks, or on edge computing
nodes. Prior to this change, the worker's concurrency (the number of
tasks it runs in parallel) could only be set at startup via the -c CLI
flag. Adjusting it required direct shell access to the remote machine
and a full worker restart, causing job execution downtime and defeating
the operational value of the edge architecture.

This contribution solves the problem entirely by introducing a
server-driven concurrency control mechanism that works within the
existing security and connectivity constraints of edge deployments.
Rather than requiring a new communication channel, it repurposes the
worker's existing heartbeat protocol — the only bidirectional link
between the central Airflow API server and the remote worker — to
deliver concurrency updates. An administrator runs:

airflow edge set-worker-concurrency --edge-hostname <worker> --concurrency <n>

This writes the desired concurrency to the central database. On the
worker's next heartbeat, the API server
returns the new value in its response, and the worker adopts it
immediately — no restart, no remote access, no downtime.

The design follows the established pattern for queue management
introduced in this provider, but extends it to a runtime-mutable
execution parameter for the first time. It required coordinated changes
across five architectural layers: the database schema (new migration),
the SQLAlchemy model, the FastAPI worker API response contract, the
async worker heartbeat loop, and the CLI command surface. The solution
preserves the core architectural guarantee of the edge executor — that
workers never hold a direct database connection — while making a
previously static configuration parameter dynamically controllable from
the central site.

This capability was made possible by the database schema versioning foundation
introduced in #61155

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    ClaudeCode

  In distributed Airflow deployments using EdgeExecutor, edge workers run
  at remote sites that are often unreachable from the central control
  plane — behind firewalls, in air-gapped networks, or on edge computing
  nodes. Prior to this change, the worker's concurrency (the number of
  tasks it runs in parallel) could only be set at startup via the -c CLI
  flag. Adjusting it required direct shell access to the remote machine
  and a full worker restart, causing job execution downtime and defeating
  the operational value of the edge architecture.

  This contribution solves the problem entirely by introducing a
  server-driven concurrency control mechanism that works within the
  existing security and connectivity constraints of edge deployments.
  Rather than requiring a new communication channel, it repurposes the
  worker's existing heartbeat protocol — the only bidirectional link
  between the central Airflow API server and the remote worker — to
  deliver concurrency updates. An administrator runs:

    airflow edge set-worker-concurrency --edge-hostname <worker> --concurrency <n>

  This writes the desired concurrency to the central database. On the
  worker's next heartbeat (typically within seconds), the API server
  returns the new value in its response, and the worker adopts it
  immediately — no restart, no remote access, no downtime.

  The design follows the established pattern for queue management
  introduced in this provider, but extends it to a runtime-mutable
  execution parameter for the first time. It required coordinated changes
  across five architectural layers: the database schema (new migration),
  the SQLAlchemy model, the FastAPI worker API response contract, the
  async worker heartbeat loop, and the CLI command surface. The solution
  preserves the core architectural guarantee of the edge executor — that
  workers never hold a direct database connection — while making a
  previously static configuration parameter dynamically controllable from
  the central site
This capability was made possible by DB schema versioning introduced in apache#61155
@dheerajturaga dheerajturaga requested a review from jscheffl as a code owner March 4, 2026 22:22
@boring-cyborg boring-cyborg bot added area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3 labels Mar 4, 2026
@dheerajturaga dheerajturaga changed the title Centralized runtime control of Edge Worker concurrency in distributed Airflow deployments Centralized runtime control of Edge Worker concurrency in distributed deployments Mar 4, 2026
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool improvement! Actually had the exact same demand today with our deployment.

One remark, one question.

And one additional: Would you also extend the web UI in a follow-up that similar like queues the concurrency can be adjusted in UI?

@dheerajturaga
Copy link
Member Author

dheerajturaga commented Mar 5, 2026

Cool improvement! Actually had the exact same demand today with our deployment.

One remark, one question.

And one additional: Would you also extend the web UI in a follow-up that similar like queues the concurrency can be adjusted in UI?

Yes! UI and docs will be updated in a follow up once this is merged!
I had this ask from my userbase for a while now but I needed #61155 first to allow for this :)

…stalls

  When edge3 tables existed before alembic tracking was introduced,
  airflow db migrate would stamp directly to head without applying
  incremental migrations, leaving the schema out of sync (e.g. missing
  the concurrency column added in 3.2.0).
@jscheffl
Copy link
Contributor

jscheffl commented Mar 6, 2026

Cool! Then once the CI errors are fixed I can make a second pass review and then LGTM!

@dheerajturaga
Copy link
Member Author

Cool! Then once the CI errors are fixed I can make a second pass review and then LGTM!

@jscheffl done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants