Two-token mechanism for task execution to prevent token expiration while tasks wait in executor queues #60108

anishgirianish · 2026-01-04T20:57:42Z

Tasks waiting in Celery queue may have their JWT tokens expire before execution starts.
Fixes: #53713

Summary

Fixes #59553 - Tasks waiting in Celery queue fail when JWT tokens expire
before execution starts.

Implements a two-token mechanism for task execution to prevent token
expiration while tasks wait in executor queues.

How it works

Workload Token: Long-lived token (24h default) sent with task workloads. Can only be used to call the /run endpoint.
Execution Token: Short-lived token (10min default) issued by /run for subsequent API calls during task execution.
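
A minimal, standard-library-only sketch of the two lifetimes described above. It is not the code from this PR: the signing key, the `make_token` and `decode_token` helpers, and the scope values `"workload"` and `"execution"` are all assumptions made for illustration. It only shows why a token with a 24h lifetime survives a long queue wait while a 10min token does not.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    """Reverse of _b64: restore the padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(sub: str, scope: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build an HS256-signed token carrying a scope claim and an expiry."""
    issued = int(time.time() if now is None else now)
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = _b64(
        json.dumps({"sub": sub, "scope": scope, "iat": issued, "exp": issued + ttl_seconds}).encode()
    )
    signature = hmac.new(SECRET, f"{header}.{claims}".encode(), hashlib.sha256).digest()
    return f"{header}.{claims}.{_b64(signature)}"


def decode_token(token: str, now: Optional[float] = None) -> dict:
    """Check the signature and the expiry, then return the claims."""
    header, claims, signature = token.split(".")
    expected = hmac.new(SECRET, f"{header}.{claims}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64(signature)):
        raise ValueError("bad signature")
    payload = json.loads(_unb64(claims))
    if (time.time() if now is None else now) >= payload["exp"]:
        raise ValueError("token expired")
    return payload


# A task queued at `queued_at` and picked up by a worker one hour later.
queued_at = 1_700_000_000
workload_token = make_token("ti-123", "workload", ttl_seconds=86400, now=queued_at)
execution_token = make_token("ti-123", "execution", ttl_seconds=600, now=queued_at)
```

Decoding `workload_token` one hour after `queued_at` still succeeds, whereas decoding `execution_token` at the same moment raises `ValueError`, which is the failure mode the PR description attributes to long queue waits.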

Changes

Add JWTBearerWorkloadDep dependency for workload token validation on /run endpoint
Add TOKEN_SCOPE_WORKLOAD, SCOPE_MAPPING, and generate_workload_token() in tokens.py
Update /run endpoint to accept workload tokens and return execution tokens
Add jwt_workload_token_expiration_time config option (default: 86400s)
Implement scope validation using FastAPI's SecurityScopes mechanism
Reject workload-scoped tokens on all other endpoints (execution scope required)
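
The scope rule in the last two items can be sketched without any framework. This is an illustration only, not the PR's `SCOPE_MAPPING` or its FastAPI dependencies: the route strings, the `REQUIRED_SCOPE_BY_ROUTE` table, the `check_scope` helper, and the choice to treat a token without a scope claim as execution-scoped are all assumptions.

```python
# Hypothetical scope names; the values used in tokens.py may differ.
SCOPE_EXECUTION = "execution"
SCOPE_WORKLOAD = "workload"

# Hypothetical route table: only /run asks for the workload scope.
REQUIRED_SCOPE_BY_ROUTE = {
    "/task-instances/{task_instance_id}/run": SCOPE_WORKLOAD,
}
DEFAULT_REQUIRED_SCOPE = SCOPE_EXECUTION


class Forbidden(Exception):
    """Raised when a token's scope does not match what the route requires."""


def check_scope(route: str, claims: dict) -> None:
    """Reject the request unless the token scope equals the route's required scope."""
    required = REQUIRED_SCOPE_BY_ROUTE.get(route, DEFAULT_REQUIRED_SCOPE)
    # Assumption: a token with no scope claim is an ordinary execution token.
    token_scope = claims.get("scope", SCOPE_EXECUTION)
    if token_scope != required:
        raise Forbidden(f"{route} requires scope {required!r}, got {token_scope!r}")
```

With this table, `check_scope("/connections/{conn_id}", {"scope": "workload"})` raises `Forbidden`, so a leaked long-lived token cannot be used to read connections. Whether /run should also accept execution-scoped tokens is not stated in the description; the sketch rejects them.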


tirkarthi · 2026-01-07T02:00:17Z

As per my understanding this was removed in #55506 to use a middleware that refreshes token. Are you running an instance with execution api only separately with api-server? Could this middleware approach be extended for task-sdk calls too?

cc: @vincbeck @pierrejeambrun

anishgirianish · 2026-01-07T05:29:32Z

Hi @tirkarthi,
Thanks for pointing out the middleware approach from #55506 - that's helpful context.

I took a stab at extending that pattern in #60197, handling expired tokens transparently in JWTBearer + middleware so no client-side changes are needed. Would love your thoughts on it.

Totally happy to go with whichever approach the team feels is better!

cc: @vincbeck @pierrejeambrun

vincbeck · 2026-01-07T14:32:58Z

Would love to hear @ashb's or @amoghrajesh's opinion on this one.

ashb

We can't take this approach. It lets any Execution API token be resurrected, which fundamentally breaks lots of security assumptions: it amounts to having tokens that never expire. That is bad.

Instead, what we should do is generate a new kind of token (i.e. one with an extra or different set of JWT claims) that is only valid for the /run endpoint and is valid for longer (say 24 hours, and make it configurable). This is what gets sent in the workload.

The /run endpoint would then set the header to give the running task a "short-lived" token (basically the one we have right now) that is usable on the rest of the Execution API. This approach is safer because the existing controls in the /run endpoint already prevent a task from being run more than once, which should also protect against "resurrecting" an expired token and using it to access things like connections. We should also validate that the token used on every endpoint other than /run explicitly lacks this new claim.
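
The exchange described above can be sketched as follows. This is a toy model, not Airflow code: the in-memory `task_states` dict, the `run_task_instance` function, the placeholder `issue_execution_token`, and the header name `X-Example-Execution-Token` are hypothetical. The point is that the run-once guard makes the long-lived token usable for exactly one exchange.

```python
import secrets

RUNNABLE_STATE = "queued"

# In-memory stand-in for task instance state kept by the API server.
task_states = {"ti-123": "queued"}


class Conflict(Exception):
    """Raised when the task instance has already been started."""


def issue_execution_token(task_instance_id: str) -> str:
    # Placeholder: a real implementation would sign a short-lived JWT here.
    return f"exec-{task_instance_id}-{secrets.token_urlsafe(8)}"


def run_task_instance(task_instance_id: str, claims: dict) -> dict:
    """Exchange a workload-scoped token for a short-lived execution token, once."""
    if claims.get("scope") != "workload" or claims.get("sub") != task_instance_id:
        raise PermissionError("a workload token for this task instance is required")
    # Run-once guard: a task that already left the queued state cannot be started again.
    if task_states.get(task_instance_id) != RUNNABLE_STATE:
        raise Conflict("task instance is not in a runnable state")
    task_states[task_instance_id] = "running"
    # Hypothetical header name, used only for this sketch.
    return {"X-Example-Execution-Token": issue_execution_token(task_instance_id)}
```

A first call with `{"scope": "workload", "sub": "ti-123"}` returns a response header carrying the new token; replaying the same workload token raises `Conflict`, so holding it for 24 hours does not allow minting further execution tokens.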

ashb

Much better approach, and on the right track, thanks.

Some changes though:

"queue" is not the right thing to use, as these tokens could be used for executing other workloads soon (for instance we have already talked about wanting Dag level callbacks to be executed on the workers, not in the dag processor, which would be done by having a new type from the ExecuteTaskWorkload).

So maybe we have "scope": "ExecuteTaskWorkload"?
A little bit of refactoring is needed before we are ready to merge this.

airflow-core/src/airflow/api_fastapi/auth/tokens.py

airflow-core/src/airflow/api_fastapi/execution_api/deps.py

airflow-core/src/airflow/api_fastapi/execution_api/app.py

airflow-core/src/airflow/api_fastapi/execution_api/deps.py

airflow-core/src/airflow/api_fastapi/execution_api/routes/__init__.py

airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py

airflow-core/src/airflow/config_templates/config.yml

ashb · 2026-01-15T14:19:09Z

airflow-core/src/airflow/api_fastapi/execution_api/app.py

+            from airflow.api_fastapi.execution_api.deps import _container
+
+            class InProcessContainer:
+                """Minimal container for in-process execution, bypassing svcs lifecycle."""


Why can't we use svcs? Why do we need to implement our version of it?

I'm still getting familiar with the codebase, but from what I understand, I added this to fix ServiceNotFoundError failures in CI. With InProcessExecutionAPI, the svcs lifespan runs later (when transport is accessed), but services like JWTGenerator are needed before that. This container bypasses the lifecycle and returns pre-created instances from app.state. I may well be missing something - if there's a cleaner pattern you'd recommend, I'd really appreciate the guidance.
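
For readers unfamiliar with the pattern being discussed, here is a rough sketch of what "returns pre-created instances from app.state" could look like. It is not the class from the diff: the `services` attribute, the stand-in `JWTGenerator`, and the `get` method shape are assumptions, and the sketch does not use svcs at all.

```python
from types import SimpleNamespace


class JWTGenerator:
    """Stand-in for the real service class; only used as a lookup key here."""


class InProcessContainer:
    """Looks up services that were created eagerly and stored on an app state object."""

    def __init__(self, state):
        self._state = state

    def get(self, svc_type):
        # Assumption: the state object carries a dict keyed by service type.
        services = getattr(self._state, "services", {})
        try:
            return services[svc_type]
        except KeyError:
            raise LookupError(f"no pre-created instance for {svc_type.__name__}") from None


# Hypothetical app state populated before any lifespan hook has run.
generator = JWTGenerator()
state = SimpleNamespace(services={JWTGenerator: generator})
container = InProcessContainer(state)
```

Here `container.get(JWTGenerator)` returns the instance created up front, without depending on any lifespan hook having run. The trade-off raised in the review still applies: this duplicates a small part of what svcs already provides.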

airflow-core/src/airflow/api_fastapi/execution_api/deps.py

ashb · 2026-01-15T14:30:40Z

airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py

+
+JWTBearerWorkloadDep = Depends(JWTBearerWorkloadScope(path_param_name="task_instance_id"))
+
+ti_run_router = VersionedAPIRouter(dependencies=[JWTBearerWorkloadDep])


We shouldn't need this extra router when we already have router; we can then declare the JWTBearerWorkloadDep on the path.

ashb

I'm not sold yet on the way we are handling the dependencies/multiple routers etc. Let me have a look at what options we have in FastAPI, as I feel this should be possible in a nicer manner.

I'm seeing if we can make use of https://fastapi.tiangolo.com/advanced/security/oauth2-scopes/ or something like it.

anishgirianish · 2026-01-17T09:12:02Z

@ashb That sounds like a great approach! I'd be really grateful if you could share your findings on FastAPI OAuth2 scopes - happy to refactor based on your recommendations. Thank you so much for taking the time to look into this!

anishgirianish · 2026-01-23T03:30:18Z

Hi @ashb, thank you for the feedback on the earlier iteration.
As you suggested, I implemented scope validation using FastAPI's SecurityScopes mechanism. The approach uses a single router with JWTBearerBaseDep at the router level for basic auth, and scope checks are done at the endpoint/module level via JWTBearerDep (execution scope) and JWTBearerWorkloadDep (workload scope for /run only).

Would really appreciate another look when you get a chance. Happy to make any further adjustments based on your feedback. Thanks!

anishgirianish requested review from amoghrajesh, ashb and kaxil as code owners January 4, 2026 20:57

boring-cyborg bot added area:API Airflow's REST/HTTP API area:task-sdk labels Jan 4, 2026

anishgirianish force-pushed the fix/token-expiration-worker branch from b183c74 to 9c31417 Compare January 4, 2026 21:05

anishgirianish mentioned this pull request Jan 4, 2026

AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT configuration ignored #59553

Closed

2 tasks

anishgirianish force-pushed the fix/token-expiration-worker branch 3 times, most recently from c707ddc to 4ef9dfe Compare January 4, 2026 22:45

eladkal added this to the Airflow 3.1.6 milestone Jan 6, 2026

anishgirianish mentioned this pull request Jan 7, 2026

fix(execution-api): Refresh expired JWT tokens for active tasks #59553 #60197

Closed

ephraimbuddy modified the milestones: Airflow 3.1.6, Airflow 3.1.7 Jan 7, 2026

ashb requested changes Jan 7, 2026

anishgirianish force-pushed the fix/token-expiration-worker branch from 4ef9dfe to b32da6b Compare January 8, 2026 18:11

anishgirianish requested review from XD-DENG, hussein-awala, o-nikolas, pierrejeambrun and vincbeck as code owners January 8, 2026 18:11

anishgirianish force-pushed the fix/token-expiration-worker branch from 14a516a to 5915391 Compare January 9, 2026 02:05

ashb reviewed Jan 9, 2026

ashb self-requested a review January 9, 2026 12:09

anishgirianish force-pushed the fix/token-expiration-worker branch from e7e3ae1 to e879863 Compare January 9, 2026 23:52

anishgirianish changed the title ~~Add token refresh mechanism for Execution API (#59553)~~ Two-token mechanism for task execution to prevent token expiration while tasks wait in executor queues (#59553) Jan 10, 2026

anishgirianish force-pushed the fix/token-expiration-worker branch from b511b8f to 57ac225 Compare January 10, 2026 07:07

ashb reviewed Jan 15, 2026

airflow-core/src/airflow/api_fastapi/execution_api/deps.py (outdated)

ashb reviewed Jan 15, 2026

airflow-core/src/airflow/api_fastapi/execution_api/deps.py (outdated)

ashb reviewed Jan 15, 2026

ashb removed the area:API Airflow's REST/HTTP API label Jan 15, 2026

anishgirianish force-pushed the fix/token-expiration-worker branch from 0e19b1e to a336f6e Compare January 17, 2026 08:59

anishgirianish force-pushed the fix/token-expiration-worker branch from a336f6e to 5480717 Compare January 23, 2026 01:06

anishgirianish added 19 commits January 22, 2026 21:25

added two token mechanism for task execution

04bdf8c

fix failing tests

e104ffa

fix failing test

e590545

further enhanced the implementation

a4eb612

further clean ups

560fe2f

clean ups

ee8a73d

test fixes

b4a5e05

fix jwt bearer class not registered

041b003

clean ups uneccesary overrides

f34cef4

clean ups

8ea9379

add back the override

66f92d4

clean up ovveride

c970271

rollback to explicit override

e977817

use real container

27b0e8b

refactor in process container override

d782eb9

bring back the override

8264c20

refactored token checks and dependecy

d38bb20

implement scope based token authentication

3e0e443

fix failing test

7ac2da1

anishgirianish force-pushed the fix/token-expiration-worker branch from 5d8bd80 to 7ac2da1 Compare January 23, 2026 03:25

