Conversation

@anishgirianish
Contributor

@anishgirianish anishgirianish commented Jan 4, 2026

Tasks waiting in Celery queue may have their JWT tokens expire before execution starts.
Fixes: #53713


Summary

Fixes #59553 - Tasks waiting in Celery queue fail when JWT tokens expire
before execution starts.

Implements a two-token mechanism for task execution to prevent token
expiration while tasks wait in executor queues.

How it works

  • Workload Token: Long-lived token (24h default) sent with task
    workloads. Can only be used to call the /run endpoint.
  • Execution Token: Short-lived token (10min default) issued by /run for
    subsequent API calls during task execution.
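The two lifetimes above can be illustrated with a minimal, standard-library-only sketch. It is not the PR's implementation: `SECRET`, `make_token`, and `decode_token` are hypothetical names, the `"workload"`/`"execution"` scope values are assumed, and only the 24h and 10min defaults come from the description.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only
WORKLOAD_TTL = 86400     # 24h default, per the PR description
EXECUTION_TTL = 600      # 10min default, per the PR description


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(sub: str, scope: str, ttl: int, now: Optional[float] = None) -> str:
    """Build an HS256-signed JWT carrying a subject, a scope claim, and an expiry."""
    now = time.time() if now is None else now
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": sub, "scope": scope, "exp": int(now) + ttl}).encode())
    sig = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64(sig)}"


def decode_token(token: str, now: Optional[float] = None) -> dict:
    """Verify the signature and expiry, then return the claims."""
    now = time.time() if now is None else now
    header, payload, sig = token.split(".")
    expected = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64decode(sig)):
        raise ValueError("bad signature")
    claims = json.loads(_b64decode(payload))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

For a task that sits in the queue for an hour, a token minted with `EXECUTION_TTL` would already be rejected by `decode_token`, while one minted with `WORKLOAD_TTL` would still decode.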

Changes

  • Add JWTBearerWorkloadDep dependency for workload token validation on
    /run endpoint
  • Add TOKEN_SCOPE_WORKLOAD, SCOPE_MAPPING, and generate_workload_token()
    in tokens.py
  • Update /run endpoint to accept workload tokens and return execution
    tokens
  • Add jwt_workload_token_expiration_time config option (default: 86400s)
  • Implement scope validation using FastAPI's SecurityScopes mechanism
  • Reject workload-scoped tokens on all other endpoints (execution scope
    required)
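The scope rule in the last two bullets can be sketched in framework-free Python. This is only an illustration under stated assumptions: the PR uses FastAPI's SecurityScopes, whereas `required_scope`, `check_scope`, and `ScopeError` are hypothetical helpers, the scope string values are assumed, and treating a token with no scope claim as execution-scoped is also an assumption.

```python
TOKEN_SCOPE_EXECUTION = "execution"  # assumed value
TOKEN_SCOPE_WORKLOAD = "workload"    # name from the PR; value assumed


class ScopeError(Exception):
    """Raised when a token's scope does not match what the endpoint requires."""


def required_scope(path: str) -> str:
    """Assumption: only paths ending in /run accept workload tokens."""
    if path.rstrip("/").endswith("/run"):
        return TOKEN_SCOPE_WORKLOAD
    return TOKEN_SCOPE_EXECUTION


def check_scope(claims: dict, path: str) -> None:
    """Reject the request unless the token's scope matches the endpoint's."""
    needed = required_scope(path)
    # Assumption: a token with no scope claim counts as an execution token.
    actual = claims.get("scope", TOKEN_SCOPE_EXECUTION)
    if actual != needed:
        raise ScopeError(f"{path} requires scope {needed!r}, got {actual!r}")
```

With this rule, a workload-scoped token passes on a `/run` path but raises `ScopeError` on any other path, such as a connections lookup.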


@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:task-sdk labels Jan 4, 2026
@anishgirianish anishgirianish force-pushed the fix/token-expiration-worker branch from b183c74 to 9c31417 Compare January 4, 2026 21:05
@anishgirianish anishgirianish force-pushed the fix/token-expiration-worker branch 3 times, most recently from c707ddc to 4ef9dfe Compare January 4, 2026 22:45
@eladkal eladkal added this to the Airflow 3.1.6 milestone Jan 6, 2026
@tirkarthi
Contributor

As per my understanding this was removed in #55506 to use a middleware that refreshes token. Are you running an instance with execution api only separately with api-server? Could this middleware approach be extended for task-sdk calls too?

cc: @vincbeck @pierrejeambrun

@anishgirianish
Contributor Author

Hi @tirkarthi,
Thanks for pointing out the middleware approach from #55506 - that's helpful context.

I took a stab at extending that pattern in #60197, handling expired tokens transparently in JWTBearer + middleware so no client-side changes are needed. Would love your thoughts on it.

Totally happy to go with whichever approach the team feels is better!

cc: @vincbeck @pierrejeambrun

@vincbeck
Contributor

vincbeck commented Jan 7, 2026

> Hi @tirkarthi, Thanks for pointing out the middleware approach from #55506 - that's helpful context.
>
> I took a stab at extending that pattern in #60197, handling expired tokens transparently in JWTBearer + middleware so no client-side changes are needed. Would love your thoughts on it.
>
> Totally happy to go with whichever approach the team feels is better!
>
> cc: @vincbeck @pierrejeambrun

Would love to hear @ashb or @amoghrajesh 's opinion on this one

Member

@ashb ashb left a comment

We can't do this approach. It lets any Execution API token be resurrected which fundamentally breaks lots of security assumptions -- it amounts to having tokens not expire. That is bad.

Instead, what we should do is generate a new token (i.e. one with an extra/different set of JWT claims) that is only valid for the /run endpoint and valid for longer (say 24 hours, make it configurable), and this is what gets sent in the workload.

The run endpoint would then set a header giving the running task a "short lived" token (basically the one we have right now) that is usable on the rest of the Execution API. This approach is safer because the existing controls in the /run endpoint already prevent a task from being run more than once, which should also protect against "resurrecting" an expired token and using it to access things like connections etc. And we should validate that the token used on every endpoint except /run explicitly lacks this new claim.
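The exchange described here might be sketched roughly as follows. It is a simplified illustration, not Airflow's code: `TaskInstance`, `run_endpoint`, and `RunRejected` are hypothetical names, claims are passed as plain dicts rather than signed tokens, and the new claims are returned directly instead of being set in a response header.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    id: str
    state: str = "queued"


class RunRejected(Exception):
    """Raised when /run refuses to start the task instance."""


def run_endpoint(ti: TaskInstance, claims: dict) -> dict:
    """Exchange a workload token for execution-token claims, at most once."""
    # Only a workload-scoped token issued for this task instance is accepted.
    if claims.get("scope") != "workload" or claims.get("sub") != ti.id:
        raise RunRejected("token is not a workload token for this task instance")
    # The run-once guard: a task that already left "queued" cannot be started
    # again, so a replayed workload token yields no new execution token.
    if ti.state != "queued":
        raise RunRejected(f"task instance is {ti.state}, not queued")
    ti.state = "running"
    # Claims for the short-lived token (10 minutes assumed, as in the PR default).
    return {"sub": ti.id, "scope": "execution", "ttl_seconds": 600}
```

Because the state transition happens in the same step as the exchange, presenting the same workload token a second time raises `RunRejected` rather than minting another execution token.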

Member

@ashb ashb left a comment

Much better approach, and on the right track, thanks.

Some changes though:

  • "queue" is not the right thing to use, as these tokens could be used for executing other workloads soon (for instance we have already talked about wanting Dag level callbacks to be executed on the workers, not in the dag processor, which would be done by having a new type from the ExecuteTaskWorkload).

    so maybe we have "scope": "ExecuteTaskWorkload"?

  • A little bit of refactoring is needed before we are ready to merge this.

@ashb ashb self-requested a review January 9, 2026 12:09
@anishgirianish anishgirianish force-pushed the fix/token-expiration-worker branch from e7e3ae1 to e879863 Compare January 9, 2026 23:52
@anishgirianish anishgirianish changed the title Add token refresh mechanism for Execution API (#59553) Two-token mechanism for task execution to prevent token expiration while tasks wait in executor queues (#59553) Jan 10, 2026
@anishgirianish anishgirianish force-pushed the fix/token-expiration-worker branch from b511b8f to 57ac225 Compare January 10, 2026 07:07
from airflow.api_fastapi.execution_api.deps import _container

class InProcessContainer:
    """Minimal container for in-process execution, bypassing svcs lifecycle."""
Member

Why can't we use svcs? Why do we need to implement our version of it?

Contributor Author

@anishgirianish anishgirianish Jan 17, 2026

I'm still getting familiar with the codebase, but from what I understand, I added this to fix ServiceNotFoundError failures in CI. With InProcessExecutionAPI, the svcs lifespan runs later (when transport is accessed), but services like JWTGenerator are needed before that. This container bypasses the lifecycle and returns pre-created instances from app.state. I may well be missing something - if there's a cleaner pattern you'd recommend, I'd really appreciate the guidance.


JWTBearerWorkloadDep = Depends(JWTBearerWorkloadScope(path_param_name="task_instance_id"))

ti_run_router = VersionedAPIRouter(dependencies=[JWTBearerWorkloadDep])
Member

We shouldn't need this extra router when we already have router; we can declare the JWTBearerWorkloadDep on the path instead.

Member

@ashb ashb left a comment

I'm not sold yet on the way we are handling the dependencies/multiple routers etc. Let me have a look at what options we have in FastAPI, as I feel this should be possible in a nicer manner.

I'm seeing if we can make use of https://fastapi.tiangolo.com/advanced/security/oauth2-scopes/ or something like it.

@ashb ashb removed the area:API Airflow's REST/HTTP API label Jan 15, 2026
@anishgirianish anishgirianish force-pushed the fix/token-expiration-worker branch from 0e19b1e to a336f6e Compare January 17, 2026 08:59
@anishgirianish
Contributor Author

anishgirianish commented Jan 17, 2026

> I'm not sold yet on the way we are handling the dependencies/multiple routers etc. Let me have a look at what options we have in FastAPI, as I feel this should be possible in a nicer manner.
>
> I'm seeing if we can make use of https://fastapi.tiangolo.com/advanced/security/oauth2-scopes/ or something like it.

@ashb That sounds like a great approach! I'd be really grateful if you could share your findings on FastAPI OAuth2 scopes - happy to refactor based on your recommendations. Thank you so much for taking the time to look into this!

@anishgirianish anishgirianish force-pushed the fix/token-expiration-worker branch from a336f6e to 5480717 Compare January 23, 2026 01:06
@anishgirianish anishgirianish force-pushed the fix/token-expiration-worker branch from 5d8bd80 to 7ac2da1 Compare January 23, 2026 03:25
@anishgirianish
Contributor Author

Hi @ashb, thank you for the feedback on the earlier iteration.
As you suggested, I implemented scope validation using FastAPI's SecurityScopes mechanism. The approach uses a single router with JWTBearerBaseDep at the router level for basic auth, and scope checks are done at the endpoint/module level via JWTBearerDep (execution scope) and JWTBearerWorkloadDep (workload scope for /run only).

Would really appreciate another look when you get a chance. Happy to make any further adjustments based on your feedback. Thanks!

Development

Successfully merging this pull request may close these issues.

ExecuteTask activity token can expire before the task starts running
AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT configuration ignored

6 participants