Skip to content

[airflow] Implement airflow-task-implicit-multiple-outputs (AIR202)#25152

Open
Dev-iL wants to merge 5 commits into
astral-sh:mainfrom
Dev-iL:2605/AIR/multiple_outputs/with_skill
Open

[airflow] Implement airflow-task-implicit-multiple-outputs (AIR202)#25152
Dev-iL wants to merge 5 commits into
astral-sh:mainfrom
Dev-iL:2605/AIR/multiple_outputs/with_skill

Conversation

@Dev-iL
Copy link
Copy Markdown
Contributor

@Dev-iL Dev-iL commented May 14, 2026

Important

Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting.

Summary

Implements rule AIR202 (airflow-task-implicit-multiple-outputs) that flags @task-decorated functions whose multiple_outputs behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator.

At runtime, Airflow's _infer_multiple_outputs inspects the function's return annotation: if it resolves to a subclass of collections.abc.Mapping, the return value is split into one XCom per key; otherwise the entire return value is stored as a single XCom. This couples typing to XCom layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's XCom behavior. Passing multiple_outputs= explicitly makes the author's intent clear and insulates the DAG from future changes to inference.

What the rule flags

from airflow.sdk import task  # or equivalent Airflow 2.x import path


# AIR202 — annotation in the Mapping family
@task
def extract() -> dict:
    return {"x": 1, "y": 2}


# AIR202 — body returns a dict literal
@task()
def transform():
    return {"a": 1}


# AIR202 — call-form decorator with unrelated kwargs
@task(retries=3)
def with_retries() -> dict[str, int]:
    return {"n": 1}


# AIR202 — variant decorators with parens
@task.short_circuit(trigger_rule="all_done")
def gate() -> dict:
    return {"go": True}

Suggested fix

The fix is always available but marked unsafe: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different XCom layout and a multi-return function may not always return a dict.

@task(multiple_outputs=True)
def extract() -> dict:
    return {"x": 1, "y": 2}


@task(multiple_outputs=True)
def transform():
    return {"a": 1}


@task(retries=3, multiple_outputs=True)
def with_retries() -> dict[str, int]:
    return {"n": 1}

For the call-form path (@task(...) with existing parentheses), the fix uses add_argument so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (@task, @task.branch), the fix appends (multiple_outputs=...) after the decorator expression.

Detection signals

The rule fires when either of two independent signals holds and multiple_outputs is absent from the decorator:

  1. Annotation signal — the return annotation resolves to a member of the Mapping family: dict, Dict, Mapping, MutableMapping, OrderedDict, DefaultDict, Counter, ChainMap, TypedDict (and the collections / typing variants thereof).
  2. Body signal — the function body contains at least one return statement whose value is a dict literal, a dict comprehension, or a dict(...) call. ReturnStatementVisitor is used so returns inside nested if / for / while / try blocks are considered, while returns inside nested function definitions or class bodies are correctly not considered.

What it does NOT flag

  • Functions where multiple_outputs= is already set explicitly (either True or False).
  • @task.sensor-decorated functions — the sensor decorator hardcodes multiple_outputs=False.
  • Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected).
  • Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value.

Implementation details

  • Resolves @task / @task.<variant> via the semantic model, handling both bare (@task) and call (@task(...)) forms via map_callable. The supported variants list is local to the rule: python, virtualenv, external_python, branch, branch_virtualenv, branch_external_python, short_circuit, docker, kubernetes, pyspark.
  • Annotation matching walks subscripts (e.g., dict[str, int], Mapping[str, Any]) down to the head and uses match_typing_expr / known-module resolution.
  • Body matching uses ReturnStatementVisitor from ruff_python_ast::helpers, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal.
  • The inserted value (True / False) mirrors Airflow's inference: True when the annotation is in the Mapping family, False otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is list), the inserted value is still False to preserve runtime behavior.

Test Plan

Added snapshot tests in AIR202.py covering:

Group Cases
Annotation positives dict, dict[str, int], Dict, Mapping, MutableMapping, OrderedDict, DefaultDict, Counter, ChainMap, TypedDict
Body positives dict literal, dict comprehension, dict(...) call
Decorator forms bare @task, @task(), @task(retries=3), multiline @task(...), @task.docker(image=...), @task.virtualenv(), @task.short_circuit(trigger_rule=...), @task.branch
Explicit-kwarg negatives multiple_outputs=True, multiple_outputs=False — never flagged
Sensor negative @task.sensor returning a dict — never flagged
Nested-scope negatives outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged

A sweep against the local Apache Airflow checkout (cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit multiple_outputs inference (no false positives).

Related:

Community approval: https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm

@astral-sh-bot astral-sh-bot Bot requested a review from ntBre May 14, 2026 09:51
@Dev-iL
Copy link
Copy Markdown
Contributor Author

Dev-iL commented May 14, 2026

CC: @MichaReiser @Lee-W @sjyangkevin

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 14, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+19 -0 violations, +0 -0 fixes in 1 projects; 55 projects unchanged)

apache/airflow (+19 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select ALL

+ airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_instances.py:346:17: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ airflow-core/tests/unit/models/test_dagrun.py:2623:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/amazon/tests/system/amazon/aws/example_sagemaker_condition.py:104:5: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/amazon/tests/system/amazon/aws/example_sagemaker_condition.py:63:5: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/amazon/tests/system/amazon/aws/example_ssm.py:123:1: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/cncf/kubernetes/tests/unit/cncf/kubernetes/decorators/test_kubernetes.py:146:13: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/cncf/kubernetes/tests/unit/cncf/kubernetes/decorators/test_kubernetes.py:82:13: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_iam.py:349:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_iam.py:364:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_ssl.py:356:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_ssl.py:371:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/datafusion/example_datafusion.py:231:5: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:142:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:149:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:161:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:195:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ task-sdk/tests/task_sdk/definitions/decorators/test_task_group.py:166:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ task-sdk/tests/task_sdk/definitions/decorators/test_task_group.py:225:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ task-sdk/tests/task_sdk/execution_time/test_task_runner.py:2931:13: AIR202 `@task`-decorated function relies on `multiple_outputs` inference

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
AIR202 19 19 0 0 0

@ntBre ntBre added rule Implementing or modifying a lint rule preview Related to preview mode features labels May 14, 2026
Copy link
Copy Markdown
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks reasonable to me if @Lee-W is happy with it. I just had a few minor comments.

Comment on lines +120 to +122
let inferred = annotation_is_mapping;
let mut diagnostic = checker.report_diagnostic(
AirflowTaskImplicitMultipleOutputs { inferred },
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inferred doesn't really make it obvious to me what this field means. I might change the struct definition to use annotation_is_mapping too and then do something like this:

Suggested change
let inferred = annotation_is_mapping;
let mut diagnostic = checker.report_diagnostic(
AirflowTaskImplicitMultipleOutputs { inferred },
let mut diagnostic = checker.report_diagnostic(
AirflowTaskImplicitMultipleOutputs { annotation_is_mapping },

Another good name might be has_mapping_annotation, but this is a pretty small nit either way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the struct field and the local to annotation_is_mapping throughout

// `@task.<variant>` or `@task.<variant>()`.
if let Expr::Attribute(ExprAttribute { value, attr, .. }) = expr {
let variant = attr.as_str();
if !SUPPORTED_VARIANTS.contains(&variant) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another small nit, but if this is the only use of SUPPORTED_VARIANTS, I'd probably just inline it as a matches! here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined as a matches! in is_supported_task_decorator, kept the task.sensor exclusion note as an inline comment.

Comment on lines +118 to +135
AIR202 [*] `@task`-decorated function relies on `multiple_outputs` inference
--> AIR202.py:49:1
|
49 | @task
| ^^^^^
50 | def annotated_typed_dict() -> MyTD: # AIR202
51 | return {"x": 1}
|
help: Add `multiple_outputs=False`
46 | x: int
47 |
48 |
- @task
49 + @task(multiple_outputs=False)
50 | def annotated_typed_dict() -> MyTD: # AIR202
51 | return {"x": 1}
52 |
note: This is an unsafe fix and may change runtime behavior
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the expected behavior here? It might be tricky to get this right, but I assumed that MyTD should ideally be recognized as a Mapping subclass and get a multiple_outputs=True suggestion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I see is in order to get this perfectly right we need to venture into type checker territory, don't we? The practical tradeoff was to detect only "obvious" mapping types.

How do you suggest to approach this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since TypedDict specifically has dedicated support in ruff (analyze::class::any_qualified_base_class + match_typing_qualified_name) - I've added it. So now, when the annotation is a Name referencing a locally-defined class whose base chain includes typing.TypedDict, we treat it as Mapping and fix to multiple_outputs=True. The fixture's expected fix flipped to True.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is the ideal middle ground. It's somewhat unlikely that a customized class would be used here.

Comment on lines +114 to +116
let body_returns_dict = body_has_dict_return(function_def, semantic);

if !annotation_is_mapping && !body_returns_dict {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do something like this?

Suggested change
let body_returns_dict = body_has_dict_return(function_def, semantic);
if !annotation_is_mapping && !body_returns_dict {
if !annotation_is_mapping && !body_has_dict_return(function_def, semantic) {

That way we can avoid the more expensive function body traversal if the first check short-circuits.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short-circuiting added


/// Returns `true` if any return statement in the function body returns an
/// inline dict literal, a dict comprehension, or a `dict(...)` call.
fn body_has_dict_return(function_def: &StmtFunctionDef, semantic: &SemanticModel) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also check for other Mapping subclasses, or is this part of the check intentionally more conservative?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as #25152 (comment)

I'm worried that checking for more types would make the rule much more complex.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AI's explanation of this:

Detecting that an arbitrary expression returns a Mapping subclass requires type inference (the expression could be a variable, a call result, a comprehension projecting into a custom class, etc.). The annotation-path check, by contrast, now does handle TypedDict subclasses. The body-path check stays focused on the three syntactic patterns that are unambiguously dicts at the AST level: dict literals, dict comprehensions, and dict(...) calls. Widening it would require either trusting variable type inference or accepting a high false-positive rate.

Dev-iL and others added 5 commits May 15, 2026 14:01
Flags `@task`-decorated functions whose `multiple_outputs` behavior is
determined by Airflow's runtime annotation-based inference rather than
being set explicitly. Triggers when the decorator omits
`multiple_outputs=` and either the return annotation resolves to a
`collections.abc.Mapping` subclass or the body returns a dict literal,
dict comprehension, or `dict(...)` call. Provides an unsafe autofix that
inserts `multiple_outputs=True` (annotation path) or `=False` (body-only
path), mirroring Airflow's `_infer_multiple_outputs` logic.

Covers `@task` and supported variants (`python`, `virtualenv`,
`external_python`, `branch`, `short_circuit`, `docker`, `kubernetes`,
`pyspark`, …). Excludes `@task.sensor`, which hardcodes
`multiple_outputs=False`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two negative test cases that confirm dict returns inside nested
function definitions and inner class methods are not attributed to the
outer `@task`-decorated function, locking the `ReturnStatementVisitor`
nesting behavior against future regressions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds three new positive cases:
- @task(retries=3): call form with an existing kwarg; fix inserts
  multiple_outputs=True alongside it via add_argument
- multiline @task(\n  retries=3,\n): fix preserves indentation and
  inserts kwarg on the existing kwarg line
- @task.short_circuit(trigger_rule="all_done"): variant decorator with
  call form; fix inserts multiple_outputs=True first

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renames struct, file, and all wiring from
AirflowTaskMultipleOutputsImplicit / task_multiple_outputs_implicit to
AirflowTaskImplicitMultipleOutputs / task_implicit_multiple_outputs.
"implicit X" puts the qualifier next to the noun it qualifies, matching
conventional English naming.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename violation field `inferred` → `annotation_is_mapping` for clarity
- Inline `SUPPORTED_VARIANTS` allowlist as a `matches!` arm at its sole
  use site
- Short-circuit the function-body traversal when the annotation already
  proves the return type is a `Mapping` subclass
- Detect locally-defined `TypedDict` subclasses (e.g. `class MyTD(TypedDict)`)
  via `analyze::class::any_qualified_base_class`, so `-> MyTD` annotations
  now correctly fix to `multiple_outputs=True`

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Dev-iL Dev-iL force-pushed the 2605/AIR/multiple_outputs/with_skill branch from 395d15f to 0ab23ac Compare May 15, 2026 11:01
@Lee-W
Copy link
Copy Markdown
Contributor

Lee-W commented May 18, 2026

Hey, a bit of a lack of bandwidth these days. The idea definitely looks good to me @Dev-iL, and I discussed it already. let me take a look at the detail as well

Copy link
Copy Markdown
Contributor

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One nitpick question

Comment thread crates/ruff_linter/resources/test/fixtures/airflow/AIR202.py
Comment on lines +180 to +185
@task
def nested_function_returning_dict(): # should NOT flag: dict returned by inner fn, not outer
def inner():
return {"x": 1}

return inner()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@task
def nested_function_returning_dict():
    def inner() -> dict:
        return {"x": 1}

    return inner()

What will happen in Airflow if we do this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just verified. it's pretty much the same as

@task
def nested_function_returning_dict():
return {"x": 1}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case was not included since it opens a Pandora's box of recursion. I will consult my design docs in a few h and see what the exact reason behind this decision was.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it has a good reason. And I feel we should not handle it either. but yep, exploring it is never a bad thing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just verified. it's pretty much the same as

@task def nested_function_returning_dict(): return {"x": 1}

Right, I think we don't flag this because we can't tell what the user had in mind. Currently, Airflow doesn't care about the annotation of the inner function. If it did, the inferred value of multiple_outputs would be True. But is that what the user wants? Since the current implementation matches the behavior of Airflow, there's no inference surprise to warn the user about.

I suggest we can recommend users to enable ANN201 on their dags folder, and then this rule will be easily applicable.


Reason 1 - we cannot trust the annotation of the the inner functions

The example contained in the code is likely oversimplified. It is more obvious when we don't return the output of inner directly:

@task
def a():
    def inner() -> dict: ...
    return {**inner(), "extra": 1}  # merged dict: still a Mapping, but no annotation on the merge

@task
def b():
    def left() -> dict: ...
    def right() -> list: ...
    return left() if cond else right()  # conditional: one branch dict, one not

@task
def c():
    def inner() -> dict: ...
    result = inner()
    result["computed"] = expensive()
    return result  # local mutation through a variable

@task
def d():
    from .helpers import build_payload  # cross-module: annotation lives elsewhere
    return build_payload()

Handling the above properly is a type-checker's job, not a lint heuristic.

Reason 2 - ReturnStatementVisitor (intentionally) does not descend into nested scopes.

The body-path heuristic walks only the outer function's own return statements; return inner() is an Expr::Call, not a dict literal / dict comprehension / dict(...). There's nothing at the AST level that says the outer task returns a Mapping.


Bottom line - if we want to cover this case in the future, the right place is a type-checker-backed variant (e.g. running under ty).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really want to cover it... but yep, if we were to do that, ty would be a better chioce

Comment on lines +118 to +135
AIR202 [*] `@task`-decorated function relies on `multiple_outputs` inference
--> AIR202.py:49:1
|
49 | @task
| ^^^^^
50 | def annotated_typed_dict() -> MyTD: # AIR202
51 | return {"x": 1}
|
help: Add `multiple_outputs=False`
46 | x: int
47 |
48 |
- @task
49 + @task(multiple_outputs=False)
50 | def annotated_typed_dict() -> MyTD: # AIR202
51 | return {"x": 1}
52 |
note: This is an unsafe fix and may change runtime behavior
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is the ideal middle ground. It's somewhat unlikely that a customized class would be used here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preview Related to preview mode features rule Implementing or modifying a lint rule

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants