[`airflow`] Implement airflow-task-implicit-multiple-outputs (AIR202) by Dev-iL · Pull Request #25152 · astral-sh/ruff

Dev-iL · 2026-05-14T09:51:24Z

Important

Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting.

Summary

Implements rule AIR202 (airflow-task-implicit-multiple-outputs) that flags @task-decorated functions whose multiple_outputs behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator.

At runtime, Airflow's _infer_multiple_outputs inspects the function's return annotation: if it resolves to a subclass of collections.abc.Mapping, the return value is split into one XCom per key; otherwise the entire return value is stored as a single XCom. This couples typing to XCom layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's XCom behavior. Passing multiple_outputs= explicitly makes the author's intent clear and insulates the DAG from future changes to inference.

What the rule flags

from airflow.sdk import task  # or equivalent Airflow 2.x import path


# AIR202 — annotation in the Mapping family
@task
def extract() -> dict:
    return {"x": 1, "y": 2}


# AIR202 — body returns a dict literal
@task()
def transform():
    return {"a": 1}


# AIR202 — call-form decorator with unrelated kwargs
@task(retries=3)
def with_retries() -> dict[str, int]:
    return {"n": 1}


# AIR202 — variant decorators with parens
@task.short_circuit(trigger_rule="all_done")
def gate() -> dict:
    return {"go": True}

Suggested fix

The fix is always available but marked unsafe: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different XCom layout and a multi-return function may not always return a dict.

@task(multiple_outputs=True)
def extract() -> dict:
    return {"x": 1, "y": 2}


@task(multiple_outputs=True)
def transform():
    return {"a": 1}


@task(retries=3, multiple_outputs=True)
def with_retries() -> dict[str, int]:
    return {"n": 1}

For the call-form path (@task(...) with existing parentheses), the fix uses add_argument so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (@task, @task.branch), the fix appends (multiple_outputs=...) after the decorator expression.

Detection signals

The rule fires when either of two independent signals holds and multiple_outputs is absent from the decorator:

Annotation signal — the return annotation resolves to a member of the Mapping family: dict, Dict, Mapping, MutableMapping, OrderedDict, DefaultDict, Counter, ChainMap, TypedDict (and the collections / typing variants thereof).
Body signal — the function body contains at least one return statement whose value is a dict literal, a dict comprehension, or a dict(...) call. ReturnStatementVisitor is used so returns inside nested if / for / while / try blocks are considered, while returns inside nested function definitions or class bodies are correctly not considered.

What it does NOT flag

Functions where multiple_outputs= is already set explicitly (either True or False).
@task.sensor-decorated functions — the sensor decorator hardcodes multiple_outputs=False.
Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected).
Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value.

Implementation details

Resolves @task / @task.<variant> via the semantic model, handling both bare (@task) and call (@task(...)) forms via map_callable. The supported variants list is local to the rule: python, virtualenv, external_python, branch, branch_virtualenv, branch_external_python, short_circuit, docker, kubernetes, pyspark.
Annotation matching walks subscripts (e.g., dict[str, int], Mapping[str, Any]) down to the head and uses match_typing_expr / known-module resolution.
Body matching uses ReturnStatementVisitor from ruff_python_ast::helpers, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal.
The inserted value (True / False) mirrors Airflow's inference: True when the annotation is in the Mapping family, False otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is list), the inserted value is still False to preserve runtime behavior.

Test Plan

Added snapshot tests in AIR202.py covering:

Group	Cases
Annotation positives	`dict`, `dict[str, int]`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict`
Body positives	dict literal, dict comprehension, `dict(...)` call
Decorator forms	bare `@task`, `@task()`, `@task(retries=3)`, multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`, `@task.short_circuit(trigger_rule=...)`, `@task.branch`
Explicit-kwarg negatives	`multiple_outputs=True`, `multiple_outputs=False` — never flagged
Sensor negative	`@task.sensor` returning a dict — never flagged
Nested-scope negatives	outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged

A sweep against the local Apache Airflow checkout (cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit multiple_outputs inference (no false positives).

Dev-iL · 2026-05-14T09:52:01Z

CC: @MichaReiser @Lee-W @sjyangkevin

astral-sh-bot · 2026-05-14T09:59:14Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+19 -0 violations, +0 -0 fixes in 1 projects; 55 projects unchanged)

apache/airflow (+19 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select ALL

+ airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_instances.py:346:17: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ airflow-core/tests/unit/models/test_dagrun.py:2623:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/amazon/tests/system/amazon/aws/example_sagemaker_condition.py:104:5: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/amazon/tests/system/amazon/aws/example_sagemaker_condition.py:63:5: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/amazon/tests/system/amazon/aws/example_ssm.py:123:1: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/cncf/kubernetes/tests/unit/cncf/kubernetes/decorators/test_kubernetes.py:146:13: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/cncf/kubernetes/tests/unit/cncf/kubernetes/decorators/test_kubernetes.py:82:13: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_iam.py:349:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_iam.py:364:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_ssl.py:356:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/cloud_sql/example_cloud_sql_query_ssl.py:371:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/google/tests/system/google/cloud/datafusion/example_datafusion.py:231:5: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:142:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:149:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:161:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ providers/standard/tests/unit/standard/decorators/test_python.py:195:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ task-sdk/tests/task_sdk/definitions/decorators/test_task_group.py:166:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ task-sdk/tests/task_sdk/definitions/decorators/test_task_group.py:225:9: AIR202 `@task`-decorated function relies on `multiple_outputs` inference
+ task-sdk/tests/task_sdk/execution_time/test_task_runner.py:2931:13: AIR202 `@task`-decorated function relies on `multiple_outputs` inference

Changes by rule (1 rules affected)

code	total	+ violation	- violation	+ fix	- fix
AIR202	19	19	0	0	0

ntBre

Thanks, this looks reasonable to me if @Lee-W is happy with it. I just had a few minor comments.

ntBre · 2026-05-14T20:26:09Z

+    let inferred = annotation_is_mapping;
+    let mut diagnostic = checker.report_diagnostic(
+        AirflowTaskImplicitMultipleOutputs { inferred },


inferred doesn't really make it obvious to me what this field means. I might change the struct definition to use annotation_is_mapping too and then do something like this:

Suggested change

let inferred = annotation_is_mapping;

let mut diagnostic = checker.report_diagnostic(

AirflowTaskImplicitMultipleOutputs { inferred },

let mut diagnostic = checker.report_diagnostic(

AirflowTaskImplicitMultipleOutputs { annotation_is_mapping },

Another good name might be has_mapping_annotation, but this is a pretty small nit either way.

Renamed the struct field and the local to annotation_is_mapping throughout

ntBre · 2026-05-14T20:38:33Z

+    // `@task.<variant>` or `@task.<variant>()`.
+    if let Expr::Attribute(ExprAttribute { value, attr, .. }) = expr {
+        let variant = attr.as_str();
+        if !SUPPORTED_VARIANTS.contains(&variant) {


Another small nit, but if this is the only use of SUPPORTED_VARIANTS, I'd probably just inline it as a matches! here.

Inlined as a matches! in is_supported_task_decorator, kept the task.sensor exclusion note as an inline comment.

ntBre · 2026-05-14T20:47:35Z

+AIR202 [*] `@task`-decorated function relies on `multiple_outputs` inference
+  --> AIR202.py:49:1
+   |
+49 | @task
+   | ^^^^^
+50 | def annotated_typed_dict() -> MyTD:  # AIR202
+51 |     return {"x": 1}
+   |
+help: Add `multiple_outputs=False`
+46 |     x: int
+47 |
+48 |
+   - @task
+49 + @task(multiple_outputs=False)
+50 | def annotated_typed_dict() -> MyTD:  # AIR202
+51 |     return {"x": 1}
+52 |
+note: This is an unsafe fix and may change runtime behavior


Is this the expected behavior here? It might be tricky to get this right, but I assumed that MyTD should ideally be recognized as a Mapping subclass and get a multiple_outputs=True suggestion.

The problem I see is in order to get this perfectly right we need to venture into type checker territory, don't we? The practical tradeoff was to detect only "obvious" mapping types.

How do you suggest to approach this?

Since TypedDict specifically has dedicated support in ruff (analyze::class::any_qualified_base_class + match_typing_qualified_name) - I've added it. So now, when the annotation is a Name referencing a locally-defined class whose base chain includes typing.TypedDict, we treat it as Mapping and fix to multiple_outputs=True. The fixture's expected fix flipped to True.

I feel this is the ideal middle ground. It's somewhat unlikely that a customized class would be used here.

ntBre · 2026-05-14T20:51:43Z

+    let body_returns_dict = body_has_dict_return(function_def, semantic);
+
+    if !annotation_is_mapping && !body_returns_dict {


Could we do something like this?

Suggested change

let body_returns_dict = body_has_dict_return(function_def, semantic);

if !annotation_is_mapping && !body_returns_dict {

if !annotation_is_mapping && !body_has_dict_return(function_def, semantic) {

That way we can avoid the more expensive function body traversal if the first check short-circuits.

Short-circuiting added

ntBre · 2026-05-14T20:52:36Z

+
+/// Returns `true` if any return statement in the function body returns an
+/// inline dict literal, a dict comprehension, or a `dict(...)` call.
+fn body_has_dict_return(function_def: &StmtFunctionDef, semantic: &SemanticModel) -> bool {


Should this also check for other Mapping subclasses, or is this part of the check intentionally more conservative?

Same reasoning as #25152 (comment)

I'm worried that checking for more types would make the rule much more complex.

The AI's explanation of this:

Detecting that an arbitrary expression returns a Mapping subclass requires type inference (the expression could be a variable, a call result, a comprehension projecting into a custom class, etc.). The annotation-path check, by contrast, now does handle TypedDict subclasses. The body-path check stays focused on the three syntactic patterns that are unambiguously dicts at the AST level: dict literals, dict comprehensions, and dict(...) calls. Widening it would require either trusting variable type inference or accepting a high false-positive rate.

Flags `@task`-decorated functions whose `multiple_outputs` behavior is determined by Airflow's runtime annotation-based inference rather than being set explicitly. Triggers when the decorator omits `multiple_outputs=` and either the return annotation resolves to a `collections.abc.Mapping` subclass or the body returns a dict literal, dict comprehension, or `dict(...)` call. Provides an unsafe autofix that inserts `multiple_outputs=True` (annotation path) or `=False` (body-only path), mirroring Airflow's `_infer_multiple_outputs` logic. Covers `@task` and supported variants (`python`, `virtualenv`, `external_python`, `branch`, `short_circuit`, `docker`, `kubernetes`, `pyspark`, …). Excludes `@task.sensor`, which hardcodes `multiple_outputs=False`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds two negative test cases that confirm dict returns inside nested function definitions and inner class methods are not attributed to the outer `@task`-decorated function, locking the `ReturnStatementVisitor` nesting behavior against future regressions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@task

Adds three new positive cases: - @task(retries=3): call form with an existing kwarg; fix inserts multiple_outputs=True alongside it via add_argument - multiline @task(\n retries=3,\n): fix preserves indentation and inserts kwarg on the existing kwarg line - @task.short_circuit(trigger_rule="all_done"): variant decorator with call form; fix inserts multiple_outputs=True first Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Renames struct, file, and all wiring from AirflowTaskMultipleOutputsImplicit / task_multiple_outputs_implicit to AirflowTaskImplicitMultipleOutputs / task_implicit_multiple_outputs. "implicit X" puts the qualifier next to the noun it qualifies, matching conventional English naming. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Rename violation field `inferred` → `annotation_is_mapping` for clarity - Inline `SUPPORTED_VARIANTS` allowlist as a `matches!` arm at its sole use site - Short-circuit the function-body traversal when the annotation already proves the return type is a `Mapping` subclass - Detect locally-defined `TypedDict` subclasses (e.g. `class MyTD(TypedDict)`) via `analyze::class::any_qualified_base_class`, so `-> MyTD` annotations now correctly fix to `multiple_outputs=True` Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Lee-W · 2026-05-18T02:20:39Z

Hey, a bit of a lack of bandwidth these days. The idea definitely looks good to me @Dev-iL, and I discussed it already. let me take a look at the detail as well

Lee-W

LGTM. One nitpick question

Lee-W · 2026-05-18T02:33:45Z

+@task
+def nested_function_returning_dict():  # should NOT flag: dict returned by inner fn, not outer
+    def inner():
+        return {"x": 1}
+
+    return inner()


@task def nested_function_returning_dict(): def inner() -> dict: return {"x": 1} return inner()

What will happen in Airflow if we do this?

just verified. it's pretty much the same as

@task
def nested_function_returning_dict():
return {"x": 1}

I think this case was not included since it opens a Pandora's box of recursion. I will consult my design docs in a few h and see what the exact reason behind this decision was.

I doubt it has a good reason. And I feel we should not handle it either. but yep, exploring it is never a bad thing

just verified. it's pretty much the same as

@task def nested_function_returning_dict(): return {"x": 1}

Right, I think we don't flag this because we can't tell what the user had in mind. Currently, Airflow doesn't care about the annotation of the inner function. If it did, the inferred value of multiple_outputs would be True. But is that what the user wants? Since the current implementation matches the behavior of Airflow, there's no inference surprise to warn the user about.

I suggest we can recommend users to enable ANN201 on their dags folder, and then this rule will be easily applicable.

Reason 1 - we cannot trust the annotation of the the inner functions

The example contained in the code is likely oversimplified. It is more obvious when we don't return the output of inner directly:

@task def a(): def inner() -> dict: ... return {**inner(), "extra": 1} # merged dict: still a Mapping, but no annotation on the merge @task def b(): def left() -> dict: ... def right() -> list: ... return left() if cond else right() # conditional: one branch dict, one not @task def c(): def inner() -> dict: ... result = inner() result["computed"] = expensive() return result # local mutation through a variable @task def d(): from .helpers import build_payload # cross-module: annotation lives elsewhere return build_payload()

Handling the above properly is a type-checker's job, not a lint heuristic.

Reason 2 - ReturnStatementVisitor (intentionally) does not descend into nested scopes.

The body-path heuristic walks only the outer function's own return statements; return inner() is an Expr::Call, not a dict literal / dict comprehension / dict(...). There's nothing at the AST level that says the outer task returns a Mapping.

Bottom line - if we want to cover this case in the future, the right place is a type-checker-backed variant (e.g. running under ty).

I don't really want to cover it... but yep, if we were to do that, ty would be a better chioce

Lee-W · 2026-05-18T02:59:34Z

+AIR202 [*] `@task`-decorated function relies on `multiple_outputs` inference
+  --> AIR202.py:49:1
+   |
+49 | @task
+   | ^^^^^
+50 | def annotated_typed_dict() -> MyTD:  # AIR202
+51 |     return {"x": 1}
+   |
+help: Add `multiple_outputs=False`
+46 |     x: int
+47 |
+48 |
+   - @task
+49 + @task(multiple_outputs=False)
+50 | def annotated_typed_dict() -> MyTD:  # AIR202
+51 |     return {"x": 1}
+52 |
+note: This is an unsafe fix and may change runtime behavior


I feel this is the ideal middle ground. It's somewhat unlikely that a customized class would be used here.

astral-sh-bot Bot assigned ntBre May 14, 2026

astral-sh-bot Bot requested a review from ntBre May 14, 2026 09:51

ntBre added rule Implementing or modifying a lint rule preview Related to preview mode features labels May 14, 2026

ntBre reviewed May 14, 2026

View reviewed changes

Dev-iL and others added 5 commits May 15, 2026 14:01

Dev-iL force-pushed the 2605/AIR/multiple_outputs/with_skill branch from 395d15f to 0ab23ac Compare May 15, 2026 11:01

Lee-W reviewed May 18, 2026

View reviewed changes

		let body_returns_dict = body_has_dict_return(function_def, semantic);

		if !annotation_is_mapping && !body_returns_dict {

Conversation

Dev-iL commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What the rule flags

Suggested fix

Detection signals

What it does NOT flag

Implementation details

Test Plan

Uh oh!

Dev-iL commented May 14, 2026

Uh oh!

astral-sh-bot Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Uh oh!

ntBre left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Lee-W commented May 18, 2026

Uh oh!

Lee-W left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Reason 1 - we cannot trust the annotation of the the inner functions

Reason 2 - ReturnStatementVisitor (intentionally) does not descend into nested scopes.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Dev-iL commented May 14, 2026 •

edited

Loading

astral-sh-bot Bot commented May 14, 2026 •

edited

Loading

`ruff-ecosystem` results

Reason 2 - `ReturnStatementVisitor` (intentionally) does not descend into nested scopes.