Implement execution_timeout semantics for DbtCloudRunJobOperator in deferrable mode #61472

Open
SameerMesiah97 wants to merge 1 commit into apache:main from SameerMesiah97:61467-DBTCloudRunJobOperator-Deferrable-Timeout

@SameerMesiah97
Contributor

Description

This change implements execution_timeout semantics for DbtCloudRunJobOperator when running in deferrable mode.

Previously, once the operator deferred, execution_timeout was not enforced: the task process was no longer running, so the scheduler could not terminate it via SIGTERM. As a result, dbt Cloud jobs could continue running after the Airflow task exceeded its execution timeout.

This update restores parity with non-deferrable execution by explicitly enforcing execution timeouts through the trigger/operator interaction. The operator now computes an absolute execution deadline derived from execution_timeout before deferring, the trigger emits a timeout event when that deadline is exceeded, and the operator cancels the dbt Cloud job and fails the task when the event is received.
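
The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `compute_execution_deadline`, `poll_until_done`, `first_event`, the status strings, and the event shape are all hypothetical, and a plain callable stands in for the dbt Cloud API.

```python
import asyncio
import time
from datetime import timedelta
from typing import AsyncIterator, Callable, Optional


def compute_execution_deadline(
    execution_timeout: Optional[timedelta], now: float
) -> Optional[float]:
    """Turn a relative execution_timeout into an absolute epoch deadline.

    Returns None when no execution_timeout is set (no limit).
    """
    if execution_timeout is None:
        return None
    return now + execution_timeout.total_seconds()


async def poll_until_done(
    get_status: Callable[[], str],
    execution_deadline: Optional[float],
    poll_interval: float = 0.01,
) -> AsyncIterator[dict]:
    """Hypothetical stand-in for a trigger's polling loop."""
    while True:
        status = get_status()
        if status in ("success", "error", "cancelled"):
            # The job reached a terminal state: report it as-is.
            yield {"status": status}
            return
        if execution_deadline is not None and time.time() >= execution_deadline:
            # The job is still running past the deadline: emit a timeout event
            # so the operator can cancel the job and fail the task.
            yield {"status": "timeout"}
            return
        await asyncio.sleep(poll_interval)


async def first_event(events: AsyncIterator[dict]) -> dict:
    """Return the first event produced by the polling loop."""
    async for event in events:
        return event
    raise RuntimeError("polling loop ended without an event")
```

The deadline is computed once, before deferring, so that it stays absolute: a trigger that is restarted or rescheduled compares against the same point in time instead of restarting the countdown.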

Rationale

In non-deferrable mode, execution_timeout is enforced by the scheduler, which terminates the task process and invokes on_kill() to cancel the external dbt Cloud job.

In deferrable mode, execution is handed off to a trigger running in the triggerer process, which does not have an associated worker process that can be terminated. However, the absence of a worker process does not remove the requirement to honor task-level execution semantics. From a user perspective, execution_timeout represents a hard task-level limit that should behave consistently regardless of whether execution is deferrable or not.

Without explicit handling in the trigger/operator interaction, execution_timeout silently stops working in deferrable mode, leading to leaked dbt Cloud jobs and inconsistent behavior between deferrable and non-deferrable execution. Deferrable execution should adapt how timeouts are enforced, but not whether they are enforced.

This change ensures execution_timeout semantics are preserved in deferrable mode and remain consistent with non-deferrable execution.

Notes

  • The existing timeout parameter continues to limit only how long the operator waits for job completion and does not imply cancellation.
  • When both execution_timeout and timeout are set, the earlier deadline takes precedence.
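
The precedence rule in the notes above can be illustrated with a small sketch. `next_limit` and the labels it returns are hypothetical names, not part of the provider; the tie-breaking behaviour (the wait `timeout` wins when both deadlines are equal) is an assumption made only for this example.

```python
from typing import List, Optional, Tuple


def next_limit(
    start: float,
    timeout_s: Optional[float],
    execution_timeout_s: Optional[float],
) -> Optional[Tuple[float, str]]:
    """Return (absolute deadline, which limit it came from), or None if unlimited."""
    limits: List[Tuple[float, str]] = []
    if timeout_s is not None:
        # Operator-level wait limit: stop waiting, no cancellation implied.
        limits.append((start + timeout_s, "timeout"))
    if execution_timeout_s is not None:
        # Task-level limit: the dbt Cloud job is cancelled and the task fails.
        limits.append((start + execution_timeout_s, "execution_timeout"))
    if not limits:
        return None
    # The earlier deadline takes precedence; on a tie the first entry is kept.
    return min(limits, key=lambda limit: limit[0])
```

Returning the label together with the deadline matters because the two limits have different consequences: only `execution_timeout` leads to the job being cancelled.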

Tests

  • Added trigger-level tests asserting timeout events when the execution deadline is exceeded.
  • Added operator-level tests verifying job cancellation and task failure on execution timeout.
  • Extended trigger serialization tests to ensure the optional execution_deadline field is always serialized.
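
As a rough illustration of what the serialization test checks, the sketch below uses a stand-in class rather than the real DbtCloudRunJobTrigger. `SketchTrigger`, its classpath string, and every field other than `execution_deadline` are assumptions made for the example; the `(classpath, kwargs)` return shape follows the usual Airflow trigger `serialize()` convention.

```python
from typing import Any, Dict, Optional, Tuple


class SketchTrigger:
    """Hypothetical trigger with an optional execution_deadline."""

    def __init__(
        self,
        run_id: int,
        end_time: float,
        execution_deadline: Optional[float] = None,
    ) -> None:
        self.run_id = run_id
        self.end_time = end_time
        self.execution_deadline = execution_deadline

    def serialize(self) -> Tuple[str, Dict[str, Any]]:
        # The key is emitted even when the value is None, so a trigger
        # rebuilt from these kwargs always carries the same deadline.
        return (
            "example.triggers.SketchTrigger",
            {
                "run_id": self.run_id,
                "end_time": self.end_time,
                "execution_deadline": self.execution_deadline,
            },
        )


def test_execution_deadline_is_always_serialized() -> None:
    _, kwargs = SketchTrigger(run_id=1, end_time=10.0).serialize()
    assert "execution_deadline" in kwargs
    assert kwargs["execution_deadline"] is None
```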

Documentation

  • The docstring for DbtCloudRunJobTrigger has been updated to document the new execution_deadline parameter and clarify its behavior.
  • The docstring for DbtCloudRunJobOperator has been updated to clarify the behavior of the timeout parameter and distinguish it from task-level execution_timeout.

Backwards Compatibility

This change does not alter public APIs or method signatures. The runtime behavior changes in that dbt Cloud jobs are now explicitly cancelled when execution_timeout is reached during deferrable execution, whereas previously the job could continue running after the task timed out.
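
The changed runtime behavior on the operator side can be sketched as below. This is an illustration under stated assumptions, not the operator's actual code: `handle_trigger_event`, `TaskTimeoutError`, the event keys, and the `cancel_job` callable are hypothetical stand-ins for the operator's completion callback, Airflow's timeout exception, and the dbt Cloud hook.

```python
from typing import Callable


class TaskTimeoutError(Exception):
    """Hypothetical stand-in for the exception that fails the task."""


def handle_trigger_event(event: dict, cancel_job: Callable[[int], None]) -> int:
    """React to the event a trigger sends back when the task resumes."""
    run_id = event["run_id"]
    if event["status"] == "timeout":
        # New behavior: cancel the external job first, then fail the task,
        # so the job does not keep running after the task has timed out.
        cancel_job(run_id)
        raise TaskTimeoutError(
            f"Job run {run_id} exceeded execution_timeout and was cancelled."
        )
    if event["status"] == "error":
        raise RuntimeError(f"Job run {run_id} failed.")
    return run_id
```

A quick usage example with a list standing in for the cancellation call:

```python
cancelled = []
try:
    handle_trigger_event({"status": "timeout", "run_id": 7}, cancelled.append)
except TaskTimeoutError as exc:
    print(exc)
print(cancelled)
```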

Closes: #61467

@SameerMesiah97
Contributor Author

Requesting review for this.

Member

@potiuk potiuk left a comment

That looks good. Though I would feel more comfortable if I saw some output of running job/ screenshot showing the behaviour. Possible @SameerMesiah97 ?

@SameerMesiah97
Contributor Author

SameerMesiah97 commented Feb 15, 2026

That looks good. Though I would feel more comfortable if I saw some output of running job/ screenshot showing the behaviour. Possible @SameerMesiah97 ?

My free trial has expired, but I will see if I can still use the dbt Cloud API. I will update the PR description with the screenshots once I am able to reproduce it again.

There is a link to the issue, which has comprehensive reproduction steps, so I believe it can be easily verified by someone with dbt Cloud API access.

For future reference, would you advise including screenshots by default for any bugs? Or are explicit steps sufficient and this is more of a sanity check on a case-by-case basis?

Edit: I just attempted to use the dbt Cloud API (which is necessary to reproduce this bug). I have no access because my free trial has expired. @potiuk

@potiuk
Member

potiuk commented Feb 16, 2026

For future reference, would you advise including screenshots by default for any bugs? Or are explicit steps sufficient and this is more of a sanity check on a case-by-case basis?

In case there is something a bit more complex which requires access to services.

@SameerMesiah97
Contributor Author

In case there is something a bit more complex which requires access to services.

I was just wondering whether screenshots are strictly required for this PR to be validated?

@SameerMesiah97
Contributor Author

@josh-fell

if possible, could you have a look at this as well?

SameerMesiah97 force-pushed the 61467-DBTCloudRunJobOperator-Deferrable-Timeout branch from a7f2927 to 2fd5254 on March 5, 2026 23:17
Restore execution_timeout semantics in deferrable mode by propagating
timeouts through the trigger and explicitly cancelling dbt Cloud jobs
when the task exceeds its execution deadline.

This preserves behavior parity with non-deferrable execution and avoids
leaking dbt jobs.
SameerMesiah97 force-pushed the 61467-DBTCloudRunJobOperator-Deferrable-Timeout branch from 2fd5254 to 7138f79 on March 6, 2026 19:37
@SameerMesiah97
Contributor Author

SameerMesiah97 commented Mar 8, 2026

That looks good. Though I would feel more comfortable if I saw some output of running job/ screenshot showing the behaviour. Possible @SameerMesiah97 ?

@potiuk

I have used the following DAG to test the behavior of execution_timeout before and after my implementation:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

with DAG(
    dag_id="dbt_deferrable_timeout_none_repro",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    run_dbt = DbtCloudRunJobOperator(
        dbt_cloud_conn_id="test_dbt_conn",
        task_id="run_dbt",
        job_id=**********0798,  # redacted; only the last 4 digits are shown
        wait_for_termination=True,
        deferrable=True,
        # Task-level limit that the deferred task is expected to honor.
        execution_timeout=timedelta(seconds=30),
    )

Behavior before the fix

DAG run:
image

Job run status after execution_timeout period (30s) has elapsed:
image

You can see that the job keeps running even after the 30-second execution_timeout period has elapsed.

Behavior after the fix

DAG run:
image

Job run status after execution_timeout period (30s) has elapsed:
image

With the fix, the operator sends a cancellation request once execution_timeout is reached, and the job run is cancelled.

Note: Only the last 4 digits of the job ID and run ID are shown for security reasons, but that should be sufficient to match the DAG runs to the job runs. Additional private information has been blacked out for the same reason.

Contributor

@vincbeck vincbeck left a comment

This looks more like a bug, or something we missed, in Airflow in general rather than a bug in one provider. Am I wrong?

@SameerMesiah97
Contributor Author

This looks more like a bug, or something we missed, in Airflow in general rather than a bug in one provider. Am I wrong?

That’s definitely possible, but at the moment I cannot see an opportunity for a generalisable abstraction because cancellation semantics may differ between operators. And whilst I suspect other operators might have this class of bug, I don’t have concrete evidence. So based on what I know now, this is the most appropriate solution I can think of for this operator. However, I will definitely check other operators to see if I can reproduce this type of bug for them as well.

Development

Successfully merging this pull request may close these issues.

DbtCloudRunJobOperator does not enforce execution_timeout semantics in Deferrable mode

3 participants