Fix DbtCloudRunJobTrigger timeout error message and add final status check #61980
eran-moses-human wants to merge 1 commit into apache:main from
Conversation
…check

The timeout error message in `DbtCloudRunJobTrigger.run()` printed `self.end_time` (an absolute epoch timestamp) labelled as "seconds", producing nonsensical output like "after 1771200015.8 seconds" instead of a meaningful duration. Additionally, the timeout check fired before sleeping, without a final status poll: a job completing during `asyncio.sleep()` could be incorrectly reported as timed out.

Changes:
- Move `asyncio.sleep()` before the timeout check so the trigger sleeps first, then evaluates the deadline.
- Add a final `is_still_running()` call when the timeout fires so that jobs completing at the boundary are handled correctly.
- Replace the misleading epoch-as-duration message with a clear "within the configured timeout" message.
- Update the existing timeout test and add a new test for the edge case where a job completes at the timeout boundary.

Closes: apache#61979

Co-authored-by: Cursor <cursoragent@cursor.com>
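The reordered loop described above can be sketched as follows. This is a minimal illustration, not the provider's actual code: `is_still_running` stands in for the trigger's hook call, and the signatures are simplified for the example.

```python
import asyncio
import time


# Illustrative stand-in for the trigger's status poll; the real provider
# calls a dbt Cloud hook instead.
async def is_still_running(job_id: int) -> bool:
    return False  # pretend the job already reached a terminal state


async def run_with_deadline(job_id: int, end_time: float, poll_interval: float) -> str:
    """Sketch of the fixed loop: sleep first, then check the deadline,
    and re-poll once before declaring a timeout."""
    while await is_still_running(job_id):
        await asyncio.sleep(poll_interval)
        if time.time() > end_time:
            # Final status poll: the job may have completed during the sleep.
            if not await is_still_running(job_id):
                break
            return "timeout: job did not finish within the configured timeout"
    return "success"


print(asyncio.run(run_with_deadline(42, time.time() + 60, 1.0)))
```

The key ordering change is that the deadline is only evaluated after a sleep, and a timeout is only reported after one last status check.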
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
SameerMesiah97
left a comment
Looks good. I would fix the CI failures (one looks unrelated, but the other is prek-related and is something you can fix).
Just a heads up: you might get conflicts if PR #61472 merges first as it is touching the same lines of code.
| ) | ||
| return | ||
| # Job reached a terminal state — exit loop to handle below. | ||
| break |
It looks like the timeout emission could be extended by poll_interval, meaning the actual timeout could occur up to one full polling cycle later than the configured value. For example, with timeout=60s and poll_interval=30s, the timeout may not be emitted until around 90s depending on scheduling.
I believe this behavior was already present prior to this PR, so it’s not a regression. That said, we might consider sleeping for min(poll_interval, remaining_time) to align the timeout more closely with the configured value. This should not block the PR, but I think it might be worth exploring.
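The reviewer's suggestion can be sketched as a deadline-capped sleep. This is a hypothetical helper for illustration, not code from the PR:

```python
import asyncio
import time


async def bounded_sleep(end_time: float, poll_interval: float) -> None:
    """Sleep for poll_interval, but never past the configured deadline."""
    remaining = end_time - time.time()
    await asyncio.sleep(min(poll_interval, max(remaining, 0.0)))


# With a deadline ~0.1s away and poll_interval=30s, the sleep is capped
# near 0.1s instead of overshooting by a full polling cycle.
start = time.time()
asyncio.run(bounded_sleep(start + 0.1, 30.0))
elapsed = time.time() - start
print(f"slept ~{elapsed:.2f}s")
```

With this cap, the example from the comment (timeout=60s, poll_interval=30s) would emit the timeout close to 60s rather than around 90s.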
```python
    async def test_dbt_job_run_timeout_but_job_completes(
        self, mock_get_job_status, mocked_is_still_running
    ):
        """Assert that a job completing at the timeout boundary is treated as success, not timeout."""
```
Nit: the docstring mentions "completing at the timeout boundary", but the test appears to simulate completion after the deadline has technically passed but before timeout emission. I would suggest clarifying it accordingly.
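The scenario the reviewer describes can be illustrated with a stateful mock: the status poll reports the job as running on the first call, then as finished on the re-poll after the deadline has passed. This is a self-contained sketch with hypothetical names, not the provider's actual test:

```python
import asyncio
from unittest import mock


async def poll_until_done(is_still_running, deadline_passed) -> str:
    """Simplified stand-in for the trigger loop: re-poll once after the
    deadline before reporting a timeout."""
    while await is_still_running():
        if deadline_passed():
            if not await is_still_running():
                break  # completed just after the deadline: success, not timeout
            return "timeout"
    return "success"


# First poll: still running; second poll (after the deadline): finished.
is_running = mock.AsyncMock(side_effect=[True, False])
result = asyncio.run(poll_until_done(is_running, lambda: True))
print(result)
```

Under this reading, the docstring might say "completing after the deadline but before timeout emission" rather than "at the timeout boundary".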
|
@eran-moses-human There are some merge conflicts now. Can you resolve and re-push when you get a chance?
Summary
Fixes two bugs in `DbtCloudRunJobTrigger.run()`:

- The timeout error message printed `self.end_time` (an absolute epoch timestamp, e.g. `1771200015.8`) labelled as "seconds", producing nonsensical output. Replaced with a clear "within the configured timeout" message.
- A job completing during `asyncio.sleep()` could be incorrectly reported as timed out. The trigger now performs one final `is_still_running()` call before yielding a timeout error.

Changes

- `providers/dbt/cloud/src/airflow/providers/dbt/cloud/triggers/dbt.py`:
  - Move `asyncio.sleep()` before the timeout check
  - Add a final `is_still_running()` call when the timeout fires
- `providers/dbt/cloud/tests/unit/dbt/cloud/triggers/test_dbt.py`:
  - Add `test_dbt_job_run_timeout_but_job_completes` for the edge case where a job completes at the timeout boundary

Closes #61979
Made with Cursor