
AIP-82 Save references between assets and triggers #43826

Merged: vincbeck merged 1 commit into apache:main on Nov 25, 2024

Conversation

vincbeck
Contributor

@vincbeck vincbeck commented Nov 8, 2024

Resolves #42510.

This PR adds a new attribute, watchers, to the Asset class and saves references between assets and triggers in the DB. For example:

trigger = SqsSensorTrigger(sqs_queue="my_queue")
asset = Asset("example_asset_watchers", watchers=[trigger])

with DAG(
    dag_id="example_dataset_watcher",
    schedule=[asset],
    catchup=False,
):
    task = EmptyOperator(task_id="task")

    chain(task)

This PR creates the trigger in the DB if it does not exist and saves the reference between the asset and the trigger.
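As a rough illustration of the behavior described above (not the PR's actual code), the "create if missing, then save the reference" step can be sketched with an in-memory stand-in for the metadata DB. `FakeDB`, `get_or_create_trigger`, and `save_reference` are hypothetical names, and identifying a trigger by its class path plus keyword arguments is an assumption made only for this sketch:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDB:
    """Hypothetical in-memory stand-in for the metadata DB (not Airflow's schema)."""

    # (classpath, sorted kwargs) -> trigger id
    triggers: dict = field(default_factory=dict)
    # set of (asset name, trigger id) reference rows
    asset_trigger_refs: set = field(default_factory=set)

    def get_or_create_trigger(self, classpath: str, kwargs: dict) -> int:
        """Return the id of the matching trigger, creating it if absent."""
        key = (classpath, tuple(sorted(kwargs.items())))
        if key not in self.triggers:
            self.triggers[key] = len(self.triggers) + 1
        return self.triggers[key]

    def save_reference(self, asset_name: str, classpath: str, kwargs: dict) -> None:
        """Link an asset to a trigger, creating the trigger on first use."""
        trigger_id = self.get_or_create_trigger(classpath, kwargs)
        self.asset_trigger_refs.add((asset_name, trigger_id))


db = FakeDB()
db.save_reference("example_asset_watchers", "SqsSensorTrigger", {"sqs_queue": "my_queue"})
# Parsing the same DAG a second time must not duplicate the trigger or the reference.
db.save_reference("example_asset_watchers", "SqsSensorTrigger", {"sqs_queue": "my_queue"})
```

The point of the sketch is idempotency: re-parsing an unchanged DAG file leaves exactly one trigger and one reference.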



@boring-cyborg boring-cyborg bot added the area:Scheduler (including HA (high availability) scheduler) and area:task-sdk labels Nov 8, 2024
@vincbeck vincbeck force-pushed the vincbeck/aip-82-save-references branch from 947e028 to 598bc69 Compare November 8, 2024 16:01
@vincbeck
Contributor Author

vincbeck commented Nov 8, 2024

@Lee-W @uranusjr When working on it I realized that assets are added in the DB from DAG definition but never removed (or at least I did not see the code). Meaning, as a DAG author if I define an asset in my DAG and then later on remove it, the asset is never removed from the DB. Am I wrong? If not, is it intended?

@Lee-W
Member

Lee-W commented Nov 9, 2024

Yep, this is by design as of now, to keep the asset history.

@vincbeck vincbeck force-pushed the vincbeck/aip-82-save-references branch 2 times, most recently from d27cab5 to 682b713 Compare November 12, 2024 15:29
@vincbeck
Contributor Author

Alright, thank you. I handled it then: I now remove the references between an asset and its triggers when the asset is no longer used.
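A minimal sketch of that cleanup, assuming references can be modeled as (asset, trigger) pairs and compared with plain set differences. `diff_references` and the string identifiers are hypothetical, and this is not the code from the PR. Note that only the references are dropped; the asset row itself stays in the DB to keep its history:

```python
def diff_references(stored: set, declared: set) -> tuple:
    """Compare stored (asset, trigger) pairs with those declared by the parsed DAGs."""
    to_add = declared - stored      # watchers that were newly declared
    to_remove = stored - declared   # watchers whose asset is no longer used
    return to_add, to_remove


# References currently saved in the (hypothetical) DB.
stored = {("asset_a", "sqs:my_queue"), ("asset_b", "sqs:other_queue")}
# References found while parsing: asset_b was removed from the DAG file.
declared = {("asset_a", "sqs:my_queue")}

to_add, to_remove = diff_references(stored, declared)
stored = (stored | to_add) - to_remove
```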

@vincbeck
Contributor Author

@Lee-W Any chance you can review it? You have some experience around assets that could be interesting to have :)

@Lee-W
Member

Lee-W commented Nov 14, 2024

Sure thing :) Will take a look later today

@vincbeck vincbeck force-pushed the vincbeck/aip-82-save-references branch from 682b713 to c2660ea Compare November 14, 2024 19:27
@vincbeck vincbeck force-pushed the vincbeck/aip-82-save-references branch 4 times, most recently from 9543a51 to c4c5c3e Compare November 14, 2024 20:59
@vincbeck vincbeck requested a review from Lee-W November 19, 2024 14:28
@vincbeck vincbeck requested a review from uranusjr November 19, 2024 14:28
@vincbeck vincbeck force-pushed the vincbeck/aip-82-save-references branch 3 times, most recently from 30c05f1 to de9386d Compare November 20, 2024 18:55
@vincbeck vincbeck force-pushed the vincbeck/aip-82-save-references branch 2 times, most recently from 92074e2 to cc80d52 Compare November 20, 2024 19:22
@vincbeck vincbeck force-pushed the vincbeck/aip-82-save-references branch from cc80d52 to 0f9510d Compare November 22, 2024 15:50
@gopidesupavan
Member

Is there a plan to add tests separately?

@vincbeck
Contributor Author

Definitely! It is covered in #42515. However, I did not add unit tests for my changes in collection.py, and I think I should. I'd like to do that in a separate PR, though. This PR is blocking other in-flight PRs I have on that project, so unless someone has concerns or feedback, I'd like to merge it :)

@gopidesupavan
Member

Ah okay, makes sense. I didn't see that task.

Member

@gopidesupavan gopidesupavan left a comment

LGTM. I have played around with these table updates, and I believe that every DAG collection call verifies the asset relationships and removes any unused ones. Additionally, asset relationships are only valid during the trigger's lifespan.

Is my understanding correct? :)

@vincbeck
Contributor Author

This is correct, with one clarification: the trigger's lifespan here lasts as long as the trigger is referenced as a watcher of at least one asset. These triggers will live much longer than today's triggers, which are associated with a task. But as soon as you remove these references, the triggers are removed.
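To make the lifespan rule concrete, here is a small sketch under the assumption that a watcher trigger survives only while at least one (asset, trigger) reference points at it. `prune_orphan_triggers` is a hypothetical helper, and the sketch deliberately ignores triggers that are associated with a task:

```python
def prune_orphan_triggers(triggers: set, refs: set) -> set:
    """Keep only triggers still referenced as a watcher by at least one asset."""
    referenced = {trigger for _asset, trigger in refs}
    return triggers & referenced


triggers = {"sqs:my_queue"}
refs = {("asset_a", "sqs:my_queue"), ("asset_b", "sqs:my_queue")}

# asset_a stops watching: the trigger survives because asset_b still references it.
refs.discard(("asset_a", "sqs:my_queue"))
after_first = prune_orphan_triggers(triggers, refs)

# asset_b stops watching too: no reference is left, so the trigger is removed.
refs.discard(("asset_b", "sqs:my_queue"))
after_second = prune_orphan_triggers(triggers, refs)
```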

@vincbeck vincbeck merged commit bee7f0c into apache:main Nov 25, 2024
46 checks passed
@vincbeck vincbeck deleted the vincbeck/aip-82-save-references branch November 25, 2024 16:13
@gopidesupavan
Member

cool, thank you :)

LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this pull request Jan 5, 2025
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
Labels
area:Scheduler including HA (high availability) scheduler area:task-sdk
Development

Successfully merging this pull request may close these issues.

AIP-82. Save references asset <-> triggers when parsing DAGs
4 participants