AIP-82 Save references between assets and triggers #43826
@Lee-W @uranusjr While working on this I realized that assets are added to the DB from the DAG definition but never removed (or at least I did not see the code that does it). That is, if a DAG author defines an asset in a DAG and later removes it, the asset is never removed from the DB. Am I wrong? If not, is this intended?
Yep, this is by design as of now, to keep the asset history.
Alright, thank you. I handled it then: I remove the references between an asset and its triggers if the asset is no longer used.
@Lee-W Any chance you can review it? Your experience with assets would be useful here :)
Sure thing :) Will take a look later today.
airflow/dag_processing/collection.py
Outdated
# Create the trigger in the DB if it does not exist
if not trigger_model:
    trigger_model = Trigger.from_object(trigger_class_path_to_asset_dict[trigger_class_path])
    session.add(trigger_model)
Not sure whether collecting all the models together and using `add_all` would be better 🤔
Collecting all model objects first is cleaner code IMO; this loop + `add` + `append` approach is a lot more difficult to read. Also the repeated `scalar` + `limit` call is not very performant; it is better to select all the existing triggers first in one query.
Thank you for your suggestions, I tried to apply them. Please let me know if this is what you had in mind.
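The suggestion in this thread (fetch the existing triggers in one query, collect the new models, then add them in one batch) can be sketched as below. This is only an illustration of the pattern, not the code in this PR: it uses the standard-library `sqlite3` module instead of a SQLAlchemy session so that it runs on its own, and the `triggers` table, the `ensure_triggers` helper, and the classpath values are all hypothetical.

```python
import sqlite3


def ensure_triggers(conn, wanted_classpaths):
    """Insert missing trigger rows using one SELECT and one bulk INSERT."""
    # One query for everything that already exists, instead of one lookup per trigger.
    existing = {row[0] for row in conn.execute("SELECT classpath FROM triggers")}
    # Collect all the new rows first ...
    missing = [(cp,) for cp in sorted(set(wanted_classpaths) - existing)]
    # ... then add them in one batch (the role add_all plays on a SQLAlchemy session).
    conn.executemany("INSERT INTO triggers (classpath) VALUES (?)", missing)
    return [cp for (cp,) in missing]


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triggers (id INTEGER PRIMARY KEY, classpath TEXT UNIQUE)")
conn.execute("INSERT INTO triggers (classpath) VALUES ('pkg.FileTrigger')")

created = ensure_triggers(conn, ["pkg.FileTrigger", "pkg.QueueTrigger"])
row_count = conn.execute("SELECT COUNT(*) FROM triggers").fetchone()[0]
```

Here only `pkg.QueueTrigger` is inserted, and calling `ensure_triggers` again with the same input inserts nothing. The number of queries stays constant regardless of how many triggers the DAGs declare.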
airflow/dag_processing/collection.py
Outdated
# Remove references from assets no longer used
all_assets = session.scalars(select(AssetModel))
# orphan_assets = set()
for asset_model in all_assets:
    if (asset_model.name, asset_model.uri) not in self.assets:
        asset_model.triggers = []
        # orphan_assets.add(asset_model.id)
Do we need to do this actively? What happens if we just leave those associations there?
Then the trigger will keep updating the asset whenever an event occurs. More importantly, if we keep the association between the asset and the trigger, it will be impossible to clean up these triggers. I want to be able to remove triggers that are not used (meaning, not associated with a task or an asset). Otherwise they will keep polling an external resource indefinitely, which could be very costly.
On that same topic, when doing some testing, I noticed that this function is called per DAG (am I wrong?). As a consequence, this piece of code removes the associations I created just before. I need to fix that.
Fixed
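To make the per-DAG problem described above concrete, here is a minimal sketch using plain dictionaries in place of the ORM models. It shows one way to avoid wiping fresh associations (build the set of active assets across all DAGs, then prune once); it is not necessarily the fix applied in this PR. The `prune_orphan_references` helper and all data are hypothetical.

```python
def prune_orphan_references(asset_triggers, active_assets):
    """Clear trigger references for assets that no DAG declares anymore.

    asset_triggers: dict mapping (name, uri) -> list of trigger ids.
    active_assets: set of (name, uri) keys still declared by at least one DAG.
    Asset entries are kept (to preserve history); only their references are cleared.
    """
    cleared = []
    for key, trigger_ids in asset_triggers.items():
        if key not in active_assets and trigger_ids:
            asset_triggers[key] = []
            cleared.append(key)
    return cleared


# Hypothetical data: two DAGs, plus one asset that no DAG uses anymore.
dag_assets = {
    "dag_a": {("a", "s3://bucket/a")},
    "dag_b": {("b", "s3://bucket/b")},
}
refs = {
    ("a", "s3://bucket/a"): [1],
    ("b", "s3://bucket/b"): [2],
    ("old", "s3://bucket/old"): [3],
}

# Take the union over every DAG before pruning, so one DAG's pass
# cannot clear references that belong to another DAG's assets.
active = set().union(*dag_assets.values())
cleared = prune_orphan_references(refs, active)
```

If the same function were instead called with only `dag_assets["dag_a"]`, the references for asset `b` would be cleared as well, which is the behavior reported in the comment above.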
Resolves #42510.
This PR adds a new attribute `watchers` to the `Asset` class and saves references between assets and triggers in the DB.
This PR creates the trigger in the DB if it does not exist and saves the reference between the `asset` and the `trigger`.