-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Many recomputations when setting many to many #145
Comments
Yeah in general there is a high chance to miss updates at certain ends with M2M, kinda learned that the hard when I implemented it initially. M2Ms are esp. tricky, if cfs span across them in interdepencies or even link back like this:
which seems awkward at first, but can happen, if you have various cf aggregates on model
|
I understand it is tricky, but I have trouble understanding the exact example you provided Do you know if those behaviours that are enforced now are well tested? I so, I could experiment and see if I break any will trying to fix my test |
I looked into your test PR example, this is what happens under the hood (tested with sqlite): with CaptureQueriesContext(connection) as cqc:
advert_1.tags.set([tag_2])
for i, q in enumerate(cqc.captured_queries):
print(i, q) Lets start with what django does normally w'o any computed fields here:
which translates roughly to this:
First takeaway - thats already pretty noisy. Have not thought that through in all details, but to me it seems this could be achieved in 3 queries instead of 5. Second takeaway and really important for computed fields handling below -
Whether there is a smarter way to treat this - idk. At handler/signal level this def. lacks the knowledge about the action being a "forth-and-back" thingy cause by Edit: Thinking about the superfluous calls during But to let you not in the rain here - you can speedup things alot by not using |
Here is a rewrite with bulk actions, where possible: from django.test import TestCase
from . import models
from django.test.utils import CaptureQueriesContext
from django.db import connection
from computedfields.models import update_dependent
class TestAdvertTags(TestCase):
def test(self):
# grab the through model
Through = models.Advert.tags.through
models.run_counter = 0
tag_1 = models.Tag.objects.create(name="T1")
tag_2 = models.Tag.objects.create(name="T2")
advert_1 = models.Advert.objects.create(name="A1")
print('######', advert_1.all_tags)
advert_2 = models.Advert.objects.create(name="A2")
assert models.run_counter == 2, f"advert.run_counter={models.run_counter}"
#advert_1.tags.add(tag_1)
# add() now done by bulk_action + update_dependent (steps should be put into a transaction)
Through.objects.bulk_create([Through(advert=advert_1, tag=tag_1)])
update_dependent(advert_1, update_fields=['all_tags'])
advert_1.refresh_from_db()
print('######', advert_1.all_tags)
assert models.run_counter == 3, f"advert.run_counter={models.run_counter}"
with CaptureQueriesContext(connection) as cqc:
#advert_1.tags.set([tag_2])
# set() now done by bulk actions + update_dependent (steps should be put into a transaction)
Through.objects.filter(advert=advert_1).delete()
Through.objects.bulk_create([Through(advert=advert_1, tag=tag_2)])
update_dependent(advert_1, update_fields=['all_tags'])
assert models.run_counter == 4, f"advert.run_counter={models.run_counter}"
for i, q in enumerate(cqc.captured_queries):
print(i, q)
advert_1.refresh_from_db()
print('######', advert_1.all_tags) This passes your test and does alot less on SQL:
Note that this is highly customized to the actions/data you have provided by the PR test. It currently assumes, that the M2M changeset has low intersection to the previous set. If the sets have in fact high intersections, then it should be done differently by pre-narrowing things on python first. Also note that the Maybe this helps you to get better perf while keeping things mostly unchanged on other ends. And if you still suffer from update pressure even with this change, maybe try |
What also would be possible to mitigate the doubled processing for M2M def m2m_set(manager, changeset):
try:
disable_m2m_handler()
old = calc_preupdate(pulled_from_manager_changeset)
manager.set(changeset)
calc_update(pulled_manager_changeset, old=old)
finally:
enable_m2m_handler()
...
# usage:
# advert_1.tags.set([s1, s2]) now becomes
m2m_set(advert_1.tags, [s1, s2])
|
@martinlehoux I did some checks for related managers in general. And as I thought, they currently dont work with the default |
Ok i didn't know about bulk=False |
After thinking a bit more, I'm not sure the example i set up really shows my issue in a real project. I'll try to upgrade the sample so that it is more accurate. |
@martinlehoux Yes thats likely an issue with my hand-crafted optimization of your shown model/data - those hand-crafted solutions always end up being of limited usage for a broader adoption. |
I don't think it's easy to explain, but i'll give you a more precise example class Tag(ComputedFieldsModel):
name = models.CharField(max_length=32, unique=True)
class Advert(ComputedFieldsModel):
name = models.CharField(max_length=32)
tags = models.ManyToManyField(Tag, related_name="adverts")
@computed(
field=models.CharField(max_length=500),
depends=[("tags", ["name"])],
)
def all_tags(self) -> str:
global run_counter
run_counter += 1
if not self.pk:
return ""
return ", ".join(self.tags.values_list("name", flat=True))
class Room(ComputedFieldsModel):
name = models.CharField(max_length=32)
advert = models.ForeignKey(Advert, related_name="rooms", on_delete=models.CASCADE)
@computed(
field=models.BooleanField(),
depends=[("advert.tags", ["name"])],
)
def is_ready(self) -> bool:
global run_counter
run_counter += 1
if not self.pk:
return False
return self.advert.tags.filter(name="ready").exists()
class TestAdvertTags(TestCase):
def test(self):
tag_ready = models.Tag.objects.create(name="ready")
advert_1 = models.Advert.objects.create(name="A1")
room_11 = models.Room.objects.create(name="R11", advert=advert_1)
advert_1.tags.add(tag_ready)
advert_2 = models.Advert.objects.create(name="A2")
room_21 = models.Room.objects.create(name="R21", advert=advert_2)
advert_2.tags.add(tag_ready) In this case, when I
I feel it should be possible to say that only Room 21 is concerned by the Is there a use case I a not seeing that would explain this? |
Hello @jerch , do you have any guidance to help debug this issue ? |
@martinlehoux Sorry, had not yet time to look further into it. Hopefully will get back it next week... |
I updated the test case, but don't feel pressured haha I'd really like to help, but I have trouble understanding where I should look. Maybe I didn't spend enough time |
I think I found what bugs me: with this Room model class Room(ComputedFieldsModel):
name = models.CharField(max_length=32)
advert = models.ForeignKey(Advert, related_name="rooms", on_delete=models.CASCADE)
@computed(
field=models.BooleanField(),
depends=[("advert.tags", ["name"])],
)
def is_ready(self) -> bool: the graph for Tag looks like
I don't understand why |
Issue
set()
many update #144tags.set(...)
(actually, it's a.set
trigger by an admin form), I see my computed field recomputed for all my advertspost_remove
handler is the one triggering all these updatesResolution
The text was updated successfully, but these errors were encountered: