(8/5) Reduce VMM reservation contention #7533
other_aa_instances AS (
    SELECT anti_affinity_group_instance_membership.group_id, instance_id
    FROM anti_affinity_group_instance_membership
    JOIN our_aa_groups
    ON anti_affinity_group_instance_membership.group_id = our_aa_groups.group_id
    WHERE instance_id != ").param().sql("
),
Optional: I'd almost be inclined to make these individual subqueries into individual `const &str`s, in part so they can be shared between `sled_insert_resource_query` and `sled_find_targets_query`, and in part so you can add doc comments to them (or inline comments in the queries) to help describe what's going on. Up to you how far to go here, though.
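The suggested extraction could look something like the sketch below. All names here are hypothetical; the real Omicron query builder splices bind parameters via `.param()`, which this sketch approximates with a `$1` placeholder.

```rust
/// Instances that share an anti-affinity group with the instance being
/// placed. Extracted into a shared constant so both queries can reuse it
/// and so this doc comment can explain what it does.
const OTHER_AA_INSTANCES: &str = "\
other_aa_instances AS (
    SELECT anti_affinity_group_instance_membership.group_id, instance_id
    FROM anti_affinity_group_instance_membership
    JOIN our_aa_groups
    ON anti_affinity_group_instance_membership.group_id = our_aa_groups.group_id
    WHERE instance_id != $1
),";

// Both query builders can then splice in the same fragment.
fn sled_find_targets_query() -> String {
    format!("WITH {} /* ... remaining CTEs ... */", OTHER_AA_INSTANCES)
}

fn sled_insert_resource_query() -> String {
    format!("WITH {} /* ... remaining CTEs ... */", OTHER_AA_INSTANCES)
}

fn main() {
    assert!(sled_find_targets_query().contains("other_aa_instances"));
    assert!(sled_insert_resource_query().contains("other_aa_instances"));
    println!("shared subquery spliced into both queries");
}
```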
This gets a fair bit more complicated in #7572 - do you mind if I punt some of the cleanup to that PR, to avoid a merge conflict with myself?
Go for it!
COALESCE(SUM(CAST(sled_resource_vmm.reservoir_ram AS INT8)), 0) + "
).param().sql(" <= sled.reservoir_size
),
our_aa_groups AS (
We discussed this offline at the 27 Feb 2025 hypervisor huddle. An alternative approach we discussed (IIRC) went along these lines:
- assign a score to each sled based on how preferable it is (e.g., add one point to the score for every "allow"-level affinity rule that would be violated by placing the instance on that sled)
- when trying to reserve on a sled, see if the sled's score is at least as good (less than or equal to, in this example) as it was originally
- if not, don't reserve there; instead, set its current score to its new score and add it back to the candidate pool
Assuming the instance's affinity group assignments are fixed (so that there are a fixed number of "allow"-level rules that it can break) I think this will always terminate (fixed number of rules implies a maximum possible score, and retrying always increases the threshold score).
Even if my recollection is accurate, I'm not sure I would bother going too far with this in this PR, mostly because I'm inclined to think of the "allow"-level affinity rules as being entirely best-effort: Nexus might eventually decide to violate them for other reasons (e.g. "co-locating these instances minimizes fragmentation" or "separating these instances is necessary to evacuate this sled for update"), even for instances that were "ideally" placed originally. If that's so (and especially if we expect that the system may eventually choose to migrate VMs to optimize their placements, including their adherence to affinity rules; RFD 494 hints very lightly at this), I'm not sure I would do a lot of extra work in this path to make sure that instances always end up with an ideal initial placement.
I don't feel strongly about this, though; if implementing a scoring system like the above (or something like it) turns out to be pretty easy to do, I wouldn't be opposed to having it. But I think what's here is good enough to merge if something more sophisticated would take a lot of extra time or add a lot of complexity.
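The scored-retry scheme sketched in the bullets above could look roughly like this. Everything here is hypothetical illustration, not Omicron code: `current_score` stands in for recomputing the number of violated "allow"-level rules from live database state, and in this example the first-choice sled's score worsens between selection and reservation.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Stand-in for recomputing a sled's score (number of "allow"-level affinity
// rules violated by placing the instance there; lower is better). Here we
// pretend sled 1 got worse after it was first scored, while sled 2 is stable.
fn current_score(sled_id: u32) -> u32 {
    if sled_id == 1 { 3 } else { 2 }
}

// Candidates are (score, sled_id) pairs in a min-heap (via Reverse).
fn place(mut candidates: BinaryHeap<Reverse<(u32, u32)>>) -> u32 {
    while let Some(Reverse((score, sled_id))) = candidates.pop() {
        let now = current_score(sled_id);
        if now <= score {
            // Score is at least as good as when selected: reserve here.
            return sled_id;
        }
        // Otherwise, re-enqueue with the updated (worse) score. Because the
        // instance's rule set is fixed, scores are bounded above, so the
        // threshold can only ratchet up finitely often and the loop ends.
        candidates.push(Reverse((now, sled_id)));
    }
    unreachable!("candidate pool never empties in this sketch");
}

fn main() {
    // Sled 1 was originally scored 1 (stale), sled 2 scored 2.
    let candidates = BinaryHeap::from([Reverse((1u32, 1u32)), Reverse((2, 2))]);
    // Sled 1 is tried first, found worse (3 > 1), and re-enqueued; sled 2's
    // score is unchanged, so the instance lands there.
    assert_eq!(place(candidates), 2);
    println!("placed on sled 2 after re-scoring sled 1");
}
```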
#7498 was introduced to benchmark the cost of concurrent instance provisioning, and it demonstrated that contention can significantly degrade performance on the VMM reservation pathway.
This PR optimizes that pathway by removing the VMM reservation transaction and replacing it with non-transactional queries:
- `sled_find_targets_query`
- `sled_insert_resource_query`, to INSERT a desired VMM record and to re-validate our constraints

This change significantly improves performance in the `vmm-reservation` benchmark, while upholding the necessary constraints implicit to VMM provisioning.
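The non-transactional flow can be sketched against an in-memory stand-in for the database (all types and functions below are hypothetical illustrations, not the real Nexus queries): first find candidate sleds, then insert the reservation while re-checking the capacity constraint, since state may have changed between the two steps.

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for the sled table.
struct Sled {
    reservoir_size: u64,
    reserved: u64,
}

// Step 1: in the spirit of `sled_find_targets_query`, find sleds whose
// reservoir could fit the VMM's RAM.
fn find_targets(sleds: &HashMap<u32, Sled>, ram: u64) -> Vec<u32> {
    sleds
        .iter()
        .filter(|(_, s)| s.reserved + ram <= s.reservoir_size)
        .map(|(id, _)| *id)
        .collect()
}

// Step 2: in the spirit of `sled_insert_resource_query`, insert the
// reservation while re-validating the constraint, because a concurrent
// reservation may have landed since step 1. No transaction is held across
// the two steps.
fn try_reserve(sleds: &mut HashMap<u32, Sled>, id: u32, ram: u64) -> bool {
    let sled = sleds.get_mut(&id).unwrap();
    if sled.reserved + ram <= sled.reservoir_size {
        sled.reserved += ram;
        true
    } else {
        false // lost the race; caller moves on to the next candidate
    }
}

fn main() {
    let mut sleds = HashMap::from([
        (1, Sled { reservoir_size: 16, reserved: 12 }),
        (2, Sled { reservoir_size: 16, reserved: 0 }),
    ]);
    let targets = find_targets(&sleds, 8);
    let placed = targets.iter().find(|&&id| try_reserve(&mut sleds, id, 8));
    assert_eq!(placed, Some(&2));
    println!("placed on sled {}", placed.unwrap());
}
```

A failed re-validation simply falls through to the next candidate, which is what lets this pattern drop the transaction without violating the capacity constraint.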