(8/5) Reduce VMM reservation contention #7533

Merged: 159 commits from vmm-reduce-contention into main on Mar 20, 2025

Conversation

@smklein (Collaborator) commented Feb 12, 2025

#7498 was introduced to benchmark the cost of concurrent instance provisioning, and it demonstrated that contention can significantly degrade performance on the VMM reservation pathway.

This PR optimizes that pathway by removing the VMM reservation transaction and replacing it with a handful of non-transactional queries:

  1. First, we query to see if the VMM reservation has already succeeded (for idempotency).
  2. Next, we query for all viable sled targets and affinity information (sled_find_targets_query).
  3. After parsing that data and picking a sled, we call sled_insert_resource_query to INSERT the desired VMM record and to re-validate our constraints.

This change significantly improves performance in the vmm-reservation benchmark, while upholding the necessary constraints implicit to VMM provisioning.
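
In outline, the new flow looks something like the sketch below. Everything in it is hypothetical (the types, helper names, and id/memory parameters stand in for the queries named above); it is meant to show the shape of the path, not the actual datastore code.

```rust
// Sketch only: all types and helpers are stand-ins for the real Nexus queries.
struct SledTarget {
    sled_id: u64,
    // Affinity information returned alongside each viable sled by
    // `sled_find_targets_query`, used to rank candidates in Rust.
    violates_allow_level_rules: bool,
}

// Stubs standing in for the three database interactions described above.
fn existing_reservation(_propolis_id: u64) -> Option<u64> {
    None // step 1: look up a prior successful reservation, if any
}
fn find_targets(_propolis_id: u64, _reservoir_ram: u64) -> Vec<SledTarget> {
    Vec::new() // step 2: all viable sleds plus affinity info, in one read
}
fn insert_resource(
    _propolis_id: u64,
    sled_id: u64,
    _reservoir_ram: u64,
) -> Result<u64, &'static str> {
    // step 3: the INSERT re-validates capacity and anti-affinity constraints,
    // so a concurrent reservation cannot silently invalidate this one.
    Ok(sled_id)
}

fn reserve_vmm(propolis_id: u64, reservoir_ram: u64) -> Result<u64, &'static str> {
    // 1. Idempotency: if a reservation already exists for this VMM, reuse it.
    if let Some(sled_id) = existing_reservation(propolis_id) {
        return Ok(sled_id);
    }

    // 2. One read-only query for every viable sled and its affinity data.
    let targets = find_targets(propolis_id, reservoir_ram);

    // Pick a sled, preferring ones that break no "allow"-level affinity rules.
    let chosen = targets
        .iter()
        .find(|t| !t.violates_allow_level_rules)
        .or_else(|| targets.first())
        .ok_or("no viable sleds")?;

    // 3. INSERT the VMM record while re-checking the constraints in SQL.
    insert_resource(propolis_id, chosen.sled_id, reservoir_ram)
}
```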

Comment on lines +65 to +71
other_aa_instances AS (
SELECT anti_affinity_group_instance_membership.group_id,instance_id
FROM anti_affinity_group_instance_membership
JOIN our_aa_groups
ON anti_affinity_group_instance_membership.group_id = our_aa_groups.group_id
WHERE instance_id != ").param().sql("
),
Contributor:

Optional: I'd almost be inclined to make these individual subqueries into individual const &strs, in part so they can be shared between sled_insert_resource_query and sled_find_targets_query, and in part so you can add doc comments to them (or inline comments in the queries) to help describe what's going on. Up to you how far to go here, though.
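
For instance, something along these lines (the CTE text is taken from the diff above; the constant name and the splicing comment are only illustrative, not the actual code):

```rust
/// CTE listing members of the placed instance's anti-affinity groups, minus
/// the instance itself (its id is bound as a parameter after the `!=`).
/// As a plain `const`, the fragment can be spliced into both
/// `sled_find_targets_query` and `sled_insert_resource_query`, and its doc
/// comment only has to be written once.
const OTHER_AA_INSTANCES: &str = "
other_aa_instances AS (
    SELECT anti_affinity_group_instance_membership.group_id, instance_id
    FROM anti_affinity_group_instance_membership
    JOIN our_aa_groups
    ON anti_affinity_group_instance_membership.group_id = our_aa_groups.group_id
    WHERE instance_id != ";

// e.g. builder.sql(OTHER_AA_INSTANCES).param().sql("),") in both queries.
```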

@smklein (Collaborator, author):

This gets a fair bit more complicated in #7572 - do you mind if I punt some of the cleanup to that PR, to avoid a merge conflict with myself?

Contributor:

Go for it!

COALESCE(SUM(CAST(sled_resource_vmm.reservoir_ram AS INT8)), 0) + "
).param().sql(" <= sled.reservoir_size
),
our_aa_groups AS (
Contributor:

We discussed this offline at the 27 Feb 2025 hypervisor huddle. An alternative approach we discussed (IIRC) went along these lines:

  • assign a score to each sled based on how preferable it is (e.g., add one point to the score for every "allow"-level affinity rule that would be violated by placing the instance on that sled)
  • when trying to reserve on a sled, see if the sled's score is at least as good (less than or equal to, in this example) as it was originally
  • if not, don't reserve there; instead, set its current score to its new score and add it back to the candidate pool

Assuming the instance's affinity group assignments are fixed (so that there are a fixed number of "allow"-level rules that it can break) I think this will always terminate (fixed number of rules implies a maximum possible score, and retrying always increases the threshold score).
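
As a rough sketch of that loop (everything here is hypothetical: the types, the scoring and reservation helpers, and the heap-based candidate pool are only meant to illustrate the retry-with-rising-threshold idea and its termination argument):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

type SledId = u64;

/// Hypothetical: the number of "allow"-level affinity rules that placing the
/// instance on this sled would violate, recomputed from fresh data.
fn current_score(_sled: SledId) -> u32 {
    0
}

/// Hypothetical: attempt the reservation; may fail (e.g. the sled filled up).
fn try_reserve(_sled: SledId) -> bool {
    true
}

/// Candidates arrive with the score computed when they were first read.
fn place(candidates: Vec<(SledId, u32)>) -> Option<SledId> {
    // Min-heap keyed on score: lower score == fewer broken "allow" rules.
    let mut pool: BinaryHeap<Reverse<(u32, SledId)>> = candidates
        .into_iter()
        .map(|(sled, score)| Reverse((score, sled)))
        .collect();

    while let Some(Reverse((threshold, sled))) = pool.pop() {
        let score = current_score(sled);
        if score > threshold {
            // The sled got worse since it was scored: don't reserve, just put
            // it back with its new, higher score. Scores are bounded by the
            // number of "allow"-level rules, so this happens finitely often.
            pool.push(Reverse((score, sled)));
            continue;
        }
        if try_reserve(sled) {
            // Still at least as good as when it was scored: reserve here.
            return Some(sled);
        }
        // The reservation itself failed (e.g. out of capacity); drop the sled.
    }
    None
}
```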


Even if my recollection is accurate, I'm not sure I would bother going too far with this in this PR, mainly because I'm inclined to think of the "allow"-level affinity rules as being entirely best-effort: Nexus might eventually decide to violate them for other reasons (e.g. "co-locating these instances minimizes fragmentation" or "separating these instances is necessary to evacuate this sled for update"), even for instances that were "ideally" placed originally. If that's so (and especially if we expect that the system may eventually choose to migrate VMs to optimize their placements, including their adherence to affinity rules; RFD 494 hints very lightly at this), I'm not sure I would do a lot of extra work in this path to make sure that instances always end up with an ideal initial placement.

I don't feel too strongly about this, though; if implementing a scoring system like the one above turns out to be pretty easy to do, I wouldn't be opposed to having it. But I think what's here is good enough to merge if something more sophisticated would take a lot of extra time or add a lot of complexity.

Base automatically changed from vmm-reserve-bench to main March 4, 2025 20:02
@smklein merged commit e8bfadf into main Mar 20, 2025
16 checks passed
@smklein deleted the vmm-reduce-contention branch March 20, 2025 16:40