[nexus] Affinity and Anti-Affinity Groups #7076
Conversation
```rust
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub enum AffinityGroupMember {
    Instance(Uuid),
```
I know this is an "enum of one", but RFD 522 discusses having anti-affinity groups which contain "either instances or affinity groups". I figured I'd just define these as "members" to be flexible for future work
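For illustration only, here is a minimal sketch of how that member enum might grow if RFD 522's "anti-affinity groups containing affinity groups" idea lands later (the `AffinityGroup` variant is hypothetical and not part of this PR):

```rust
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use uuid::Uuid;

// Hypothetical future shape: anti-affinity groups could contain either
// individual instances or whole affinity groups, per RFD 522. Only the
// `Instance` variant exists in this PR.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub enum AntiAffinityGroupMember {
    /// A single instance belonging to the group.
    Instance(Uuid),
    /// An entire affinity group treated as one member (not yet implemented).
    AffinityGroup(Uuid),
}
```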
We respond with a member ID here, which isn't the most readable way for the user to see the group members. They'd then need to fetch each one to get the name – and while we could chain those queries in the console, the CLI wouldn't do that. Any thoughts? Perhaps responding with ID, name, and project?
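One possible shape, sketched here purely for discussion (the type and field names are illustrative, not the PR's actual API), would be a member view carrying both the ID and the name, so the console and CLI can render a readable list without an extra fetch per member:

```rust
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use uuid::Uuid;

// Illustrative response type only: returns enough information to render
// group members by name without additional lookups.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub enum AffinityGroupMemberView {
    Instance { id: Uuid, name: String },
}
```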
Hopefully a quick drive-by review wasn't too premature, but this stuff is relevant to my interests, so I wanted to take a peek. Overall, everything looks very straightforward and reasonable --- I commented on a few small things, but it's quite possible you were planning to get to all of them and just hadn't gotten around to it yet.
nexus/db-model/src/affinity.rs (outdated diff)
```rust
#[derive(Queryable, Insertable, Clone, Debug, Selectable)]
#[diesel(table_name = affinity_group_instance_membership)]
pub struct AffinityGroupInstanceMembership {
    pub group_id: DbTypedUuid<AffinityGroupKind>,
    pub instance_id: DbTypedUuid<InstanceKind>,
}

#[derive(Queryable, Insertable, Clone, Debug, Selectable)]
#[diesel(table_name = anti_affinity_group_instance_membership)]
pub struct AntiAffinityGroupInstanceMembership {
    pub group_id: DbTypedUuid<AntiAffinityGroupKind>,
    pub instance_id: DbTypedUuid<InstanceKind>,
}
```
I note that these lack created/deleted timestamps, implying that:
- we intend to hard-delete rather than soft-delete them, and,
- we don't presently record when instances were added to affinity/anti-affinity groups, so we can't present that in UIs in the future.
I'm not sure if either of these matter to us, but I figured I'd comment on it.
When I wrote this I was kinda planning on just using hard deletion for membership. I could definitely add the "time_modified" and "time_deleted" columns, and use soft-deletion here too, but as usual, we'll need to be more cautious with our indexing.
("Why hard delete" -> this was kinda arbitrary, my decision here isn't super strong, but I'm more familiar with us using soft deletion for user-facing objects that have full CRUD APIs, and hard-deletion for more internal-facing stuff, to avoid the cost of garbage collecting later, which we haven't really done at all)
That's totally fair!
The reason I brought up deletion was because if we start out with a schema that uses hard-deletion, and decide to switch to soft-deletion later in order to do something like display affinity group histories in the UI, we can't get back records from before that change, since...they've been deleted. On the other hand, if we started with soft-deletion, we could switch to hard-deletion and blow away any soft-deleted records if we decide to not use them in that way.
On the other hand, maybe the problem of displaying historical affinity group changes is better solved by other things, like audit logging! I dunno.
```rust
async fn sled_reservation_create_inner(
```
FYI @hawkw, you asked about this in the Hypervisor Sync.
The original API of the function `pub async fn sled_reservation_create` still exists above, but I made this "inner" function to get better testing of errors. But we could also report these errors through e.g. an alerting system or something.
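The wrapper-plus-inner pattern being described looks roughly like the following self-contained sketch (the types and the capacity check are simplified stand-ins, not the actual datastore code):

```rust
/// Structured error that unit tests can match on precisely.
#[derive(Debug, PartialEq)]
enum ReservationError {
    NotEnoughCapacity,
}

/// Public entry point: keeps the original API surface and flattens the
/// detailed error into the externally visible error type (a String here).
pub async fn sled_reservation_create(requested: u64) -> Result<u64, String> {
    sled_reservation_create_inner(requested)
        .await
        .map_err(|e| format!("reservation failed: {e:?}"))
}

/// Inner function returning a structured error, so tests can assert on
/// exactly which failure occurred rather than string-matching messages.
async fn sled_reservation_create_inner(requested: u64) -> Result<u64, ReservationError> {
    const CAPACITY: u64 = 128;
    if requested > CAPACITY {
        return Err(ReservationError::NotEnoughCapacity);
    }
    Ok(requested)
}
```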
```diff
@@ -312,6 +312,19 @@ pub type PaginatedByNameOrId<Selector = ()> = PaginationParams<
pub type PageSelectorByNameOrId<Selector = ()> =
    PageSelector<ScanByNameOrId<Selector>, NameOrId>;

pub fn id_pagination<'a, Selector>(
```
I'm using this in `http_entrypoints.rs` to paginate over "group members", which only have UUIDs, not names.
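Conceptually, paginating by ID is just cursor pagination over a sorted set of UUIDs; a minimal stand-alone sketch of the idea (not the actual dropshot/nexus pagination machinery) is:

```rust
use uuid::Uuid;

// Cursor-style pagination over members that only have IDs: return up to
// `limit` IDs strictly greater than the cursor, in sorted order.
fn page_of_members(all_ids: &[Uuid], cursor: Option<Uuid>, limit: usize) -> Vec<Uuid> {
    let mut ids: Vec<Uuid> = all_ids.to_vec();
    ids.sort();
    ids.into_iter()
        .filter(|id| cursor.map_or(true, |c| *id > c))
        .take(limit)
        .collect()
}
```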
```rust
    instance_id,
).get_results_async::<(AffinityPolicy, Uuid)>(&conn).await?;

// We use the following logic to calculate a desirable sled,
```
This is sorta the "meat and potatoes" of this PR -- specifically, updating the logic for instance placement based on affinity groups.
This file also contains several tests for various combinations of affinity groups / instances, and I'm always happy to add more.
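As a rough mental model of that placement logic (a simplified, self-contained sketch; the real implementation lives in `nexus/db-queries/src/db/datastore/sled.rs` and works against the database): each group's policy either hard-constrains or merely biases the candidate sled set.

```rust
use std::collections::HashSet;
use uuid::Uuid;

// Stand-in for the database enum of the same name.
enum AffinityPolicy {
    Fail,
    Allow,
}

/// Pick a sled for an instance given the sleds that already host members of
/// its affinity and anti-affinity groups. "Fail" policies become hard
/// constraints; "allow" policies only bias the choice.
fn pick_sled(
    candidates: &HashSet<Uuid>,
    affinity_sleds: &[(AffinityPolicy, Uuid)],
    anti_affinity_sleds: &[(AffinityPolicy, Uuid)],
) -> Result<Uuid, &'static str> {
    let mut banned = HashSet::new();
    let mut required = HashSet::new();
    let mut preferred = HashSet::new();
    let mut unpreferred = HashSet::new();

    // Anti-affinity: avoid sleds that already host other group members.
    for (policy, sled) in anti_affinity_sleds {
        match policy {
            AffinityPolicy::Fail => { banned.insert(*sled); }
            AffinityPolicy::Allow => { unpreferred.insert(*sled); }
        }
    }
    // Affinity: favor sleds that already host other group members.
    for (policy, sled) in affinity_sleds {
        match policy {
            AffinityPolicy::Fail => { required.insert(*sled); }
            AffinityPolicy::Allow => { preferred.insert(*sled); }
        }
    }

    // Eligible sleds must not be banned, and must be in the required set if
    // any "fail"-policy affinity constraint exists.
    let eligible: HashSet<Uuid> = if required.is_empty() {
        candidates - &banned
    } else {
        &(&required & candidates) - &banned
    };

    // Among eligible sleds, prefer "allow"-affinity sleds and avoid
    // "allow"-anti-affinity sleds where possible.
    eligible
        .iter()
        .max_by_key(|sled| {
            (preferred.contains(*sled) as u8, !unpreferred.contains(*sled) as u8)
        })
        .copied()
        .ok_or("no eligible sled satisfies the affinity constraints")
}
```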
```diff
@@ -213,7 +213,7 @@ CREATE INDEX IF NOT EXISTS lookup_sled_by_policy_and_state ON omicron.public.sle
);

CREATE TYPE IF NOT EXISTS omicron.public.sled_resource_kind AS ENUM (
-    -- omicron.public.instance
+    -- omicron.public.vmm ; this is called "instance" for historical reasons.
```
This tripped me up. This is not the UUID of an instance, it's the UUID of a propolis VMM that has been allocated. These are very subtly different objects.
```sql
-- The UUID of an instance, if this resource belongs to an instance.
instance_id UUID
```
... I do, however, want to know the instance UUID of this object, if it exists.
I don't think I can rely on a row in the `omicron.public.vmm` table existing by the time reservations are made - note that, e.g. for instance start, "allocating a server" happens before "VMM record creation":
omicron/nexus/src/app/sagas/instance_start.rs
Lines 136 to 138 in 7f05fce
```rust
builder.append(alloc_server_action());
builder.append(alloc_propolis_ip_action());
builder.append(create_vmm_record_action());
```
So - when we create a sled_resource record on behalf of an instance, we need a way for other requests to identify "what instance does this VMM belong to?". Hence the addition of this optional field.
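In other words, the optional column lets a later query answer "which sleds hold resources for instance X?" even before the VMM record exists. A tiny in-memory sketch of that lookup (the row type here is a simplified stand-in for `sled_resource`, not the actual model):

```rust
use uuid::Uuid;

// Simplified stand-in for a sled_resource row; the row's own ID is the
// VMM's UUID, and `instance_id` is optional because not every resource
// belongs to an instance.
struct SledResourceRow {
    sled_id: Uuid,
    vmm_id: Uuid,
    instance_id: Option<Uuid>,
}

/// Which sleds currently hold resources reserved on behalf of `instance`?
/// This is exactly the lookup the new optional column enables.
fn sleds_hosting_instance(rows: &[SledResourceRow], instance: Uuid) -> Vec<Uuid> {
    rows.iter()
        .filter(|row| row.instance_id == Some(instance))
        .map(|row| row.sled_id)
        .collect()
}
```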
The contents of this PR should be unchanged, but as a heads-up, I'm going to be splitting it out into a few equivalent smaller PRs to make this easier to review:
Pulled out of #7076 Updates schema to include Affinity/Anti-Affinity groups, but does not use these schemas yet.
Sean, is there a fixed limit to the number of members in either affinity or anti-affinity groups? Other than the natural limit of sufficient sleds for anti-affinity and sufficient resources for affinity?
As currently implemented, no limit. We could definitely add one though.
I ask because there are places where we soft-validate in the console, before a request is even sent to the control plane. But I guess this depends on how many sleds and how much capacity are currently available, and if we set the limit to the maximum a user could potentially add, then it's unlikely they'd ever see that validation. (Talked myself out of that 🙂)
…#7447) Pulled out of #7076

This PR is a partial implementation of RFD 522. It adds:
- Affinity and Anti-Affinity groups, contained within projects. These groups are configured with a **policy** and **failure domain**, and can currently contain zero or more **members**. Affinity groups attempt to co-locate members; anti-affinity groups attempt to avoid co-locating members.
- **Policy** describes what to do if we cannot fulfill the co-location request. Currently, the options are "fail" (reject the request) or "allow" (continue with provisioning of the group member regardless).
- **Failure Domain** describes the scope of what is considered "co-located". In this PR, the only option is "sled", but in the future this may be expanded to e.g. "rack".
- **Members** describe what can be added to affinity/anti-affinity groups. In this PR, the only option is "instance". RFD 522 describes how "anti-affinity groups may also contain affinity groups" -- which is why this "member" terminology is introduced -- but it is not yet implemented.
- (Anti-)affinity groups are exposed by the API through a CRUD interface.
- (Anti-)affinity groups are considered during "sled reservation", where instances are placed on a sled. This is most significantly implemented (and tested) within `nexus/db-queries/src/db/datastore/sled.rs`, within #7446.

Fixes #1705
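For readers skimming the description, the policy and failure-domain concepts map onto small enums along these lines (an illustrative sketch; names approximate the API described above rather than quoting it):

```rust
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

/// What to do when the (anti-)co-location request cannot be fulfilled.
#[derive(Clone, Copy, Debug, Deserialize, Serialize, JsonSchema)]
#[serde(rename_all = "snake_case")]
pub enum AffinityPolicy {
    /// Reject the provisioning request.
    Fail,
    /// Continue provisioning the group member regardless.
    Allow,
}

/// The scope within which members count as "co-located". Only "sled" exists
/// in this PR; "rack" is a possible future extension.
#[derive(Clone, Copy, Debug, Deserialize, Serialize, JsonSchema)]
#[serde(rename_all = "snake_case")]
pub enum FailureDomain {
    Sled,
}
```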
Gonna close this PR - I'm trying to track things through smaller PRs and other issues. #7626 is my tracking issue for affinity follow-up work.