feat: Merge Decouple PodGroup KEP (#5832) into Gang Scheduling (#4671) #5980
helayoty wants to merge 1 commit into kubernetes:master
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: helayoty
Needs approval from an approver in each of the affected files.
/assign @wojtek-t
| owning-sig: sig-scheduling | ||
| participating-sigs: | ||
| - sig-apps | ||
| - sig-api-machinery |
This was feedback during the PodGroup KEP review, since it's a new API.

No, api-machinery does not own APIs.
api-machinery owns the machinery for exposing APIs, which we don't influence or change with this KEP.
| see-also: | ||
| - "/keps/sig-scheduling/583-coscheduling" | ||
| - "/keps/sig-scheduling/5832-decouple-podgroup-api" |
Let's not replace, but add a new entry.

Sorry, but aren't we going to remove 5832-decouple-podgroup-api?

Good question. I thought we should mark it as abandoned?
But maybe we should indeed remove it.

I'm all for removing (or archiving) it and moving it to the `replaces` section, similar to coscheduling.
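For reference, a rough sketch of what the `replaces` route could look like in kep.yaml. `see-also` and `replaces` are fields from the KEP template's kep.yaml; which section each path ends up in is an assumption here, since the thread above has not settled it.

```yaml
# Hypothetical kep.yaml fragment for KEP-4671; the placement of each path is illustrative only.
see-also:
  - "/keps/sig-scheduling/583-coscheduling"
replaces:
  # Assumes KEP-5832 is folded into this KEP rather than kept as a separate entry.
  - "/keps/sig-scheduling/5832-decouple-podgroup-api"
```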
| The longer version of this design describing the whole thought process of choosing the | ||
| above described approach can be found in the [extended proposal] document. | ||
| [extended proposal]: https://docs.google.com/document/d/1ulO5eUnAsBWzqJdk_o5L-qdq5DIVwGcE7gWzCQ80SCM/edit? |
I just moved it down with other references.
| - `Workload` represents long-lived configuration-intent, whereas `PodGroups` represent transient units of scheduling. | ||
| Tying runtime execution units to the persistent definition object violates separation of concerns. | ||
| - Lifecycle coupling prevents standalone `PodGroup` objects from owning other resources (e.g., ResourceClaims) |
This sentence is a bit confusing. Maybe: "Decoupling the lifecycles allows standalone PodGroup objects to own resources (e.g., ResourceClaims). This enables garbage collection to be scoped to specific scheduling units, rather than tying it to the entire Workload or individual Pods."?

The text was copied from the original (approved) KEP, so I avoided commenting on things that were simply copied.
| - `Workload` becomes a scheduling policy object that defines scheduling constraints and requirements. | ||
| - `PodGroupTemplate` provides the blueprint for runtime `PodGroup` creation. | ||
| - `PodGroup` is a controller-owned runtime object with its own lifecycle that represents a single scheduling unit. |
It doesn't have to be controller-owned. So maybe: "`PodGroup` is a standalone runtime object with its own lifecycle, typically managed by a controller, that represents a single scheduling unit."?
| - Introduce a concept of a `PodGroup` positioned as runtime counterparts for the Workload | ||
| - Ensure that decoupled model of `Workload` and `PodGroup` provide clear responsibility split, improved scalability and simplified lifecycle management | ||
| - Enhance status ownership by making `PodGroup` status track podGroup-level runtime state | ||
| - Ensure proper ownership of `PodGroup` objects via controller `ownerReferences` |
Is this really a goal? It looks like a means to achieve something else. What about saying here: "Enable automatic lifecycle management and resource cleanup for PodGroup objects through integration with Kubernetes garbage collection."?
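To make the garbage-collection point concrete, here is a rough sketch of a `PodGroup` carrying an owner reference. The `ownerReferences` fields are standard Kubernetes object metadata; the `PodGroup` `apiVersion`, the object names, the Job owner, and the UID are placeholders assumed for illustration, not taken from the KEP.

```yaml
# Illustrative only: the PodGroup apiVersion and all names are assumptions.
apiVersion: scheduling.k8s.io/v1alpha1   # assumed group/version
kind: PodGroup
metadata:
  name: training-job-workers             # hypothetical name
  namespace: default
  ownerReferences:
    - apiVersion: batch/v1
      kind: Job                          # hypothetical owning workload object
      name: training-job
      uid: 00000000-0000-0000-0000-000000000000  # placeholder; must match the owner's real UID
      controller: true
      blockOwnerDeletion: true
# spec omitted: its fields are defined by the KEP and not reproduced here
```

With a reference like this in place, deleting the owner lets the Kubernetes garbage collector remove the PodGroup as a dependent, which is the outcome the suggested goal wording describes.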
| - Ensure proper ownership of `PodGroup` objects via controller `ownerReferences` | ||
| - Ensuring that we can extend `Workload` API in backward compatible way toward north-star API | ||
| - Ensuring that `Workload` API will be usable for both built-in and third-party workload controllers and APIs | ||
| - Simplify integration with `Workload` API and true workload[^6] controllers to make `Workload` API |
I'm not sure "Simplify integration" is the right goal here. Since this KEP is introducing the Workload API, there isn't an existing integration to simplify yet. This feels like a bit of an "inception".
The previous version ("Ensuring that Workload API will be usable...") seemed more accurate, as it describes a key property of the new API (its universality). If we want to emphasize ease of use, maybe something like: "Ensure the Workload and PodGroup APIs provide a consistent and accessible integration path for both built-in and third-party controllers."?
Force-pushed: 06d9e7a to 8e14592
Signed-off-by: helayoty <heelayot@microsoft.com>
Force-pushed: 8e14592 to 2ad2005
Consolidates KEP-5832 (Decouple PodGroup API) into KEP-4671 (Gang Scheduling) so that the decoupled PodGroup design lives in a single, self-contained KEP rather than being split across two documents.
KEP-5832 was created as a companion to KEP-4671 to detail the PodGroup decoupling design. Maintaining two separate KEPs for what is effectively one feature creates confusion for reviewers and implementers. Merging them produces a single authoritative document that is easier to review, approve, and track through the enhancement process.
/sig scheduling
/area workload-aware