Commit 891db0d

fixes based on feedback
1 parent 2b3a71d commit 891db0d

1 file changed, +31 -21 lines changed

keps/sig-scheduling/5004-dra-extended-resource/README.md

Lines changed: 31 additions & 21 deletions
@@ -9,6 +9,7 @@
 - [Proposal](#proposal)
 - [Design Details](#design-details)
 - [Device Class API](#device-class-api)
+- [Implicit Extended Resource Name](#implicit-extended-resource-name)
 - [Resource Claim API](#resource-claim-api)
 - [Pod API](#pod-api)
 - [Scheduling for Extended Resource backed by DRA](#scheduling-for-extended-resource-backed-by-dra)
@@ -214,7 +215,7 @@ non-goals of this KEP.
 extended resource requests.
 
 * Enable application operators to use the existing extended resource request in
-pod spec to request for DRA resources.
+pod spec to request DRA resources.
 
 * Extended resource support is not added just for easing the transition to DRA
 for the short term. Its ease of use is one big advantage to keep it remaining
@@ -265,7 +266,7 @@ device plugin, extended resource backed by DRA, and dynamic resource.
 * extended resource backed by device plugin uses pod's
 spec.containers[].resources.requests to request for resources, it consumes the capacity
 from node's status.capacity. It is of type (string, int64)
-* dynamic resource uses `ResourceClaim` to request for resources, and
+* dynamic resource uses `ResourceClaim` to request resources, and
 `ResourceSlice` to provide resource capacity. A pod asks for resources through
 resource claim requests in pod's spec.resources.claims. Dynamic resource type
 is described in resource slice, simply speaking, it is a list of devices, with
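
As a rough illustration of the two capacity models described in this hunk, the Go sketch below contrasts a device-plugin extended resource, which is just a (name, count) pair checked against a node counter, with a dynamic resource, which is a list of devices from which individual devices are picked. The `Device` type, its `Class` field, the class name `gpu.example.com`, and the helpers `fitsScalar` and `freeDevices` are hypothetical simplifications; real DRA matches devices published in `ResourceSlice` objects to a `DeviceClass` through selectors, not through a class label on each device.

```go
package main

import "fmt"

// ScalarCapacity models device-plugin extended resources: a plain
// (name, count) pair per resource, as in the Node's status.capacity.
type ScalarCapacity map[string]int64

// Device is a hypothetical, heavily simplified device entry. Real DRA devices
// are published in ResourceSlices and selected by a DeviceClass; here each
// device simply carries the name of the class it belongs to.
type Device struct {
	Name      string
	Class     string
	Allocated bool
}

// fitsScalar checks a device-plugin-style request against the node counter.
func fitsScalar(capacity ScalarCapacity, used map[string]int64, name string, req int64) bool {
	return capacity[name]-used[name] >= req
}

// freeDevices returns the names of req unallocated devices of the given
// class, or nil if the node does not have enough of them.
func freeDevices(devices []Device, class string, req int) []string {
	if req <= 0 {
		return []string{}
	}
	var picked []string
	for _, d := range devices {
		if d.Class == class && !d.Allocated {
			picked = append(picked, d.Name)
			if len(picked) == req {
				return picked
			}
		}
	}
	return nil
}

func main() {
	capacity := ScalarCapacity{"foo.domain/bar": 4}
	used := map[string]int64{"foo.domain/bar": 1}
	fmt.Println(fitsScalar(capacity, used, "foo.domain/bar", 3)) // true

	devices := []Device{
		{Name: "gpu-0", Class: "gpu.example.com", Allocated: true},
		{Name: "gpu-1", Class: "gpu.example.com"},
		{Name: "gpu-2", Class: "gpu.example.com"},
	}
	fmt.Println(freeDevices(devices, "gpu.example.com", 2)) // [gpu-1 gpu-2]
	fmt.Println(freeDevices(devices, "gpu.example.com", 3)) // []
}
```

The point of the contrast: a counter only says "how many", whereas a device list lets the allocator hand specific devices to the pod, which is what the special `ResourceClaim` later records.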
@@ -365,7 +366,7 @@ garbage collector.
 * It is *deleted*
 * either together with the owning pod's deletion.
 * or by the scheduler dynamic resource plugin during unReserve phase.
-* or by the scheduler dynamic resource plugin during PostFilter phase.
+* or by the scheduler dynamic resource plugin during postFilter phase.
 * It is *discovered* by the kubelet via `pod.Status.ExtendedResourceClaimStatus`
 * It is *read* by the kubelet DRA device driver to prepare the devices listed
 therein when preparing to run the pod.
@@ -391,13 +392,15 @@ resource requests. For example, if the first container in the pod has an
 extended resource backed by DRA which is the 3rd such request in the container,
 then the name of the `DeviceRequest` is "container-0-request-2".
 
-Documenting this naming is merely informational, it is not part of the API. The kubelet must not rely on it. Instead, the `ContainerExtendedResourceRequest` field below specifies the mapping.
+Documenting this naming is merely informational; it is not part of the API.
+The kubelet must not rely on it. Instead, the
+`ContainerExtendedResourceRequest` field below specifies the mapping.
 
 ### Pod API
 
 A new field `extendedResourceClaimStatus` is added to Pod's status to track
-the special `ResouceClaim` object created for the extended resource requests
-in the pod. This is needed for kublet to pass the devices allocated by driver
+the special `ResourceClaim` object created for the extended resource requests
+in the pod. This is needed for the kubelet to pass the devices allocated by the driver
 to the containers in the pod.
 
 ```go
@@ -454,12 +457,12 @@ then the pod's status is like below:
 
 ```yaml
 status:
-  extendedResourceClaimStatus:
-  - names:
-    - container-name
-    - foo.domain/bar
-    - container-0-request-2
-    resourceClaimName: ccc-gpu-57999b9c4c-vpq68-gpu-8s27z
+  extendedResourceClaimStatus:
+    names:
+    - containerName: container-name
+      extendedResourceName: foo.domain/bar
+      requestName: container-0-request-2
+    resourceClaimName: ccc-gpu-57999b9c4c-vpq68-gpu-8s27z
 ```
 where `deviceRequest` name is "container-0-request-2", and container-name is the first container
 in the pod, foo.domain/bar is the 3rd extended resource in the container's requests.
@@ -494,10 +497,10 @@ type Resource struct {
 	ScalarResources map[v1.ResourceName]int64
 
 	// NEW!
-	// DynamicResources: keep track of extended resources backed by DRA to device classes
-	// The map's key is the extended resource name that has at least one device
+	// DynamicResources maps each extended resource backed by DRA to its device class.
+	// The map's key is an extended resource name such that exactly one device
 	// class advertises it.
-	DynamicResources map[v1.ResourceName][]string
+	DynamicResources map[v1.ResourceName]string
 }
 ```
 
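
A minimal sketch of how the `DynamicResources` map could be derived from a `DeviceClass` snapshot under the "exactly one device class" rule. The `DeviceClass` struct here is a hypothetical two-field stand-in, keys are plain strings instead of `v1.ResourceName` to keep the example free of Kubernetes imports, and reporting conflicting names separately (rather than, say, picking one of the classes) is an assumption of this sketch, not behavior defined by the KEP.

```go
package main

import (
	"fmt"
	"sort"
)

// DeviceClass is a hypothetical minimal view of a DeviceClass: its name and
// the extended resource name it advertises (empty if none).
type DeviceClass struct {
	Name                 string
	ExtendedResourceName string
}

// buildDynamicResources returns the extended-resource-name -> device-class map.
// Names advertised by more than one class are left out of the map and returned
// as conflicts (conflict handling is an assumption of this sketch).
func buildDynamicResources(classes []DeviceClass) (map[string]string, []string) {
	owners := map[string][]string{}
	for _, c := range classes {
		if c.ExtendedResourceName != "" {
			owners[c.ExtendedResourceName] = append(owners[c.ExtendedResourceName], c.Name)
		}
	}
	result := map[string]string{}
	var conflicts []string
	for res, cls := range owners {
		if len(cls) == 1 {
			result[res] = cls[0]
		} else {
			conflicts = append(conflicts, res)
		}
	}
	sort.Strings(conflicts) // map iteration order is random; make output stable
	return result, conflicts
}

func main() {
	dyn, conflicts := buildDynamicResources([]DeviceClass{
		{Name: "gpu-class", ExtendedResourceName: "foo.domain/bar"},
		{Name: "fpga-a", ExtendedResourceName: "foo.domain/fpga"},
		{Name: "fpga-b", ExtendedResourceName: "foo.domain/fpga"},
		{Name: "plain-class"},
	})
	fmt.Println(dyn)       // map[foo.domain/bar:gpu-class]
	fmt.Println(conflicts) // [foo.domain/fpga]
}
```

Switching the value type from `[]string` to `string` is what makes the one-to-one rule explicit: a lookup by extended resource name yields a single device class or nothing.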
@@ -509,7 +512,7 @@ a snapshot of all the nodes in the cluster, and updates their corresponding
 
 For the scheduler with DRA enabled, right after taking the node snapshot, the
 scheduler also takes a snapshot of `DeviceClass`, and updates
-`NodeInfo.DynamicResources` if there is extended resource backed by DRA.
+`NodeInfo.DynamicResources` if there is an extended resource backed by DRA.
 
 For a node with extended resources from device plugin, its NodeInfo's
 Allocatable.ScalarResources is updated with the k8s `Node`'s object.
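
The snapshot step can be pictured as below: device plugin capacity is copied from each `Node` into `Allocatable.ScalarResources`, and the `DeviceClass`-derived map is attached as `Allocatable.DynamicResources` only when some extended resource is backed by DRA. `Node`, `NodeInfo`, `snapshot`, and `backing` are hypothetical simplifications of the scheduler's types, and the rule in `backing` that a node reporting the resource in its allocatable is served by the device plugin first is an assumption made for illustration.

```go
package main

import "fmt"

// Resource is a simplified stand-in for the scheduler's Resource struct;
// only the two fields discussed in the KEP are kept, with string keys.
type Resource struct {
	ScalarResources  map[string]int64
	DynamicResources map[string]string
}

// NodeInfo is a hypothetical, trimmed-down scheduler NodeInfo.
type NodeInfo struct {
	Name        string
	Allocatable Resource
}

// Node is a hypothetical node view holding only the extended resources
// that device plugins report in the Node object's status.
type Node struct {
	Name                string
	AllocatableExtended map[string]int64
}

// snapshot copies device plugin capacity from each Node and attaches the
// DeviceClass-derived map. DeviceClass is cluster-scoped, so every node
// shares the same (read-only) map here.
func snapshot(nodes []Node, dynamic map[string]string) []NodeInfo {
	infos := make([]NodeInfo, 0, len(nodes))
	for _, n := range nodes {
		scalar := make(map[string]int64, len(n.AllocatableExtended))
		for k, v := range n.AllocatableExtended {
			scalar[k] = v
		}
		ni := NodeInfo{Name: n.Name, Allocatable: Resource{ScalarResources: scalar}}
		if len(dynamic) > 0 { // only when an extended resource is backed by DRA
			ni.Allocatable.DynamicResources = dynamic
		}
		infos = append(infos, ni)
	}
	return infos
}

// backing reports how a node would serve an extended resource request
// (device plugin first is an assumption of this sketch).
func backing(ni NodeInfo, name string) string {
	if _, ok := ni.Allocatable.ScalarResources[name]; ok {
		return "device plugin"
	}
	if class, ok := ni.Allocatable.DynamicResources[name]; ok {
		return "DRA via device class " + class
	}
	return "unavailable"
}

func main() {
	nodes := []Node{
		{Name: "node-a", AllocatableExtended: map[string]int64{"foo.domain/bar": 2}},
		{Name: "node-b"},
	}
	infos := snapshot(nodes, map[string]string{"foo.domain/bar": "gpu-class"})
	for _, ni := range infos {
		fmt.Println(ni.Name, backing(ni, "foo.domain/bar"))
	}
	// node-a device plugin
	// node-b DRA via device class gpu-class
}
```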
@@ -692,14 +695,19 @@ ensure `ExtendedResourceName`s are handled by the scheduler as described in this
 
 #### Beta
 
-- Gather feedback
+- Gather feedback from developers and surveys
+- 3 examples of vendors making use of the extensions proposed in this KEP
+- Scalability tests that mirror real-world usage as determined by user feedback
 - Additional tests are in Testgrid and linked in KEP
+- All functionality completed
+- All security enforcement completed
+- All testing requirements completed
+- All known pre-release issues and gaps resolved
 
-#### GA
 
-- 3 examples of vendors making use of the extensions proposed in this KEP
-- Scalability tests that mirror real-world usage as determined by user feedback
+#### GA
 - Allowing time for feedback
+- All issues and gaps identified as feedback during beta are resolved
 
 ### Upgrade / Downgrade Strategy
 
@@ -902,7 +910,9 @@ No.
 
 ### Scalability
 
-No. The API extensions in this KEP are limited to at most one claim for extended resource backed by DRA per pod.
+###### Will enabling / using this feature result in any new API calls?
+
+Yes. The scheduler makes new API calls to create, update, and delete the special resource claim for the extended resource backed by DRA.
 
 ###### Will enabling / using this feature result in introducing new API types?
 