Commit 225b33b

fixes based on feedback
1 parent 8d808bf commit 225b33b

2 files changed, +26 -21 lines changed

keps/sig-scheduling/5004-dra-extended-resource/README.md

+19 -15
@@ -310,7 +310,7 @@ The mapping of DRA devices and extended resources is stored in k8s data store
 application that uses the devices.
 
 ```go
-// DeviceClassSpec is used in a [DeviceClass] to define what can be allocated
+// DeviceClassSpec is used in a DeviceClass to define what can be allocated
 // and how to configure it.
 type DeviceClassSpec struct {
 	// ExtendedResourceName defines a mapping to the extended resource API.
@@ -379,7 +379,7 @@ type DeviceRequest struct {
 	// Must be a DNS label.
 	//
 	// +required
-	Name string `json:"name" protobuf:"bytes,1,name=name"`
+	Name string
 }
 ```

@@ -408,28 +408,28 @@ to the containers in the pod.
 // resource requests backed by DRA. It stores the generated name for
 // the corresponding special ResourceClaim created by scheduler.
 type PodExtendedResourceClaimStatus struct {
-	// Names identifies the mapping of <container, extended resource backed by DRA> to device request.
+	// ResourceClaimName is the name of the ResourceClaim that was
+	// generated for the Pod in the namespace of the Pod.
+	ResourceClaimName string
+
+	// RequestMapping identifies the mapping of <container, extended resource backed by DRA> to device request.
 	// +patchMergeKey=requestName
 	// +patchStrategy=merge,retainKeys
 	// +listType=atomic
 	// +listMapKey=requestName
 	// +featureGate=DynamicResourceAllocation
-	Names []ContainerExtendedResourceRequest `json:"names" patchStrategy:"merge,retainKeys" patchMergeKey:"requestName" protobuf:"bytes,1,rep,name=names"`
-
-	// ResourceClaimName is the name of the ResourceClaim that was
-	// generated for the Pod in the namespace of the Pod.
-	ResourceClaimName string `json:"resourceClaimName" protobuf:"bytes,2,name=resourceClaimName"`
+	RequestMapping []ContainerExtendedResourceRequest
 }
 
 type ContainerExtendedResourceRequest struct {
 	// ContainerName is the unique container name within the pod.
-	ContainerName string `json:"containerName" protobuf:"bytes,1,name=containerName"`
+	ContainerName string
 	// ExtendedResourceName is the extended resource name backed by DRA inside
 	// the container's requests.
-	ExtendedResourceName string `json:"extendedResourceName" protobuf:"bytes,2,name=extendedResourceName"`
+	ExtendedResourceName string
 	// RequestName is the device request name in the special resource claim
 	// created for extended resource requests backed by DRA.
-	RequestName string `json:"requestName" protobuf:"bytes,3,name=requestName"`
+	RequestName string
 }
 
 type PodStatus struct {
@@ -449,11 +449,11 @@ then the pod's status is like below:
 ```yaml
 status:
   extendedResourceClaimStatus:
-    names:
+    resourceClaimName: ccc-gpu-57999b9c4c-vpq68-gpu-8s27z
+    requestMapping:
     - containerName: container-name
       extendedResourceName: foo.domain/bar
       requestName: container-0-request-2
-    resourceClaimName: ccc-gpu-57999b9c4c-vpq68-gpu-8s27z
 ```
 where `deviceRequest` name is "container-0-request-2", and container-name is the first container
 in the pod, foo.domain/bar is the 3rd extended resource in the container's requests.
@@ -462,7 +462,7 @@ Note the validations for extendedResourceClaimStatus are different from the
 validations for resourceClaimStatuses.
 
 1. resourceClaimStatuses requires `name` must be DNS label,
-extendedResourceClaimStatus's names' `containerName` and `RequestName` must
+extendedResourceClaimStatus's requestMapping's `containerName` and `RequestName` must
 be a DNS label, while the `extendedResourceName` is not a DNS label.
 1. resourceClaimStatuses requires `name` must be one of the claim's name in the
 pod spec. extendedResourceClaimStatus requires `containerName` must be one
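
Reviewer note: the first validation rule in this hunk (`containerName` and `requestName` must be DNS labels, `extendedResourceName` need not be) can be illustrated with a small standard-library check. This is a sketch under assumptions: `isDNSLabel` is a hypothetical helper that approximates an RFC 1123 label (lowercase alphanumerics and `-`, alphanumeric at both ends, at most 63 characters). The real API server validation presumably relies on the Kubernetes helper `validation.IsDNS1123Label` from `k8s.io/apimachinery`, which may differ in detail.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabelRE approximates an RFC 1123 DNS label: lowercase alphanumerics
// and '-', starting and ending with an alphanumeric character.
var dnsLabelRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isDNSLabel is a hypothetical helper, not the Kubernetes implementation.
// It also enforces the 63-character limit for a single label.
func isDNSLabel(s string) bool {
	return len(s) <= 63 && dnsLabelRE.MatchString(s)
}

func main() {
	// Values taken from the status example in the README.
	for _, v := range []string{"container-name", "container-0-request-2", "foo.domain/bar"} {
		fmt.Printf("%q is a DNS label: %v\n", v, isDNSLabel(v))
	}
}
```

With the values from the README example, `container-name` and `container-0-request-2` pass, while `foo.domain/bar` fails because it contains `.` and `/`, which is why `extendedResourceName` cannot be held to the same rule.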
@@ -963,7 +963,11 @@ For each of them, fill in the following information by copying the below templat
 Not required until feature graduated to beta.
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->
-Will be considered for beta.
+- [Pod pending due to extended resource backed by DRA requests no less than 128 devices]
+  - Detection: inspect pod status 'Pending'
+  - Mitigations: reduce the number of devices requested in one extended resource backed by DRA requests
+  - Diagnostics: scheduler logs at level 5 show the reason for the scheduling failure.
+  - Testing: Will be considered for beta.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?

keps/sig-scheduling/5004-dra-extended-resource/kep.yaml

+7 -6
@@ -5,15 +5,15 @@ authors:
 owning-sig: sig-scheduling
 participating-sigs:
   - sig-node
-status: draft
+status: implementable
 creation-date: 2025-02-03
 reviewers:
   - "@klueska"
   - "@johnbelamaric"
   - "@pohly"
 approvers:
   - "@mrunalp" # SIG-Node
-  - "@alculquicondor" # SIG-Scheduling
+  - "@dom4ha" # SIG-Scheduling
   - "@thockin" # API Review
 
 see-also:
@@ -25,13 +25,13 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.33"
+latest-milestone: "v1.34"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
-  alpha: "v1.33"
-  beta: "v1.34"
-  stable: "v1.35"
+  alpha: "v1.34"
+  beta: "v1.35"
+  stable: "v1.36"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -40,6 +40,7 @@ feature-gates:
     components:
       - kube-apiserver
       - kube-scheduler
+      - kubelet
 disable-supported: true
 
 # The following PRR answers are required at beta release
