keps/sig-node/1287-in-place-update-pod-resources/README.md
74 lines changed: 74 additions & 0 deletions
@@ -881,6 +881,80 @@ Other components:
* check how the change of meaning of resource requests influences other
  Kubernetes components.

### Instrumentation

The kubelet will record the following metrics:

#### `kubelet_pod_resize_requests_total`

This metric tracks the total number of resize requests observed by the kubelet, counted at the pod level.
A single pod update changing multiple containers will be considered a single resize request.

Labels:
- `resource_type` - what type of resource is being resized. Possible values: `cpu_limits`, `cpu_requests`, `memory_limits`, or `memory_requests`. If more than one of these resource types is changing in the resize request,
  we increment the counter multiple times, once for each. This means that a single pod update changing multiple
  resource types will be considered multiple requests for this metric.
- `operation_type` - whether the resize is a net increase or a decrease (taken as an aggregate across
  all containers in the pod). Possible values: `increase`, `decrease`, `add`, or `remove`.

This metric is recorded as a counter.

#### `kubelet_container_resize_requests_total`

This metric tracks the total number of resize requests observed by the kubelet, counted at the container level.
A single pod update changing multiple containers will be considered separate resize requests, one per changed container.

Labels:
- `resource_type` - what type of resource is being resized. Possible values: `cpu_limits`, `cpu_requests`, `memory_limits`, or `memory_requests`. If more than one of these resource types is changing in the resize request,
  we increment the counter multiple times, once for each. This means that a single pod update changing multiple
  resource types will be considered multiple requests for this metric.
- `operation_type` - whether the resize is an increase or a decrease. Possible values: `increase`, `decrease`, `add`, or `remove`.

This metric is recorded as a counter.
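
The difference between pod-level and container-level counting can be illustrated with a minimal Go sketch. It is not the kubelet's implementation: the `Resources` map, the `labels` struct, and every function name are hypothetical, plain maps stand in for real metric counters, and the mapping of unset-to-set to `add` and set-to-unset to `remove` is an assumption (the text only lists the possible values). The pod-level "net" change is assumed to be a comparison of per-resource totals across containers.

```go
package main

import "fmt"

// Resources maps a resource_type label value to a quantity. A missing key
// means the value is not set. Units are arbitrary for this sketch.
type Resources map[string]int64

// labels is the (resource_type, operation_type) pair attached to a counter.
type labels struct{ resourceType, operationType string }

var resourceTypes = []string{"cpu_limits", "cpu_requests", "memory_limits", "memory_requests"}

// classify returns the operation_type for one resource, or "" if nothing
// changed. Treating unset->set as "add" and set->unset as "remove" is an
// assumption of this sketch.
func classify(before, after Resources, rt string) string {
	oldVal, oldSet := before[rt]
	newVal, newSet := after[rt]
	switch {
	case !oldSet && newSet:
		return "add"
	case oldSet && !newSet:
		return "remove"
	case newVal > oldVal:
		return "increase"
	case newVal < oldVal:
		return "decrease"
	}
	return ""
}

// countChanges increments one counter per changed resource type.
func countChanges(counts map[labels]int, before, after Resources) {
	for _, rt := range resourceTypes {
		if op := classify(before, after, rt); op != "" {
			counts[labels{rt, op}]++
		}
	}
}

// containerLevel counts every container's changes separately.
func containerLevel(current, desired map[string]Resources) map[labels]int {
	counts := map[labels]int{}
	for name, after := range desired {
		countChanges(counts, current[name], after)
	}
	return counts
}

// sum adds up each resource type across all containers in the pod.
func sum(containers map[string]Resources) Resources {
	total := Resources{}
	for _, res := range containers {
		for rt, v := range res {
			total[rt] += v
		}
	}
	return total
}

// podLevel compares pod-wide totals, so each resource type counts at most once.
func podLevel(current, desired map[string]Resources) map[labels]int {
	counts := map[labels]int{}
	countChanges(counts, sum(current), sum(desired))
	return counts
}

func main() {
	current := map[string]Resources{
		"app":     {"cpu_requests": 100, "memory_limits": 256},
		"sidecar": {"cpu_requests": 100},
	}
	desired := map[string]Resources{
		"app":     {"cpu_requests": 200, "memory_limits": 128},
		"sidecar": {"cpu_requests": 150},
	}
	cpuUp := labels{"cpu_requests", "increase"}
	memDown := labels{"memory_limits", "decrease"}
	pod, ctr := podLevel(current, desired), containerLevel(current, desired)
	fmt.Println("pod level:", pod[cpuUp], pod[memDown])
	fmt.Println("container level:", ctr[cpuUp], ctr[memDown])
}
```

In this example one pod update touches two containers and two resource types, so the pod-level counter moves once per resource type while the container-level counter moves once per container and resource type. Under the totals assumption, opposite changes in two containers that cancel out would not register at the pod level at all.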
914
+
915
+
#### `kubelet_pod_resize_sli_duration_seconds`
916
+
917
+
This metric tracks the latency between when the kubelet accepts a resize request and when it finshes actuating
918
+
the request. More precisely, this metric tracks the total amount of time that the `PodResizeInProgress` condition
919
+
is present on a pod.
920
+
921
+
Labels:
922
+
-`resource_type` - what type of resource is being resized. Possible values: `cpu_limits`, `cpu_requests``memory_limits`, or `memory_requests`. If more than one of these resource types is changing in the resize request,
923
+
we increment the counter multiple times, once for each.
924
+
-`operation_type` - whether the resize is an increase or a decrease. Possible values: `increase`, `decrease`, `add`, or `remove`.
925
+
926
+
This metric is recorded as a gauge.

#### `kubelet_pod_infeasible_resize_total`

This metric tracks the total count of resize requests that the kubelet marks as infeasible. This will make it
easier for us to see which of the current limitations users are running into the most.

Labels:
- `reason` - why the resize is infeasible. Although a more detailed "reason" will be provided in the `PodResizePending`
  condition in the pod, we limit this label to only the following possible values to keep cardinality low:
  - `guaranteed_pod_cpu_manager_static_policy` - In-place resize is not supported for Guaranteed Pods alongside CPU Manager static policy.
  - `guaranteed_pod_memory_manager_static_policy` - In-place resize is not supported for Guaranteed Pods alongside Memory Manager static policy.
  - `static_pod` - In-place resize is not supported for static pods.
  - `swap_limitation` - In-place resize is not supported for containers with swap.
  - `node_capacity` - The node doesn't have enough capacity for this resize request.

This list of possible reasons may shrink or grow depending on limitations that are added or removed in the future.

This metric is recorded as a counter.

#### `kubelet_pod_deferred_resize_accepted_total`

This metric tracks the total number of resize requests that the kubelet originally marked as deferred but
later accepted. This metric primarily exists because if a deferred resize is accepted through the timed retry as
opposed to being explicitly signaled, it indicates an issue in the kubelet's logic for handling deferred
resizes that we should fix.

Labels:
- `retry_reason` - whether the resize was accepted through the timed retry or explicitly signaled. Possible values: `timed`, `signaled`.

This metric is recorded as a counter.

### Static CPU & Memory Policy

Resizing pods with static CPU & memory policy configured is out-of-scope for the beta release of