[WIP] CNTRLPLANE-371: Update to Kubernetes v1.33 #2261
base: master
…port direct spec.nodeName changes.
…ImgChangeE2E Add e2e test for Regular Container image change
Optimize DS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.
test: switch gotestsum quiet output format
DRA device taints: fix some race conditions
WebSocket HTTPS Proxy support
[PodLevelResources] Pod Level Hugepage Resources
…opagation APIServerTracing: Respect trace context only for privileged users
CI integration scripts: reduce log noise from installing etcd
KEP-4742: Copy topology labels from Node objects to Pods upon binding/scheduling
…d_of_caching_cluster_events_in_binding Call queue.Done() before PreBind phase, removing the pod in binding from inFlightPods to save memory
[KEP-2371] add test about container metrics from cadvisor
…size disable in-place pod vertical scaling for swap enabled pods
Signed-off-by: carlory <[email protected]>
Signed-off-by: carlory <[email protected]>
add device-plugin-test e2e log
Remove general available feature-gate CPUManager
…ularContainerImgChangeE2E Revert "Add e2e test for Regular Container image change"
The defaulting of TimeAdded randomly broke some of the tests:

TestList:
resttest.go:1393: expected: []runtime.Object{(*resource.DeviceTaintRule)(0xc000b83080), (*resource.DeviceTaintRule)(0xc000b831e0)}, got: []runtime.Object{(*resource.DeviceTaintRule)(0xc0003db608), (*resource.DeviceTaintRule)(0xc0003db750)}
...

TestCreate:
resttest.go:346: unexpected obj: &resource.DeviceTaintRule{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"foo2", GenerateName:"", Namespace:"", SelfLink:"", UID:"18d3084d-7d11-4575-8730-4650b81cf1a7", ResourceVersion:"8", Generation:1, CreationTimestamp:time.Date(2025, time.March, 21, 8, 27, 23, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:resource.DeviceTaintRuleSpec{DeviceSelector:(*resource.DeviceTaintSelector)(nil), Taint:resource.DeviceTaint{Key:"example.com/taint", Value:"", Effect:"NoExecute", TimeAdded:time.Date(2025, time.March, 21, 8, 27, 23, 0, time.Local)}}}, expected &resource.DeviceTaintRule{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"foo2", GenerateName:"", Namespace:"", SelfLink:"", UID:"18d3084d-7d11-4575-8730-4650b81cf1a7", ResourceVersion:"8", Generation:1, CreationTimestamp:time.Date(2025, time.March, 21, 8, 27, 23, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:resource.DeviceTaintRuleSpec{DeviceSelector:(*resource.DeviceTaintSelector)(nil), Taint:resource.DeviceTaint{Key:"example.com/taint", Value:"", Effect:"NoExecute", TimeAdded:time.Date(2025, time.March, 21, 8, 27, 24, 0, time.Local)}}}

Failure rate before:
3m40s: 1332 runs so far, 7 failures (0.53%)

It's not obvious from the test failure, but the difference is the TimeAdded. Setting it beforehand to a value that can be encoded (i.e. truncated to seconds) fixes the flake.

Failure rate after:
5m0s: 1825 runs so far, 0 failures
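The point about choosing a value "that can be encoded" can be illustrated with a minimal sketch. It is not the test code from the commit; it assumes an encoding with second precision (RFC 3339 without fractional seconds), and the helper names `roundTrip`, `survivesRoundTrip`, and `sampleTimes` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// roundTrip encodes t as RFC 3339 with second precision and decodes it again.
// This stands in for any serialization that drops sub-second precision.
func roundTrip(t time.Time) (time.Time, error) {
	return time.Parse(time.RFC3339, t.UTC().Format(time.RFC3339))
}

// survivesRoundTrip reports whether t is the same instant after roundTrip.
func survivesRoundTrip(t time.Time) bool {
	got, err := roundTrip(t)
	return err == nil && got.Equal(t)
}

// sampleTimes returns a timestamp with a non-zero sub-second part and the
// same timestamp truncated to whole seconds.
func sampleTimes() (raw, truncated time.Time) {
	raw = time.Date(2025, time.March, 21, 8, 27, 23, 500000000, time.UTC)
	return raw, raw.Truncate(time.Second)
}

func main() {
	raw, truncated := sampleTimes()
	fmt.Println("raw survives round trip:", survivesRoundTrip(raw))
	fmt.Println("truncated survives round trip:", survivesRoundTrip(truncated))
}
```

A test that sets the timestamp itself to a second-truncated value compares equal after encoding and does not depend on when defaulting happens to run.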
…force 2nd labeling to make tests work
Adding a new mutation plugin that handles the following:
1. In case of a `workload.openshift.io/enable-shared-cpus` request, it adds an annotation to hint the runtime about the request. The runtime is not aware of extended resources, hence we need the annotation.
2. It validates the pod's QoS class and returns an error if it's not a guaranteed QoS class.
3. It validates that no more than a single resource is being requested.
4. It validates that the pod is deployed in a namespace that has the mixedcpus workloads allowed annotation.

For more information see - openshift/enhancements#1396

Signed-off-by: Talor Itzhak <[email protected]>

UPSTREAM: <carry>: Update management webhook pod admission logic

Updating the logic for pod admission to allow a pod created with workload partitioning annotations to run in a namespace that has no workload allow annotations. The pod will be stripped of its workload annotations and treated as if it were normal; a warning annotation will be placed on the pod to note the behavior.

Signed-off-by: ehila <[email protected]>

UPSTREAM: <carry>: add support for cpu limits into management workloads

Added support to allow workload partitioning to use the CPU limits for a container. To allow the runtime to make better decisions around workload CPU quotas, we pass down the CPU limit as part of the cpulimit value in the annotation. CRI-O will take that information and calculate the quota per node. This should support situations where workloads might have different CPU period overrides assigned. Updated kubelet for static pods and the admission webhook for regular pods to support CPU limits. Updated unit tests to reflect the changes.

Signed-off-by: ehila <[email protected]>
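As a rough illustration of why passing the limit rather than a precomputed quota helps when CPU periods differ per node, here is a sketch of the generic CFS relationship between a CPU limit in millicores and a quota. It is not CRI-O's code; the function name `cfsQuotaMicros` and the sample periods are assumptions.

```go
package main

import "fmt"

// cfsQuotaMicros converts a CPU limit in millicores into a CFS quota in
// microseconds for the given CFS period (also in microseconds).
// Generic relationship: quota = limit(millicores) * period / 1000.
func cfsQuotaMicros(limitMilli, periodMicros int64) int64 {
	return limitMilli * periodMicros / 1000
}

func main() {
	// The same 500m limit yields different quotas under different periods,
	// so the quota has to be computed where the period is known.
	fmt.Println(cfsQuotaMicros(500, 100000)) // period of 100ms
	fmt.Println(cfsQuotaMicros(500, 50000))  // hypothetical 50ms override
}
```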
…ject openshift feature gates into pkg/features Signed-off-by: Swarup Ghosh <[email protected]>
This is a short-term fix. Once we improve the cert rotation logic in library-go so that it no longer depends on this hack, we can remove this carry patch.

squash with the previous PR during the rebase openshift#1924

squash with the previous PRs during the rebase openshift#1924 openshift#1929
…phase and graceful termination phase

This reverts commit 85f0f2c.

UPSTREAM: <carry>: fix request Host storing in openshift.io/during-graceful audit log annotation

The request URL doesn't contain the host used in the request; instead, it should be fetched from the request headers.

Note for rebase: squash it into the following commit vrutkovs@a83d289

UPSTREAM: <carry>: annotate audit events for requests during unready phase and graceful termination phase (openshift#2077)

When an audit message is being processed, https://github.com/openshift/kubernetes/blob/309f240e18f1da87bbe86c18746774d6d302f8ef/staging/src/k8s.io/apimachinery/pkg/util/proxy/transport.go#L136-L174 may strip `Host` from `r.URL`; however, `r.Host` is always filled in. This value may be different for proxy requests, but in most cases `r.Host` should be used instead of `r.URL.Host`.
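The `r.Host` versus `r.URL.Host` distinction can be sketched with the standard `net/http` types. This is not the carry patch itself; `requestHost` and `hostFor` are hypothetical helpers, and falling back to `r.URL.Host` is an assumption added for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// requestHost prefers r.Host, which carries the host the client asked for,
// and only falls back to r.URL.Host when r.Host is empty.
func requestHost(r *http.Request) string {
	if r.Host != "" {
		return r.Host
	}
	if r.URL != nil {
		return r.URL.Host
	}
	return ""
}

// hostFor builds a request with the given Host field and URL host and
// returns what requestHost picks.
func hostFor(host, urlHost string) string {
	return requestHost(&http.Request{
		Host: host,
		URL:  &url.URL{Host: urlHost, Path: "/readyz"},
	})
}

func main() {
	// A typical server-side request: the URL has only a path, Host is set.
	fmt.Println(hostFor("api.example.com:6443", ""))
}
```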
…navailable errors for the etcd health checker client UPSTREAM: <carry>: replace newETCD3ProberMonitor with etcd3RetryingProberMonitor
This commit fixes bug 1919737. https://bugzilla.redhat.com/show_bug.cgi?id=1919737 * pkg/proxy/iptables/proxier.go (syncProxyRules): Prefer a local endpoint for the cluster DNS service.
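The idea of preferring a local endpoint can be sketched as a simple filter. This is not the `syncProxyRules` change; the `endpoint` type and `preferLocal` function are hypothetical and only show the selection rule: use endpoints on the same node when any exist, otherwise keep all of them.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified view of a service endpoint.
type endpoint struct {
	IP       string
	NodeName string
}

// preferLocal returns only the endpoints on the given node when at least one
// exists; otherwise it returns the full list unchanged.
func preferLocal(eps []endpoint, node string) []endpoint {
	var local []endpoint
	for _, ep := range eps {
		if ep.NodeName == node {
			local = append(local, ep)
		}
	}
	if len(local) > 0 {
		return local
	}
	return eps
}

func main() {
	eps := []endpoint{
		{IP: "10.128.0.10", NodeName: "node-a"},
		{IP: "10.129.0.10", NodeName: "node-b"},
	}
	fmt.Println(preferLocal(eps, "node-b"))
	fmt.Println(preferLocal(eps, "node-c"))
}
```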
…admission Signed-off-by: chiragkyal <[email protected]>
Similarly to what we do for the managed CPU (aka workload partitioning) feature, introduce a master configuration file `/etc/kubernetes/openshift-llc-alignment`, which needs to be present for the LLC alignment feature to be activated, in addition to the policy option being required. Note that this replaces the standard upstream feature gate check. This can be dropped when the feature per KEP kubernetes/enhancements#4800 goes beta.

Signed-off-by: Francesco Romani <[email protected]>
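A file-presence gate of this kind can be sketched as follows. This is not the kubelet code; `llcAlignmentEnabled` and its parameters are hypothetical, and only the file path comes from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// llcAlignmentEnabled reports whether the feature should be active: the
// policy option must be set AND the configuration file must be present.
func llcAlignmentEnabled(configPath string, policyOptionSet bool) bool {
	if !policyOptionSet {
		return false
	}
	_, err := os.Stat(configPath)
	return err == nil
}

func main() {
	const path = "/etc/kubernetes/openshift-llc-alignment"
	fmt.Println("enabled:", llcAlignmentEnabled(path, true))
}
```

The check treats any `os.Stat` error as "not present", which keeps the feature off when the file cannot be inspected.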
Signed-off-by: ehila <[email protected]>
Explicitly exclude etcd and etcd-readiness checks (OCPBUGS-48177) and have etcd operator take responsibility for properly reporting etcd readiness. Justification: kube-apiserver instances get removed from a load balancer when etcd starts to report not ready (as will KA's /readyz). Client connections can withstand etcd unreadiness for longer than the readiness timeout. Thus, it is not necessary to drop connections in case etcd resumes its readiness before a client connection times out naturally. This is a downstream patch only as OpenShift's way of using etcd is unique.
The existing patch retried any etcd error returned from storage with the code "Unavailable". Writes can only be safely retried if the client can be absolutely sure that the initial attempt ended before persisting any changes. The "Unavailable" code includes errors like "timed out" that can't be safely retried for writes.
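The retry rule described above can be sketched as a small decision function. This is not the carry patch; `storageError`, its `DefinitelyNotApplied` field, and `shouldRetry` are hypothetical, and treating reads as always retriable is an assumption made for the sketch.

```go
package main

import "fmt"

type opKind int

const (
	opRead opKind = iota
	opWrite
)

// storageError is a hypothetical stand-in for an error from the storage layer.
type storageError struct {
	Code string // for example "Unavailable"
	// DefinitelyNotApplied is true only when the client is certain the
	// attempt ended before any change was persisted.
	DefinitelyNotApplied bool
}

// shouldRetry retries "Unavailable" errors for reads, but for writes only
// when the first attempt is known not to have persisted anything.
func shouldRetry(op opKind, err storageError) bool {
	if err.Code != "Unavailable" {
		return false
	}
	if op == opRead {
		return true
	}
	return err.DefinitelyNotApplied
}

func main() {
	timedOut := storageError{Code: "Unavailable", DefinitelyNotApplied: false}
	fmt.Println("retry read:", shouldRetry(opRead, timedOut))
	fmt.Println("retry write:", shouldRetry(opWrite, timedOut))
}
```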
Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: authorization: add minimumkubeletversion package

MinimumKubeletVersion is a way for an admin to declare that nodes older than the minimum version cannot authorize with the apiserver. This effectively prevents them from joining. Doing so means the apiservers can trust that newer features are usable on clusters with version skews.

Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: authorizer: move mininum kubelet version authorizer to pkg/kubeapiserver and add authorization mode

This does require a line of code to be moved from the enablement package to stop a cyclical import.

Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: crdvalidation: move latency profile file to be agnostic of field

Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: features: add MinimumKubeletVersion feature

Signed-off-by: Peter Hunt <[email protected]>
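The version comparison behind a minimum-kubelet-version check can be sketched as below. This is not the `minimumkubeletversion` package; `parseVersion` and `olderThan` are hypothetical helpers that assume plain `vMAJOR.MINOR.PATCH` strings without pre-release suffixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "v1.30.0" or "1.30.0" into [major, minor, patch].
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q: want major.minor.patch", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// olderThan reports whether the kubelet version is strictly older than the
// configured minimum; such a node would be denied authorization.
func olderThan(kubelet, minimum string) (bool, error) {
	k, err := parseVersion(kubelet)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range k {
		if k[i] != m[i] {
			return k[i] < m[i], nil
		}
	}
	return false, nil // equal versions are allowed
}

func main() {
	older, err := olderThan("v1.29.4", "v1.30.0")
	fmt.Println(older, err)
}
```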
Upstream enables volume group snapshots by editing yaml files in a shell script [1]. We can't use this script in openshift-tests. Create a brand new, OCP specific test driver based on csi-driver-hostpath, only with the --feature-gate=VolumeGroupSnapshot on external-snapshotter command line. We will need to carry this patch until the feature graduates to GA. I've chosen to create brand new files in this carry patch, so it can't conflict with the existing ones. 1: https://github.com/kubernetes/kubernetes/blob/91d6fd3455c4a071408df20c7f48df221f2b6d30/test/e2e/testing-manifests/storage-csi/external-snapshotter/volume-group-snapshots/run_group_snapshot_e2e.sh
…service account groups
5ff71f3 to 9cd691d
/test integration
@bertinatto: The following tests failed, say
This is a temporary PR created to get the initial steps of the kube bump ready.
This will be closed before starting the payload-testing phase.