Description
What happened?
KAI-Scheduler v0.8.5 fails to run any scheduling cycles due to a mandatory dependency on Kueue Topology CRDs. The scheduler initializes successfully and all pods show Running status, but the scheduler never enters its main scheduling loop, making it completely non-functional.
Logs:
Go Version: go1.24.4
Go OS/Arch: linux/amd64
buildDate: 2025-09-04T08:40:47Z
gitCommit: b2c5b0c5f2a6ecb5ed2bf746a32bf23a049f9c21
gitTreeState: clean
gitVersion: v0.8.5
W1117 04:45:21.799217 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)
E1117 04:45:21.799351 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Topology: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)" logger="UnhandledError"
W1117 04:45:23.332043 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)
E1117 04:45:23.332082 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Topology: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)" logger="UnhandledError"
W1117 04:45:24.943298 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)
E1117 04:45:24.943340 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Topology: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)" logger="UnhandledError"
W1117 04:45:31.168641 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)
E1117 04:45:31.168682 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Topology: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)" logger="UnhandledError"
W1117 04:45:39.796797 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)
E1117 04:45:39.796835 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Topology: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)" logger="UnhandledError"
W1117 04:45:53.218642 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)
E1117 04:45:53.218707 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Topology: failed to list *v1alpha1.Topology: the server could not find the requested resource (get topologies.kueue.x-k8s.io)" logger="UnhandledError"
Code Location: pkg/scheduler/cache/cluster_info/cluster_info.go
In the Snapshot() function (called before every scheduling cycle):
    snapshot.Topologies, err = c.snapshotTopologies()
    if err != nil {
        return nil, err // ← Blocks entire scheduler
    }
Code Location: pkg/scheduler/cache/cluster_info/data_lister/interface.go
    ListTopologies() ([]*v1alpha1.Topology, error)
The Kueue v0.14.0 changelog notes that v1alpha1 changed to v1beta1.
Reference: https://github.com/kubernetes-sigs/kueue/releases/tag/v0.14.0
Kueue also plans to move to v1 in the future.
The KAI-Scheduler code explicitly looks for the v1alpha1 Topology CRDs and fails when the API server does not serve them.
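One possible way to cope with the version change is to pick whichever Topology API version the cluster actually serves, rather than hard-coding v1alpha1. The sketch below shows only the selection logic. The function name `pickTopologyVersion` and the preference order are my assumptions, and the list of served versions is passed in directly; in practice it would have to come from API discovery for the kueue.x-k8s.io group.

```go
package main

import "fmt"

// preferredVersions lists candidate Topology API versions from newest to
// oldest. The order, and the existence of a future "v1", are assumptions.
var preferredVersions = []string{"v1", "v1beta1", "v1alpha1"}

// pickTopologyVersion returns the most preferred version that the cluster
// serves. The boolean is false when none is served, in which case topology
// features would be disabled.
func pickTopologyVersion(served []string) (string, bool) {
	servedSet := make(map[string]struct{}, len(served))
	for _, v := range served {
		servedSet[v] = struct{}{}
	}
	for _, v := range preferredVersions {
		if _, ok := servedSet[v]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	// A cluster that serves only v1beta1.
	version, ok := pickTopologyVersion([]string{"v1beta1"})
	fmt.Println(version, ok) // prints "v1beta1 true"

	// A cluster without Kueue Topology CRDs at all.
	version, ok = pickTopologyVersion(nil)
	fmt.Printf("%q %v\n", version, ok) // prints `"" false`
}
```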
What did you expect to happen?
The scheduler should not fail when the Topology CRDs do not exist. Instead, it should disable topology features and keep running scheduling cycles.
Environment
- Kubernetes version: v1.30.14
- KAI Scheduler version: v0.8.5
- Cloud provider or hardware configuration: Enterprise onprem
- Tools that you are using KAI together with: Kueue v0.14.4