[FEATURE] Allow the Cluster Autoscaler startup-taint prefix in NodeReadinessRule taint keys #279

Description

@rajathagasthya

Is your feature request related to a problem or existing issue? Please describe.

Cluster Autoscaler treats a taint as a startup taint in two ways: the --startup-taint flag, or the reserved key prefix startup-taint.cluster-autoscaler.kubernetes.io/, which is auto-detected without any flag.

On managed platforms the flag is not user-editable, so the reserved prefix is the only way to get startup-taint semantics:

  • GKE documents the prefix as the only supported mechanism (docs).
  • AKS supports boot-time taints via --node-init-taints, but its autoscaler does not expose --startup-taint (Azure/AKS#3276).

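NodeReadinessRule validation currently requires taint keys to start with readiness.k8s.io/, and a single taint key cannot carry both that prefix and the Cluster Autoscaler prefix. So on GKE and AKS, the Node Readiness Controller (NRC) cannot be the component that removes the startup taint.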

Concrete use case

Gating scheduling on autoscaled GPU nodes: the node pool template applies a startup taint, Node Problem Detector publishes a GPU readiness node condition, and NRC removes the taint when the condition is True. This works with self-managed Cluster Autoscaler (where --startup-taint can be set) but not on GKE or AKS.

Describe the solution you'd like

Extend the validation to also accept the Cluster Autoscaler startup-taint prefix:

// +kubebuilder:validation:XValidation:rule="self.key.startsWith('readiness.k8s.io/') || self.key.startsWith('startup-taint.cluster-autoscaler.kubernetes.io/')"

Optionally also allow the legacy ignore-taint.cluster-autoscaler.kubernetes.io/ prefix, which older autoscaler versions auto-detect.
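
For illustration, the proposed rule can be mirrored in plain Go. This is only a sketch of the prefix check, not the controller's actual code: `allowedTaintKeyPrefixes` and `hasAllowedPrefix` are hypothetical names, and the legacy prefix is included on the assumption that the optional extension is adopted.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedTaintKeyPrefixes lists the prefixes discussed in this issue.
// The last entry is the optional legacy prefix; drop it if only the
// two-prefix rule is wanted.
var allowedTaintKeyPrefixes = []string{
	"readiness.k8s.io/",
	"startup-taint.cluster-autoscaler.kubernetes.io/",
	"ignore-taint.cluster-autoscaler.kubernetes.io/",
}

// hasAllowedPrefix is a hypothetical helper that reports whether key
// starts with one of the allowed prefixes, like the CEL startsWith checks.
func hasAllowedPrefix(key string) bool {
	for _, prefix := range allowedTaintKeyPrefixes {
		if strings.HasPrefix(key, prefix) {
			return true
		}
	}
	return false
}

func main() {
	// Example keys; the "gpu-not-ready" suffix is made up for illustration.
	keys := []string{
		"readiness.k8s.io/gpu-not-ready",
		"startup-taint.cluster-autoscaler.kubernetes.io/gpu-not-ready",
		"example.com/gpu-not-ready",
	}
	for _, key := range keys {
		fmt.Printf("%s allowed=%t\n", key, hasAllowedPrefix(key))
	}
}
```

A check like this could back a unit test that keeps the CEL rule and any Go-side validation in agreement.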

Describe alternatives you've considered

  • Admin-configured prefix allowlist (controller flag). More general, but CRD-level CEL cannot read controller flags, so apply-time validation would move to the reconciler (or a webhook).
  • Cluster Autoscaler auto-detecting readiness.k8s.io/ as a startup-taint namespace. A good long-term change on the autoscaler side, but this requires coordination among multiple providers.
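
To make the first alternative concrete, here is a rough sketch of what reconciler-side validation against an admin-configured allowlist might look like. Everything in it is an assumption: no such flag exists today, the comma-separated value format is invented, and `parsePrefixAllowlist` and `validateTaintKey` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePrefixAllowlist splits a comma-separated value (as a hypothetical
// controller flag might supply it) into prefixes, trimming whitespace and
// dropping empty entries.
func parsePrefixAllowlist(raw string) []string {
	var prefixes []string
	for _, p := range strings.Split(raw, ",") {
		if p = strings.TrimSpace(p); p != "" {
			prefixes = append(prefixes, p)
		}
	}
	return prefixes
}

// validateTaintKey returns nil if key starts with any allowed prefix and an
// error otherwise. In this alternative the check would run in the reconciler
// (or a webhook) instead of in CRD-level CEL.
func validateTaintKey(key string, prefixes []string) error {
	for _, p := range prefixes {
		if strings.HasPrefix(key, p) {
			return nil
		}
	}
	return fmt.Errorf("taint key %q does not match any allowed prefix %v", key, prefixes)
}

func main() {
	prefixes := parsePrefixAllowlist("readiness.k8s.io/, startup-taint.cluster-autoscaler.kubernetes.io/")
	fmt.Println(validateTaintKey("startup-taint.cluster-autoscaler.kubernetes.io/gpu-not-ready", prefixes))
	fmt.Println(validateTaintKey("example.com/gpu-not-ready", prefixes))
}
```

The trade-off noted above still applies: an invalid key would be rejected only at reconcile or admission time, not when the CRD schema is evaluated.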
