diff --git a/keps/NNNN-kep-template/README.md b/keps/NNNN-kep-template/README.md index 7018679e98f..44f03cb0e87 100644 --- a/keps/NNNN-kep-template/README.md +++ b/keps/NNNN-kep-template/README.md @@ -373,13 +373,23 @@ Below are some examples to consider, in addition to the aforementioned [maturity - Gather feedback from developers and surveys - Complete features A, B, C - Additional tests are in Testgrid and linked in KEP +- More rigorous forms of testing—e.g., downgrade tests and scalability tests +- All functionality completed +- All security enforcement completed +- All monitoring requirements completed +- All testing requirements completed +- All known pre-release issues and gaps resolved + +**Note:** Beta criteria must include all functional, security, monitoring, and testing requirements along with resolving all issues and gaps identified #### GA - N examples of real-world usage - N installs -- More rigorous forms of testing—e.g., downgrade tests and scalability tests - Allowing time for feedback +- All issues and gaps identified as feedback during beta are resolved + +**Note:** GA criteria must not include any functional, security, monitoring, or testing requirements. Those must be beta requirements. 
**Note:** Generally we also wait at least two releases between beta and GA/stable, because there's no opportunity for user feedback, or even bug reports, diff --git a/keps/prod-readiness/sig-architecture/5241.yaml b/keps/prod-readiness/sig-architecture/5241.yaml new file mode 100644 index 00000000000..2bf3ef02ec2 --- /dev/null +++ b/keps/prod-readiness/sig-architecture/5241.yaml @@ -0,0 +1,3 @@ +kep-number: 5241 +stable: + approver: "@johnbelamaric" diff --git a/keps/sig-architecture/5241-beta-featuregate-promotion-requirements/README.md b/keps/sig-architecture/5241-beta-featuregate-promotion-requirements/README.md new file mode 100644 index 00000000000..ae9cfb35f96 --- /dev/null +++ b/keps/sig-architecture/5241-beta-featuregate-promotion-requirements/README.md @@ -0,0 +1,231 @@ + +# KEP-5241: Beta Feature Gate Promotion Requirements + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [Risks and Mitigations](#risks-and-mitigations) + - [What if I need to add capability to my feature?](#what-if-i-need-to-add-capability-to-my-feature) + - [Who will make sure that new KEPs follow the promotion rules?](#who-will-make-sure-that-new-keps-follow-the-promotion-rules) + - [Graduation Criteria](#graduation-criteria) +- [Drawbacks](#drawbacks) + - [This may slow the rate that new features are promoted.](#this-may-slow-the-rate-that-new-features-are-promoted) +- [Alternatives](#alternatives) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. 
+ +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +Feature gates must include all functional, security, monitoring, and testing requirements along with +resolving all issues and gaps identified prior to being enabled by default. +The only valid GA criteria are “all issues and gaps identified as feedback during beta are resolved”.
+ +## Motivation + +Feature gates that are enabled by default are enabled in every production Kubernetes cluster in the world. +We must avoid turning every production cluster into a testing cluster for unstable or incomplete features. +Even feature gates that make flags accessible but require a secondary configuration to use must be +stable, because it is unrealistic to expect everyone to understand the graduation stages of various flags +for each release: the only stages that really matter are "takes enabling an explicit alpha feature gate" +and "my production cluster accepts this as valid by default". + +### Goals + +* Feature gates must include all functional, security, monitoring, and testing requirements along with + resolving all issues and gaps identified prior to being enabled by default. +* The only valid GA criteria are “all issues and gaps identified as feedback during beta are resolved”. + +### Non-Goals + +* Changing the beta-APIs-off-by-default rules. +* Changing the imperfect mechanisms we have for API evolution. + +## Proposal + +Kubernetes feature gates have four levels: GA (locked-on), GA (disable-able), Beta, and Alpha. +1. GA (locked-on) means that a feature gate is unconditionally enabled in all production Kubernetes clusters and + that feature cannot be disabled. +2. GA (disable-able) is only for feature gates that include a new API serialization that cannot be enabled by default + until the API reaches stable. This means that the first time the API is enabled in production, the feature will + be GA, but can also be disabled. This is a less common state and does not apply to most features. +3. Beta means that a feature gate is usually enabled in all production Kubernetes clusters by default + and that feature can be disabled. + Exceptions exist for entirely new APIs and some node features, but this is broadly the case. +4.
Alpha means that a feature gate is disabled in all production Kubernetes clusters by default and + can be optionally enabled by setting a `--feature-gates` command line argument. + +Making the jump to GA (cannot be disabled) without actual field experience is irresponsible. +The first time a feature gate is enabled by default in production Kubernetes clusters, we must +have a way to disable the feature in case of unexpected stability, performance, or security issues. + +Enabling incomplete features in production Kubernetes clusters by default is irresponsible. +Features that are known to be incomplete naturally bring with them additional stability, performance, and security issues. +Once a feature has been enabled in a production Kubernetes cluster by default, adding to it carries +greater risk to upgrading clusters and the ecosystem. +The feature can easily have become relied upon by workloads and other platform extensions. +If adding those capabilities causes a stability, performance, or security problem, the +cost of disabling those features in a cluster becomes significantly greater, and disabling them breaks existing +clusters, workloads, and use-cases. +This posture makes upgrades higher risk than necessary. + +To balance these concerns, we are changing how we evaluate Beta and GA stability criteria. +The only valid GA criteria are “all issues and gaps identified as feedback during beta are resolved”. +Promotion from Beta to GA must involve no significant change in that release. +This means that Beta criteria must include all functional, security, monitoring, and testing requirements along +with resolving all issues and gaps identified prior to beta. + +Phasing in larger features over time can be done by bringing separate feature gates through alpha, beta, and GA. +Each feature gate needs to meet the beta and GA criteria for completeness, functionality, security, monitoring, and testing.
+After meeting the criteria for being enabled by default, and at the SIG's discretion, the new feature gate could be +set to enabled by default in the release in which it is introduced. +Importantly, the features need to behave in a way that allows old and new clients to interoperate, and new additions +to larger features need to be independently disablable, each with its own path to GA. + +### Risks and Mitigations + +#### What if I need to add capability to my feature? +To handle this situation, we described above how to add a second feature gate for the new behavior. +This provides a mechanism for adding needed capability, but ensures that +cluster-admins never end up stuck after upgrade because they rely on v1.Y-1 behavior that new capability +in v1.Y broke under the same feature gate. + +#### Who will make sure that new KEPs follow the promotion rules? +We'll adjust the KEP template to indicate the allowed criteria, so authors should notice. +SIG approvers should enforce those standards. +PRR approvers can be a final backstop. + +### Graduation Criteria + +This document is our new position once merged until it is superseded by another position statement. + +## Drawbacks + +### This may slow the rate that new features are promoted. +For this to be true, we would have to have previously enabled feature gates in production that were known to be +incomplete in functionality, security, monitoring, or testing, or that had known bugs. +We hope this was not the common case, but if it was common enough to have an impact, we're pleased that +the result is preventing incomplete feature gates from being enabled in production clusters. + +## Alternatives + +None proposed so far.
diff --git a/keps/sig-architecture/5241-beta-featuregate-promotion-requirements/kep.yaml b/keps/sig-architecture/5241-beta-featuregate-promotion-requirements/kep.yaml new file mode 100644 index 00000000000..603398c14c7 --- /dev/null +++ b/keps/sig-architecture/5241-beta-featuregate-promotion-requirements/kep.yaml @@ -0,0 +1,31 @@ +title: Beta Feature Gate Promotion Requirements +kep-number: 5241 +authors: + - "@deads2k" +owning-sig: sig-architecture +participating-sigs: +status: implemented +creation-date: 2025-04-02 +reviewers: + - "@liggitt" + - "@thockin" +approvers: + - "@johnbelamaric" + - "@dims" + - "@derekwaynecarr" + +see-also: + - "/keps/sig-architecture/3136-beta-apis-off-by-default" +replaces: + +stage: stable + +latest-milestone: "v1.34" + +milestone: + stable: "v1.34" + +feature-gates: +disable-supported: false + +metrics: