Improve compatibility with kstatus: avoid "resource is ready" race #9299
This issue is currently awaiting triage. If CAPI contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label.
This was discussed on Slack two days ago with @killianmuldoon, who suggested that I file an issue.
@fabriziopandini WDYT? Seems sort of okay to me, although I'm not a huge fan of doing something like this just for Flux. Is this some sort of standard? (I assume observedGeneration 0 would work, if we didn't use omitempty?)
Quite intrigued by this statement; Flux is a graduated CNCF project, why wouldn't we want better synergy between CNCF projects? PS. As @tmmorin mentioned in this issue, this is about Kubernetes kstatus compatibility. Kstatus is a library used in kubectl and also Flux; maybe other CNCF projects use it too. Docs here: https://github.com/kubernetes-sigs/cli-utils/tree/master/pkg/kstatus
I didn't want to say that we wouldn't want to be compatible with Flux. I just wondered if there is some standard across the ecosystem (which includes more than Flux). For example, at the moment Cluster API itself is using Condition types which diverge from the standard k/k conditions (afaik Cluster API and k/k conditions were introduced at ~ the same time). One interesting difference is that in k/k, observedGeneration is part of the conditions themselves (xref: https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L1497).

We have been wondering for a while whether there is a way to make our status / status.conditions more standard, so we can move utils like our patch helper to controller-runtime instead of requiring our own in Cluster API. It would be nice if we could find a way for our conditions to align with Kubernetes-ecosystem-wide standards. I assume the Kubernetes condition type is compatible? (I mean, it basically has to be, right? :))

So setting an observedGeneration default of -1 would be a stopgap to make Cluster API conditions compatible, but wouldn't it become a problem if we later want to move to k/k conditions? (Because they don't have a default of -1, as far as I can tell.)
The issue here is not with conditions but with resources just created that don't have a `status` set yet.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its standard rules.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its standard rules.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to its standard rules.

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen Let's revisit this given the ongoing work on #11947, especially the work on moving to metav1.Conditions. I'm not sure if we can entirely follow what is laid out here: https://github.com/kubernetes-sigs/cli-utils/tree/master/pkg/kstatus e.g.:
Cluster API follows https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties, which states:
Example: We have an "Available" condition that is "True" if the Cluster is available.

What is not entirely clear to me is why kstatus considers a CR that has a status subresource with an empty status as fully reconciled. I guess they want to handle the case where a CR has a status but it's valid that even after reconcile the status is entirely empty.

@stefanprodan If we added the default value for status.observedGeneration, what would this improve on the Flux side nowadays? I think you'll still need configuration so Flux knows which condition to look for? (The full list of conditions of the Cluster v1beta2 type can be seen here: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240916-improve-status-in-CAPI-resources.md#cluster-newconditions.) Related: what is the information you are looking for? The v1beta1 Cluster Ready condition doesn't cover much (especially if a Cluster is e.g. upgraded or workers are scaled up).

@JoelSpeed What is your take on setting a default of observedGeneration -1? I couldn't find anything about that in the API conventions.

P.S. It was mentioned above that kstatus is used in kubectl. Is there a simple kubectl command I can use with our Cluster CR to play around with this?
@sbueringer: Reopened this issue.
@sbueringer: after upgrading to v1beta2, the current health checks will fail, so Flux users will need to update their CEL expression and switch to the v1beta2 conditions.
I think something like this should work
(Note for other readers (discussed offline): this won't actually use v1beta1, it will use the preferred version, which is why we have to handle both conditions as soon as v1beta2 has been added to the CAPI CRD.)
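For concreteness, a Flux-side custom health check along these lines might look roughly like the following. This is a sketch only: the `healthCheckExprs` field shape and the exact CEL expressions are assumptions based on Flux's custom health check feature, not text taken from this thread, so consult the Flux documentation for the real field names.

```yaml
# Hypothetical Flux Kustomization fragment (field names and CEL
# expressions are assumptions, not confirmed by this thread).
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: capi-clusters
spec:
  # ...interval, path, sourceRef omitted...
  healthCheckExprs:
    - apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      current: status.conditions.filter(c, c.type == 'Ready').all(c, c.status == 'True')
      failed: status.conditions.filter(c, c.type == 'Ready').all(c, c.status == 'False')
```

Because evaluation runs against the preferred API version served by the cluster (per the note above), an expression like this would need to tolerate both the v1beta1 and v1beta2 condition sets during a migration.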
Why does 0 not work, even with omitempty?
Why would a custom resource without status be considered ready? A lack of status, assuming the resource has a status to set, clearly means the controller hasn't got to it yet
Observed generation already has a sensible default when the object is created: omitted, just like the rest of the status. I'm not sure why we would want to set this to a non-obvious default. From an end-user perspective, a lack of status clearly means "nothing has happened yet", whereas a partial status with a default value looks odd, and it isn't necessarily obvious why there's a partial status. I'm currently -1 on setting any defaults on status objects right now.
Thx for the feedback everyone.
I think based on that there would be no benefit anymore in setting a default of -1 for status.observedGeneration.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to its standard rules.

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What would you like to be added (User Story)?
There is low-hanging fruit here: an easy change that would make CAPI resources play much nicer with tools relying on kstatus to know whether a CAPI resource is ready.
Detailed Description
CAPI custom resources can be considered "Ready" by the kstatus library before they actually are ready.
In a context where a tool relying on kstatus is used (e.g. FluxCD), this opens the door to inconsistencies: wrongly concluding that something is ready, and triggering things that depend on it too early.
This typically happens very shortly after resource creation, for a very short period of time, and resolves quickly on its own. But there is a race condition if a tool using kstatus looks at the resource during the problematic time window.
The problematic time is when, for instance, a Cluster CR has no `status` yet, or when it has `status` but no `status.conditions` yet, with only `status.observedGeneration` being set to 1 (equal to `metadata.generation`). As soon as the resource is processed by its controller, the controller will set its `status` to include in `status.conditions` a condition of type `Ready` with status `False`, and then kstatus will report a correct result (InProgress, meaning "not ready yet").

The typical solution to this issue is to ensure that the CRD defines a default of -1 for `status.observedGeneration`; this is sufficient to let the kstatus library ignore the rest and conclude that the resource isn't ready yet.

Quoting @stefanprodan (FluxCD dev):
Example here (FluxCD does this on their own CRDs):
https://github.com/fluxcd/source-controller/blob/a302c71c57e370403042a2e307e3f4446b539730/api/v1/gitrepository_types.go#L328
Anything else you would like to add?
#5472 was opened a while ago and is strongly related, but has a much wider scope than what I described here, which focuses only on keeping the InProgress/ready state free of this resource-creation race condition.
Label(s) to be applied
/kind feature
/area api