
Cluster status.phase gets "Failed" forever once FailureMessage or FailureReason is set #10847

Closed
fariaass opened this issue Jul 8, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/support Categorizes issue or PR as a support question. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.


fariaass commented Jul 8, 2024

What steps did you take and what happened?

There was an infrastructure problem. The provider reported the error to CAPI, which wrote it to the status.failureMessage and status.failureReason fields of the Cluster CR. The problem was later resolved, but the cluster phase was never updated and the error message is still there.

What did you expect to happen?

I expected status.phase to become "Provisioned" and the status.failureMessage and status.failureReason fields to be cleared.

Cluster API version

1.7.1

Kubernetes version

1.28.5

Anything else you would like to add?

While reading the CAPI code, I noticed that status.failureMessage and status.failureReason are only updated when the provider reports a value for them; when the provider leaves them undefined, they are never reset. So once they are set with an error, they keep that error until a new error replaces it. The code where the fields are updated (or not): https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/cluster/cluster_controller_phases.go#L130-L144
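
To make the reported behavior concrete, here is a minimal, self-contained Go sketch. It is not the linked CAPI code: `ProviderStatus`, `ClusterStatus`, `syncFailures`, and `phase` are hypothetical stand-ins, and the sketch assumes the reconciler copies the provider's failure fields only when they are non-empty and treats any set failure field as "Failed".

```go
package main

import "fmt"

// ProviderStatus is a hypothetical stand-in for the failure fields
// reported by an infrastructure provider object.
type ProviderStatus struct {
	FailureReason  string
	FailureMessage string
}

// ClusterStatus is a hypothetical stand-in for the Cluster's status.
type ClusterStatus struct {
	FailureReason  *string
	FailureMessage *string
}

// syncFailures copies each failure field only when the provider reports
// a non-empty value (assumed behavior). Nothing resets a field once set.
func syncFailures(c *ClusterStatus, p ProviderStatus) {
	if p.FailureReason != "" {
		r := p.FailureReason
		c.FailureReason = &r
	}
	if p.FailureMessage != "" {
		m := p.FailureMessage
		c.FailureMessage = &m
	}
}

// phase is a simplification: it returns "Failed" whenever either
// failure field is set, and "Provisioned" otherwise.
func phase(c ClusterStatus) string {
	if c.FailureReason != nil || c.FailureMessage != nil {
		return "Failed"
	}
	return "Provisioned"
}

func main() {
	var c ClusterStatus

	// The provider reports an error.
	syncFailures(&c, ProviderStatus{FailureReason: "CreateError", FailureMessage: "quota exceeded"})
	fmt.Println(phase(c))

	// The provider recovers and no longer reports any failure.
	syncFailures(&c, ProviderStatus{})
	fmt.Println(phase(c))
}
```

Under these assumptions both calls print `Failed`: the second sync has nothing to copy, so the stale error stays on the Cluster.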

Label(s) to be applied

/kind bug

k8s-ci-robot added the kind/bug, needs-priority, and needs-triage labels Jul 8, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@adilGhaffarDev
Contributor

status.failureMessage and status.failureReason should only be set when the failure is terminal. When the failure is recoverable, you should not set them.

https://cluster-api.sigs.k8s.io/developer/providers/machine-infrastructure#behavior

@fariaass
Author

Got it, thanks! Just one question: you sent the machine docs. Is it the same for clusters?

@fabriziopandini
Member

Yes, it is the same for clusters.

/kind support
/close

k8s-ci-robot added the kind/support label Jul 17, 2024
@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.


@fabriziopandini
Member

FYI, in #10897 I'm proposing to get rid of the confusing concept of terminal failures.

4 participants