-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include Condition Reason When Merging Conditions in CAPI v1beta2 #11899
Comments
This issue is currently awaiting triage. If CAPI contributors determine this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Some context: metav1.Condition type (and API conventions) define
The challenge is that while merging many conditions into one like we do in the cluster, we loose some details about the underlying conditions, and this is particularly relevant for the reason field, but also the message have the same issues about loosing information even if we have a little bit more room there and we came up with solutions like multiline messages. Going back to the request My first reaction is that, even if we find a way to add reasons to messages, I don't think that interacting with the Cluster solely allows to automatically root cause issues, because merged condition are only designed to try to point to where the problem might be. A few additional notes:
But of course I would like to hear opinion from other maintainers as well... |
I agree. We discussed this during the implementation of v1beta2 conditions. I don't see a good way to preserve reason when we merge multiple conditions into one. We only have one reason field in the resulting condition and message is not supposed to be machine readable |
Thanks @fabriziopandini and @sbueringer for your reply. In V1Beta2, the CAPI Cluster CR already reports the status of all underlying resources through conditions. This means we can potentially rely solely on CAPI cluster conditions to diagnose issues. For example, the ControlPlaneMachinesReady condition aggregates the Ready conditions of all control plane Machine CRs. If an infrastructure machine reports an error via its Ready condition, this condition is mirrored in the InfrastructureReady condition of the corresponding CAPI Machine. The CAPI Machine then merges all its conditions (including InfrastructureReady) into its own Ready condition. As a result, the CAPI Cluster’s ControlPlaneMachinesReady condition ultimately reflects the infrastructure error. With this aggregation, we only need to check the top-level CAPI cluster resource to capture errors. However, when merging conditions, we currently retain only the condition type and message in the following format: *ConditionType: ConditionMessage This approach makes it difficult to pinpoint the root cause based on the message alone. If we modify the format to include the condition reason, such as: *ConditionType-ConditionReason: ConditionMessage We can more easily identify the underlying issue by matching the simple reason string. Open Questions: |
What would you like to be added (User Story)?
As a developer, I would like to retain condition Reason when aggregating conditions because it can help us locate the issue by checking the Reason string.
Detailed Description
In CAPI v1beta2, the top-level Cluster resource aggregates unexpected condition statuses from underlying resources. For example, the ControlPlaneMachinesReady condition consolidates the Ready condition of all control plane machines, highlighting any unready machines.
However, when merging condition messages, the Reason field is not included by default. This behavior is defined in the merge strategy implementation.
This omission presents a challenge when diagnosing issues. If an infrastructure provider reports an error via a condition in infra machine resource, only the condition message propagates to the top-level Cluster resource. Consequently, when building an operator that interacts solely with the Cluster resource, we must rely on message parsing and pattern matching to infer the root cause, which is inefficient and error-prone.
By preserving both the Reason and Message fields during condition merging, we can retain structured and concise error information. The short Reason string provides a standardized way to identify failure categories, improving debugging and automation.
Anything else you would like to add?
No response
Label(s) to be applied
/kind feature
/area conditions
The text was updated successfully, but these errors were encountered: