CAPI logs filled with error messages if no machine deployments/pools exist and ControlPlane does not implement v1beta2 conditions #11820
Comments
Apologies for the rename, there was a little confusion on my end. We have a test case where the etcd plane is scaled to 0, a new etcd machine is created, and we perform an etcd restore on top of that. These logs were being printed endlessly, so I had erroneously assumed they were related (although we skip draining with the […]). That all being said, would it be possible to lower the log level for those messages until v1beta2 goes live? I'm content to just leave the conditions on there for now; otherwise it fills the logs and makes debugging quite difficult.
/triage accepted
/help
@chrischdi: Guidelines: please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Note, I will try to take a look at this in the context of the work I'm doing for #11474, but I'm not sure if and when I will get to this (if someone else wants to take care of this before me, feel free to do it!)
I'd be interested in taking a look at this. I'm not super familiar with the new conditions, but it seems like a small change to this logic should address it, @fabriziopandini?

cluster-api/util/conditions/v1beta2/aggregate.go, lines 61 to 63 in 3cfb41d
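For context, here is a minimal sketch of the kind of empty-input guard being referenced (paraphrased, not the verbatim code or error message at that permalink; the `getter` interface below is a simplified stand-in for the real util/conditions/v1beta2 types):

```go
package conditions

import (
	"errors"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// getter is a simplified stand-in for the objects whose v1beta2 conditions get
// aggregated (control plane, MachineDeployments, MachinePools, Machines).
type getter interface {
	GetV1Beta2Conditions() []metav1.Condition
}

// newAggregateCondition paraphrases the guard being discussed: aggregating zero
// source objects is rejected with an error (the real message differs), and the
// cluster controller currently ends up logging that error on every status reconcile
// when nothing reports the optional conditions.
func newAggregateCondition(sourceObjs []getter, conditionType string) (*metav1.Condition, error) {
	if len(sourceObjs) == 0 {
		return nil, errors.New("no source objects to aggregate")
	}
	// ... the real implementation merges the source objects' conditions here ...
	return &metav1.Condition{Type: conditionType, Status: metav1.ConditionTrue, Reason: "Aggregated"}, nil
}
```

With no MachineDeployments or MachinePools in the cluster and a control plane that does not implement the optional v1beta2 conditions, that source list ends up empty, which is why the error shows up on every status reconcile.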
@cahillsf I don't think it is correct to change the logic in the NewAggregateCondition func; it is the calling code that should not compute an aggregation if there are no conditions to aggregate. The issue was initially reporting:
The write errors were about RollingOut, ScalingUp and ScalingDown conditions at cluster level.

cluster-api/internal/controllers/cluster/cluster_controller_status.go, lines 942 to 949 in ad64bf6
However, it looks like this logic doesn't take into account whether the control plane object is reporting the optional ScalingUp condition or not (it only checks for […]).

This can probably be fixed by moving the existing check right before calling NewAggregateCondition, and replacing the if condition (currently […]).

Another option is to report "Conditions ScalingDown not yet reported from ..." from the CP if the condition is missing, but that seems wrong given that, according to our contract, these conditions are optional.

@cahillsf @chrischdi @sbueringer opinions?
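To make the proposal above concrete, here is a rough sketch of the shape such a fix could take (illustrative only; the types, function names and condition strings are simplified stand-ins, not the actual cluster_controller_status.go code):

```go
package clusterstatus

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// conditionReporter is a simplified stand-in for the objects whose v1beta2
// conditions are aggregated at the cluster level (control plane, MachineDeployments,
// MachinePools, Machines).
type conditionReporter interface {
	GetV1Beta2Conditions() []metav1.Condition
}

// objectsReportingCondition narrows the inputs down to the objects that actually
// report the (optional) condition, so the caller can skip the aggregation entirely
// instead of handing NewAggregateCondition an empty list.
func objectsReportingCondition(objs []conditionReporter, conditionType string) []conditionReporter {
	var out []conditionReporter
	for _, obj := range objs {
		for _, c := range obj.GetV1Beta2Conditions() {
			if c.Type == conditionType {
				out = append(out, obj)
				break
			}
		}
	}
	return out
}

// setClusterScalingUpCondition shows how the calling code could use that check:
// aggregate only when at least one object reports ScalingUp, and otherwise set a
// defaulted condition instead of producing an error on every reconcile.
func setClusterScalingUpCondition(clusterConditions *[]metav1.Condition, inputs []conditionReporter) {
	reporting := objectsReportingCondition(inputs, "ScalingUp")
	if len(reporting) == 0 {
		*clusterConditions = append(*clusterConditions, metav1.Condition{
			Type:   "ScalingUp",
			Status: metav1.ConditionFalse,
			Reason: "NotScalingUp", // illustrative reason, not a real constant
		})
		return
	}
	// ... otherwise aggregate, e.g. v1beta2conditions.NewAggregateCondition(reporting, "ScalingUp", ...) ...
}
```

This keeps NewAggregateCondition itself unchanged and pushes the decision about optional conditions to the caller, in line with the comment above.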
Ah, I see. Yes, from what I understand, this approach makes sense to me.
Sounds good
/assign cahillsf |
What steps did you take and what happened?
Clusters that do not use machine deployments or machine pools (for example, a cluster whose machines are configured manually) will cause the capi-controller-manager to endlessly write error logs whenever the cluster status is updated. The capi-controller-manager will show the following logs:
The requisite code can be found here:
We are able to work around this by setting the conditions to False so that they are present.
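For reference, this is roughly what that workaround looks like from the control plane provider's side, assuming the provider exposes its v1beta2 conditions as a []metav1.Condition slice (the condition types and reasons below are illustrative, not constants from a specific provider):

```go
package workaround

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setPlaceholderConditions publishes the optional v1beta2 conditions as False so
// that the cluster controller finds something to aggregate instead of erroring.
// Condition types and reasons are illustrative.
func setPlaceholderConditions(conditions *[]metav1.Condition, observedGeneration int64) {
	for _, t := range []string{"ScalingUp", "ScalingDown", "RollingOut"} {
		meta.SetStatusCondition(conditions, metav1.Condition{
			Type:               t,
			Status:             metav1.ConditionFalse,
			Reason:             "NotInProgress", // illustrative reason
			ObservedGeneration: observedGeneration,
		})
	}
}
```

Once the conditions exist on the object, the aggregation has something to consume and the errors stop.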
What did you expect to happen?
My expectation is that during the v1beta1 -> v1beta2 migration, the status is set to Unknown if the aggregate conditions are not present.

Cluster API version
v1.9.4
Kubernetes version
v1.30.2
Anything else you would like to add?
We implement a bring-your-own-host style of provisioning (not related to the byoh provider) in which users can register nodes freely, leaving lifecycle management to the user. Although this is a less common provisioning model, I imagine it could also affect clusters whose control plane providers have not been updated and that also provision machines via manual definition.
Label(s) to be applied
/kind bug
/area conditions