[target-allocator] TargetAllocator assigns Targets to an unhealthy collector? #3781
Comments
The problem here is that it's not clear what the behaviour should be. Ideally, we shouldn't assign targets to collectors which aren't healthy, but we also don't want to reallocate all targets just because a single collector was temporarily unhealthy. I had a basic idea of how this could be done in #2201, but I'm not sure what the possible consequences could be, or how to test it exhaustively. Maybe the way to go is to add a configurable grace period, and let users turn the behaviour off completely by setting that to 0. If anyone wants to submit a change like that, I'd be willing to review it. The other issue is assigned to me, but I've clearly not made much progress on it since then.
Hey Mikolaj, I agree that always triggering a target reallocation for a bad collector is not the ideal solution; that could be too much. Just one more question: I am still confused about Appreciate it! And please assign me this issue; I will file a PR and send it to you for review!
Basically, the way this works right now is that we take all the Collector Pods that fulfill our selectors, and which also have a Node assigned. I was thinking that instead we make this condition something like "the Pod is Ready, or it was Ready less than
Sure, if you have any questions, don't hesitate to ask them here!
Oh, I see, totally agree with your idea. Having such a grace period of In addition, I will make the Now starting work on a PR.
@swiatekm I've submitted a Pull Request! #3807 Basically, I wrote a unit test and tested locally to validate that my change works. In terms of making the Actually NOT YET! I believe I have a very basic misunderstanding about Following is an The OTEL Collector pod on that bad EC2 node was already unable to function properly since
I noticed that
Did a bit more investigation.
Pull Request #3807 updated to consider both
Component(s)
target allocator
What happened?
Description
I am wondering if it's expected that the TargetAllocator still assigns Targets to an unhealthy collector.
From the code [link], it seems there's no pod status check at all.
Can someone help double check if this is true?
Steps to Reproduce
Check http://localhost:8080/jobs/kube-state-metrics/targets, and you can see the unhealthy OTEL Collector pod is still a candidate to assign targets to.
Expected Result
We shouldn't assign targets to an unhealthy OTEL Collector, or we should at least have a configuration option that enables such behaviour.
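If this were made configurable, it might be exposed on the CRD roughly as below. This is only a sketch of what such an option could look like; the field name `collectorNotReadyGracePeriod` is entirely hypothetical and does not exist in the operator's CRD:

```yaml
# Hypothetical configuration sketch -- the grace-period field is invented
# for illustration and is not part of the real OpenTelemetryCollector CRD.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: example
spec:
  targetAllocator:
    enabled: true
    # Keep targets on a collector that lost readiness for up to 30s;
    # 0 would disable the grace period, reallocating immediately.
    collectorNotReadyGracePeriod: 30s
```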
Actual Result
I was able to make a collector unhealthy, but still saw the unhealthy collector at
http://localhost:8080/jobs/kube-state-metrics/targets.
Kubernetes Version
1.31
Operator version
v0.120.0
Collector version
v0.120.0