
[target-allocator] TargetAllocator assigns Targets to an unhealthy collector? #3781

Open
rice-junhaoyu opened this issue Mar 7, 2025 · 6 comments · May be fixed by #3807
Assignees
Labels
area:target-allocator Issues for target-allocator bug Something isn't working

Comments

@rice-junhaoyu

rice-junhaoyu commented Mar 7, 2025

Component(s)

target allocator

What happened?

Description

I am wondering if it's expected that the TargetAllocator still assigns targets to an unhealthy collector.

From the code [link], there appears to be no pod status check at all.

Can someone help double check if this is true?

Steps to Reproduce

  1. Create a regular TargetAllocator and OTEL Collectors.
  2. Specify a bad image (or similar) for the OTEL Collector StatefulSet; this causes an OTEL Collector pod to be in a non-Running state.
  3. Port-forward to the TargetAllocator, curl http://localhost:8080/jobs/kube-state-metrics/targets, and you can see that the unhealthy OTEL Collector pod is still a candidate for target assignment.

Expected Result

We shouldn't assign targets to an unhealthy OTEL Collector, should we? Or should there be a configuration option that enables such behavior?

Actual Result

I was able to make a collector unhealthy, but the unhealthy collector still appeared at http://localhost:8080/jobs/kube-state-metrics/targets.

Kubernetes Version

1.31

Operator version

v0.120.0

Collector version

v0.120.0

@rice-junhaoyu rice-junhaoyu added bug Something isn't working needs triage labels Mar 7, 2025
@atoulme atoulme added area:target-allocator Issues for target-allocator and removed needs triage labels Mar 12, 2025
@swiatekm
Contributor

The problem here is that it's not clear what the behaviour should be. Ideally, we shouldn't assign targets to collectors which aren't healthy, but we also don't want to reallocate all targets just because a single collector was temporarily unhealthy. I had a basic idea on how this could be done in #2201, but I'm not sure what the possible consequences could be, or how to test it exhaustively.

Maybe the way to go is to add a configurable grace period, and let users turn the behaviour off completely by setting that to 0. If anyone wants to submit a change like that, I'd be willing to review it. The other issue is assigned to me, but I've clearly not made much progress on it since then.

@rice-junhaoyu
Author

Hey Mikolaj,

I agree that always triggering a target reallocation for a bad collector is not the ideal solution; that could be too much.

Just one more question: I am still confused by "Maybe the way to go is to add a configurable grace period, and let users turn the behaviour off completely by setting that to 0." Could you elaborate a bit more and maybe point me somewhere? I am fairly new to OTEL and not aware of such a grace period.

Appreciate it! And please assign this issue to me; I will file a PR and send it to you for review!

@swiatekm
Contributor

Just one more question: I am still confused by "Maybe the way to go is to add a configurable grace period, and let users turn the behaviour off completely by setting that to 0." Could you elaborate a bit more and maybe point me somewhere? I am fairly new to OTEL and not aware of such a grace period.

Basically, the way this works right now is that we take all the Collector Pods that fulfill our selectors, and which also have a Node assigned.

I was thinking that instead we make this condition something like "the Pod is Ready, or it was Ready less than grace period seconds ago". Then if the grace period is, say, 30 seconds, then a Collector Pod could be unhealthy for 15 seconds without triggering a reallocation.

Appreciate it! And please assign this issue to me; I will file a PR and send it to you for review!

Sure, if you have any questions, don't hesitate to ask them here!

@rice-junhaoyu
Author

Oh, I see, I totally agree with your idea. Having a grace period of 30s or so prevents overly frequent reallocation, especially in the case of a pod restart, which only takes a few seconds if there are already available nodes.

In addition, I will make the grace period configurable.

Now start working on a PR.

@rice-junhaoyu
Author

rice-junhaoyu commented Mar 15, 2025

@swiatekm I've submitted a Pull Request!! #3807

Basically, I wrote a unit test and tested locally to validate that my change works.

In terms of making the grace period configurable, I would like to get your opinion first. I was a bit indecisive because I noticed that minUpdateInterval is currently also not configurable, so I am not sure whether the grace period needs to be configurable.


Actually NOT YET!! I believe I had a basic misunderstanding about Pod Phase = Running: it does NOT necessarily mean an OTEL Collector pod is healthy.

The following is from an EC2 machine downtime on our service side; the EC2 machine became unhealthy at 02:45.
[Image: monitoring graph showing the EC2 machine becoming unhealthy at 02:45]

The OTEL Collector pod on that bad EC2 node had already been unable to function properly since 02:45, but it remained in the Running state for quite a while, until the EC2 machine self-recovered.

| Timestamp | Pod Phase |
| --- | --- |
| 2025-03-15 03:01:12.255 | Pending |
| 2025-03-15 03:01:12.252 | Failed |
| 2025-03-15 03:01:12.244 | Failed |
| 2025-03-15 03:01:05.224 | Running |
| 2025-03-15 03:01:01.233 | Running |
| 2025-03-15 03:01:01.228 | Running |
| 2025-03-15 02:50:08.528 | Running |
| 2025-03-15 02:50:08.527 | Running |
| 2025-03-15 02:45:03.419 | Running |

I noticed that the Ready pod condition was already False from the very beginning of the event; therefore, I believe we should also consider this pod condition.

[{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:15:48Z"},
{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:16:05Z"},
{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2025-03-15T02:45:03Z"},
{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:16:07Z"},
{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:15:40Z"}]

Did a bit more investigation.

Running is a Pod Phase that is defined as: "The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting."

=> It turns out that Pod Phase = Running does not necessarily mean a pod is healthy.

Ready is a Pod Condition that is defined as: "The Pod is able to serve requests and should be added to the load balancing pools of all matching Services."

=> This seems to be something we indeed want.

What do you think? I am fine with keeping both checks, although I think the Ready pod condition check covers the Pod Phase = Running check.
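Combining both checks as discussed might look roughly like this; `pod` and `healthy` are hypothetical simplified names for illustration, not the operator's actual code:

```go
package main

import "fmt"

// pod is a hypothetical simplified view of the fields involved;
// the real allocator would read these from a corev1.Pod.
type pod struct {
	Phase string // pod phase, e.g. "Running", "Pending", "Failed"
	Ready bool   // value of the Ready pod condition
}

// healthy applies both checks. In practice Ready implies the pod is Running,
// so the phase check is largely belt-and-braces.
func healthy(p pod) bool {
	return p.Phase == "Running" && p.Ready
}

func main() {
	// The incident case: phase still Running, but Ready had gone False.
	fmt.Println(healthy(pod{Phase: "Running", Ready: false})) // false
	fmt.Println(healthy(pod{Phase: "Running", Ready: true}))  // true
}
```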

@rice-junhaoyu
Author

Pull Request [#3807] updated to consider both PodPhase and PodCondition!
