
[target-allocator] TargetAllocator assigns Targets to an unhealthy collector? #3781

Open
rice-junhaoyu opened this issue Mar 7, 2025 · 6 comments · May be fixed by #3807
Assignees
Labels
area:target-allocator Issues for target-allocator bug Something isn't working

Comments

@rice-junhaoyu

rice-junhaoyu commented Mar 7, 2025

Component(s)

target allocator

What happened?

Description

I am wondering if it's expected that the TargetAllocator still assigns targets to an unhealthy collector.

From the code [link], there appears to be no pod status check at all.

Can someone help double check if this is true?

Steps to Reproduce

  1. Create a regular TargetAllocator and OTEL Collectors.
  2. Specify a bad image (or similar) for the OTEL Collector StatefulSet; this causes an OTEL Collector pod to be in a non-Running state.
  3. Port-forward to the TargetAllocator, curl http://localhost:8080/jobs/kube-state-metrics/targets, and you can see that the unhealthy OTEL Collector pod is still a candidate for target assignment.

Expected Result

We shouldn't assign targets to an unhealthy OTEL Collector, should we? Or should there be a configuration option that enables such behavior?

Actual Result

I was able to make a collector unhealthy, but the unhealthy collector still appeared at http://localhost:8080/jobs/kube-state-metrics/targets.

Kubernetes Version

1.31

Operator version

v0.120.0

Collector version

v0.120.0

@rice-junhaoyu rice-junhaoyu added bug Something isn't working needs triage labels Mar 7, 2025
@atoulme atoulme added area:target-allocator Issues for target-allocator and removed needs triage labels Mar 12, 2025
@swiatekm
Contributor

The problem here is that it's not clear what the behaviour should be. Ideally, we shouldn't assign targets to collectors which aren't healthy, but we also don't want to reallocate all targets just because a single collector was temporarily unhealthy. I had a basic idea on how this could be done in #2201, but I'm not sure what the possible consequences could be, or how to test it exhaustively.

Maybe the way to go is to add a configurable grace period, and let users turn the behaviour off completely by setting that to 0. If anyone wants to submit a change like that, I'd be willing to review it. The other issue is assigned to me, but I've clearly not made much progress on it since then.

@rice-junhaoyu
Author

Hey Mikolaj,

I agree that always triggering a target reallocation for a bad collector is not the ideal solution; that could be too much.

Just one more question: I am still confused by "Maybe the way to go is to add a configurable grace period, and let users turn the behaviour off completely by setting that to 0." Could you elaborate a bit more and maybe point me somewhere? I am fairly new to OTEL and not aware of such a grace period.

Appreciate it! And please assign this issue to me; I will file a PR and send it to you for review!

@swiatekm
Contributor

Just one more question: I am still confused by "Maybe the way to go is to add a configurable grace period, and let users turn the behaviour off completely by setting that to 0." Could you elaborate a bit more and maybe point me somewhere? I am fairly new to OTEL and not aware of such a grace period.

Basically, the way this works right now is that we take all the Collector Pods that fulfill our selectors, and which also have a Node assigned.

I was thinking that instead we make this condition something like "the Pod is Ready, or it was Ready less than grace period seconds ago". Then if the grace period is, say, 30 seconds, then a Collector Pod could be unhealthy for 15 seconds without triggering a reallocation.

Appreciate it! And please assign this issue to me; I will file a PR and send it to you for review!

Sure, if you have any questions, don't hesitate to ask them here!

@rice-junhaoyu
Author

Oh, I see, I totally agree with your idea. Having a grace period of 30s or so prevents overly frequent reallocation, especially in the case of a pod restart, which only takes a few seconds if there are already available nodes.

In addition, I will make the grace period configurable.

Now start working on a PR.

@rice-junhaoyu
Author

rice-junhaoyu commented Mar 15, 2025

@swiatekm I've submitted a Pull Request!! #3807

Basically, I wrote a unit test and tested locally to validate that my change works.

In terms of making the grace period configurable, I would like to get your opinion first. I was a bit indecisive because I noticed that minUpdateInterval is currently also not configurable, so I am not sure whether the grace period needs to be configurable.


Actually NOT YET!! I believe I had a basic misunderstanding about Pod Phase = Running: it does NOT necessarily mean an OTEL Collector pod is healthy.

The following is from an EC2 machine downtime on our service side; the EC2 machine became unhealthy at 02:45.
[Image: monitoring graph showing the EC2 machine becoming unhealthy at 02:45]

The OTEL Collector pod on that bad EC2 node had already been unable to function properly since 02:45, but it remained in the Running state for quite a while, until the EC2 machine self-recovered.

| Timestamp | Pod Phase |
| --- | --- |
| 2025-03-15 03:01:12.255 | Pending |
| 2025-03-15 03:01:12.252 | Failed |
| 2025-03-15 03:01:12.244 | Failed |
| 2025-03-15 03:01:05.224 | Running |
| 2025-03-15 03:01:01.233 | Running |
| 2025-03-15 03:01:01.228 | Running |
| 2025-03-15 02:50:08.528 | Running |
| 2025-03-15 02:50:08.527 | Running |
| 2025-03-15 02:45:03.419 | Running |

I noticed that the Ready pod condition was already False from the very beginning of the event; therefore, I believe we should also consider this pod condition.

[{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:15:48Z"},
{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:16:05Z"},
{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2025-03-15T02:45:03Z"},
{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:16:07Z"},
{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2025-03-11T15:15:40Z"}]

Did a bit more investigation.

Running is a Pod Phase that is defined as: "The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting."

=> It turns out that Pod Phase = Running does not necessarily mean a pod is healthy.

Ready is a Pod Condition that is defined as: "The Pod is able to serve requests and should be added to the load balancing pools of all matching Services."

=> This seems to be something we indeed want.

What do you think? I am fine with keeping both checks, although I think the Ready pod condition check covers the Pod Phase = Running check.
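Combining both checks as discussed might look roughly like this; `pod` and `healthy` are hypothetical simplified names for illustration, not the operator's actual code:

```go
package main

import "fmt"

// pod is a hypothetical simplified view of the fields involved;
// the real allocator would read these from a corev1.Pod.
type pod struct {
	Phase string // pod phase, e.g. "Running", "Pending", "Failed"
	Ready bool   // value of the Ready pod condition
}

// healthy applies both checks. In practice Ready implies the pod is Running,
// so the phase check is largely belt-and-braces.
func healthy(p pod) bool {
	return p.Phase == "Running" && p.Ready
}

func main() {
	// The incident case: phase still Running, but Ready had gone False.
	fmt.Println(healthy(pod{Phase: "Running", Ready: false})) // false
	fmt.Println(healthy(pod{Phase: "Running", Ready: true}))  // true
}
```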

@rice-junhaoyu
Author

Pull Request [#3807] updated to consider both PodPhase and PodCondition!
