Skip to content

Conversation

@manuelbuil
Copy link
Contributor

Proposed Changes

Types of Changes

Verification

Testing

Linked Issues

User-Facing Change


Further Comments

@l-mb
Copy link

l-mb commented Oct 16, 2025

This looks good to me 👍

@manuelbuil manuelbuil force-pushed the adr-metrics-responder branch from 3ec4545 to e0892fd Compare October 27, 2025 14:28
@manuelbuil manuelbuil changed the title ADR about metrics-responder ADR about security-responder Oct 27, 2025
@pgonin
Copy link
Contributor

pgonin commented Oct 27, 2025

Ok for me
Which team will run the 'server' part?
Have you had some estimates on bandwidth and load numbers? (Using ballpark numbers of number of RKE2 clusters currently running)

I would add that there should be a possibility in Rancher / Rancher Prime deployment context to setup for the data to be collected by Rancher (with Rancher acting as "proxy" to needed data)

@l-mb
Copy link

l-mb commented Oct 27, 2025

@pgonin We'll have EIO stand up another upgrade-responder deployment, they've got that fairly automated, not an issue.

RMS/Prime gathering similar data about systems under management is a very different discussion, let's have that elsewhere please.

@pgonin
Copy link
Contributor

pgonin commented Oct 27, 2025

Agreed, it should be a second step. To be discussed together with Rancher actually

Comment on lines 50 to 52
- nodeCount
- serverNodeCount
- agentNodeCount
Copy link
Member

@brandond brandond Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nodeCount = serverNodeCount + agentNodeCount. A separate field seems unnecessary?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

I think the intent is to differentiate between pure control plane nodes and those that run workloads. For example, SUSE's pricing model excludes pure control plane nodes.

Copy link
Member

@brandond brandond Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that determined? There is no such thing as a "pure" control-plane node as far as Kubernetes or RKE2 are concerned. Are you saying that customers don't have to pay for server nodes (cp/etcd/cp+etcd) if they put some taints on them to prevent workload scheduling? What taints, exactly?

Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nits, and a comment on deployment opt-out and configuration

@manuelbuil manuelbuil force-pushed the adr-metrics-responder branch from e0892fd to df0de20 Compare October 30, 2025 11:39
@manuelbuil manuelbuil marked this pull request as ready for review October 30, 2025 11:40
@manuelbuil manuelbuil requested a review from a team as a code owner October 30, 2025 11:40
}
```

The `clusteruuid` is needed to differentiate between different deployments (the UUID of `kube-system`). It is completely random and does not expose privacy considerations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth addressing here, but I have seen some users that build images with RKE2 already installed and started. This is a bad idea for several reasons, but it happens. When this is done, the clusters will have the same root CAs - and the same kube-system namespace UID. This will cause them to look like the same cluster to anything using that UID as a cluster UID. I don't know that there's a good way to work around this, but it is something to be aware of.

@manuelbuil manuelbuil force-pushed the adr-metrics-responder branch from df0de20 to 0c35abc Compare October 31, 2025 11:24

```yaml
# /etc/rancher/rke2/config.yaml
security-responder-enabled: true # default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
security-responder-enabled: true # default
security-responder: true # default

Having "enabled" in the name is redundant for a bool flag. We don't follow that style or any of our other bool flags like embedded-registry or supervisor-metrics.

Copy link
Member

@brandond brandond Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I already requested replacement of this with a normal --disable flag, since it'll be packaged as a helm chart. no special flag necessary. That bit of review seems to have been dropped though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that! 🤦

}
```

The `clusteruuid` is needed to differentiate between different deployments (the UUID of `kube-system`). It is completely random and does not expose privacy considerations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might consider hashing the clusteruuid to ensure another level of obfuscation for collected metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add it as an option. I think we can discuss about this in the real PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm kinda meh on that. I don't see what hashing an already random number would add, other than the appearance of obfuscation.


```yaml
# /etc/rancher/rke2/config.yaml
security-responder-enabled: true # default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
security-responder-enabled: true # default
disable:
- rke2-security-responder

Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit on how to disable

@manuelbuil
Copy link
Contributor Author

nit on how to disable

Could you check again Brad? Thanks

Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants