[release-1.4] COO-1687: feat: migrate to EndpointSlice service discovery #1035
@PeterYurkovich: This pull request references COO-1687, which is a valid Jira issue.
|
/retitle [release-1.4] COO-1687: feat: migrate to EndpointSlice service discovery |
* feat: migrate to EndpointSlice service discovery

Prometheus Operator defaults to watching the deprecated Endpoints API for service discovery. Switch the operator's own ServiceMonitors to use EndpointSlice explicitly, which eliminates the deprecation log noise from the operator's internal components.

Changes:
- Set serviceDiscoveryRole: EndpointSlice on the ServiceMonitors we own (observability-operator, health-analyzer, thanos-querier) so that prometheus-operator uses the EndpointSlice role for these jobs.
- Add discovery.k8s.io/endpointslices to all Prometheus RBAC roles and ClusterRoles (alongside the existing endpoints permission) so that Prometheus can serve both kinds of ServiceMonitors simultaneously.
- Add discovery.k8s.io/endpointslices to the korrel8r ClusterRole so the correlation tool can read both endpoint representations.
- Add the corresponding kubebuilder markers and update the generated cluster role YAML and CSV.

The Prometheus CR's global serviceDiscoveryRole is intentionally left unset (defaulting to Endpoints) so that user-created ServiceMonitors continue to work without modification. Users can opt individual ServiceMonitors into EndpointSlice by setting serviceDiscoveryRole: EndpointSlice on them.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jan Fajerski <jan@fajerski.name>

* fix: revert serviceDiscoveryRole from monitoring.coreos.com ServiceMonitors

The operator's self-monitoring ServiceMonitor and the health-analyzer ServiceMonitor are monitoring.coreos.com objects processed by the platform prometheus-operator on OpenShift, which we don't control. Setting serviceDiscoveryRole: EndpointSlice on them requires the platform Prometheus to have endpointslices access and the platform prometheus-operator to correctly generate TLS-aware scrape configs for the endpointslice role, neither of which is guaranteed across OCP versions.

The thanos-querier ServiceMonitor (monitoring.rhobs) is handled by the obo-prometheus-operator we manage, so it retains the EndpointSlice setting safely.

Fixes TestOperatorMetrics/metrics_ingested_in_Prometheus on OCP clusters.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Jan Fajerski <jan@fajerski.name>
Co-authored-by: Jan Fajerski <jan@fajerski.name>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
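For readers following along, the RBAC part of the commit message above (adding discovery.k8s.io/endpointslices alongside the existing endpoints permission) could be sketched roughly as below. This is not the code from this PR: `PolicyRule` is a local stand-in that only mirrors the apiGroups/resources/verbs shape of a Kubernetes RBAC rule, `WithEndpointSlices` and `grants` are hypothetical helper names, and the `get`/`list`/`watch` verbs are an assumption, since the commit message does not list verbs.

```go
package main

import "fmt"

// PolicyRule is a local stand-in mirroring the apiGroups/resources/verbs
// shape of a Kubernetes RBAC rule. It is NOT the rbac/v1 type.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

func contains(xs []string, want string) bool {
	for _, x := range xs {
		if x == want {
			return true
		}
	}
	return false
}

// grants reports whether any single rule allows verb on group/resource.
func grants(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

// WithEndpointSlices (hypothetical helper) returns a copy of rules that also
// grants read access to discovery.k8s.io/endpointslices. Existing rules,
// including the core "endpoints" permission, are left untouched so both
// discovery roles keep working side by side.
func WithEndpointSlices(rules []PolicyRule) []PolicyRule {
	const group, resource = "discovery.k8s.io", "endpointslices"
	verbs := []string{"get", "list", "watch"} // assumed verbs, not from the PR

	out := append([]PolicyRule(nil), rules...) // copy; do not mutate the input
	for _, v := range verbs {
		if !grants(out, group, resource, v) {
			// At least one verb is missing: add one rule covering all of them.
			return append(out, PolicyRule{
				APIGroups: []string{group},
				Resources: []string{resource},
				Verbs:     verbs,
			})
		}
	}
	return out // already granted; nothing to add
}

func main() {
	existing := []PolicyRule{{
		APIGroups: []string{""}, // core API group
		Resources: []string{"services", "endpoints", "pods"},
		Verbs:     []string{"get", "list", "watch"},
	}}
	for _, r := range WithEndpointSlices(existing) {
		fmt.Printf("apiGroups=%q resources=%q verbs=%q\n", r.APIGroups, r.Resources, r.Verbs)
	}
}
```

Keeping the endpoints rule and appending a separate endpointslices rule matches the stated intent that Prometheus can serve ServiceMonitors of either discovery role at the same time.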
01c833d to f67eff3
|
/lgtm |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jgbernalp, PeterYurkovich
Manual cherry pick of #1028