Commit ecd4b2b
Allow for scoping TA to watch namespace
It's currently not possible to deploy the TA without cluster-wide permissions. This change introduces a new env variable to the TA, WATCH_NAMESPACE, which allows for specifying which namespaces to watch. This approach is similar to how the opentelemetry-operator can be scoped to watch a single namespace. This does mean that cluster-wide resources like node metrics (cAdvisor) are no longer accessible, but this is acceptable since we only want the TA to know about targets that exist in specific namespaces.

Fixes: #3086

Signed-off-by: Charlie Le <[email protected]>
1 parent 9f152fb commit ecd4b2b
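
A minimal sketch of the new knob in use, based on the README example added below; this is a fragment of an `OpenTelemetryCollector` CR, and the namespace name `foo` is illustrative:

```yaml
# Sketch: scope the target allocator to one namespace (illustrative values).
targetAllocator:
  enabled: true
  prometheusCR:
    enabled: true
  env:
    - name: WATCH_NAMESPACE
      value: "foo"  # or a comma-separated list, e.g. "foo,bar"; "" watches all namespaces
```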

File tree

9 files changed: +399 −10 lines changed

.chloggen/namespace-ta.yaml

+18
@@ -0,0 +1,18 @@
+# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
+change_type: enhancement
+
+# The name of the component, or a single word describing the area of concern, (e.g. collector, target allocator, auto-instrumentation, opamp, github action)
+component: target allocator
+
+# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
+note: |
+  Add support for `WATCH_NAMESPACE` environment variable in the target allocator.
+
+# One or more tracking issues related to the change
+issues: [3086]
+
+# (Optional) One or more lines of additional information to render under the primary note.
+# These lines will be padded with 2 spaces and then inserted directly into the document.
+# Use pipe (|) for multiline entries.
+subtext: |
+  This variable can be set to an empty string to watch all namespaces, or to a comma-separated list of namespaces to watch.
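
A hedged illustration of the two modes the subtext describes (the namespace names are made up):

```yaml
env:
  - name: WATCH_NAMESPACE
    value: ""               # empty string: watch all namespaces
  # - name: WATCH_NAMESPACE
  #   value: "team-a,team-b"  # comma-separated list: watch only these namespaces
```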

cmd/otel-allocator/README.md

+84-8
@@ -180,9 +180,11 @@ Upstream documentation here: [PrometheusReceiver](https://github.com/open-teleme
 
 ### RBAC
 
-Before the TargetAllocator can start scraping, you need to set up Kubernetes RBAC (role-based access controls) resources. This means that you need to have a `ServiceAccount` and corresponding cluster roles so that the TargetAllocator has access to all of the necessary resources to pull metrics from.
+Before the TargetAllocator can start scraping, you need to set up Kubernetes RBAC (role-based access controls) resources. This means that you need to have a `ServiceAccount` and corresponding ClusterRoles/Roles so that the TargetAllocator has access to all the necessary resources to pull metrics from.
 
-You can create your own `ServiceAccount`, and reference it in `spec.targetAllocator.serviceAccount` in your `OpenTelemetryCollector` CR. You’ll then need to configure the `ClusterRole` and `ClusterRoleBinding` for this `ServiceAccount`, as per below.
+You can create your own `ServiceAccount`, and reference it in `spec.targetAllocator.serviceAccount` in your `OpenTelemetryCollector` CR. You’ll then need to configure the `ClusterRole` and `ClusterRoleBinding` or `Role` and `RoleBinding` for this `ServiceAccount`, as per below.
+
+#### Cluster-scoped RBAC
 
 ```yaml
 targetAllocator:
@@ -193,11 +195,11 @@ You can create your own `ServiceAccount`, and reference it in `spec.targetAlloca
 ```
 
 > 🚨 **Note**: The Collector part of this same CR *also* has a serviceAccount key which only affects the collector and *not*
-the TargetAllocator.
+> the TargetAllocator.
 
-If you omit the `ServiceAccount` name, the TargetAllocator creates a `ServiceAccount` for you. The `ServiceAccount`’s default name is a concatenation of the Collector name and the `-targetallocator` suffix. By default, this `ServiceAccount` has no defined policy, so you’ll need to create your own `ClusterRole` and `ClusterRoleBinding` for it, as per below.
+If you omit the `ServiceAccount` name, the TargetAllocator creates a `ServiceAccount` for you. The `ServiceAccount`’s default name is a concatenation of the Collector name and the `-targetallocator` suffix. By default, this `ServiceAccount` has no defined policy, so you’ll need to create your own `ClusterRole` and `ClusterRoleBinding` or `Role` and `RoleBinding` for it, as per below.
 
-The role below will provide the minimum access required for the Target Allocator to query all the targets it needs based on any Prometheus configurations:
+The ClusterRole below will provide the minimum access required for the Target Allocator to query all the targets it needs based on any Prometheus configurations:
 
 ```yaml
 apiVersion: rbac.authorization.k8s.io/v1
@@ -231,7 +233,7 @@ rules:
     verbs: ["get"]
 ```
 
-If you enable the the `prometheusCR` (set `spec.targetAllocator.prometheusCR.enabled` to `true`) in the `OpenTelemetryCollector` CR, you will also need to define the following roles. These give the TargetAllocator access to the `PodMonitor` and `ServiceMonitor` CRs. It also gives namespace access to the `PodMonitor` and `ServiceMonitor`.
+If you enable the `prometheusCR` (set `spec.targetAllocator.prometheusCR.enabled` to `true`) in the `OpenTelemetryCollector` CR, you will also need to define the following ClusterRoles. These give the TargetAllocator access to the `PodMonitor` and `ServiceMonitor` CRs. It also gives namespace access to the `PodMonitor` and `ServiceMonitor`.
 
 ```yaml
 apiVersion: rbac.authorization.k8s.io/v1
@@ -252,8 +254,83 @@ rules:
     verbs: ["get", "list", "watch"]
 ```
 
-> ✨ The above roles can be combined into a single role.
+> ✨ The above ClusterRoles can be combined into a single ClusterRole.
+
+#### Namespace-scoped RBAC
+
+If you want to have the TargetAllocator watch a specific namespace, you can set the WATCH_NAMESPACE environment variable
+in the TargetAllocator's deployment. This is useful if you want to restrict the TargetAllocator to only watch Prometheus
+CRs in a specific namespace, and not have cluster-wide access.
+
+```yaml
+targetAllocator:
+  enabled: true
+  serviceAccount: opentelemetry-targetallocator-sa
+  prometheusCR:
+    enabled: true
+  env:
+    - name: WATCH_NAMESPACE
+      value: "foo"
+```
+
+In this case, you will need to create a Role and RoleBinding instead of a ClusterRole and ClusterRoleBinding. The Role
+and RoleBinding should be created in the namespace specified in the WATCH_NAMESPACE environment variable.
 
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: opentelemetry-targetallocator-role
+rules:
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  - services
+  - endpoints
+  - configmaps
+  - secrets
+  - namespaces
+  verbs:
+  - get
+  - watch
+  - list
+- apiGroups:
+  - apps
+  resources:
+  - statefulsets
+  verbs:
+  - get
+  - watch
+  - list
+- apiGroups:
+  - discovery.k8s.io
+  resources:
+  - endpointslices
+  verbs:
+  - get
+  - watch
+  - list
+- apiGroups:
+  - networking.k8s.io
+  resources:
+  - ingresses
+  verbs:
+  - get
+  - watch
+  - list
+- apiGroups:
+  - monitoring.coreos.com
+  resources:
+  - servicemonitors
+  - podmonitors
+  - scrapeconfigs
+  - probes
+  verbs:
+  - get
+  - watch
+  - list
+```
 
 ### Service / Pod monitor endpoint credentials
 
@@ -409,4 +486,3 @@ Shards the received targets based on the discovered Collector instances
 
 ### Collector
 Client to watch for deployed Collector instances which will then provided to the Allocator.
-
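
The README addition above shows the Role but stops short of the matching RoleBinding. A minimal sketch of one, assuming the `opentelemetry-targetallocator-sa` ServiceAccount and Role name from the README text; the binding's name and the `foo` namespace are illustrative:

```yaml
# Sketch: bind the namespace-scoped Role to the target allocator's ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: opentelemetry-targetallocator-rolebinding  # hypothetical name
  namespace: foo  # must match the WATCH_NAMESPACE value
subjects:
- kind: ServiceAccount
  name: opentelemetry-targetallocator-sa
  namespace: foo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: opentelemetry-targetallocator-role
```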

cmd/otel-allocator/internal/watcher/promOperator.go

+17-2
@@ -8,6 +8,7 @@ import (
 	"fmt"
 	"log/slog"
 	"os"
+	"strings"
 	"time"
 
 	"github.com/blang/semver/v4"
@@ -53,7 +54,21 @@ func NewPrometheusCRWatcher(ctx context.Context, logger logr.Logger, cfg allocat
 		return nil, err
 	}
 
-	factory := informers.NewMonitoringInformerFactories(map[string]struct{}{v1.NamespaceAll: {}}, map[string]struct{}{}, mClient, allocatorconfig.DefaultResyncTime, nil) //TODO decide what strategy to use regarding namespaces
+	// Check env var for WATCH_NAMESPACE and use it if its set, else use v1.NamespaceAll
+	// This is to allow the operator to watch only a specific namespace
+	watchNamespace, found := os.LookupEnv("WATCH_NAMESPACE")
+	allowList := map[string]struct{}{}
+	if found {
+		logger.Info("watching namespace(s)", "namespaces", watchNamespace)
+		for _, ns := range strings.Split(watchNamespace, ",") {
+			allowList[ns] = struct{}{}
+		}
+	} else {
+		allowList = map[string]struct{}{v1.NamespaceAll: {}}
+		logger.Info("the env var WATCH_NAMESPACE isn't set, watching all namespaces")
+	}
+
+	factory := informers.NewMonitoringInformerFactories(allowList, map[string]struct{}{}, mClient, allocatorconfig.DefaultResyncTime, nil) //TODO decide what strategy to use regarding namespaces
 
 	monitoringInformers, err := getInformers(factory)
 	if err != nil {
@@ -99,7 +114,7 @@ func NewPrometheusCRWatcher(ctx context.Context, logger logr.Logger, cfg allocat
 		logger.Error(err, "Retrying namespace informer creation in promOperator CRD watcher")
 		return true
 	}, func() error {
-		nsMonInf, err = getNamespaceInformer(ctx, map[string]struct{}{v1.NamespaceAll: {}}, promLogger, clientset, operatorMetrics)
+		nsMonInf, err = getNamespaceInformer(ctx, allowList, promLogger, clientset, operatorMetrics)
 		return err
 	})
 	if getNamespaceInformerErr != nil {
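
To make the new lookup concrete, here is a small standalone sketch of the same allow-list logic; it is an illustration, not the watcher's actual API. One subtlety: an explicitly empty WATCH_NAMESPACE splits to a single empty string, and the empty string is exactly `v1.NamespaceAll`, which is how the changelog's "empty string watches all namespaces" behavior falls out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// namespaceAllowList mirrors the logic added to NewPrometheusCRWatcher:
// unset -> watch all namespaces; set -> split the value on commas.
// A set-but-empty value splits to [""], and "" equals v1.NamespaceAll,
// so WATCH_NAMESPACE="" also watches all namespaces.
func namespaceAllowList() map[string]struct{} {
	const namespaceAll = "" // mirrors k8s.io/api/core/v1.NamespaceAll
	watchNamespace, found := os.LookupEnv("WATCH_NAMESPACE")
	if !found {
		return map[string]struct{}{namespaceAll: {}}
	}
	allowList := map[string]struct{}{}
	for _, ns := range strings.Split(watchNamespace, ",") {
		allowList[ns] = struct{}{}
	}
	return allowList
}

func main() {
	os.Setenv("WATCH_NAMESPACE", "team-a,team-b")
	fmt.Println(namespaceAllowList()) // map[team-a:{} team-b:{}]
}
```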
assert-jobs-succeeded.yaml

@@ -0,0 +1,20 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: check-metrics
+status:
+  succeeded: 1
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: check-ta-jobs
+status:
+  succeeded: 1
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: check-ta-scrape-configs
+status:
+  succeeded: 1

assert-workloads-ready.yaml

@@ -0,0 +1,15 @@
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: prometheus-cr-collector
+status:
+  readyReplicas: 1
+  replicas: 1
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: cr-targetallocator
+status:
+  readyReplicas: 1
+  replicas: 1

chainsaw-test.yaml

@@ -0,0 +1,21 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/kyverno/chainsaw/main/.schemas/json/test-chainsaw-v1alpha1.json
+apiVersion: chainsaw.kyverno.io/v1alpha1
+kind: Test
+metadata:
+  name: targetallocator-namespace
+spec:
+  steps:
+  - try:
+    - apply:
+        file: resources/rbac.yaml
+    - apply:
+        file: resources/otelcol.yaml
+    - assert:
+        file: assert-workloads-ready.yaml
+    - apply:
+        file: resources/jobs.yaml
+    - assert:
+        file: assert-jobs-succeeded.yaml
+    catch:
+    - podLogs:
+        selector: app.kubernetes.io/managed-by=opentelemetry-operator
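
Assuming the Kyverno chainsaw CLI is installed, a scenario like this is typically run by pointing chainsaw at the test's directory, with something like `chainsaw test --test-dir .` from the scenario's folder.
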
resources/jobs.yaml

@@ -0,0 +1,52 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: check-metrics
+spec:
+  template:
+    spec:
+      restartPolicy: OnFailure
+      containers:
+        - name: check-metrics
+          image: curlimages/curl
+          args:
+            - /bin/sh
+            - -c
+            - |
+              for i in $(seq 30); do
+                if curl -m 1 -s http://prometheus-cr-collector:9090/metrics | grep "otelcol"; then exit 0; fi
+                sleep 5
+              done
+              exit 1
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: check-ta-jobs
+spec:
+  template:
+    spec:
+      restartPolicy: OnFailure
+      containers:
+        - name: check-metrics
+          image: curlimages/curl
+          args:
+            - /bin/sh
+            - -c
+            - curl -s http://cr-targetallocator/scrape_configs | grep "prometheus-cr"
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: check-ta-scrape-configs
+spec:
+  template:
+    spec:
+      restartPolicy: OnFailure
+      containers:
+        - name: check-metrics
+          image: curlimages/curl
+          args:
+            - /bin/sh
+            - -c
+            - curl -s http://cr-targetallocator/jobs | grep "prometheus-cr"

resources/otelcol.yaml

@@ -0,0 +1,57 @@
+apiVersion: opentelemetry.io/v1alpha1
+kind: TargetAllocator
+metadata:
+  name: cr
+spec:
+  args:
+    "zap-log-level": "debug"
+  prometheusCR:
+    enabled: true
+    scrapeInterval: 1s
+    scrapeConfigSelector: {}
+    probeSelector: {}
+    serviceMonitorSelector: {}
+    podMonitorSelector: {}
+  observability:
+    metrics:
+      disablePrometheusAnnotations: true
+      enableMetrics: true
+  env:
+    - name: WATCH_NAMESPACE
+      value: "($namespace)"
+  serviceAccount: ta
+---
+apiVersion: opentelemetry.io/v1beta1
+kind: OpenTelemetryCollector
+metadata:
+  name: prometheus-cr
+  labels:
+    opentelemetry.io/target-allocator: cr
+spec:
+  observability:
+    metrics:
+      disablePrometheusAnnotations: true
+      enableMetrics: true
+  config:
+    receivers:
+      prometheus:
+        config:
+          scrape_configs: []
+
+    processors:
+
+    exporters:
+      prometheus:
+        endpoint: 0.0.0.0:9090
+    service:
+      pipelines:
+        metrics:
+          receivers: [prometheus]
+          exporters: [prometheus]
+      telemetry:
+        logs:
+          level: "DEBUG"
+          development: true
+          encoding: "json"
+  mode: statefulset
+  serviceAccount: collector
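
Worth noting for readers unfamiliar with chainsaw: `($namespace)` above is a chainsaw binding that resolves to the ephemeral namespace the test runs in, so the TargetAllocator ends up scoped to exactly the namespace that holds the test's Prometheus CRs.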
