Commit b0713c8

Unify internal observability documentation - 1 of 3 (#4246)
1 parent ae403b8 commit b0713c8

3 files changed, +129 -0 lines changed

@@ -0,0 +1,115 @@
---
title: Internal telemetry
weight: 25
cSpell:ignore: journalctl kube otecol pprof tracez zpages
---

You can monitor the health of any OpenTelemetry Collector instance by checking
its own internal telemetry. Read on to learn how to configure this telemetry to
help you [troubleshoot](/docs/collector/troubleshooting/) Collector issues.

## Activate internal telemetry in the Collector

By default, the Collector exposes its own telemetry in two ways:

- Internal [metrics](#configure-internal-metrics) are exposed using a Prometheus
  interface, which defaults to port `8888`.
- [Logs](#configure-internal-logs) are emitted to `stderr` by default.

### Configure internal metrics

You can configure how the Collector generates and exposes its internal metrics.
By default, the Collector generates basic metrics about itself and exposes them
for scraping at `http://127.0.0.1:8888/metrics`. Because `127.0.0.1` is the
loopback interface, this endpoint is only reachable from the same host. You can
expose it on one specific network interface or on all interfaces when needed.
For containerized environments, you might want to expose this port on a public
interface so that it can be scraped from outside the container.

Set the address in the `service::telemetry::metrics` section of the
configuration. For example, `0.0.0.0` listens on all interfaces:

```yaml
service:
  telemetry:
    metrics:
      address: '0.0.0.0:8888'
```
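
To listen on one specific interface instead of all of them, set the host part
of `address` to an IP address assigned to that interface. The following
fragment is a sketch: `192.0.2.10` is a placeholder from the reserved
documentation address range, not a real interface, so substitute one of your
host's addresses:

```yaml
service:
  telemetry:
    metrics:
      # Hypothetical interface address; replace with an IP address of your host
      address: '192.0.2.10:8888'
```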

You can adjust how much detail the internal metrics include with the `level`
field. The possible values are:

- `none`: no telemetry data is collected.
- `basic`: the recommended value, which covers the essential service telemetry.
- `normal`: adds other indicators on top of `basic`.
- `detailed`: adds dimensions and views on top of the previous levels.

For example, the following configuration sets the `detailed` level and listens
on port `8888` of all interfaces:

```yaml
service:
  telemetry:
    metrics:
      level: detailed
      address: ':8888'
```
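
If you don't need the Collector's internal metrics at all, the `none` level
turns their collection off. A minimal sketch:

```yaml
service:
  telemetry:
    metrics:
      # Collect no internal metrics
      level: none
```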

The Collector can also scrape its own metrics with a Prometheus receiver and
send them through its configured pipelines. The following example scrapes the
internal metrics endpoint every 10 seconds, drops all metrics whose names
contain `grpc_io`, and sends the rest to the `debug` exporter:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otelcol'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: '.*grpc_io.*'
              action: drop
exporters:
  debug:
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [debug]
```
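
The `debug` exporter only prints the scraped metrics to the Collector's own
output. To send them to an observability backend instead, you can put a
different exporter in the same pipeline. The sketch below uses the `otlp`
exporter and assumes an OTLP-compatible backend at
`otel-backend.example.com:4317`, a hypothetical endpoint you would replace with
your own:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otelcol'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
exporters:
  otlp:
    # Hypothetical backend endpoint; replace with your own
    endpoint: otel-backend.example.com:4317
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```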

{{% alert title="Caution" color="warning" %}}

Self-monitoring is a risky practice. Because the Collector reports on itself,
an issue that affects the Collector can also affect the telemetry that
describes it. If an issue arises, the source of the problem is unclear and the
telemetry is unreliable.

{{% /alert %}}

### Configure internal logs

You can find log output in `stderr`. The verbosity level for logs defaults to
`INFO`, but you can adjust it in the `service::telemetry::logs` section of the
configuration:

```yaml
service:
  telemetry:
    logs:
      level: 'debug'
```
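
The same section accepts other logging options besides `level`. The following
sketch assumes the `encoding` and `output_paths` fields of
`service::telemetry::logs`. It switches the Collector to JSON-formatted logs,
which are easier to parse with log tooling, and keeps writing them to `stderr`:

```yaml
service:
  telemetry:
    logs:
      level: 'info'
      # Emit structured JSON instead of the default console format
      encoding: json
      # Destinations for log entries
      output_paths: ['stderr']
```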

You can also view the Collector's logs with `journalctl` on Linux systems that
use systemd:

{{< tabpane text=true >}} {{% tab "All logs" %}}

```sh
journalctl | grep otelcol
```

{{% /tab %}} {{% tab "Errors only" %}}

```sh
journalctl | grep otelcol | grep -i error
```

{{% /tab %}} {{< /tabpane >}}

content/en/docs/collector/troubleshooting.md

+6

@@ -8,6 +8,12 @@ This page describes some options when troubleshooting the health or performance
 of the OpenTelemetry Collector. The Collector provides a variety of metrics,
 logs, and extensions for debugging issues.
 
+## Internal telemetry
+
+You can configure and use the Collector's own
+[internal telemetry](/docs/collector/internal-telemetry/) to monitor its
+performance.
+
 ## Sending test data
 
 For certain types of issues, particularly verifying configuration and debugging

static/refcache.json

+8

@@ -3079,6 +3079,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-18T19:36:56.082576-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector/issues/7532": {
+    "StatusCode": 200,
+    "LastSeen": "2024-04-04T11:07:15.276911438-07:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector/pull/6140": {
     "StatusCode": 200,
     "LastSeen": "2024-01-30T05:18:24.402543-05:00"
@@ -4523,6 +4527,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-04-12T20:40:33.435682362Z"
   },
+  "https://grafana.com/grafana/dashboards/15983-opentelemetry-collector/": {
+    "StatusCode": 200,
+    "LastSeen": "2024-04-10T15:11:30.311778613-07:00"
+  },
   "https://grafana.com/oss/opentelemetry/": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T08:52:48.999991-05:00"
