Commit 355a88b

OpenTelemetry and auto-instrumentation (#612)
1 parent 0a86493 commit 355a88b

27 files changed

+509
-101
lines changed

docs/assets/envoy-tracing.png

-12.4 KB
Binary file not shown.

docs/assets/example-trace.png

-223 KB
Binary file not shown.

docs/assets/grafana-tempo-logs.png

682 KB
Loading
390 KB
Loading
488 KB
Loading

docs/assets/grafana-tempo.png

776 KB
Loading

docs/assets/kiali-400-sample.gif

-787 KB
Binary file not shown.

docs/assets/kiali-sample.gif

-595 KB
Binary file not shown.

docs/assets/logging_overview.png

-697 KB
Binary file not shown.
-15.3 KB
Binary file not shown.

docs/assets/trace-span-ids.png

-13.3 KB
Binary file not shown.

docs/assets/tracing.png

71.3 KB
Loading

docs/explanation/observability/README.md

+28-16
Original file line numberDiff line numberDiff line change
@@ -22,17 +22,27 @@ The three pillars of observability are:
2222
2. **Metrics** - Metrics are numerical measurements of something in your application. They are useful for understanding the performance of your application and are generally more scalable than logs, both in terms of storage and querying, since they are structured data.
2323
3. **Traces** - Traces are a record of the path a request takes through your application. They are useful for understanding how a request is processed in your application.
2424

25+
<center>
26+
2527
```mermaid
2628
graph
27-
A[Application] --> B((Logs))
28-
A --> C((Metrics))
29-
A --> D((Traces))
29+
A[Application] --> B(Logs)
30+
A --> C(Metrics)
31+
A --> D(Traces)
3032
3133
click B "#logs"
3234
click C "#metrics"
3335
click D "#traces"
3436
```
3537

38+
</center>
39+
40+
## Automatic observability
41+
42+
NAIS provides a new way to get started with observability. By enabling auto-instrumentation, you can start collecting telemetry without writing any code. This is the easiest option, as it requires little to no effort on the part of the team developing the application.
43+
44+
[:bulb: Get started with auto-instrumentation](../../how-to-guides/observability/auto-instrumentation.md)
45+
3646
## Metrics
3747

3848
Metrics are a way to measure the state of your application. Metrics are usually numerical values that can be aggregated and visualized. Metrics are often used to create alerts and dashboards.
@@ -41,7 +51,7 @@ We use the [OpenMetrics][openmetrics] format for metrics. This is a text-based f
4151

4252
[openmetrics]: https://openmetrics.io/
4353

44-
[:octicons-arrow-right-24: Get started with metrics](./metrics.md)
54+
[:bulb: Get started with metrics](./metrics.md)
4555

4656
### Prometheus
4757

@@ -57,13 +67,13 @@ graph LR
5767
Prometheus --GET /metrics--> Application
5868
```
5969

60-
[:octicons-arrow-right-24: Access Prometheus here](./metrics.md#prometheus-environments)
70+
[:simple-prometheus: Access Prometheus here](./metrics.md#prometheus-environments)
6171

6272
### Grafana
6373

6474
[Grafana][grafana] is a tool for visualizing metrics. It is used to create dashboards that can be used to monitor your application. Grafana is used by many open source projects and is the de facto standard for metrics in the cloud native world.
6575

66-
[:octicons-arrow-right-24: Access Grafana here][nais-grafana]
76+
[:simple-grafana: Access Grafana here][nais-grafana]
6777

6878
[grafana]: https://grafana.com/
6979
[nais-grafana]: <<tenant_url("grafana")>>
@@ -82,25 +92,24 @@ graph LR
8292
Router --> C[Elastic / Kibana]
8393
```
8494

85-
[:octicons-arrow-right-24: Configure your logs](./logging.md)
95+
[:bulb: Configure your logs](./logging.md)
8696

8797
## Traces
8898

8999
With tracing, we can get application performance monitoring (APM). Tracing gives deep insight into the execution of your application. For instance, you can use tracing to see if parallel functions are actually run in parallel,
90100
or what amount of time your application spends in a given function.
91101

92-
Traces from NAIS applications are collected using the [OpenTelemetry](https://opentelemetry.io/) standard. Performance metrics are stored and queried from the [Tempo](https://grafana.com/oss/tempo/) component.
102+
Traces from NAIS applications can be collected using the [OpenTelemetry](https://opentelemetry.io/) standard. Performance metrics are stored and queried from the [Tempo](https://grafana.com/oss/tempo/) component.
93103

94-
Visualization of traces can be done in [Grafana](https://grafana.<<tenant()>>.cloud.nais.io),
95-
using the `*-tempo` data sources (one for each environment).
104+
Visualization of traces can be done in [Grafana](https://grafana.<<tenant()>>.cloud.nais.io), using the `*-tempo` data sources (one for each environment).
96105

97106
```mermaid
98107
graph LR
99108
Application --gRPC--> Tempo
100109
Tempo --> Grafana
101110
```
102111

103-
[:octicons-arrow-right-24: Read more about tracing](./tracing.md)
112+
[:bulb: Read more about tracing](./tracing.md)
104113

105114
## Alerts
106115

@@ -117,16 +126,19 @@ graph LR
117126
Alertmanager --> Slack
118127
```
119128

120-
[:octicons-arrow-right-24: Read more about alerts](./alerting.md)
129+
[:bulb: Read more about alerts](./alerting.md)
121130

122131
## Learning more
123132

124133
Observability is a very broad topic and there is a lot more to learn. Here are some resources that you can use to learn more about observability:
125134

126-
- [:octicons-video-24: Monitoring, the Prometheus Way][youtube-prometheus]
127-
- [:octicons-book-24: SRE Book - Monitoring distributed systems][sre-book-monitoring]
128-
- [:octicons-book-24: SRE Workbook - Monitoring][sre-workbook-monitoring]
129-
- [:octicons-book-24: SRE Workbook - Alerting][sre-workbook-alerting]
135+
[:octicons-video-24: Monitoring, the Prometheus Way][youtube-prometheus]
136+
137+
[:octicons-book-24: SRE Book - Monitoring distributed systems][sre-book-monitoring]
138+
139+
[:octicons-book-24: SRE Workbook - Monitoring][sre-workbook-monitoring]
140+
141+
[:octicons-book-24: SRE Workbook - Alerting][sre-workbook-alerting]
130142

131143
[sre-book-monitoring]: https://sre.google/sre-book/monitoring-distributed-systems/
132144
[sre-workbook-monitoring]: https://sre.google/workbook/monitoring/

docs/explanation/observability/frontend.md

+2-2
Original file line numberDiff line numberDiff line change
@@ -211,8 +211,8 @@ Instrumenting mounts and unmounts can be quite data intensive, take due care.
211211

212212
Navigate your web browser to the new Grafana at <https://grafana.<<tenant()>>.cloud.nais.io>.
213213

214-
Traces are available from the `dev-gcp-tempo` and `prod-gcp-tempo` data sources, whereas
215-
logs and metrics are available from the `dev-gcp-loki` and `prod-gcp-loki` data sources.
214+
Traces are available from the data sources ending with `-tempo`, whereas
215+
logs and metrics are available from data sources ending with `-loki`.
216216

217217
Use the "Explore" tab under either the Loki or Tempo tab and run queries.
218218

+76-13
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,88 @@
11
---
22
description: >-
33
Application Performance Monitoring or tracing using Grafana Tempo on NAIS.
4-
tags: [explanation]
4+
tags: [explanation, tracing]
55
---
66

7-
# Tracing
7+
# Distributed Tracing
88

9-
[Traces](https://en.wikipedia.org/wiki/Observability_(software)#Distributed_traces) are a record of the path a request takes through your application. They
10-
are useful for understanding how a request is processed in your application.
9+
Tracing is a way to track a request as it passes through the various services needed to handle it. This is especially useful in a microservices architecture, where a single user action often results in a series of calls to different services.
1110

12-
NAIS does not collect trace data automatically. If you want tracing integration,
13-
you must first instrument your application to collect traces, and then configure
14-
the tracing library to send it to the correct place.
11+
Tracing allows developers to understand the entire journey of a request, making it easier to identify bottlenecks, latency issues, or failures that can impact user experience.
1512

16-
Traces from NAIS applications are collected using the [OpenTelemetry](https://opentelemetry.io/) standard.
17-
Performance metrics are stored and queried from the [Tempo](https://grafana.com/oss/tempo/) component.
13+
## How tracing works
1814

19-
## Visualizing application performance
15+
When a request is made to your application, a Trace is started. The Trace serves as a container for all the work done for that request.
2016

21-
Visualization of traces can be done in [the new Grafana installation](https://grafana.<<tenant()>>.cloud.nais.io).
17+
![Tracing](../../assets/tracing.png)
2218

23-
You can use the **Explore** feature of Grafana with the _prod-gcp-tempo_ and _dev-gcp-tempo_ data sources.
19+
<small>Trace visualization by Logshero licensed under Apache License 2.0</small>
2420

25-
There are no ready-made dashboards at this point, but feel free to make one yourself and contribute to this page.
21+
The work done by individual services (or components of a single service) is captured in Spans. A span represents a single unit of work in a trace, like a SQL query or a call to an external service.
22+
23+
Spans can be nested and form a trace tree. The Trace is the root of the tree, and each Span is a node that represents a specific operation in your application. The tree of spans captures the causal relationships between the operations in your application (i.e., which operations caused others to occur).
24+
25+
Each Span carries a Context that includes metadata about the trace (like a unique trace identifier and span identifier) and any other data you choose to include. This context is propagated across process boundaries, allowing all the work that's part of a single trace to be linked together, even if it spans multiple services.
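The relationship between a trace, its spans, and the identifiers each span carries can be sketched in a few lines of Python. This is an illustrative model only, not the OpenTelemetry API: the `Span` class, `start_trace`, `start_child`, and `render` are hypothetical names, and the root span stands in for the trace itself.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical model of a span; not the OpenTelemetry SDK type."""
    name: str
    trace_id: str                    # shared by every span in the trace
    span_id: str                     # unique per span
    parent_id: Optional[str] = None  # None marks the root span
    children: List["Span"] = field(default_factory=list)

    def start_child(self, name: str) -> "Span":
        # A child inherits the trace id and records its parent's span id.
        child = Span(name, self.trace_id, secrets.token_hex(8), self.span_id)
        self.children.append(child)
        return child


def start_trace(name: str) -> Span:
    # 16-byte trace id and 8-byte span id, hex encoded.
    return Span(name, secrets.token_hex(16), secrets.token_hex(8))


def render(span: Span, depth: int = 0) -> List[str]:
    # Flatten the tree into indented lines, parents before children.
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


root = start_trace("GET /orders")
db = root.start_child("SELECT orders")
http = root.start_child("call payment-service")
print("\n".join(render(root)))
```

Every span shares the root's `trace_id`, while `parent_id` encodes which operation caused which. Those two fields are what allow a backend to reassemble the tree from spans reported by different services.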
26+
27+
By analyzing the data captured in traces and spans, you can gain a deep understanding of how requests flow through your system, where time is being spent, and where problems might be occurring. This can be invaluable for debugging, performance optimization, and understanding the overall health of your system.
28+
29+
## OpenTelemetry
30+
31+
OpenTelemetry, a project under the Cloud Native Computing Foundation (CNCF), has become the standard for tracing and application telemetry due to its unified APIs for tracing and metrics, which simplify instrumentation and data collection from applications.
32+
33+
It supports a wide range of programming languages, including Java, JavaScript, Python, Go, and more, allowing for consistent tooling across different parts of a tech stack.
34+
35+
OpenTelemetry also provides automatic instrumentation for popular frameworks and libraries, enabling the collection of traces and metrics without the need for modifying application code.
36+
37+
It's vendor-neutral, allowing telemetry data export to any backend, providing the flexibility to switch between different analysis tools as needs change. Backed by leading companies in the cloud and software industry, and with a vibrant community, OpenTelemetry ensures project longevity and continuous improvement.
38+
39+
[:octicons-link-external-24: Learn more about OpenTelemetry][open-telemetry]
40+
41+
## Tracing in NAIS
42+
43+
NAIS does not collect application trace data automatically, but it provides the infrastructure to do so using OpenTelemetry, Grafana Tempo for storage and querying, and easy-to-use configuration options.
44+
45+
### The easy way: Auto-instrumentation
46+
47+
The preferred way to get started with tracing is to enable auto-instrumentation for your application. This will automatically collect traces and send them to the correct place using the OpenTelemetry Agent.
48+
49+
This is the easiest way to get started with tracing, as it requires little to no effort on the part of the team developing the application and provides instrumentation for popular libraries, frameworks and external services such as PostgreSQL, Redis, Kafka and HTTP clients.
50+
51+
[:bulb: Get started with auto-instrumentation](../../how-to-guides/observability/auto-instrumentation.md)
52+
53+
### The hard way: Manual instrumentation
54+
55+
If you want more control over how your application is instrumented, you can manually instrument your application using the OpenTelemetry SDK for your programming language.
56+
57+
To get the correct configuration, you can still use the auto-instrumentation configuration, but set the `runtime` to `sdk`. This will only set up the OpenTelemetry configuration, without injecting the OpenTelemetry Agent.
58+
59+
[:bulb: Get started with manual instrumentation](../../how-to-guides/observability/auto-instrumentation.md#enable-auto-instrumentation-for-other-applications)
60+
61+
### OpenTelemetry SDKs
62+
63+
OpenTelemetry provides SDKs for a wide range of programming languages:
64+
65+
* [:fontawesome-brands-java: OpenTelemetry Java][otel-java]
66+
* [:fontawesome-brands-js: OpenTelemetry JavaScript][otel-node]
67+
* [:fontawesome-brands-python: OpenTelemetry Python][otel-python]
68+
* [:fontawesome-brands-golang: OpenTelemetry Go][otel-go]
69+
70+
## Visualizing traces in Grafana Tempo
71+
72+
Visualizing and querying traces is done in Grafana using Grafana Tempo. Tempo is an open-source, easy-to-use, high-scale, and cost-effective distributed tracing backend that stores and queries traces.
73+
74+
The easiest way to get started with Tempo is to use the [Explore view in Grafana][grafana-explore], which provides a user-friendly interface for querying and visualizing traces.
75+
76+
[:octicons-link-external-24: Open Grafana Explore][grafana-explore]
77+
78+
[:bulb: Get started with Grafana Tempo](../../how-to-guides/observability/tracing/tempo.md)
79+
80+
![Grafana Tempo](../../assets/grafana-tempo.png)
81+
82+
[open-telemetry]: https://opentelemetry.io/
83+
[otel-java]: https://opentelemetry.io/docs/languages/java/
84+
[otel-node]: https://opentelemetry.io/docs/languages/js/
85+
[otel-python]: https://opentelemetry.io/docs/languages/python/
86+
[otel-go]: https://opentelemetry.io/docs/languages/go/
87+
[grafana]: <<tenant_url("grafana")>>
88+
[grafana-explore]: <<tenant_url("grafana", "explore")>>
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
---
2+
description: Get started with auto-instrumentation for your applications with OpenTelemetry data for Tracing, Metrics and Logs using the OpenTelemetry Agent.
3+
tags: [guide, tracing]
4+
---
5+
# Get started with auto-instrumentation
6+
7+
This guide will explain how to get started with auto-instrumenting your applications with OpenTelemetry data for [Tracing](../../explanation/observability/tracing.md), [Metrics](../../explanation/observability/metrics.md) and [Logs](../../explanation/observability/logging.md) using the OpenTelemetry Agent.
8+
9+
The main benefit of auto-instrumentation is that it requires little to no effort on the part of the team developing the application while providing insight into popular libraries, frameworks and external services such as PostgreSQL, Redis, Kafka and HTTP clients.
10+
11+
Auto-instrumentation is the preferred way to get started with tracing in NAIS, and can also be used for metrics and logs collection. This type of instrumentation is available for Java, Node.js and Python applications, but can also be used for other runtimes in `sdk` mode, where it will only set up the OpenTelemetry configuration.
12+
13+
!!! info
14+
15+
:new: Auto-instrumentation is a new feature and is only available for NAIS applications running in GCP.
16+
17+
## Enable auto-instrumentation for Java/Kotlin applications
18+
19+
```yaml
20+
...
21+
spec:
22+
observability:
23+
autoInstrumentation:
24+
enabled: true
25+
runtime: java
26+
```
27+
28+
## Enable auto-instrumentation for Node.js applications
29+
30+
```yaml
31+
...
32+
spec:
33+
observability:
34+
autoInstrumentation:
35+
enabled: true
36+
runtime: node
37+
```
38+
39+
## Enable auto-instrumentation for Python applications
40+
41+
```yaml
42+
...
43+
spec:
44+
observability:
45+
autoInstrumentation:
46+
enabled: true
47+
runtime: python
48+
```
49+
50+
## Enable auto-instrumentation for other applications
51+
52+
If your application runtime is not one of the supported runtimes, or you want to instrument your application yourself, you can still benefit from the auto-instrumentation configuration.
53+
54+
This will only set up the OpenTelemetry configuration for the application, but it will not inject the OpenTelemetry Agent into the application.
55+
56+
```yaml
57+
...
58+
spec:
59+
observability:
60+
autoInstrumentation:
61+
enabled: true
62+
runtime: sdk
63+
```
64+
65+
## Resources
66+
67+
[:bulb: OpenTelemetry Auto-Instrumentation Configuration Reference](../../reference/observability/auto-config.md)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
---
2+
description: Learn how to propagate trace context across process boundaries in a few common scenarios.
3+
tags: [guide, tracing]
4+
---
5+
# Trace context propagation
6+
7+
Each Span carries a Context that includes metadata about the trace (like a unique trace identifier and span identifier) and any other data you choose to include. This context is propagated across process boundaries, allowing all the work that's part of a single trace to be linked together, even if it spans multiple services.
8+
9+
This guide explains how to propagate trace context across process boundaries in a few common scenarios. If you are using [auto-instrumentation](../auto-instrumentation.md), trace context propagation is already handled for you.
10+
11+
[:octicons-link-external-24: OpenTelemetry Context Propagation](https://opentelemetry.io/docs/concepts/context-propagation/)
12+
13+
## Propagate trace context in HTTP requests
14+
15+
When a service makes an HTTP request to another service, it should include the trace context in the request headers. The receiving service can then use this context to create a new Span that's part of the same trace. OpenTelemetry provides a standard for how trace context should be propagated in HTTP requests, called the [W3C Trace Context](https://www.w3.org/TR/trace-context/) standard.
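As a rough illustration of what travels in the request headers, the sketch below builds and parses a W3C `traceparent` header by hand using only the Python standard library. The `inject` and `extract` helpers are hypothetical names chosen for this example; in a real service the OpenTelemetry propagators do this for you, and a plain `dict` is a simplification since HTTP header names are case-insensitive.

```python
import re
from typing import Dict, Optional, Tuple

# traceparent layout: version "00", 32 hex trace id, 16 hex parent (span) id, 2 hex flags
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(headers: Dict[str, str], trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Write the current trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Read (trace_id, parent_span_id, sampled) from incoming headers, or None if absent or invalid."""
    match = TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are invalid according to the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


# The calling service injects its context...
outgoing: Dict[str, str] = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(outgoing["traceparent"])
# 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

# ...and the receiving service extracts it to parent its own span.
context = extract(outgoing)
```

The receiving service keeps the trace id, uses the extracted span id as the parent of its new span, and generates a fresh span id for itself.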
16+
17+
* [OpenTelemetry Setup in Spring Boot Application](https://opentelemetry.io/docs/languages/java/automatic/spring-boot)
18+
* [OpenTelemetry Setup in Ktor Application](https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/ktor/ktor-2.0/library)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
---
2+
description: Learn how to correlate traces with logs in Grafana Tempo.
3+
tags: [guide, tracing]
4+
---
5+
# Correlate traces and logs
6+
7+
This guide will explain how to correlate traces with logs in Grafana Tempo.
8+
9+
## Step 1: Configure Tracing
10+
11+
First you need to configure OpenTelemetry tracing in your application. The easiest way to get started with tracing is to enable auto-instrumentation for your application. This will automatically collect traces and send them to the correct place using the OpenTelemetry Agent.
12+
13+
[:bulb: Get started with auto-instrumentation](../auto-instrumentation.md)
14+
15+
## Step 2: Configure Logging
16+
17+
If you are using auto-instrumentation for logs they are automatically correlated with traces. If you are not using auto-instrumentation for logs, you need to configure your log output to include trace information.
18+
19+
20+
=== "log4j"
21+
22+
Add the [opentelemetry-javaagent-log4j-context-data-2.17](https://mvnrepository.com/artifact/io.opentelemetry.javaagent.instrumentation/opentelemetry-javaagent-log4j-context-data-2.17) package to your `pom.xml` or `build.gradle` to include trace information in your logs:
23+
24+
```
25+
io.opentelemetry.instrumentation:opentelemetry-log4j-context-data-2.17-autoconfigure:2.1.0-alpha
26+
```
27+
28+
Add the following pattern to your log4j configuration to include trace information in your logs:
29+
30+
```xml
31+
<?xml version="1.0" encoding="UTF-8"?>
32+
<Configuration status="WARN">
33+
<Appenders>
34+
<Console name="Console" target="SYSTEM_OUT">
35+
<PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} traceId: %X{trace_id} spanId: %X{span_id} - %msg%n" />
36+
</Console>
37+
</Appenders>
38+
<Loggers>
39+
<Root level="All" >
40+
<AppenderRef ref="Console"/>
41+
</Root>
42+
</Loggers>
43+
</Configuration>
44+
```
45+
46+
=== "logback"
47+
48+
Add the [opentelemetry-logback-mdc-1.0](https://mvnrepository.com/artifact/io.opentelemetry.instrumentation/opentelemetry-logback-mdc-1.0) package to your `pom.xml` or `build.gradle` to include trace information in your logs:
49+
50+
```
51+
io.opentelemetry.instrumentation:opentelemetry-logback-mdc-1.0:2.1.0-alpha
52+
```
53+
54+
Add the following pattern to your logback configuration to include trace information in your logs:
55+
56+
```xml
57+
<?xml version="1.0" encoding="UTF-8" ?>
58+
<configuration>
59+
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
60+
<encoder>
61+
<pattern><![CDATA[%date{HH:mm:ss.SSS} [%thread] %-5level %logger{15}#%line %X{req.requestURI} traceId: %X{trace_id} spanId: %X{span_id} %msg\n]]></pattern>
62+
</encoder>
63+
</appender>
64+
65+
<appender name="OTEL" class="io.opentelemetry.instrumentation.logback.mdc.v1_0.OpenTelemetryAppender">
66+
<appender-ref ref="STDOUT" />
67+
</appender>
68+
69+
<root>
70+
<level value="DEBUG" />
71+
<appender-ref ref="OTEL" />
72+
</root>
73+
74+
</configuration>
75+
```
76+
77+
## Step 3: Profit
78+
79+
Now that you have tracing and logging set up, you can use Grafana Tempo to correlate traces and logs. When you view a trace in Grafana Tempo, you can see the logs that are associated with that trace. This makes it easy to understand what happened in your application and troubleshoot issues.
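To make the correlation concrete, the sketch below groups log lines by the `traceId` field that the patterns above write into each line. It is a minimal standard-library illustration, not how Grafana performs the lookup; the `group_by_trace` helper and the sample log lines are made up for this example.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the "traceId: ... spanId: ..." fragment produced by the log patterns above.
TRACE_FIELDS = re.compile(r"traceId: (?P<trace_id>[0-9a-f]*) spanId: (?P<span_id>[0-9a-f]*)")


def group_by_trace(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by trace id, skipping lines logged outside of any trace."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = TRACE_FIELDS.search(line)
        # An empty trace id means no span was active when the line was logged.
        if match and match.group("trace_id"):
            grouped[match.group("trace_id")].append(line)
    return dict(grouped)


# Hypothetical log output, shaped like the log4j pattern above.
sample_lines = [
    "12:00:01.123 [main] INFO  com.example.Orders traceId: 4bf92f3577b34da6a3ce929d0e0e4736 spanId: 00f067aa0ba902b7 - order received",
    "12:00:01.200 [main] INFO  com.example.Orders traceId: 4bf92f3577b34da6a3ce929d0e0e4736 spanId: 53995c3f42cd8ad8 - order stored",
    "12:00:02.000 [main] INFO  com.example.Health traceId:  spanId:  - health check",
]

for trace_id, lines in group_by_trace(sample_lines).items():
    print(trace_id, len(lines))
```

Because the trace identifier appears in every line logged while a span is active, any log store that can filter on that value can jump from a trace to its logs and back.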
80+
81+
![Correlate traces and logs](../../../assets/grafana-tempo-logs.png)
82+
83+
[:arrow_backward: Back to the list of guides](../index.md)
