|
1 | 1 | ---
|
2 | 2 | description: >-
|
3 | 3 | Application Performance Monitoring or tracing using Grafana Tempo on NAIS.
|
4 |
| -tags: [explanation] |
| 4 | +tags: [explanation, tracing] |
5 | 5 | ---
|
6 | 6 |
|
7 |
| -# Tracing |
| 7 | +# Distributed Tracing |
8 | 8 |
|
9 |
| -[Traces](https://en.wikipedia.org/wiki/Observability_(software)#Distributed_traces) are a record of the path a request takes through your application. They |
10 |
| -are useful for understanding how a request is processed in your application. |
| 9 | +Tracing is a way to track a request as it passes through the various services needed to handle it. This is especially useful in a microservices architecture, where a single user action often results in a series of calls to different services. |
11 | 10 |
|
12 |
| -NAIS does not collect trace data automatically. If you want tracing integration, |
13 |
| -you must first instrument your application to collect traces, and then configure |
14 |
| -the tracing library to send it to the correct place. |
| 11 | +Tracing allows developers to understand the entire journey of a request, making it easier to identify bottlenecks, latency issues, or failures that can impact user experience. |
15 | 12 |
|
16 |
| -Traces from NAIS applications are collected using the [OpenTelemetry](https://opentelemetry.io/) standard. |
17 |
| -Performance metrics are stored and queried from the [Tempo](https://grafana.com/oss/tempo/) component. |
| 13 | +## How tracing works |
18 | 14 |
|
19 |
| -## Visualizing application performance |
| 15 | +When a request is made to your application, a Trace is started. The Trace serves as a container for all the work done for that request. |
20 | 16 |
|
21 |
| -Visualization of traces can be done in [the new Grafana installation](https://grafana.<<tenant()>>.cloud.nais.io). |
| 17 | + |
22 | 18 |
|
23 |
| -You can use the **Explore** feature of Grafana with the _prod-gcp-tempo_ and _dev-gcp-tempo_ data sources. |
| 19 | +<small>Trace visualization by Logshero licensed under Apache License 2.0</small> |
24 | 20 |
|
25 |
| -There are no ready-made dashboards at this point, but feel free to make one yourself and contribute to this page. |
| 21 | +The work done by individual services (or components of a single service) is captured in Spans. A span represents a single unit of work in a trace, like a SQL query or a call to an external service. |
| 22 | + |
| 23 | +Spans can be nested and form a trace tree. The Trace is the root of the tree, and each Span is a node that represents a specific operation in your application. The tree of spans captures the causal relationships between the operations in your application (i.e., which operations caused others to occur). |
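
The span tree described above can be sketched as a small data structure. This is only an illustrative model, not the OpenTelemetry API: the `Span` class, the `render` helper, and the example operation names and durations are all hypothetical, and the root span stands in for the trace itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """A single unit of work; children are operations it caused."""
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

    def child(self, name: str, duration_ms: float) -> "Span":
        # Attach a new span beneath this one and return it.
        span = Span(name, duration_ms)
        self.children.append(span)
        return span


def render(span: Span, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, parents before children."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms} ms)"]
    for c in span.children:
        lines.extend(render(c, depth + 1))
    return lines


# Hypothetical request: one HTTP handler, a SQL query, and a downstream call.
root = Span("GET /checkout", 120.0)
root.child("SELECT orders", 35.0)
payment = root.child("POST payment-service/charge", 60.0)
payment.child("INSERT payments", 12.0)

print("\n".join(render(root)))
```

Reading the indentation from top to bottom gives the causal chain: the checkout request caused the payment call, which in turn caused the insert.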
| 24 | + |
| 25 | +Each Span carries a Context that includes metadata about the trace (like a unique trace identifier and span identifier) and any other data you choose to include. This context is propagated across process boundaries, allowing all the work that's part of a single trace to be linked together, even if it spans multiple services. |
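
To make context propagation concrete, here is a minimal sketch of how a trace identifier can travel between two services in an HTTP header. It follows the W3C Trace Context `traceparent` layout (`version-traceid-spanid-flags`), which OpenTelemetry SDKs support. The `SpanContext` type and the `inject`, `extract`, and `child_of` helpers are hypothetical names for illustration; a real application would rely on the SDK's propagators instead.

```python
import secrets
from typing import Dict, NamedTuple


class SpanContext(NamedTuple):
    trace_id: str   # 32 hex characters, shared by every span in the trace
    span_id: str    # 16 hex characters, unique per span
    sampled: bool


def new_root_context() -> SpanContext:
    """Start a new trace with random identifiers."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> SpanContext:
    """Read the caller's context from incoming request headers."""
    _version, trace_id, parent_span_id, flags = headers["traceparent"].split("-")
    return SpanContext(trace_id, parent_span_id, int(flags, 16) & 1 == 1)


def child_of(parent: SpanContext) -> SpanContext:
    """Create a new span that stays in the parent's trace."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


# Service A starts the trace and calls service B.
outgoing: Dict[str, str] = {}
caller = new_root_context()
inject(caller, outgoing)

# Service B continues the same trace with its own span.
callee = child_of(extract(outgoing))
print(outgoing["traceparent"])
print(callee.trace_id == caller.trace_id)
```

Because both services record spans under the same trace identifier, the backend can stitch them into a single trace.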
| 26 | + |
| 27 | +By analyzing the data captured in traces and spans, you can gain a deep understanding of how requests flow through your system, where time is being spent, and where problems might be occurring. This can be invaluable for debugging, performance optimization, and understanding the overall health of your system. |
| 28 | + |
| 29 | +## OpenTelemetry |
| 30 | + |
| 31 | +OpenTelemetry, a project under the Cloud Native Computing Foundation (CNCF), has become a widely adopted standard for tracing and application telemetry. Its unified APIs for tracing and metrics simplify instrumentation and data collection from applications. |
| 32 | + |
| 33 | +It supports a wide range of programming languages, including Java, JavaScript, Python, Go, and more, allowing for consistent tooling across different parts of a tech stack. |
| 34 | + |
| 35 | +OpenTelemetry also provides automatic instrumentation for popular frameworks and libraries, enabling the collection of traces and metrics without the need for modifying application code. |
| 36 | + |
| 37 | +It's vendor-neutral, allowing telemetry data export to any backend, providing the flexibility to switch between different analysis tools as needs change. Backed by leading companies in the cloud and software industry, and with a vibrant community, OpenTelemetry ensures project longevity and continuous improvement. |
| 38 | + |
| 39 | +[:octicons-link-external-24: Learn more about OpenTelemetry][open-telemetry] |
| 40 | + |
| 41 | +## Tracing in NAIS |
| 42 | + |
| 43 | +NAIS does not collect application trace data automatically, but it provides the infrastructure to do so using OpenTelemetry, Grafana Tempo for storage and querying, and easy-to-use configuration options. |
| 44 | + |
| 45 | +### The easy way: Auto-instrumentation |
| 46 | + |
| 47 | +The preferred way to get started with tracing is to enable auto-instrumentation for your application. This will automatically collect traces and send them to the correct place using the OpenTelemetry Agent. |
| 48 | + |
| 49 | +It requires little to no effort from the team developing the application, and it provides instrumentation for popular libraries, frameworks and external services such as PostgreSQL, Redis, Kafka and HTTP clients. |
| 50 | + |
| 51 | +[:bulb: Get started with auto-instrumentation](../../how-to-guides/observability/auto-instrumentation.md) |
| 52 | + |
| 53 | +### The hard way: Manual instrumentation |
| 54 | + |
| 55 | +If you want more control over how your application is instrumented, you can manually instrument your application using the OpenTelemetry SDK for your programming language. |
| 56 | + |
| 57 | +To get the correct configuration, you can still use the auto-instrumentation configuration, but set the `runtime` to `sdk`. This only sets up the OpenTelemetry configuration, without injecting the OpenTelemetry Agent. |
| 58 | + |
| 59 | +[:bulb: Get started with manual instrumentation](../../how-to-guides/observability/auto-instrumentation.md#enable-auto-instrumentation-for-other-applications) |
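
As a rough sketch of what "only setting up the configuration" can mean for a manually instrumented application, the snippet below reads exporter settings from the environment. `OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_ENDPOINT`, and `OTEL_EXPORTER_OTLP_PROTOCOL` are standard OpenTelemetry environment variables, but it is an assumption here that the platform sets exactly these. The `otel_settings` helper, its fallback values, and the example values are hypothetical.

```python
import os
from typing import Dict, Mapping


def otel_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect exporter settings from standard OpenTelemetry variables.

    The fallback values are illustrative defaults for local development,
    not values guaranteed by any platform.
    """
    return {
        "service_name": env.get("OTEL_SERVICE_NAME", "unknown_service"),
        "endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        "protocol": env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc"),
    }


# Hypothetical environment, as it might look inside a running container.
example_env = {
    "OTEL_SERVICE_NAME": "my-app",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.example:4317",
}
print(otel_settings(example_env))
```

Most OpenTelemetry SDKs read these variables on their own, so in practice the application usually only needs to initialize the SDK without hard-coding an endpoint.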
| 60 | + |
| 61 | +### OpenTelemetry SDKs |
| 62 | + |
| 63 | +OpenTelemetry provides SDKs for a wide range of programming languages: |
| 64 | + |
| 65 | +* [:fontawesome-brands-java: OpenTelemetry Java][otel-java] |
| 66 | +* [:fontawesome-brands-js: OpenTelemetry JavaScript][otel-node] |
| 67 | +* [:fontawesome-brands-python: OpenTelemetry Python][otel-python] |
| 68 | +* [:fontawesome-brands-golang: OpenTelemetry Go][otel-go] |
| 69 | + |
| 70 | +## Visualizing traces in Grafana Tempo |
| 71 | + |
| 72 | +Traces are visualized and queried in Grafana using Grafana Tempo. Tempo is an open-source, easy-to-use, high-scale, and cost-effective distributed tracing backend that stores and queries traces. |
| 73 | + |
| 74 | +The easiest way to get started with Tempo is to use the [Explore view in Grafana][grafana-explore], which provides a user-friendly interface for querying and visualizing traces. |
| 75 | + |
| 76 | +[:octicons-link-external-24: Open Grafana Explore][grafana-explore] |
| 77 | + |
| 78 | +[:bulb: Get started with Grafana Tempo](../../how-to-guides/observability/tracing/tempo.md) |
| 79 | + |
| 80 | + |
| 81 | + |
| 82 | +[open-telemetry]: https://opentelemetry.io/ |
| 83 | +[otel-java]: https://opentelemetry.io/docs/languages/java/ |
| 84 | +[otel-node]: https://opentelemetry.io/docs/languages/js/ |
| 85 | +[otel-python]: https://opentelemetry.io/docs/languages/python/ |
| 86 | +[otel-go]: https://opentelemetry.io/docs/languages/go/ |
| 87 | +[grafana]: <<tenant_url("grafana")>> |
| 88 | +[grafana-explore]: <<tenant_url("grafana", "explore")>> |