diff --git a/docs/assets/envoy-tracing.png b/docs/assets/envoy-tracing.png
deleted file mode 100644
index 2055e42b4..000000000
Binary files a/docs/assets/envoy-tracing.png and /dev/null differ
diff --git a/docs/assets/example-trace.png b/docs/assets/example-trace.png
deleted file mode 100644
index 5cebd8a6a..000000000
Binary files a/docs/assets/example-trace.png and /dev/null differ
diff --git a/docs/assets/grafana-tempo-logs.png b/docs/assets/grafana-tempo-logs.png
new file mode 100644
index 000000000..b9f15a9e6
Binary files /dev/null and b/docs/assets/grafana-tempo-logs.png differ
diff --git a/docs/assets/grafana-tempo-query-builder.png b/docs/assets/grafana-tempo-query-builder.png
new file mode 100644
index 000000000..5345c7d98
Binary files /dev/null and b/docs/assets/grafana-tempo-query-builder.png differ
diff --git a/docs/assets/grafana-tempo-trace-view.png b/docs/assets/grafana-tempo-trace-view.png
new file mode 100644
index 000000000..d36982cb1
Binary files /dev/null and b/docs/assets/grafana-tempo-trace-view.png differ
diff --git a/docs/assets/grafana-tempo.png b/docs/assets/grafana-tempo.png
new file mode 100644
index 000000000..a240e417c
Binary files /dev/null and b/docs/assets/grafana-tempo.png differ
diff --git a/docs/assets/kiali-400-sample.gif b/docs/assets/kiali-400-sample.gif
deleted file mode 100644
index 7cb706043..000000000
Binary files a/docs/assets/kiali-400-sample.gif and /dev/null differ
diff --git a/docs/assets/kiali-sample.gif b/docs/assets/kiali-sample.gif
deleted file mode 100644
index 8a640c1eb..000000000
Binary files a/docs/assets/kiali-sample.gif and /dev/null differ
diff --git a/docs/assets/logging_overview.png b/docs/assets/logging_overview.png
deleted file mode 100644
index f05f64185..000000000
Binary files a/docs/assets/logging_overview.png and /dev/null differ
diff --git a/docs/assets/prometheus_alertmanager_overview.png b/docs/assets/prometheus_alertmanager_overview.png
deleted file mode 100644
index 85bbbdb5a..000000000
Binary files a/docs/assets/prometheus_alertmanager_overview.png and /dev/null differ
diff --git a/docs/assets/trace-span-ids.png b/docs/assets/trace-span-ids.png
deleted file mode 100644
index 927d84b43..000000000
Binary files a/docs/assets/trace-span-ids.png and /dev/null differ
diff --git a/docs/assets/tracing.png b/docs/assets/tracing.png
new file mode 100644
index 000000000..a0e2427ad
Binary files /dev/null and b/docs/assets/tracing.png differ
diff --git a/docs/explanation/observability/README.md b/docs/explanation/observability/README.md
index 8aa46f375..ee1c5f5b6 100644
--- a/docs/explanation/observability/README.md
+++ b/docs/explanation/observability/README.md
@@ -22,17 +22,27 @@ The tree pillars of observability are:
2. **Metrics** - Metrics are a numerical measurement of something in your application. They are useful for understanding the performance of your application and is generally more scalable than logs both in terms of storage and querying since they are structured data.
3. **Traces** - Traces are a record of the path a request takes through your application. They are useful for understanding how a request is processed in your application.
+
+
```mermaid
graph
- A[Application] --> B((Logs))
- A --> C((Metrics))
- A --> D((Traces))
+ A[Application] --> B(Logs)
+ A --> C(Metrics)
+ A --> D(Traces)
click B "#logs"
click C "#metrics"
click D "#traces"
```
+
+
+## Automatic observability
+
+NAIS can instrument your application for you. By enabling auto-instrumentation, you can collect observability data without writing any code. This is the easiest way to get started with observability, as it requires little to no effort from the team developing the application.
+
+[:bulb: Get started with auto-instrumentation](../../how-to-guides/observability/auto-instrumentation.md)
+
## Metrics
Metrics are a way to measure the state of your application. Metrics are usually numerical values that can be aggregated and visualized. Metrics are often used to create alerts and dashboards.
@@ -41,7 +51,7 @@ We use the [OpenMetrics][openmetrics] format for metrics. This is a text-based f
[openmetrics]: https://openmetrics.io/
-[:octicons-arrow-right-24: Get started with metrics](./metrics.md)
+[:bulb: Get started with metrics](./metrics.md)
### Prometheus
@@ -57,13 +67,13 @@ graph LR
Prometheus --GET /metrics--> Application
```
-[:octicons-arrow-right-24: Access Prometheus here](./metrics.md#prometheus-environments)
+[:simple-prometheus: Access Prometheus here](./metrics.md#prometheus-environments)
### Grafana
[Grafana][grafana] is a tool for visualizing metrics. It is used to create dashboards that can be used to monitor your application. Grafana is used by many open source projects and is the de facto standard for metrics in the cloud native world.
-[:octicons-arrow-right-24: Access Grafana here][nais-grafana]
+[:simple-grafana: Access Grafana here][nais-grafana]
[grafana]: https://grafana.com/
[nais-grafana]: <>
@@ -82,17 +92,16 @@ graph LR
Router --> C[Elastic / Kibana]
```
-[:octicons-arrow-right-24: Configure your logs](./logging.md)
+[:bulb: Configure your logs](./logging.md)
## Traces
With tracing, we can get application performance monitoring (APM). Tracing gives deep insight into the execution of your application. For instance, you can use tracing to see if parallel function are actually run in parallel,
or what amount of time your application spends in a given function.
-Traces from NAIS applications are collected using the [OpenTelemetry](https://opentelemetry.io/) standard. Performance metrics are stored and queried from the [Tempo](https://grafana.com/oss/tempo/) component.
+Traces from NAIS applications can be collected using the [OpenTelemetry](https://opentelemetry.io/) standard. Trace data is stored in and queried from the [Tempo](https://grafana.com/oss/tempo/) component.
-Visualization of traces can be done in [Grafana](https://grafana.<>.cloud.nais.io),
-using the `*-tempo` data sources (one for each environment).
+Visualization of traces can be done in [Grafana](https://grafana.<>.cloud.nais.io), using the `*-tempo` data sources (one for each environment).
```mermaid
graph LR
@@ -100,7 +109,7 @@ graph LR
Tempo --> Grafana
```
-[:octicons-arrow-right-24: Read more about tracing](./tracing.md)
+[:bulb: Read more about tracing](./tracing.md)
## Alerts
@@ -117,16 +126,19 @@ graph LR
Alertmanager --> Slack
```
-[:octicons-arrow-right-24: Read more about alerts](./alerting.md)
+[:bulb: Read more about alerts](./alerting.md)
## Learning more
Observability is a very broad topic and there is a lot more to learn. Here are some resources that you can use to learn more about observability:
-- [:octicons-video-24: Monitoring, the Prometheus Way][youtube-prometheus]
-- [:octicons-book-24: SRE Book - Monitoring distributed systems][sre-book-monitoring]
-- [:octicons-book-24: SRE Workbook - Monitoring][sre-workbook-monitoring]
-- [:octicons-book-24: SRE Workbook - Alerting][sre-workbook-alerting]
+[:octicons-video-24: Monitoring, the Prometheus Way][youtube-prometheus]
+
+[:octicons-book-24: SRE Book - Monitoring distributed systems][sre-book-monitoring]
+
+[:octicons-book-24: SRE Workbook - Monitoring][sre-workbook-monitoring]
+
+[:octicons-book-24: SRE Workbook - Alerting][sre-workbook-alerting]
[sre-book-monitoring]: https://sre.google/sre-book/monitoring-distributed-systems/
[sre-workbook-monitoring]: https://sre.google/workbook/monitoring/
diff --git a/docs/explanation/observability/frontend.md b/docs/explanation/observability/frontend.md
index 2385a50b7..60964cb32 100644
--- a/docs/explanation/observability/frontend.md
+++ b/docs/explanation/observability/frontend.md
@@ -211,8 +211,8 @@ Instrumenting mounts and unmounts can be quite data intensive, take due care.
Navigate your web browser to the new Grafana at >.cloud.nais.io>.
-Traces are available from the `dev-gcp-tempo` and `prod-gcp-tempo` data sources, whereas
-logs and metrics are available from the `dev-gcp-loki` and `prod-gcp-loki` data sources.
+Traces are available from the data sources ending with `-tempo`, whereas
+logs and metrics are available from data sources ending with `-loki`.
Use the "Explore" tab under either the Loki or Tempo tab and run queries.
diff --git a/docs/explanation/observability/tracing.md b/docs/explanation/observability/tracing.md
index 61ab67dff..8ab914884 100644
--- a/docs/explanation/observability/tracing.md
+++ b/docs/explanation/observability/tracing.md
@@ -1,25 +1,88 @@
---
description: >-
Application Performance Monitoring or tracing using Grafana Tempo on NAIS.
-tags: [explanation]
+tags: [explanation, tracing]
---
-# Tracing
+# Distributed Tracing
-[Traces](https://en.wikipedia.org/wiki/Observability_(software)#Distributed_traces) are a record of the path a request takes through your application. They
-are useful for understanding how a request is processed in your application.
+Tracing is a way to track a request as it passes through the various services needed to handle it. This is especially useful in a microservices architecture, where a single user action often results in a series of calls to different services.
-NAIS does not collect trace data automatically. If you want tracing integration,
-you must first instrument your application to collect traces, and then configure
-the tracing library to send it to the correct place.
+Tracing allows developers to understand the entire journey of a request, making it easier to identify bottlenecks, latency issues, or failures that can impact user experience.
-Traces from NAIS applications are collected using the [OpenTelemetry](https://opentelemetry.io/) standard.
-Performance metrics are stored and queried from the [Tempo](https://grafana.com/oss/tempo/) component.
+## How tracing works
-## Visualizing application performance
+When a request is made to your application, a trace is started. The trace serves as a container for all the work done to handle that request.
-Visualization of traces can be done in [the new Grafana installation](https://grafana.<>.cloud.nais.io).
+
-You can use the **Explore** feature of Grafana with the _prod-gcp-tempo_ and _dev-gcp-tempo_ data sources.
+Trace visualization by Logshero licensed under Apache License 2.0
-There are no ready-made dashboards at this point, but feel free to make one yourself and contribute to this page.
+The work done by individual services (or components of a single service) is captured in Spans. A span represents a single unit of work in a trace, like a SQL query or a call to an external service.
+
+Spans can be nested and form a trace tree. The Trace is the root of the tree, and each Span is a node that represents a specific operation in your application. The tree of spans captures the causal relationships between the operations in your application (i.e., which operations caused others to occur).
+
+Each Span carries a Context that includes metadata about the trace (like a unique trace identifier and span identifier) and any other data you choose to include. This context is propagated across process boundaries, allowing all the work that's part of a single trace to be linked together, even if it spans multiple services.
+
+By analyzing the data captured in traces and spans, you can gain a deep understanding of how requests flow through your system, where time is being spent, and where problems might be occurring. This can be invaluable for debugging, performance optimization, and understanding the overall health of your system.
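+
+The relationship between traces, spans and parent IDs can be sketched as a small data structure. This is a minimal illustration in plain Python, not the OpenTelemetry API: the `Span` class and the `start_trace` and `render` functions are hypothetical names. Only the ID sizes (16-byte trace IDs and 8-byte span IDs, hex-encoded) follow the OpenTelemetry conventions.
+
+```python
+import secrets
+from dataclasses import dataclass, field
+from typing import Optional
+
+
+@dataclass
+class Span:
+    """Hypothetical, simplified span: not the OpenTelemetry Span API."""
+
+    name: str
+    trace_id: str
+    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
+    parent_id: Optional[str] = None
+    children: list["Span"] = field(default_factory=list)
+
+    def start_child(self, name: str) -> "Span":
+        # A child span shares the trace ID and points at its parent's span ID.
+        child = Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)
+        self.children.append(child)
+        return child
+
+
+def start_trace(name: str) -> Span:
+    # The root span gets a fresh 16-byte trace ID and has no parent.
+    return Span(name=name, trace_id=secrets.token_hex(16))
+
+
+def render(span: Span, depth: int = 0) -> list[str]:
+    # Walk the tree depth-first, indenting each level by two spaces.
+    lines = ["  " * depth + span.name]
+    for child in span.children:
+        lines.extend(render(child, depth + 1))
+    return lines
+
+
+root = start_trace("GET /orders")
+db = root.start_child("SELECT orders")
+http = root.start_child("GET payment-service")
+http.start_child("POST /charge")  # work done in another service, same trace
+
+print("\n".join(render(root)))
+```
+
+Every span carries the same `trace_id`, and each child points at its parent through `parent_id`. This is what lets a tracing backend rebuild the tree. In a real system, the `POST /charge` span would be created by the payment service itself from context propagated over HTTP.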
+
+## OpenTelemetry
+
+OpenTelemetry, a project under the Cloud Native Computing Foundation (CNCF), has become the standard for tracing and application telemetry due to its unified APIs for tracing and metrics, which simplify instrumentation and data collection from applications.
+
+It supports a wide range of programming languages, including Java, JavaScript, Python, Go, and more, allowing for consistent tooling across different parts of a tech stack.
+
+OpenTelemetry also provides automatic instrumentation for popular frameworks and libraries, enabling the collection of traces and metrics without the need for modifying application code.
+
+It's vendor-neutral, allowing telemetry data export to any backend, providing the flexibility to switch between different analysis tools as needs change. Backed by leading companies in the cloud and software industry, and with a vibrant community, OpenTelemetry ensures project longevity and continuous improvement.
+
+[:octicons-link-external-24: Learn more about OpenTelemetry][open-telemetry]
+
+## Tracing in NAIS
+
+NAIS does not collect application trace data automatically, but it provides the infrastructure to do so using OpenTelemetry, Grafana Tempo for storage and querying, and easy-to-use configuration options.
+
+### The easy way: Auto-instrumentation
+
+The preferred way to get started with tracing is to enable auto-instrumentation for your application. This will automatically collect traces and send them to the correct place using the OpenTelemetry Agent.
+
+This is the easiest way to get started with tracing, as it requires little to no effort on the part of the team developing the application and provides instrumentation for popular libraries, frameworks and external services such as PostgreSQL, Redis, Kafka and HTTP clients.
+
+[:bulb: Get started with auto-instrumentation](../../how-to-guides/observability/auto-instrumentation.md)
+
+### The hard way: Manual instrumentation
+
+If you want more control over how your application is instrumented, you can manually instrument your application using the OpenTelemetry SDK for your programming language.
+
+You can still use the auto-instrumentation configuration to get the correct OpenTelemetry settings. Set `runtime` to `sdk`, which only sets up the OpenTelemetry configuration without injecting the OpenTelemetry Agent.
+
+[:bulb: Get started with manual instrumentation](../../how-to-guides/observability/auto-instrumentation.md#enable-auto-instrumentation-for-other-applications)
+
+### OpenTelemetry SDKs
+
+OpenTelemetry provides SDKs for a wide range of programming languages:
+
+* [:fontawesome-brands-java: OpenTelemetry Java][otel-java]
+* [:fontawesome-brands-js: OpenTelemetry JavaScript][otel-node]
+* [:fontawesome-brands-python: OpenTelemetry Python][otel-python]
+* [:fontawesome-brands-golang: OpenTelemetry Go][otel-go]
+
+## Visualizing traces in Grafana Tempo
+
+Traces are visualized and queried in Grafana using Grafana Tempo. Tempo is an open-source, easy-to-use, high-scale, and cost-effective distributed tracing backend that stores and queries traces.
+
+The easiest way to get started with Tempo is to use the [Explore view in Grafana][grafana-explore], which provides a user-friendly interface for querying and visualizing traces.
+
+[:octicons-link-external-24: Open Grafana Explore][grafana-explore]
+
+[:bulb: Get started with Grafana Tempo](../../how-to-guides/observability/tracing/tempo.md)
+
+
+
+[open-telemetry]: https://opentelemetry.io/
+[otel-java]: https://opentelemetry.io/docs/languages/java/
+[otel-node]: https://opentelemetry.io/docs/languages/js/
+[otel-python]: https://opentelemetry.io/docs/languages/python/
+[otel-go]: https://opentelemetry.io/docs/languages/go/
+[grafana]: <>
+[grafana-explore]: <>
diff --git a/docs/how-to-guides/observability/auto-instrumentation.md b/docs/how-to-guides/observability/auto-instrumentation.md
new file mode 100644
index 000000000..a61027e68
--- /dev/null
+++ b/docs/how-to-guides/observability/auto-instrumentation.md
@@ -0,0 +1,67 @@
+---
+description: Get started with auto-instrumentation of your applications to collect OpenTelemetry traces, metrics and logs using the OpenTelemetry Agent.
+tags: [guide, tracing]
+---
+# Get started with auto-instrumentation
+
+This guide explains how to get started with auto-instrumenting your applications to collect OpenTelemetry data for [Tracing](../../explanation/observability/tracing.md), [Metrics](../../explanation/observability/metrics.md) and [Logs](../../explanation/observability/logging.md) using the OpenTelemetry Agent.
+
+The main benefit of auto-instrumentation is that it requires little to no effort from the team developing the application, while providing insight into popular libraries, frameworks and external services such as PostgreSQL, Redis, Kafka and HTTP clients.
+
+Auto-instrumentation is the preferred way to get started with tracing in NAIS, and it can also be used to collect metrics and logs. This type of instrumentation is available for Java, Node.js and Python applications. Applications using other runtimes can use `sdk` mode, which only sets up the OpenTelemetry configuration.
+
+!!! info
+
+    :new: Auto-instrumentation is a new feature and is only available for nais applications running in GCP.
+
+## Enable auto-instrumentation for Java/Kotlin applications
+
+```yaml
+...
+spec:
+  observability:
+    autoInstrumentation:
+      enabled: true
+      runtime: java
+```
+
+## Enable auto-instrumentation for Node.js applications
+
+```yaml
+...
+spec:
+  observability:
+    autoInstrumentation:
+      enabled: true
+      runtime: node
+```
+
+## Enable auto-instrumentation for Python applications
+
+```yaml
+...
+spec:
+  observability:
+    autoInstrumentation:
+      enabled: true
+      runtime: python
+```
+
+## Enable auto-instrumentation for other applications
+
+If your application runtime is not one of the supported runtimes, or you want to instrument your application yourself, you can still benefit from the auto-instrumentation configuration.
+
+This will only set up the OpenTelemetry configuration for the application, but it will not inject the OpenTelemetry Agent into the application.
+
+```yaml
+...
+spec:
+  observability:
+    autoInstrumentation:
+      enabled: true
+      runtime: sdk
+```
+
+## Resources
+
+[:bulb: OpenTelemetry Auto-Instrumentation Configuration Reference](../../reference/observability/auto-config.md)
diff --git a/docs/how-to-guides/observability/tracing/context-propagation.md b/docs/how-to-guides/observability/tracing/context-propagation.md
new file mode 100644
index 000000000..29d845635
--- /dev/null
+++ b/docs/how-to-guides/observability/tracing/context-propagation.md
@@ -0,0 +1,18 @@
+---
+description: Learn how to propagate trace context across process boundaries in a few common scenarios.
+tags: [guide, tracing]
+---
+# Trace context propagation
+
+Each Span carries a Context that includes metadata about the trace (like a unique trace identifier and span identifier) and any other data you choose to include. This context is propagated across process boundaries, allowing all the work that's part of a single trace to be linked together, even if it spans multiple services.
+
+This guide explains how to propagate trace context across process boundaries in a few common scenarios. If you are using [auto-instrumentation](../auto-instrumentation.md), trace context propagation is already handled for you.
+
+[:octicons-link-external-24: OpenTelemetry Context Propagation](https://opentelemetry.io/docs/concepts/context-propagation/)
+
+## Propagate trace context in HTTP requests
+
+When a service makes an HTTP request to another service, it should include the trace context in the request headers. The receiving service can then use this context to create a new Span that is part of the same trace. By default, OpenTelemetry propagates trace context in the `traceparent` and `tracestate` HTTP headers defined by the [W3C Trace Context](https://www.w3.org/TR/trace-context/) standard.
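+
+A minimal sketch of the `traceparent` header format (`version-traceid-parentid-flags`) in plain Python, for illustration only. The `SpanContext` type and both functions are hypothetical helpers, not part of any OpenTelemetry SDK. Only version `00` of the header is handled, and the example header value is the one used in the W3C specification.
+
+```python
+import re
+import secrets
+from typing import NamedTuple, Optional
+
+# Version 00: 32 hex chars of trace ID, 16 of parent span ID, 2 of flags (lowercase).
+_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")
+
+
+class SpanContext(NamedTuple):
+    trace_id: str
+    span_id: str
+    sampled: bool
+
+
+def format_traceparent(ctx: SpanContext) -> str:
+    # Bit 0x01 of trace-flags is the "sampled" flag.
+    flags = "01" if ctx.sampled else "00"
+    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"
+
+
+def parse_traceparent(header: str) -> Optional[SpanContext]:
+    match = _TRACEPARENT.match(header.strip())
+    if match is None:
+        return None
+    trace_id, span_id, flags = match.groups()
+    # All-zero trace or span IDs are invalid according to the specification.
+    if trace_id == "0" * 32 or span_id == "0" * 16:
+        return None
+    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))
+
+
+# The receiving service parses the incoming header...
+incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
+assert incoming is not None
+# ...keeps the trace ID, creates its own span ID (the incoming span becomes
+# the parent), and sends the new context on to the next service.
+child = SpanContext(incoming.trace_id, secrets.token_hex(8), incoming.sampled)
+outgoing = format_traceparent(child)
+```
+
+In practice the OpenTelemetry SDK or agent injects and extracts this header for you. The sketch only shows what travels between services.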
+
+* [OpenTelemetry Setup in Spring Boot Application](https://opentelemetry.io/docs/languages/java/automatic/spring-boot)
+* [OpenTelemetry Setup in Ktor Application](https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/ktor/ktor-2.0/library)
\ No newline at end of file
diff --git a/docs/how-to-guides/observability/tracing/correlate-traces-logs.md b/docs/how-to-guides/observability/tracing/correlate-traces-logs.md
new file mode 100644
index 000000000..ae3ad24a9
--- /dev/null
+++ b/docs/how-to-guides/observability/tracing/correlate-traces-logs.md
@@ -0,0 +1,83 @@
+---
+description: Learn how to correlate traces with logs in Grafana Tempo.
+tags: [guide, tracing]
+---
+# Correlate traces and logs
+
+This guide will explain how to correlate traces with logs in Grafana Tempo.
+
+## Step 1: Configure Tracing
+
+First you need to configure OpenTelemetry tracing in your application. The easiest way to get started with tracing is to enable auto-instrumentation for your application. This will automatically collect traces and send them to the correct place using the OpenTelemetry Agent.
+
+[:bulb: Get started with auto-instrumentation](../auto-instrumentation.md)
+
+## Step 2: Configure Logging
+
+If you are using auto-instrumentation for logs, they are automatically correlated with traces. Otherwise, you need to configure your log output to include trace information.
+
+
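+=== "log4j"
+
+    Add the [opentelemetry-javaagent-log4j-context-data-2.17](https://mvnrepository.com/artifact/io.opentelemetry.javaagent.instrumentation/opentelemetry-javaagent-log4j-context-data-2.17) package to your `pom.xml` or `build.gradle` to include trace information in your logs:
+
+    ```
+    io.opentelemetry.instrumentation:opentelemetry-log4j-context-data-2.17-autoconfigure:2.1.0-alpha
+    ```
+
+    Add the following pattern to your log4j configuration to include trace information in your logs:
+
+    ```xml
+    <?xml version="1.0" encoding="UTF-8"?>
+    <Configuration status="WARN">
+      <Appenders>
+        <Console name="Console" target="SYSTEM_OUT">
+          <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} trace_id: %X{trace_id} span_id: %X{span_id} trace_flags: %X{trace_flags} - %msg%n"/>
+        </Console>
+      </Appenders>
+      <Loggers>
+        <Root level="info">
+          <AppenderRef ref="Console"/>
+        </Root>
+      </Loggers>
+    </Configuration>
+    ```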
+
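+=== "logback"
+
+    Add the [opentelemetry-logback-mdc-1.0](https://mvnrepository.com/artifact/io.opentelemetry.instrumentation/opentelemetry-logback-mdc-1.0) package to your `pom.xml` or `build.gradle` to include trace information in your logs:
+
+    ```
+    io.opentelemetry.instrumentation:opentelemetry-logback-mdc-1.0:2.1.0-alpha
+    ```
+
+    Add the following pattern to your logback configuration to include trace information in your logs:
+
+    ```xml
+    <?xml version="1.0" encoding="UTF-8"?>
+    <configuration>
+      <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
+        <encoder>
+          <pattern>%d{HH:mm:ss.SSS} trace_id=%X{trace_id} span_id=%X{span_id} trace_flags=%X{trace_flags} %msg%n</pattern>
+        </encoder>
+      </appender>
+
+      <!-- Wrap your logging appender, for example ConsoleAppender, with OpenTelemetryAppender -->
+      <appender name="OTEL" class="io.opentelemetry.instrumentation.logback.mdc.v1_0.OpenTelemetryAppender">
+        <appender-ref ref="CONSOLE"/>
+      </appender>
+
+      <!-- Use the wrapped "OTEL" appender instead of the original "CONSOLE" one -->
+      <root level="INFO">
+        <appender-ref ref="OTEL"/>
+      </root>
+    </configuration>
+    ```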
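+
+The log4j and logback setups rely on the same mechanism: the current trace and span IDs are put into the logging context (context data or MDC) so the log pattern can print them. A minimal sketch of that idea with Python's standard `logging` module, for illustration only. The `current_ids` context variable is a hypothetical stand-in for the active span; with an OpenTelemetry SDK, the IDs would come from the current span instead.
+
+```python
+import contextvars
+import io
+import logging
+
+# Hypothetical stand-in for "the currently active span" (trace_id, span_id).
+current_ids = contextvars.ContextVar("current_ids", default=("", ""))
+
+
+class TraceContextFilter(logging.Filter):
+    """Copy the active trace and span IDs onto every log record."""
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        record.trace_id, record.span_id = current_ids.get()
+        return True  # never drop records, only enrich them
+
+
+stream = io.StringIO()
+handler = logging.StreamHandler(stream)
+handler.addFilter(TraceContextFilter())
+# Same idea as %X{trace_id} / %X{span_id} in the Java patterns.
+handler.setFormatter(
+    logging.Formatter("%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s")
+)
+logger = logging.getLogger("orders")
+logger.addHandler(handler)
+logger.setLevel(logging.INFO)
+logger.propagate = False
+
+current_ids.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
+logger.info("order created")
+print(stream.getvalue(), end="")
+```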
+
+## Step 3: Profit
+
+Now that you have tracing and logging set up, you can use Grafana Tempo to correlate traces and logs. When you view a trace in Grafana Tempo, you can see the logs that are associated with that trace. This makes it easy to understand what happened in your application and troubleshoot issues.
+
+
+
+[:arrow_backward: Back to the list of guides](../index.md)
\ No newline at end of file
diff --git a/docs/how-to-guides/observability/tracing/otel-tracing.md b/docs/how-to-guides/observability/tracing/otel-tracing.md
deleted file mode 100644
index 945855b1a..000000000
--- a/docs/how-to-guides/observability/tracing/otel-tracing.md
+++ /dev/null
@@ -1,37 +0,0 @@
----
-description: How to enable tracing in your application
-tags: [guide, otel]
----
-# Enable trace collection
-
-This guide will show you how to get started collecting distributed trace data from your application.
-
-## 1. Enable tracing in application manifest
-
-The first step in collecting trace information is to enable it in your application spec:
-
-???+ note ".nais/app.yaml"
-
- ```yaml
- ...
- spec:
- observability:
- tracing:
- enabled: true
- ...
- ```
-
-## 2. Select an OTLP exporter
-
-Select the appropriate [OTLP exporter](https://opentelemetry.io/ecosystem/registry/?s=otlp+exporter) for your specific application.
-Ready-made libraries can be found for Java, Rust, Python, Go, and most other popular languages.
-
-## 3. Configure the OTLP exporter
-
-Finally, the OTLP exporter must be configured to send data to the NAIS collector.
-
-That configuration is provided by NAIS through the `$OTEL_EXPORTER_OTLP_ENDPOINT` environment variable,
-which in turn is supposed to be automatically detected and used by your OpenTelemetry library.
-All data must be sent using the gRPC protocol.
-
-You can find all the environment variables provided by NAIS in the [OpenTelemetry Tracing Reference](../../../reference/observability/otel/tracing.md).
diff --git a/docs/how-to-guides/observability/tracing/tempo.md b/docs/how-to-guides/observability/tracing/tempo.md
new file mode 100644
index 000000000..b673d9be0
--- /dev/null
+++ b/docs/how-to-guides/observability/tracing/tempo.md
@@ -0,0 +1,52 @@
+# Get started with Grafana Tempo
+
+Grafana Tempo is an open-source, easy-to-use, high-scale, and cost-effective distributed tracing backend that stores and queries traces. It is fully integrated with Grafana, allowing you to visualize and query traces in the same interface as your metrics and logs.
+
+Since NAIS does not collect application trace data automatically, you need to enable tracing in your application. The preferred way to get started with tracing is to enable auto-instrumentation for your application. This will automatically collect traces and send them to the correct place using the OpenTelemetry Agent.
+
+[:bulb: Get started with auto-instrumentation](../auto-instrumentation.md)
+
+Once you have traces being collected, you can visualize and query them in Grafana using the Grafana Tempo data source. To get started with Tempo, you can use the Explore view in Grafana, which provides a user-friendly interface for querying and visualizing traces.
+
+[:simple-grafana: Open Grafana Explore](<>)
+
+## Querying traces in Grafana Tempo
+
+The easiest way to get started with querying traces in Grafana Tempo is to use the query builder mode. The query builder mode is a graphical interface that helps you build TraceQL queries by selecting services, span names and attributes from your traces.
+
+Start by selecting the Tempo data source for the environment you want to query traces for (one ending with `-tempo`). Then select the Search query type to open the query builder mode.
+
+
+
+Here you can select the service and the operation you want to query traces for. You can also add filters to your query to narrow down the results.
+
+Below the query builder, you will see the TraceQL query that is being built as you select tags and fields. You can also edit the TraceQL query directly if you want to write your own queries.
+
+Click the `Run query` button to run the query and see the results. You can also add the query to a dashboard by clicking the `Add to dashboard` button.
+
+[:octicons-link-external-24: Learn more about Grafana Tempo query editor on grafana.com](https://grafana.com/docs/grafana/latest/datasources/tempo/query-editor/)
+
+### TraceQL query language
+
+Grafana Tempo uses the TraceQL query language to query traces. TraceQL is designed for querying trace data and is inspired by LogQL, which Grafana Loki uses to query logs, and PromQL, which Prometheus uses to query metrics.
+
+TraceQL provides a powerful and flexible way to query trace data, and it is designed to be easy to use and understand. You can use TraceQL to filter and aggregate trace data, and to create visualizations and alerts based on trace data.
+
+
+
+[:bulb: Learn more about TraceQL query language](../../../reference/observability/tracing/traceql.md)
+
+## Understanding trace data
+
+Clicking a trace in the query results will open the trace view, which provides a detailed view of the trace data. Here you can see the trace ID, the duration of the trace, and the services and operations involved in the trace.
+
+Trace data is visualized as a tree, where each node represents a span in the trace. You can expand and collapse nodes to see more or less detail, and you can click a node to see more information about the span.
+
+
+
+A red circle next to a span indicates that the span has an error. You can click the span to see more information about the error.
+
+Traces in nais follow the OpenTelemetry Semantic Conventions, which provide a standard for naming and structuring trace data. This makes trace data easier to understand and use, as you can rely on a consistent structure across all traces.
+
+[:bulb: Learn more about OpenTelemetry Trace Semantic Conventions](../../../reference/observability/tracing/trace-semconv.md)
+
diff --git a/docs/reference/glossary.md b/docs/reference/glossary.md
new file mode 100644
index 000000000..07cffff10
--- /dev/null
+++ b/docs/reference/glossary.md
@@ -0,0 +1,57 @@
+# A nais glossary
+
+## Observability
+
+Observability is the art of understanding how a system behaves by adding instrumentation such as logs, metrics, and traces.
+
+### Metrics
+
+Metrics are a numerical measurement of something in your application, such as the number of requests or the response time. Metrics are much better suited for dashboards and alerts than logs are.
+
+### Prometheus
+
+[Prometheus](https://prometheus.io/) is a time-series database that is used to store metrics, which can then be queried and visualized in Grafana.
+
+### Alertmanager
+
+[Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) is a component of the Prometheus ecosystem that handles alerts fired by Prometheus based on the collected metrics and routes them to receivers such as Slack.
+
+### Grafana
+
+[Grafana](https://grafana.com/) is a tool for creating application dashboards and visualizing data such as metrics, traces, and logs in a user-friendly way.
+
+### Traces
+
+Traces are a record of the path a request takes through your application. They are useful for understanding how a request is processed across multiple internal and external services.
+
+### Span
+
+A span represents a single unit of work in a Trace, like a SQL query or an HTTP call to an API.
+
+### Context
+
+Each Span carries a Context that includes metadata about the trace (like a unique trace identifier and span identifier) and any other data you choose to include. This context is propagated across process boundaries, allowing all the work that's part of a single trace to be linked together, even if it spans multiple services.
+
+### Grafana Tempo
+
+[Grafana Tempo](https://grafana.com/oss/tempo/) is a storage and query system for traces in Grafana.
+
+### OpenTelemetry
+
+[OpenTelemetry](https://opentelemetry.io) is the standard for instrumenting your application for observability. It provides APIs, libraries, agents, and instrumentation to capture distributed traces, metrics and logs from your application.
+
+### Logs
+
+Logs are a record of what has happened in your application. They are useful for debugging, but due to their unstructured format they generally do not scale very well. Use metrics for dashboards and alerts and traces for understanding how a request is processed across multiple services.
+
+### Grafana Loki
+
+[Grafana Loki](https://grafana.com/oss/loki/) is a storage and query system for logs in Grafana.
+
+### Kibana
+
+[Kibana](https://www.elastic.co/kibana) is a tool for visualizing logs. It is often used in combination with Elasticsearch to create dashboards and alerts.
+
+### Elasticsearch
+
+[Elasticsearch](https://www.elastic.co/elasticsearch/) is a search engine that is used to store logs.
diff --git a/docs/reference/observability/auto-config.md b/docs/reference/observability/auto-config.md
new file mode 100644
index 000000000..803fd7ff9
--- /dev/null
+++ b/docs/reference/observability/auto-config.md
@@ -0,0 +1,27 @@
+# OpenTelemetry Auto-Instrumentation Configuration
+
+When you enable [auto-instrumentation](../../how-to-guides/observability/auto-instrumentation.md) in your application, the following OpenTelemetry configuration becomes available to your application as environment variables:
+
+| Variable | Example Value |
+| ------------------------------------ | --------------------------------------------------------------------------------------------- |
+| `OTEL_SERVICE_NAME` | `my-application` |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://opentelemetry-collector.nais-system:4317` |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` |
+| `OTEL_EXPORTER_OTLP_INSECURE` | `true` |
+| `OTEL_PROPAGATORS` | `tracecontext,baggage` |
+| `OTEL_TRACES_SAMPLER` | `parentbased_always_on` |
+| `OTEL_RESOURCE_ATTRIBUTES_POD_NAME` | `my-application-777787df6d-pw9mq` |
+| `OTEL_RESOURCE_ATTRIBUTES_NODE_NAME` | `gke-node-abc123` |
+| `OTEL_RESOURCE_ATTRIBUTES` | `service.name=my-application,service.namespace=my-team,k8s.container.name=my-application,...` |
+
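+Because these variables set the `service.name` and `service.namespace` resource attributes, you can use them to find your application's traces in Grafana Tempo. The following TraceQL query is a sketch that assumes an application named `my-application` in the namespace `my-team`, matching the example values above:
+
+```traceql
+{ resource.service.name = "my-application" && resource.service.namespace = "my-team" }
+```
+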
+!!! tip
+ Do not hardcode these values in your application. OpenTelemetry SDKs and auto-instrumentation libraries will automatically pick up these environment variables and use them to configure the SDK.
+
+## More OpenTelemetry Configuration
+
+The full list of environment variables that can be used to configure the OpenTelemetry SDK is available in the OpenTelemetry documentation:
+
+* [:simple-opentelemetry: General SDK Configuration](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#general-sdk-configuration)
+* [:simple-opentelemetry: OTLP Exporter Configuration](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/)
+
+[OTLP is the OpenTelemetry Protocol](https://opentelemetry.io/docs/specs/otel/protocol/exporter/), and is the protocol used to send telemetry data to Prometheus, Grafana Tempo, and Grafana Loki.
diff --git a/docs/reference/observability/logs/logql.md b/docs/reference/observability/logs/logql.md
index f8bdf1343..918977e11 100644
--- a/docs/reference/observability/logs/logql.md
+++ b/docs/reference/observability/logs/logql.md
@@ -4,12 +4,10 @@ tags: [reference, loki]
---
# LogQL Reference
-[LogQL][logql] is the query language used in Grafana Loki to query logs. It is a powerful query language that allows you to filter, aggregate, and search for logs and should be familiar to anyone who has used SQL or [PromQL](../metrics/promql.md).
+LogQL is the query language used in Grafana Loki to query logs. It is a powerful query language that allows you to filter, aggregate, and search for logs and should be familiar to anyone who has used SQL or [PromQL](../metrics/promql.md).
Where LogQL differs from PromQL is it's trailing pipeline syntax, or log pipeline. A log pipeline is a set of stage expressions that are chained together and applied to the selected log streams. Each expression can filter out, parse, or mutate log lines and their respective labels.
-[logql]: https://grafana.com/docs/loki/latest/query/
-
## Syntax
A LogQL query is composed by two main parts: the **stream selector** (the query) and the **log pipeline** (the transformation).
diff --git a/docs/reference/observability/otel/tracing.md b/docs/reference/observability/otel/tracing.md
deleted file mode 100644
index 7a9a0d840..000000000
--- a/docs/reference/observability/otel/tracing.md
+++ /dev/null
@@ -1,29 +0,0 @@
----
-description: OpenTelemetry Tracing reference documentation
-tags: [reference, otel]
----
-# OpenTelemetry Tracing Reference
-
-## Environment variables
-
-The following environment variables are available for configuring the tracing library:
-
-| Name | Description | Default value |
-| ----------------------------- | --------------------------------------------------------------------------------- | --------------------------------------------------------- |
-| `OTEL_SERVICE_NAME` | The name of the service. | `$APP_NAME` |
-| `OTEL_RESOURCE_ATTRIBUTES` | A comma-separated list of key-value pairs to be added to the resource attributes. | `service.name=$APP_NAME,service.namespace=$APP_NAMESPACE` |
-| `OTEL_EXPORTER_OTLP_ENDPOINT` | The endpoint to send trace data to. | `http://tempo-distributor.nais-system:4317` |
-| `OTEL_EXPORTER_PROTOCOL` | The protocol to use when sending trace data. Must be `grpc`. | `grpc` |
-
-## Instrumentation libraries
-
-The following libraries are available for instrumenting your application:
-
-| Language | Library |
-| -------- | ------------------------------------------------------------------------------------------------- |
-| Java | [OpenTelemetry Java](https://opentelemetry.io/docs/instrumentation/java/getting-started/) |
-| Node.js | [OpenTelemetry Node.js](https://opentelemetry.io/docs/instrumentation/js/getting-started/nodejs/) |
-| Python | [OpenTelemetry Python](https://opentelemetry.io/docs/python/getting-started/) |
-| Go | [OpenTelemetry Go](https://opentelemetry.io/docs/go/getting-started/) |
-
-You can also see the [nais/examples](https://github.com/nais/examples) repository for examples of how to instrument your application.
\ No newline at end of file
diff --git a/docs/reference/observability/tracing/trace-semconv.md b/docs/reference/observability/tracing/trace-semconv.md
new file mode 100644
index 000000000..f3648708f
--- /dev/null
+++ b/docs/reference/observability/tracing/trace-semconv.md
@@ -0,0 +1,3 @@
+# OpenTelemetry Trace Semantic Conventions
+
+OpenTelemetry Trace Semantic Conventions can be found at [opentelemetry.io](https://opentelemetry.io/docs/specs/semconv/general/trace/).
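+
+Attribute names in TraceQL queries follow these conventions. As a sketch, assuming your instrumentation implements the current stable HTTP semantic conventions, the following query finds failed `POST` requests. Instrumentation based on older versions of the conventions uses `http.method` and `http.status_code` instead:
+
+```traceql
+{ span.http.request.method = "POST" && span.http.response.status_code >= 500 }
+```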
diff --git a/docs/reference/observability/tracing/traceql.md b/docs/reference/observability/tracing/traceql.md
new file mode 100644
index 000000000..6e9de4adb
--- /dev/null
+++ b/docs/reference/observability/tracing/traceql.md
@@ -0,0 +1,93 @@
+---
+description: TraceQL reference documentation for querying traces in Grafana Tempo.
+tags: [reference, tempo]
+---
+# TraceQL Reference
+
+TraceQL is the query language used in Grafana Tempo to query traces. It is a powerful query language that allows you to filter, aggregate, and search for traces and should be familiar to anyone who has used SQL or [PromQL](../metrics/promql.md).
+
+Where TraceQL differs from PromQL is its trailing pipeline syntax, or trace pipeline. A trace pipeline is a set of stage expressions that are chained together and applied to the selected spansets. Each stage operates on the output of the previous one, for example by filtering spansets or computing aggregates over them.
+
+## Syntax
+
+A TraceQL query is composed of two main parts: the **trace span selector(s)** (the query) and the **trace pipeline** (aggregations).
+
+```traceql
+{ span.label = "value" } | stage1 | stage2 | stage3
+```
+
+### Trace Span Selector
+
+The trace span selector, enclosed in curly braces, is used to select spans based on their fields. It consists of one or more conditions on span fields, and only spans matching those conditions are selected.
+
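+Some span fields are intrinsic to the span, such as `name`, `status`, `duration`, and `kind`, while others are attributes set by your instrumentation. Attributes are scoped either to the span (`span.`), such as `span.db.operation` and `span.http.status_code`, or to the resource that produced the span (`resource.`), such as `resource.service.name`.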
+
+```traceql
+{ span.http.status_code >= 200 && span.http.status_code < 300 }
+```
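+
+Intrinsic fields are referenced without a scope prefix. As a sketch, the following query combines intrinsics with a resource attribute to find slow, failed server spans from a hypothetical service named `my-app`:
+
+```traceql
+{ resource.service.name = "my-app" && kind = server && status = error && duration > 500ms }
+```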
+
+#### Comparison Operators
+
+Similar to PromQL, TraceQL supports a set of operators for comparing span attributes and values. One notable difference is type coercion, where the type of the attribute is inferred from the value being compared.
+
+- `=` - Equality
+- `!=` - Inequality
+- `>` - Greater than
+- `>=` - Greater than or equal to
+- `<` - Less than
+- `<=` - Less than or equal to
+- `=~` - Regular expression
+- `!~` - Negated regular expression
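+
+For example, the following sketch uses `=~` to match any HTTP route under a hypothetical `/api/` prefix, and `!=` to exclude responses with status code `200`:
+
+```traceql
+{ span.http.route =~ "/api/.*" && span.http.status_code != 200 }
+```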
+
+#### Combining spansets
+
+Since a trace is usually composed of multiple spans, multiple span selectors can be combined to match traces based on the attributes of different spans.
+
+TraceQL supports two ways of combining spansets: logical operators (`&&` and `||`) and structural operators (`>`, `>>`, `<<`, `<`, and `~`).
+
+##### Logical
+
+The logical operators `&&` and `||` are used to combine spansets based on their attributes.
+
+```traceql
+{ resource.service.name="server" } && { resource.service.name="client" }
+```
+
+The above query returns traces that contain both a span with the service name `server` and a different span with the service name `client`.
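+
+Note that `&&` behaves differently inside a single selector. The following sketch only matches spans where both conditions hold on the same span, whereas combining two selectors with `&&` matches a trace as long as each condition is satisfied by some span in it:
+
+```traceql
+{ resource.service.name = "server" && span.http.status_code >= 500 }
+```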
+
+##### Structural
+
+Structural operators filter spans based on their position relative to each other in the trace tree. The structural operators are:
+
+- `>` - Direct parent of
+- `>>` - Ancestor of
+- `<<` - Descendant of
+- `<` - Direct child of
+- `~` - Sibling of
+
+For example, to find a trace where a specific HTTP request interacted with a particular Kafka topic, you could use the following query:
+
+```traceql
+{ span.http.url = "/path/of/api" } >> { span.messaging.destination.name = "my-team.some-topic" }
+```
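+
+To match only direct relationships, use `>` instead. The following sketch, assuming hypothetical services named `frontend` and `backend`, finds failed `backend` spans whose direct parent is a `frontend` span:
+
+```traceql
+{ resource.service.name = "frontend" } > { resource.service.name = "backend" && status = error }
+```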
+
+### Trace Pipeline Aggregations
+
+To further refine the selected spans, a trace pipeline can be used to apply aggregate functions to each spanset and filter on the result:
+
+- `count` - The count of spans in the spanset.
+- `avg` - The average of a given numeric attribute or intrinsic for a spanset.
+- `max` - The max value of a given numeric attribute or intrinsic for a spanset.
+- `min` - The min value of a given numeric attribute or intrinsic for a spanset.
+- `sum` - The sum value of a given numeric attribute or intrinsic for a spanset.
+
+We can use the `count()` function to count the number of spans in each spanset. In the following example, we select traces that contain more than 3 database SELECT operations:
+
+```traceql
+{span.db.operation="SELECT"} | count() > 3
+```
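+
+The other functions take an attribute or intrinsic as their argument. For example, this sketch selects traces where the average duration of the spans from a hypothetical service named `my-app` exceeds one second:
+
+```traceql
+{ resource.service.name = "my-app" } | avg(duration) > 1s
+```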
+
+## Reference
+
+- [TraceQL query documentation](https://grafana.com/docs/tempo/latest/traceql/)
+- [Get to know TraceQL](https://grafana.com/blog/2023/02/07/get-to-know-traceql-a-powerful-new-query-language-for-distributed-tracing/)
diff --git a/mkdocs.yml b/mkdocs.yml
index afaf22c14..2ad9f8198 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -63,7 +63,6 @@ extra:
using our feedback form or contact us on Slack.
extra_javascript:
- javascript/amplitude_events.js
- - javascript/tenant_redirect.js
extra_css:
- material_theme_stylesheet_overrides/uu.css
- material_theme_stylesheet_overrides/grid.css
@@ -93,6 +92,8 @@ plugins:
'security/salsa/salsa.md': 'security/salsa/README.md'
'addons/wonderwall.md': 'security/auth/wonderwall.md'
'device/update.md': 'how-to-guides/naisdevice/update.md'
+ 'how-to-guides/observability/tracing/otel-tracing.md': 'how-to-guides/observability/auto-instrumentation.md'
+ 'reference/observability/otel/tracing.md': 'reference/observability/auto-config.md'
markdown_extensions:
- toc:
permalink: True