Commit 9e1cc03

feat(satp): added predefined grafana dashboards and alerts
Signed-off-by: Jorge Santos <[email protected]>
1 parent aa0e6a2 commit 9e1cc03

11 files changed

+1822
-43
lines changed

packages/cactus-plugin-satp-hermes/docker-compose-satp.yml

Lines changed: 4 additions & 1 deletion
@@ -24,10 +24,13 @@ services:
       OTEL_METRIC_EXPORT_INTERVAL: "1000"
       OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION: explicit_bucket_histogram
     ports:
-      - "4000:3000" # Grafana
+      - "3000:3000" # Grafana
       - "4317:4317" # OpenTelemetry gRPC endpoint
       - "4318:4318" # OpenTelemetry HTTP endpoint
       - "9090:9090" # Prometheus
       - "3100:3100" # Loki
       - "3200:3200" # Tempo
     restart: unless-stopped
+    volumes:
+      - ./grafana/provisioning/dashboards:/otel-lgtm/grafana/conf/provisioning/dashboards
+      - ./grafana/provisioning/alerting:/otel-lgtm/grafana/conf/provisioning/alerting
Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
# Grafana / OTEL-LGTM Provisioning Guide

This repository defines the provisioning setup for **Grafana** as part of the **[OpenTelemetry LGTM](https://hub.docker.com/r/grafana/otel-lgtm) (Loki, Grafana, Tempo, Prometheus)** stack.
Provisioning lets you **automatically configure dashboards, alerting rules, and contact points** through YAML and JSON files, which keeps the observability configuration consistent and version-controlled.

---

## Overview

Provisioning in Grafana means preloading configuration files that are applied automatically when Grafana starts.
This setup covers three main provisioning categories:

1. **Dashboards** – Predefined visualizations for metrics, logs, and traces.
2. **Alerting Rules** – Conditions that trigger alerts based on metric or log thresholds.
3. **Contact Points** – Notification destinations for alert delivery (e.g., email, Discord, Slack).

All provisioning files are mounted into the Docker container under the directory `/otel-lgtm/grafana/conf/provisioning/` (or an equivalent custom mount path defined in [`docker-compose-satp.yml`](../docker-compose-satp.yml)).

---

## Directory Structure

To provision the monitoring system with dashboards, alerts, or contact points, use the following repository layout:

provisioning/<br>
├── dashboards/<br>
│ ├── [grafana-dashboards.yaml](#grafana-dashboardsyaml) &emsp;&emsp;&emsp;&nbsp;*# Dashboard provisioning configuration*<br>
│ ├── dashboard-#1.json &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;*# Dashboard #1 definition*<br>
│ ├── ...<br>
│ └── [dashboard-#n.json](#dashboard-njson) &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;*# Dashboard #n definition*<br>
├── alerting/<br>
│ ├── alert-#1.yaml &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&nbsp;*# Alert group and rule definitions #1*<br>
│ ├── ...<br>
│ ├── [alert-#m.yaml](#alert-myaml) &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;*# Alert group and rule definitions #m*<br>
│ ├── contact-point-#1.yaml &emsp;&emsp;&emsp;&emsp;&nbsp;*# Contact point definitions #1*<br>
│ ├── ...<br>
│ └── [contact-point-#p.yaml](#contact-point-pyaml) &emsp;&emsp;&emsp;&emsp;&nbsp;*# Contact point definitions #p*<br>

### grafana-dashboards.yaml

This file lists the dashboards that are made available when the Docker container starts. Each entry in the `providers` list registers one dashboard.
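
As a rough illustration of what such an entry can look like, the sketch below follows Grafana's file-provider format. The provider name and the file name `example-dashboard.json` are hypothetical placeholders; the actual `grafana-dashboards.yaml` in this package may use different values and additional options.

```yaml
apiVersion: 1

providers:
  # One entry per dashboard. The name and file name are placeholders (assumptions).
  - name: "Example Dashboard"
    type: file
    options:
      # Path as seen inside the container, i.e. under the directory that
      # docker-compose-satp.yml mounts for dashboards.
      path: /otel-lgtm/grafana/conf/provisioning/dashboards/example-dashboard.json
```

The `path` uses the container-side location rather than the repository path, because Grafana reads the file from inside the container.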

### dashboard-#n.json

An example dashboard. Each dashboard is defined in a separate JSON file and must be referenced in [grafana-dashboards.yaml](#grafana-dashboardsyaml).

The official documentation for dashboard provisioning is available [here](https://grafana.com/docs/grafana/latest/administration/provisioning/#dashboards).

### alert-#m.yaml

An example alert. Alerts are defined in YAML files.

The official documentation for alert provisioning is available [here](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/#import-alert-rules).

### contact-point-#p.yaml

An example contact point. Contact points are defined in YAML files.

The official documentation for contact point provisioning is available [here](https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/#import-contact-points).

## Usage Explanation

Customizing the available dashboards, alerts, and contact points requires creating or modifying some files. There are two ways to create these elements:

- Write the files by hand, following the [official documentation](https://grafana.com/docs/grafana/latest/) (less intuitive).
- Start the Docker container, create the element in the Grafana UI, and export it (more interactive).

This section gives a step-by-step tutorial on creating a [dashboard](#dashboard-creation), an [alert rule](#alert-rule-creation), or a [contact point](#contact-point-creation) with Grafana's built-in tools (the second option).

### Dashboard Creation

1. Open the monitoring [`test file`](../src/test/typescript/integration/monitoring/functionality.test.ts) and comment out the lines that call the function `stopDockerComposeService`.

2. Run the test using `npx jest ./packages/cactus-plugin-satp-hermes/src/test/typescript/integration/monitoring/functionality.test.ts` from the project root.

*Note: This makes the metrics appear in Grafana, which makes the dashboard easier to build.*

3. Log in to Grafana using the following [link](http://localhost:3000/login), with `admin` as both *username* and *password*.

*Note: You may need to change the address if Grafana is not running on localhost.*

4. Open the Grafana dashboards page using the following [link](http://localhost:3000/dashboards).

*Note: You may need to change the address if Grafana is not running on localhost.*

5. Click the `New` button and then the `New dashboard` button.

6. Click `Add visualization` and select a data source for the visualization.

7. Configure the panel with the desired data.

8. Click `Save dashboard`, name the dashboard and click `Save`.

9. If more panels are desired, click `Add` and then `Visualization`. Repeat steps 7 and 8.

10. After all panels are added, click `Exit edit`, then `Export`, followed by `Export as code`.

11. Click `Download file`, then move the file to the folder `grafana/provisioning/dashboards/` inside the SATP-Hermes package.

12. In the file [grafana-dashboards.yaml](./provisioning/dashboards/grafana-dashboards.yaml), create a new entry by copying an existing one, then change its path to point to the new dashboard file and change its `name` property to the name of the new dashboard.

13. Stop the running otel-lgtm Docker container, rerun the test from step 2, and check that the dashboard is provisioned automatically.

14. Go back to the monitoring [`test file`](../src/test/typescript/integration/monitoring/functionality.test.ts) and uncomment the lines that call the function `stopDockerComposeService`.
105+
106+
### Contact Point Creation
107+
108+
*Note: This section appears before alert creation since alerts rely on the existence of established contact points.*
109+
110+
1. Access the Grafana dashboard endpoint using the following [link](http://localhost:3000/alerting/notifications).
111+
112+
*Note: You might require to change the address in case it is not running on localhost.*
113+
114+
2. Click the `Create contact point` button.
115+
116+
3. Browse the integration options, selecting the one that better suits the use case (eg. discord, email, etc.).
117+
118+
4. Fill the name of the contact point and the specific details of the integration option.
119+
120+
5. Click `Test` to assess the correct functioning of the contact point.
121+
122+
6. Click `Save contact point`.
123+
124+
7. Click `More` on the newly created contact point, followed by `Export`.
125+
126+
8. Click `Download`, then move the file to the folder `grafana/provisioning/alerting/` inside the package for the SATP-Hermes project.
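
For orientation, an exported contact point file has roughly the shape sketched below, modeled on the example contact point file shipped in this package. The contact point name, the `uid`, and the webhook URL are placeholders (assumptions); Grafana generates the real `uid` on export, and the settings keys depend on the integration chosen in step 3.

```yaml
apiVersion: 1

contactPoints:
  - orgId: 1
    # Placeholder name; alert rules refer to the contact point by this name.
    name: example-contact-point
    receivers:
      - uid: example-receiver-uid   # placeholder; generated by Grafana on export
        type: discord
        settings:
          # Placeholder; replace with your own Discord webhook URL.
          url: https://discord.com/api/webhooks/<id>/<token>
          use_discord_username: false
        disableResolveMessage: false
```

The `name` here is the value an alert rule points to in its `notification_settings.receiver` field, so the two must match.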

### Alert Rule Creation

1. Open the monitoring [`test file`](../src/test/typescript/integration/monitoring/functionality.test.ts) and comment out the lines that call the function `stopDockerComposeService`.

2. Run the test using `npx jest ./packages/cactus-plugin-satp-hermes/src/test/typescript/integration/monitoring/functionality.test.ts` from the project root.

*Note: This makes the metrics appear in Grafana, which makes the alert rule easier to build.*

3. Open the Grafana alert rules page using the following [link](http://localhost:3000/alerting/list).

*Note: You may need to change the address if Grafana is not running on localhost.*

4. Click the `New alert rule` button.

5. Fill in the name and select the metric to track.

6. Configure the threshold that should trigger the alert.

7. Select the folder for your rule, or click `New folder` to create one: give it a name and click `Create`.

8. Select the evaluation group for your rule (which sets how often it is evaluated), or click `New evaluation group` to create one: give it a name and an evaluation interval, then click `Create`.

9. Select the contact point (if none is defined, see the corresponding [section](#contact-point-creation)).

10. (Optional) Configure the notification message.

11. Click `Save`.

12. Click `More` on the newly created alert rule, followed by `Export` and `With modifications`.

13. Scroll down and click `Export`.

14. Click `Download`, then move the file to the folder `grafana/provisioning/alerting/` inside the SATP-Hermes package.

15. Stop the running otel-lgtm Docker container, rerun the test from step 2, and check that the alert is provisioned automatically.

16. Go back to the monitoring [`test file`](../src/test/typescript/integration/monitoring/functionality.test.ts) and uncomment the lines that call the function `stopDockerComposeService`.
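
The condensed outline below shows how the choices made in the steps above map onto an exported alert rule file. It is modeled on the example alert file in this package, but it is deliberately trimmed and is not meant as a complete, loadable rule: the `uid` and the `A`/`B` reference IDs are placeholders (assumptions), and a real export carries more fields. Use the file Grafana exports rather than writing this by hand.

```yaml
apiVersion: 1

groups:
  - orgId: 1
    name: Alerts        # evaluation group chosen in step 8
    folder: Alerts      # folder chosen in step 7
    interval: 1m        # evaluation interval of the group
    rules:
      - uid: example-rule-uid     # placeholder; generated by Grafana on export
        title: Failed Transactions
        condition: B              # refId of the expression that decides whether the rule fires
        data:
          - refId: A              # the metric query from step 5
            datasourceUid: prometheus
            relativeTimeRange: {from: 600, to: 0}
            model:
              expr: failed_transactions_total
              instant: true
              refId: A
          - refId: B              # the threshold from step 6, applied to query A
            datasourceUid: __expr__
            model:
              type: threshold
              expression: A
              conditions:
                - evaluator: {type: gt, params: [0]}
              refId: B
        for: 1m
        notification_settings:
          receiver: test-contact-points   # must match the name of a provisioned contact point
```

The two links worth checking after an export are `condition`, which must name one of the `refId` values in `data`, and `notification_settings.receiver`, which must name an existing contact point.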
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
apiVersion: 1

# Definition of alert rule for failed transactions
# THIS IS AN EXAMPLE FILE. REPLACE THE VALUES WITH YOUR OWN ALERT CONDITIONS TO ENABLE ALERT NOTIFICATIONS.
groups:
  - orgId: 1
    name: Alerts
    folder: Alerts
    interval: 1m
    rules:
      - uid: bf0keqynb8ni8d
        title: Failed Transactions
        condition: Failed Transactions over 0
        data:
          - refId: Failed Transactions
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              disableTextWrap: false
              editorMode: builder
              expr: failed_transactions_total
              fullMetaSearch: false
              includeNullMetadata: true
              instant: true
              intervalMs: 1000
              legendFormat: __auto
              maxDataPoints: 43200
              range: false
              refId: Failed Transactions
              useBackend: false
          - refId: Failed Transactions over 0
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 0
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - Failed Transactions over 0
                  reducer:
                    params: []
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: __expr__
              expression: Failed Transactions
              intervalMs: 1000
              maxDataPoints: 43200
              refId: Failed Transactions over 0
              type: threshold
        noDataState: NoData
        execErrState: Error
        for: 1m
        annotations: {}
        labels: {}
        isPaused: false
        notification_settings:
          receiver: test-contact-points
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
apiVersion: 1

# Definition of contact points for alerting
# THIS IS AN EXAMPLE FILE. REPLACE THE VALUES WITH YOUR OWN CONTACT INFORMATION TO ENABLE ALERT NOTIFICATIONS.
contactPoints:
  - orgId: 1
    name: test-contact-points
    receivers:
      - uid: af1q2g33sy3uod
        type: discord
        settings:
          # Replace with your Discord webhook URL (to test existing one, join the server using https://discord.gg/VnvMpcs2kT)
          url: https://discord.com/api/webhooks/1430194325597196369/7Giic1fhAVEyhSO3Uy00yzgsyWMREEILBPMQACKXUghoB1KNdoOhoqWHntdE0CEanEJb
          use_discord_username: false
        disableResolveMessage: false
      - uid: ff1q2g33sy3upc
        type: email
        settings:
          # Replace with your email address
          addresses: [email protected]
          singleEmail: false
        disableResolveMessage: false
