# circleci-server-monitoring-reference
A reference for tools, configurations, and documentation used to monitor CircleCI server.
🚧 **Under Development**
This repository is currently under active development and is not yet a supported resource. Please refer to it at your own discretion until further notice.
## Table of Contents
- [Installing the Monitoring Stack](#installing-the-monitoring-stack)
- [Modifying or Adding Grafana Dashboards](#modifying-or-adding-grafana-dashboards)
- [Helm Values](#values)
- [Helm Releases](#releases)
{{ template "chart.header" . }}
{{ template "chart.description" . }}
{{ template "chart.versionBadge" . }}{{ template "chart.typeBadge" . }}{{ template "chart.appVersionBadge" . }}
## Installing the Monitoring Stack
{{ template "chart.requirementsSection" . }}
### 1. Configure Server for the Monitoring Stack
To monitor a CircleCI server instance, configure Telegraf to enable its `prometheus_client` output, which exposes a metrics endpoint that Prometheus can scrape. Add the following configuration to the CircleCI **server** Helm chart values:
```yaml
telegraf:
  config:
    outputs:
      - file:
          files: ["stdout"]
      - prometheus_client:
          listen: ":9273"
```
### 2. Add Helm Repository
Add the CircleCI Server Monitoring Stack Helm repository:
```bash
$ helm repo add {{ template "chart.name" . }} https://packagecloud.io/circleci/server-monitoring-stack/helm
$ helm repo update
```
### 3. Install Dependencies
Before installing the full chart, you must first install the dependency subcharts, including the Prometheus Custom Resource Definitions (CRDs) and the Grafana operator chart. The command below assumes you are installing the monitoring stack in the same namespace as your CircleCI server installation:
```bash
$ helm install {{ template "chart.name" . }} {{ template "chart.name" . }}/{{ template "chart.name" . }} --set global.enabled=false --set prometheusOperator.installCRDs=true --version {{ template "chart.version" . }} -n <your-server-namespace>
```
> **_NOTE:_** You can install the monitoring stack in a different namespace from the CircleCI server installation. If you do so, set the `prometheus.serviceMonitor.selectorNamespaces` value to the target namespace.
### 4. Install the Helm Chart
Next, install the Helm chart using the following command:
```bash
$ helm upgrade --install {{ template "chart.name" . }} {{ template "chart.name" . }}/{{ template "chart.name" . }} --reset-values --version {{ template "chart.version" . }} -n <your-server-namespace>
```
### 5. Verify Prometheus Is Up and Targeting Telegraf
To verify that Prometheus is working correctly and targeting Telegraf, use the following command to port-forward Prometheus:
```bash
$ kubectl port-forward svc/server-monitoring-prometheus 9090:9090 -n <your-namespace-here>
```
Then visit http://localhost:9090/targets in your browser. Verify that Telegraf appears as a target and that its state is "up".
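
As a command-line complement to the browser check, the Prometheus HTTP API serves the same target list at `/api/v1/targets`. The sketch below is a hypothetical helper (the name `count_targets_by_health` is not part of this chart) that counts active targets by health state. It assumes the port-forward above is still running and that the response is compact JSON, which is what Prometheus normally returns. It does not confirm that any particular target is Telegraf, so the targets page remains the authoritative check.

```bash
# Hypothetical helper: count targets in a given health state ("up", "down", "unknown").
# Reads a Prometheus /api/v1/targets JSON response on stdin.
count_targets_by_health() {
  local state="$1"
  # grep -o prints one line per match; "|| true" keeps a zero-match run from failing.
  { grep -o "\"health\":\"${state}\"" || true; } | wc -l | tr -d ' '
}

# Example (requires the port-forward from this step):
#   curl -s http://localhost:9090/api/v1/targets | count_targets_by_health up
```

A result of `0` for `up` suggests that Prometheus has not discovered or cannot scrape any target yet.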

### 6. Verify Grafana Is Up and Connected to Prometheus
To verify that Grafana is working correctly and connected to Prometheus, use the following command to port-forward Grafana:
```bash
$ kubectl port-forward svc/server-monitoring-grafana-service 3000:3000 -n <your-namespace-here>
```
Then visit http://localhost:3000 in your browser. Once logged in with the default credentials, navigate to http://localhost:3000/dashboards and verify that the default dashboards are present and populating with data.
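
Before logging in, you can also probe Grafana's `/api/health` endpoint, which needs no authentication. The sketch below is a hypothetical helper (`grafana_db_ok` is an illustrative name) that succeeds when the response reports `"database": "ok"`. It only shows that Grafana itself is running; it says nothing about the Prometheus data source, so the dashboard check above is still needed.

```bash
# Hypothetical helper: succeed if a Grafana /api/health response on stdin
# reports "database": "ok". Whitespace around the colon is tolerated.
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Example (requires the port-forward from this step):
#   curl -s http://localhost:3000/api/health | grafana_db_ok && echo "Grafana is healthy"
```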

### 7. Next Steps
After ensuring both Prometheus and Grafana are operational, consider these enhancements:
#### Security
Secure Grafana by configuring credentials:
```yaml
grafana:
  credentials:
    # Directly set these for quick setups
    adminUser: "admin"
    adminPassword: "<your-secure-password-here>"
    # For production, use a Kubernetes secret to manage credentials securely
    existingSecretName: "<your-secret-here>"
```
#### Expose Grafana Externally
For external access, modify the service or ingress values. For example:
```yaml
grafana:
  service:
    type: LoadBalancer
```
#### Enabling Persistent Storage
Persist data by enabling storage for Prometheus and Grafana:
```yaml
prometheus:
  persistence:
    enabled: true
    storageClass: <your-custom-storage-class>
grafana:
  persistence:
    enabled: true
    storageClass: <your-custom-storage-class>
```
> **_NOTE:_** Use a custom storage class with a `Retain` reclaim policy so that data is kept even after the chart is uninstalled.
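
To find out which storage classes in your cluster already use a `Retain` reclaim policy, you can filter the output of `kubectl get storageclass`. The sketch below is a hypothetical helper (`retain_classes` is an illustrative name) that expects name and policy pairs on stdin. The class names it prints depend entirely on your cluster.

```bash
# Hypothetical helper: print the names of storage classes whose reclaim policy is Retain.
# Expects "<name> <reclaimPolicy>" pairs on stdin, one per line.
retain_classes() {
  awk '$2 == "Retain" { print $1 }'
}

# Example:
#   kubectl get storageclass --no-headers \
#     -o custom-columns=NAME:.metadata.name,POLICY:.reclaimPolicy | retain_classes
```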
## Modifying or Adding Grafana Dashboards
The default dashboards are located in the `dashboards` directory of the reference chart. To add new dashboards or modify existing ones, follow these steps.
Dashboards are provisioned directly from CRDs, which means any manual edits will be lost upon a refresh. As such, the workflow outlined below is recommended for making changes:
1. **Create a Copy**:
   - Select **Edit** in the upper right corner.
   - Choose **Save dashboard** -> **Save as copy**.
   - After saving, navigate to the copy.
2. **Make Edits**:
   - Modify the copy as needed and exit edit mode.
3. **Export as JSON**:
   - Select **Export** in the upper right corner and then **Export as JSON**.
   - **Ensure that `Export the dashboard to use in another instance` is toggled on.**
4. **Update the JSON File**:
   - Download the file and replace the `./dashboards/server-slis.json` file with the updated copy.
   - Run the following command to automatically validate the JSON and apply necessary updates:
     ```bash
     ./do validate-dashboards
     ```
5. **Commit and Open a PR**:
   - Review and commit the changes.
   - Open a pull request for the On-Prem team to review.
{{ template "chart.valuesSection" . }}
## Releases
Releases are managed by the CI/CD pipeline on the main branch, with an approval job gate called `approve-deploy-chart`. Before releasing, increment the Helm chart version in `Chart.yaml` and regenerate the documentation using `./do helm-docs`. Once approved, the release will be available in the [package repository](https://packagecloud.io/circleci/server-monitoring-stack).
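
A forgotten version bump is easy to catch before the pipeline runs. The sketch below uses two hypothetical helpers (`chart_version` and `version_bumped` are illustrative names, not part of `./do`) to compare the chart version on your branch with the one on the base branch. The `Chart.yaml` path and the `origin/main` ref in the example are assumptions, and the check only detects an unchanged version; it does not verify that the new version is higher.

```bash
# Hypothetical helper: print the top-level "version" field of a Chart.yaml read from stdin.
chart_version() {
  # Match only an unindented "version:" so dependency versions are ignored;
  # strip optional double quotes around the value.
  awk '/^version:/ { gsub(/"/, "", $2); print $2; exit }'
}

# Hypothetical helper: succeed only when the two versions differ.
version_bumped() {
  [ "$1" != "$2" ]
}

# Example (the Chart.yaml path and base branch are assumptions):
#   old="$(git show origin/main:Chart.yaml | chart_version)"
#   new="$(chart_version < Chart.yaml)"
#   version_bumped "$old" "$new" || echo "Chart version is still $old" >&2
```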