Commit f6ca84c

Merge pull request #35 from coder/dk/prebuilds-alerts
Add `CoderdUnprovisionedPrebuiltWorkspaces` alert
2 parents b6af003 + 391c7f6 commit f6ca84c

6 files changed: +138 -13 lines changed

README.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -244,7 +244,7 @@ values which are defined [here](https://github.com/grafana/helm-charts/tree/main
244244

245245
| Key | Type | Default | Description |
246246
|-----|------|---------|-------------|
247-
| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"IneligiblePrebuilds":{"delay":"10m","enabled":true,"thresholds":{"notify":1}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
247+
| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"IneligiblePrebuilds":{"delay":"10m","enabled":true,"thresholds":{"notify":1}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"UnprovisionedPrebuiltWorkspaces":{"delay":"10m","enabled":true,"thresholds":{"warn":1}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
248248
| global.coder.coderdSelector | string | `"pod=~`coder.*`, pod!~`.*provisioner.*`"` | series selector for Prometheus/Loki to locate coderd pods (provisioner pods are excluded). Ensure this uses backticks for quotes! |
249249
| global.coder.controlPlaneNamespace | string | `"coder"` | the namespace into which the control plane has been deployed. |
250250
| global.coder.externalProvisionersNamespace | string | `"coder"` | the namespace into which any external provisioners have been deployed. |

coder-observability/runbooks/coderd.md

+51-1
@@ -82,4 +82,54 @@ Please contact your Coder sales contact, or visit https://coder.com/contact/sale
8282
Prebuilds only become eligible to be claimed by users once the workspace's agent is a) running and b) all of its startup
8383
scripts have completed.
8484

85-
If a prebuilt workspace is not eligible, view its agent logs to diagnose the problem.
85+
If a prebuilt workspace is not eligible, view its agent logs to diagnose the problem.
86+
87+
## CoderdUnprovisionedPrebuiltWorkspaces
88+
89+
The number of running prebuilt workspaces is lower than the desired number of instances. This could be for several reasons,
90+
ordered by likelihood:
91+
92+
### Experiment/License
93+
94+
The prebuilds feature is currently gated behind an experiment *and* a premium license.
95+
96+
Ensure that the prebuilds experiment is enabled with `CODER_EXPERIMENTS=workspace-prebuilds`, and that you have a premium
97+
license added.
98+
99+
### Preset Validation Issue
100+
101+
Templates which have prebuilds configured require a preset to be defined, with ALL of the required parameters
102+
set in the preset. If any of these are missing, or any of the parameters - as defined - fail validation, then the prebuilds
103+
subsystem will refuse to attempt a workspace build.
104+
105+
Consult the coderd logs for more information; look out for errors or warnings from the prebuilds subsystem.
106+
107+
### Template Misconfiguration or Error
108+
109+
Prebuilt workspaces cannot be provisioned due to some issue at `terraform apply`-time. This could be due to misconfigured
110+
cloud resources, improper authorization, or any number of other issues.
111+
112+
Visit the Workspaces page, change the search term to `owner:prebuilds`, and view the previously failed builds. The
113+
error will likely be quite obvious.
114+
115+
### Provisioner Latency
116+
117+
If your provisioners are overloaded and cannot process provisioner jobs quickly enough, prebuilt workspaces may be affected.
118+
There is no prioritization at present for prebuilt workspace jobs.
119+
120+
Ensure your provisioners are appropriately resourced (i.e. you have enough instances) to handle the concurrent build demand.
121+
122+
### Use of Workspace Tags
123+
124+
If you are using `coder_workspace_tags` ([docs](https://coder.com/docs/admin/templates/extending-templates/workspace-tags))
125+
in your template, chances are you do not have any provisioners running with matching tags, or they are under-resourced (see **Provisioner Latency**).
126+
127+
Ensure your running provisioners are configured with your desired tags.
128+
129+
### Reconciliation Loop Issue
130+
131+
The prebuilds subsystem runs a _reconciliation loop_ which monitors the state of prebuilt workspaces to ensure the desired
132+
number of instances are present at all times. Workspace Prebuilds is currently a BETA feature and so there could be a bug
133+
in this _reconciliation loop_, which should be reported to Coder.
134+
135+
Examine your coderd logs for any errors or warnings relating to prebuilds.
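
The **Experiment/License** step in the runbook above asks for `CODER_EXPERIMENTS=workspace-prebuilds`. As a minimal sketch, assuming coderd is deployed with the Coder Helm chart and that the chart takes extra environment variables as a list under `coder.env` (an assumption, not something this commit states), the setting might look like this:

```yaml
# Hypothetical values override for the Coder Helm chart; not part of this commit.
coder:
  env:
    # Enables the experiment named in the runbook.
    - name: CODER_EXPERIMENTS
      value: "workspace-prebuilds"
```

This only covers the experiment flag; the premium license mentioned in the runbook still has to be added separately.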

coder-observability/templates/configmap-prometheus-alerts.yaml

+20
@@ -125,6 +125,26 @@ data:
125125
{{- end }}
126126
{{- end }}
127127

128+
{{- with .groups.UnprovisionedPrebuiltWorkspaces }}
129+
{{- $group := . }}
130+
{{- if .enabled }}
131+
- name: Coderd Unprovisioned Prebuilt Workspaces
132+
rules:
133+
{{ $alert := "CoderdUnprovisionedPrebuiltWorkspaces" }}
134+
{{- range $severity, $threshold := .thresholds }}
135+
- alert: {{ $alert }}
136+
expr: max by (template_name, preset_name) (coderd_prebuilds_desired - coderd_prebuilds_running) > 0
137+
for: {{ $group.delay }}
138+
annotations:
139+
summary: >
140+
{{ `{{ $value }}` }} prebuilt workspace(s) not yet been provisioned for the "{{ `{{ $labels.template_name }}` }}" template and "{{ `{{ $labels.preset_name }}` }}" preset.
141+
labels:
142+
severity: {{ $severity }}
143+
runbook_url: {{ template "runbook-url" (deepCopy $ | merge (dict "alert" $alert) $service) }}
144+
{{- end }}
145+
{{- end }}
146+
{{- end }}
147+
128148
{{- end }} {{/* end-section */}}
129149

130150
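
To make the template above easier to read, here is a sketch of what it would likely render to with the default values from this commit (`delay: 10m`, `thresholds: {warn: 1}`). The indentation is assumed, since the page extraction dropped it, and the `runbook_url` value is a placeholder because the `runbook-url` helper is not shown in this diff.

```yaml
# Approximate rendered output; indentation and runbook_url are assumptions.
- name: Coderd Unprovisioned Prebuilt Workspaces
  rules:
    - alert: CoderdUnprovisionedPrebuiltWorkspaces
      # One series per template/preset pair whose running count is below the desired count.
      expr: max by (template_name, preset_name) (coderd_prebuilds_desired - coderd_prebuilds_running) > 0
      for: 10m
      annotations:
        summary: >
          {{ $value }} prebuilt workspace(s) not yet been provisioned for the "{{ $labels.template_name }}" template and "{{ $labels.preset_name }}" preset.
      labels:
        # Taken from the threshold key, not its value.
        severity: warn
        runbook_url: "<output of the runbook-url helper>"
```

As written, the expression compares against a literal `0`, so the threshold value (`1`) does not appear in the rule; only the threshold key is used, as the `severity` label. Each additional key under `thresholds` would produce another copy of the same rule with a different severity.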

coder-observability/templates/dashboards/_dashboards_prebuilds.json.tpl

+4-4
@@ -111,9 +111,9 @@
111111
},
112112
"editorMode": "code",
113113
"expr": "min(coderd_experiments{experiment=\"workspace-prebuilds\"})",
114-
"instant": false,
114+
"instant": true,
115115
"legendFormat": "__auto",
116-
"range": true,
116+
"range": false,
117117
"refId": "A"
118118
}
119119
],
@@ -645,7 +645,7 @@
645645
"refId": "E"
646646
}
647647
],
648-
"title": "Change over range: $preset",
648+
"title": "Pool Capacity: $preset",
649649
"type": "timeseries"
650650
},
651651
{
@@ -871,7 +871,7 @@
871871
"refId": "F"
872872
}
873873
],
874-
"title": "Change over range: $preset",
874+
"title": "Pool Operations: $preset",
875875
"type": "timeseries"
876876
},
877877
{

coder-observability/values.yaml

+5
@@ -81,6 +81,11 @@ global:
8181
delay: 10m
8282
thresholds:
8383
notify: 1
84+
UnprovisionedPrebuiltWorkspaces:
85+
enabled: true
86+
delay: 10m
87+
thresholds:
88+
warn: 1
8489
provisionerd:
8590
groups:
8691
Replicas:
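
The new defaults can be overridden like any other alert group. A minimal sketch, assuming a user-supplied values file (the name `my-values.yaml` and the `30m` delay are illustrative, not from this commit):

```yaml
# my-values.yaml (hypothetical file name)
global:
  coder:
    alerts:
      coderd:
        groups:
          UnprovisionedPrebuiltWorkspaces:
            enabled: true   # set to false to drop the alert entirely
            delay: 30m      # wait longer than the default 10m before firing
            thresholds:
              warn: 1       # the key becomes the alert's severity label
```

Passing this file to Helm with `-f` should leave the other alert groups at their defaults, since Helm merges nested maps. Note that the default key here is `warn`, whereas the other groups in this chart use `warning`; anything that routes alerts on the `severity` label may need to account for that.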
