Commit d8f9e8f

feat(application-template): add PrometheusRule CR feature (#17)

* fix: render ServiceMonitor CR only when the CRD is installed
* feat: add PrometheusRule CR
* feat: change rules to alerting_rules
* feat: add PrometheusRule to worker and scheduler
* feat: trigger actions
* feat: update chart version
* feat: update readme

1 parent 8a42bea commit d8f9e8f
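For orientation, this is a hedged sketch of a consumer values.yaml enabling the features this commit adds. The key names follow the chart README documented in this commit; the threshold numbers and the choice of the `server` component are purely illustrative:

```yaml
# Sketch of a downstream values.yaml (not from this repo).
# Keys follow the README table in this commit; values are illustrative.
global:
  observability:
    prometheus:
      serviceMonitor:
        enabled: true        # render the ServiceMonitor CR (only if the CRD is installed)
        path: /metrics
        portName: metrics

server:
  observability:
    prometheus:
      alerting_rules:
        enabled: true                # PrometheusRule for the service container
        highCpuUsageThreshold: 70    # alert when usage exceeds 70% of the CPU limit
        highMemoryUsageThreshold: 70 # alert when usage exceeds 70% of the memory limit
      istio_alerting_rules:
        enabled: false               # same rules, but for the istio-proxy sidecar
        highCpuUsageThreshold: 70
        highMemoryUsageThreshold: 70
```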

File tree: 7 files changed, +295 −5 lines changed

charts/application-template/Chart.yaml (+1 −1)

@@ -9,4 +9,4 @@ maintainers:
   - name: modusign
     url: https://github.com/modusign
 name: application-template
-version: 1.4.2
+version: 1.5.0

charts/application-template/README.md (+9 −3)

@@ -1,6 +1,6 @@
 # application-template

-![Version: 1.3.2](https://img.shields.io/badge/Version-1.3.2-informational?style=flat-square) ![AppVersion: v1.0.0](https://img.shields.io/badge/AppVersion-v1.0.0-informational?style=flat-square)
+![Version: 1.5.0](https://img.shields.io/badge/Version-1.5.0-informational?style=flat-square) ![AppVersion: v1.0.0](https://img.shields.io/badge/AppVersion-v1.0.0-informational?style=flat-square)

 A Helm chart for Modusign Applications

@@ -33,6 +33,8 @@ Kubernetes: `>=1.23`
 | global.minReadySeconds | int | `60` | optional field that specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing |
 | global.nodeSelector | object | `{}` | Default node selector for all components |
 | global.observability.datadog | object | `{"admissionController":{"enabled":false}}` | inject datadog admission controller env label |
+| global.observability.prometheus | object | `{"serviceMonitor":{"enabled":false,"path":"/metrics","portName":"metrics"}}` | set up additional service port and setup |
+| global.observability.prometheus.serviceMonitor | object | `{"enabled":false,"path":"/metrics","portName":"metrics"}` | create Prometheus Operator ServiceMonitor CR |
 | global.podAnnotations | object | `{}` | Annotations for the all deployed pods |
 | global.podLabels | object | `{}` | Labels for the all deployed pods |
 | global.revisionHistoryLimit | int | `3` | Number of old deployment ReplicaSets to retain. The rest will be garbage collected. |
@@ -73,6 +75,8 @@ Kubernetes: `>=1.23`
 | scheduler.istio.virtualServices | list | `[]` | virtualService configuration |
 | scheduler.lifecycle | object | `{}` | Specify postStart and preStop lifecycle hooks for your container |
 | scheduler.nodeSelector | object | `{}` (defaults to global.nodeSelector) | [Node selector] |
+| scheduler.observability.prometheus.alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for service container |
+| scheduler.observability.prometheus.istio_alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for istio proxy container |
 | scheduler.pdb.annotations | object | `{}` | Annotations to be added to scheduler pdb |
 | scheduler.pdb.enabled | bool | `false` | Deploy a [PodDisruptionBudget] for the scheduler |
 | scheduler.pdb.labels | object | `{}` | Labels to be added to scheduler pdb |
@@ -128,6 +132,8 @@ Kubernetes: `>=1.23`
 | server.istio.virtualServices | list | `[]` | virtualService configuration |
 | server.lifecycle | object | `{}` | Specify postStart and preStop lifecycle hooks for your container |
 | server.nodeSelector | object | `{}` (defaults to global.nodeSelector) | [Node selector] |
+| server.observability.prometheus.alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for service container |
+| server.observability.prometheus.istio_alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for istio proxy container |
 | server.pdb.annotations | object | `{}` | Annotations to be added to server pdb |
 | server.pdb.enabled | bool | `true` | Deploy a [PodDisruptionBudget] for the server |
 | server.pdb.labels | object | `{}` | Labels to be added to server pdb |
@@ -181,6 +187,8 @@ Kubernetes: `>=1.23`
 | worker.istio.virtualServices | list | `[]` | virtualService configuration |
 | worker.lifecycle | object | `{}` | Specify postStart and preStop lifecycle hooks for your container |
 | worker.nodeSelector | object | `{}` (defaults to global.nodeSelector) | [Node selector] |
+| worker.observability.prometheus.alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for service container |
+| worker.observability.prometheus.istio_alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for istio proxy container |
 | worker.pdb.annotations | object | `{}` | Annotations to be added to worker pdb |
 | worker.pdb.enabled | bool | `false` | Deploy a [PodDisruptionBudget] for the worker |
 | worker.pdb.labels | object | `{}` | Labels to be added to worker pdb |
@@ -207,5 +215,3 @@ Kubernetes: `>=1.23`
 | worker.volumes | list | `[]` | Additional volumes to the application worker pod |
 | worker.workload | string | `"deployment"` | set deployment kind to Rollouts rollout: enabled : false |

-----------------------------------------------
-Autogenerated from chart metadata using [helm-docs v1.11.3](https://github.com/norwoodj/helm-docs/releases/v1.11.3)
(scheduler PrometheusRule template; file path not captured in this page) (new file, +81 −0)

@@ -0,0 +1,81 @@
+{{- if .Capabilities.APIVersions.Has "monitoring.coreos.com/v1/PrometheusRule" }}
+{{- if and .Values.scheduler.enabled (or .Values.scheduler.observability.prometheus.alerting_rules.enabled .Values.scheduler.observability.prometheus.istio_alerting_rules.enabled) }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: {{ template "application.scheduler.name" . }}
+  namespace: {{ .Release.Namespace }}
+spec:
+  groups:
+  {{- if .Values.scheduler.observability.prometheus.alerting_rules.enabled }}
+  - name: ServiceContainerResourceUsage
+    rules:
+    - alert: "HighServiceContainerCPUUsage"
+      expr: |
+        avg(
+          rate(container_cpu_usage_seconds_total{ container={{ .Values.scheduler.name | quote }} }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} }
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="cpu", container={{ .Values.scheduler.name | quote }} })
+        )
+        * 100
+        > {{ .Values.scheduler.observability.prometheus.alerting_rules.highCpuUsageThreshold }}
+      for: 5m
+      labels:
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.scheduler.name" . | title }}] High CPU usage"
+        description: "[{{ include "application.scheduler.name" . | title }}] The service's recent CPU usage has exceeded {{ .Values.scheduler.observability.prometheus.alerting_rules.highCpuUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+
+    - alert: HighServiceContainerMemoryUsage
+      expr: |
+        avg(
+          (container_memory_rss{ container={{ .Values.scheduler.name | quote }} } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} })
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="memory", container={{ .Values.scheduler.name | quote }} })
+        ) * 100
+        > {{ .Values.scheduler.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}
+      for: 5m
+      labels:
+        service: {{ include "application.scheduler.name" . | quote }}
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.scheduler.name" . | title }}] High memory usage"
+        description: "[{{ include "application.scheduler.name" . | title }}] The service's recent memory usage has exceeded {{ .Values.scheduler.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+  {{- end }}
+  {{- if .Values.scheduler.observability.prometheus.istio_alerting_rules.enabled }}
+  - name: IstioContainerResourceUsage
+    rules:
+    - alert: "HighIstioContainerCPUUsage"
+      expr: |
+        avg(
+          rate(container_cpu_usage_seconds_total{ container="istio-proxy" }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} }
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="cpu", container="istio-proxy" })
+        )
+        * 100
+        > {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}
+      for: 5m
+      labels:
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] High CPU usage"
+        description: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] The service's recent CPU usage has exceeded {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+
+    - alert: HighIstioContainerMemoryUsage
+      expr: |
+        avg(
+          (container_memory_rss{ container="istio-proxy" } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} })
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="memory", container="istio-proxy" })
+        ) * 100
+        > {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}
+      for: 5m
+      labels:
+        service: {{ include "application.scheduler.name" . | quote }}
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] High memory usage"
+        description: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] The service's recent memory usage has exceeded {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+  {{- end }}
+{{- end }}
+{{- end }}
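For orientation, this is roughly what the scheduler template above renders when only `alerting_rules` is enabled. It is a hand-traced sketch, not captured helm output: `my-scheduler` and the `scheduler` container name are placeholders for whatever `application.scheduler.name` and `.Values.scheduler.name` resolve to, and the group field must be `rules`, which is what the Prometheus Operator's PrometheusRule schema expects:

```yaml
# Hand-rendered sketch (not actual helm output). Assumes:
#   alerting_rules.enabled=true with threshold 70,
#   istio_alerting_rules.enabled=false, release namespace "default".
# "my-scheduler" / "scheduler" stand in for the real template values.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-scheduler
  namespace: default
spec:
  groups:
  - name: ServiceContainerResourceUsage
    rules:                      # PrometheusRule groups hold alerts under "rules"
    - alert: HighServiceContainerCPUUsage
      expr: |
        avg(
          rate(container_cpu_usage_seconds_total{ container="scheduler" }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name="my-scheduler" }
          / on(pod)
          (kube_pod_container_resource_limits{ resource="cpu", container="scheduler" })
        )
        * 100
        > 70
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "[My-Scheduler] High CPU usage"
        description: "[My-Scheduler] The service's recent CPU usage has exceeded 70%. Current value: {{ .Value | humanize }}%"
```

Note the gating: the outer `.Capabilities.APIVersions.Has` check means nothing renders at all on a cluster (or `helm template` run) that does not advertise the `monitoring.coreos.com/v1/PrometheusRule` API.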
(server PrometheusRule template; file path not captured in this page) (new file, +81 −0)

@@ -0,0 +1,81 @@
+{{- if .Capabilities.APIVersions.Has "monitoring.coreos.com/v1/PrometheusRule" }}
+{{- if and .Values.server.enabled (or .Values.server.observability.prometheus.alerting_rules.enabled .Values.server.observability.prometheus.istio_alerting_rules.enabled) }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: {{ template "application.server.name" . }}
+  namespace: {{ .Release.Namespace }}
+spec:
+  groups:
+  {{- if .Values.server.observability.prometheus.alerting_rules.enabled }}
+  - name: ServiceContainerResourceUsage
+    rules:
+    - alert: "HighServiceContainerCPUUsage"
+      expr: |
+        avg(
+          rate(container_cpu_usage_seconds_total{ container={{ .Values.server.name | quote }} }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} }
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="cpu", container={{ .Values.server.name | quote }} })
+        )
+        * 100
+        > {{ .Values.server.observability.prometheus.alerting_rules.highCpuUsageThreshold }}
+      for: 5m
+      labels:
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.server.name" . | title }}] High CPU usage"
+        description: "[{{ include "application.server.name" . | title }}] The service's recent CPU usage has exceeded {{ .Values.server.observability.prometheus.alerting_rules.highCpuUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+
+    - alert: HighServiceContainerMemoryUsage
+      expr: |
+        avg(
+          (container_memory_rss{ container={{ .Values.server.name | quote }} } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} })
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="memory", container={{ .Values.server.name | quote }} })
+        ) * 100
+        > {{ .Values.server.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}
+      for: 5m
+      labels:
+        service: {{ include "application.server.name" . | quote }}
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.server.name" . | title }}] High memory usage"
+        description: "[{{ include "application.server.name" . | title }}] The service's recent memory usage has exceeded {{ .Values.server.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+  {{- end }}
+  {{- if .Values.server.observability.prometheus.istio_alerting_rules.enabled }}
+  - name: IstioContainerResourceUsage
+    rules:
+    - alert: "HighIstioContainerCPUUsage"
+      expr: |
+        avg(
+          rate(container_cpu_usage_seconds_total{ container="istio-proxy" }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} }
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="cpu", container="istio-proxy" })
+        )
+        * 100
+        > {{ .Values.server.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}
+      for: 5m
+      labels:
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.server.name" . | title }}][istio-proxy] High CPU usage"
+        description: "[{{ include "application.server.name" . | title }}][istio-proxy] The service's recent CPU usage has exceeded {{ .Values.server.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+
+    - alert: HighIstioContainerMemoryUsage
+      expr: |
+        avg(
+          (container_memory_rss{ container="istio-proxy" } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} })
+          / on(pod)
+          (kube_pod_container_resource_limits{ resource="memory", container="istio-proxy" })
+        ) * 100
+        > {{ .Values.server.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}
+      for: 5m
+      labels:
+        service: {{ include "application.server.name" . | quote }}
+        severity: critical
+      annotations:
+        summary: "[{{ include "application.server.name" . | title }}][istio-proxy] High memory usage"
+        description: "[{{ include "application.server.name" . | title }}][istio-proxy] The service's recent memory usage has exceeded {{ .Values.server.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}%. Current value: {{`{{ .Value | humanize }}`}}%"
+  {{- end }}
+{{- end }}
+{{- end }}

charts/application-template/templates/service_monitor.yaml (+1 −1)

@@ -1,4 +1,4 @@
-{{- if .Values.global.observability.prometheus.serviceMonitor.enabled }}
+{{- if and .Values.global.observability.prometheus.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/ServiceMonitor") }}
 apiVersion: monitoring.coreos.com/v1
 kind: ServiceMonitor
 metadata:
