Added Mimir / Loki Rules Sync Support #568

Open
wants to merge 8 commits into
base: main
13 changes: 13 additions & 0 deletions .github/configs/lokiRule.yaml
@@ -0,0 +1,13 @@
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: example-log-rule
+  namespace: loki
+  labels:
+    rule_type: loki
+spec:
+  groups:
+    - name: logs
+      rules:
+        - record: log_errors:count1m
+          expr: count_over_time({level="error"}[1m])

Check failure on line 1 (GitHub Actions / runner / yamllint, reported by reviewdog): ./.github/configs/lokiRule.yaml:1:1: [warning] missing document start "---" (document-start)
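The yamllint annotation on this file is a `document-start` warning. A minimal sketch of the same manifest with the `---` marker added, assuming the repository's yamllint configuration requires a document start and that the indentation below (flattened in this view) matches the committed file:

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-log-rule
  namespace: loki
  labels:
    rule_type: loki  # matches the documented default for rules.loki.rule.label_selectors
spec:
  groups:
    - name: logs
      rules:
        - record: log_errors:count1m
          expr: count_over_time({level="error"}[1m])
```

The recorded series name, `log_errors:count1m`, is the one the CI values file queries to confirm that the rule reached Loki.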
9 changes: 9 additions & 0 deletions .github/configs/updatecli.d/alloy.yaml
@@ -56,3 +56,12 @@ targets:
       name: charts/k8s-monitoring
       versionincrement: none
     sourceid: alloy
+  alloy-rules:
+    name: Bump Helm chart dependency "alloy-rules" for Helm chart "k8s-monitoring"
+    kind: helmchart
+    spec:
+      file: Chart.yaml
+      key: $.dependencies[4].version
+      name: charts/k8s-monitoring
+      versionincrement: none
+    sourceid: alloy
4 changes: 2 additions & 2 deletions .github/configs/updatecli.d/kepler.yaml
@@ -17,7 +17,7 @@ conditions:
     kind: yaml
     spec:
       file: charts/k8s-monitoring/Chart.yaml
-      key: $.dependencies[9].name
+      key: $.dependencies[10].name
       value: kepler
     disablesourceinput: true

@@ -27,7 +27,7 @@ targets:
     kind: helmchart
     spec:
       file: Chart.yaml
-      key: $.dependencies[9].version
+      key: $.dependencies[10].version
       name: charts/k8s-monitoring
       versionincrement: none
     sourceid: kepler
4 changes: 2 additions & 2 deletions .github/configs/updatecli.d/kube-state-metrics.yaml
@@ -16,7 +16,7 @@ conditions:
     kind: yaml
     spec:
       file: charts/k8s-monitoring/Chart.yaml
-      key: $.dependencies[4].name
+      key: $.dependencies[5].name
       value: kube-state-metrics
     disablesourceinput: true
 targets:
@@ -25,7 +25,7 @@ targets:
     kind: helmchart
     spec:
       file: Chart.yaml
-      key: $.dependencies[4].version
+      key: $.dependencies[5].version
       name: charts/k8s-monitoring
       versionincrement: none
     sourceid: kube-state-metrics
4 changes: 2 additions & 2 deletions .github/configs/updatecli.d/node-exporter.yaml
@@ -17,7 +17,7 @@ conditions:
     kind: yaml
     spec:
       file: charts/k8s-monitoring/Chart.yaml
-      key: $.dependencies[5].name
+      key: $.dependencies[6].name
       value: prometheus-node-exporter
     disablesourceinput: true

@@ -27,7 +27,7 @@ targets:
     kind: helmchart
     spec:
       file: Chart.yaml
-      key: $.dependencies[5].version
+      key: $.dependencies[6].version
       name: charts/k8s-monitoring
       versionincrement: none
     sourceid: prometheus-node-exporter
4 changes: 2 additions & 2 deletions .github/configs/updatecli.d/opencost.yaml
@@ -16,7 +16,7 @@ conditions:
     kind: yaml
     spec:
       file: charts/k8s-monitoring/Chart.yaml
-      key: $.dependencies[8].name
+      key: $.dependencies[9].name
       value: opencost
     disablesourceinput: true
 targets:
@@ -25,7 +25,7 @@ targets:
     kind: helmchart
     spec:
       file: Chart.yaml
-      key: $.dependencies[8].version
+      key: $.dependencies[9].version
       name: charts/k8s-monitoring
       versionincrement: none
     sourceid: opencost
4 changes: 2 additions & 2 deletions .github/configs/updatecli.d/prometheus-operator-crds.yaml
@@ -16,7 +16,7 @@ conditions:
     kind: yaml
     spec:
       file: charts/k8s-monitoring/Chart.yaml
-      key: $.dependencies[6].name
+      key: $.dependencies[7].name
       value: prometheus-operator-crds
     disablesourceinput: true
 targets:
@@ -25,7 +25,7 @@ targets:
     kind: helmchart
     spec:
       file: Chart.yaml
-      key: $.dependencies[6].version
+      key: $.dependencies[7].version
       name: charts/k8s-monitoring
       versionincrement: none
     sourceid: prometheus-operator-crds
4 changes: 2 additions & 2 deletions .github/configs/updatecli.d/windows-exporter.yaml
@@ -16,7 +16,7 @@ conditions:
     kind: yaml
     spec:
       file: charts/k8s-monitoring/Chart.yaml
-      key: $.dependencies[7].name
+      key: $.dependencies[8].name
       value: prometheus-windows-exporter
     disablesourceinput: true
 targets:
@@ -25,7 +25,7 @@ targets:
     kind: helmchart
     spec:
       file: Chart.yaml
-      key: $.dependencies[7].version
+      key: $.dependencies[8].version
       name: charts/k8s-monitoring
       versionincrement: none
     sourceid: prometheus-windows-exporter
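Each updatecli config addresses its dependency by position in `Chart.yaml`, so inserting `alloy-rules` at index 4 moves every later entry up by one. A sketch of the resulting order, reconstructed from the `key` values in the hunks above; entries 0 to 2 do not appear in this diff and are an assumption based on the README dependency table:

```yaml
# charts/k8s-monitoring/Chart.yaml, names only; the index is shown as a comment.
dependencies:
  # [0]-[2]: not shown in this diff (assumed: alloy, alloy-events, alloy-logs)
  - alias: alloy-profiles              # [3] directly precedes the new entry in the Chart.yaml hunk
    name: alloy
  - alias: alloy-rules                 # [4] new in this PR
    name: alloy
  - name: kube-state-metrics           # [5] was [4]
  - name: prometheus-node-exporter     # [6] was [5]
  - name: prometheus-operator-crds     # [7] was [6]
  - name: prometheus-windows-exporter  # [8] was [7]
  - name: opencost                     # [9] was [8]
  - name: kepler                       # [10] was [9]
```

Any dependency added later anywhere but the end of the list would need the same renumbering across these files.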
2 changes: 2 additions & 0 deletions .github/workflows/helm-test.yml
@@ -19,6 +19,7 @@ env:
   PROMETHEUS_WORKLOAD_VALUES: "${{ github.workspace }}/.github/configs/prometheus-workload.yaml"
   CREDENTIALS: "${{ github.workspace }}/.github/configs/credentials.yaml"
   LOKI_VALUES: "${{ github.workspace }}/.github/configs/loki.yaml"
+  LOKI_RULE_OBJECT: "${{ github.workspace }}/.github/configs/lokiRule.yaml"
   TEMPO_VALUES: "" # No values for now
   PYROSCOPE_VALUES: "${{ github.workspace }}/.github/configs/pyroscope.yaml"
   GRAFANA_VALUES: "${{ github.workspace }}/.github/configs/grafana.yaml"
@@ -185,6 +186,7 @@ jobs:
         run: |
           helm install loki grafana/loki -f "${LOKI_VALUES}" -n loki --create-namespace --wait
           helm install loki-otlp grafana/alloy -f "${GRAFANA_ALLOY_LOKI_OTLP_VALUES}" -n loki --wait
+          kubectl apply -f "${LOKI_RULE_OBJECT}"

       - name: Deploy Tempo
         if: (steps.list-changed.outputs.changed == 'true') || (contains(github.event.pull_request.labels.*.name, 'full_test_required'))
7 changes: 5 additions & 2 deletions Makefile
@@ -9,6 +9,7 @@ METRICS_CONFIG_FILES = $(subst values.yaml,metrics.alloy,$(INPUT_FILES))
 EVENTS_CONFIG_FILES = $(subst values.yaml,events.alloy,$(INPUT_FILES))
 LOGS_CONFIG_FILES = $(subst values.yaml,logs.alloy,$(INPUT_FILES))
 PROFILES_CONFIG_FILES = $(subst values.yaml,profiles.alloy,$(INPUT_FILES))
+RULES_CONFIG_FILES = $(subst values.yaml,rules.alloy,$(INPUT_FILES))

 CT_CONFIGFILE ?= .github/configs/ct.yaml
 LINT_CONFIGFILE ?= .github/configs/lintconf.yaml
@@ -39,7 +40,7 @@ lint-chart:
 	ct lint --debug --config "$(CT_CONFIGFILE)" --lint-conf "$(LINT_CONFIGFILE)" --check-version-increment=false

 lint-config lint-configs lint-alloy:
-	@./scripts/lint-alloy.sh $(METRICS_CONFIG_FILES) $(EVENTS_CONFIG_FILES) $(LOGS_CONFIG_FILES) --public-preview $(PROFILES_CONFIG_FILES)
+	@./scripts/lint-alloy.sh $(METRICS_CONFIG_FILES) $(EVENTS_CONFIG_FILES) $(LOGS_CONFIG_FILES) $(RULES_CONFIG_FILES) --public-preview $(PROFILES_CONFIG_FILES)

 # Shell Linting
 lint-sh lint-shell:
@@ -98,7 +99,9 @@ test: scripts/test-runner.sh lint-chart lint-config
 %/profiles.alloy: %/output.yaml
 	yq -r "select(.metadata.name==\"k8smon-alloy-profiles\") | .data[\"config.alloy\"] | select( . != null )" $< > $@

+%/rules.alloy: %/output.yaml
+	yq -r "select(.metadata.name==\"k8smon-alloy-rules\") | .data[\"config.alloy\"] | select( . != null )" $< > $@

-generate-example-outputs: $(OUTPUT_FILES) $(METRICS_CONFIG_FILES) $(EVENTS_CONFIG_FILES) $(LOGS_CONFIG_FILES) $(PROFILES_CONFIG_FILES)
+generate-example-outputs: $(OUTPUT_FILES) $(METRICS_CONFIG_FILES) $(EVENTS_CONFIG_FILES) $(LOGS_CONFIG_FILES) $(PROFILES_CONFIG_FILES) $(RULES_CONFIG_FILES)

 regenerate-example-outputs: clean generate-example-outputs
7 changes: 5 additions & 2 deletions charts/k8s-monitoring/Chart.lock
@@ -11,6 +11,9 @@ dependencies:
 - name: alloy
   repository: https://grafana.github.io/helm-charts
   version: 0.6.0
+- name: alloy
+  repository: https://grafana.github.io/helm-charts
+  version: 0.6.0
 - name: kube-state-metrics
   repository: https://prometheus-community.github.io/helm-charts
   version: 5.25.1
@@ -29,5 +32,5 @@ dependencies:
 - name: kepler
   repository: https://sustainable-computing-io.github.io/kepler-helm-chart
   version: 0.5.9
-digest: sha256:78cc014e2a726be60e168fa7d09facff16ff7ed399948403ff2e692ae8d24d91
-generated: "2024-08-14T17:25:14.684591-05:00"
+digest: sha256:454ef3f30f999539d32fbe7aa1237aea0ed572bb9982a9a01c2c63317c58b2d3
+generated: "2024-08-16T12:35:18.184172-05:00"
5 changes: 5 additions & 0 deletions charts/k8s-monitoring/Chart.yaml
@@ -32,6 +32,11 @@ dependencies:
     version: 0.6.0
     repository: https://grafana.github.io/helm-charts
     condition: profiles.enabled
+  - alias: alloy-rules
+    name: alloy
+    version: 0.6.0
+    repository: https://grafana.github.io/helm-charts
+    condition: rules.enabled
   - name: kube-state-metrics
     version: 5.25.1
     repository: https://prometheus-community.github.io/helm-charts
32 changes: 32 additions & 0 deletions charts/k8s-monitoring/README.md
@@ -140,6 +140,7 @@ The Prometheus and Loki services may be hosted on the same cluster, or remotely
 | https://grafana.github.io/helm-charts | alloy-events(alloy) | 0.6.0 |
 | https://grafana.github.io/helm-charts | alloy-logs(alloy) | 0.6.0 |
 | https://grafana.github.io/helm-charts | alloy-profiles(alloy) | 0.6.0 |
+| https://grafana.github.io/helm-charts | alloy-rules(alloy) | 0.6.0 |
 | https://opencost.github.io/opencost-helm-chart | opencost | 1.41.0 |
 | https://prometheus-community.github.io/helm-charts | kube-state-metrics | 5.25.1 |
 | https://prometheus-community.github.io/helm-charts | prometheus-node-exporter | 4.38.0 |
@@ -841,6 +842,37 @@ The Prometheus and Loki services may be hosted on the same cluster, or remotely
 | receivers.zipkin.port | int | `9411` | Which port to use for the Zipkin receiver. This port needs to be opened in the alloy section below. |
 | receivers.zipkin.tls | object | `{}` | [TLS settings](https://grafana.com/docs/alloy/latest/reference/components/otelcol.receiver.zipkin/#tls-block) to configure for the Zipkin receiver. |

+### Rules
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| rules.enabled | bool | `false` | Enable rules synchronization. |
+
+### Rules (Loki)
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| rules.loki.enabled | bool | `true` | Enable Loki rules synchronization. |
+| rules.loki.namespace.label_expressions | list | `[]` | Label expressions for Namespace resources. |
+| rules.loki.namespace.label_selectors | object | `{}` | Label selectors for Namespace resources. |
+| rules.loki.prefix | string | `alloy` | Prefix added to the rule namespace, used to differentiate multiple Alloy deployments. |
+| rules.loki.rule.label_expressions | list | `[]` | Label expressions for PrometheusRule resources. |
+| rules.loki.rule.label_selectors | object | `{"rule_type":"loki"}` | Label selectors for PrometheusRule resources, as key/value pairs. |
+| rules.loki.sync_interval | string | `5m` | Amount of time between reconciliations with Loki. |
+
+### Rules (Mimir)
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| rules.mimir.enabled | bool | `true` | Enable Mimir rules synchronization. |
+| rules.mimir.namespace.label_expressions | list | `[]` | Label expressions for Namespace resources. |
+| rules.mimir.namespace.label_selectors | object | `{}` | Label selectors for Namespace resources. |
+| rules.mimir.prefix | string | `alloy` | Prefix added to the rule namespace, used to differentiate multiple Alloy deployments. |
+| rules.mimir.prometheus_http_prefix | string | `/api/prom` | Path prefix for Mimir’s Prometheus endpoint (gem-path-prefix). |
+| rules.mimir.rule.label_expressions | list | `[]` | Label expressions for PrometheusRule resources. |
+| rules.mimir.rule.label_selectors | object | `{"rule_type":"mimir"}` | Label selectors for PrometheusRule resources, as key/value pairs. |
+| rules.mimir.sync_interval | string | `5m` | Amount of time between reconciliations with Mimir. |

 ### Test Job

 | Key | Type | Default | Description |
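The Rules tables in the README hunk above map onto a values file roughly as follows. This is a sketch built only from the documented keys and defaults, not a tested configuration; the `team: platform` namespace label is a hypothetical example of narrowing discovery.

```yaml
rules:
  enabled: true                # off by default; also the Chart.yaml condition for alloy-rules
  loki:
    enabled: true
    prefix: alloy
    sync_interval: 5m
    rule:
      label_selectors:
        rule_type: loki        # documented default selector
  mimir:
    enabled: true
    prefix: alloy
    prometheus_http_prefix: /api/prom
    sync_interval: 5m
    rule:
      label_selectors:
        rule_type: mimir       # documented default selector
    namespace:
      label_selectors:
        team: platform         # hypothetical label, not part of the PR
```

With the defaults, setting `rules.enabled: true` alone would turn on synchronization to both backends, which is why the CI values file disables Mimir explicitly.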
12 changes: 12 additions & 0 deletions charts/k8s-monitoring/ci/ci-values.yaml
@@ -65,6 +65,14 @@ traces:
 profiles:
   enabled: true

+rules:
+  enabled: true
+  mimir:
+    enabled: false
+  loki:
+    rule_selectors:
+      rule_type: loki
+
 test:
   attempts: 20
   extraQueries:
@@ -85,6 +93,10 @@ test:
     - query: "count_over_time({cluster=\"ci-test-cluster\", job!=\"integrations/kubernetes/eventhandler\"}[1h])"
       type: logql

+    # Check for rule applied to Loki
+    - query: "log_errors:count1m"
+      type: logql
+
     # Check for profiles
     - query: '{cluster="ci-test-cluster"}'
       type: pyroql
15 changes: 11 additions & 4 deletions charts/k8s-monitoring/docs/Structure.md
@@ -20,6 +20,7 @@ section inside the Helm chart's values.yaml file that controls how it is configu
 | Grafana Alloy for Logs | DaemonSet | `alloy-logs` | The Grafana Alloy instance that gathers Pod logs. By default, it uses HostPath volume mounts to read Pod log files directly from the nodes. It can alternatively get logs via the API server and be deployed as a Deployment. |
 | Grafana Alloy for Events | Deployment | `alloy-events` | The Grafana Alloy instance that is responsible for gathering Cluster events from the API server. This does not support clustering, so only one instance should be used. |
 | Grafana Alloy for Profiles | Deployment | `alloy-events` | The Grafana Alloy instance that is responsible for gathering profiles. |
+| Grafana Alloy for Rules | Deployment | `alloy-rules` | The Grafana Alloy instance that is responsible for synchronizing PrometheusRule objects to either Mimir or Loki. |
 | [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) | Deployment | `kube-state-metrics` | A service for generating metrics about the state of the objects inside the Cluster. |
 | [Node Exporter](https://github.com/prometheus/node_exporter) | DaemonSet | `prometheus-node-exporter` | An exporter used for gathering hardware and OS metrics for *NIX nodes of the Cluster. |
 | [Windows Exporter](https://github.com/prometheus-community/windows_exporter) | DaemonSet | `prometheus-windows-exporter` | An exporter used for gathering hardware and OS metrics for Windows nodes of the Cluster. Not deployed by default. |
@@ -28,7 +29,7 @@ section inside the Helm chart's values.yaml file that controls how it is configu

 ### Grafana Alloy instances

-You may wonder why there are four instances of Grafana Alloy, rather than combining them. The reason is a balance
+You may wonder why there are five instances of Grafana Alloy, rather than combining them. The reason is a balance
 between functionality and scalability. The default functionality of the Grafana Alloy for Logs is to gather logs via
 HostPath volume mounts. This requires it to be deployed as a DaemonSet. The Grafana Alloy for metrics and receivers is
 deployed as a StatefulSet, which allows it to be scaled (optionally with a HorizontalPodAutoscaler) based on load. If it
@@ -85,6 +86,12 @@ controls it.

 ### Grafana Alloy for Profiles Configuration

-| Name | Associated values | Description |
-|----------------|-------------------|-------------------------------------|
-| Cluster Events | `.profiles` | Controls how profiles are gathered. |
+| Name | Associated values | Description |
+|----------|-------------------|-------------------------------------|
+| Profiles | `.profiles` | Controls how profiles are gathered. |
+
+### Grafana Alloy for Rules
+
+| Name | Associated values | Description |
+|-------|-------------------|----------------------------------------------------------------------|
+| Rules | `.rules` | Controls how PrometheusRule objects are discovered and synchronized. |
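PrometheusRule objects are discovered by label. A minimal sketch of a rule that the default Mimir selector (`rule_type: mimir`, per the README table) would pick up, mirroring the Loki example in `.github/configs/lokiRule.yaml`; the rule name, namespace, group, and expression are illustrative and not part of the PR:

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-metric-rule      # hypothetical name
  namespace: monitoring          # hypothetical namespace
  labels:
    rule_type: mimir             # matches the default rules.mimir.rule.label_selectors
spec:
  groups:
    - name: metrics
      rules:
        - record: namespace:up:count
          expr: count by (namespace) (up)
```

Changing the `rule_type` label to `loki` and using a LogQL expression would route an equivalent object to Loki instead, assuming the default selectors are left in place.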
19 changes: 19 additions & 0 deletions charts/k8s-monitoring/templates/_configs.tpl
@@ -153,3 +153,22 @@
 {{- include "alloy.config.logging" (index .Values "alloy-profiles").logging }}
 {{- include "alloy.config.liveDebugging" (index .Values "alloy-profiles").liveDebugging}}
 {{- end -}}
+
+{{/* Grafana Alloy for Rules config */}}
+{{- define "alloyRulesConfig" -}}
+{{- if .Values.rules.mimir.enabled }}
+{{- include "alloy.config.metricsServiceSecret" . }}
+{{ include "alloy.config.rulesMimir" . }}
+{{ end }}
+
+{{- if .Values.rules.loki.enabled }}
+{{- include "alloy.config.logsServiceSecret" . }}
+{{ include "alloy.config.rulesLoki" . }}
+{{ end }}
+
+{{- include "alloy.config.logging" (index .Values "alloy-rules").logging }}
+{{- include "alloy.config.liveDebugging" (index .Values "alloy-rules").liveDebugging}}
+{{- if .Values.logs.extraConfig }}
+{{- tpl .Values.logs.extraConfig $ | indent 0 }}
+{{- end }}
+{{- end -}}
11 changes: 11 additions & 0 deletions charts/k8s-monitoring/templates/alloy-rules-config.yaml
@@ -0,0 +1,11 @@
+{{- if .Values.rules.enabled }}
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: {{ include "alloy.fullname" (index .Subcharts "alloy-rules") }}
+  namespace: {{ .Release.Namespace }}
+data:
+  config.alloy: |-
+    {{- include "alloyRulesConfig" . | trim | nindent 4 }}
+{{- end }}
Original file line number Diff line number Diff line change
@@ -1,9 +1,6 @@
 {{ define "alloy.config.logsService" }}
 // Logs Service
-remote.kubernetes.secret "logs_service" {
-  name = {{ include "kubernetes_monitoring.logs_service.secret.name" . | quote}}
-  namespace = {{ .Values.externalServices.loki.secret.namespace | default .Release.Namespace | quote }}
-}
+{{- include "alloy.config.logsServiceSecret" . }}

 loki.process "logs_service" {
   stage.static_labels {
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
+{{ define "alloy.config.logsServiceSecret" }}
+remote.kubernetes.secret "logs_service" {
+  name = {{ include "kubernetes_monitoring.logs_service.secret.name" . | quote}}
+  namespace = {{ .Values.externalServices.loki.secret.namespace | default .Release.Namespace | quote }}
+}
+{{- end }}
Original file line number Diff line number Diff line change
@@ -1,9 +1,6 @@
 {{ define "alloy.config.metricsService" }}
 // Metrics Service
-remote.kubernetes.secret "metrics_service" {
-  name = {{ include "kubernetes_monitoring.metrics_service.secret.name" . | quote }}
-  namespace = {{ .Values.externalServices.prometheus.secret.namespace | default .Release.Namespace | quote }}
-}
+{{- include "alloy.config.metricsServiceSecret" . }}

 prometheus.relabel "metrics_service" {
   max_cache_size = {{ .Values.metrics.maxCacheSize | int }}
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
+{{ define "alloy.config.metricsServiceSecret" }}
+remote.kubernetes.secret "metrics_service" {
+  name = {{ include "kubernetes_monitoring.metrics_service.secret.name" . | quote }}
+  namespace = {{ .Values.externalServices.prometheus.secret.namespace | default .Release.Namespace | quote }}
+}
+{{- end }}