Managed alertmanager no longer sending slack alerts #1550
We're also experiencing something similar whilst using GKE. Below are some details from a ticket we raised with Google support:

Setup

Generate a "valid" alertmanager.yaml:

cat <<EOF > alertmanager.yaml
route:
  receiver: 'slack'
receivers:
- name: 'slack'
  slack_configs:
  - channel: '#some_channel'
    api_url: https://slack.com/api/chat.postMessage
    http_config:
      authorization:
        type: 'Bearer'
        credentials: 'redacted'
EOF

(Note: 'redacted' was the real Slack token in our actual config.)

Apply to the cluster (per the docs):
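(The exact commands from the ticket didn't survive the copy/paste above. As a rough sketch, under a standard setup they would look something like the following; the gmp-public/alertmanager secret name comes from the GMP docs, while the gmp-system secret and data key are assumptions and may differ in your cluster:)

# Load the user-supplied config as the documented gmp-public/alertmanager secret.
# If the (empty) default secret already exists, pipe a dry-run manifest into
# `kubectl apply -f -` instead of using `create` directly.
kubectl create secret generic alertmanager \
  --namespace gmp-public \
  --from-file=alertmanager.yaml

# Later, dump the operator-generated secret from gmp-system and base64-decode
# the relevant data key to inspect the rendered alertmanager config.
kubectl get secret alertmanager --namespace gmp-system -o yaml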
Get the resultant secret from the gmp-system namespace to show the configuration error.

Check: this shell command (sketched above, alongside the apply step) will get the resultant alertmanager config.

Result on a v1.30.x GKE cluster, which runs v0.13:

route:
  receiver: 'slack'
receivers:
- name: 'slack'
  slack_configs:
  - channel: '#some_channel'
    api_url: https://slack.com/api/chat.postMessage
    http_config:
      authorization:
        type: 'Bearer'
        credentials: 'redacted'

This worked for us previously.

Result on a v1.31.x GKE cluster, which runs v0.14:

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
    enable_http2: true
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
  telegram_api_url: https://api.telegram.org
  webex_api_url: https://webexapis.com/v1/messages
route:
  receiver: slack
  continue: false
receivers:
- name: slack
  slack_configs:
  - send_resolved: false
    http_config:
      authorization:
        type: Bearer
        credentials: <secret>
      follow_redirects: true
      enable_http2: true
    api_url: <secret>
    channel: '#ops-alerts-dev'
    username: '{{ template "slack.default.username" . }}'
    color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
    title: '{{ template "slack.default.title" . }}'
    title_link: '{{ template "slack.default.titlelink" . }}'
    pretext: '{{ template "slack.default.pretext" . }}'
    text: '{{ template "slack.default.text" . }}'
    short_fields: false
    footer: '{{ template "slack.default.footer" . }}'
    fallback: '{{ template "slack.default.fallback" . }}'
    callback_id: '{{ template "slack.default.callbackid" . }}'
    icon_emoji: '{{ template "slack.default.iconemoji" . }}'
    icon_url: '{{ template "slack.default.iconurl" . }}'
    link_names: false
templates: []
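Side by side, the meaningful difference between the two results (beyond defaults being expanded) is that the secret-typed fields in the v0.14 output come back as the literal placeholder rather than the configured values:

# v0.13 result (working):
        credentials: 'redacted'
    api_url: https://slack.com/api/chat.postMessage

# v0.14 result (broken):
        credentials: <secret>
    api_url: <secret>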
Notes:

More note: If the
And to be thorough, I mounted a debug container on the running

And another find: if I add a value into
And for more investigation, I went on a git diff hunt. I suspect #1074 to be the problem. I'm not that familiar with the code base, but my guess:
TL;DR: Previously the alertmanager config was taken from the secret in gmp-public and used directly. #1074 uses the prometheus-engine config unmarshaller to parse the bytes from the provided config, and that has security measures to ensure that secrets aren't leaked. Unfortunately, that means any "secret" field that was in the secret is being obfuscated. Note: I'm not a Go expert, I'm just stepping through the logic to see what changed and what might be the cause, so take the above with a large grain of salt. Either way, I don't know if there are any workarounds. I suspect we can't use the
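For anyone who wants to see that behaviour in isolation, here is a tiny standalone sketch (my own illustration, not code taken from the operator; the receiverCreds struct is made up) of how a field typed as a prometheus/common Secret round-trips through YAML:

package main

import (
	"fmt"

	commoncfg "github.com/prometheus/common/config"
	"gopkg.in/yaml.v2"
)

// Stand-in for any config struct that stores a credential in a
// prometheus/common Secret field, as the alertmanager receiver configs do.
type receiverCreds struct {
	Credentials commoncfg.Secret `yaml:"credentials"`
}

func main() {
	in := []byte("credentials: 'my-real-token'\n")

	var c receiverCreds
	if err := yaml.Unmarshal(in, &c); err != nil {
		panic(err)
	}
	// The real value survives unmarshalling.
	fmt.Printf("parsed value: %q\n", string(c.Credentials))

	// Marshalling back out is where the value is replaced with the
	// "<secret>" placeholder, which matches what the generated config
	// above appears to contain.
	out, err := yaml.Marshal(&c)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out)) // prints: credentials: <secret>
}

Running it prints the real token after unmarshalling, but "credentials: <secret>" after marshalling back out.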
Thanks for helping! This got to our attention just now; it does look like the obfuscation code might be the problem. We changed the configuration propagation flow for security reasons (so the GMP operator has fewer permissions), but something went wrong. We are on it and will give an update today!
We are double-checking the details and will release a bugfix, but there is a quick mitigation anyone can do:
For example:
Let us know if this mitigates the issue!
I can confirm that adding the
Repro: #1558
Does not work for me, still see
Are you sure? Can you share your config and OperatorConfig in the cluster?
secret gmp-public/alertmanager:
OperatorConfig:
Thanks! You have a typo:
should be:
Sorry for the mess, we are releasing a bugfix today.
updated to the
nothing changed
I also just noticed that in the alertmanager config you pasted (#1550 (comment)) there are two
Yes, only one left.
Here is what I currently have:
Hello,
We have noticed that since March 18th, our managed alertmanager is no longer sending alerts. We have not made any configuration changes to the alertmanager config contained in the alertmanager secret, and we can see that the configuration is loaded correctly. Alertmanager logs are full of errors now, however:
I imagine this is somehow related to the 0.27.0 update, but I am unable to tell where the problem is in our config, since it gets loaded correctly. I also cannot find any breaking-change announcement about this in the docs. Can you please provide more guidance?
Thanks!