DogPush enables to manage your DataDog monitors in YAML files.
Excellent question. So many reasons:
- You can store your monitors in source control so you can track changes (who changed this monitor?), and perform code reviews.
- Automatically includes the right @-mentions in the message based on team configurations.
- It is very easy to accidentally make changes to monitors on DataDog's UI. By having your monitors documented in your own source code repository, you can set up a cron job that pushes the monitors to DataDog. This allows you to ensure that accidental changes get automatically corrected.
- DataDog has a missing feature: it is impossible to set up a monitor that
is only active during business hours. With DogPush it is really easy using
mute_tags.
Run pip install dogpush
Run docker pull trueaccord/dogpush
To run, you need ot make your configuration files accessible to the container. Use a command like this:
docker run --rm -v /path/to/config:/config trueaccord/dogpush -c /config/config.yaml diff
Go to DataDog API settings and generate an API key and application key. Create a minimal config.yaml that looks like this:
---
datadog:
api_key: YOUR_API_KEY
app_key: YOUR_APP_KEYThen run dogpush -c ./config.yaml diff and all the alerts you currently
have on datadog will appear as untracked.
The next step would be to create your initial alerts file:
dogpush -c ./config.yaml init > ./my_monitors.yaml
Now, add my_monitors.yaml as a rules_file to config.yaml. Edit
config.yaml again:
---
datadog:
api_key: YOUR_API_KEY
app_key: YOUR_APP_KEY
rule_files:
- my_monitors.yamlNow, run dogpush diff again, and see that the difference will be empty. Your
local rules are in sync with your DataDog rules. If you have a lot of rules,
You may split your initial rules file to multiple files (by category, or
team), and include all of them in the rule_files section. Paths can be
either relative to the config file or absolute. Paths may contain wildcards.
rule_files:
- rds.yaml
- ec2.yaml
- dir1/rules.yaml
- /absolute/path/to/rules.yaml
- path/with/wildcard/*.yamlNow you can make changes to your rule files. See the changes by running
dogpush diff, and push them using dogpush push.
DogPush config file defines up the DataDog api and app keys. If they are not
specified in the file DogPush looks for these in environment variables named
DATADOG_API_KEY and DATADOG_APP_KEY.
The file defines the teams and how to alert your team at different severity levels.
Your config can optionally define two values to specify defaults.
default_rules can be used to specify defaults for the top level values of
rules, such as multi and message. default_rule_options can be used to
specify default values for the options section of rules, such as locked or
notify_audit. All of these values are automatically filled in for every rule
you define, but can always be overridden per-rule. For example,
default_rules:
multi: False
default_rule_options:
notify_audit: True
locked: TrueTeams in DogPush are just a way to append some text to the message body of a monitor so it will grab the attention of the right people. By defining your teams in the global config, it is super easy to add these @-mentions to all your monitors. For example,
teams:
eng:
notifications:
CRITICAL: '@hipchat-Engineering @victorops-eng'
WARNING: '@[email protected]'
ops:
notifications:
CRITICAL: '@hipchat-Ops'means that any monitor of severity CRITICAL to the eng team will have the
'@hipchat-Engineering @victorops-eng' appending to its message body.
Then, in a rules file, you can have a top level team setting (making all
the alerts automatically go to that team), or specify 'team' at the alert
level.
The default severity is CRITICAL, and you can override the severity in each
monitor by using the severity key.
DogPush supports automatically muting alerts. A common use case is to mute non critical alerts outside business hours. First, define the time window like this:
mute_tags:
not_business_hours:
timezone: US/Pacific
expr: now.hour < 9 or now.hour >= 17 or (now.weekday() in (5, 6))This defines not_business_hours period. The period is defined using a
timezone and a Python expression. The expression provides a now variable
of type datetime.datetime
and should return True when now falls in a time where the monitor should be
muted. In this example, not_business_hours is defined as before 9am or after
5pm or anytime during the weekend. The timezone key specifies in which
timezone should the now be localized to. (all the internal time
calculations in DogPush are done in UTC regardless of the system's default
timezone).
After defining mute_tags, you can apply mute_when: not_business_hours to
rules in your rules file. Also read about dogpush mute to learn how to
automatically mute these alerts.
See config-sample.yaml for a full example.
There are two optional settings that can be configured in the config file:
dogpush:
yaml_width: 80
ignore_prefix: 'string'The yaml_width option sets the line width of the generated yaml output.
Using ignore_prefix one can define a set of monitor names that are
simply ignored by DogPush when fetching the remote monitors.
On the top of a rules file you can define team: xyz to define the default
team for all the alerts in the file. You can override the team by specifying a
different team in an alert.
Tip 1: it is possible to use a list of teams instead of a single team.
Tip 2: if you would like to have a monitor without any team associated
with it, you can use team: [] in that monitor. This will override the file's
default team settings with an empty list.
The config file references your rule files. See rds.yaml for an example
rule file.
Prints an initial rules file so you can get started quickly with your existing rules.
Prints the diff between the monitors in the files and the monitors in datadog.
This is useful when just starting out so you can build your initial rule
files. Also, use dogpush diff before dogpush push as a dry-run to see
what will change.
dogpush diff --no_exitstatus will cause dogpush diff to return exit code 0
regardless of if there is a difference detected. If this flag is not present
then dogpush will return 1 if there is a difference between local and remote.
Pushes the local monitors to datadog:
- will create new monitors
- will update existing monitors (so it could override what you were doing if you edit an existing monitor in datadog)
- will not remove or touch untracked monitors (that is, datadog monitors
that are not in any of the yaml files) unless the
--delete_untrackedflag is passed in.
This command can run from a cronjob to ensure the monitors on DataDog are synchronized with the local monitors.
Use dogpush mute to mute monitors that have a mute_when tag. This command
will mute these monitors, and the mute will automatically expires when the
period described by mute_when is over. This command can be run
from a cron job to ensure monitors are silenced at the right times. As the
mute automatically expires, there is no need to run anything to unmute the
alerts.