Ported monitoring stack to k3s #449
Open: wtripp180901 wants to merge 117 commits into main from feature/k3s-monitoring
117 commits
108fa7c wtripp180901: Added prometheus operator role compatible with state_dir (still needs…
8f2977c wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
9836ef8 wtripp180901: Added node selectors for non-exporter pods
d790b2b wtripp180901: Added services for monitoring
10af75d wtripp180901: WIP porting prometheus rolevars
7b29a3b wtripp180901: Added ingress for monitoring services
b959e92 wtripp180901: Refactored + re-enabled external labels (not sure if working)
560eb96 wtripp180901: replaced monitoring in site.yml and fixed sslip IPs
0106a95 wtripp180901: Added slurm exporter service to k3s
a4dca77 wtripp180901: Added ood exporter to k3s
6081d77 wtripp180901: added grafana metrics
84fd355 wtripp180901: fixed alertmanager status
e2d1c62 wtripp180901: Dashboards now installed into k3s (dataources not configured yet)
cce35a9 wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
7afdc1d wtripp180901: Added slurmstats datasource
b3020ca wtripp180901: enabled ips for monitoring services (except prometheus)
0dff07f wtripp180901: Added grafana to state directory and made port configurable
f7e555b wtripp180901: grafana can now be reverse proxied by ood
d142a9f wtripp180901: Ported grafana rolevars
7fa3609 wtripp180901: Added slack integration default
96edb79 wtripp180901: Ported alertmanager rolevars
123c573 wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
b13311a wtripp180901: removed k3s ingress
01718ee wtripp180901: Services now exposed/proxied via nodeports
74bd3ba wtripp180901: Removed grafana servicemonitor and moved nodeports to helm config
e724b5d wtripp180901: grafana admin now definable
9c359d9 wtripp180901: Now adds additional rules correctly
04a5bf3 wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
7f4862c wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
2b97d32 wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
337c101 wtripp180901: Removed monitoring binaries from build
dd1e464 wtripp180901: bump for CI test
cc6bef1 wtripp180901: ported node-exporter vars
21a8d25 wtripp180901: non-atomic helm install for ci test
e1355de wtripp180901: Merge branch 'feature/k3s-monitoring' of github.com:stackhpc/ansible-…
76842f1 wtripp180901: fixed hostnames not recognised by selector and defaulted slack integr…
185eafb wtripp180901: fixed k3s hostnames properly
10d4e93 wtripp180901: increased control node CI memory
d1e8c0a wtripp180901: Refactored monitoring config and removed redundant groups
96723f8 wtripp180901: updated dashboard defaults
bf9a473 wtripp180901: fixed caas cluster name
fdb5c23 wtripp180901: nodeselectors now use custom labels
4bffe4b wtripp180901: fixed (?) grafana zenith proxy
b0f856e wtripp180901: bump images
12e1166 wtripp180901: added old recording rules to defaults
9ab06a6 wtripp180901: Merge branch 'feature/k3s-monitoring' of github.com:stackhpc/ansible-…
5f89be8 wtripp180901: fixed openhpc dashboard
2d1dab5 wtripp180901: Refactored and fixed slack integration
43f27a5 wtripp180901: removed unused config options
bb928ad wtripp180901: review suggestions
db91120 wtripp180901: updated defaults
34a779b wtripp180901: Merge branch 'feature/k3s-monitoring' of github.com:stackhpc/ansible-…
7e1370d wtripp180901: removed grafana data volume
934ec7a wtripp180901: set default dashboard to slurm exporter
a03f7f9 wtripp180901: added play to remove unwanted default dashboards
886c22d wtripp180901: updated grafana groupvars
c4fa2a6 wtripp180901: added node exporter collection config
3e1f019 wtripp180901: removed unenforced volume size config option
8d242f7 wtripp180901: ondemand grafana proxying now conditional on ondemand having groups d…
e6fbda8 wtripp180901: standardised control ip resolution
8b93aa2 wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
e6a4e4b wtripp180901: reduced collectors to minimal set
e5dff96 wtripp180901: updated docs
ce90ab0 wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
4c44261 wtripp180901: bump images
29614fe wtripp180901: Merge branch 'feature/k3s-ansible-init' into feature/k3s-monitoring
f93348e wtripp180901: monitoring stack images now pre-pulled
e8d2e81 wtripp180901: moved monitoring pre-pulls to role
311bbbc wtripp180901: Merge branch 'feature/k3s-monitoring' of github.com:stackhpc/ansible-…
d002454 wtripp180901: fixed build typo
991613e wtripp180901: removed unused groupvars
b4b69b5 wtripp180901: removed cloudalchemy roles from install
b0f48fd wtripp180901: bump images
ec57a21 wtripp180901: fixed some incompatibilities with old metrics
df946dd wtripp180901: Merge branch 'feature/k3s-monitoring' of github.com:stackhpc/ansible-…
a0edab7 wtripp180901: removed container internal networking devices from grafana
8acc2b5 wtripp180901: openhpc dashboard now job agnostic
ee945b5 wtripp180901: added local copy of slurm exporter dashboard without container networ…
46d95ef wtripp180901: set default dashboard to slurm jobs
b2b673f wtripp180901: added ansible to migrate cloudalchemy data to KPS
d712f69 wtripp180901: updated docs
d0c8781 wtripp180901: merge conflicts
f23c2fc wtripp180901: merge
2d16356 wtripp180901: cleaned up dashboard role
c6b221e wtripp180901: moved image pre-pull list to rolevar
d1c915e wtripp180901: doc changes + opensearch datasource now based on opensearch group
b6be009 wtripp180901: made kps default dashboards more configurable
206134d wtripp180901: bump image up to date with main
8364eb8 wtripp180901: newline
603e818 wtripp180901: bumped caas minimum control node ram
c4a4847 wtripp180901: reduced disk footprint of container pe-pulls
040e569 wtripp180901: merge
a2540f2 wtripp180901: moved image pulls to tasks
cd281f3 wtripp180901: moved prometheus install to host group
774f608 wtripp180901: Review docs suggestions
80a0e21 wtripp180901: Merge branch 'feature/k3s-monitoring' of github.com:stackhpc/ansible-…
6506e7e wtripp180901: added readme link
a6d8edc wtripp180901: file name and defaults changes
5864b56 wtripp180901: disambiguated default addresses
15b77db wtripp180901: separated prometheus recording and alerting rules
acf0c0d wtripp180901: adding alertmanager docs
b7d9c48 wtripp180901: merge
6f16492 wtripp180901: merge
8ca0407 wtripp180901: bump
7f0af9e wtripp180901: Merge branch 'main' into feature/k3s-monitoring
f864023 wtripp180901: bump
e1e7b34 wtripp180901: merged with release train changes
aae2aa4 wtripp180901: pinned python kube version
f97c6a6 wtripp180901: Merge branch 'main' into feature/k3s-monitoring
fe79a33 wtripp180901: removed old monitoring services from systemd dropins
32735c0 wtripp180901: bump images
63c3094 wtripp180901: fixed KPS not having access to legacy data
bdd265a wtripp180901: fixed prometheus not resolving OOD
4d6dee0 wtripp180901: merge conflicts
1f25927 wtripp180901: image bump
31c2b5b wtripp180901: merge mistake
f9fc905 sjpb: Merge branch 'main' into feature/k3s-monitoring