Commit d33fd99

Max Eshleman authored and Travis Patterson committed
Move system-metrics jobs to their own release
[#168409208] Signed-off-by: Travis Patterson <[email protected]>
0 parents  commit d33fd99

1,292 files changed (+616777 -0 lines changed)
@@ -0,0 +1,6 @@
builds:
  8ee749668a0b9a22f5219c469da869ba1cb335b21f6c78e4f5700e5f0d200620:
    version: 8ee749668a0b9a22f5219c469da869ba1cb335b21f6c78e4f5700e5f0d200620
    blobstore_id: 87370c57-06b5-40a0-7bed-8531e824e70c
    sha1: sha256:6e96744cb26218b9c99c35b7dca8d3460d34bd6b742807f87edc64aaee0f30ee
format-version: "2"
@@ -0,0 +1,6 @@
builds:
  527bd9abc9fce0f3542d58216e7d22cd0468061637c89ba272f2f915e403de6f:
    version: 527bd9abc9fce0f3542d58216e7d22cd0468061637c89ba272f2f915e403de6f
    blobstore_id: 92cd417d-b148-40c1-4ff1-888da40c94eb
    sha1: sha256:d96d3dadb5725f1c0bb8e31516527b2abdb009603956229eba58d23a6e06b290
format-version: "2"

.gitignore

+35
@@ -0,0 +1,35 @@
# golang
/bin/
*.prof
*.mprof
*.coverprofile
*.test

# macOS
.DS_Store

# jetbrains
.idea

# vscode
.vscode

# bosh
.blobs
blobs
.dev_builds
dev_releases
config/dev.yml
config/private.yml
releases/*.tgz
releases/**/*.tgz

# misc
out/
*/**/test_assets/stdout.log
test-certs/
release/
*/bin
*.iml
*.pem

README.md

+26
@@ -0,0 +1,26 @@
# System Metrics
[![slack.cloudfoundry.org][slack-badge]][loggregator-slack]
[![CI Badge][ci-badge]][ci-pipeline]
===================================================

Components required to collect system metrics from BOSH-deployed VMs.

### System Metrics Agent
A standalone agent to provide VM system metrics via a Prometheus-scrapable endpoint. A list of metrics
is available in the [docs][system-metrics-agent].

### Metric Scraper
A central component for scraping `system-metrics-agents` and forwarding the metrics to the firehose. Metric Scraper
attempts to scrape the configured port across all VMs deployed to the director. If the Leadership Election job is present,
Metric Scraper can be configured to communicate with it so duplicate scrapes are avoided in an HA environment.

### Leadership Election
A job intended to be run alongside the System Metric Scraper to allow for multiple scrapers to exist while only one is
scraping.

[system-metrics-agent]: docs/system-metrics-agent.md
[slack-badge]: https://slack.cloudfoundry.org/badge.svg
[loggregator-slack]: https://cloudfoundry.slack.com/archives/loggregator
[ci-badge]: https://loggregator.ci.cf-app.com/api/v1/pipelines/loggregator/jobs/loggregator-agent-tests/badge
[ci-pipeline]: https://loggregator.ci.cf-app.com/teams/main/pipelines/loggregator
[loggregator-tracker]: https://www.pivotaltracker.com/n/projects/993188
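
The README's description of running several scrapers while only one scrapes can be illustrated with a minimal Ruby sketch. It is not the release's implementation: `scrape_round`, `leader_check`, and `scrape` are hypothetical names, and the real scraper's interface to the Leadership Election job is not shown in this commit. Port 9100 is the `scrape_port` default from the job spec.

```ruby
# Hypothetical helper: performs one scrape round only if this node is leader.
# `leader_check` and `scrape` are caller-supplied callables (assumptions,
# not part of the release's actual interface).
def scrape_round(targets, leader_check:, scrape:)
  return [] unless leader_check.call

  targets.map { |target| scrape.call(target) }
end

targets = ["10.0.0.1:9100", "10.0.0.2:9100"]
fake_scrape = ->(target) { "metrics from #{target}" }

# A follower skips the round; the leader scrapes every target.
puts scrape_round(targets, leader_check: -> { false }, scrape: fake_scrape).length
puts scrape_round(targets, leader_check: -> { true }, scrape: fake_scrape).length
```

Gating each round on the leader check, rather than deciding once at startup, lets another scraper take over if leadership changes between rounds.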

config/blobs.yml

+1
@@ -0,0 +1 @@
{}

config/final.yml

+7
@@ -0,0 +1,7 @@
name: system-metrics
blobstore:
  provider: s3
  options:
    bucket_name: systme-metrics-release-blobs
    region: us-west-2

docs/system-metrics-agent.md

+93
@@ -0,0 +1,93 @@
### System Metrics Agent
A standalone agent to provide VM system metrics via a prometheus-scrapable endpoint.

#### VM Metrics

When the Loggregator System Metrics Agent is deployed along with the Loggregator Agent,
it will emit the following metrics about the VM where it is deployed:

| Metric | Linux | Windows |
|--------------------------------------|-------|---------|
| system_mem_kb | ✔️ | ✔️ |
| system_mem_percent | ✔️ | ✔️ |
| system_swap_kb | ✔️ | ✔️ |
| system_swap_percent | ✔️ | ✔️ |
| system_load_1m | ✔️ | |
| system_load_5m | ✔️ | |
| system_load_15m | ✔️ | |
| system_cpu_user | ✔️ | ✔️ |
| system_cpu_sys | ✔️ | ✔️ |
| system_cpu_idle | ✔️ | ✔️ |
| system_cpu_wait | ✔️ | ✔️ |
| system_cpu_core_user | ✔️ | ✔️ |
| system_cpu_core_sys | ✔️ | ✔️ |
| system_cpu_core_idle | ✔️ | ✔️ |
| system_cpu_core_wait | ✔️ | ✔️ |
| system_disk_system_percent | ✔️ | ✔️ |
| system_disk_system_inode_percent | ✔️ | ✔️ |
| system_disk_system_read_bytes | ✔️ | ✔️ |
| system_disk_system_write_bytes | ✔️ | ✔️ |
| system_disk_system_read_time | ✔️ | ✔️ |
| system_disk_system_write_time | ✔️ | ✔️ |
| system_disk_system_io_time | ✔️ | ✔️ |
| system_disk_ephemeral_percent | ✔️ | ✔️ |
| system_disk_ephemeral_inode_percent | ✔️ | ✔️ |
| system_disk_ephemeral_read_bytes | ✔️ | ✔️ |
| system_disk_ephemeral_write_bytes | ✔️ | ✔️ |
| system_disk_ephemeral_read_time | ✔️ | ✔️ |
| system_disk_ephemeral_write_time | ✔️ | ✔️ |
| system_disk_ephemeral_io_time | ✔️ | ✔️ |
| system_disk_persistent_percent | ✔️ | ✔️ |
| system_disk_persistent_inode_percent | ✔️ | ✔️ |
| system_disk_persistent_read_bytes | ✔️ | ✔️ |
| system_disk_persistent_write_bytes | ✔️ | ✔️ |
| system_disk_persistent_read_time | ✔️ | ✔️ |
| system_disk_persistent_write_time | ✔️ | ✔️ |
| system_disk_persistent_io_time | ✔️ | ✔️ |
| system_healthy | ✔️ | ✔️ |
| system_network_ip_forwarding | ✔️ | |
| system_network_udp_no_ports | ✔️ | |
| system_network_udp_in_errors | ✔️ | |
| system_network_udp_lite_in_errors | ✔️ | |
| system_network_tcp_active_opens | ✔️ | |
| system_network_tcp_curr_estab | ✔️ | |
| system_network_tcp_retrans_segs | ✔️ | |
| system_network_bytes_sent | ✔️ | ✔️ |
| system_network_bytes_received | ✔️ | ✔️ |
| system_network_packets_sent | ✔️ | ✔️ |
| system_network_packets_received | ✔️ | ✔️ |
| system_network_error_in | ✔️ | ✔️ |
| system_network_error_out | ✔️ | ✔️ |
| system_network_drop_in | ✔️ | ✔️ |
| system_network_drop_out | ✔️ | ✔️ |

Note: these metrics are also available via HTTP in Prometheus format.

#### Deploying System Metrics Agent

To deploy system metrics agent, add the following jobs to all instance groups and the variables to the variables section.

**Notes**
- The system metrics agent is scraped by the metric scraper deployed inside of CF.

```yaml
jobs:
- name: loggr-system-metrics-agent
  properties:
    system_metrics:
      tls:
        ca_cert: ((system_metrics.ca))
        cert: ((system_metrics.certificate))
        key: ((system_metrics.private_key))
  release: loggregator-agent

variables:
- name: system_metrics
  options:
    ca: /bosh-<ENV_NAME>/cf/loggregator_ca
    common_name: system-metrics
    extended_key_usage:
    - client_auth
    - server_auth
  type: certificate
```

jobs/leadership-election/monit

+6
@@ -0,0 +1,6 @@
check process leadership-election
  with pidfile /var/vcap/sys/run/bpm/leadership-election/leadership-election.pid
  start program "/var/vcap/jobs/bpm/bin/bpm start leadership-election"
  stop program "/var/vcap/jobs/bpm/bin/bpm stop leadership-election"
  group vcap

jobs/leadership-election/spec

+51
@@ -0,0 +1,51 @@
---
name: leadership-election

templates:
  bpm.yml.erb: config/bpm.yml
  leadership_election_ca.crt.erb: config/certs/leadership_election_ca.crt
  leadership_election.crt.erb: config/certs/leadership_election.crt
  leadership_election.key.erb: config/certs/leadership_election.key
  metrics_ca.crt.erb: config/certs/metrics_ca.crt
  metrics.crt.erb: config/certs/metrics.crt
  metrics.key.erb: config/certs/metrics.key
  prom_scraper_config.yml.erb: config/prom_scraper_config.yml

provides:
- name: leader-election-agent
  type: leader-election-agent
  properties:
  - port

consumes:
- name: leader-election-agent
  type: leader-election-agent

packages:
- leadership-election

properties:
  port:
    description: "The port the agent will listen on for HTTP requests"
    default: 8080
  health_port:
    description: "The port where debug information will be served"
    default: 6060
  tls.ca_cert:
    description: "TLS CA cert to verify requests to leadership election"
  tls.cert:
    description: "TLS certificate for leadership election"
  tls.key:
    description: "TLS private key for leadership election"

  metrics.port:
    description: "Port leadership election uses to serve metrics and debug information"
    default: 14921
  metrics.ca_cert:
    description: "TLS CA cert to verify requests to metrics endpoint."
  metrics.cert:
    description: "TLS certificate for metrics server signed by the metrics CA"
  metrics.key:
    description: "TLS private key for metrics server signed by the metrics CA"
  metrics.server_name:
    description: "The server name used in the scrape configuration for the metrics endpoint"
@@ -0,0 +1,23 @@
<%
  sorted_instances=link("leader-election-agent").instances.sort_by {|i|i.address}
  index=sorted_instances.index(sorted_instances.find{|i|i.id == spec.id})
  addrs=sorted_instances.map{|i| "#{i.address}:#{p('port')}"}.join(',')

  certs_dir="/var/vcap/jobs/leadership-election/config/certs"
%>
processes:
- name: leadership-election
  executable: /var/vcap/packages/leadership-election/leadership-election
  env:
    PORT: <%= p("port") %>
    HEALTH_PORT: <%= p("health_port") %>
    NODE_INDEX: "<%= index %>"
    NODE_ADDRS: "<%= addrs %>"
    CA_FILE: "<%= certs_dir %>/leadership_election_ca.crt"
    CERT_FILE: "<%= certs_dir %>/leadership_election.crt"
    KEY_FILE: "<%= certs_dir %>/leadership_election.key"

    METRICS_PORT: <%= p('metrics.port') %>
    METRICS_CA_FILE_PATH: "<%= certs_dir %>/metrics_ca.crt"
    METRICS_CERT_FILE_PATH: "<%= certs_dir %>/metrics.crt"
    METRICS_KEY_FILE_PATH: "<%= certs_dir %>/metrics.key"
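
To make the ERB preamble of this template easier to follow, here is a minimal plain-Ruby sketch of how `NODE_INDEX` and `NODE_ADDRS` are derived. `Instance` and `node_settings` are hypothetical stand-ins for the BOSH link instances and the template logic; the addresses are made up for illustration.

```ruby
# Hypothetical stand-in for a BOSH link instance (only the fields the
# template reads: id and address).
Instance = Struct.new(:id, :address)

# Mirrors the template: sort peers by address, find this node's position,
# and build the comma-separated "address:port" list.
def node_settings(instances, own_id, port)
  sorted = instances.sort_by { |i| i.address }
  index = sorted.index { |i| i.id == own_id }
  addrs = sorted.map { |i| "#{i.address}:#{port}" }.join(",")
  { index: index, addrs: addrs }
end

instances = [
  Instance.new("c", "10.0.0.3"),
  Instance.new("a", "10.0.0.1"),
  Instance.new("b", "10.0.0.2"),
]

settings = node_settings(instances, "b", 8080)
puts settings[:index]
puts settings[:addrs]
```

Because every instance sorts the same list the same way, each node arrives at an identical `NODE_ADDRS` ordering and a distinct `NODE_INDEX`. The sort is a plain string comparison, so `10.0.0.10` sorts before `10.0.0.2`; the order is still consistent across nodes.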
@@ -0,0 +1 @@
<%= p("tls.cert") %>
@@ -0,0 +1 @@
<%= p("tls.key") %>
@@ -0,0 +1 @@
<%= p("tls.ca_cert") %>
@@ -0,0 +1 @@
<%= p("metrics.cert") %>
@@ -0,0 +1 @@
<%= p("metrics.key") %>
@@ -0,0 +1 @@
<%= p("metrics.ca_cert") %>
@@ -0,0 +1,6 @@
---
port: <%= p("metrics.port") %>
source_id: "leadership-election"
instance_id: <%= spec.id || spec.index.to_s %>
scheme: https
server_name: <%= p('metrics.server_name') %>
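
As a rough illustration of how this template renders, the sketch below evaluates it with Ruby's standard `ERB` class. `TemplateContext` and `JobSpec` are hypothetical stand-ins for the context BOSH supplies (the `p` property lookup and the `spec` object), and the server name is an invented value; 14921 is the `metrics.port` default from the job spec.

```ruby
require "erb"

# Hypothetical stand-in for the job spec object; only id and index are used.
JobSpec = Struct.new(:id, :index)

# Hypothetical stand-in for the BOSH template evaluation context.
class TemplateContext
  attr_reader :spec

  def initialize(properties, spec)
    @properties = properties
    @spec = spec
  end

  # Mimics the `p("a.b")` property lookup used in the templates.
  def p(name)
    @properties.fetch(name)
  end

  # Evaluates the template with this object as `self`.
  def render(template)
    ERB.new(template).result(binding)
  end
end

PROM_SCRAPER_TEMPLATE = <<~'TEMPLATE'
  ---
  port: <%= p("metrics.port") %>
  source_id: "leadership-election"
  instance_id: <%= spec.id || spec.index.to_s %>
  scheme: https
  server_name: <%= p('metrics.server_name') %>
TEMPLATE

context = TemplateContext.new(
  { "metrics.port" => 14921, "metrics.server_name" => "example-server-name" },
  JobSpec.new("abc-123", 0)
)
puts context.render(PROM_SCRAPER_TEMPLATE)
```

The `spec.id || spec.index.to_s` expression falls back to the instance index only when no instance ID is available.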
+5
@@ -0,0 +1,5 @@
check process loggr-system-metric-scraper
  with pidfile /var/vcap/sys/run/bpm/loggr-system-metric-scraper/loggr-system-metric-scraper.pid
  start program "/var/vcap/jobs/bpm/bin/bpm start loggr-system-metric-scraper"
  stop program "/var/vcap/jobs/bpm/bin/bpm stop loggr-system-metric-scraper"
  group vcap

jobs/loggr-system-metric-scraper/spec

+67
@@ -0,0 +1,67 @@
---
name: loggr-system-metric-scraper

consumes:
- name: loggregator
  type: loggregator
  optional: false
- name: leader-election-agent
  type: leader-election-agent

templates:
  bpm.yml.erb: config/bpm.yml
  loggregator_agent.crt.erb: config/certs/loggregator_agent.crt
  loggregator_agent.key.erb: config/certs/loggregator_agent.key
  loggregator_ca.crt.erb: config/certs/loggregator_ca.crt
  system_metrics.crt.erb: config/certs/system_metrics.crt
  system_metrics.key.erb: config/certs/system_metrics.key
  system_metrics_ca.crt.erb: config/certs/system_metrics_ca.crt
  prom_scraper_config.yml.erb: config/prom_scraper_config.yml
  metrics_ca.crt.erb: config/certs/metrics_ca.crt
  metrics.crt.erb: config/certs/metrics.crt
  metrics.key.erb: config/certs/metrics.key
  leadership_election_ca.crt.erb: config/certs/leadership_election_ca.crt
  leadership_election.crt.erb: config/certs/leadership_election.crt
  leadership_election.key.erb: config/certs/leadership_election.key

packages:
- metric-scraper

properties:
  scrape_interval:
    description: "The interval to scrape the metrics URL (golang duration)"
    default: 1m
  scrape_port:
    description: "The port where the scraping endpoints are hosted"
    default: 9100

  system_metrics.tls.common_name:
    description: "Common name for system metrics agent CA"
    default: "system-metrics"
  system_metrics.tls.ca_cert:
    description: |
      TLS loggregator root CA certificate. It is required for key/cert
      verification.
  system_metrics.tls.cert:
    description: "TLS certificate for system metrics agent signed by the loggregator CA"
  system_metrics.tls.key:
    description: "TLS private key for system metrics agent signed by the loggregator CA"

  metrics.port:
    description: "Port the agent uses to serve metrics and debug information"
    default: 14825
  metrics.ca_cert:
    description: "TLS CA cert to verify requests to metrics endpoint."
  metrics.cert:
    description: "TLS certificate for metrics server signed by the metrics CA"
  metrics.key:
    description: "TLS private key for metrics server signed by the metrics CA"
  metrics.server_name:
    description: "The server name used in the scrape configuration for the metrics endpoint"

  leadership_election.ca_cert:
    description: "TLS CA cert to verify requests to leadership election endpoint."
  leadership_election.cert:
    description: "TLS certificate for leadership election client signed by the leadership election CA"
  leadership_election.key:
    description: "TLS private key for leadership election client signed by the leadership election CA"
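
The dotted property names in this spec (for example `system_metrics.tls.ca_cert`) correspond to nested keys under `properties:` in a deployment manifest, as the `loggr-system-metrics-agent` example in `docs/system-metrics-agent.md` shows. The following Ruby sketch illustrates that mapping; `nest_properties` is a hypothetical helper written for this example, not part of BOSH.

```ruby
# Hypothetical helper: expands dotted property names into the nested
# structure a manifest's `properties:` block would use.
def nest_properties(flat)
  flat.each_with_object({}) do |(dotted, value), nested|
    *parents, leaf = dotted.split(".")
    # Walk (and create as needed) one hash level per parent segment.
    node = parents.reduce(nested) { |hash, key| hash[key] ||= {} }
    node[leaf] = value
  end
end

flat = {
  "scrape_port" => 9100,
  "metrics.port" => 14825,
  "system_metrics.tls.common_name" => "system-metrics",
}

puts nest_properties(flat).inspect
```

The values shown are the defaults declared in the spec above; properties without a default, such as the certificate and key entries, have to be supplied by the operator.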
