scertecd: restructure metrics to match Prometheus guidelines #11
I left this as a comment on the already-merged PR #8, which I suspect was not the best way to raise awareness. I'm opening this PR mainly to start a discussion, and I won't be too sad if you decide to address this in some other way.
I don't particularly care what Prometheus metric library we use, but if we are open-sourcing this tool I think we should structure metrics in a more idiomatic way, aligning with Prometheus instrumentation and metric naming guidelines. Specifically:
- `_total` suffix for counters.

This PR is an example of what it could look like using a standard Prometheus library. It allows users to monitor things like:
Error ratio of renewals:
Error ratio of setec operations:
Domains that should have been renewed more than 15 minutes ago:
Instances with broken AWS credentials:
I dropped the various OCSP error codes since many of them seem very unlikely to happen, and it's now possible to detect any of them via a metric label (`scertecd_ocsp_checks_total{result="error"}`) and then look at the logs to learn more.

Again, I don't feel too strongly about the library used, just that we make these metrics easy to understand and use.
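
As a rough illustration of the OCSP case, the labeled counter can be turned into an error ratio with a query along these lines. This is only a sketch: the 5-minute window is an arbitrary choice, and it assumes that successful checks are counted in the same `scertecd_ocsp_checks_total` metric under a different `result` value.

```promql
# Fraction of OCSP checks that ended in an error over the last 5 minutes.
# Assumes non-error checks share this metric with another `result` label value.
sum(rate(scertecd_ocsp_checks_total{result="error"}[5m]))
/
sum(rate(scertecd_ocsp_checks_total[5m]))
```

An alert on this ratio would only say that OCSP checks are failing; the specific error would still have to come from the logs.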