Add more alerting stuff

Starefossen · Starefossen · commit 598b4d8be453 · 2024-02-06T20:43:53.000+01:00
diff --git a/docs/explanation/observability/alerting.md b/docs/explanation/observability/alerting.md
@@ -1,12 +1,20 @@
+---
+description: >-
+  Alerting is a crucial part of observability, and it's the first step in knowing when something is wrong with your application.
+---
 # Alerting
 
-<iframe width="560" height="315" src="https://www.youtube.com/embed/CGldVD5wR-g?si=luayvJTiZBsWK24u" title="YouTube video player" frameborder="0" allowfullscreen></iframe> -->
+<iframe width="560" height="315" src="https://www.youtube.com/embed/CGldVD5wR-g?si=luayvJTiZBsWK24u" title="Video about Actionable Alerting" frameborder="0" allowfullscreen></iframe>
 
 You can't fix what you can't see. Alerting is a crucial part of observability, and it's the first step in knowing when something is wrong with your application.
 
 However, alerting is only as good as the data you have available and the conditions you set. It's important to have a good understanding of what you want to monitor and how you want to be notified. We call this the _alerting strategy_.
 
-While many metrics can be useful for monitoring, not all of them are useful for alerting. When setting up alerts, it's important to choose metrics that are relevant to the user experience and that can be used to detect problems early.
+While many metrics can be useful to gain insights into different aspects of a system, not all of them are useful for alerting. When setting up alerts, it's important to choose metrics that are relevant to the user experience and that can be used to detect problems early.
+
+## Reliability
+
+TBD
 
 ## Critical user journeys
 
@@ -43,7 +51,7 @@ Continuing with the case management system example, let's say you want to monito
 
 ## Alerting objectives
 
-
+Service Level Objectives (SLOs) are the target values or ranges of values for a service level the team aims for. They are defined based on indicators, which are the quantitative measures of some aspect of the level of service, a target value or range of values for the indicator, and a time period over which the indicator is measured.
 
 ## Alerting conditions
 
@@ -59,7 +67,7 @@ Consider the following attributes when setting up alerts:
 
 * _Reset Time_. The amount of time it takes for the alerting system to resolve an alert after the problem has been fixed. Short reset times are desirable as they reduce the amount of time spent dealing with alerts for problems that have already been resolved.
 
+## Reference
 
-https://cloud.google.com/blog/products/management-tools/practical-guide-to-setting-slos
-
-https://cloud.google.com/blog/products/management-tools/good-relevance-and-outcomes-for-alerting-and-monitoring
+* https://cloud.google.com/blog/products/management-tools/practical-guide-to-setting-slos
+* https://cloud.google.com/blog/products/management-tools/good-relevance-and-outcomes-for-alerting-and-monitoring
diff --git a/docs/explanation/observability/logging.md b/docs/explanation/observability/logging.md
@@ -1,6 +1,28 @@
-# Logs (TODO: nav only)
+---
+description: |>
+
+---
+# Logging
+
+## Purpose of logs
+
+Logs are a way to understand what is happening in your application. They are usually text-based and are often used for debugging. Since the format of logs is usually not standardized, it can be difficult to query and aggregate logs and thus we recommend using metrics for dashboards and alerting.
+
+There are many types of logs, and they can be used for different purposes. Some logs are used for debugging, some are used for auditing, and some are used for security. Our primary use case for logs is to understand the flow of a request through a system.
+
+Application logs in nais is first and foremost a tool for developers to debug their applications. It is not intended to be used for auditing or security purposes. We do not condone writing sensitive information to application logs.
+
+## Good practice
+
+- [x] **Establish a clear logging strategy** for your application. What do you want to log? What do you not want to log? What is the purpose of your logs?
+- [x] **Use log levels** to different- [x] **Use log levels** to differentiate between different types of logs. We recommend using the following log levels: `INFO`, `WARN`, `ERROR`, and `FATAL`.
+- [x] **Use structured logging**. This means that your logs must be written in a JSON format. This makes it easier to query and aggregate logs.
+- [x] **Write meaningful log messages** and attach relevant metadata to your logs. This makes it easier to understand what is happening in your application.
+- [ ] **Do not log sensitive information**. This includes personal information, passwords, and secrets. If you need to log sensitive information, use [secure logs](#secure-logs) or [audit logs](#audit-logs).
+- [ ] **Do not underestimate the cost and performance** of logging. Logging is a trade-off between observability, performance, and cost. Logging can be computational and financial expensive, so make sure you log only what you actually need.
+- [ ] **Do not use rely on logs for monitoring**. Use metrics for monitoring, visualization, and alerting as your first line of defense and use logs for debugging when something goes wrong.
+
 
-## Logging
 
 Configure your application to log to console \(stdout/stderr\), it will be scraped by [FluentD](https://www.fluentd.org/) running inside the cluster and sent to [Elasticsearch](https://www.elastic.co/products/elasticsearch) and made available via [Kibana](https://www.elastic.co/products/kibana). Visit our Kibana at [logs.adeo.no](https://logs.adeo.no/).