From 612facb41f273fc7d5f4afe859bb1ab300b862e3 Mon Sep 17 00:00:00 2001
From: Antonios Kouzoupis
Date: Fri, 18 Nov 2022 16:59:40 +0100
Subject: [PATCH] [HWORKS-284] Documentation for exporting logs

---
 docs/admin/monitoring/export-logs.md | 139 +++++++++++++++++++++++++++
 mkdocs.yml                           |   1 +
 2 files changed, 140 insertions(+)
 create mode 100644 docs/admin/monitoring/export-logs.md

diff --git a/docs/admin/monitoring/export-logs.md b/docs/admin/monitoring/export-logs.md
new file mode 100644
index 000000000..099691475
--- /dev/null
+++ b/docs/admin/monitoring/export-logs.md
@@ -0,0 +1,139 @@
# Exporting Hopsworks logs

## Introduction
Hopsworks collects service and application logs with [Logstash](https://www.elastic.co/logstash/), which then forwards them to OpenSearch for indexing.
Organizations often already have a logging system in place, so streaming Hopsworks logs to an external system may be necessary.

## Prerequisites
To configure Logstash to stream logs outside of Hopsworks you will need SSH access to the cluster (the Logstash node). Depending on the target system, you might
also need authentication tokens or new firewall rules.

## Export logs
Logstash is a well-established log collection service with many output [plugins](https://www.elastic.co/guide/en/logstash/7.17/output-plugins.html) available.

Documenting individual plugins is beyond the scope of this tutorial. This guide gives general instructions and also covers the basic but powerful `http` plugin.

Logstash processes logs in *pipelines*, where each pipeline is responsible for a logical group of logs. Hopsworks uses multiple pipelines, and their configuration files are located under `/srv/hops/logstash/config`.

### Export services logs
To stream the various services' logs outside of Hopsworks you need to **create another pipeline** similar to `services`.

#### Step 1
Copy `/srv/hops/logstash/config/services.conf` to `/srv/hops/logstash/config/services_http.conf`.

Change the pipeline *input address* to:
```treetop
input {
  pipeline {
    address => services_http
  }
}
```

!!! note
    Take note of the pipeline address as we will use it in Step 2.

At the end of the file is the `output` section, which currently forwards the logs to OpenSearch. Replace it with a block such as:

```treetop
output {
  http {
    format => "json_batch"
    headers => ["x-api-key", "API_KEY"]
    http_compression => false
    http_method => "post"
    url => "https://localhost/logs"
    follow_redirects => false
  }
}
```

#### Step 2
The next step is to configure Logstash to use the new pipeline.

Open `/srv/hops/logstash/config/pipelines.yml` and add the new pipeline to the pipeline definitions:
```yaml
- pipeline.id: services_http
  path.config: "/srv/hops/logstash/config/services_http.conf"
  pipeline.batch.delay: 2000
  pipeline.batch.size: 50
```

**Instruct** the `services-intake` pipeline to also push logs to the newly created pipeline by adding its address to the `send_to` list, for example:

```yaml
- pipeline.id: services-intake
  config.string: |
    input { beats { port => 5053 } }
    output { pipeline { send_to => ["services","services_http"] } }
```

#### Step 3
The final step is to restart Logstash with `sudo systemctl restart logstash`.

Logstash logs can be found in `/srv/hops/logstash/log/logstash-plain.log`.


### Export Spark logs
To stream applications' logs to another system, the steps are fairly similar to exporting services logs but require some additional configuration; a summary sketch of the resulting pipeline layout follows below.
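Because this setup involves three cooperating pipelines, the sketch below summarizes, for orientation only, what the Spark-related part of `pipelines.yml` looks like once the steps below are done. The `spark-intake` and `spark_http` entries come from Step 3; the id of the pre-existing pipeline that loads `spark-streaming.conf` is an assumption here and may be named differently in your installation. This is not an extra file to create, and what ultimately matters is that the addresses listed in `send_to` match the pipeline input addresses.

```yaml
# Orientation only: approximate Spark section of /srv/hops/logstash/config/pipelines.yml
# after completing Steps 1-3 below. The existing pipeline id is an assumption.
- pipeline.id: spark-intake            # receives application logs from Beats on port 5044
  config.string: |
    input { beats { port => 5044 } }
    output { pipeline { send_to => ["spark", "spark_http"] } }   # fan out to both addresses
- pipeline.id: spark-streaming         # assumed id of the pre-existing pipeline
  path.config: "/srv/hops/logstash/config/spark-streaming.conf"  # input address "spark" after Step 2; still indexes into OpenSearch
- pipeline.id: spark_http              # new pipeline created in Step 1
  path.config: "/srv/hops/logstash/config/spark-streaming_http.conf"  # ships logs to the http output
```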
#### Step 1
Copy `/srv/hops/logstash/config/spark-streaming.conf` to `/srv/hops/logstash/config/spark-streaming_http.conf`.

Change the **input** section to:

```treetop
input {
  pipeline {
    address => spark_http
  }
}
```

Also, add an **output** block such as:

```treetop
output {
  http {
    format => "json_batch"
    headers => ["x-api-key", "API_KEY"]
    http_compression => false
    http_method => "post"
    url => "https://localhost/logs"
    follow_redirects => false
  }
}
```

#### Step 2
Edit `/srv/hops/logstash/config/spark-streaming.conf` and **change** the input to:

```treetop
input {
  pipeline {
    address => spark
  }
}
```

#### Step 3
Now edit `/srv/hops/logstash/config/pipelines.yml` and **add** the following pipeline definitions:

```yaml
- pipeline.id: spark-intake
  config.string: |
    input { beats { port => 5044 } }
    output { pipeline { send_to => ["spark", "spark_http"] } }
- pipeline.id: spark_http
  path.config: "/srv/hops/logstash/config/spark-streaming_http.conf"
```

#### Step 4
Finally, restart Logstash with `sudo systemctl restart logstash`.

## Conclusion
It is not easy to write a guide for a task that can be accomplished in many different ways, but this guide provides solid
instructions for streaming Logstash pipelines to external systems. We did not go into detail on any specific output
method, since there are many plugins and the official documentation covers them well; instead, we focused on the Hopsworks-related configuration.
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 7e93fe2d3..1432d88a3 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -216,6 +216,7 @@ nav:
     - Monitoring:
       - Services Dashboards: admin/monitoring/grafana.md
       - Services Logs: admin/monitoring/services-logs.md
+      - Export Logs: admin/monitoring/export-logs.md
     - Authentication:
       - Configure Authentication: admin/auth.md
       - Configure OAuth2:
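
After restarting Logstash in either procedure above, a quick way to verify that the newly added pipelines actually started is to check the Logstash log file mentioned in the guide. This is only a suggested check; the exact wording of the startup message can vary between Logstash versions.

```bash
# Check that the new pipelines came up after the restart (log path taken from the guide).
# The "Pipeline started" message wording may differ slightly between Logstash versions.
sudo grep "Pipeline started" /srv/hops/logstash/log/logstash-plain.log | tail -n 5
# Expect entries referencing the new pipeline ids, e.g. services_http or spark_http.
```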