From 3d38f63d689d1910837c2c4e591289f7784c6f65 Mon Sep 17 00:00:00 2001
From: Anton Rubin
Date: Mon, 13 Oct 2025 16:28:43 +0100
Subject: [PATCH 1/4] adding examples to drop_event processor in data prepper

Signed-off-by: Anton Rubin
---
 .../configuration/processors/drop-events.md   | 287 +++++++++++++++++-
 1 file changed, 283 insertions(+), 4 deletions(-)

diff --git a/_data-prepper/pipelines/configuration/processors/drop-events.md b/_data-prepper/pipelines/configuration/processors/drop-events.md
index 374ff573a56..c49a9e8c115 100644
--- a/_data-prepper/pipelines/configuration/processors/drop-events.md
+++ b/_data-prepper/pipelines/configuration/processors/drop-events.md
@@ -8,14 +8,293 @@
 
 # Drop events processor
 
-
 The `drop_events` processor drops all the events that are passed into it. The following table describes when events are dropped and how exceptions for dropping events are handled.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
 drop_when | Yes | String | Accepts an OpenSearch Data Prepper expression string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received.
-handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so that it is not sent to OpenSearch. Available options are `drop`, `drop_silently`, `skip`, and `skip_silently`. For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
+handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so that it is not sent to OpenSearch. Available options are:<br>- `drop`: The event is dropped, and a warning is logged.<br>- `drop_silently`: The event is dropped, and no warning is logged.<br>- `skip`: The event is not dropped, and a warning is logged.<br>- `skip_silently`: The event is not dropped, and no warning is logged.<br>For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
 
+## Examples
+
+The following are examples of possible pipeline configurations using the `drop_events` processor.
+
+### Filter out debug logs
+
+This example configuration demonstrates filtering out `DEBUG` level logs to reduce noise and storage costs while allowing `INFO`, `WARN`, and `ERROR` events through.
+
+```yaml
+filter-debug-logs-pipeline:
+  source:
+    http:
+      port: 2021
+      path: /events
+      ssl: false
+
+  processor:
+    - drop_events:
+        drop_when: '/level == "DEBUG"'
+        handle_failed_events: drop
+
+  sink:
+    - opensearch:
+        hosts: ["https://opensearch:9200"]
+        insecure: true
+        username: admin
+        password: "admin_pass"
+        index_type: custom
+        index: "filtered-logs-%{yyyy.MM.dd}"
+```
+{% include copy.html %}
+
+You can test this pipeline using the following command:
+
+```bash
+curl -sS -X POST "http://localhost:2021/events" \
+  -H "Content-Type: application/json" \
+  -d '[
+    {"level": "DEBUG", "message": "Database connection established", "service": "user-service", "timestamp": "2023-10-13T14:30:45Z"},
+    {"level": "INFO", "message": "User login successful", "service": "user-service", "timestamp": "2023-10-13T14:31:00Z"},
+    {"level": "ERROR", "message": "Database connection failed", "service": "user-service", "timestamp": "2023-10-13T14:32:00Z"},
+    {"level": "WARN", "message": "Cache miss detected", "service": "user-service", "timestamp": "2023-10-13T14:33:00Z"}
+  ]'
+```
+
+Only three documents get indexed in OpenSearch:
+
+```json
+{
+  ...
+  "hits": {
+    "total": {
+      "value": 3,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "filtered-logs-2025.10.13",
+        "_id": "1zsA3pkBWoWOQ0uhY3EJ",
+        "_score": 1,
+        "_source": {
+          "level": "INFO",
+          "message": "User login successful",
+          "service": "user-service",
+          "timestamp": "2023-10-13T14:31:00Z"
+        }
+      },
+      {
+        "_index": "filtered-logs-2025.10.13",
+        "_id": "2DsA3pkBWoWOQ0uhY3EJ",
+        "_score": 1,
+        "_source": {
+          "level": "ERROR",
+          "message": "Database connection failed",
+          "service": "user-service",
+          "timestamp": "2023-10-13T14:32:00Z"
+        }
+      },
+      {
+        "_index": "filtered-logs-2025.10.13",
+        "_id": "2TsA3pkBWoWOQ0uhY3EJ",
+        "_score": 1,
+        "_source": {
+          "level": "WARN",
+          "message": "Cache miss detected",
+          "service": "user-service",
+          "timestamp": "2023-10-13T14:33:00Z"
+        }
+      }
+    ]
+  }
+}
+```
+
+### Multi-condition event filtering
+
+The following example shows how to drop events based on multiple criteria, such as debug logs, error status codes, and missing user IDs, ensuring only valid and important events reach OpenSearch.
+
+```yaml
+multi-condition-filter-pipeline:
+  source:
+    http:
+      port: 2022
+      path: /events
+      ssl: false
+
+  processor:
+    - drop_events:
+        drop_when: '/level == "DEBUG" or /status_code >= 400 or /user_id == null'
+        handle_failed_events: drop_silently
+
+    - date:
+        from_time_received: true
+        destination: "@timestamp"
+
+  sink:
+    - opensearch:
+        hosts: ["https://opensearch:9200"]
+        insecure: true
+        username: admin
+        password: "admin_pass"
+        index_type: custom
+        index: "filtered-events-%{yyyy.MM.dd}"
+```
+{% include copy.html %}
+
+You can test this pipeline using the following command:
+
+```bash
+curl -sS -X POST "http://localhost:2022/events" \
+  -H "Content-Type: application/json" \
+  -d '[
+    {"level": "DEBUG", "message": "Cache hit", "status_code": 200, "user_id": "user123", "timestamp": "2023-10-13T14:33:00Z"},
+    {"level": "INFO", "message": "Request failed", "status_code": 500, "user_id": "user123", "timestamp": "2023-10-13T14:34:00Z"},
+    {"level": "INFO", "message": "Anonymous request", "status_code": 200, "timestamp": "2023-10-13T14:35:00Z"},
+    {"level": "INFO", "message": "User request processed", "status_code": 200, "user_id": "user123", "timestamp": "2023-10-13T14:36:00Z"},
+    {"level": "INFO", "message": "Another valid request", "status_code": 201, "user_id": "user456", "timestamp": "2023-10-13T14:37:00Z"}
+  ]'
+```
+
+Only two documents reach OpenSearch:
+
+```json
+{
+  ...
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "filtered-events-2025.10.13",
+        "_id": "7DsB3pkBWoWOQ0uhmHH4",
+        "_score": 1,
+        "_source": {
+          "level": "INFO",
+          "message": "User request processed",
+          "status_code": 200,
+          "user_id": "user123",
+          "timestamp": "2023-10-13T14:36:00Z",
+          "@timestamp": "2025-10-13T14:37:47.636Z"
+        }
+      },
+      {
+        "_index": "filtered-events-2025.10.13",
+        "_id": "7TsB3pkBWoWOQ0uhmHH4",
+        "_score": 1,
+        "_source": {
+          "level": "INFO",
+          "message": "Another valid request",
+          "status_code": 201,
+          "user_id": "user456",
+          "timestamp": "2023-10-13T14:37:00Z",
+          "@timestamp": "2025-10-13T14:37:47.636Z"
+        }
+      }
+    ]
+  }
+}
+```
+
+### Intelligent data sampling
+
+The following example demonstrates how to implement sampling strategies that drop high-volume traffic based on request ID patterns and internal IP addresses to manage data volume while preserving representative samples:
+
+```yaml
+sampling-pipeline:
+  source:
+    http:
+      port: 2023
+      path: /events
+      ssl: false
+
+  processor:
+    # Sample based on request_id being in specific sets
+    - drop_events:
+        drop_when: '/sampling_rate > 0.4 and /request_id not in {1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}'
+        handle_failed_events: skip
+
+    - drop_events:
+        drop_when: '/source_ip =~ "^192.168.*"'
+        handle_failed_events: skip
+
+  sink:
+    - opensearch:
+        hosts: ["https://opensearch:9200"]
+        insecure: true
+        username: admin
+        password: "admin_pass"
+        index_type: custom
+        index: "sampled-events-%{yyyy.MM.dd}"
+```
+{% include copy.html %}
+
+You can test this pipeline using the following command:
+
+```bash
+curl -sS -X POST "http://localhost:2023/events" \
+  -H "Content-Type: application/json" \
+  -d '[
+    {"request_id": 12345, "sampling_rate": 0.9, "source_ip": "10.0.0.1", "message": "High volume request - dropped", "timestamp": "2023-10-13T14:37:00Z"},
+    {"request_id": 5000, "sampling_rate": 0.9, "source_ip": "10.0.0.1", "message": "High volume request - sampled", "timestamp": "2023-10-13T14:37:30Z"},
+    {"request_id": 12346, "sampling_rate": 0.6, "source_ip": "192.168.1.100", "message": "Internal request - dropped", "timestamp": "2023-10-13T14:38:00Z"},
+    {"request_id": 12347, "sampling_rate": 0.3, "source_ip": "203.0.113.45", "message": "External request - passed", "timestamp": "2023-10-13T14:39:00Z"},
+    {"request_id": 1000, "sampling_rate": 0.9, "source_ip": "10.0.0.2", "message": "Another sampled request", "timestamp": "2023-10-13T14:40:00Z"}
+  ]'
+```
+
+Three documents are indexed in OpenSearch:
-
-
 
+```json
+{
+  ...
+  "hits": {
+    "total": {
+      "value": 3,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "sampled-events-2025.10.13",
+        "_id": "i2Ej3pkBh3nNS_N9KazV",
+        "_score": 1,
+        "_source": {
+          "request_id": 5000,
+          "sampling_rate": 0.9,
+          "source_ip": "10.0.0.1",
+          "message": "High volume request - sampled",
+          "timestamp": "2023-10-13T14:37:30Z"
+        }
+      },
+      {
+        "_index": "sampled-events-2025.10.13",
+        "_id": "jGEj3pkBh3nNS_N9KazV",
+        "_score": 1,
+        "_source": {
+          "request_id": 12347,
+          "sampling_rate": 0.3,
+          "source_ip": "203.0.113.45",
+          "message": "External request - passed",
+          "timestamp": "2023-10-13T14:39:00Z"
+        }
+      },
+      {
+        "_index": "sampled-events-2025.10.13",
+        "_id": "jWEj3pkBh3nNS_N9KazV",
+        "_score": 1,
+        "_source": {
+          "request_id": 1000,
+          "sampling_rate": 0.9,
+          "source_ip": "10.0.0.2",
+          "message": "Another sampled request",
+          "timestamp": "2023-10-13T14:40:00Z"
+        }
+      }
+    ]
+  }
+}
+```
\ No newline at end of file
From 5210c7f1c3c7a7120262d2cef7d8176d27488753 Mon Sep 17 00:00:00 2001
From: Anton Rubin
Date: Mon, 13 Oct 2025 16:37:06 +0100
Subject: [PATCH 2/4] adding examples to drop_event processor in data prepper

Signed-off-by: Anton Rubin
---
 .../pipelines/configuration/processors/drop-events.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/_data-prepper/pipelines/configuration/processors/drop-events.md b/_data-prepper/pipelines/configuration/processors/drop-events.md
index c49a9e8c115..c0367e88363 100644
--- a/_data-prepper/pipelines/configuration/processors/drop-events.md
+++ b/_data-prepper/pipelines/configuration/processors/drop-events.md
@@ -59,6 +59,7 @@ curl -sS -X POST "http://localhost:2021/events" \
     {"level": "WARN", "message": "Cache miss detected", "service": "user-service", "timestamp": "2023-10-13T14:33:00Z"}
   ]'
 ```
+{% include copy.html %}
 
 Only three documents get indexed in OpenSearch:
 
@@ -155,6 +156,7 @@ curl -sS -X POST "http://localhost:2022/events" \
     {"level": "INFO", "message": "Another valid request", "status_code": 201, "user_id": "user456", "timestamp": "2023-10-13T14:37:00Z"}
   ]'
 ```
+{% include copy.html %}
 
 Only two documents reach OpenSearch:
 
@@ -245,6 +247,7 @@ curl -sS -X POST "http://localhost:2023/events" \
     {"request_id": 1000, "sampling_rate": 0.9, "source_ip": "10.0.0.2", "message": "Another sampled request", "timestamp": "2023-10-13T14:40:00Z"}
   ]'
 ```
+{% include copy.html %}
 
 Three documents are indexed in OpenSearch:
 
From bf6c86c8707193e4723f19b5984d0c10728b34cd Mon Sep 17 00:00:00 2001
From: Anton Rubin
Date: Tue, 14 Oct 2025 11:41:28 +0100
Subject: [PATCH 3/4] addressing PR comments

Signed-off-by: Anton Rubin
---
 .../pipelines/configuration/processors/drop-events.md | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/_data-prepper/pipelines/configuration/processors/drop-events.md b/_data-prepper/pipelines/configuration/processors/drop-events.md
index c0367e88363..d5df46510cf 100644
--- a/_data-prepper/pipelines/configuration/processors/drop-events.md
+++ b/_data-prepper/pipelines/configuration/processors/drop-events.md
@@ -13,7 +13,7 @@ The `drop_events` processor drops all the events that are passed into it. The fo
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
 drop_when | Yes | String | Accepts an OpenSearch Data Prepper expression string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received.
-handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so that it is not sent to OpenSearch. Available options are:<br>- `drop`: The event is dropped, and a warning is logged.<br>- `drop_silently`: The event is dropped, and no warning is logged.<br>- `skip`: The event is not dropped, and a warning is logged.<br>- `skip_silently`: The event is not dropped, and no warning is logged.<br>For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
+handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so that it is not sent to any sinks or further processors. Available options are:<br>- `drop`: The event is dropped, and a warning is logged.<br>- `drop_silently`: The event is dropped, and no warning is logged.<br>- `skip`: The event is not dropped, and a warning is logged.<br>- `skip_silently`: The event is not dropped, and no warning is logged.<br>For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
 
 ## Examples
 
@@ -27,7 +27,6 @@ This example configuration demonstrates filtering out `DEBUG` level logs to redu
 filter-debug-logs-pipeline:
   source:
     http:
-      port: 2021
       path: /events
       ssl: false
 
@@ -119,7 +118,6 @@ The following example shows how to drop events based on multiple criteria, such
 multi-condition-filter-pipeline:
   source:
     http:
-      port: 2022
       path: /events
       ssl: false
 
@@ -146,7 +144,7 @@ multi-condition-filter-pipeline:
 You can test this pipeline using the following command:
 
 ```bash
-curl -sS -X POST "http://localhost:2022/events" \
+curl -sS -X POST "http://localhost:2021/events" \
   -H "Content-Type: application/json" \
   -d '[
     {"level": "DEBUG", "message": "Cache hit", "status_code": 200, "user_id": "user123", "timestamp": "2023-10-13T14:33:00Z"},
@@ -209,7 +207,6 @@ The following example demonstrates how to implement sampling strategies that dro
 sampling-pipeline:
   source:
     http:
-      port: 2023
       path: /events
       ssl: false
 
@@ -237,7 +234,7 @@ sampling-pipeline:
 You can test this pipeline using the following command:
 
 ```bash
-curl -sS -X POST "http://localhost:2023/events" \
+curl -sS -X POST "http://localhost:2021/events" \
   -H "Content-Type: application/json" \
   -d '[
    {"request_id": 12345, "sampling_rate": 0.9, "source_ip": "10.0.0.1", "message": "High volume request - dropped", "timestamp": "2023-10-13T14:37:00Z"},
From a785161b9591cb741e7134e2da8d4e83a99d637f Mon Sep 17 00:00:00 2001
From: AntonEliatra
Date: Wed, 15 Oct 2025 10:35:00 +0100
Subject: [PATCH 4/4] Update drop-events.md

Signed-off-by: AntonEliatra
---
 .../configuration/processors/drop-events.md   | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/_data-prepper/pipelines/configuration/processors/drop-events.md b/_data-prepper/pipelines/configuration/processors/drop-events.md
index d5df46510cf..c1f104457f3 100644
--- a/_data-prepper/pipelines/configuration/processors/drop-events.md
+++ b/_data-prepper/pipelines/configuration/processors/drop-events.md
@@ -19,9 +19,12 @@ handle_failed_events | No | Enum | Specifies how exceptions are handled when an
 
 The following are examples of possible pipeline configurations using the `drop_events` processor.
 
+The examples don't use security and are for demonstration purposes only. We strongly recommend configuring SSL before using these examples in production.
+{: .warning}
+
 ### Filter out debug logs
 
-This example configuration demonstrates filtering out `DEBUG` level logs to reduce noise and storage costs while allowing `INFO`, `WARN`, and `ERROR` events through.
+The following example configuration demonstrates filtering out `DEBUG`-level logs to reduce noise and storage costs while allowing `INFO`, `WARN`, and `ERROR` events through:
 
 ```yaml
 filter-debug-logs-pipeline:
@@ -60,7 +63,7 @@ curl -sS -X POST "http://localhost:2021/events" \
 ```
 {% include copy.html %}
 
-Only three documents get indexed in OpenSearch:
+The documents stored in OpenSearch contain the following information:
 
 ```json
 {
@@ -112,7 +115,7 @@ Only three documents get indexed in OpenSearch:
 
 ### Multi-condition event filtering
 
-The following example shows how to drop events based on multiple criteria, such as debug logs, error status codes, and missing user IDs, ensuring only valid and important events reach OpenSearch.
+The following example shows how to drop events based on multiple criteria, such as debug logs, error status codes, and missing user IDs, ensuring only valid and important events reach OpenSearch:
 
 ```yaml
 multi-condition-filter-pipeline:
@@ -156,7 +159,7 @@ curl -sS -X POST "http://localhost:2021/events" \
 ```
 {% include copy.html %}
 
-Only two documents reach OpenSearch:
+The documents stored in OpenSearch contain the following information:
 
 ```json
 {
@@ -246,7 +249,7 @@ curl -sS -X POST "http://localhost:2021/events" \
 ```
 {% include copy.html %}
 
-Three documents are indexed in OpenSearch:
+The documents stored in OpenSearch contain the following information:
 
 ```json
 {
@@ -297,4 +300,4 @@ Three documents are indexed in OpenSearch:
     ]
   }
 }
-```
\ No newline at end of file
+```
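+
+### Handling expression evaluation failures
+
+The following is a minimal sketch, not a tested configuration, showing how `handle_failed_events: skip` keeps an event and logs a warning when the `drop_when` expression cannot be evaluated (for example, if `/status_code` holds a non-numeric string). The `stdout` sink is used here only to keep the sketch self-contained:
+
+```yaml
+skip-on-failure-pipeline:
+  source:
+    http:
+      path: /events
+      ssl: false
+
+  processor:
+    # If evaluating '/status_code >= 400' throws an exception, `skip`
+    # keeps the event and logs a warning instead of dropping it.
+    - drop_events:
+        drop_when: '/status_code >= 400'
+        handle_failed_events: skip
+
+  sink:
+    - stdout:
+```
+{% include copy.html %}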