
Commit bc28bf9

vagimeli, hdhalter, natebower, and dlvenable authored
Edit for redundant information and sections across Data Prepper (opensearch-project#7127)
* Edit for redundant information and sections across Data Prepper (Signed-off-by: Melissa Vagi <[email protected]>)
* Edit for redundant information and sections across Data Prepper (Signed-off-by: Melissa Vagi <[email protected]>)
* Rewrite expression syntax and reorganize doc structure for readability (Signed-off-by: Melissa Vagi <[email protected]>)
* Rewrite expression syntax and reorganize doc structure for readability (Signed-off-by: Melissa Vagi <[email protected]>)
* Rewrite expression syntax and reorganize doc structure for readability (Signed-off-by: Melissa Vagi <[email protected]>)
* Rewrite expression syntax and reorganize doc structure for readability (Signed-off-by: Melissa Vagi <[email protected]>)
* Rewrite expression syntax and reorganize doc structure for readability (Signed-off-by: Melissa Vagi <[email protected]>)
* Update _data-prepper/index.md (Signed-off-by: Melissa Vagi <[email protected]>)
* Update configuring-data-prepper.md (Signed-off-by: Melissa Vagi <[email protected]>; Signed-off-by: Melissa Vagi <[email protected]>)
* Update _data-prepper/pipelines/expression-syntax.md (Signed-off-by: Melissa Vagi <[email protected]>)
* Update _data-prepper/pipelines/expression-syntax.md (Signed-off-by: Melissa Vagi <[email protected]>)
* Update _data-prepper/pipelines/pipelines.md (Signed-off-by: Melissa Vagi <[email protected]>)
* Update expression-syntax.md (Signed-off-by: Melissa Vagi <[email protected]>)
* Create Functions subpages (Signed-off-by: Melissa Vagi <[email protected]>)
* Create functions subpages (Signed-off-by: Melissa Vagi <[email protected]>)
* Copy edit (Signed-off-by: Melissa Vagi <[email protected]>)
* add remaining subpages (Signed-off-by: Melissa Vagi <[email protected]>)
* Update _data-prepper/index.md (Co-authored-by: Nathan Bower <[email protected]>; Signed-off-by: Heather Halter <[email protected]>)
* Apply suggestions from code review: accepted editorial suggestions (Co-authored-by: Nathan Bower <[email protected]>; Signed-off-by: Heather Halter <[email protected]>)
* Apply suggestions from code review: accepted more editorial suggestions that were hidden (Co-authored-by: Nathan Bower <[email protected]>; Signed-off-by: Heather Halter <[email protected]>)
* Apply suggestions from code review (Co-authored-by: Heather Halter <[email protected]>; Signed-off-by: David Venable <[email protected]>)
* removed-line (Signed-off-by: Heather Halter <[email protected]>)
* Fixed broken link to pipelines (Signed-off-by: Heather Halter <[email protected]>)
* Fixed broken links on Update add-entries.md (Signed-off-by: Heather Halter <[email protected]>)
* Fixed broken link in Update dynamo-db.md (Signed-off-by: Heather Halter <[email protected]>)
* Fixed link syntax in Update index.md (Signed-off-by: Heather Halter <[email protected]>)

---------

Signed-off-by: Melissa Vagi <[email protected]>
Signed-off-by: Heather Halter <[email protected]>
Signed-off-by: David Venable <[email protected]>
Signed-off-by: Heather Halter <[email protected]>
Co-authored-by: Heather Halter <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Co-authored-by: David Venable <[email protected]>
1 parent 73ab08c commit bc28bf9

19 files changed, +364 −543 lines changed

_data-prepper/index.md

+19-36
@@ -18,42 +18,24 @@ Data Prepper is a server-side data collector capable of filtering, enriching, tr
 
 With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for Data Prepper are trace analytics and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) can help you visualize event flows and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/) equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior.
 
-## Concepts
+## Key concepts and fundamentals
 
-Data Prepper includes one or more **pipelines** that collect and filter data based on the components set within the pipeline. Each component is pluggable, enabling you to use your own custom implementation of each component. These components include the following:
+Data Prepper ingests data through customizable [pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). These pipelines consist of pluggable components that you can customize to fit your needs, even allowing you to plug in your own implementations. A Data Prepper pipeline consists of the following components:
 
-- One [source](#source)
-- One or more [sinks](#sink)
-- (Optional) One [buffer](#buffer)
-- (Optional) One or more [processors](#processor)
+- One [source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/sources/)
+- One or more [sinks]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/sinks/)
+- (Optional) One [buffer]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/)
+- (Optional) One or more [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/)
 
-A single instance of Data Prepper can have one or more pipelines.
+Each pipeline contains two required components: `source` and `sink`. If a `buffer`, a `processor`, or both are missing from the pipeline, then Data Prepper uses the default `bounded_blocking` buffer and a no-op processor. Note that a single instance of Data Prepper can have one or more pipelines.
 
-Each pipeline definition contains two required components: **source** and **sink**. If buffers and processors are missing from the Data Prepper pipeline, Data Prepper uses the default buffer and a no-op processor.
+## Basic pipeline configurations
 
-### Source
+To understand how the pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format. For more information and examples, see [Pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
 
-Source is the input component that defines the mechanism through which a Data Prepper pipeline will consume events. A pipeline can have only one source. The source can consume events either by receiving the events over HTTP or HTTPS or by reading from external endpoints like OTeL Collector for traces and metrics and Amazon Simple Storage Service (Amazon S3). Sources have their own configuration options based on the format of the events (such as string, JSON, Amazon CloudWatch logs, or open telemetry trace). The source component consumes events and writes them to the buffer component.
+### Minimal configuration
 
-### Buffer
-
-The buffer component acts as the layer between the source and the sink. Buffer can be either in-memory or disk based. The default buffer uses an in-memory queue called `bounded_blocking` that is bounded by the number of events. If the buffer component is not explicitly mentioned in the pipeline configuration, Data Prepper uses the default `bounded_blocking`.
-
-### Sink
-
-Sink is the output component that defines the destination(s) to which a Data Prepper pipeline publishes events. A sink destination could be a service, such as OpenSearch or Amazon S3, or another Data Prepper pipeline. When using another Data Prepper pipeline as the sink, you can chain multiple pipelines together based on the needs of the data. Sink contains its own configuration options based on the destination type.
-
-### Processor
-
-Processors are units within the Data Prepper pipeline that can filter, transform, and enrich events using your desired format before publishing the record to the sink component. The processor is not defined in the pipeline configuration; the events publish in the format defined in the source component. You can have more than one processor within a pipeline. When using multiple processors, the processors are run in the order they are defined inside the pipeline specification.
-
-## Sample pipeline configurations
-
-To understand how all pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format.
-
-### Minimal component
-
-This pipeline configuration reads from the file source and writes to another file in the same path. It uses the default options for the buffer and processor.
+The following minimal pipeline configuration reads from the file source and writes the data to another file on the same path. It uses the default options for the `buffer` and `processor` components.
 
 ```yml
 sample-pipeline:
@@ -65,13 +47,13 @@ sample-pipeline:
         path: <path/to/output-file>
 ```
 
-### All components
+### Comprehensive configuration
 
-The following pipeline uses a source that reads string events from the `input-file`. The source then pushes the data to the buffer, bounded by a max size of `1024`. The pipeline is configured to have `4` workers, each of them reading a maximum of `256` events from the buffer for every `100 milliseconds`. Each worker runs the `string_converter` processor and writes the output of the processor to the `output-file`.
+The following comprehensive pipeline configuration uses both required and optional components:
 
 ```yml
 sample-pipeline:
-  workers: 4 #Number of workers
+  workers: 4 # Number of workers
   delay: 100 # in milliseconds, how often the workers should run
   source:
     file:
@@ -88,9 +70,10 @@ sample-pipeline:
         path: <path/to/output-file>
 ```
 
-## Next steps
-
-To get started building your own custom pipelines with Data Prepper, see [Getting started]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/).
+In the given pipeline configuration, the `source` component reads string events from the `input-file` and pushes the data to a bounded buffer with a maximum size of `1024`. The `workers` setting specifies `4` concurrent threads that process events from the buffer, each reading a maximum of `256` events from the buffer every `100` milliseconds. Each worker runs the `string_converter` processor, which converts the strings to uppercase, and writes the processed output to the `output-file`.
 
-<!---Delete this comment.--->
+## Next steps
 
+- [Get started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/).
+- [Get familiar with Data Prepper pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
+- [Explore common use cases]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/common-use-cases/).
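
For reference, a complete pipeline of the shape described above might look like the following sketch. The `buffer_size`, `batch_size`, and `upper_case` option names shown for the `bounded_blocking` buffer and the `string_converter` processor are assumptions here, so confirm them against each component's reference page.

```yml
# Sketch of a full pipeline; buffer and processor option names are assumptions.
sample-pipeline:
  workers: 4 # Number of worker threads processing events
  delay: 100 # How often the workers run, in milliseconds
  source:
    file:
      path: <path/to/input-file>
      format: json
  buffer:
    bounded_blocking:
      buffer_size: 1024 # Maximum number of events the buffer can hold
      batch_size: 256 # Maximum number of events each worker reads at a time
  processor:
    - string_converter:
        upper_case: true # Convert string events to uppercase
  sink:
    - file:
        path: <path/to/output-file>
```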

_data-prepper/managing-data-prepper/configuring-data-prepper.md

+1-2
@@ -103,8 +103,7 @@ check_interval | No | Duration | Specifies the time between checks of the heap s
 
 ### Extension plugins
 
-Since Data Prepper 2.5, Data Prepper provides support for user configurable extension plugins. Extension plugins are shared common
-configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#concepts).
+Data Prepper provides support for user-configurable extension plugins. Extension plugins are common configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#key-concepts-and-fundamentals).
 
 ### AWS extension plugins
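
As a rough illustration of where extension plugin settings live, an `extensions` block in `data-prepper-config.yaml` might look like the following sketch. The `aws` secrets extension and its option names (`secret_id`, `region`, `sts_role_arn`) are assumptions used for illustration only; consult the extension's documentation for its supported settings.

```yml
# data-prepper-config.yaml (sketch; extension and option names are assumptions)
extensions:
  aws:
    secrets:
      pipeline-credentials: # Arbitrary secret configuration ID
        secret_id: my-pipeline-secret
        region: us-east-1
        sts_role_arn: arn:aws:iam::123456789012:role/data-prepper-secrets
```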

+24
@@ -0,0 +1,24 @@
+---
+layout: default
+title: cidrContains()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 5
+---
+
+# cidrContains()
+
+The `cidrContains()` function is used to check if an IP address is contained within a specified Classless Inter-Domain Routing (CIDR) block or range of CIDR blocks. It accepts two or more arguments:
+
+- The first argument is a JSON pointer, which represents the key or path to the field containing the IP address to be checked. It supports both IPv4 and IPv6 address formats.
+
+- The subsequent arguments are strings representing one or more CIDR blocks or IP address ranges. The function checks if the IP address specified in the first argument matches or is contained within any of these CIDR blocks.
+
+For example, if your data contains an IP address field named `client.ip` and you want to check if it belongs to the CIDR blocks `192.168.0.0/16` or `10.0.0.0/8`, you can use the `cidrContains()` function as follows:
+
+```
+cidrContains('/client.ip', '192.168.0.0/16', '10.0.0.0/8')
+```
+{% include copy-curl.html %}
+
+This function returns `true` if the IP address matches any of the specified CIDR blocks or `false` if it does not.
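
Expressions like this are typically used wherever a conditional is accepted, for example, in conditional routing. The following sketch routes only internal traffic to a dedicated sink; the pipeline name, route name, and sink settings are hypothetical.

```yml
# Hypothetical pipeline illustrating conditional routing with cidrContains()
routing-pipeline:
  source:
    http:
  route:
    - internal-traffic: "cidrContains('/client.ip', '192.168.0.0/16', '10.0.0.0/8')"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: internal-events
        routes:
          - internal-traffic
```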

_data-prepper/pipelines/configuration/buffers/buffers.md

+6-2
@@ -3,9 +3,13 @@ layout: default
 title: Buffers
 parent: Pipelines
 has_children: true
-nav_order: 20
+nav_order: 30
 ---
 
 # Buffers
 
-Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory based, which provides better performance, or disk based, which is larger in size.
+The `buffer` component acts as an intermediary layer between the `source` and `sink` components in a Data Prepper pipeline. It serves as temporary storage for events, decoupling the `source` from the downstream processors and sinks. Buffers can be either in-memory or disk based.
+
+If not explicitly specified in the pipeline configuration, Data Prepper uses the default `bounded_blocking` buffer, which is an in-memory queue bounded by the number of events it can store. The `bounded_blocking` buffer is a convenient option when the event volume and processing rates are manageable within the available memory constraints.
+
+
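
If the defaults do not fit your workload, you can declare the buffer explicitly. The following sketch overrides the default settings; the `buffer_size` and `batch_size` option names are assumptions, so check the `bounded_blocking` reference page for the exact settings.

```yml
# Sketch; buffer option names are assumptions
sample-pipeline:
  source:
    http:
  buffer:
    bounded_blocking:
      buffer_size: 1024 # Maximum number of events held in memory
      batch_size: 256 # Maximum number of events drained per read
  sink:
    - stdout:
```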

_data-prepper/pipelines/configuration/processors/add-entries.md

+3-3
@@ -21,8 +21,8 @@ You can configure the `add_entries` processor with the following options.
 | `metadata_key` | No | The key for the new metadata attribute. The argument must be a literal string key and not a JSON Pointer. Either one string key or `metadata_key` is required. |
 | `value` | No | The value of the new entry to be added, which can be used with any of the following data types: strings, Booleans, numbers, null, nested objects, and arrays. |
 | `format` | No | A format string to use as the value of the new entry, for example, `${key1}-${key2}`, where `key1` and `key2` are existing keys in the event. Required if neither `value` nor `value_expression` is specified. |
-| `value_expression` | No | An expression string to use as the value of the new entry. For example, `/key` is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions returning number/string/integer. For example, `length(/key)` will return the length of the key in the event when the key is a string. For more information about keys, see [Expression syntax](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/). |
-| `add_when` | No | A [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"'`, that will be evaluated to determine whether the processor will be run on the event. |
+| `value_expression` | No | An expression string to use as the value of the new entry. For example, `/key` is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions that return a number, a string, or an integer. For example, `length(/key)` will return the length of the key in the event when the key is a string. For more information about keys, see [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). |
+| `add_when` | No | A [conditional expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"`, that will be evaluated to determine whether the processor will be run on the event. |
 | `overwrite_if_key_exists` | No | When set to `true`, the existing value is overwritten if `key` already exists in the event. The default value is `false`. |
 | `append_if_key_exists` | No | When set to `true`, the existing value will be appended if a `key` already exists in the event. An array will be created if the existing value is not an array. Default is `false`. |
 
@@ -135,7 +135,7 @@ When the input event contains the following data:
 {"message": "hello"}
 ```
 
-The processed event will have the same data, with the metadata, `{"length": 5}`, attached. You can subsequently use expressions like `getMetadata("length")` in the pipeline. For more information, see the [`getMetadata` function](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/#getmetadata) documentation.
+The processed event will have the same data, with the metadata, `{"length": 5}`, attached. You can subsequently use expressions like `getMetadata("length")` in the pipeline. For more information, see the [`getMetadata` function]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/).
 
 
 ### Example: Add a dynamic key
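
Putting these options together, a hypothetical `add_entries` configuration that computes a new field from an expression and guards it with a condition might look like the following sketch; the field names are illustrative.

```yml
# Hypothetical example; field names are illustrative
processor:
  - add_entries:
      entries:
        - key: "message_length"
          value_expression: "length(/message)"
          add_when: "/message != null"
          overwrite_if_key_exists: true
```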

_data-prepper/pipelines/configuration/processors/processors.md

+6-4
@@ -3,12 +3,14 @@ layout: default
 title: Processors
 has_children: true
 parent: Pipelines
-nav_order: 25
+nav_order: 35
 ---
 
 # Processors
 
-Processors perform an action on your data, such as filtering, transforming, or enriching.
+Processors are components within a Data Prepper pipeline that enable you to filter, transform, and enrich events using your desired format before publishing records to the `sink` component. If no `processor` is defined in the pipeline configuration, then the events are published in the format specified by the `source` component. You can incorporate multiple processors within a single pipeline, and they are executed sequentially as defined in the pipeline.
+
+Prior to Data Prepper 1.3, these components were named *preppers*. In Data Prepper 1.3, the term *prepper* was deprecated in favor of *processor*. In Data Prepper 2.0, the term *prepper* was removed.
+{: .note }
+
 
-Prior to Data Prepper 1.3, processors were named preppers. Starting in Data Prepper 1.3, the term *prepper* is deprecated in favor of the term *processor*. Data Prepper will continue to support the term *prepper* until 2.0, where it will be removed.
-{: .note }
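
As an illustration of processors running sequentially, the following hypothetical snippet first parses an Apache log line with the `grok` processor and then uppercases the parsed strings with `string_converter`; the pattern and option names are assumptions.

```yml
# Hypothetical example; pattern and option names are assumptions
processor:
  - grok:
      match:
        message: ['%{COMMONAPACHELOG}'] # Parse the raw log line into structured fields
  - string_converter:
      upper_case: true # Runs after grok, in the order defined here
```
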
_data-prepper/pipelines/configuration/sinks/sinks.md

+9-7
@@ -3,20 +3,22 @@ layout: default
 title: Sinks
 parent: Pipelines
 has_children: true
-nav_order: 30
+nav_order: 25
 ---
 
 # Sinks
 
-Sinks define where Data Prepper writes your data to.
+A `sink` is an output component that specifies the destination(s) to which a Data Prepper pipeline publishes events. Sink destinations can be services like OpenSearch, Amazon Simple Storage Service (Amazon S3), or even another Data Prepper pipeline, enabling chaining of multiple pipelines. The sink component has the following configurable options that you can use to customize the destination type.
 
-## General options for all sink types
+## Configuration options
 
 The following table describes options you can use to configure the `sinks` sink.
 
 Option | Required | Type | Description
 :--- | :--- |:------------| :---
-routes | No | String list | A list of routes for which this sink applies. If not provided, this sink receives all events. See [conditional routing]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#conditional-routing) for more information.
-tags_target_key | No | String | When specified, includes event tags in the output of the provided key.
-include_keys | No | String list | When specified, provides the keys in this list in the data sent to the sink. Some codecs and sinks do not allow use of this field.
-exclude_keys | No | String list | When specified, excludes the keys given from the data sent to the sink. Some codecs and sinks do not allow use of this field.
+`routes` | No | String list | A list of routes to which the sink applies. If not provided, then the sink receives all events. See [conditional routing]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#conditional-routing) for more information.
+`tags_target_key` | No | String | When specified, includes event tags in the output under the provided key.
+`include_keys` | No | String list | When specified, provides only the listed keys in the data sent to the sink. Some codecs and sinks may not support this field.
+`exclude_keys` | No | String list | When specified, excludes the listed keys from the data sent to the sink. Some codecs and sinks may not support this field.
+
+
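
To illustrate the options in the table, the following hypothetical sink sends only selected keys to OpenSearch and applies a named route; the host, index, route name, and key names are placeholders.

```yml
# Hypothetical sink; host, index, routes, and keys are placeholders
sink:
  - opensearch:
      hosts: ["https://opensearch:9200"]
      index: application-logs
      routes:
        - error-logs # Only events matching this route reach the sink
      include_keys: ["timestamp", "level", "message"]
      tags_target_key: "tags" # Event tags are written under this key
```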

_data-prepper/pipelines/configuration/sources/dynamo-db.md

+1-1
@@ -92,7 +92,7 @@ Option | Required | Type | Description
 
 ## Exposed metadata attributes
 
-The following metadata will be added to each event that is processed by the `dynamodb` source. These metadata attributes can be accessed using the [expression syntax `getMetadata` function](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/#getmetadata).
+The following metadata will be added to each event that is processed by the `dynamodb` source. These metadata attributes can be accessed using the [expression syntax `getMetadata` function]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/).
 
 * `primary_key`: The primary key of the DynamoDB item. For tables that only contain a partition key, this value provides the partition key. For tables that contain both a partition and sort key, the `primary_key` attribute will be equal to the partition and sort key, separated by a `|`, for example, `partition_key|sort_key`.
 * `partition_key`: The partition key of the DynamoDB item.
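
For example, a downstream processor could copy one of these attributes into the event itself. The following hypothetical snippet uses the `add_entries` processor with the `getMetadata()` function; the target key name is illustrative.

```yml
# Hypothetical example; the target key name is illustrative
processor:
  - add_entries:
      entries:
        - key: "dynamodb_primary_key"
          value_expression: 'getMetadata("primary_key")'
```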

_data-prepper/pipelines/configuration/sources/sources.md

+4-2
@@ -3,9 +3,11 @@ layout: default
 title: Sources
 parent: Pipelines
 has_children: true
-nav_order: 15
+nav_order: 20
 ---
 
 # Sources
 
-Sources define where your data comes from within a Data Prepper pipeline.
+A `source` is an input component that specifies how a Data Prepper pipeline ingests events. Each pipeline has a single source that either receives events over HTTP(S) or reads from external endpoints, such as OpenTelemetry Collector or Amazon Simple Storage Service (Amazon S3). Sources have configurable options based on the event format (string, JSON, Amazon CloudWatch logs, OpenTelemetry traces). The source consumes events and passes them to the [`buffer`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/) component.
+
+
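
As a minimal illustration, the following hypothetical snippet configures an HTTP source that listens for incoming log events and passes them downstream; the port value is an assumption.

```yml
# Hypothetical pipeline; the port value is an assumption
log-pipeline:
  source:
    http:
      port: 2021 # Port on which the source listens for events
  sink:
    - stdout:
```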

_data-prepper/pipelines/contains.md

+36
@@ -0,0 +1,36 @@
+---
+layout: default
+title: contains()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 10
+---
+
+# contains()
+
+The `contains()` function is used to check if a substring exists within a given string or the value of a field in an event. It takes two arguments:
+
+- The first argument is either a literal string or a JSON pointer that represents the field or value to be searched.
+
+- The second argument is the substring to be searched for within the first argument.
+The function returns `true` if the substring specified in the second argument is found within the string or field value represented by the first argument. It returns `false` if it is not.
+
+For example, if you want to check if the string `"abcd"` is contained within the value of a field named `message`, you can use the `contains()` function as follows:
+
+```
+contains('/message', 'abcd')
+```
+{% include copy-curl.html %}
+
+This will return `true` if the field `message` contains the substring `abcd` or `false` if it does not.
+
+Alternatively, you can also use a literal string as the first argument:
+
+```
+contains('This is a test message', 'test')
+```
+{% include copy-curl.html %}
+
+In this case, the function will return `true` because the substring `test` is present within the string `This is a test message`.
+
+Note that the `contains()` function performs a case-sensitive search by default. If you need to perform a case-insensitive search, you can use the `containsIgnoreCase()` function instead.
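
To show the function in context, the following hypothetical snippet drops events whose `message` field contains the substring `DEBUG`; the `drop_events` processor and its `drop_when` option are used here as an assumed example of where a conditional expression can be evaluated.

```yml
# Hypothetical example using drop_events
processor:
  - drop_events:
      drop_when: "contains('/message', 'DEBUG')"
```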
