_data-prepper/common-use-cases/log-analytics.md (+6 -3)
@@ -67,7 +67,7 @@ log-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
@@ -78,6 +78,7 @@ log-pipeline:
 # You should change this to correspond with how your OpenSearch indexes are set up.
 index: apache_logs
 ```
+{% include copy.html %}

 This pipeline configuration is an example of Apache log ingestion. Don't forget that you can easily configure the Grok Processor for your own custom logs. You will need to modify the configuration for your OpenSearch cluster.

@@ -100,7 +101,7 @@ Note that you should adjust the file `path`, output `Host`, and `Port` according

 The following is an example `fluent-bit.conf` file without SSL and basic authentication enabled on the HTTP source:

-```
+```text
 [INPUT]
 name tail
 refresh_interval 5
@@ -115,14 +116,15 @@ The following is an example `fluent-bit.conf` file without SSL and basic authentication enabled on the HTTP source:
 URI /log/ingest
 Format json
 ```
+{% include copy.html %}

 If your HTTP source has SSL and basic authentication enabled, you will need to add the details of `http_User`, `http_Passwd`, `tls.crt_file`, and `tls.key_file` to the `fluent-bit.conf` file, as shown in the following example.

 ### Example: Fluent Bit file with SSL and basic authentication enabled

 The following is an example `fluent-bit.conf` file with SSL and basic authentication enabled on the HTTP source:

-```
+```text
 [INPUT]
 name tail
 refresh_interval 5
@@ -142,6 +144,7 @@ The following is an example `fluent-bit.conf` file with SSL and basic authentication enabled on the HTTP source:
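
One of the context lines in this file notes that the Grok processor can easily be adapted for custom log formats. As a rough, hedged sketch only (not part of the diff above), a processor entry for a hypothetical `timestamp level message` log line might look like the following; the `log` field name and the pattern are assumptions, and the pattern names come from the standard grok pattern set.

```yaml
processor:
  - grok:
      match:
        # Hypothetical custom pattern; adjust the field name and pattern to your log format
        log: [ "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}" ]
```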

_data-prepper/common-use-cases/trace-analytics.md (+21 -17)
@@ -116,7 +116,7 @@ The following example demonstrates how to build a pipeline that supports the [Op

 Starting with Data Prepper version 2.0, Data Prepper no longer supports the `otel_traces_prepper` processor. The `otel_traces` processor replaces the `otel_traces_prepper` processor and supports some of Data Prepper's recent data model changes. Instead, you should use the `otel_traces` processor. See the following YAML file example:

-```yml
+```yaml
 entry-pipeline:
 delay: "100"
 source:
@@ -167,6 +167,7 @@ service-map-pipeline:
 password: admin
 index_type: trace-analytics-service-map
 ```
+{% include copy.html %}

 To maintain similar ingestion throughput and latency, scale the `buffer_size` and `batch_size` by the estimated maximum batch size in the client request payload. {: .tip}

@@ -186,21 +187,22 @@ source:
 username: "my-user"
 password: "my_s3cr3t"
 ```
+{% include copy.html %}

 #### Example: pipeline.yaml

 The following is an example `pipeline.yaml` file without SSL and basic authentication enabled for the `otel-trace-pipeline` pipeline:

 ```yaml
 otel-trace-pipeline:
-# workers is the number of threads processing data in each pipeline.
+# workers is the number of threads processing data in each pipeline.
 # We recommend same value for all pipelines.
 # default value is 1, set a value based on the machine you are running Data Prepper
-workers: 8
+workers: 8
 # delay in milliseconds is how often the worker threads should process data.
 # Recommend not to change this config as we want the entry-pipeline to process as quick as possible
 # default value is 3_000 ms
-delay: "100"
+delay: "100"
 source:
 otel_trace_source:
 #record_type: event # Add this when using Data Prepper 1.x. This option is removed in 2.0
@@ -209,8 +211,8 @@ otel-trace-pipeline:
 unauthenticated:
 buffer:
 bounded_blocking:
-# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
-# We recommend to keep the same buffer_size for all pipelines.
+# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
+# We recommend to keep the same buffer_size for all pipelines.
 # Make sure you configure sufficient heap
 # default value is 512
 buffer_size: 512
@@ -225,9 +227,9 @@ otel-trace-pipeline:
 name: "entry-pipeline"
 raw-trace-pipeline:
 # Configure same as the otel-trace-pipeline
-workers: 8
+workers: 8
 # We recommend using the default value for the raw-trace-pipeline.
-delay: "3000"
+delay: "3000"
 source:
 pipeline:
 name: "entry-pipeline"
@@ -248,7 +250,7 @@ raw-trace-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
@@ -262,7 +264,7 @@ raw-trace-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
@@ -277,14 +279,14 @@ service-map-pipeline:
 name: "entry-pipeline"
 processor:
 - service_map:
-# The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluvate service-map relationships.
+# The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluvate service-map relationships.
 # The default is 3 minutes, this means we can detect relationships between services from spans reported in last 3 minutes.
-# Set higher value if your applications have higher latency.
-window_duration: 180
+# Set higher value if your applications have higher latency.
+window_duration: 180
 buffer:
 bounded_blocking:
-# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
-# We recommend to keep the same buffer_size for all pipelines.
+# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
+# We recommend to keep the same buffer_size for all pipelines.
 # Make sure you configure sufficient heap
 # default value is 512
 buffer_size: 512
@@ -299,14 +301,15 @@ service-map-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
 # username and password above.
 #aws_sigv4: true
 #aws_region: us-east-1
 ```
+{% include copy.html %}

 You need to modify the preceding configuration for your OpenSearch cluster so that the configuration matches your environment. Note that it has two `opensearch` sinks that need to be modified.
 {: .note}
@@ -328,7 +331,7 @@ You need to run OpenTelemetry Collector in your service environment. Follow [Get

 The following is an example `otel-collector-config.yaml` file:

-```
+```yaml
 receivers:
 jaeger:
 protocols:
@@ -356,6 +359,7 @@ service:
 processors: [batch/traces]
 exporters: [otlp/data-prepper]
 ```
+{% include copy.html %}

 After you run OpenTelemetry in your service environment, you must configure your application to use the OpenTelemetry Collector. The OpenTelemetry Collector typically runs alongside your application.
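
The `exporters: [otlp/data-prepper]` reference in the hunk above implies an exporter definition that is not captured in this diff. As a hedged sketch only, such a definition typically points the collector's OTLP exporter at Data Prepper's `otel_trace_source`; the endpoint below assumes the source's default gRPC port and a local, TLS-disabled setup.

```yaml
exporters:
  otlp/data-prepper:
    # Assumes Data Prepper's otel_trace_source is listening locally on its default port
    endpoint: "localhost:21890"
    tls:
      insecure: true
```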

_data-prepper/getting-started.md (+12 -9)
@@ -19,7 +19,7 @@ There are two ways to install Data Prepper: you can run the Docker image or buil

 The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command:

-```
+```bash
 docker pull opensearchproject/data-prepper:latest
 ```
 {% include copy.html %}
@@ -36,27 +36,30 @@ Two configuration files are required to run a Data Prepper instance. Optionally,

 For Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path. See the following configuration path example:
 Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory.


 Starting with Data Prepper 2.0, you can launch Data Prepper by using the following `data-prepper` script that does not require any additional command line arguments:

-```
+```bash
 bin/data-prepper
 ```
+{% include copy.html %}

 Configuration files are read from specific subdirectories in the application's home directory:
 1. `pipelines/`: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files.
 2. `config/data-prepper-config.yaml`: Used for the Data Prepper server configuration.

 You can supply your own pipeline configuration file path followed by the server configuration file path. However, this method will not be supported in a future release. See the following example:
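
To make the directory convention described in the hunk above concrete, the following is a minimal sketch of a Data Prepper 2.x home directory and launch; the pipeline file name and install path are hypothetical and not taken from this diff.

```bash
# Hypothetical Data Prepper 2.x home directory layout:
#   pipelines/log-pipeline.yaml        <- one or more pipeline configuration YAML files
#   config/data-prepper-config.yaml    <- server configuration
cd /path/to/data-prepper   # hypothetical install location
bin/data-prepper           # no arguments needed; configuration is read from pipelines/ and config/
```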

_data-prepper/pipelines/configuration/processors/aggregate.md (+5 -5)
@@ -38,15 +38,15 @@ The `remove_duplicates` action processes the first event for a group immediately

 The `put_all` action combines events belonging to the same group by overwriting existing keys and adding new keys, similarly to the Java `Map.putAll`. The action drops all events that make up the combined event. For example, when using `identification_keys: ["sourceIp", "destination_ip"]`, the `put_all` action processes the following three events:
@@ -93,7 +93,7 @@ You can customize the processor with the following configuration options:

 For example, when using `identification_keys: ["sourceIp", "destination_ip", "request"]`, `key: latency`, and `buckets: [0.0, 0.25, 0.5]`, the `histogram` action processes the following events:
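
For context on the options named in the hunk above, the following is a hedged sketch of how an `aggregate` processor entry using the `histogram` action might combine them; the `units` and `group_duration` values are illustrative and not taken from this diff.

```yaml
processor:
  - aggregate:
      identification_keys: ["sourceIp", "destination_ip", "request"]
      action:
        histogram:
          key: "latency"
          buckets: [0.0, 0.25, 0.5]
          # Illustrative unit label for the generated histogram fields
          units: "seconds"
      # Illustrative window over which events for a group are aggregated
      group_duration: "30s"
```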

_data-prepper/pipelines/configuration/processors/anomaly-detector.md (+3 -2)
@@ -64,13 +64,14 @@ To get started, create the following `pipeline.yaml` file. You can use the follo
 ad-pipeline:
 source:
 ...
-....
+....
 processor:
 - anomaly_detector:
 keys: ["latency"]
-mode:
+mode:
 random_cut_forest:
 ```
+{% include copy.html %}

 When you run the `anomaly_detector` processor, the processor extracts the value for the `latency` key and then passes the value through the RCF ML algorithm. You can configure any key that comprises integers or real numbers as values. In the following example, you can configure `bytes` or `latency` as the key for an anomaly detector.
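
The closing context line mentions configuring `bytes` or `latency` as keys; a minimal sketch of that processor entry, mirroring the shape shown in the hunk above, might look like the following.

```yaml
processor:
  - anomaly_detector:
      # Any integer- or real-valued fields can be used; both keys here are illustrative
      keys: ["bytes", "latency"]
      mode:
        random_cut_forest:
```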

_data-prepper/pipelines/configuration/processors/aws-lambda.md (+2 -3)
@@ -42,7 +42,7 @@ Field | Type | Required | Description

 #### Example configuration

-```
+```yaml
 processors:
 - aws_lambda:
 function_name: "my-lambda-function"
@@ -62,7 +62,6 @@ processors:
 maximum_size: "5mb"
 event_collect_timeout: PT10S
 lambda_when: "event['status'] == 'process'"
-
 ```
 {% include copy.html %}

@@ -98,7 +97,7 @@ Note the following limitations:

 Integration tests for this plugin are executed separately from the main Data Prepper build process. Use the following Gradle command to run these tests:

_data-prepper/pipelines/configuration/processors/csv.md (+3 -2)
@@ -48,9 +48,10 @@ csv-pipeline:

 When run, the processor will parse the message. Although only two column names are specified in processor settings, a third column name is automatically generated because the data contained in `ingest.csv` includes three columns, `1,2,3`:
 The following configuration automatically detects the header of a CSV file ingested through an [`s3 source`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/):
@@ -80,7 +81,7 @@ csv-s3-pipeline:

 For example, if the `ingest.csv` file in the Amazon Simple Storage Service (Amazon S3) bucket that the Amazon Simple Queue Service (SQS) queue is attached to contains the following data:
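
As a rough sketch of the behavior described in the first hunk (not content from this diff), a `csv` processor configured with only two column names might look like the following; the column names are hypothetical, and the processor is assumed to generate a name for the third column on its own.

```yaml
processor:
  - csv:
      # Only two names are supplied; a name for the third column of input like "1,2,3" is generated automatically
      column_names: ["col1", "col2"]
```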