
Commit 37270c1

Add copy buttons and highlighting to data prepper code samples (#11465) (#11466)
Parent: b049c71

13 files changed: +67 −46 lines changed

_data-prepper/common-use-cases/log-analytics.md

Lines changed: 6 additions & 3 deletions
@@ -67,7 +67,7 @@ log-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
@@ -78,6 +78,7 @@ log-pipeline:
 # You should change this to correspond with how your OpenSearch indexes are set up.
 index: apache_logs
 ```
+{% include copy.html %}
 
 This pipeline configuration is an example of Apache log ingestion. Don't forget that you can easily configure the Grok Processor for your own custom logs. You will need to modify the configuration for your OpenSearch cluster.
 
@@ -100,7 +101,7 @@ Note that you should adjust the file `path`, output `Host`, and `Port` according
 
 The following is an example `fluent-bit.conf` file without SSL and basic authentication enabled on the HTTP source:
 
-```
+```text
 [INPUT]
 name tail
 refresh_interval 5
@@ -115,14 +116,15 @@ The following is an example `fluent-bit.conf` file without SSL and basic authent
 URI /log/ingest
 Format json
 ```
+{% include copy.html %}
 
 If your HTTP source has SSL and basic authentication enabled, you will need to add the details of `http_User`, `http_Passwd`, `tls.crt_file`, and `tls.key_file` to the `fluent-bit.conf` file, as shown in the following example.
 
 ### Example: Fluent Bit file with SSL and basic authentication enabled
 
 The following is an example `fluent-bit.conf` file with SSL and basic authentication enabled on the HTTP source:
 
-```
+```text
 [INPUT]
 name tail
 refresh_interval 5
@@ -142,6 +144,7 @@ The following is an example `fluent-bit.conf` file with SSL and basic authentica
 URI /log/ingest
 Format json
 ```
+{% include copy.html %}
 
 # Next steps
 
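The hunk above mentions adapting the Grok processor to custom log formats but does not show what that looks like. The following is a minimal sketch, not part of this commit, assuming the raw line arrives in a `log` field and that the built-in `%{COMMONAPACHELOG}` pattern is a reasonable starting point:

```yaml
log-pipeline:
  processor:
    - grok:
        match:
          # Parse the raw line held in the "log" field; swap in your own
          # Grok expression for custom log formats.
          log: [ "%{COMMONAPACHELOG}" ]
```
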
_data-prepper/common-use-cases/trace-analytics.md

Lines changed: 21 additions & 17 deletions
@@ -116,7 +116,7 @@ The following example demonstrates how to build a pipeline that supports the [Op
 
 Starting with Data Prepper version 2.0, Data Prepper no longer supports the `otel_traces_prepper` processor. The `otel_traces` processor replaces the `otel_traces_prepper` processor and supports some of Data Prepper's recent data model changes. Instead, you should use the `otel_traces` processor. See the following YAML file example:
 
-```yml
+```yaml
 entry-pipeline:
 delay: "100"
 source:
@@ -167,6 +167,7 @@ service-map-pipeline:
 password: admin
 index_type: trace-analytics-service-map
 ```
+{% include copy.html %}
 
 To maintain similar ingestion throughput and latency, scale the `buffer_size` and `batch_size` by the estimated maximum batch size in the client request payload. {: .tip}
 
@@ -186,21 +187,22 @@ source:
 username: "my-user"
 password: "my_s3cr3t"
 ```
+{% include copy.html %}
 
 #### Example: pipeline.yaml
 
 The following is an example `pipeline.yaml` file without SSL and basic authentication enabled for the `otel-trace-pipeline` pipeline:
 
 ```yaml
 otel-trace-pipeline:
-# workers is the number of threads processing data in each pipeline.
+# workers is the number of threads processing data in each pipeline.
 # We recommend same value for all pipelines.
 # default value is 1, set a value based on the machine you are running Data Prepper
-workers: 8
+workers: 8
 # delay in milliseconds is how often the worker threads should process data.
 # Recommend not to change this config as we want the entry-pipeline to process as quick as possible
 # default value is 3_000 ms
-delay: "100"
+delay: "100"
 source:
 otel_trace_source:
 #record_type: event # Add this when using Data Prepper 1.x. This option is removed in 2.0
@@ -209,8 +211,8 @@ otel-trace-pipeline:
 unauthenticated:
 buffer:
 bounded_blocking:
-# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
-# We recommend to keep the same buffer_size for all pipelines.
+# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
+# We recommend to keep the same buffer_size for all pipelines.
 # Make sure you configure sufficient heap
 # default value is 512
 buffer_size: 512
@@ -225,9 +227,9 @@ otel-trace-pipeline:
 name: "entry-pipeline"
 raw-trace-pipeline:
 # Configure same as the otel-trace-pipeline
-workers: 8
+workers: 8
 # We recommend using the default value for the raw-trace-pipeline.
-delay: "3000"
+delay: "3000"
 source:
 pipeline:
 name: "entry-pipeline"
@@ -248,7 +250,7 @@ raw-trace-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
@@ -262,7 +264,7 @@ raw-trace-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
@@ -277,14 +279,14 @@ service-map-pipeline:
 name: "entry-pipeline"
 processor:
 - service_map:
-# The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluvate service-map relationships.
+# The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluvate service-map relationships.
 # The default is 3 minutes, this means we can detect relationships between services from spans reported in last 3 minutes.
-# Set higher value if your applications have higher latency.
-window_duration: 180
+# Set higher value if your applications have higher latency.
+window_duration: 180
 buffer:
 bounded_blocking:
-# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
-# We recommend to keep the same buffer_size for all pipelines.
+# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
+# We recommend to keep the same buffer_size for all pipelines.
 # Make sure you configure sufficient heap
 # default value is 512
 buffer_size: 512
@@ -299,14 +301,15 @@ service-map-pipeline:
 # Change to your credentials
 username: "admin"
 password: "admin"
-# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
+# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
 #cert: /path/to/cert
 # If you are connecting to an Amazon OpenSearch Service domain without
 # Fine-Grained Access Control, enable these settings. Comment out the
 # username and password above.
 #aws_sigv4: true
 #aws_region: us-east-1
 ```
+{% include copy.html %}
 
 You need to modify the preceding configuration for your OpenSearch cluster so that the configuration matches your environment. Note that it has two `opensearch` sinks that need to be modified.
 {: .note}
@@ -328,7 +331,7 @@ You need to run OpenTelemetry Collector in your service environment. Follow [Get
 
 The following is an example `otel-collector-config.yaml` file:
 
-```
+```yaml
 receivers:
 jaeger:
 protocols:
@@ -356,6 +359,7 @@ service:
 processors: [batch/traces]
 exporters: [otlp/data-prepper]
 ```
+{% include copy.html %}
 
 After you run OpenTelemetry in your service environment, you must configure your application to use the OpenTelemetry Collector. The OpenTelemetry Collector typically runs alongside your application.
 
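The tip in this file's diff says to scale `buffer_size` and `batch_size` by the estimated maximum batch size in the client request payload, but no scaled example appears in the changed lines. The following is a rough sketch under the assumption that each export request carries roughly 10 spans; the specific numbers are illustrative and not part of this commit:

```yaml
entry-pipeline:
  buffer:
    bounded_blocking:
      # Documented default is 512; scaled ~10x for an assumed 10-span request payload.
      buffer_size: 5120
      # batch_size chosen purely for illustration; scale it with the same estimate.
      batch_size: 80
```
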
_data-prepper/getting-started.md

Lines changed: 12 additions & 9 deletions
@@ -19,7 +19,7 @@ There are two ways to install Data Prepper: you can run the Docker image or buil
 
 The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command:
 
-```
+```bash
 docker pull opensearchproject/data-prepper:latest
 ```
 {% include copy.html %}
@@ -36,27 +36,30 @@ Two configuration files are required to run a Data Prepper instance. Optionally,
 
 For Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path. See the following configuration path example:
 
-```
+```bash
 java -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml
 ```
+{% include copy.html %}
 
 Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory.
 
 
 Starting with Data Prepper 2.0, you can launch Data Prepper by using the following `data-prepper` script that does not require any additional command line arguments:
 
-```
+```bash
 bin/data-prepper
 ```
+{% include copy.html %}
 
 Configuration files are read from specific subdirectories in the application's home directory:
 1. `pipelines/`: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files.
 2. `config/data-prepper-config.yaml`: Used for the Data Prepper server configuration.
 
 You can supply your own pipeline configuration file path followed by the server configuration file path. However, this method will not be supported in a future release. See the following example:
-```
+```bash
 bin/data-prepper pipelines.yaml data-prepper-config.yaml
 ```
+{% include copy.html %}
 
 The Log4j 2 configuration file is read from the `config/log4j2.properties` file located in the application's home directory.
 
@@ -69,7 +72,7 @@ To configure Data Prepper, see the following information for each use case:
 
 Create a Data Prepper pipeline file named `pipelines.yaml` using the following configuration:
 
-```yml
+```yaml
 simple-sample-pipeline:
 workers: 2
 delay: "5000"
@@ -96,7 +99,7 @@ The example pipeline configuration above demonstrates a simple pipeline with a s
 
 After starting Data Prepper, you should see log output and some UUIDs after a few seconds:
 
-```yml
+```text
 2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
 2021-09-30T20:19:44,681 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
 2021-09-30T20:19:45,183 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
@@ -120,21 +123,21 @@ image and modify both the `pipelines.yaml` and `data-prepper-config.yaml` files.
 
 For Data Prepper 2.0 or later, use this command:
 
-```
+```bash
 docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest
 ```
 {% include copy.html %}
 
 For Data Prepper versions earlier than 2.0, use this command:
 
-```
+```bash
 docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:1.x
 ```
 {% include copy.html %}
 
 Once Data Prepper is running, it processes data until it is shut down. Once you are done, shut it down with the following command:
 
-```
+```bash
 POST /shutdown
 ```
 {% include copy-curl.html %}
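
The getting-started diff mounts a `data-prepper-config.yaml` server configuration but never shows its contents. A minimal, hypothetical server configuration for local testing might look like the following; the `ssl` and `serverPort` settings are assumptions and are not part of this commit:

```yaml
# Hypothetical config/data-prepper-config.yaml for a local test run
ssl: false          # disable TLS on the core API while experimenting locally
# serverPort: 4900  # uncomment to change the port shown in the startup log line above
```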

_data-prepper/pipelines/configuration/processors/aggregate.md

Lines changed: 5 additions & 5 deletions
@@ -38,15 +38,15 @@ The `remove_duplicates` action processes the first event for a group immediately
 
 The `put_all` action combines events belonging to the same group by overwriting existing keys and adding new keys, similarly to the Java `Map.putAll`. The action drops all events that make up the combined event. For example, when using `identification_keys: ["sourceIp", "destination_ip"]`, the `put_all` action processes the following three events:
 
-```
+```json
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "status": 200 }
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 1000 }
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "http_verb": "GET" }
 ```
 
 Then the action combines the events into one. The pipeline then uses the following combined event:
 
-```
+```json
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "status": 200, "bytes": 1000, "http_verb": "GET" }
 ```
 
@@ -93,7 +93,7 @@ You can customize the processor with the following configuration options:
 
 For example, when using `identification_keys: ["sourceIp", "destination_ip", "request"]`, `key: latency`, and `buckets: [0.0, 0.25, 0.5]`, the `histogram` action processes the following events:
 
-```
+```json
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "request" : "/index.html", "latency": 0.2 }
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "request" : "/index.html", "latency": 0.55}
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "request" : "/index.html", "latency": 0.25 }
@@ -139,7 +139,7 @@ You can set the percentage of events using the `percent` configuration, which in
 
 For example, if percent is set to `50`, the action tries to process the following events in the one-second interval:
 
-```
+```json
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 2500 }
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 500 }
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 1000 }
@@ -148,7 +148,7 @@ For example, if percent is set to `50`, the action tries to process the followin
 
 The pipeline processes 50% of the events, drops the other events, and does not generate a new event:
 
-```
+```json
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 500 }
 { "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 3100 }
 ```
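
The aggregate.md hunks show the events that the `put_all` and `histogram` actions operate on but not the processor configuration that drives them. A hedged sketch of a `put_all` pipeline entry consistent with the `identification_keys` named above (the `group_duration` value is an assumption):

```yaml
processor:
  - aggregate:
      # Events sharing these key values form one group and are merged by put_all.
      identification_keys: ["sourceIp", "destination_ip"]
      action:
        put_all:
      group_duration: "180s"
```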

_data-prepper/pipelines/configuration/processors/anomaly-detector.md

Lines changed: 3 additions & 2 deletions
@@ -64,13 +64,14 @@ To get started, create the following `pipeline.yaml` file. You can use the follo
 ad-pipeline:
 source:
 ...
-....
+....
 processor:
 - anomaly_detector:
 keys: ["latency"]
-mode:
+mode:
 random_cut_forest:
 ```
+{% include copy.html %}
 
 When you run the `anomaly_detector` processor, the processor extracts the value for the `latency` key and then passes the value through the RCF ML algorithm. You can configure any key that comprises integers or real numbers as values. In the following example, you can configure `bytes` or `latency` as the key for an anomaly detector.
 
_data-prepper/pipelines/configuration/processors/aws-lambda.md

Lines changed: 2 additions & 3 deletions
@@ -42,7 +42,7 @@ Field | Type | Required | Description
 
 #### Example configuration
 
-```
+```yaml
 processors:
 - aws_lambda:
 function_name: "my-lambda-function"
@@ -62,7 +62,6 @@ processors:
 maximum_size: "5mb"
 event_collect_timeout: PT10S
 lambda_when: "event['status'] == 'process'"
-
 ```
 {% include copy.html %}
@@ -98,7 +97,7 @@ Note the following limitations:
 
 Integration tests for this plugin are executed separately from the main Data Prepper build process. Use the following Gradle command to run these tests:
 
-```
+```bash
 ./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.processor.lambda.region="us-east-1" -Dtests.processor.lambda.functionName="lambda_test_function" -Dtests.processor.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role
 ```
 {% include copy.html %}

_data-prepper/pipelines/configuration/processors/csv.md

Lines changed: 3 additions & 2 deletions
@@ -48,9 +48,10 @@ csv-pipeline:
 
 When run, the processor will parse the message. Although only two column names are specified in processor settings, a third column name is automatically generated because the data contained in `ingest.csv` includes three columns, `1,2,3`:
 
-```
+```json
 {"message": "1,2,3", "col1": "1", "col2": "2", "column3": "3"}
 ```
+
 ### Automatically detect column names
 
 The following configuration automatically detects the header of a CSV file ingested through an [`s3 source`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/):
@@ -80,7 +81,7 @@ csv-s3-pipeline:
 
 For example, if the `ingest.csv` file in the Amazon Simple Storage Service (Amazon S3) bucket that the Amazon Simple Queue Service (SQS) queue is attached to contains the following data:
 
-```
+```text
 Should,skip,this,line
 a,b,c
 1,2,3