
Commit d812346

Merge pull request #1526 from fluent/lynettemiles/sc-113397/add-detail-about-how-to-send-files-to-gcs
2 parents efba05e + 6a202d1 · commit d812346


pipeline/outputs/s3.md

Lines changed: 42 additions & 27 deletions
@@ -31,9 +31,9 @@ for details about fetching AWS credentials.

{% hint style="info" %}
The [Prometheus success/retry/error metrics values](administration/monitoring.md)
-output by the built-in http server in Fluent Bit are meaningless for S3 output. S3 has
+output by the built-in HTTP server in Fluent Bit are meaningless for S3 output. S3 has
its own buffering and retry mechanisms. The Fluent Bit AWS S3 maintainers apologize
-for this feature gap; you can [track our progress fixing it on GitHub](https://github.com/fluent/fluent-bit/issues/6141).
+for this feature gap; you can [track issue progress on GitHub](https://github.com/fluent/fluent-bit/issues/6141).
{% endhint %}

## Configuration Parameters
@@ -48,10 +48,10 @@ for this feature gap; you can [track our progress fixing it on GitHub](https://g
| `upload_chunk_size` | The size of each part for multipart uploads. Max: 50M | 5,242,880 bytes |
| `upload_timeout` | When this amount of time elapses, Fluent Bit uploads and creates a new file in S3. Set to `60m` to upload a new file every hour. | `10m`|
| `store_dir` | Directory to locally buffer data before sending. When using multipart uploads, data buffers until reaching the `upload_chunk_size`. S3 stores metadata about in-progress multipart uploads in this directory, allowing pending uploads to be completed if Fluent Bit stops and restarts. It stores the current `$INDEX` value if enabled in the S3 key format so the `$INDEX` keeps incrementing from its previous value after Fluent Bit restarts. | `/tmp/fluent-bit/s3` |
| `store_dir_limit_size` | Size limit for disk usage in S3. Limit the S3 buffers in the `store_dir` to limit disk usage. Use `store_dir_limit_size` instead of `storage.total_limit_size`, which can be used for other plugins. | `0` (unlimited) |
| `s3_key_format` | Format string for keys in S3. This option supports a UUID, strftime time formatters, and a syntax for selecting parts of the Fluent log tag, inspired by the `rewrite_tag` filter. Add `$UUID` in the format string to insert a random string. Add `$INDEX` in the format string to insert an integer that increments each upload. The `$INDEX` value saves in the `store_dir`. Add `$TAG` in the format string to insert the full log tag. Add `$TAG[0]` to insert the first part of the tag in the S3 key. The tag is split into parts using the characters specified with the `s3_key_format_tag_delimiters` option. Add the extension directly after the last piece of the format string to insert a key suffix. To specify a key suffix in `use_put_object` mode, you must specify `$UUID`. See [S3 Key Format](#allowing-a-file-extension-in-the-s3-key-format-with-usduuid). Time in `s3_key` is the timestamp of the first record in the S3 file. | `/fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S` |
| `s3_key_format_tag_delimiters` | A series of characters used to split the tag into parts for use with the `s3_key_format` option. | `.` |
-| `static_file_path` | Disables behavior where UUID string appendeds to the end of the S3 key name when `$UUID` is not provided in `s3_key_format`. `$UUID`, time formatters, `$TAG`, and other dynamic key formatters all work as expected while this feature is set to true. | `false` |
+| `static_file_path` | Disables behavior where a UUID string appends to the end of the S3 key name when `$UUID` isn't provided in `s3_key_format`. `$UUID`, time formatters, `$TAG`, and other dynamic key formatters all work as expected while this feature is set to true. | `false` |
| `use_put_object` | Use the S3 `PutObject` API instead of the multipart upload API. When enabled, the key extension is only available when `$UUID` is specified in `s3_key_format`. If `$UUID` isn't included, a random string is appended to the format string and the key extension can't be customized. | `false` |
| `role_arn` | ARN of an IAM role to assume (for example, for cross-account access). | _none_ |
| `endpoint` | Custom endpoint for the S3 API. Endpoints can contain scheme and port. | _none_ |
@@ -61,8 +61,8 @@ for this feature gap; you can [track our progress fixing it on GitHub](https://g
| `compression` | Compression type for S3 objects. `gzip` is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can use `arrow`. For gzip compression, the `Content-Encoding` HTTP header will be set to `gzip`. Gzip compression can be enabled when `use_put_object` is `on` or `off` (`PutObject` and multipart). Arrow compression can only be enabled with `use_put_object On`. | _none_ |
| `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
| `send_content_md5` | Send the Content-MD5 header with `PutObject` and UploadPart requests, as is required when Object Lock is enabled. | `false` |
-| `auto_retry_requests` | Immediately retry failed requests to AWS services once. This option doesn't affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput during transient network issues. | `true` |
-| `log_key` | By default, the whole log record will be sent to S3. When specifing a key name with this option, only the value of that key sends to S3. For example, when using Docker you can specify `log_key log` and only the log message sends to S3. | _none_ |
+| `auto_retry_requests` | Immediately retry failed requests to AWS services once. This option doesn't affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which can help improve throughput during transient network issues. | `true` |
+| `log_key` | By default, the whole log record will be sent to S3. When specifying a key name with this option, only the value of that key sends to S3. For example, when using Docker you can specify `log_key log` and only the log message sends to S3. | _none_ |
| `preserve_data_ordering` | When an upload request fails, the last received chunk might swap with a later chunk, resulting in data shuffling. This feature prevents shuffling by using a queue logic for uploads. | `true` |
| `storage_class` | Specify the [storage class](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#AmazonS3-PutObject-request-header-StorageClass) for S3 objects. If this option isn't specified, objects are stored with the default `STANDARD` storage class. | _none_ |
| `retry_limit` | Integer value to set the maximum number of retries allowed. Requires versions 1.9.10 and 2.0.1 or later. For previous versions, the number of retries is `5` and isn't configurable. | `1` |
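
For illustration, a minimal `[OUTPUT]` sketch that exercises several of the parameters above; the bucket, region, tag layout, and size/time values are assumptions, not recommended settings:

```python
[OUTPUT]
    Name                          s3
    Match                         *
    bucket                        my-example-bucket
    region                        us-east-1
    total_file_size               50M
    upload_chunk_size             6M
    upload_timeout                10m
    compression                   gzip
    # Only the value of the "log" key is uploaded (the Docker use case above).
    log_key                       log
    # With delimiters "." and "-", a tag like app-logs.prod splits into
    # $TAG[0]=app, $TAG[1]=logs, $TAG[2]=prod.
    s3_key_format                 /$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.gz
    s3_key_format_tag_delimiters  .-
```
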
@@ -104,15 +104,15 @@ Fluent Bit sends chunks, in order, to each output that matches their tag. Most o
then send the chunk immediately to their destination. A chunk is sent to the output's
`flush` callback function, which must return one of `FLB_OK`, `FLB_RETRY`, or
`FLB_ERROR`. Fluent Bit keeps count of the return values from each output's
-`flush` callback function. These counters are the data source for Fluent Bit's error, retry,
+`flush` callback function. These counters are the data source for Fluent Bit error, retry,
and success metrics available in Prometheus format through its monitoring interface.

The S3 output plugin conforms to the Fluent Bit output plugin specification.
Since S3's use case is to upload large files (over 2 MB), its behavior is different.
S3's `flush` callback function buffers the incoming chunk to the filesystem, and
returns an `FLB_OK`. This means Prometheus metrics available from the Fluent
Bit HTTP server are meaningless for S3. In addition, the `storage.total_limit_size`
-parameter is not meaningful for S3 since it has its own buffering system in the
+parameter isn't meaningful for S3 since it has its own buffering system in the
`store_dir`. Instead, use `store_dir_limit_size`. S3 requires a writeable filesystem.
Running Fluent Bit on a read-only filesystem won't work with the S3 output.
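
For illustration, a sketch of that distinction (bucket, path, and limit are assumptions): the S3 output caps its own buffer directory with `store_dir_limit_size`, where other outputs would use `storage.total_limit_size`:

```python
[OUTPUT]
    Name                  s3
    Match                 *
    bucket                my-example-bucket
    region                us-east-1
    # S3 buffers to its own directory on a writeable filesystem...
    store_dir             /tmp/fluent-bit/s3
    # ...so cap disk usage here rather than with storage.total_limit_size.
    store_dir_limit_size  1G
```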

@@ -121,18 +121,18 @@ S3 uploads primarily initiate using the S3
callback function, which runs separately from its `flush`.

S3 has its own buffering system and its own callback to upload data, so the normal
-sequential data ordering of chunks provided by the Fluent Bit engine may be
+sequential data ordering of chunks provided by the Fluent Bit engine can be
compromised. S3 has the `preserve_data_ordering` option, which ensures data is
uploaded in the original order it was collected by Fluent Bit.

### Summary: Uniqueness in S3 Plugin

-- The HTTP Monitoring interface output metrics are not meaningful for S3. AWS
-  understands that this is non-ideal; we have
-  [opened an issue with a design](https://github.com/fluent/fluent-bit/issues/6141)
+- The HTTP Monitoring interface output metrics aren't meaningful for S3. AWS
+  understands that this is non-ideal. See the
+  [open issue and design](https://github.com/fluent/fluent-bit/issues/6141)
  to allow S3 to manage its own output metrics.
- You must use `store_dir_limit_size` to limit the space on disk used by S3 buffer files.
-- The original ordering of data inputted to Fluent Bit may not be preserved unless you enable
+- The original ordering of data inputted to Fluent Bit might not be preserved unless you enable
  `preserve_data_ordering On`.

## S3 Key Format and Tag Delimiters
@@ -142,10 +142,10 @@ inject the tag into the S3 key using the following syntax:

- `$TAG`: The full tag.
- `$TAG[n]`: The nth part of the tag (index starting at zero). This syntax is copied
-  from the rewrite tag filter. By default, “parts” of the tag are separated with
+  from the rewrite tag filter. By default, tag parts are separated with
  dots, but you can change this with `s3_key_format_tag_delimiters`.

-In the following example, assume the date is January 1st, 2020 00:00:00 and the tag
+In the following example, assume the date is `January 1st, 2020 00:00:00` and the tag
associated with the logs in question is `my_app_name-logs.prod`.

```python
@@ -171,15 +171,15 @@ The key in S3 will be `/prod/my_app_name/2020/01/01/00/00/00/bgdHN1NM.gz`.

The Fluent Bit S3 output was designed to ensure that previous uploads will never be
overwritten by a subsequent upload. The `s3_key_format` supports time formatters,
-`$UUID`, and `$INDEX`. `$INDEX` is special because it is saved in the `store_dir`. If
+`$UUID`, and `$INDEX`. `$INDEX` is special because it's saved in the `store_dir`. If
you restart Fluent Bit with the same disk, it can continue incrementing the
index from its last value in the previous run.

For files uploaded with the `PutObject` API, the S3 output requires that a unique
random string be present in the S3 key. Many of the use cases for
`PutObject` uploads involve a short time period between uploads, so a timestamp
-in the S3 key may not be unique enough between uploads. For example, if you only
-specify minute granularity timestamps in the S3 key, with a small upload size, it is
+in the S3 key might not be unique enough between uploads. For example, if you only
+specify minute granularity timestamps in the S3 key, with a small upload size, it's
possible to have two uploads that have timestamps set in the same minute. This
requirement can be disabled with `static_file_path On`.
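
For illustration, a sketch (bucket and key layout are assumptions) that combines a minute-level timestamp with `$UUID`, so keys stay unique even when several `PutObject` uploads land in the same minute:

```python
[OUTPUT]
    Name            s3
    Match           *
    bucket          my-example-bucket
    region          us-east-1
    use_put_object  On
    upload_timeout  1m
    # $UUID guarantees uniqueness (and allows the .gz suffix in PutObject
    # mode) even when two uploads share the same minute-level timestamp.
    s3_key_format   /logs/$TAG/%Y/%m/%d/%H/%M/$UUID.gz
```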

@@ -196,7 +196,7 @@ You should always specify `$UUID` somewhere in your S3 key format. Otherwise, if
S3 key. This means that a file extension set at the end of an S3 key will have the
random UUID appended to it. Disable this with `static_file_path On`.

-For example, we attempt to set a `.gz` extension without specifying `$UUID`:
+This example attempts to set a `.gz` extension without specifying `$UUID`:

```python
[OUTPUT]
@@ -279,7 +279,7 @@ ability to restart Fluent Bit and give it access to the data stored in the
`store_dir` from previous executions, some considerations apply. This might occur if
you run Fluent Bit on [AWS Fargate](https://aws.amazon.com/fargate/).

-In these situations, we recommend using the `PutObject` API and sending data
+In these situations, the Fluent Bit maintainers recommend using the `PutObject` API and sending data
frequently, to avoid local buffering as much as possible. This will limit data loss
in the event Fluent Bit is killed unexpectedly.
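
For illustration, a sketch of that recommendation for an ephemeral environment; the bucket, sizes, and timeout are placeholder assumptions:

```python
[OUTPUT]
    Name             s3
    Match            *
    bucket           my-example-bucket
    region           us-east-1
    use_put_object   On
    # Keep uploads small and frequent so little data sits in the local
    # store_dir if the task is stopped unexpectedly.
    total_file_size  1M
    upload_timeout   1m
    store_dir        /tmp/fluent-bit/s3
```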

@@ -324,17 +324,17 @@ Uploads are triggered by these settings:
- `upload_timeout`: Whenever locally buffered data has been present on the filesystem
  in the `store_dir` longer than the configured `upload_timeout`, it will be sent
  even when the desired byte size hasn't been reached.
-  If you configure a small `upload_timeout`, your files may be smaller
+  If you configure a small `upload_timeout`, your files can be smaller
  than the `total_file_size`. The timeout is evaluated against the time at which S3
-  started buffering data for each unqiue tag (that is, the time when new data was
+  started buffering data for each unique tag (that is, the time when new data was
  buffered for the unique tag after the last upload). The timeout is also evaluated
  against the
  [CreateMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html)
  time, so a multipart upload will be completed after `upload_timeout` has elapsed,
-  even if the desired size has not yet been reached.
+  even if the desired size hasn't yet been reached.

If your `upload_timeout` triggers an upload before the pending buffered data reaches
-the `upload_chunk_size`, it may be too small for a multipart upload. S3 will
+the `upload_chunk_size`, it might be too small for a multipart upload. S3 will
fall back to the [`PutObject` API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html).
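
For illustration, with the assumed values below a file is sent once 100M has been buffered for a tag, or five minutes after buffering for that tag began, whichever comes first:

```python
[OUTPUT]
    Name               s3
    Match              *
    bucket             my-example-bucket
    region             us-east-1
    # Whichever happens first triggers the upload: 100M of buffered data
    # for a tag, or 5 minutes since buffering started for that tag.
    total_file_size    100M
    upload_chunk_size  10M
    upload_timeout     5m
```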

When you enable compression, S3 applies the compression algorithm at send time. The
@@ -403,6 +403,21 @@ Example:

The records are stored in the MinIO server.

+## Usage with Google Cloud
+
+You can send your S3 output to Google Cloud Storage (GCS). You must generate HMAC keys on GCS and use
+those keys for `access-key` and `access-secret`.
+
+Example:
+
+```python
+[OUTPUT]
+  Name s3
+  Match *
+  bucket your-bucket
+  endpoint https://storage.googleapis.com
+```
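
One way to supply those HMAC keys, assumed here rather than spelled out above, is through the standard AWS credential environment variables that the S3 output reads:

```text
export AWS_ACCESS_KEY_ID=<GCS HMAC access key>
export AWS_SECRET_ACCESS_KEY=<GCS HMAC secret>
```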

## Get Started

To send records into Amazon S3, you can run the plugin from the command line or
@@ -455,7 +470,7 @@ Amazon distributes a container image with Fluent Bit and plugins.

### Amazon ECR Public Gallery

-Our images are available in the Amazon ECR Public Gallery as
+Images are available in the Amazon ECR Public Gallery as
[aws-for-fluent-bit](https://gallery.ecr.aws/aws-observability/aws-for-fluent-bit).

You can download images with different tags using the following command:
@@ -488,14 +503,14 @@ is also available from the Docker Hub.

### Amazon ECR

-Use our SSM Public Parameters to find the Amazon ECR image URI in your region:
+Use Fluent Bit SSM Public Parameters to find the Amazon ECR image URI in your region:

```text
aws ssm get-parameters-by-path --path /aws/service/aws-for-fluent-bit/
```

For more information, see the
-[AWS for Fluent Bit GitHub repo](https://github.com/aws/aws-for-fluent-bit#public-images).
+[AWS for Fluent Bit GitHub repository](https://github.com/aws/aws-for-fluent-bit#public-images).

## Advanced usage
