> This fork is no longer maintained.

Amazon S3 output plugin for Fluentd

Overview

The s3 output plugin buffers event logs in local files and uploads them to S3 periodically.

This plugin splits files exactly by the time of the event logs (not the time when the logs are received). For example, if a log ‘2011-01-02 message A’ arrives, followed by ‘2011-01-03 message B’, the former is stored in the “20110102.gz” file and the latter in the “20110103.gz” file.

Installation

Simply use RubyGems:

gem install fluent-plugin-s3
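
Alternatively, if you manage Fluentd's gems with its bundled fluent-gem command, the equivalent should be:

fluent-gem install fluent-plugin-s3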

Configuration

<match pattern>
  type s3

  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket YOUR_S3_BUCKET_NAME
  s3_endpoint s3-ap-northeast-1.amazonaws.com
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  path logs/
  buffer_path /var/log/fluent/s3

  time_slice_format %Y%m%d-%H
  time_slice_wait 10m
  utc
  after_flush path/to/script1, path/to/script2
  after_flush_config path/to/script1/config, path/to/script2/config
</match>
aws_key_id

AWS access key id. This parameter is required when your agent is not running on an EC2 instance with an IAM Role.

aws_sec_key

AWS secret key. This parameter is required when your agent is not running on an EC2 instance with an IAM Role.
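
For example, when the agent runs on an EC2 instance with an IAM Role, both parameters can simply be omitted (a minimal sketch; the bucket name and buffer path are placeholders):

<match pattern>
  type s3
  # aws_key_id / aws_sec_key omitted: credentials come from the instance's IAM Role
  s3_bucket YOUR_S3_BUCKET_NAME
  buffer_path /var/log/fluent/s3
</match>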

s3_bucket (required)

S3 bucket name.

s3_endpoint

S3 endpoint name. For example, the US West (Oregon) Region endpoint is “s3-us-west-2.amazonaws.com”. The full list of endpoints is available here: docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

s3_object_key_format

The format of S3 object keys. You can use several built-in variables to construct keys dynamically:

  • %{path}

  • %{time_slice}

  • %{index}

  • %{file_extension}

%{path} is exactly the value of path configured in the configuration file, e.g. “logs/” in the example configuration above. %{time_slice} is the time slice in text, formatted according to time_slice_format. %{index} is a sequential number starting from 0, incremented when multiple files are uploaded to S3 in the same time slice. %{file_extension} is always “gz” for now.

The default format is “%{path}%{time_slice}_%{index}.%{file_extension}”.

For instance, using the example configuration above, actual object keys on S3 will be something like:

"logs/20130111-22_0.gz"
"logs/20130111-23_0.gz"
"logs/20130111-23_1.gz"
"logs/20130112-00_0.gz"

With the configuration:

s3_object_key_format %{path}/events/ts=%{time_slice}/events_%{index}.%{file_extension}
path log
time_slice_format %Y%m%d-%H

You get:

"log/events/ts=20130111-22/events_0.gz"
"log/events/ts=20130111-23/events_0.gz"
"log/events/ts=20130111-23/events_1.gz"
"log/events/ts=20130112-00/events_0.gz"

The fluent-mixin-config-placeholders mixin is also incorporated, so additional variables such as %{hostname}, %{uuid}, etc. can be used in the s3_object_key_format. This could prove useful in preventing filename conflicts when writing from multiple servers.

s3_object_key_format %{path}/events/ts=%{time_slice}/events_%{index}-%{hostname}.%{file_extension}
store_as

Archive format on S3. You can use several formats (see the example below):

  • gzip (default)

  • json

  • text

  • lzo (requires the lzop command)
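
For example, to upload LZO-compressed objects instead of gzip (a minimal sketch, assuming lzop is installed on the host):

store_as lzo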

format_json

If set to true, each line of the S3 object contains only the JSON record, without the leading time and tag. Default is false.

The default content is:

time\ttag\t{..json1..}
time\ttag\t{..json2..}
...

When “format_json” is true, the content is:

{..json1..}
{..json2..}
...

In this format, “time” and “tag” are omitted, but you can add this information back to the record with the “include_tag_key” / “tag_key” and “include_time_key” / “time_key” options. If you set the following configuration in the S3 output:

format_json true
include_time_key true
time_key log_time # default is time

then each record has a log_time field:

{"log_time":"time string",...}
auto_create_bucket

Create the S3 bucket if it does not exist. Default is true.

check_apikey_on_start

Check the AWS keys on startup. Default is true.

proxy_uri

URI of your proxy environment.

path

Path prefix of the files on S3. Default is “” (no prefix).

buffer_path (required)

Path prefix of the files used to buffer logs.

time_slice_format

Format of the time used as the file name. Default is ‘%Y%m%d’. Use ‘%Y%m%d%H’ to split files hourly.

time_slice_wait

The time to wait for late-arriving event logs. Default is 10 minutes. Specify a larger value if old logs may arrive late. For example, with time_slice_format %Y%m%d-%H and time_slice_wait 10m, the file for hour 22 is flushed at around 23:10.

utc

Use UTC instead of local time.

after_flush

A comma-separated list of scripts that will be run after each log file is written to S3. Each script is passed the S3 path of the log file, in the form “s3://my-s3bucket/path/to/log”, as its first argument. Failures are logged but do not raise an error.
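
A minimal sketch of such a script (the file name and logging destination are illustrative):

#!/bin/sh
# Hypothetical after_flush hook: $1 is the S3 path of the uploaded log file,
# e.g. "s3://my-s3bucket/logs/20130111-22_0.gz"
echo "uploaded: $1" >> /var/log/fluent/after_flush.log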

after_flush_config

A comma-separated list of config paths that map to the after_flush scripts.

IAM Policy

The following is an example of a minimal IAM policy needed to write to an S3 bucket (it matches my-s3bucket/logs, my-s3bucket-test, etc.).

{ "Statement": [
 { "Effect":"Allow",
   "Action":"s3:*",
   "Resource":"arn:aws:s3:::my-s3bucket*"
  } ]
}

Note that the bucket must already exist and auto_create_bucket has no effect in this case.

Refer to the AWS documentation for example policies.

Using IAM roles with a properly configured IAM policy is preferred over embedding access keys on EC2 instances.

Website, license, et al.

Web site

fluentd.org/

Documents

docs.fluentd.org/

Source repository

github.com/fluent

Discussion

groups.google.com/group/fluentd

Author

Sadayuki Furuhashi

Copyright

© 2011 FURUHASHI Sadayuki

License

Apache License, Version 2.0
