This README provides a guide to understanding and customizing the fluentd module. Refer to the official Fluentd documentation for more comprehensive information on configuration and plugin development.
The fluentd module in the cmcd-toolkit acts as a flexible log aggregation and forwarding layer. Fluentd is an open-source data collector that allows you to unify data collection and consumption for better use and understanding of data.
In the context of this project, its primary roles are:
- Receiving CMCD Data from Collector: It is configured to receive structured log data (specifically CMCD data) from the collector module. The collector forwards data to Fluentd's input plugin.
- Data Buffering and Routing: Fluentd can buffer the incoming data and then route it to various destinations (outputs) based on tags or other conditions.
- Output to Backends: In the default configuration (fluent.conf), this module is set up to forward the CMCD data to an InfluxDB instance. This allows for time-series analysis and visualization of the CMCD metrics. However, Fluentd's strength lies in its ability to be configured to send data to a wide array of backends.
This module is typically used in the local Docker Compose setup (docker-compose.local.yml) to decouple the collector from the final data storage and to enable more complex data pipelines.
The behavior of the Fluentd module is primarily controlled by its configuration file (fluent.conf) and by adding or configuring Fluentd plugins.
The fluentd/fluent.conf file is the heart of the Fluentd module. You can modify it to change how data is collected, processed, and dispatched. The file consists of several types of directives:
- <source> Directives: Define how Fluentd collects data.
  - Example: The default configuration uses @type forward to listen for TCP/IP packets from the collector. You could add other sources to collect logs from files, other services, or system logs.
- <filter> Directives (Optional): Define processing pipelines for events. Filters are applied to events before they are sent to <match> directives.
  - Example: You could add a filter to parse specific fields from the log, add new fields, or mask sensitive information.

        <filter node.collector.**>
          @type record_transformer
          <record>
            new_field "processed_by_fluentd"
          </record>
        </filter>

- <match> Directives: Define how Fluentd routes and outputs data. Events are routed to <match> directives based on their tags.
  - Example: The default configuration uses @type influxdb to send data tagged with node.collector.** to an InfluxDB instance. You can change the output plugin or its parameters. An event is handled by the first <match> block whose pattern matches its tag, so to send the same data to multiple destinations, wrap the outputs in the copy plugin.

        # Example: Send the same data to standard output (console) for debugging
        <match node.collector.**>
          @type copy
          <store>
            @type influxdb
            host "#{ENV['INFLUXDB_HOST'] || 'influxdb'}"
            # ... other influxdb params
          </store>
          <store>
            @type stdout
          </store>
        </match>
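
As a rough sketch of the <source> directive described above, the block below shows a forward input next to a second, file-based input. Port 24224 is Fluentd's conventional forward port; the file path, pos_file location, and app.logs tag are hypothetical placeholders, not values taken from this project's fluent.conf.

```
# Accept events sent over Fluentd's forward protocol (e.g. from the collector).
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# Hypothetical second input: follow JSON-formatted log files.
<source>
  @type tail
  path /var/log/app/*.log
  pos_file /fluentd/log/app.log.pos
  tag app.logs
  <parse>
    @type json
  </parse>
</source>
```

Events read by the tail input carry the app.logs tag, so they would need their own <match app.**> block; the node.collector.** pattern does not cover them.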
Environment Variables in fluent.conf:
As seen in the default fluent.conf, you can use the "#{ENV['YOUR_ENV_VAR'] || 'default_value'}" syntax inside a double-quoted value to make your Fluentd configuration controllable via environment variables passed to the Fluentd container. The expression is embedded Ruby, so it is not evaluated in unquoted or single-quoted values.
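
A minimal sketch of this pattern, assuming a hypothetical FLUENTD_FORWARD_PORT variable (not defined by this project) and falling back to the conventional port 24224 when the variable is unset:

```
<source>
  @type forward
  # FLUENTD_FORWARD_PORT is a hypothetical variable name used for illustration.
  port "#{ENV['FLUENTD_FORWARD_PORT'] || '24224'}"
  bind 0.0.0.0
</source>
```

The variable would be supplied to the container, for example through the environment section of the fluentd service in the Compose file.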
Fluentd has a rich ecosystem of plugins for various inputs, filters, and outputs. If you need to integrate with a service or process data in a way not supported by the default plugins, you can add new ones.
To add a new plugin:
- Identify the plugin: Find the required plugin on the Fluentd plugin directory or GitHub.
- Update the Dockerfile: Add the installation command for the new plugin to the fluentd/Dockerfile. Plugins are typically installed using fluent-gem install <plugin-name>.

      FROM fluent/fluentd:v1.16-1

      # Add your plugins here
      USER root
      RUN fluent-gem install fluent-plugin-elasticsearch \
          && fluent-gem install fluent-plugin-rewrite-tag-filter
      USER fluent

      # ... rest of the Dockerfile

- Rebuild the Docker image: docker compose build fluentd (or the equivalent command for your Docker environment).
- Configure the plugin: Use the newly installed plugin in your fluent.conf by specifying its @type and other configuration parameters as per the plugin's documentation.
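
To illustrate the last step, here is a hedged sketch of an output block that assumes fluent-plugin-elasticsearch has been installed as in the Dockerfile example. The ELASTICSEARCH_HOST variable and the elasticsearch fallback hostname are assumptions for illustration; check the plugin's documentation for the parameters your version supports.

```
<match node.collector.**>
  @type elasticsearch
  # ELASTICSEARCH_HOST and the 'elasticsearch' fallback hostname are assumptions.
  host "#{ENV['ELASTICSEARCH_HOST'] || 'elasticsearch'}"
  port 9200
  # Write to daily, Logstash-style indices.
  logstash_format true
</match>
```

If another <match> block with the same pattern appears earlier in fluent.conf, it will receive the events first, so this block would either replace it or be placed as a <store> inside a copy output.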
- Send Data to Different Analytics Platforms:
  - Configure output plugins for Elasticsearch, Splunk, Datadog, Kafka, or cloud-specific services like AWS S3, Google Cloud Storage, or Azure Blob Storage.
- Implement Custom Parsing Rules:
  - If the collector module starts sending logs in a different format (not JSON), you can add or modify <parse> directives within your <source> or <filter> blocks to correctly process them.
- Set Up Alerts Based on Specific Log Patterns:
  - Use plugins like fluent-plugin-grepcounter or fluent-plugin-notifier (with appropriate output plugins like fluent-plugin-slack or fluent-plugin-pagerduty) to count occurrences of specific errors or patterns and send alerts.
- Data Enrichment:
  - Use the record_transformer filter to add metadata to logs. Lookup-based enrichment, such as geo-IP information based on client IP addresses, typically needs a dedicated plugin such as fluent-plugin-geoip.
- Advanced Log Filtering and Routing:
  - Use plugins like fluent-plugin-rewrite-tag-filter to re-tag events based on their content, allowing for more granular routing to different output destinations.
- Multi-Format Output:
  - Configure Fluentd to output data in multiple formats (e.g., JSON for one system, plain text for another) using the copy output plugin, with one <store> per destination.
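
The routing and formatting ideas above can be sketched as follows, assuming fluent-plugin-rewrite-tag-filter is installed as in the Dockerfile example. The ot field, the pattern values, and the routed.* tags are hypothetical; substitute the keys the collector actually emits.

```
# Re-tag events by the value of a record field.
# 'ot' is a hypothetical field name used for illustration.
<match node.collector.**>
  @type rewrite_tag_filter
  <rule>
    key ot
    pattern /^v$/
    tag routed.video
  </rule>
  <rule>
    # Catch-all for any other non-empty value.
    key ot
    pattern /.+/
    tag routed.other
  </rule>
</match>

# Video events: one JSON object per line on standard output.
<match routed.video>
  @type stdout
  <format>
    @type json
  </format>
</match>

# Everything else: LTSV (labeled tab-separated values) on standard output.
<match routed.other>
  @type stdout
  <format>
    @type ltsv
  </format>
</match>
```

The new tags deliberately fall outside node.collector.** so that re-tagged events are not matched by the rewrite block again. Events that match none of the rules (for instance, records without the ot field) are dropped by the rewrite block, so a catch-all rule is worth keeping.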
The fluentd module is typically built and run as part of the Docker Compose setup defined in the main project directory (e.g., docker-compose.local.yml). You can rebuild the Fluentd image specifically if you make changes to its Dockerfile or add plugins:

    docker compose build fluentd
    docker compose up fluentd
    # Or, to restart all services defined in your compose file:
    # docker compose up -d --force-recreate --build

Ensure that any services Fluentd depends on (like InfluxDB in the default configuration) are also running.