---
title: Introducing the new container log parser for OpenTelemetry Collector
linkTitle: Collector container log parser
date: 2024-05-22
author: '[Christos Markou](https://github.com/ChrsMark) (Elastic)'
cSpell:ignore: Christos containerd Filelog filelog Jaglowski kube Markou
---

[Filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver)
is one of the most commonly used components of the
[OpenTelemetry Collector](/docs/collector), as indicated by the most recent
[survey](/blog/2024/otel-collector-survey/#otel-components-usage). The same
survey shows, unsurprisingly, that
[Kubernetes is the leading platform for Collector deployment (80.6%)](/blog/2024/otel-collector-survey/#deployment-scale-and-environment).
Taken together, these two facts underline the importance of seamless log
collection in Kubernetes environments.

Currently, the
[filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.100.0/receiver/filelogreceiver/README.md)
is capable of parsing container logs from Kubernetes Pods, but it requires
[extensive configuration](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/aaa70bde1bf8bf15fc411282468ac6d2d07f772d/charts/opentelemetry-collector/templates/_config.tpl#L206-L282)
to parse them correctly across the various container runtime formats. Container
logs come in several known formats, depending on the container runtime, so a
specific sequence of operations is needed to parse them properly:

1. Detect the format of the incoming logs at runtime.
2. Parse each format accordingly, taking into account its format-specific
   characteristics. For example, determine whether it's JSON or plain text, and
   take the timestamp format into account.
3. Extract known metadata, relying on predefined patterns.

Such an advanced sequence of operations can be handled by chaining the proper
[stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza)
operators together, but the end result is rather complex. This configuration
complexity can be mitigated by using the corresponding
[helm chart preset](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#configuration-for-kubernetes-container-logs).
Even with the preset, however, such advanced configurations can still be
challenging for users to maintain and troubleshoot.

The community has raised the issue of
[improving the Kubernetes Logs Collection Experience](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25251)
in the past. One step towards achieving this is to provide a simplified and
robust option for parsing container logs, without requiring users to specify or
maintain the implementation details manually. With the proposal and
implementation of the new
[container parser](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959),
all these details are encapsulated and handled within the parser's
implementation. Combined with unit-test coverage and fail-over logic, this
marks a significant improvement in container log parsing.

## What container logs look like

First of all, let's quickly recall the different container log formats found
out there:

- Docker container logs:

  `{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`

- cri-o logs:

  `2024-04-13T07:59:37.505201169-05:00 stdout F This is a cri-o log line!`

- containerd logs:

  `2024-04-22T10:27:25.813799277Z stdout F This is an awesome containerd log line!`

We can notice that the cri-o and containerd log formats are quite similar, as
both follow the CRI logging format, differing only in the timestamp format.

To properly handle these three different formats, you need three different
chains of
[stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza)
operators, as shown in the
[container parser operator issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959).

In addition, the CRI format can produce partial logs, which you need to
recombine into a single entry before any further processing:

```text
2024-04-06T00:17:10.113242941Z stdout P This is a very very long line th
2024-04-06T00:17:10.113242941Z stdout P at is really really long and spa
2024-04-06T00:17:10.113242941Z stdout F ns across multiple log entries
```

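To make the recombination step concrete, here is a rough Python sketch of the
buffering logic involved. This is a hypothetical illustration only: the actual
container parser is written in Go and reuses stanza's recombine operator for
this. Partial (`P`) lines are buffered until a final (`F`) line arrives:

```python
import re

# CRI log line layout: <timestamp> <stream> <P|F> <content>
CRI_LINE = re.compile(r"^(\S+) (stdout|stderr) (P|F) ?(.*)$")

def recombine_cri(lines):
    """Join partial (P) CRI log lines until a final (F) line is seen."""
    buffer = []
    for line in lines:
        match = CRI_LINE.match(line)
        if not match:
            continue  # skip anything that isn't a CRI-formatted line
        timestamp, stream, flag, content = match.groups()
        buffer.append(content)
        if flag == "F":  # final fragment: emit the assembled record
            yield {"time": timestamp, "stream": stream, "log": "".join(buffer)}
            buffer = []

lines = [
    "2024-04-06T00:17:10.113242941Z stdout P This is a very very long line th",
    "2024-04-06T00:17:10.113242941Z stdout P at is really really long and spa",
    "2024-04-06T00:17:10.113242941Z stdout F ns across multiple log entries",
]
records = list(recombine_cri(lines))
# records[0]["log"] is the full sentence spanning the three fragments
```

Running this over the three partial lines above yields a single record whose
body is the complete log message.
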
Ideally, you would like the parser to be capable of automatically detecting the
format at runtime and parsing the log lines accordingly. As we will see below,
this is exactly what the container parser does.

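As a rough illustration of what such runtime detection involves, the following
Python sketch (my own approximation, not the parser's actual Go code)
distinguishes the three formats by their leading characters and timestamp
shape:

```python
import json
import re

# CRI lines start with a timestamp, then the stream, then the P/F flag.
CRIO_LINE = re.compile(r"^\S+T\S+[+-]\d{2}:\d{2} (stdout|stderr) [PF] ")
CONTAINERD_LINE = re.compile(r"^\S+T\S+Z (stdout|stderr) [PF] ")

def detect_format(line: str) -> str:
    """Best-effort guess of the container runtime log format."""
    if line.startswith("{"):
        try:
            record = json.loads(line)
            if "log" in record and "time" in record:
                return "docker"
        except json.JSONDecodeError:
            pass
    if CONTAINERD_LINE.match(line):
        return "containerd"  # CRI format, UTC "Z" timestamp
    if CRIO_LINE.match(line):
        return "cri-o"  # CRI format, timestamp with a numeric zone offset
    return "unknown"

print(detect_format('{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}'))  # docker
print(detect_format("2024-04-13T07:59:37.505201169-05:00 stdout F This is a cri-o log line!"))  # cri-o
print(detect_format("2024-04-22T10:27:25.813799277Z stdout F This is an awesome containerd log line!"))  # containerd
```

The real parser is, of course, more thorough than this sketch, but the idea is
the same: the format is inferred from the line itself, per log file, at
runtime.
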
## Attribute handling

Container log files follow a specific naming pattern from which you can extract
useful metadata during parsing. For example, from
`/var/log/pods/kube-system_kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler/1.log`,
you can extract the namespace, the name and UID of the pod, and the name of the
container.

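A small Python sketch (hypothetical, for illustration only) shows how such a
path can be mapped to the corresponding Kubernetes attributes:

```python
import re

# Kubernetes container log path layout:
#   /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/<restart-count>.log
POD_LOG_PATH = re.compile(
    r"^/var/log/pods/"
    r"(?P<namespace>[^_/]+)_(?P<pod_name>[^_/]+)_(?P<pod_uid>[^/]+)/"
    r"(?P<container_name>[^/]+)/(?P<restart_count>\d+)\.log$"
)

def attributes_from_path(path: str) -> dict:
    """Map a container log file path to Kubernetes attributes.

    Illustrative sketch of the metadata extraction the container parser
    performs; attribute names follow the Kubernetes semantic conventions.
    """
    match = POD_LOG_PATH.match(path)
    if not match:
        return {}
    return {
        "k8s.namespace.name": match.group("namespace"),
        "k8s.pod.name": match.group("pod_name"),
        "k8s.pod.uid": match.group("pod_uid"),
        "k8s.container.name": match.group("container_name"),
        "k8s.container.restart_count": match.group("restart_count"),
    }

path = (
    "/var/log/pods/kube-system_kube-scheduler-kind-control-plane"
    "_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler/1.log"
)
print(attributes_from_path(path)["k8s.pod.name"])  # kube-scheduler-kind-control-plane
```
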
After extracting this metadata, you need to store it properly using the
appropriate attributes following the
[Semantic Conventions](/docs/specs/semconv/resource/k8s/). This handling can
also be encapsulated within the parser's implementation, eliminating the need
for users to define it manually.

## Using the new container parser

With all of this in mind, the container parser can be configured as follows:

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container
```

That configuration is enough to properly parse each log line and extract all
the useful Kubernetes metadata. The reduction in configuration is substantial:
achieving the same result with a combination of individual operators would take
about 69 lines of configuration, as pointed out in the
[original proposal](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959).

A log line
`{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`
that is written to
`/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log`
will produce a log entry like the following:

```json
{
  "timestamp": "2024-03-30 08:31:20.545192187 +0000 UTC",
  "body": "INFO: This is a docker log line",
  "attributes": {
    "time": "2024-03-30T08:31:20.545192187Z",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "kube-system",
    "log.file.path": "/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

Notice that you don't have to define the format yourself. The parser
automatically detects the format and parses the logs accordingly. Even the
partial logs that the cri-o and containerd runtimes can produce are recombined
properly, without any special configuration.

This is really handy, because as a user you don't need to worry about
specifying the format, or about maintaining different configurations for
different environments.

## Implementation details

To implement this parser operator, most of the code was written from scratch,
but we were able to reuse the recombine operator internally for partial log
parsing. This required some small refactoring, but it gave us the opportunity
to build on an existing and well-tested component.

During the discussions around the implementation of this feature, a question
popped up: _Why implement this as an operator and not as a processor?_

One basic reason is that the order of the log records arriving at processors is
not guaranteed, while proper handling of partial logs depends on that order.
That's why implementing it as an operator was, for now, the way to go.
Moreover, at the moment
[it is suggested](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32080#issuecomment-2035301178)
to do as much work as possible during collection, and robust parsing
capabilities make that possible.

More information about the implementation discussions can be found in the
respective
[GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959)
and its related/linked PR.

Last but not least, the container parser is a good example of the room for
improvement that exists: in the future, we could optimize further for other
popular technologies with known log formats.

## Conclusion: container log parsing is now easier with the filelog receiver

Eager to learn more about the container parser? Visit the official
[documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/container.md),
and if you give it a try, let us know what you think. Don't hesitate to reach
out to us in the official CNCF [Slack workspace](https://slack.cncf.io/),
specifically in the `#otel-collector` channel.

## Acknowledgements

Kudos to [Daniel Jaglowski](https://github.com/djaglowski) for reviewing the
parser's implementation and providing valuable feedback!