Commit b15af70

ChrsMark and svrnm authored

Add container parser blog post (#4489)

Signed-off-by: ChrsMark <[email protected]>
Co-authored-by: Severin Neumann <[email protected]>
1 parent a1537f3 commit b15af70

File tree: 2 files changed, +231 -0 lines changed

@@ -0,0 +1,199 @@
---
title: Introducing the new container log parser for OpenTelemetry Collector
linkTitle: Collector container log parser
date: 2024-05-22
author: '[Christos Markou](https://github.com/ChrsMark) (Elastic)'
cSpell:ignore: Christos containerd Filelog filelog Jaglowski kube Markou
---

The [Filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver)
is one of the most commonly used components of the
[OpenTelemetry Collector](/docs/collector), as indicated by the most recent
[survey](/blog/2024/otel-collector-survey/#otel-components-usage). The same
survey shows, unsurprisingly, that
[Kubernetes is the leading platform for Collector deployment (80.6%)](/blog/2024/otel-collector-survey/#deployment-scale-and-environment).
Taken together, these two facts underline the importance of seamless log
collection in Kubernetes environments.

Currently, the
[filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.100.0/receiver/filelogreceiver/README.md)
is capable of parsing container logs from Kubernetes Pods, but it requires
[extensive configuration](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/aaa70bde1bf8bf15fc411282468ac6d2d07f772d/charts/opentelemetry-collector/templates/_config.tpl#L206-L282)
to parse them correctly, because container logs come in several known formats
depending on the container runtime. Parsing them properly requires a specific
set of operations:

1. Detect the format of the incoming logs at runtime.
2. Parse each format according to its specific characteristics: for example,
   determine whether it's JSON or plain text, and take the timestamp format
   into account.
3. Extract known metadata, relying on predefined patterns.

Such an advanced sequence of operations can be handled by chaining the proper
[stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza)
operators together, but the end result is rather complex. This configuration
complexity can be mitigated by using the corresponding
[helm chart preset](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#configuration-for-kubernetes-container-logs).
However, even with the preset, it can still be challenging for users to
maintain and troubleshoot such advanced configurations, as the sketch below
illustrates.

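To give a sense of that complexity, here is an abridged sketch of such a
chained-operator pipeline, loosely modeled on the helm chart preset linked
above. It is illustrative only: the route expressions and parser settings are
indicative, and most of the cri-o/containerd handling and attribute mapping is
omitted.

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    include_file_path: true
    operators:
      # 1. Detect the format at runtime by routing on the shape of the line.
      - type: router
        id: get-format
        routes:
          - output: parser-docker
            expr: 'body matches "^\\{"'
          - output: parser-crio
            expr: 'body matches "^[^ Z]+ "'
          - output: parser-containerd
            expr: 'body matches "^[^ Z]+Z"'
      # 2. Parse each format with its own operator and timestamp layout.
      - type: json_parser
        id: parser-docker
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # ... regex_parser operators for cri-o and containerd, a recombine
      # operator for partial lines, a regex_parser for the file path, and a
      # series of move operators producing k8s.* attributes would follow ...
```

Multiply this by the remaining branches and you end up with the roughly 69
lines of configuration discussed later in this post.
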
The community has raised the issue of
[improving the Kubernetes Logs Collection Experience](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25251)
in the past. One step towards achieving this is to provide a simplified and
robust option for parsing container logs, without users having to specify or
maintain the implementation details themselves. With the proposal and
implementation of the new
[container parser](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959),
all these details are encapsulated and handled within the parser's
implementation. Add to this the ability to cover the implementation with unit
tests and fail-over logic, and you get a significant improvement in container
log parsing.

## What container logs look like

First of all, let's quickly recall the different container log formats that
you can encounter out there:

- Docker container logs:

  `{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`

- cri-o logs:

  `2024-04-13T07:59:37.505201169-05:00 stdout F This is a cri-o log line!`

- containerd logs:

  `2024-04-22T10:27:25.813799277Z stdout F This is an awesome containerd log line!`

We can see that the cri-o and containerd log formats are quite similar (both
follow the CRI logging format), with only a small difference in the timestamp
format.

To properly handle these three different formats you need three different
chains of
[stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza)
operators, as shown in the
[container parser operator issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959).

In addition, the CRI format can produce partial log lines, which you first
want to recombine into a single entry:

```text
2024-04-06T00:17:10.113242941Z stdout P This is a very very long line th
2024-04-06T00:17:10.113242941Z stdout P at is really really long and spa
2024-04-06T00:17:10.113242941Z stdout F ns across multiple log entries
```

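Once recombined, these three entries should become a single record whose body
reads `This is a very very long line that is really really long and spans
across multiple log entries`.
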
Ideally, you would want the parser to automatically detect the format at
runtime and parse the log lines accordingly. As we will see below, the
container parser does exactly that.

## Attribute handling

Container log files follow a specific naming pattern from which you can
extract useful metadata during parsing. For example, from
`/var/log/pods/kube-system_kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler/1.log`,
you can extract the namespace, the name and UID of the pod, and the name of
the container.

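In general, these files follow the pattern
`/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log`,
as in the example above.
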
After extracting this metadata, you need to store it properly using the
appropriate attributes following the
[Semantic Conventions](/docs/specs/semconv/resource/k8s/). This handling can
also be encapsulated within the parser's implementation, eliminating the need
for users to define it manually.

## Using the new container parser

With all this in mind, the container parser can be configured as follows:

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container
```

That configuration is more than enough to properly parse the log lines and
extract all the useful Kubernetes metadata. It's quite obvious how much less
configuration is now required: achieving the same result with a combination of
operators takes about 69 lines of configuration, as pointed out in the
[original proposal](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959).

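To try this out end to end, you could embed the receiver in a minimal Collector
pipeline. The following is a sketch rather than an official example: it assumes
the debug exporter is available in your Collector distribution and simply
prints the parsed records.

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container

exporters:
  # Prints parsed log records to the Collector's output; swap in your backend.
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```
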
A log line
`{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`
written to
`/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log`
will produce a log entry like the following:

```json
{
  "timestamp": "2024-03-30 08:31:20.545192187 +0000 UTC",
  "body": "INFO: This is a docker log line",
  "attributes": {
    "time": "2024-03-30T08:31:20.545192187Z",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "kube-system",
    "log.file.path": "/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

Notice that you don't have to define the format: the parser detects it
automatically and parses the logs accordingly. Even the partial logs that the
cri-o and containerd runtimes can produce are recombined properly, without the
need for any special configuration.

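If you do want to be explicit, the operator's documentation also describes an
optional `format` setting for pinning the runtime format instead of relying on
auto-detection; check the linked documentation for the exact options, since the
default behavior shown here is usually what you want.
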
This is really handy, because as a user you don't need to care about
specifying the format, or about maintaining different configurations for
different environments.

## Implementation details

To implement this parser operator, most of the code was written from scratch,
but we were able to reuse the recombine operator internally for parsing
partial logs. This required some small refactoring, but it gave us the
opportunity to reuse an existing and well-tested component.

During the discussions around the implementation of this feature, a question
popped up: _Why implement this as an operator and not as a processor?_

One basic reason is that the order of the log records arriving at processors
is not guaranteed, and we need that ordering to properly handle partial log
parsing. That's why implementing it as an operator was, for now, the way to
go. Moreover, at the moment
[it is suggested](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32080#issuecomment-2035301178)
to do as much work as possible during collection, and having robust parsing
capabilities at that stage allows exactly that.

More information about the implementation discussions can be found in the
respective
[GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959)
and its related/linked PR.

Last but not least, the example of this container parser shows how much room
for improvement exists, and how we could further optimize for popular
technologies with known log formats in the future.

## Conclusion: container log parsing is now easier with the filelog receiver

Eager to learn more about the container parser? Visit the official
[documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/container.md),
and if you give it a try, let us know what you think. Don't hesitate to reach
out to us in the official CNCF [Slack workspace](https://slack.cncf.io/),
specifically in the `#otel-collector` channel.

## Acknowledgements

Kudos to [Daniel Jaglowski](https://github.com/djaglowski) for reviewing the
parser's implementation and providing valuable feedback!

static/refcache.json (+32 lines)

```diff
@@ -2155,6 +2155,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-03-15T20:34:22.210208944Z"
   },
+  "https://github.com/ChrsMark": {
+    "StatusCode": 200,
+    "LastSeen": "2024-05-15T19:23:42.377730577+03:00"
+  },
   "https://github.com/Cyprinus12138": {
     "StatusCode": 200,
     "LastSeen": "2024-03-28T22:25:37.072281206+08:00"
@@ -2495,6 +2499,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-18T20:05:04.809604-05:00"
   },
+  "https://github.com/djaglowski": {
+    "StatusCode": 200,
+    "LastSeen": "2024-05-15T19:23:48.979905025+03:00"
+  },
   "https://github.com/dmathieu": {
     "StatusCode": 200,
     "LastSeen": "2024-02-14T08:44:43.625674121Z"
@@ -3083,6 +3091,18 @@
     "StatusCode": 200,
     "LastSeen": "2024-03-19T10:16:57.223070258Z"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25251": {
+    "StatusCode": 200,
+    "LastSeen": "2024-05-15T19:23:46.151980372+03:00"
+  },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959": {
+    "StatusCode": 200,
+    "LastSeen": "2024-05-15T19:23:47.587898763+03:00"
+  },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32080#issuecomment-2035301178": {
+    "StatusCode": 200,
+    "LastSeen": "2024-05-15T19:23:48.560170178+03:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/14132": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T20:06:09.209998-05:00"
@@ -5935,6 +5955,14 @@
     "StatusCode": 206,
     "LastSeen": "2024-05-06T07:53:28.679391-07:00"
   },
+  "https://opentelemetry.io/blog/2024/otel-collector-survey/#deployment-scale-and-environment": {
+    "StatusCode": 206,
+    "LastSeen": "2024-05-15T19:23:44.74352841+03:00"
+  },
+  "https://opentelemetry.io/blog/2024/otel-collector-survey/#otel-components-usage": {
+    "StatusCode": 206,
+    "LastSeen": "2024-05-15T19:23:44.426379755+03:00"
+  },
   "https://opentelemetry.io/blog/2024/scaling-collectors/": {
     "StatusCode": 206,
     "LastSeen": "2024-05-06T07:53:28.903161-07:00"
@@ -5991,6 +6019,10 @@
     "StatusCode": 206,
     "LastSeen": "2024-02-24T14:33:05.630341-08:00"
   },
+  "https://opentelemetry.io/docs/specs/semconv/resource/k8s/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-05-15T19:23:47.920456821+03:00"
+  },
   "https://opentelemetry.io/ecosystem/integrations/": {
     "StatusCode": 206,
     "LastSeen": "2024-03-19T10:16:49.992495889Z"
```
