feat: Upgrade OTel Collector to version 0.120.0 (contrib 0.120.1) #1873

Merged
merged 28 commits into from
Mar 3, 2025
a548128
bump OTel image version to 0.120.0
hisarbalik Feb 25, 2025
432be5a
update unit tests
hisarbalik Feb 25, 2025
583af2c
fix failing unit tests
hisarbalik Feb 25, 2025
e7f0016
update OTTL statements with flat syntax
hisarbalik Feb 25, 2025
dfddc42
add additional resource attribute for the input prometheus to avoid i…
hisarbalik Feb 26, 2025
75345e4
Merge branch 'main' into bump-otel-version-to-0.120.0
hisarbalik Feb 26, 2025
99e9966
fix broken metric agent transform processor configuration
hisarbalik Feb 27, 2025
24216be
change the pipeline service configuration, move drop kyma processor …
hisarbalik Feb 27, 2025
099157f
fix prometheus input metrics not being filtered out for other inputs
hisarbalik Feb 27, 2025
df35b26
Merge branch 'main' into bump-otel-version-to-0.120.0
hisarbalik Feb 27, 2025
328d1a8
remove context attribute from transform processor type
hisarbalik Feb 27, 2025
460bb16
fix linter issues
hisarbalik Feb 27, 2025
af9f3dc
add context attribute back
hisarbalik Feb 27, 2025
b98c66f
add new annotation to the telemetry resource to enable OTel internal …
hisarbalik Feb 27, 2025
5c42f1e
Merge branch 'main' into bump-otel-version-to-0.120.0
hisarbalik Feb 28, 2025
ffdf974
add documentation about the new annotation telemetry.kyma-project.io/…
hisarbalik Feb 28, 2025
6b94e92
Merge branch 'main' into bump-otel-version-to-0.120.0
hisarbalik Feb 28, 2025
ca80459
Update docs/user/resources/01-telemetry.md
hisarbalik Feb 28, 2025
12f8480
Merge branch 'main' into bump-otel-version-to-0.120.0
hisarbalik Feb 28, 2025
230ce07
add load test results for OTel version 0.120.0
hisarbalik Feb 28, 2025
41759a3
Merge branch 'bump-otel-version-to-0.120.0' of github.com:hisarbalik/…
hisarbalik Feb 28, 2025
5dcb156
update backward compatibility annotation docs
hisarbalik Feb 28, 2025
b3dc6d4
Update docs/user/resources/01-telemetry.md
hisarbalik Feb 28, 2025
0bd2594
refactor rule data type extraction for self monitor alert rules
hisarbalik Feb 28, 2025
e4ad62e
add e2e test for metric pipeline healthy path self monitor in compatib…
hisarbalik Mar 3, 2025
11cfc78
Apply suggestions from code review
hisarbalik Mar 3, 2025
c2e6007
Update docs/user/resources/01-telemetry.md
hisarbalik Mar 3, 2025
980073b
update docs
hisarbalik Mar 3, 2025
4 changes: 2 additions & 2 deletions .env
Expand Up @@ -16,6 +16,6 @@ ENV_GORELEASER_VERSION=v1.23.0
## Default Docker Images
DEFAULT_FLUENTBIT_EXPORTER_IMAGE="europe-docker.pkg.dev/kyma-project/prod/directory-size-exporter:v20250217-6a3cdc4a"
DEFAULT_FLUENTBIT_IMAGE="europe-docker.pkg.dev/kyma-project/prod/external/fluent/fluent-bit:3.2.7"
-DEFAULT_OTEL_COLLECTOR_IMAGE="europe-docker.pkg.dev/kyma-project/prod/kyma-otel-collector:0.118.0-main"
+DEFAULT_OTEL_COLLECTOR_IMAGE="europe-docker.pkg.dev/kyma-project/prod/kyma-otel-collector:0.120.0-main"
DEFAULT_SELFMONITOR_IMAGE="europe-docker.pkg.dev/kyma-project/prod/tpi/telemetry-self-monitor:3.2.0-825b449"
-DEFAULT_TEST_TELEMETRYGEN_IMAGE="ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.118.0"
+DEFAULT_TEST_TELEMETRYGEN_IMAGE="ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.120.0"
1 change: 1 addition & 0 deletions .github/workflows/pr-integration.yml
Expand Up @@ -44,6 +44,7 @@ jobs:
- telemetry
- max-pipeline
- telemetry-log-analysis
- self-mon-metrics-healthy-compatibility-mode
runs-on: ubuntu-latest
steps:
- name: Checkout repo
12 changes: 8 additions & 4 deletions docs/contributor/benchmarks/README.md
Expand Up @@ -122,7 +122,8 @@ A typical test result output looks like the following example:
| 0.114.0 | 19610 | 19453 | 0 | 127, 125 | 1, 1 | 11256 | 33308 | 0 | 175, 248 | 1.4, 1.4 | 10608 | 321 | 511 | 1737, 1735 | 0.5, 0.5 | 18442 | 956 | 510 | 1798, 1737 | 0.9, 0.9 |
| 0.115.0 | 18865 | 18718 | 0 | 191, 253 | 1, 1 | 11615 | 34386 | 0 | 275, 167 | 1.4, 1.5 | 11141 | 277 | 511 | 1747, 1731 | 0.5, 0.5 | 18258 | 880 | 510 | 1741, 1760 | 0.9, 0.9 |
| 0.116.0 | 19693 | 19540 | 0 | 165, 126 | 1.1, 1 | 11388 | 33717 | 0 | 196, 137 | 1.5, 1.4 | 11215 | 324 | 510 | 1658, 1738 | 0.5, 0.5 | 17974 | 886 | 509 | 1671, 1683 | 0.9, 0.9 |
-| 0.118.0 | 19299 | 19148 | 0 | 88,97, | 1.1,1, | 11369 | 33659 | 0 | 137,159, | 1.4,1.5, | 10066 | 296 | 512 | 1551,1652, | 0.4,0.4, | 18852 | 945 | 510 | 1701,1688, | 0.9,0.9, |
+| 0.118.0 | 19299 | 19148 | 0 | 88, 97 | 1.1, 1 | 11369 | 33659 | 0 | 137, 159 | 1.4, 1.5 | 10066 | 296 | 512 | 1551, 1652 | 0.4, 0.4 | 18852 | 945 | 510 | 1701, 1688 | 0.9, 0.9 |
+| 0.120.0 | 18733 | 18586 | 4 | 99, 91 | 1.1, 1 | 10527 | 31168 | 6 | 144, 144 | 1.3, 1.5 | 11491 | 286 | 512 | 1536, 1533 | 0.4, 0.4 | 19400 | 873 | 509 | 1523, 1520 | 0.9, 0.9 |

</div>

Expand Down Expand Up @@ -245,8 +246,9 @@ are printed out.
| 0.110.0 | 4223 | 4222 | 0 | 130, 137 | 1.5, 1.5 | 3139 | 9417 | 1 | 197, 215 | 1.7, 1.7 | 830 | 640 | 287 | 841, 835 | 0.5, 0.5 | 2048 | 1907 | 510 | 1741, 1694 | 1.4, 1.4 |
| 0.114.0 | 4384 | 4385 | 0 | 131, 141 | 1.5, 1.5 | 3209 | 9624 | 0 | 189, 198 | 1.7, 1.8 | 757 | 635 | 393 | 807, 824 | 0.5, 0.4 | 2512 | 1691 | 510 | 1788, 1789 | 1.6, 1.6 |
| 0.115.0 | 4256 | 4255 | 0 | 144, 175 | 1.5, 1.5 | 3346 | 10040 | 0 | 244, 202 | 1.7, 1.8 | 726 | 627 | 361 | 821, 834 | 0.5, 0.5 | 2510 | 1926 | 505 | 1778, 1730 | 1.7, 1.6 |
-| 0.116.0 | 4374 | 4374 | 0 | 100, 109 | 1.5, 1.5 | 3500 | 10500 | 0 | 171, 171 | 1.8, 2 | 710 | 641 | 383 | 857, 870 | 0.5, 0.5 | 3183 | 1780 | 509 | 1760, 1848 | 2, 2.1 |
-| 0.118.0 | 4357 | 4357 | 0 | 120,115, | 1.5,1.5, | 3520 | 10566 | 0 | 151,179, | 2,1.8, | 813 | 522 | 443 | 880,1752, | 0.6,0.6, | 3264 | 1925 | 510 | 1837,1855, | 2,2.1, |
+| 0.116.0 | 4374 | 4374 | 0 | 100, 109 | 1.5, 1.5 | 3500 | 10500 | 0 | 171, 171 | 1.8, 2 | 710 | 641 | 383 | 857, 870 | 0.5, 0.5 | 3183 | 1780 | 509 | 1760, 1848 | 2, 2.1 |
+| 0.118.0 | 4357 | 4357 | 0 | 120, 115 | 1.5, 1.5 | 3520 | 10566 | 0 | 151, 179 | 2, 1.8 | 813 | 522 | 443 | 880, 1752 | 0.6, 0.6 | 3264 | 1925 | 510 | 1837, 1855 | 2, 2.1 |
+| 0.120.0 | 4175 | 4177 | 1 | 137, 110 | 1.5, 1.5 | 3424 | 10275 | 5 | 171, 175 | 2, 1.9 | 698 | 696 | 314 | 824, 831 | 0.5, 0.5 | 2962 | 1729 | 509 | 1639, 1787 | 2, 2 |

</div>

Expand Down Expand Up @@ -296,7 +298,9 @@ On average, memory usage for MetricPipeline instances is ~150MB for a single Pod
| 0.114.0 | 19904 | 19904 | 0 | 683, 707 | 0.2, 0.2 | 19942 | 19958 | 0 | 701, 743 | 0.2, 0.2 |
| 0.115.0 | 20073 | 20073 | 0 | 697, 697 | 0.2, 0.2 | 19924 | 19954 | 0 | 700, 773 | 0.2, 0.3 |
| 0.116.0 | 20058 | 20057 | 0 | 690, 682 | 0.3, 0.3 | 19998 | 19999 | 0 | 713, 692 | 0.2, 0.3 |
-| 0.118.0 | 19859 | 19859 | 0 | 694,672, | 0.2,0.2, | 20057 | 20057 | 0 | 661,664, | 0.2,0.2, |
+| 0.118.0 | 19859 | 19859 | 0 | 694, 672 | 0.2, 0.2 | 20057 | 20057 | 0 | 661, 664 | 0.2, 0.2 |
+| 0.120.0 | 20018 | 20017 | 0 | 733, 720 | 0.2, 0.2 | 19803 | 19803 | 0 | 698, 661 | 0.3, 0.2 |

</div>

The expected throughput for the MetricPipeline agent receiver is ~20K metrics/sec. Expected memory usage is on average ~700MB, and CPU usage is ~0.2 for each instance.
23 changes: 23 additions & 0 deletions docs/user/resources/01-telemetry.md
Expand Up @@ -66,6 +66,29 @@ For further examples, see the [samples](https://github.com/kyma-project/telemetr

For details, see the [Telemetry specification file](https://github.com/kyma-project/telemetry-manager/blob/main/apis/operator/v1alpha1/telemetry_types.go).

### Annotations

Backward compatibility for internal metrics in OpenTelemetry:
OpenTelemetry Collector 0.119.0 introduces breaking changes to the internal metrics exposed through the Prometheus endpoint. These changes affect Kyma Telemetry in the following ways:
- Metric names change. For example, counter metrics get a `_total` suffix.
- The metric unit is appended to the metric name. For example, a counter metric `request_duration` with the unit `milliseconds` is exposed as `request_duration_milliseconds_total`.
- The internal metric exporter creates an `otel_scope_info` metric that contains the instrumentation scope, and adds instrumentation scope labels to all metric points.

To maintain backward compatibility, Telemetry Manager introduces the annotation `telemetry.kyma-project.io/internal-metrics-compatibility-mode`, which turns off the type suffix, the unit suffix, and the scope information for internal metrics.
To enable the backward compatibility mode, set the annotation `telemetry.kyma-project.io/internal-metrics-compatibility-mode: "true"` in the Telemetry CR:
```yaml
apiVersion: operator.kyma-project.io/v1alpha1
kind: Telemetry
metadata:
  name: default
  namespace: kyma-system
  annotations:
    telemetry.kyma-project.io/internal-metrics-compatibility-mode: "true"
```

> [!WARNING]
> By default, the backward compatibility mode is disabled.
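
As a rough illustration of how such an annotation could be evaluated, the following Go sketch parses the value and falls back to the documented default (disabled) when the annotation is missing or malformed. The helper `compatibilityModeEnabled` and its fallback behavior are assumptions made for this example; they are not taken from the Telemetry Manager source.

```go
package main

import (
	"fmt"
	"strconv"
)

// compatibilityModeAnnotation is the annotation key documented above.
const compatibilityModeAnnotation = "telemetry.kyma-project.io/internal-metrics-compatibility-mode"

// compatibilityModeEnabled is a hypothetical helper, not the actual
// Telemetry Manager code. It returns true only if the annotation exists
// and parses as a true boolean; a missing or unparsable value yields
// false, which matches the documented default.
func compatibilityModeEnabled(annotations map[string]string) bool {
	value, ok := annotations[compatibilityModeAnnotation]
	if !ok {
		return false
	}

	enabled, err := strconv.ParseBool(value)
	if err != nil {
		return false
	}

	return enabled
}

func main() {
	withAnnotation := map[string]string{compatibilityModeAnnotation: "true"}

	fmt.Println(compatibilityModeEnabled(withAnnotation)) // annotation set to "true"
	fmt.Println(compatibilityModeEnabled(nil))            // no annotations at all
}
```

Annotation values are always strings in Kubernetes, which is why the value is quoted in the YAML example and parsed with `strconv.ParseBool` in this sketch.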

<!-- The table below was generated automatically -->
<!-- Some special tags (html comments) are at the end of lines due to markdown requirements. -->
<!-- The content between "TABLE-START" and "TABLE-END" will be replaced -->
16 changes: 8 additions & 8 deletions hack/load-tests/run-load-test.sh
Expand Up @@ -297,8 +297,8 @@ EOF

function get_result_and_cleanup_trace() {
RESULT_TYPE="span"
-QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_spans{service="telemetry-trace-gateway-metrics"}[20m])))'
-QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_spans{exporter=~"otlp/load-test.*"}[20m])))'
+QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_spans_total{service="telemetry-trace-gateway-metrics"}[20m])))'
+QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_spans_total{exporter=~"otlp/load-test.*"}[20m])))'
QUERY_QUEUE='query=avg(sum(otelcol_exporter_queue_size{service="telemetry-trace-gateway-metrics"}))'
QUERY_MEMORY='query=round(sum(avg_over_time(container_memory_working_set_bytes{namespace="kyma-system", container="collector"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-trace-gateway"}[20m])) by (pod) / 1024 / 1024)'
QUERY_CPU='query=round(sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="kyma-system"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-trace-gateway"}[20m])) by (pod), 0.1)'
Expand Down Expand Up @@ -327,8 +327,8 @@ function get_result_and_cleanup_trace() {

function get_result_and_cleanup_metric() {
RESULT_TYPE="metric"
-QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_metric_points{service="telemetry-metric-gateway-metrics"}[20m])))'
-QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_metric_points{exporter=~"otlp/load-test.*"}[20m])))'
+QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_metric_points_total{service="telemetry-metric-gateway-metrics"}[20m])))'
+QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_metric_points_total{exporter=~"otlp/load-test.*"}[20m])))'
QUERY_QUEUE='query=avg(sum(otelcol_exporter_queue_size{service="telemetry-metric-gateway-metrics"}))'
QUERY_MEMORY='query=round(sum(avg_over_time(container_memory_working_set_bytes{namespace="kyma-system", container="collector"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-metric-gateway"}[20m])) by (pod) / 1024 / 1024)'
QUERY_CPU='query=round(sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="kyma-system"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-metric-gateway"}[20m])) by (pod), 0.1)'
Expand Down Expand Up @@ -358,8 +358,8 @@ function get_result_and_cleanup_metric() {

function get_result_and_cleanup_metricagent() {
RESULT_TYPE="metric"
-QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_metric_points{service="telemetry-metric-agent-metrics"}[20m])))'
-QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_metric_points{service=~"telemetry-metric-agent-metrics"}[20m])))'
+QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_metric_points_total{service="telemetry-metric-agent-metrics"}[20m])))'
+QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_metric_points_total{service=~"telemetry-metric-agent-metrics"}[20m])))'
QUERY_QUEUE='query=avg(sum(otelcol_exporter_queue_size{service="telemetry-metric-agent-metrics"}))'
QUERY_MEMORY='query=round(sum(avg_over_time(container_memory_working_set_bytes{namespace="kyma-system", container="collector"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-metric-agent"}[20m])) by (pod) / 1024 / 1024)'
QUERY_CPU='query=round(sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="kyma-system"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="kyma-system", workload="telemetry-metric-agent"}[20m])) by (pod), 0.1)'
Expand All @@ -385,8 +385,8 @@ function get_result_and_cleanup_metricagent() {

function get_result_and_cleanup_log_otel() {
RESULT_TYPE="log"
-QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_log_records{service=~"log-gateway-metrics"}[20m])))'
-QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_log_records{service=~"log-gateway-metrics"}[20m])))'
+QUERY_RECEIVED='query=round(sum(rate(otelcol_receiver_accepted_log_records_total{service=~"log-gateway-metrics"}[20m])))'
+QUERY_EXPORTED='query=round(sum(rate(otelcol_exporter_sent_log_records_total{service=~"log-gateway-metrics"}[20m])))'
QUERY_QUEUE='query=avg(sum(otelcol_exporter_queue_size{service=~"log-gateway-metrics"}))'
QUERY_MEMORY='query=round(sum(avg_over_time(container_memory_working_set_bytes{namespace="log-load-test", container="collector"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="log-load-test", workload="log-gateway"}[20m])) by (pod) / 1024 / 1024)'
QUERY_CPU='query=round(sum(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="log-load-test"}[20m]) * on(namespace,pod) group_left(workload) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{namespace="log-load-test", workload="log-gateway"}[20m])) by (pod), 0.1)'
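
The query changes in this script all follow one pattern: counter names gain a `_total` suffix unless compatibility mode suppresses it. A minimal Go sketch of that rule is shown here. The helper `rateQuery` is hypothetical and exists only for illustration; the load-test script itself hardcodes the suffixed names.

```go
package main

import "fmt"

// rateQuery is a hypothetical helper that builds a query parameter shaped
// like the ones in run-load-test.sh. With compatibility mode disabled (the
// default), the Collector exposes counters with a "_total" suffix, so the
// suffix is appended; with compatibility mode enabled, the old name is kept.
func rateQuery(counter, selector string, compatibilityMode bool) string {
	name := counter
	if !compatibilityMode {
		name += "_total"
	}

	return fmt.Sprintf("query=round(sum(rate(%s{%s}[20m])))", name, selector)
}

func main() {
	selector := `service="telemetry-trace-gateway-metrics"`

	// Default behavior after the upgrade: suffixed counter name.
	fmt.Println(rateQuery("otelcol_receiver_accepted_spans", selector, false))
	// Compatibility mode: the pre-0.119.0 counter name.
	fmt.Println(rateQuery("otelcol_receiver_accepted_spans", selector, true))
}
```

Gauges such as `otelcol_exporter_queue_size` are left unchanged in the script, so a helper like this would only apply to counters.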
2 changes: 1 addition & 1 deletion internal/images/images.go
Expand Up @@ -6,6 +6,6 @@ package images
const (
DefaultFluentBitExporterImage = "europe-docker.pkg.dev/kyma-project/prod/directory-size-exporter:v20250217-6a3cdc4a"
DefaultFluentBitImage = "europe-docker.pkg.dev/kyma-project/prod/external/fluent/fluent-bit:3.2.7"
-DefaultOTelCollectorImage = "europe-docker.pkg.dev/kyma-project/prod/kyma-otel-collector:0.118.0-main"
+DefaultOTelCollectorImage = "europe-docker.pkg.dev/kyma-project/prod/kyma-otel-collector:0.120.0-main"
DefaultSelfMonitorImage = "europe-docker.pkg.dev/kyma-project/prod/tpi/telemetry-self-monitor:3.2.0-825b449"
)
16 changes: 11 additions & 5 deletions internal/otelcollector/config/config.go
Expand Up @@ -56,25 +56,31 @@ type MetricExporter struct {
}

type PrometheusMetricExporter struct {
-Host string `yaml:"host"`
-Port int32 `yaml:"port"`
+Host string `yaml:"host"`
+Port int32 `yaml:"port"`
+WithoutScopeInfo bool `yaml:"without_scope_info,omitempty"`
+WithoutTypeSuffix bool `yaml:"without_type_suffix,omitempty"`
+WithoutUnits bool `yaml:"without_units,omitempty"`
}

type Logs struct {
Level string `yaml:"level"`
Encoding string `yaml:"encoding"`
}

-func DefaultService(pipelines Pipelines) Service {
+func DefaultService(pipelines Pipelines, compatibilityMode bool) Service {
telemetry := Telemetry{
Metrics: Metrics{
Readers: []MetricReader{
{
Pull: PullMetricReader{
Exporter: MetricExporter{
Prometheus: PrometheusMetricExporter{
-Host: fmt.Sprintf("${%s}", EnvVarCurrentPodIP),
-Port: ports.Metrics,
+Host: fmt.Sprintf("${%s}", EnvVarCurrentPodIP),
+Port: ports.Metrics,
+WithoutScopeInfo: compatibilityMode,
+WithoutTypeSuffix: compatibilityMode,
+WithoutUnits: compatibilityMode,
},
},
},
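
Because the three new fields are tagged with `omitempty`, they should only appear in the rendered Collector configuration when compatibility mode is enabled. The following sketch illustrates that effect. It uses `encoding/json` instead of a YAML library so that it stays within the standard library, on the assumption that the YAML encoder treats `omitempty` on `false` booleans the same way. The type name, the `MY_POD_IP` placeholder, and port `8888` are illustrative stand-ins for `EnvVarCurrentPodIP` and `ports.Metrics`, whose actual values are not shown in this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prometheusMetricExporter mirrors the struct in the diff, but with JSON
// tags so the sketch needs only the standard library.
type prometheusMetricExporter struct {
	Host              string `json:"host"`
	Port              int32  `json:"port"`
	WithoutScopeInfo  bool   `json:"without_scope_info,omitempty"`
	WithoutTypeSuffix bool   `json:"without_type_suffix,omitempty"`
	WithoutUnits      bool   `json:"without_units,omitempty"`
}

// render builds the exporter settings for the given mode and serializes
// them. Host and port are placeholder values chosen for this example.
func render(compatibilityMode bool) string {
	exporter := prometheusMetricExporter{
		Host:              "${MY_POD_IP}",
		Port:              8888,
		WithoutScopeInfo:  compatibilityMode,
		WithoutTypeSuffix: compatibilityMode,
		WithoutUnits:      compatibilityMode,
	}

	out, err := json.Marshal(exporter)
	if err != nil {
		panic(err)
	}

	return string(out)
}

func main() {
	fmt.Println(render(false)) // the three without_* keys are omitted
	fmt.Println(render(true))  // all three without_* keys are set to true
}
```

With this design, a Telemetry CR without the annotation produces the same exporter configuration as before the new fields were added, so the default rendering stays free of the `without_*` keys.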