-
Notifications
You must be signed in to change notification settings - Fork 75
feat: Add CpuAttributesSpanAppender #1283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,57 @@ | ||||||
| /* | ||||||
| * Copyright The OpenTelemetry Authors | ||||||
| * SPDX-License-Identifier: Apache-2.0 | ||||||
| */ | ||||||
|
|
||||||
| package io.opentelemetry.android | ||||||
|
|
||||||
| import android.os.Process | ||||||
| import io.opentelemetry.android.common.RumConstants | ||||||
| import io.opentelemetry.context.Context | ||||||
| import io.opentelemetry.sdk.trace.ReadWriteSpan | ||||||
| import io.opentelemetry.sdk.trace.ReadableSpan | ||||||
| import io.opentelemetry.sdk.trace.internal.ExtendedSpanProcessor | ||||||
|
|
||||||
| /** | ||||||
| * A [SpanProcessor] that uses the experimental ExtendedSpanProcessor API to append OS process | ||||||
| * cpu statistics into span attributes. We establish 'cpu utilization average' to be: | ||||||
| * | ||||||
| * cpuUtilizationAvg = 100 * (cpuTimeMs / spanDurationMs) / number of CPU cores | ||||||
| * * cpuTimeMs is the time in milliseconds that the app process has taken in active CPU | ||||||
| * time | ||||||
| * * spanDurationMs is the total running time in milliseconds that the span has been active | ||||||
| * for | ||||||
| */ | ||||||
| class CpuAttributesSpanAppender( | ||||||
| private val cpuCores: Int = Runtime.getRuntime().availableProcessors(), | ||||||
| ) : ExtendedSpanProcessor { | ||||||
| override fun isStartRequired(): Boolean = true | ||||||
|
|
||||||
| override fun onEnd(span: ReadableSpan) {} | ||||||
|
|
||||||
| override fun isEndRequired(): Boolean = false | ||||||
|
|
||||||
| override fun isOnEndingRequired(): Boolean = true | ||||||
|
|
||||||
| override fun onStart( | ||||||
| parentContext: Context, | ||||||
| span: ReadWriteSpan, | ||||||
| ) { | ||||||
| val cputime = Process.getElapsedCpuTime() | ||||||
| span.setAttribute(RumConstants.CPU_ELAPSED_TIME_START_KEY, Process.getElapsedCpuTime()) | ||||||
| } | ||||||
|
|
||||||
| override fun onEnding(span: ReadWriteSpan) { | ||||||
| val startCpuTime = | ||||||
| span.getAttribute(RumConstants.CPU_ELAPSED_TIME_START_KEY) ?: return | ||||||
| val endCpuTime = Process.getElapsedCpuTime() | ||||||
| val cpuTimeMs = (endCpuTime - startCpuTime).toDouble() | ||||||
| val spanDurationMs = (span.latencyNanos / 1_000_000).toDouble() | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tend to favor existing fluent utility classes instead of manually doing the math with conversion factors.
Suggested change
|
||||||
|
|
||||||
| if (spanDurationMs > 0) { | ||||||
| val cpuUtilization = (cpuTimeMs / spanDurationMs) * 100.0 / cpuCores.toDouble() | ||||||
| span.setAttribute(RumConstants.CPU_AVERAGE_KEY, cpuUtilization) | ||||||
| } | ||||||
| span.setAttribute(RumConstants.CPU_ELAPSED_TIME_END_KEY, endCpuTime) | ||||||
| } | ||||||
| } | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,6 +7,7 @@ | |
|
|
||
| import androidx.annotation.NonNull; | ||
| import androidx.annotation.Nullable; | ||
| import io.opentelemetry.android.CpuAttributesSpanAppender; | ||
| import io.opentelemetry.android.ScreenAttributesSpanProcessor; | ||
| import io.opentelemetry.android.features.diskbuffering.DiskBufferingConfig; | ||
| import io.opentelemetry.android.internal.services.network.CurrentNetworkProvider; | ||
|
|
@@ -27,6 +28,7 @@ public class OtelRumConfig { | |
| private boolean generateSdkInitializationEvents = true; | ||
| private boolean includeScreenAttributes = true; | ||
| private boolean discoverInstrumentations = true; | ||
| private boolean includeCpuAttributes = true; | ||
| private DiskBufferingConfig diskBufferingConfig = DiskBufferingConfig.create(); | ||
| private final List<String> suppressedInstrumentations = new ArrayList<>(); | ||
|
|
||
|
|
@@ -66,11 +68,25 @@ public OtelRumConfig disableNetworkAttributes() { | |
| return this; | ||
| } | ||
|
|
||
| /** | ||
| * Disables the cpu attributes for spans. See {@link CpuAttributesSpanAppender} for more | ||
| * information. Default = true. | ||
| */ | ||
| public OtelRumConfig disableCpuAttributes() { | ||
| includeCpuAttributes = false; | ||
| return this; | ||
| } | ||
|
Comment on lines
+71
to
+78
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Until we can establish that there is a broad(er) desire for these attributes, I think it's safer for this to be opt-in (default disabled). Can we flip this around? |
||
|
|
||
| /** Returns true if runtime network attributes are enabled, false otherwise. */ | ||
| public boolean shouldIncludeNetworkAttributes() { | ||
| return includeNetworkAttributes; | ||
| } | ||
|
|
||
| /** Returns true if cpu attributes are enabled, false otherwise */ | ||
| public boolean shouldIncludeCpuAttributes() { | ||
| return includeCpuAttributes; | ||
| } | ||
|
|
||
| /** | ||
| * Disables the collection of events related to the initialization of the OTel Android SDK | ||
| * itself. Default = true. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| /* | ||
| * Copyright The OpenTelemetry Authors | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
|
|
||
| package io.opentelemetry.android | ||
|
|
||
| import android.os.Process | ||
| import io.mockk.every | ||
| import io.mockk.impl.annotations.MockK | ||
| import io.mockk.junit5.MockKExtension | ||
| import io.mockk.mockkStatic | ||
| import io.mockk.verify | ||
| import io.opentelemetry.android.common.RumConstants | ||
| import io.opentelemetry.api.common.AttributeKey | ||
| import io.opentelemetry.context.Context | ||
| import io.opentelemetry.sdk.trace.ReadWriteSpan | ||
| import org.junit.jupiter.api.BeforeEach | ||
| import org.junit.jupiter.api.Test | ||
| import org.junit.jupiter.api.extension.ExtendWith | ||
|
|
||
| @ExtendWith(MockKExtension::class) | ||
| class CpuAttributesSpanAppenderTest { | ||
| @MockK | ||
| private lateinit var mockSpan: ReadWriteSpan | ||
|
|
||
| @MockK | ||
| private lateinit var context: Context | ||
|
|
||
| val processor = CpuAttributesSpanAppender(cpuCores = 1) | ||
|
|
||
| @BeforeEach | ||
| fun setup() { | ||
| mockkStatic(Process::class) | ||
| every { mockSpan.setAttribute(any<AttributeKey<Double>>(), any<Double>()) } returns mockSpan | ||
| every { mockSpan.setAttribute(any<AttributeKey<Long>>(), any<Long>()) } returns mockSpan | ||
| } | ||
|
|
||
| @Test | ||
| fun `onStart should set the right attribute`() { | ||
| every { Process.getElapsedCpuTime() } returns 5L | ||
|
|
||
| processor.onStart(context, mockSpan) | ||
|
|
||
| verify { | ||
| mockSpan.setAttribute(RumConstants.CPU_ELAPSED_TIME_START_KEY, 5L) | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun `onEnding should set the right attributes if span has duration`() { | ||
| every { Process.getElapsedCpuTime() } returns 50L | ||
| every { | ||
| mockSpan.getAttribute(RumConstants.CPU_ELAPSED_TIME_START_KEY) | ||
| } returns 5L | ||
| // cpuTime = 45 | ||
|
|
||
| every { mockSpan.latencyNanos } returns 100L * 1_000_000 | ||
|
|
||
| // Span took 100ms, process was active for 45ms of that time. Therefore, expect 45% cpu | ||
| processor.onEnding(mockSpan) | ||
|
|
||
| verify { | ||
| mockSpan.setAttribute(RumConstants.CPU_AVERAGE_KEY, 45.0) | ||
| mockSpan.setAttribute(RumConstants.CPU_ELAPSED_TIME_END_KEY, 50L) | ||
| } | ||
|
|
||
| // With multiple cores, divide CPU average, expect 22.5% cpu | ||
| val moreCoresProcessor = CpuAttributesSpanAppender(cpuCores = 2) | ||
| moreCoresProcessor.onEnding(mockSpan) | ||
|
|
||
| verify { | ||
| mockSpan.setAttribute(RumConstants.CPU_AVERAGE_KEY, 22.5) | ||
| mockSpan.setAttribute(RumConstants.CPU_ELAPSED_TIME_END_KEY, 50L) | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| fun `onEnding should not set CPU average attribute if span has zero duration`() { | ||
| every { Process.getElapsedCpuTime() } returns 50L | ||
| every { | ||
| mockSpan.getAttribute(RumConstants.CPU_ELAPSED_TIME_START_KEY) | ||
| } returns 5L | ||
|
|
||
| every { mockSpan.latencyNanos } returns 0 | ||
|
|
||
| processor.onEnding(mockSpan) | ||
|
|
||
| verify(exactly = 0) { | ||
| mockSpan.setAttribute(RumConstants.CPU_AVERAGE_KEY, any<Double>()) | ||
| } | ||
|
|
||
| verify { | ||
| mockSpan.setAttribute(RumConstants.CPU_ELAPSED_TIME_END_KEY, 50L) | ||
| } | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -508,6 +508,31 @@ public void verifyGlobalAttrsForLogs() { | |
| .build()); | ||
| } | ||
|
|
||
| @Test | ||
| public void verifyCpuAttributesSpanProcessor() { | ||
| createAndSetServiceManager(); | ||
| OtelRumConfig config = buildConfig(); | ||
|
|
||
| OpenTelemetryRum rum = | ||
| OpenTelemetryRum.builder(application, config) | ||
| .addTracerProviderCustomizer( | ||
| (tracerProviderBuilder, app) -> | ||
| tracerProviderBuilder.addSpanProcessor( | ||
| SimpleSpanProcessor.create(spanExporter))) | ||
| .build(); | ||
|
|
||
| rum.getOpenTelemetry().getTracer("test").spanBuilder("test span").startSpan().end(); | ||
|
|
||
| await().atMost(Duration.ofSeconds(30)) | ||
| .untilAsserted( | ||
| () -> { | ||
| List<SpanData> spans = spanExporter.getFinishedSpanItems(); | ||
| assertThat(spans).hasSize(1); | ||
| assertThat(spans.get(0).getAttributes().asMap()) | ||
| .containsKey(longKey("process.cpu.elapsed_time_start")); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would also help to
|
||
| }); | ||
| } | ||
|
|
||
| private static Services createAndSetServiceManager() { | ||
| Services services = mock(Services.class); | ||
| when(services.getAppLifecycle()).thenReturn(mock(AppLifecycle.class)); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess what? This value is not guaranteed to stay the same for the life of the runtime! I know, it's wild. https://developer.android.com/reference/java/lang/Runtime#availableProcessors()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yeah, unfortunately, this uncertainty can make the data harder to interpret without also sending along how many cores were used in the average calculation. For example, if the app is running along for some time with 8 cores, and then the number of cores drops to 4, the per-core reported "cpu average" reported value will double considerably and the user will have no indication about why.
To address this, one could imagine also sending the number of cores at span end and then the computation isn't required (it can be done in the backend, and the computed average is redundant).
This also opens up some other slightly stupider edge cases which are probably not worth thinking too deeply about....like what if the core count changes mid-span (hint: there's no trivial way of knowing!)...or the fact that you can't guarantee that every cpu/core is running with the same performance/frequency...etc.
This makes me think of an interesting feature possibility tho -- polling the cpu count and generating an event when that number changes. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we had to dig into the native layer to get consistent info about cores. It's arguably not worth it...