-
Notifications
You must be signed in to change notification settings - Fork 75
feat: Add CpuAttributesSpanAppender #1283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /** | ||
| * Disables the cpu attributes for spans. See {@link CpuAttributesSpanAppender} for more | ||
| * information. Default = true. | ||
| */ | ||
| public OtelRumConfig disableCpuAttributes() { | ||
| includeCpuAttributes = false; | ||
| return this; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we can establish that there is a broad(er) desire for these attributes, I think it's safer for this to be opt-in (default disabled). Can we flip this around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Doohl for the contribution. I think this is a curious/interesting feature, and I'm excited to hear what others think about it and how users will leverage this data.
I have left a few comments, but think that this could proceed with some changes.
| * for | ||
| */ | ||
| class CpuAttributesSpanAppender( | ||
| private val cpuCores: Int = Runtime.getRuntime().availableProcessors(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess what? This value is not guaranteed to stay the same for the life of the runtime! I know, it's wild. https://developer.android.com/reference/java/lang/Runtime#availableProcessors()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yeah, unfortunately, this uncertainty can make the data harder to interpret without also sending along how many cores were used in the average calculation. For example, if the app is running along for some time with 8 cores, and then the number of cores drops to 4, the per-core reported "cpu average" reported value will double considerably and the user will have no indication about why.
To address this, one could imagine also sending the number of cores at span end and then the computation isn't required (it can be done in the backend, and the computed average is redundant).
This also opens up some other slightly stupider edge cases which are probably not worth thinking too deeply about....like what if the core count changes mid-span (hint: there's no trivial way of knowing!)...or the fact that you can't guarantee that every cpu/core is running with the same performance/frequency...etc.
This makes me think of an interesting feature possibility tho -- polling the cpu count and generating an event when that number changes. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess what? This value is not guaranteed to stay the same for the life of the runtime! I know, it's wild. developer.android.com/reference/java/lang/Runtime#availableProcessors()
TIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we had to dig into the native layer to get consistent info about cores. It's arguably not worth it...
| span.getAttribute(RumConstants.CPU_ELAPSED_TIME_START_KEY) ?: return | ||
| val endCpuTime = Process.getElapsedCpuTime() | ||
| val cpuTimeMs = (endCpuTime - startCpuTime).toDouble() | ||
| val spanDurationMs = (span.latencyNanos / 1_000_000).toDouble() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to favor existing fluent utility classes instead of manually doing the math with conversion factors.
| val spanDurationMs = (span.latencyNanos / 1_000_000).toDouble() | |
| val spanDurationMs = TimeUnit.NANOSECONDS.toMillis(span.latencyNanos).toDouble() |
| List<SpanData> spans = spanExporter.getFinishedSpanItems(); | ||
| assertThat(spans).hasSize(1); | ||
| assertThat(spans.get(0).getAttributes().asMap()) | ||
| .containsKey(longKey("process.cpu.elapsed_time_start")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would also help to
- verify that the other 2 attributes are also present
- verify that the end time is greater than or equal the start time
|
Android phone are similar to multi-tenant systems in that the app is not responsible for scheduling, and thus have no control on the cores it gets assigned or whether or not they are throttled by the OS. As well, the main thread is often blocked by binder calls, which means it's just waiting for the OS to respond, which would affect average utilization Given that, these metrics will be missing key aspects to contextualize why the numbers are the way they are, which would result in pretty noisy data. BTW, what is the plan of how this data will be consumed, and what problems do you think it'll find? |
fixes #1142
What is this change?
This PR implements a new SpanProcessor (
CpuAttributesSpanAppender) that appends the following attributes to spans:process.cpu.avg_utilization- The relative, average CPU utilization for the app process during the spanprocess.cpu_elapsed_time_start- The elapsed CPU time at the start of this span (in millis)process.cpu_elapsed_time_end- The elapsed CPU time at the end of this span (in millis)How was this tested?
Tested manually with the demo app. Here is an example of the telemetry generated:
{ "data": [ { "traceID": "1b720aae66b3959b06eb5335247ada5b", "spans": [ { "traceID": "1b720aae66b3959b06eb5335247ada5b", "spanID": "e53cbd939dce77af", "operationName": "Created", "references": [], "startTime": 1759468632716000, "duration": 46198, "tags": [ { "key": "otel.library.name", "type": "string", "value": "io.opentelemetry.lifecycle" }, { "key": "activity.name", "type": "string", "value": "AstronomyShopActivity" }, { "key": "last.screen.name", "type": "string", "value": "MainActivity" }, { "key": "network.connection.type", "type": "string", "value": "wifi" }, { "key": "process.cpu.avg_utilization", "type": "float64", "value": 37.5 }, { "key": "process.cpu.elapsed_time_end", "type": "int64", "value": 1827 }, { "key": "process.cpu.elapsed_time_start", "type": "int64", "value": 1758 }, { "key": "screen.name", "type": "string", "value": "AstronomyShopActivity" }, { "key": "session.id", "type": "string", "value": "43e73022b738dfbeb9aaa2674806fb38" }, { "key": "toolkit", "type": "string", "value": "jetpack compose" }, { "key": "span.kind", "type": "string", "value": "internal" }, { "key": "internal.span.format", "type": "string", "value": "otlp" } ], "logs": [ { "timestamp": 1759468632716095, "fields": [ { "key": "event", "type": "string", "value": "activityPreCreated" } ] }, { "timestamp": 1759468632724703, "fields": [ { "key": "event", "type": "string", "value": "activityCreated" } ] }, { "timestamp": 1759468632759650, "fields": [ { "key": "event", "type": "string", "value": "activityPostCreated" } ] }, { "timestamp": 1759468632760196, "fields": [ { "key": "event", "type": "string", "value": "activityPreStarted" } ] }, { "timestamp": 1759468632760275, "fields": [ { "key": "event", "type": "string", "value": "activityStarted" } ] }, { "timestamp": 1759468632760738, "fields": [ { "key": "event", "type": "string", "value": "activityPostStarted" } ] }, { "timestamp": 1759468632760962, "fields": [ { "key": "event", "type": "string", "value": "activityPreResumed" } ] }, { "timestamp": 1759468632760985, "fields": [ { "key": "event", "type": "string", "value": "activityResumed" } ] }, { "timestamp": 1759468632762171, "fields": [ { "key": "event", "type": "string", "value": "activityPostResumed" } ] } ], "processID": "p1", "warnings": null } ], "processes": { "p1": { "serviceName": "OpenTelemetryDemoApp", "tags": [ { "key": "device.manufacturer", "type": "string", "value": "Google" }, { "key": "device.model.identifier", "type": "string", "value": "sdk_gphone64_arm64" }, { "key": "device.model.name", "type": "string", "value": "sdk_gphone64_arm64" }, { "key": "os.description", "type": "string", "value": "Android Version 16 (Build BP22.250221.010 API level 36)" }, { "key": "os.name", "type": "string", "value": "Android" }, { "key": "os.type", "type": "string", "value": "linux" }, { "key": "os.version", "type": "string", "value": "16" }, { "key": "rum.sdk.version", "type": "string", "value": "0.16.0-alpha-SNAPSHOT" }, { "key": "service.version", "type": "string", "value": "1.0" }, { "key": "telemetry.sdk.language", "type": "string", "value": "java" }, { "key": "telemetry.sdk.name", "type": "string", "value": "opentelemetry" }, { "key": "telemetry.sdk.version", "type": "string", "value": "1.54.1" } ] } }, "warnings": null } ], "total": 0, "limit": 0, "offset": 0, "errors": null }