Description
Whilst trying to get to grips on the expected usage of OTEL using the agent and coroutines I felt it was ambiguous the intended usage. And I'm not clear how and when the out-of-the-box javaagent behaviour kicks in. It appears like it's hard to escape peppering in the OTEL additions which looks easy for people to accidentally miss (which is not ideal!).
Let's assume this is always how the parent and root span is defined:
val span = tracer.spanBuilder("demo-client")
.setNoParent()
.startSpan()
And the Ktor HttpClient is defined like this, I'll omit the request prepare statement as it's not relevant (as far as I know):
HttpClient(Java) {
install(ContentNegotiation) {
jackson()
}
}
Both of the following leads to new trace IDs, not using my defined span as the parent:
try {
span.makeCurrent().use { // Presuming the scope should be closed before the span is ended?
withContext(Dispatchers.IO) { // Expecting the agent to add the OTEL context agent here?
post.execute { response ->
response.body<DummyDto>()
}
post.execute { response ->
response.body<DummyDto>()
}
}
}
} finally {
span.end()
}
try {
span.makeCurrent().use { // The `withContext` isn't strictly required anyway - the ThreadLocal appears to not be propagated into the ktor-client scopes out of the box.
post.execute { response ->
response.body<DummyDto>()
}
post.execute { response ->
response.body<DummyDto>()
}
}
} finally {
span.end()
}
If I'm not mistaken this looks like the intended use, which implies there's nothing out of the box with coroutines:
Works as expected, traceIds are correct with this example
try {
span.makeCurrent().use { // You would still want to load the ThreadLocal context, e.g. for loggers reading ThreadLocal MDC?
withContext(span.asContextElement()) { // Could have used the OTEL context because the span was made current.
post.execute { response ->
response.body<DummyDto>()
}
post.execute { response ->
response.body<DummyDto>()
}
}
}
} finally {
span.end()
}
In the prior example where the withContext(Dispatchers.IO) { .. }
was used, whenever withContext is used we must +
the coroutine context e.g.:
Works as expected
try {
span.makeCurrent().use {
withContext(Dispatchers.IO + Context.current().asContextElement()) {
post.execute { response ->
response.body<DummyDto>()
}
post.execute { response ->
response.body<DummyDto>()
}
}
}
} finally {
span.end()
}
And to throw a spanner in the works if I add a delay
in here it goes wrong again:
Does not work as expected
try {
span.makeCurrent().use {
withContext(Context.current().asContextElement()) {
delay(1000)
post.execute { response ->
response.body<DummyDto>()
}
post.execute { response ->
response.body<DummyDto>()
}
}
}
} finally {
span.end()
}
However if I add the Dispatchers.IO
it repairs it:
Works as expected
try {
span.makeCurrent().use {
withContext(Dispatchers.IO + Context.current().asContextElement()) {
delay(1000)
post.execute { response ->
response.body<DummyDto>()
}
post.execute { response ->
response.body<DummyDto>()
}
}
}
} finally {
span.end()
}
I can also isolate the delay
but it's getting very noisy at this point:
Works as expected
try {
span.makeCurrent().use {
withContext(Context.current().asContextElement()) {
withContext(Context.current().asContextElement()) {
delay(1000)
}
post.execute { response ->
response.body<DummyDto>()
}
post.execute { response ->
response.body<DummyDto>()
}
}
}
} finally {
span.end()
}
I'm hoping some of this usability is the result of some subtle bug or something I've missed.
Using 2.9.0 agent with
implementation(platform("io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom:2.9.0"))
implementation("io.opentelemetry:opentelemetry-api")
implementation("io.opentelemetry:opentelemetry-extension-kotlin")