Skip to content

javaagent Kotlin coroutines usage (with ktor client), potential bug with delay and contexts. #12611

Closed as not planned
@amccague

Description

@amccague

Whilst trying to get to grips on the expected usage of OTEL using the agent and coroutines I felt it was ambiguous the intended usage. And I'm not clear how and when the out-of-the-box javaagent behaviour kicks in. It appears like it's hard to escape peppering in the OTEL additions which looks easy for people to accidentally miss (which is not ideal!).

Let's assume this is always how the parent and root span is defined:

val span = tracer.spanBuilder("demo-client")
    .setNoParent()
    .startSpan()

And the Ktor HttpClient is defined like this, I'll omit the request prepare statement as it's not relevant (as far as I know):

HttpClient(Java) {
    install(ContentNegotiation) {
        jackson()
    }
}

Both of the following leads to new trace IDs, not using my defined span as the parent:

try {
    span.makeCurrent().use { // Presuming the scope should be closed before the span is ended?
        withContext(Dispatchers.IO) { // Expecting the agent to add the OTEL context agent here?
            post.execute { response ->
                response.body<DummyDto>()
            }
            post.execute { response ->
                response.body<DummyDto>()
            }
        }
    }
} finally {
    span.end()
}
try {
    span.makeCurrent().use { // The `withContext` isn't strictly required anyway - the ThreadLocal appears to not be propagated into the ktor-client scopes out of the box.
        post.execute { response ->
            response.body<DummyDto>()
        }
        post.execute { response ->
            response.body<DummyDto>()
        }
    }
} finally {
    span.end()
}

If I'm not mistaken this looks like the intended use, which implies there's nothing out of the box with coroutines:
Works as expected, traceIds are correct with this example

try {
    span.makeCurrent().use { // You would still want to load the ThreadLocal context, e.g. for loggers reading ThreadLocal MDC?
        withContext(span.asContextElement()) { // Could have used the OTEL context because the span was made current.
            post.execute { response ->
                response.body<DummyDto>()
            }
            post.execute { response ->
                response.body<DummyDto>()
            }
        }
    }
} finally {
    span.end()
}

In the prior example where the withContext(Dispatchers.IO) { .. } was used, whenever withContext is used we must + the coroutine context e.g.:
Works as expected

try {
    span.makeCurrent().use {
        withContext(Dispatchers.IO + Context.current().asContextElement()) {
            post.execute { response ->
                response.body<DummyDto>()
            }
            post.execute { response ->
                response.body<DummyDto>()
            }
        }
    }
} finally {
    span.end()
}

And to throw a spanner in the works if I add a delay in here it goes wrong again:
Does not work as expected

try {
    span.makeCurrent().use {
        withContext(Context.current().asContextElement()) {
            delay(1000)
            post.execute { response ->
                response.body<DummyDto>()
            }
            post.execute { response ->
                response.body<DummyDto>()
            }
        }
    }
} finally {
    span.end()
}

However if I add the Dispatchers.IO it repairs it:
Works as expected

try {
    span.makeCurrent().use {
        withContext(Dispatchers.IO + Context.current().asContextElement()) {
            delay(1000)
            post.execute { response ->
                response.body<DummyDto>()
            }
            post.execute { response ->
                response.body<DummyDto>()
            }
        }
    }
} finally {
    span.end()
}

I can also isolate the delay but it's getting very noisy at this point:
Works as expected

try {
    span.makeCurrent().use {
        withContext(Context.current().asContextElement()) {
            withContext(Context.current().asContextElement()) {
                delay(1000)
            }
            post.execute { response ->
                response.body<DummyDto>()
            }
            post.execute { response ->
                response.body<DummyDto>()
            }
        }
    }
} finally {
    span.end()
}

I'm hoping some of this usability is the result of some subtle bug or something I've missed.

Using 2.9.0 agent with

implementation(platform("io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom:2.9.0"))
implementation("io.opentelemetry:opentelemetry-api")
implementation("io.opentelemetry:opentelemetry-extension-kotlin")

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions