-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port span correlation #2251
Port span correlation #2251
Conversation
…aces selected for snapshotting.
…tests to reliably inject the snapshot profiling active span tracker into the OpenTelemetry SDK runtime.
…ker to be implemented as a Singleton.
…her or not to track it.
…tting class in favor of direct use of OpenTelemetry's IdGenerator.
…ContextBuilder obtained from Snapshotting.
…ace instances so to always have a trace ID associated even if span tracking is offline.
…to a parent span upon scope closing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class primarily exists to ease testing since ContextStorage.addWrapper
must be called prior to the ContextStorage
being initialized. Putting this behind a Supplier
offers tests a way to hook in (via the SettableContextStorageProvider
) without trying to coordinate the ContextStorage
initialization across every single test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early designs had ActiveSpanTracker
as a singleton object. That worked but was quite intricate to use correctly. Pushing the singleton behavior into a provider removes a lot of the complexities with a minor inconvenience in ScheduledExecutorStackTraceSampler
where the SpanTracker
is needed.
profiler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/ActiveSpanTracker.java
Outdated
Show resolved
Hide resolved
profiler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/ActiveSpanTracker.java
Outdated
Show resolved
Hide resolved
…ces that span multiple threads.
} | ||
|
||
private boolean doNotTrack(SpanContext spanContext) { | ||
return !spanContext.isSampled() || !registry.isRegistered(spanContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps should also add !spanContext.isValid()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a test for invalid spans but it did not require adding a explicit check. I think the isSampled
check doubles as a check for isValid
. I'll leave the test to make that scenario documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, Span.wrap(SpanContext.getInvalid())
creates a span where sampled flag is not set. I thought a bit more about this and since for registry.isRegistered(spanContext)
to pass span must be registered and since the register is called from a span processor it should be impossible for an invalid span to pass this method. Since SnapshotProfilingSpanProcessor
calls register only for the root span only root spans will be reported as active spans. Is this true? If it is perhaps it would make sense to spell this out somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that the trace registry only checks for the trace id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 24 in 7bb828a
private final Map<String, SpanContext> traceIds = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you answered your question, however:
Since
SnapshotProfilingSpanProcessor
calls register only for the root span only root spans will be reported as active spans. Is this true?
No. Its true that only a root span will trigger the trace registration process but once a trace is registered the ActiveSpanTracker
will keep track of child spans as they are added to the OpenTelemetry Context
.
Line 24 in 7bb828a
private final Map<String, SpanContext> traceIds = new ConcurrentHashMap<>(); could be a set as the map value is not used
Yes it can! I will submit a separate PR to update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you answered your question, however:
Since
SnapshotProfilingSpanProcessor
calls register only for the root span only root spans will be reported as active spans. Is this true?No. Its true that only a root span will trigger the trace registration process but once a trace is registered the
ActiveSpanTracker
will keep track of child spans as they are added to the OpenTelemetryContext
.
One minor -- but important -- detail I left out: trace selection only happens for root spans (see SnapshotVolumePropagator
) but trace registration happens for service entry spans (see SnapshotProfilingSpanProcessor
).
@@ -88,13 +91,19 @@ public void run() { | |||
try { | |||
Instant now = Instant.now(); | |||
ThreadInfo threadInfo = threadMXBean.getThreadInfo(threadId, Integer.MAX_VALUE); | |||
StackTrace stackTrace = StackTrace.from(now, samplingPeriod, traceId, threadInfo); | |||
SpanContext spanContext = retrieveActiveSpan(threadId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is race between taking the thread dump and asking the active span it is possible that the stack trace and the span won't match (stack trace won't include the method where the span is active). Is that an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, yes, but in practice I would suspect the risk is likely quite low. Low enough that I don't think we would want to block the trace thread(s) every time the snapshot profiler wants to take a sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine for now. Continuous profiler has the same issue, there the dump is taken by jfr and we can't really do anything to synchronize with that. If blocking is of concern then keep in mind that, as far as I know, taking the stack trace suspends all threads. You could reduce the blocking by having only one scheduled task that takes the dump for all threads that you care about in one operation.
profiler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/ActiveSpanTracker.java
Outdated
Show resolved
Hide resolved
var executor = Executors.newFixedThreadPool(2); | ||
try (var scope1 = executor.submit(attach(span1)).get(); | ||
var scope2 = executor.submit(attach(span2)).get()) { | ||
assertEquals( | ||
Optional.of(span1.getSpanContext()), spanTracker.getActiveSpan(scope1.getThreadId())); | ||
assertEquals( | ||
Optional.of(span2.getSpanContext()), spanTracker.getActiveSpan(scope2.getThreadId())); | ||
} finally { | ||
executor.shutdown(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it hard to reason why both actions should be executed on separate threads. Looks line an implementation detail of ThreadPoolExecutor. I guess it is fine as long as it works reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ActiveSpanTracker
records the thread ID using Thread.currentThread().getId()
so if both spans are attached to the OpenTelemetry Context
on the same thread only one would be considered the active thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me rephrase. If you change Executors.newFixedThreadPool(2)
to Executors.newFixedThreadPool(1)
the test will fail. As far as I can tell the fact that when using Executors.newFixedThreadPool(2)
both of the actions run on separate threads is an implementation detail of ThreadPoolExecutor
not a specified behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand now. You are technically correct, there aren't any documented guarantees that the two operations will actually be performed on separate threads so the test is relying on the first task not finishing prior to the second task actually being submitted. I've used this technique quite a few times over the years and it hasn't caused issues but I can add an explicit release step so the test ensures that the first submitted operation doesn't end to quickly to ensure that both threads in the thread pool much be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the test is relying on the first task not finishing prior to the second task actually being submitted
I think this is not necessary. Looking at ThreadPoolExecutor
I got the impression that it creates new threads for tasks as long as the thread pool has reached the requested size. So even when the first task is completed it will create a new thread for the second task to reach the pool size.
profiler/src/test/java/com/splunk/opentelemetry/profiler/snapshot/ActiveSpanTrackerTest.java
Outdated
Show resolved
Hide resolved
…eak reference keys in ActiveSpanTracker to ensure than map entries are automatically removed if there are a thread terminates without the OpenTelemetry scope being closed properly.
…tasks to ensure those tasks execute on multiple threads.
This PR add span correlation snapshot profiling stack traces. Span correlation is done by intercepting OpenTelemetry context changes via a
ContextStorage
that maps trace IDs toSpanContext
s as newContext
s are attached and detached.