feat: Implement process context publishing (OTEP-4719) #3460
scottgerring wants to merge 9 commits
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##             main   #3460    +/-   ##
=======================================
  Coverage   82.8%   82.9%
=======================================
  Files        130     132     +2
  Lines      27289   27558   +269
=======================================
+ Hits       22618   22851   +233
- Misses      4671    4707    +36
@@ -0,0 +1,139 @@
//! Process context sharing via memory-mapped regions.
I've pulled this up to represent the public API, swapping in either a linux-64 implementation or a no-op implementation based on the target.
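For readers unfamiliar with the pattern, here is a minimal sketch of how a public API can delegate to a target-specific module chosen at compile time. Everything in it is hypothetical: the `Attributes` alias, the `imp` module, `is_supported`, and the function signatures are illustrative and not taken from this PR, and "linux-64" is read here as 64-bit Linux, which is an assumption. The Linux branch is a stand-in that does no real mapping work.

```rust
// Hypothetical sketch: names and signatures are illustrative, not this PR's API.

/// Key/value resource attributes to share with external readers.
pub type Attributes = Vec<(String, String)>;

// Selected only on 64-bit Linux (an assumed reading of "linux-64").
#[cfg(all(target_os = "linux", target_pointer_width = "64"))]
mod imp {
    use super::Attributes;

    pub const SUPPORTED: bool = true;

    pub fn publish(_attrs: &Attributes) -> Result<(), String> {
        // A real implementation would encode the attributes and write them
        // into a named memory mapping here. This stand-in does nothing.
        Ok(())
    }

    pub fn unpublish() -> Result<(), String> {
        Ok(())
    }
}

// Every other target gets a no-op with the same signatures.
#[cfg(not(all(target_os = "linux", target_pointer_width = "64")))]
mod imp {
    use super::Attributes;

    pub const SUPPORTED: bool = false;

    pub fn publish(_attrs: &Attributes) -> Result<(), String> {
        Ok(())
    }

    pub fn unpublish() -> Result<(), String> {
        Ok(())
    }
}

// The public surface is identical on all targets; callers never need cfg.
pub fn publish(attrs: &Attributes) -> Result<(), String> {
    imp::publish(attrs)
}

pub fn unpublish() -> Result<(), String> {
    imp::unpublish()
}

pub fn is_supported() -> bool {
    imp::SUPPORTED
}

fn main() {
    let attrs = vec![("service.name".to_string(), "checkout".to_string())];
    publish(&attrs).expect("publish should succeed or be a no-op");
    unpublish().expect("unpublish should succeed or be a no-op");
    println!("process context supported on this target: {}", is_supported());
}
```

The point of the shape is that application code compiles unchanged on every target, and only the body behind `imp` differs.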
tonic-prost-build = { workspace = true }
tempfile = { workspace = true }
serde_json = { workspace = true }
serial_test = { workspace = true }
Makes it easier for testing

@cijothomas @lalitb it would be great to get alignment on where this should live first (see my rundown of the options in the PR text)!

Hey @yannham, do you mind signing the CLA now that I've taken your suggestion?

/easycla

@scottgerring - Thanks for calling out the placement tradeoff in the PR description. I agree that the process-context proto message itself is related to opentelemetry-proto, but I am less sure about adding publish/unpublish there as the public API. That API owns process-global runtime behavior, Linux mmap/memfd/prctl details, update semantics, and SDK Resource lifecycle. With thread-context sharing also being discussed, this area may grow beyond just process-level context.

Option 2 from the description, a small dedicated context-sharing crate, feels cleaner to me than expanding […]

One related question: should […]

I do not want to block this PR only on crate placement, but I think it is worth aligning on the intended long-term home before this becomes a public API we have to support.

Hey @lalitb, thanks for taking a look!

Agreed 100%, it is weird in the proto repo! I think there's an argument to be made it could go in […]

P.S. I've added two commits on top to show what it looks like as a standalone crate.

There's some discussion ongoing about this, but it does look like it's going to end up in the proto repo. I think in the meantime we could get it going with a local copy with heavy caveats ("this is going to move!!1") - what do you think?
Summary
Implements the publisher side of OTEP-4719 (Process Context Sharing) behind the process-context feature flag on opentelemetry-proto. This is helpful for signal correlation and, together with thread context sharing (OTEP-4947), will allow the Rust community to have request-correlated profiling support. This is a Linux-only mechanism by design, and the API that supports it provides a no-op implementation for other targets.

On Linux, this publishes SDK resource attributes to a named memory mapping (OTEL_CTX) so external readers (such as the OpenTelemetry eBPF Profiler) can discover process metadata without direct integration or process activity.

This is a precursor for OTEP-4947, which will enable per-thread resource attribution building on top of the process-level mechanism implemented here.

The implementation is adapted from the one used in libdatadog, a shared library we use underneath many of our own Datadog libraries.
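
To make the "external readers can discover process metadata" part more concrete, here is a rough reader-side sketch. It assumes the mapping's name shows up in the last column of `/proc/<pid>/maps`, which is how Linux reports named anonymous mappings (`[anon:NAME]`) and memfd-backed mappings (`/memfd:NAME (deleted)`); the exact label this PR's mapping carries is not stated here, so treat that as an assumption. `find_named_mapping` and `SAMPLE` are hypothetical names, and the addresses in the sample are made up. It only locates the address range and does not decode the payload.

```rust
use std::fs;

/// Scan the text of a `/proc/<pid>/maps` file for the first mapping whose
/// name column mentions `name`, returning its (start, end) addresses.
/// This is a plain substring match, so a file path containing `name`
/// would also match; a real reader would likely be stricter.
fn find_named_mapping(maps: &str, name: &str) -> Option<(u64, u64)> {
    maps.lines().find_map(|line| {
        // Columns: address-range perms offset dev inode [name...]
        let mut cols = line.split_whitespace();
        let range = cols.next()?;
        // Skip perms, offset, dev, inode; whatever remains is the name.
        let label = cols.skip(4).collect::<Vec<_>>().join(" ");
        if !label.contains(name) {
            return None;
        }
        let (start, end) = range.split_once('-')?;
        Some((
            u64::from_str_radix(start, 16).ok()?,
            u64::from_str_radix(end, 16).ok()?,
        ))
    })
}

// Made-up sample in the /proc/<pid>/maps format, used when /proc is absent.
const SAMPLE: &str = "\
55d0c8a00000-55d0c8a21000 r--p 00000000 08:01 1311 /usr/bin/app
7f3a1c000000-7f3a1c001000 rw-p 00000000 00:00 0 [anon:OTEL_CTX]
7ffd4b5e0000-7ffd4b601000 rw-p 00000000 00:00 0 [stack]
";

fn main() {
    // Inspect our own process if /proc is available, else use the sample.
    let maps = fs::read_to_string("/proc/self/maps").unwrap_or_else(|_| SAMPLE.to_string());
    match find_named_mapping(&maps, "OTEL_CTX") {
        Some((start, end)) => println!("found mapping at {:#x}-{:#x}", start, end),
        None => println!("no OTEL_CTX mapping in this process"),
    }
}
```

Because the lookup only needs read access to procfs, the observed process does not have to cooperate or even be running any code at the time, which is what "without direct integration or process activity" suggests.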
Usage
Decisions
Why opentelemetry-proto?
I reckon this probably belongs in opentelemetry-sdk, but because opentelemetry-proto depends on that for its transformation layer, we can't reference our proto types there (see #3045).

We have options:
I'd like to do (3) (and am happy to implement it myself), but as I poked this issue recently and heard nothing back, I am hesitant to put it in the dependency path for this.
Publishing mechanism
I've made this explicit - the user must explicitly use this API to publish - look at the updated example code. I'd prefer if this happened automatically, but because each signal stands alone and is provided its own resource, it is difficult to envisage how an automatic publishing mechanism should work. If anyone has any better ideas lmk.
Merge requirement checklist
CHANGELOG.md files updated for non-trivial, user-facing changes