
Attach values to CPU samples #63163

Draft
szegedi wants to merge 4 commits into nodejs:main from szegedi:feat/sample-context-profiler

Conversation

@szegedi
Contributor

@szegedi szegedi commented May 7, 2026

Adds a feature where users of the V8 CPU profiler can associate arbitrary values with CPU samples. Typical uses for such values are labels, trace IDs, and so on. We'll refer to these values as "sample context".

This work is essentially a reimplementation of Datadog's equivalent functionality in the https://github.com/DataDog/pprof-nodejs project, only implemented cleanly and without the workarounds we had to resort to there, because that code lives outside of both Node.js and V8.

It separates cleanly into two parts: support in the V8 profiler code, and a Node.js-specific implementation on top of it.

V8 code to support associating values with samples

I tried to keep this to a minimum, for the obvious reasons of easier future patch management and upstreaming potential. It essentially just allows an opaque void* to be attached to each sample. In the public API, this means that there's now:

  • a CpuProfilingOptions::sample_context_extractor with which you can specify a callback that takes the current isolate and returns a void*; this is invoked from the sampling code (either the Unix signal handler or the Windows thread-suspension mechanism).
  • a void* CpuProfile::GetSampleContext(index) with which the user can read the data attached to each sample.
  • As a final element, there's the exported LookupCpedMapAlignedPointer function. This is a signal-safe lookup implementation for the common pattern of using a JS Map in the CPED to store multiple independent pieces of data (which is what Node.js also does with AsyncContextFrame), with an object bound to a key in that map carrying the native void* pointer to embedder data in an internal field. This function lives in the V8 code because hosting it in Node.js code would be infeasible. TL;DR: including embedder-data-slot-inl.h, js-collection-inl.h, ordered-hash-table-inl.h, etc. in Node.js code pulls in tons of build infrastructure that only V8's own compilation is set up for. In V8 code, on the other hand, it has a very nice minimal implementation, as it can rely on internal signal-safe APIs that read hash table data.
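To illustrate the shape of this API, here is a stripped-down, stdlib-only C++ model. ToyIsolate, ToyProfiler, and SampleContextExtractor are invented stand-ins for the V8 types, not the PR's actual code; the real extractor additionally has to be async-signal-safe because it runs from the sampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for v8::Isolate; the real extractor receives the current isolate.
struct ToyIsolate {};

// Mirrors the shape of CpuProfilingOptions::sample_context_extractor:
// a callback invoked from the sampler that returns an opaque void*.
using SampleContextExtractor = void* (*)(ToyIsolate*);

class ToyProfiler {
 public:
  explicit ToyProfiler(SampleContextExtractor extractor)
      : extractor_(extractor) {}

  // Called once per sample tick (in V8 this runs inside the signal handler or
  // Windows thread-suspension sampler, where allocation is not allowed; this
  // toy ignores that constraint and just appends).
  void TakeSample(ToyIsolate* isolate) {
    contexts_.push_back(extractor_ ? extractor_(isolate) : nullptr);
  }

  // Mirrors the role of CpuProfile::GetSampleContext(index).
  void* GetSampleContext(size_t index) const { return contexts_[index]; }

  size_t sample_count() const { return contexts_.size(); }

 private:
  SampleContextExtractor extractor_;
  std::vector<void*> contexts_;  // one opaque context per captured sample
};

// Example extractor: hands back whatever the embedder published as the
// "current" context (a global here purely to keep the sketch self-contained).
static int g_current_label = 0;
static void* CurrentContext(ToyIsolate*) { return &g_current_label; }
```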

Node.js code

With the above V8 bits in place, the Node.js implementation itself becomes fairly straightforward.

  • We have new JS APIs on the existing SyncCPUProfileHandle that set the current sample context in the internal ALS: enterWithContext() for enterWith-like semantics and runWithContext() for run-like semantics.
  • We have a new stopAndCapture() method on the handle (because stop() is already taken) that returns an object tree basically mirroring the v8::CpuProfile structure, which also carries the sample contexts. The existing stop(), which eagerly serializes to JSON, is not suitable when you want to inspect the contexts; the object tree is also a nice object-oriented representation of the data upon which arbitrary serializers (I'm thinking pprof and/or OTLP) can be built later.
  • We have a new snapshot() method on the handle that returns the same data as stopAndCapture() but doesn't stop profiling; instead it internally starts a new profiling session immediately. This allows for the creation of gapless continuous-profiler solutions (like Datadog's).
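The difference between stopAndCapture() and snapshot() boils down to whether the session ends or is swapped for a fresh one with no gap. A minimal stdlib-only C++ sketch of that swap pattern (ToyHandle and ToySession are invented for illustration and are not the PR's actual classes):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy profiling session: accumulates samples (stand-ins for stacks+contexts).
struct ToySession {
  std::vector<int> samples;
};

class ToyHandle {
 public:
  void Record(int sample) { session_.samples.push_back(sample); }

  // stopAndCapture(): end profiling and hand back everything collected.
  ToySession StopAndCapture() {
    stopped_ = true;
    return std::move(session_);
  }

  // snapshot(): hand back what was collected so far, but immediately start a
  // fresh session, so no sample falls into a gap between sessions.
  ToySession Snapshot() {
    ToySession out = std::move(session_);
    session_ = ToySession{};  // the new session starts at once
    return out;
  }

  bool stopped() const { return stopped_; }

 private:
  ToySession session_;
  bool stopped_ = false;
};
```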

Internally, most of the interesting details pertain to the needs of signal-safe capture and storage of sample contexts. In C++ we consistently hold a reference to a sample context using a std::shared_ptr<v8::Global<v8::Value>>. When profiling starts, we allocate a vector of these large enough to hold contexts for all captured samples, as we are not allowed to allocate anything in a signal handler. Holding a Global reference to the value keeps it alive in V8. However, we cannot copy Globals directly from the CPED-bound context holders into our internal buffer; hence the std::shared_ptr, which we can safely copy from the context holder's internal field into our buffer.
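The key property relied on here is that copying a std::shared_ptr into a preallocated slot performs no allocation, only a refcount bump, while still keeping the pointee alive. A minimal sketch of that pattern, with a plain std::string standing in for v8::Global<v8::Value> (ContextBuffer and its members are invented names, not the PR's code):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in for std::shared_ptr<v8::Global<v8::Value>>: the pointee models the
// sample context value being kept alive.
using ContextRef = std::shared_ptr<std::string>;

class ContextBuffer {
 public:
  // Preallocate every slot up front: the signal handler may not allocate,
  // but assigning a shared_ptr into an existing slot only bumps a refcount.
  explicit ContextBuffer(size_t max_samples) : slots_(max_samples) {}

  // Called from the (simulated) sampler: a plain shared_ptr copy, no new/malloc.
  void StoreContext(size_t sample_index, const ContextRef& current) {
    slots_[sample_index] = current;
  }

  const ContextRef& GetContext(size_t i) const { return slots_[i]; }

 private:
  std::vector<ContextRef> slots_;
};
```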

Finally, we must not try to chase pointers from the CPED to our context holder while GC is in progress. For this reason, we register GC prologue and epilogue callbacks to observe GC, and furthermore capture and cache the current sample context in the prologue (when it is still safe to do so) and use it for all GC samples. This way, not even GC samples lose their context.
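A sketch of the prologue-caching idea, independent of V8 (GcAwareSampler is an invented name; the real code registers the callbacks through V8's GC prologue/epilogue hooks):

```cpp
#include <cassert>
#include <memory>
#include <string>

using ContextRef = std::shared_ptr<std::string>;

// Toy sampler that must not walk embedder data structures while GC runs.
class GcAwareSampler {
 public:
  // GC prologue: it is still safe to read the live context, so cache it now.
  void OnGcPrologue(const ContextRef& current) {
    cached_ = current;
    in_gc_ = true;
  }

  void OnGcEpilogue() { in_gc_ = false; }

  // Sampler: during GC, fall back to the context cached in the prologue
  // instead of chasing pointers through (possibly moving) heap objects.
  ContextRef SampleContext(const ContextRef& live) const {
    return in_gc_ ? cached_ : live;
  }

 private:
  ContextRef cached_;
  bool in_gc_ = false;
};
```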

Anyhow, as I said, this whole approach is pretty much what we have been using successfully in Datadog's pprof-nodejs for quite some time now. That implementation unfortunately had to resort to hijacking the PROF signal handler (and thus doesn't work on Windows), to reimplementing the internal V8 hashmap lookup from scratch (which is brittle as V8 evolves), and to timestamp bracketing to associate samples with the contexts in our buffer. None of these kludges are needed here, so this implementation is as clean as it gets, IMHO.

Review away :-)

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/security-wg
  • @nodejs/v8-update

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 7, 2026
@szegedi szegedi force-pushed the feat/sample-context-profiler branch 2 times, most recently from 96a4d45 to 1554b9f Compare May 7, 2026 12:05
szegedi added 4 commits May 7, 2026 14:20
Signed-off-by: Attila Szegedi <attila.szegedi@datadoghq.com>
…a as a structured tree as well as associating values with samples
@szegedi szegedi force-pushed the feat/sample-context-profiler branch from 1554b9f to d64a519 Compare May 7, 2026 12:20
