Skip to content

Conversation

@watson
Copy link
Collaborator

@watson watson commented Nov 12, 2025

What does this PR do?

  • Enforces a per-snapshot time budget for Dynamic Instrumentation debugger snapshots.
  • Default budget is 10ms; configurable via:
    • Env var: DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT_MS
    • Programmatic: tracer.init({ dynamicInstrumentation: { captureTimeoutMs }})
  • When the budget is exceeded, remaining values that are not already resolved or in the process of being resolved (awaiting a session.post promise), are marked with notCapturedReason: 'timeout'.
  • Enforce max concurrency when resolving object properties (previously objects would have their properties resolved in parallel, now we enforce a max concurrency of two)
  • Adds unit and integration tests to validate partial snapshots under tight budgets.

Motivation

Prevent snapshot collection from exceeding acceptable latency and resource usage in complex/large object graphs.

Additional Notes

Depends on DataDog/system-tests#5705

Not covered by this PR:

Copy link
Collaborator Author

watson commented Nov 12, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@watson watson self-assigned this Nov 12, 2025
@github-actions
Copy link

github-actions bot commented Nov 12, 2025

Overall package size

Self size: 13.24 MB
Deduped: 116.15 MB
No deduping: 131.17 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB | | @datadog/native-appsec | 10.3.0 | 20.73 MB | 20.74 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @datadog/pprof | 5.12.0 | 11.19 MB | 11.57 MB | | @opentelemetry/resources | 1.30.1 | 557.67 kB | 7.71 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.4 | 2.95 MB | 5.82 MB | | @datadog/wasm-js-rewriter | 5.0.1 | 2.82 MB | 3.55 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api-logs | 0.208.0 | 199.48 kB | 1.42 MB | | @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB | | jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB | | import-in-the-middle | 1.15.0 | 127.66 kB | 856.24 kB | | lru-cache | 10.4.3 | 804.3 kB | 804.3 kB | | @datadog/openfeature-node-server | 0.2.0 | 118.51 kB | 437.19 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | @isaacs/ttlcache | 2.1.1 | 90.58 kB | 90.58 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 7.0.5 | 63.38 kB | 63.38 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.3 | 23.74 kB | 23.74 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB | | escape-string-regexp | 5.0.0 | 3.66 kB | 3.66 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@codecov
Copy link

codecov bot commented Nov 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.59%. Comparing base (4ff4de8) to head (af0b05b).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6897      +/-   ##
==========================================
+ Coverage   83.83%   84.59%   +0.75%     
==========================================
  Files         506      505       -1     
  Lines       21359    21171     -188     
==========================================
+ Hits        17907    17910       +3     
+ Misses       3452     3261     -191     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link

pr-commenter bot commented Nov 12, 2025

Benchmarks

Benchmark execution time: 2025-11-15 05:56:51

Comparing candidate commit 495678c in PR branch watson/DEBUG-4727/debugger-snapshot-time-budget with baseline commit e098151 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1608 metrics, 62 unstable metrics.

@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from 3f58922 to 1992630 Compare November 12, 2025 14:52
@watson watson marked this pull request as ready for review November 12, 2025 15:20
@watson watson requested review from a team as code owners November 12, 2025 15:20
@watson watson requested review from BridgeAR and removed request for a team November 12, 2025 15:20
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from 1992630 to 03362e2 Compare November 12, 2025 15:23
@datadog-datadog-prod-us1

This comment has been minimized.

tylfin
tylfin previously approved these changes Nov 12, 2025
@watson watson marked this pull request as draft November 13, 2025 08:49
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch 2 times, most recently from 4f232ba to c5280f9 Compare November 13, 2025 10:01
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from c5280f9 to b9b7fa5 Compare November 13, 2025 13:31
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch 2 times, most recently from ddd5ec0 to bfd99a1 Compare November 14, 2025 10:27
@watson watson marked this pull request as ready for review November 14, 2025 10:29
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch 2 times, most recently from 65c4e4b to 4a970c8 Compare November 14, 2025 11:31
@watson watson requested a review from tylfin November 14, 2025 13:16
shatzi
shatzi previously approved these changes Nov 14, 2025
// This can be skipped and we can go directly to its children, of which
// there will always be exactly two, the first containing the key, and the
// second containing the value of this entry of the Map.
if (Object.hasOwn(value, timeBudgetSym)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can reduce the snapshot size by simply filter those items out and put noCaptureReason: 'timeout' on the map/collection object itself. same as size capture limit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I didn't like this bit either. I took a second look and found a better implementation that addresses your concern.

@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from 4a970c8 to 6f67548 Compare November 15, 2025 05:18
@watson watson requested a review from a team as a code owner November 15, 2025 05:18
@watson watson requested a review from shatzi November 15, 2025 05:18
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from 6f67548 to 495678c Compare November 15, 2025 05:41
Copy link
Collaborator

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the code, not the tests.


// Trigger the breakpoint once probe is successfully installed
function triggerBreakpoint (url) {
async function triggerBreakpoint (url) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually adds a single tick without any other benefit (besides marking the method as an async one).

Could you add JSDoc as well? :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is test code, so I optimize for readability rather than performance. But I can add the JSDoc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up adding types to the entirety of the debugger test utils, and the changes have become quite extensive, so I'll add those in follow-up PR to keep this one clean

maxCollectionSize,
maxFieldCount,
maxLength,
deadlineNs: start + BigInt(config.dynamicInstrumentation.captureTimeoutMs ?? 10) * 1_000_000n
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the default is already defined in the config. We could also create the BigInt in the config directly (and also define it as default). That seems best in my opinion.

Suggested change
deadlineNs: start + BigInt(config.dynamicInstrumentation.captureTimeoutMs ?? 10) * 1_000_000n
deadlineNs: start + config.dynamicInstrumentation.captureTimeoutMs * 1_000_000n

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean convert captureTimeoutMs to nanoseconds in the config? So we still take captureTimeoutMs as the init input, but compute a separate captureTimeoutNs at config time, which we reference here?

* @property {number} maxCollectionSize - The maximum size of a collection to include in the snapshot
* @property {number} maxFieldCount - The maximum number of properties on an object to include in the snapshot
* @property {bigint} deadlineNs - The deadline in nanoseconds compared to `process.hrtime.bigint()`
* @property {boolean} [deadlineReached=false] - Whether the deadline has been reached. Should not be set by the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it should not be set by a user, let us not document it here? And if I understand it correct, it is just a way to transport the information to other code parts. Is there any other way to transport that information?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't document it here, TypeScript will complain when we reference it, right?

But yes, it's just a way to transport the information out to the caller, so that when they pass in the same opts object again later in a subsequent call to getRuntimeObject, this information is retained, so that we don't have to re-compute it. I agree this is not a good programming pattern, but I felt it was the one that had the best performance for what we wanted. But I'd be happy to consider alternatives 😊

Another approach I considered was a context object that could be passed around. But since we already pass the opts object around all over code, I felt it was easier to just re-use that. But maybe we could make it more obvious by nesting it under a ctx property. So instead of deadlineReached being directly on the root of the opts object, it could be ctx.deadlineReached.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this in the latest commit - both the JSDoc and added the ctx property

Enforce a per-snapshot time budget. By default this budget is 10ms, but
can be modified by the experimental config, either as an environment
variable:

    DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT_MS=20

Or programatically:

    tracer.init({
      dynamicInstrumentation: {
        captureTimeoutMs: 20
      }
    })

When the budget is exceeded, remaining values that are not already
resolved, are marked with `notCapturedReason: 'timeout'`.
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch 3 times, most recently from 8b643eb to c5336bf Compare November 17, 2025 14:36
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from c5336bf to af0b05b Compare November 17, 2025 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants