HTTP Latency Histogram Metrics with Route Attribution#431
santigimeno wants to merge 6 commits into node-v24.x-nsolid-v6.x from

Conversation
Refactor NSolid metrics interval handling so configuration and updates are managed in a single place in JS, and the native API no longer exposes a separate setMetricsInterval entry point. This reduces duplicated logic between JS and C++ and makes interval changes easier to reason about.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Adds HTTP datapoint attributes and per-attribute exponential histogram collection: new types and attributes in the public API, propagation of HTTP metadata through the JS/native/agent paths, OTLP/gRPC histogram emission, integration tests, and Makefile/test infra updates.
Sequence Diagram

```mermaid
sequenceDiagram
  participant JS as JavaScript Layer
  participant Bind as Native Bindings
  participant Queue as Datapoint Queue
  participant Cache as Per-thread Cache
  participant Agg as Histogram Aggregator
  participant Agent as gRPC/OTLP Agent
  JS->>Bind: Fast/SlowPushClient/ServerBucket(method,status,addr,port,scheme,version,route)
  Bind->>Queue: enqueue ExtMetricsStor (metrics stor + http client/server points)
  Queue->>Cache: accumulate per-thread stor and http point attributes
  Cache->>Agg: on PTiles interval flush attribute hashmaps -> PointDataAttributesVector
  Agg->>Agent: build exponential histogram MetricData (delta) with attributes and emit
  Agent->>Collector: deliver metrics to collector / test listener
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/nsolid/nsolid_api.cc (1)
2095-2130: ⚠️ Potential issue | 🟠 Major

Convert HTTP samples to seconds before feeding the exponential histograms.

The same raw `value` still backs the millisecond summaries in `agents/otlp/src/otlp_common.cc` (lines 354-367), but the new OTel histograms are exported there with unit seconds (lines 420-425). Aggregating `value` unchanged here makes the histogram sum/min/max/buckets 1000× too large.

🩹 Proposed fix
```diff
-  auto aggregate_into_hashmap = [&value](
-      AttributesHashMap* hashmap,
-      MetricAttributes&& metric_attrs) {
+  auto aggregate_into_hashmap = [](
+      AttributesHashMap* hashmap,
+      MetricAttributes&& metric_attrs,
+      double sample_seconds) {
     auto factory = []() -> std::unique_ptr<Aggregation> {
       return std::make_unique<Base2ExponentialHistogramAggregation>();
     };
     auto* agg = hashmap->GetOrSetDefault(std::move(metric_attrs), factory);
-    agg->Aggregate(value);
+    agg->Aggregate(sample_seconds);
   };
@@
   aggregate_into_hashmap(
       http_client_hashmap_.get(),
-      BuildMetricAttributes(*http, type));
+      BuildMetricAttributes(*http, type),
+      value / 1e3);
@@
   aggregate_into_hashmap(
       http_server_hashmap_.get(),
-      BuildMetricAttributes(*http, type));
+      BuildMetricAttributes(*http, type),
+      value / 1e3);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nsolid/nsolid_api.cc` around lines 2095 - 2130, The HTTP datapoints are being aggregated into exponential histograms using the raw millisecond `value`, causing a 1000× scale error; update the places that call aggregate_into_hashmap for MetricsStream::Type::kHttpClient and kHttpServer so they convert `value` from milliseconds to seconds (e.g., compute a double seconds_value = value / 1000.0) and pass that converted value into the aggregation instead of the original `value` when invoking aggregate_into_hashmap (leave other uses of `value` for ms summaries unchanged); locate usages by the symbols aggregate_into_hashmap, BuildMetricAttributes, http_client_hashmap_, and http_server_hashmap_.
🧹 Nitpick comments (5)
test/addons/nsolid-statsdagent/binding.gyp (1)
10-11: Move these include directories to `test/addons/common.gypi`.

The OpenTelemetry include paths (`../../../deps/opentelemetry-cpp/api/include` and `../../../deps/opentelemetry-cpp/sdk/include`) are duplicated across four addon binding files (nsolid-log-hooks, nsolid-statsdagent, nsolid-eventloop-cmd, nsolid-dispatchqueue). Centralizing them in the shared config file will reduce the maintenance burden.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/addons/nsolid-statsdagent/binding.gyp` around lines 10 - 11, The two OpenTelemetry include paths '../../../deps/opentelemetry-cpp/api/include' and '../../../deps/opentelemetry-cpp/sdk/include' should be removed from this binding.gyp and added into the shared test/addons/common.gypi include_dirs list; update common.gypi to include those two paths (so build configs reference them centrally) and then remove the duplicated entries from the other addon binding.gyp files (nsolid-log-hooks, nsolid-eventloop-cmd, nsolid-dispatchqueue) so all addons rely on the shared common.gypi include_dirs instead of repeating the same paths here.

test/integrations/express/v5/package.json (1)
5-6: The package.json versions are overridden by the Makefile test setup, but consider aligning them for clarity.

The Makefile (`test-integrations-prereqs`) installs these fixtures using explicit pinned versions (`express@4`, `express@5`, `fastify@5`), not the floating ranges in package.json. Since the package.json versions are bypassed in CI, the stated concern about floating ranges doesn't impact test stability. However, this discrepancy is confusing: if the package.json is not meant to be used, either remove the dependencies or update them to match the Makefile's pinned versions (e.g., `"express": "5"` instead of `"^5.0.0"`). Tests already skip gracefully when dependencies are missing (via `existsSync` guards).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v5/package.json` around lines 5 - 6, The package.json dependency version for express is inconsistent with the Makefile's pinned install in test-integrations-prereqs; update the "dependencies" entry in package.json (the "express" key) to match the pinned version used by the Makefile (e.g., change "express": "^5.0.0" to "express": "5") or remove the dependency entirely if package.json is not intended to be authoritative for these fixtures so the declared versions no longer conflict with the test setup.

test/integrations/express/v4/test-http-route.mjs (1)
122-137: Consider adding a timeout to prevent indefinite test hangs.

The `metricsPromise` waits for all routes to be validated but has no timeout. If metrics never arrive (e.g., due to a bug or misconfiguration), the test will hang indefinitely.

⏱️ Proposed fix to add timeout

```diff
   // Listen for metrics
   const metricsPromise = new Promise((resolve, reject) => {
+    const timeout = setTimeout(() => {
+      reject(new Error(`Timeout: Only found routes ${[...foundRoutes].join(', ')}`));
+    }, 30000);
+
     grpcServer.on('metrics', (data) => {
       const found = checkHttpRouteAttribute(data, expectedRoutes);
       for (const route of found) {
         if (!foundRoutes.has(route)) {
           foundRoutes.add(route);
           console.log(`Validated route: ${route} (${foundRoutes.size}/${expectedRoutes.size})`);
         }
       }
       if (foundRoutes.size === expectedRoutes.size) {
         console.log('All routes validated!');
+        clearTimeout(timeout);
         resolve();
       }
     });
   });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v4/test-http-route.mjs` around lines 122 - 137, The metricsPromise can hang indefinitely if metrics never arrive; update the Promise created for metricsPromise (the grpcServer.on('metrics', ...) handler) to include a timeout (e.g., setTimeout) that rejects the promise after a reasonable interval, ensure you clear that timer and remove the 'metrics' listener when either resolved or rejected, and make sure the rejection message includes context (mentioning checkHttpRouteAttribute, foundRoutes and expectedRoutes) so the test fails fast instead of hanging.

test/integrations/express/v5/test-http-route.mjs (2)
122-137: Same timeout concern as the Express v4 test.

The `metricsPromise` lacks a timeout, which could cause indefinite hangs if metrics don't arrive as expected.

⏱️ Proposed fix to add timeout

```diff
   // Listen for metrics
   const metricsPromise = new Promise((resolve, reject) => {
+    const timeout = setTimeout(() => {
+      reject(new Error(`Timeout: Only found routes ${[...foundRoutes].join(', ')}`));
+    }, 30000);
+
     grpcServer.on('metrics', (data) => {
       const found = checkHttpRouteAttribute(data, expectedRoutes);
       for (const route of found) {
         if (!foundRoutes.has(route)) {
           foundRoutes.add(route);
           console.log(`Validated route: ${route} (${foundRoutes.size}/${expectedRoutes.size})`);
         }
       }
       if (foundRoutes.size === expectedRoutes.size) {
         console.log('All routes validated!');
+        clearTimeout(timeout);
         resolve();
       }
     });
   });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v5/test-http-route.mjs` around lines 122 - 137, The metricsPromise created around grpcServer.on('metrics') can hang forever; update the Promise logic in test-http-route.mjs (the metricsPromise that listens to 'metrics' and uses checkHttpRouteAttribute, foundRoutes, and expectedRoutes) to implement a configurable timeout: start a timer when the Promise is created that rejects the Promise with a clear error after e.g. 5-10s, and ensure you clear the timer and remove the 'metrics' listener on both resolve and reject to avoid leaks; keep existing route-found logic but add the timeout and proper cleanup.
31-67: Consider extracting shared test utilities.

The `getAttr` and `checkHttpRouteAttribute` functions are duplicated across the Express v4, Express v5, and likely Fastify v5 tests. Consider extracting them to a shared test helper module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v5/test-http-route.mjs` around lines 31 - 67, Extract the duplicated test helpers into a shared test utility module and import them into the Express v4, Express v5, and Fastify v5 tests: move the getAttr function and checkHttpRouteAttribute function logic into a new/shared helper (e.g., export getAttr and checkHttpRouteAttribute from the helper), replace the local implementations in test-http-route.mjs (and the other duplicated test files) with imports from that helper, and update tests to call the imported getAttr and checkHttpRouteAttribute to remove duplication while preserving existing behavior and signatures.
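A shared helper along these lines could serve all three tests; the module path and the OTLP JSON datapoint shape (`{ key, value: { stringValue } }`) are assumptions, not the PR's actual code:

```javascript
'use strict';

// Hypothetical shared helper (e.g. test/integrations/common/metrics-utils.mjs).
// Returns the string value of an OTLP attribute, or undefined if absent.
function getAttr(attributes, key) {
  const attr = (attributes || []).find((a) => a.key === key);
  return attr && attr.value ? attr.value.stringValue : undefined;
}

// Walk every resourceMetrics/scopeMetrics/metrics batch (not just index [0])
// and collect the http.route values that belong to expectedRoutes.
function checkHttpRouteAttribute(data, expectedRoutes) {
  const found = [];
  for (const rm of data.resourceMetrics || []) {
    for (const sm of rm.scopeMetrics || []) {
      for (const metric of sm.metrics || []) {
        const points = (metric.exponentialHistogram || {}).dataPoints || [];
        for (const dp of points) {
          const route = getAttr(dp.attributes, 'http.route');
          if (route !== undefined && expectedRoutes.has(route)) found.push(route);
        }
      }
    }
  }
  return found;
}
```

Iterating all batches also addresses the later review note about only inspecting `resourceMetrics[0].scopeMetrics[0]`.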
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Makefile`:
- Around line 1688-1698: The new integration setup targets
test-integrations-prereqs and test-integrations-prereqs-clean are not wired into
the normal lifecycle; update the Makefile so that jstest, test, and test-ci
depend on test-integrations-prereqs (so integration deps are installed before
running tests) and make clean depend on test-integrations-prereqs-clean (so
integration node_modules get removed on clean); ensure you add these targets as
prerequisites to the existing jstest, test, test-ci, and clean targets rather
than duplicating install/clean logic.
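The requested wiring might look like the following sketch; everything other than the two new `test-integrations-prereqs*` targets is assumed to already exist in the repo's Makefile:

```make
# Hypothetical prerequisite wiring; no install/clean logic is duplicated,
# the lifecycle targets just gain the new targets as dependencies.
jstest: test-integrations-prereqs

test test-ci: test-integrations-prereqs

clean: test-integrations-prereqs-clean
```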
In `@src/nsolid/nsolid_api.cc`:
- Around line 136-152: The server branch in the MetricsStream handling (when
type != MetricsStream::Type::kHttpClient) only sets url.scheme and http.route,
so server histogram points still lack host/port; update the server-side branch
to also insert {"server.address", http.server_address} when http.server_address
is non-empty and {"server.port", static_cast<int64_t>(http.server_port)} when
http.server_port != 0 (matching the client branch) in the same block that sets
url.scheme/http.route, and ensure these attributes propagate through
FastPushServerBucket / PushServerBucketImpl paths so
http.server.request.duration includes host/port.
- Around line 1276-1283: UpdateConfig() must not call gen_ptiles_timer_.start()
directly; instead store the new interval into gen_ptiles_interval (or a pending
variable) and enqueue a restart command to the loop thread using the existing
async queue pattern (e.g., push a command onto eloop_cmds_q_ or
on_config_string_q_). Implement the actual stop/start of gen_ptiles_timer_
(using gen_ptiles_timer_.stop() then gen_ptiles_timer_.start(gen_ptiles_cb_,
gen_ptiles_interval, gen_ptiles_interval)) inside the loop thread’s handler
(env_list_routine_ or the async callback) so the libuv handle is always
manipulated on thread_loop_. Ensure the queued command uses the loop thread
context and checks/uses the stored interval value.
In `@test/integrations/testcfg.py`:
- Around line 8-22: Ls currently returns nested test paths as a single string
(e.g., "fastify/v5/test-http-route.mjs") which prevents
SimpleTestConfiguration.ListTests (which builds candidates as current_path +
[t]) from matching via Contains; modify Ls so it returns path components instead
of a single joined string: compute rel_dir = os.path.relpath(root, path),
normalize it, and if rel_dir == '.' return the filename as a single-element list
[f], otherwise return rel_dir.split(os.sep) + [f] (or equivalent sequence of
path components) so ListTests' concatenation and Contains(path, tst) will work
correctly.
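The component-splitting step can be sketched as a small standalone helper; the function name and example paths are illustrative, not the real `testcfg.py` code:

```python
import os


def ls_path_components(rel_dir, filename):
    """Return the path components ListTests expects for one discovered test file.

    rel_dir is the test file's directory relative to the suite root, as
    produced by os.path.relpath(root, path).
    """
    rel_dir = os.path.normpath(rel_dir)
    if rel_dir == '.':
        # Test sits directly in the suite root.
        return [filename]
    # Nested test: split into components so Contains(path, tst) can match.
    return rel_dir.split(os.sep) + [filename]
```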
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 99eb215f-fea3-440a-a0dd-c8947a5772c4
📒 Files selected for processing (30)
- .gitignore
- Makefile
- agents/grpc/src/grpc_agent.cc
- agents/grpc/src/grpc_agent.h
- agents/otlp/src/otlp_common.cc
- agents/otlp/src/otlp_common.h
- deps/opentelemetry-cpp/otlp-http-exporter.gyp
- lib/_http_client.js
- lib/_http_server.js
- lib/internal/nsolid_diag.js
- lib/internal/nsolid_http_consts.js
- lib/nsolid.js
- src/nsolid.h
- src/nsolid/nsolid_api.cc
- src/nsolid/nsolid_api.h
- src/nsolid/nsolid_bindings.h
- src/nsolid/nsolid_trace.h
- test/addons/nsolid-dispatchqueue/binding.gyp
- test/addons/nsolid-eventloop-cmd/binding.gyp
- test/addons/nsolid-log-hooks/binding.gyp
- test/addons/nsolid-statsdagent/binding.gyp
- test/agents/test-grpc-http-histograms.mjs
- test/integrations/express/v4/package.json
- test/integrations/express/v4/test-http-route.mjs
- test/integrations/express/v5/package.json
- test/integrations/express/v5/test-http-route.mjs
- test/integrations/fastify/v5/package.json
- test/integrations/fastify/v5/test-http-route.mjs
- test/integrations/testcfg.py
- test/parallel/test-bootstrap-modules.js
9dc1558 to 4759af6
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-983: The loop processing dequeued ExtMetricsStor items
currently re-inserts metrics for threads that were removed (env_msg_cb_ erased
thr_metrics_cache_) because queued items can arrive after deletion; before
calling otlp::fill_env_metrics, otlp::fill_http_histograms or updating
thr_metrics_cache_ and thr_metrics_hist_prev_end_ts_ms_, check whether
ext_stor.stor.thread_id exists in env_metrics_map_ (or equivalent live-thread
set) and if not simply skip/purge this dequeued item (continue) so dead threads
are not exported or re-added to thr_metrics_cache_; apply this check around the
body handling ExtMetricsStor (using ExtMetricsStor, thr_metrics_msg_q_.dequeue,
thr_metrics_cache_, thr_metrics_hist_prev_end_ts_ms_, and env_metrics_map_ names
to locate the code).
In `@lib/internal/nsolid_diag.js`:
- Around line 381-385: The current handler in
dc.subscribe('http.server.response.finish', ...) sets request[nsolid_route_s] to
request.route.path which omits mounted router prefixes; change it to compose the
full route by using (request.baseUrl || '') + request.route.path (or equivalent)
and assign that to request[nsolid_route_s] so mounted routes like
app.use('/api', router) are preserved; ensure you handle undefined baseUrl and
avoid accidental duplicate slashes when concatenating, and keep the check that
skips setting when request[nsolid_route_s] already exists.
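The composition logic can be sketched standalone; `request` below mimics Express's `req` (`baseUrl` is `''` at the app level and `'/api'` under `app.use('/api', router)`), and `fullRoute` is a hypothetical name:

```javascript
'use strict';

// Compose the mounted prefix and the route path, handling an undefined
// baseUrl and avoiding duplicate slashes at the join point.
function fullRoute(request) {
  const base = request.baseUrl || '';
  const path = (request.route && request.route.path) || '';
  if (base.endsWith('/') && path.startsWith('/')) {
    return base + path.slice(1); // '/api/' + '/users' -> '/api/users'
  }
  return base + path;
}
```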
In `@lib/internal/nsolid_http_consts.js`:
- Around line 21-26: The kHttpVersionMap is missing the '2.0' entry so raw
httpVersion === "2.0" falls through to kHttpVersionOther; add a '2.0':
nsolid_consts.kHttpVersion2 mapping to kHttpVersionMap (next to the existing '2'
key) so both "2" and "2.0" resolve to nsolid_consts.kHttpVersion2; leave the
call sites (where httpVersion strings are passed directly into kHttpVersionMap
in lib/_http_server.js and lib/_http_client.js) unchanged.
In `@Makefile`:
- Line 315: The CI path `test-ci-js` is missing the new
`test-integrations-prereqs` dependency and can skip integration setup; update
the Makefile so `test-ci-js` depends on `test-integrations-prereqs` (or ensure
`$(CI_JS_SUITES)` target pulls that prereq) so it runs the same integration
prerequisites as `jstest` and `test-ci`; modify the `test-ci-js` target to
include `test-integrations-prereqs` alongside existing deps.
In `@src/nsolid/nsolid_api.cc`:
- Around line 1276-1281: The code writes gen_ptiles_interval in UpdateConfig()
without synchronization while gen_ptiles_cb_() (run on thread_loop_) reads it,
causing a data race; fix by making gen_ptiles_interval access thread-safe —
either change the variable to std::atomic<uint64_t> and use atomic loads/stores,
or protect both writers and readers with the same mutex used by thread_loop_ (or
a new mutex) so UpdateConfig() updates and gen_ptiles_cb_() reads the value
under the same lock; update all references (writes in UpdateConfig and reads in
gen_ptiles_cb_) to use the chosen synchronization primitive.
- Around line 124-153: BuildMetricAttributes currently only inserts "url.scheme"
for server metrics, so client metrics lack the URL scheme; update
BuildMetricAttributes (function BuildMetricAttributes) to insert {"url.scheme",
std::string(scheme)} in the client branch as well, where scheme =
HttpUrlSchemeToString(http.url_scheme). Then add a url_scheme parameter to
PushClientBucketImpl (and propagate it through SlowPushClientBucket and
FastPushClientBucket constructors/signatures) so the client-side code can pass
the real scheme instead of the hardcoded HttpUrlScheme::kOther, and update all
C++ callers accordingly; finally update the three JS call sites
(lib/internal/nsolid_diag.js and lib/_http_client.js locations that construct
client bucket instances) to pass the actual URL scheme value instead of kOther
so PushClientBucketImpl receives and forwards the correct url_scheme.
In `@src/nsolid/nsolid_api.h`:
- Around line 414-417: The hard cap of 100 for route-level histograms is too
low: replace the fixed AttributesHashMap(100) usages for http_client_hashmap_
and http_server_hashmap_ with a configurable limit, e.g., use
aggregation_config_->cardinality_limit_ (or fall back to a higher default like
2000) when constructing AttributesHashMap in the code that initializes
http_client_hashmap_ and http_server_hashmap_, mirroring the approach in
sync_metric_storage.cc so the cardinality can be tuned instead of being
hardcoded.
In `@test/integrations/express/v4/test-http-route.mjs`:
- Around line 69-171: The test's runTest function can leak servers on failures —
wrap the main test body (everything after starting grpcServer and Express server
up to awaiting metricsPromise) in a try/finally so cleanup always runs; replace
awaiting metricsPromise directly with a raced promise (e.g.,
Promise.race([metricsPromise, timeoutPromise]) with a short timeout) so the test
fails fast if metrics never arrive; in the finally block always destroy tracked
connections and close the Express server and grpcServer (references: runTest,
grpcServer, server, connections, metricsPromise) to ensure no handles are left
open.
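The fail-fast plus always-cleanup shape can be sketched as follows; `server`, `grpcServer`, and their `close(cb)` signatures are stand-ins for the test's real handles:

```javascript
'use strict';

// Race the metrics wait against a timeout, and tear everything down in
// finally so no sockets or servers leak on either path.
async function runWithCleanup({ server, grpcServer, connections, metricsPromise, timeoutMs = 30000 }) {
  let timer;
  const timeoutPromise = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Timed out waiting for metrics')), timeoutMs);
  });
  try {
    await Promise.race([metricsPromise, timeoutPromise]);
  } finally {
    clearTimeout(timer);
    for (const socket of connections) socket.destroy();
    await new Promise((resolve) => server.close(resolve));
    await new Promise((resolve) => grpcServer.close(resolve));
  }
}
```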
In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 31-66: The test currently only inspects
resourceMetrics[0].scopeMetrics[0].metrics so it can miss the expected histogram
when multiple resource or scope batches are present; update
checkHttpRouteAttribute to iterate over all resourceMetrics and, within each,
all scopeMetrics and their metrics (use nested for-loops over resourceMetrics
and scopeMetrics instead of indexing [0][0]) and then process each
scopeMetrics[i].metrics array as before (keep the existing metric name/type
checks and dataPoints handling and retain use of getAttr and foundRoutes logic).
- Around line 19-23: The current prereq guard uses existsSync(join(__dirname,
'node_modules')) and does a console.log plus process.exit(0), which masks
missing integration dependencies; replace this with a failing assertion or
non-zero exit so the test fails loudly (e.g., use assert.ok or throw new Error
when existsSync(...) is false) so the test run does not silently skip Express v5
coverage; update the block around existsSync, join and __dirname to assert the
prereq rather than calling process.exit(0).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2386a9f0-037b-485b-aeaa-0305b220b52f
📒 Files selected for processing (30)
.gitignoreMakefileagents/grpc/src/grpc_agent.ccagents/grpc/src/grpc_agent.hagents/otlp/src/otlp_common.ccagents/otlp/src/otlp_common.hdeps/opentelemetry-cpp/otlp-http-exporter.gyplib/_http_client.jslib/_http_server.jslib/internal/nsolid_diag.jslib/internal/nsolid_http_consts.jssrc/nsolid.hsrc/nsolid/nsolid_api.ccsrc/nsolid/nsolid_api.hsrc/nsolid/nsolid_bindings.hsrc/nsolid/nsolid_metrics_types.hsrc/nsolid/nsolid_trace.htest/addons/nsolid-dispatchqueue/binding.gyptest/addons/nsolid-eventloop-cmd/binding.gyptest/addons/nsolid-log-hooks/binding.gyptest/addons/nsolid-statsdagent/binding.gyptest/agents/test-grpc-http-histograms.mjstest/integrations/express/v4/package.jsontest/integrations/express/v4/test-http-route.mjstest/integrations/express/v5/package.jsontest/integrations/express/v5/test-http-route.mjstest/integrations/fastify/v5/package.jsontest/integrations/fastify/v5/test-http-route.mjstest/integrations/testcfg.pytest/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (14)
- .gitignore
- test/integrations/express/v4/package.json
- test/parallel/test-bootstrap-modules.js
- test/addons/nsolid-dispatchqueue/binding.gyp
- test/integrations/fastify/v5/package.json
- test/addons/nsolid-eventloop-cmd/binding.gyp
- test/integrations/testcfg.py
- test/addons/nsolid-statsdagent/binding.gyp
- lib/_http_client.js
- test/agents/test-grpc-http-histograms.mjs
- test/addons/nsolid-log-hooks/binding.gyp
- test/integrations/fastify/v5/test-http-route.mjs
- test/integrations/express/v5/package.json
- src/nsolid/nsolid_bindings.h
```cpp
ExtMetricsStor ext_stor;
while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
  auto& stor = ext_stor.stor;
  otlp::fill_env_metrics(metrics, stor, false);
  agent->thr_metrics_cache_.insert_or_assign(stor.thread_id, std::move(stor));
  uint64_t thread_id = stor.thread_id;
  uint64_t start_ts_ms = 0;
  auto prev_it = agent->thr_metrics_hist_prev_end_ts_ms_.find(thread_id);
  if (prev_it != agent->thr_metrics_hist_prev_end_ts_ms_.end()) {
    start_ts_ms = prev_it->second;
  }
  // Add exponential histogram metrics for HTTP latency.
  otlp::fill_http_histograms(
      metrics,
      stor,
      ext_stor.http_client_points,
      ext_stor.http_server_points,
      false,
      start_ts_ms);
  agent->thr_metrics_hist_prev_end_ts_ms_.insert_or_assign(
      thread_id,
      stor.timestamp);
  agent->thr_metrics_cache_.insert_or_assign(
      thread_id,
      CachedThreadMetrics{
          std::move(stor),
          start_ts_ms,
          ext_stor.http_client_points,
          ext_stor.http_server_points
      });
```
Drop queued samples for threads that are already gone.
env_msg_cb_() now erases thr_metrics_cache_ on thread removal, but thr_metrics_msg_q_ can already contain an ExtMetricsStor harvested just before that delete callback ran. If the delete async is processed first, this loop exports the sample and inserts the dead thread right back into thr_metrics_cache_. Skip or purge queued entries whose thread_id is no longer present in env_metrics_map_ before exporting/caching them.
```js
const kHttpVersionMap = {
  '__proto__': null,
  '1.0': nsolid_consts.kHttpVersion10,
  '1.1': nsolid_consts.kHttpVersion11,
  '2': nsolid_consts.kHttpVersion2,
};
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "== kHttpVersionMap call sites =="
rg -n -C2 'kHttpVersionMap\s*\[[^\]]+\]' lib
echo
echo "== Nearby HTTP version normalization logic =="
rg -n -C2 '\bhttpVersion(Major|Minor)?\b' lib/_http_client.js lib/_http_server.js
```

Repository: nodesource/nsolid

Length of output: 2830
🌐 Web query:

In the Node.js HTTP/2 compatibility API, what exact string does `httpVersion` expose on request/response objects: "2" or "2.0"?

💡 Result:

In Node.js's HTTP/2 Compatibility API, req.httpVersion / res.httpVersion exposes the string "2.0" (not "2"). [1][2]
Add '2.0' key to kHttpVersionMap for HTTP/2 compatibility.
The Node.js HTTP/2 Compatibility API exposes httpVersion as "2.0" on request/response objects. Without this key, HTTP/2 traffic will fall through to kHttpVersionOther, losing protocol attribution. Both call sites in lib/_http_server.js:331 and lib/_http_client.js:872 pass raw httpVersion strings directly to the map without normalization.
Required change

```diff
 const kHttpVersionMap = {
   '__proto__': null,
   '1.0': nsolid_consts.kHttpVersion10,
   '1.1': nsolid_consts.kHttpVersion11,
   '2': nsolid_consts.kHttpVersion2,
+  '2.0': nsolid_consts.kHttpVersion2,
 };
```
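The effect of the extra key can be checked with a standalone sketch; the constant values here are stand-ins for the real nsolid binding constants:

```javascript
'use strict';

// Stand-in values; the real constants come from the nsolid consts binding.
const nsolid_consts = {
  kHttpVersion10: 0,
  kHttpVersion11: 1,
  kHttpVersion2: 2,
  kHttpVersionOther: 3,
};

const kHttpVersionMap = {
  '__proto__': null,
  '1.0': nsolid_consts.kHttpVersion10,
  '1.1': nsolid_consts.kHttpVersion11,
  '2': nsolid_consts.kHttpVersion2,
  '2.0': nsolid_consts.kHttpVersion2, // HTTP/2 compat API reports "2.0"
};

// Mirror of the call sites: raw httpVersion string in, constant out.
const versionConst = (httpVersion) =>
  kHttpVersionMap[httpVersion] ?? nsolid_consts.kHttpVersionOther;
```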
```cpp
it = config.find("interval");
if (it != config.end() && !it->is_null()) {
  // If interval changes, next timer cb the timer is reset.
  uint64_t interval = it->get<uint64_t>();
  gen_ptiles_interval = interval;
}
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "# Declaration and all accesses to gen_ptiles_interval"
rg -n -C2 '\bgen_ptiles_interval\b' src/nsolid/nsolid_api.cc
echo
echo "# Timer ownership and callback context"
rg -n -C2 'env_list_routine_|gen_ptiles_cb_|UpdateConfig\(' src/nsolid/nsolid_api.cc
Repository: nodesource/nsolid
Length of output: 3240
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check for existing atomic usage in src/nsolid/nsolid_api.cc ==="
rg -n 'std::atomic|atomic_utils' src/nsolid/nsolid_api.cc | head -20
echo
echo "=== Check atomic_utils.h availability in src/nsolid/ ==="
fd -t f 'atomic_utils' src/nsolid/
echo
echo "=== Check mutex/lock patterns in UpdateConfig function ==="
sed -n '1226,1315p' src/nsolid/nsolid_api.cc | head -50
Repository: nodesource/nsolid
Length of output: 2102
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full UpdateConfig function starting at line 1226 ==="
sed -n '1226,1310p' src/nsolid/nsolid_api.cc
Repository: nodesource/nsolid
Length of output: 2641
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Verify gen_ptiles_cb_ is called on thread_loop_ ==="
sed -n '1738,1760p' src/nsolid/nsolid_api.cc
echo
echo "=== Verify timer callback context ==="
sed -n '1833,1845p' src/nsolid/nsolid_api.cc
Repository: nodesource/nsolid
Length of output: 1496
Synchronize gen_ptiles_interval across threads.
UpdateConfig() writes gen_ptiles_interval at line 1280 without holding any lock, while gen_ptiles_cb_() reads it at lines 1885 and 1887 on thread_loop_. This is a data race; on 32-bit builds the value can tear.
🔧 Minimal fix
-uint64_t gen_ptiles_interval = 5000;
+std::atomic<uint64_t> gen_ptiles_interval{5000};

-  uint64_t interval = it->get<uint64_t>();
-  gen_ptiles_interval = interval;
+  uint64_t interval = it->get<uint64_t>();
+  gen_ptiles_interval.store(interval, std::memory_order_relaxed);

-  if (gen_ptiles_interval != interval) {
+  uint64_t next = gen_ptiles_interval.load(std::memory_order_relaxed);
+  if (next != interval) {
     int er = envlist->gen_ptiles_timer_.start(
-        gen_ptiles_cb_, gen_ptiles_interval, gen_ptiles_interval);
+        gen_ptiles_cb_, next, next);
     CHECK_EQ(er, 0);
   }

Also applies to: 1882-1888
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/nsolid/nsolid_api.cc` around lines 1276 - 1281, The code writes
gen_ptiles_interval in UpdateConfig() without synchronization while
gen_ptiles_cb_() (run on thread_loop_) reads it, causing a data race; fix by
making gen_ptiles_interval access thread-safe — either change the variable to
std::atomic<uint64_t> and use atomic loads/stores, or protect both writers and
readers with the same mutex used by thread_loop_ (or a new mutex) so
UpdateConfig() updates and gen_ptiles_cb_() reads the value under the same lock;
update all references (writes in UpdateConfig and reads in gen_ptiles_cb_) to
use the chosen synchronization primitive.
UniqueAttributesHashMap
    http_client_hashmap_ = std::make_unique<AttributesHashMap>(100);
UniqueAttributesHashMap
    http_server_hashmap_ = std::make_unique<AttributesHashMap>(100);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
hdr="$(fd '^attributes_hashmap\.h$' deps/opentelemetry-cpp | head -n1)"
if [ -z "$hdr" ]; then
echo "attributes_hashmap.h not found" >&2
exit 1
fi
echo "== Constructor / size semantics =="
rg -n -C4 'class AttributesHashMap|AttributesHashMap\s*\(' "$hdr"
echo
echo "== Limit / overflow behavior =="
rg -n -C4 'overflow|max_size_|cardinality|GetOrSetDefault' "$hdr" deps/opentelemetry-cpp/sdk/src
echo
echo "== NSolid call sites =="
rg -n -C2 'AttributesHashMap\s*\(\s*100\s*\)' src agents
Repository: nodesource/nsolid
Length of output: 16762
Reconsider the cardinality limit for route-level HTTP histograms.
AttributesHashMap(100) is a hard cardinality cap, not preallocation. With {method, route, scheme, host, port} as the key, 100 distinct combinations per interval is easy to exceed. When exceeded, new series route to an overflow bucket (otel.metrics.overflow=true), which defeats the route-level breakdown. The OpenTelemetry default is 2000; either make this configurable via aggregation_config_->cardinality_limit_ (like sync_metric_storage.cc does) or raise the limit if route-level metrics are a core feature.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/nsolid/nsolid_api.h` around lines 414 - 417, The hard cap of 100 for
route-level histograms is too low: replace the fixed AttributesHashMap(100)
usages for http_client_hashmap_ and http_server_hashmap_ with a configurable
limit, e.g., use aggregation_config_->cardinality_limit_ (or fall back to a
higher default like 2000) when constructing AttributesHashMap in the code that
initializes http_client_hashmap_ and http_server_hashmap_, mirroring the
approach in sync_metric_storage.cc so the cardinality can be tuned instead of
being hardcoded.
// Skip test if dependencies not installed
if (!existsSync(join(__dirname, 'node_modules'))) {
  console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
  process.exit(0);
}
Fail fast when integration deps are missing.
This turns a broken integration setup into a green run, so CI can silently stop exercising the Express v5 coverage added in this PR. Assert the prereq instead of exiting with 0.
Suggested change
-// Skip test if dependencies not installed
-if (!existsSync(join(__dirname, 'node_modules'))) {
-  console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
-  process.exit(0);
-}
+assert.ok(
+  existsSync(join(__dirname, 'node_modules')),
+  'Missing integration deps. Run "make test-integrations-prereqs" first.',
+);

Based on learnings, the nsolid test suite expects prerequisites to be prepared ahead of time rather than silently skipped with fs.existsSync guards.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/integrations/express/v5/test-http-route.mjs` around lines 19 - 23, The
current prereq guard uses existsSync(join(__dirname, 'node_modules')) and does a
console.log plus process.exit(0), which masks missing integration dependencies;
replace this with a failing assertion or non-zero exit so the test fails loudly
(e.g., use assert.ok or throw new Error when existsSync(...) is false) so the
test run does not silently skip Express v5 coverage; update the block around
existsSync, join and __dirname to assert the prereq rather than calling
process.exit(0).
  }
});

dc.subscribe('tracing:fastify.request.handler:start', ({ request, route }) => {
shouldn't we map it per-library? express, fastify, ...
4759af6 to 760c373 (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (8)
Makefile (1)

608-608: ⚠️ Potential issue | 🟠 Major

Keep the JS-only CI path in sync with the new integration prereqs.

Line 608 wires test-ci, but test-ci-js still doesn't depend on test-integrations-prereqs. On a clean worker, the new Express/Fastify route tests can hit their node_modules skip path instead of exercising the coverage.

🧩 Suggested fix

-test-ci-js: | clear-stalled ## Build and test JavaScript with building anything else.
+test-ci-js: | clear-stalled test-integrations-prereqs ## Build and test JavaScript with building anything else.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` at line 608, The JS-only CI target test-ci-js is missing the new integration prereq so update the Makefile to add test-integrations-prereqs to test-ci-js's dependency list (mirror what you did for test-ci); locate the test-ci-js target in the Makefile and append test-integrations-prereqs to its prerequisites so the Express/Fastify route tests run real installs instead of hitting the node_modules skip path.

lib/internal/nsolid_http_consts.js (1)

21-26: ⚠️ Potential issue | 🟠 Major

Map "2.0" to the HTTP/2 enum as well.

The HTTP/2 compatibility API reports httpVersion as "2.0". Without that key, callers that pass the raw version string fall back to kHttpVersionOther and lose protocol attribution.

🧩 Suggested fix

 const kHttpVersionMap = {
   '__proto__': null,
   '1.0': nsolid_consts.kHttpVersion10,
   '1.1': nsolid_consts.kHttpVersion11,
   '2': nsolid_consts.kHttpVersion2,
+  '2.0': nsolid_consts.kHttpVersion2,
 };

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/internal/nsolid_http_consts.js` around lines 21 - 26, kHttpVersionMap is missing the "2.0" key so raw httpVersion values of "2.0" map to kHttpVersionOther; add a "2.0": nsolid_consts.kHttpVersion2 entry to kHttpVersionMap (the object defined as kHttpVersionMap) so both "2" and "2.0" resolve to the HTTP/2 enum value (nsolid_consts.kHttpVersion2).

lib/internal/nsolid_diag.js (1)
381-385: ⚠️ Potential issue | 🟠 Major

Compose the mounted prefix into Express http.route.

request.route.path is only the leaf pattern. With mounted routers, Line 384 drops request.baseUrl, so different handlers can collapse into the same route bucket.

🧩 Suggested fix

-      request[nsolid_route_s] = request.route.path;
+      request[nsolid_route_s] =
+        `${request.baseUrl || ''}${request.route.path}`;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/internal/nsolid_diag.js` around lines 381 - 385, The current dc.subscribe handler stores only request.route.path into the nsolid_route_s symbol, losing mounted router prefixes from request.baseUrl; update the handler in dc.subscribe('http.server.response.finish', ...) so it composes the full route by prepending request.baseUrl (if present) to request.route.path before assigning request[nsolid_route_s], e.g., combine (request.baseUrl || '') with request.route.path and normalize to avoid duplicate slashes so mounted routes are preserved in the unified nsolid_route_s value.

src/nsolid/nsolid_api.cc (2)
1276-1281: ⚠️ Potential issue | 🔴 Critical

Synchronize gen_ptiles_interval across threads.

Line 1280 writes this global from EnvList::UpdateConfig(), while Lines 1885-1887 read it on thread_loop_ without any lock or atomic. That is a real C++ data race, and on 32-bit builds the value can tear.

Also applies to: 1882-1888
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nsolid/nsolid_api.cc` around lines 1276 - 1281, The global gen_ptiles_interval is written in EnvList::UpdateConfig() and read in thread_loop_, causing a data race and potential tearing; make access thread-safe by either changing gen_ptiles_interval to std::atomic<uint64_t> and using atomic load/store where EnvList::UpdateConfig() assigns it and where thread_loop_ reads it, or protect both the writer in EnvList::UpdateConfig() and the readers in thread_loop_ with the same mutex; update all references to use the chosen synchronization (atomic methods or lock/unlock) so both update and read sites consistently use the same mechanism.
136-146: ⚠️ Potential issue | 🟠 Major

Propagate the client URL scheme end-to-end.

Line 144 only emits url.scheme in the server branch, and Line 2452 still hardcodes HttpUrlScheme::kOther for client buckets. HTTP and HTTPS client requests therefore collapse into the same http.client.request.duration series.

Also applies to: 2440-2456
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nsolid/nsolid_api.cc` around lines 136 - 146, The client-side code is not propagating the HTTP URL scheme: ensure you insert url.scheme for client requests the same way as server requests by using HttpUrlSchemeToString(http.url_scheme) and attrs.insert({"url.scheme", std::string(scheme)}) when handling client metrics (MetricsStream::Type::kHttpClient branch or the client-bucket creation path), and when creating client buckets replace the hardcoded HttpUrlScheme::kOther with the actual http.url_scheme so that http.client.request.duration series distinguish HTTP vs HTTPS.

test/integrations/express/v5/test-http-route.mjs (2)
19-23: ⚠️ Potential issue | 🟡 Minor

Fail fast when the integration deps are missing.

Exiting 0 here turns a broken setup into a green run, so CI can silently stop exercising the Express v5 route coverage added in this PR.

Minimal fix

-// Skip test if dependencies not installed
-if (!existsSync(join(__dirname, 'node_modules'))) {
-  console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
-  process.exit(0);
-}
+assert.ok(
+  existsSync(join(__dirname, 'node_modules')),
+  'Missing integration deps. Run "make test-integrations-prereqs" first.',
+);

Based on learnings, the nsolid test suite expects prerequisites to be prepared ahead of time rather than silently skipped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v5/test-http-route.mjs` around lines 19 - 23, The test currently checks for the presence of node_modules using existsSync(join(__dirname, 'node_modules')) and calls process.exit(0) to skip when missing; change this to fail fast by exiting with a non-zero status (e.g., process.exit(1)) or throw an Error instead so CI marks the run as failing when integration dependencies are not installed; update the branch that uses existsSync/join/__dirname to call process.exit(1) (or throw new Error('Missing integration prerequisites: run "make test-integrations-prereqs"')) and ensure the log message remains descriptive.
31-66: ⚠️ Potential issue | 🟡 Minor

Walk every OTLP resource/scope batch here.

This helper only inspects resourceMetrics[0].scopeMetrics[0], so it will miss the expected histogram when another resource or scope batch is emitted first.
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v5/test-http-route.mjs` around lines 31 - 66, The helper checkHttpRouteAttribute only inspects resourceMetrics[0].scopeMetrics[0], which misses metrics in other resource/scope batches; update checkHttpRouteAttribute to loop over all resourceMetrics and for each resource loop over its scopeMetrics before iterating metrics, keeping existing logic that filters metric.name === 'http.server.request.duration' and metric.data === 'exponentialHistogram', then process metric.exponentialHistogram.dataPoints, use getAttr to find 'http.route' and add unique route strings into foundRoutes (use the same expectedRoutes and foundRoutes checks).agents/grpc/src/grpc_agent.cc (1)
955-983: ⚠️ Potential issue | 🟠 Major

Drop queued samples for threads that are already gone.

env_msg_cb_() now erases the cache on deletion, but this loop will still export and reinsert any ExtMetricsStor that was already queued for the same thread_id. That resurrects dead threads in thr_metrics_cache_ and can leak stale histogram points into later metrics responses.

Minimal fix

 while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
   auto& stor = ext_stor.stor;
+  if (agent->env_metrics_map_.find(stor.thread_id) ==
+      agent->env_metrics_map_.end()) {
+    continue;
+  }
   otlp::fill_env_metrics(metrics, stor, false);
   uint64_t thread_id = stor.thread_id;
Verify each finding against the current code and only fix it if needed. In `@agents/grpc/src/grpc_agent.cc` around lines 955 - 983, The loop must drop queued ExtMetricsStor samples for thread_ids that were deleted (to avoid resurrecting them); after dequeuing ext_stor and extracting thread_id, check whether agent->thr_metrics_cache_.contains(thread_id) (or find(thread_id) != end()) and if it is not present, continue (skip processing and do not insert into thr_metrics_hist_prev_end_ts_ms_ or thr_metrics_cache_). Update the block around agent->thr_metrics_msg_q_.dequeue/auto& stor to bail out early for missing thread IDs so we never reinsert or update CachedThreadMetrics for dead threads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/internal/nsolid_diag.js`:
- Around line 184-190: The negotiated protocol (h1 vs 2) captured in the
undici:client:connected handler is only stored on the span but not propagated to
requests, causing nsolidApi.pushClientBucket in undici:request:headers to always
use kHttpVersion11; update the undici:request:create path to copy the negotiated
protocol (e.g. set request.__nsolidNegotiatedHttpVersion or similar) onto the
request object when the client connected handler determines it, then read that
property in the undici:request:headers handler and pass the corresponding
protocol constant (instead of kHttpVersion11) into nsolidApi.pushClientBucket so
HTTP/2 requests are categorized correctly.
In `@test/integrations/fastify/v5/test-http-route.mjs`:
- Around line 110-125: The metrics wait currently never times out and only
closes servers after resolve; wrap the existing metricsPromise (the code that
listens on grpcServer.on('metrics'...) and resolves when foundRoutes.size ===
expectedRoutes.size) in a Promise.race against a timeout promise that rejects
after a sensible timeout, and ensure you await the raced promise inside a
try/catch/finally where the finally always calls fastify.close() and
grpcServer.close() (and handles errors from close). Apply the same pattern to
the second similar block around lines 145-151 so both metric waits will timeout
and always teardown the servers.
---
Duplicate comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-983: The loop must drop queued ExtMetricsStor samples for
thread_ids that were deleted (to avoid resurrecting them); after dequeuing
ext_stor and extracting thread_id, check whether
agent->thr_metrics_cache_.contains(thread_id) (or find(thread_id) != end()) and
if it is not present, continue (skip processing and do not insert into
thr_metrics_hist_prev_end_ts_ms_ or thr_metrics_cache_). Update the block around
agent->thr_metrics_msg_q_.dequeue/auto& stor to bail out early for missing
thread IDs so we never reinsert or update CachedThreadMetrics for dead threads.
In `@lib/internal/nsolid_diag.js`:
- Around line 381-385: The current dc.subscribe handler stores only
request.route.path into the nsolid_route_s symbol, losing mounted router
prefixes from request.baseUrl; update the handler in
dc.subscribe('http.server.response.finish', ...) so it composes the full route
by prepending request.baseUrl (if present) to request.route.path before
assigning request[nsolid_route_s], e.g., combine (request.baseUrl || '') with
request.route.path and normalize to avoid duplicate slashes so mounted routes
are preserved in the unified nsolid_route_s value.
In `@lib/internal/nsolid_http_consts.js`:
- Around line 21-26: kHttpVersionMap is missing the "2.0" key so raw httpVersion
values of "2.0" map to kHttpVersionOther; add a "2.0":
nsolid_consts.kHttpVersion2 entry to kHttpVersionMap (the object defined as
kHttpVersionMap) so both "2" and "2.0" resolve to the HTTP/2 enum value
(nsolid_consts.kHttpVersion2).
In `@Makefile`:
- Line 608: The JS-only CI target test-ci-js is missing the new integration
prereq so update the Makefile to add test-integrations-prereqs to test-ci-js's
dependency list (mirror what you did for test-ci); locate the test-ci-js target
in the Makefile and append test-integrations-prereqs to its prerequisites so the
Express/Fastify route tests run real installs instead of hitting the
node_modules skip path.
In `@src/nsolid/nsolid_api.cc`:
- Around line 1276-1281: The global gen_ptiles_interval is written in
EnvList::UpdateConfig() and read in thread_loop_, causing a data race and
potential tearing; make access thread-safe by either changing
gen_ptiles_interval to std::atomic<uint64_t> and using atomic load/store where
EnvList::UpdateConfig() assigns it and where thread_loop_ reads it, or protect
both the writer in EnvList::UpdateConfig() and the readers in thread_loop_ with
the same mutex; update all references to use the chosen synchronization (atomic
methods or lock/unlock) so both update and read sites consistently use the same
mechanism.
- Around line 136-146: The client-side code is not propagating the HTTP URL
scheme: ensure you insert url.scheme for client requests the same way as server
requests by using HttpUrlSchemeToString(http.url_scheme) and
attrs.insert({"url.scheme", std::string(scheme)}) when handling client metrics
(MetricsStream::Type::kHttpClient branch or the client-bucket creation path),
and when creating client buckets replace the hardcoded HttpUrlScheme::kOther
with the actual http.url_scheme so that http.client.request.duration series
distinguish HTTP vs HTTPS.
In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 19-23: The test currently checks for the presence of node_modules
using existsSync(join(__dirname, 'node_modules')) and calls process.exit(0) to
skip when missing; change this to fail fast by exiting with a non-zero status
(e.g., process.exit(1)) or throw an Error instead so CI marks the run as failing
when integration dependencies are not installed; update the branch that uses
existsSync/join/__dirname to call process.exit(1) (or throw new Error('Missing
integration prerequisites: run "make test-integrations-prereqs"')) and ensure
the log message remains descriptive.
- Around line 31-66: The helper checkHttpRouteAttribute only inspects
resourceMetrics[0].scopeMetrics[0], which misses metrics in other resource/scope
batches; update checkHttpRouteAttribute to loop over all resourceMetrics and for
each resource loop over its scopeMetrics before iterating metrics, keeping
existing logic that filters metric.name === 'http.server.request.duration' and
metric.data === 'exponentialHistogram', then process
metric.exponentialHistogram.dataPoints, use getAttr to find 'http.route' and add
unique route strings into foundRoutes (use the same expectedRoutes and
foundRoutes checks).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f0e376c6-6a33-4adc-ac3f-db1b928c37c0
📒 Files selected for processing (30)
- .gitignore
- Makefile
- agents/grpc/src/grpc_agent.cc
- agents/grpc/src/grpc_agent.h
- agents/otlp/src/otlp_common.cc
- agents/otlp/src/otlp_common.h
- deps/opentelemetry-cpp/otlp-http-exporter.gyp
- lib/_http_client.js
- lib/_http_server.js
- lib/internal/nsolid_diag.js
- lib/internal/nsolid_http_consts.js
- src/nsolid.h
- src/nsolid/nsolid_api.cc
- src/nsolid/nsolid_api.h
- src/nsolid/nsolid_bindings.h
- src/nsolid/nsolid_metrics_types.h
- src/nsolid/nsolid_trace.h
- test/addons/nsolid-dispatchqueue/binding.gyp
- test/addons/nsolid-eventloop-cmd/binding.gyp
- test/addons/nsolid-log-hooks/binding.gyp
- test/addons/nsolid-statsdagent/binding.gyp
- test/agents/test-grpc-http-histograms.mjs
- test/integrations/express/v4/package.json
- test/integrations/express/v4/test-http-route.mjs
- test/integrations/express/v5/package.json
- test/integrations/express/v5/test-http-route.mjs
- test/integrations/fastify/v5/package.json
- test/integrations/fastify/v5/test-http-route.mjs
- test/integrations/testcfg.py
- test/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (13)
- test/integrations/fastify/v5/package.json
- test/addons/nsolid-log-hooks/binding.gyp
- test/addons/nsolid-dispatchqueue/binding.gyp
- test/integrations/express/v4/test-http-route.mjs
- deps/opentelemetry-cpp/otlp-http-exporter.gyp
- lib/_http_server.js
- test/agents/test-grpc-http-histograms.mjs
- .gitignore
- src/nsolid/nsolid_bindings.h
- agents/otlp/src/otlp_common.h
- test/integrations/testcfg.py
- src/nsolid/nsolid_metrics_types.h
- test/parallel/test-bootstrap-modules.js
const metricsPromise = new Promise((resolve, reject) => {
  grpcServer.on('metrics', (data) => {
    const found = checkHttpRouteAttribute(data, expectedRoutes);
    for (const route of found) {
      if (!foundRoutes.has(route)) {
        foundRoutes.add(route);
        console.log(`Validated route: ${route} (${foundRoutes.size}/${expectedRoutes.size})`);
      }
    }

    if (foundRoutes.size === expectedRoutes.size) {
      console.log('All routes validated!');
      resolve();
    }
  });
});
Add a timeout and guaranteed teardown.
metricsPromise never rejects or times out, and fastify.close() / grpcServer.close() only run after a successful resolve. If metric export regresses, this test can hang the job indefinitely and leave both servers open. Please race the wait against a timeout and move teardown into a finally block.
Also applies to: 145-151
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 110 - 125, The
metrics wait currently never times out and only closes servers after resolve;
wrap the existing metricsPromise (the code that listens on
grpcServer.on('metrics'...) and resolves when foundRoutes.size ===
expectedRoutes.size) in a Promise.race against a timeout promise that rejects
after a sensible timeout, and ensure you await the raced promise inside a
try/catch/finally where the finally always calls fastify.close() and
grpcServer.close() (and handles errors from close). Apply the same pattern to
the second similar block around lines 145-151 so both metric waits will timeout
and always teardown the servers.
760c373 to 988f7a6 (Compare)
Introduce base2 exponential histograms for HTTP client and server latency metrics in the NSolid core and wire them into the OTLP exporter. Maintain per-attribute aggregation using OTLP attributes hash maps and expose harvested histogram points for export. Update test addon binding.gyp files so they link correctly against the new opentelemetry-cpp dependencies pulled in by nsolid_api.cc.
Extend internal NSolid diagnostics to extract and record the http.route attribute for routes handled by Express and Fastify. Normalize framework route patterns and attach them to the datapoints sent to the native metrics layer so HTTP latency histograms can be broken down by logical route.
Add a dedicated test that generates HTTP client and server traffic, then asserts on the exported histogram data over gRPC/OTLP. Validate bucket counts and attributes to ensure the new HTTP latency histogram pipeline is exercised end to end.
Extend the integrations test harness to discover nested test-*.mjs files and add Express v4/v5 and Fastify v5 scenarios. Update the Makefile to wire these integrations into the existing test targets. For each framework, exercise multiple routes and verify that HTTP server latency histograms include the expected http.route patterns.
988f7a6 to 00e8e75 (Compare)
Actionable comments posted: 8
♻️ Duplicate comments (2)
src/nsolid/nsolid_api.h (1)
414-417: ⚠️ Potential issue | 🟠 Major

Raise the route histogram cardinality limit.

AttributesHashMap(100) is a hard series cap, not just preallocation. With {method, route, scheme, host, port} keys, 100 distinct combinations is easy to hit; once that happens, new series fold into the overflow bucket and the route breakdown this PR adds degrades quickly. Please make this configurable or raise it substantially.

Expected: the constructor argument maps to the cardinality/max-size limit, and the search finds both HTTP maps pinned at 100.

🔎 Verification script

#!/bin/bash
set -euo pipefail
mapfile -t files < <(fd 'attributes_hashmap\.(h|cc)$')
if [ "${#files[@]}" -eq 0 ]; then
  echo "attributes_hashmap.{h,cc} not found" >&2
  exit 1
fi
echo "== AttributesHashMap definition =="
printf '%s\n' "${files[@]}" | xargs rg -n -C4 'class AttributesHashMap|AttributesHashMap\s*\('
echo
echo "== Overflow / cardinality handling =="
printf '%s\n' "${files[@]}" | xargs rg -n -C4 'overflow|max_size_|cardinality|GetOrSetDefault'
echo
echo "== NSolid hard-coded limits =="
rg -n -C2 'AttributesHashMap\s*\(\s*100\s*\)' src
Verify each finding against the current code and only fix it if needed. In `@src/nsolid/nsolid_api.h` around lines 414 - 417, The hard cap of 100 used when constructing the two HTTP attribute maps (http_client_hashmap_ and http_server_hashmap_) makes AttributesHashMap(100) act as a strict cardinality limit and will cause new {method,route,scheme,host,port} series to fold into the overflow bucket; update the code that constructs http_client_hashmap_ and http_server_hashmap_ to use a much larger default (or read a configurable limit) instead of the literal 100 so the constructor argument reflects the intended max-size/cardinality, e.g. replace the fixed 100 with a constant or config value and wire that into the AttributesHashMap(...) calls so both maps no longer use the hard-coded 100 limit.agents/grpc/src/grpc_agent.cc (1)
955-983: ⚠️ Potential issue | 🟠 Major

Drop queued samples for deleted threads.

env_msg_cb_() erases the dead thread from the live maps, but this loop can immediately export and reinsert it if an ExtMetricsStor harvested just before deletion is still queued. Guard the export/cache path with a live-thread check.

🛠️ Minimal fix

 while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
   auto& stor = ext_stor.stor;
+  uint64_t thread_id = stor.thread_id;
+  if (agent->env_metrics_map_.find(thread_id) ==
+      agent->env_metrics_map_.end()) {
+    continue;
+  }
+
   otlp::fill_env_metrics(metrics, stor, false);
-  uint64_t thread_id = stor.thread_id;
   uint64_t start_ts_ms = 0;
Verify each finding against the current code and only fix it if needed. In `@agents/grpc/src/grpc_agent.cc` around lines 955 - 983, The loop exporting ExtMetricsStor from agent->thr_metrics_msg_q_.dequeue must skip any queued sample whose thread was deleted; before calling otlp::fill_env_metrics, otlp::fill_http_histograms, and before inserting into agent->thr_metrics_hist_prev_end_ts_ms_ or agent->thr_metrics_cache_, check that the thread_id is still live (e.g., verify agent->IsThreadLive(thread_id) or agent->thr_live_map_.contains(thread_id)); if the thread is not live, continue to the next dequeued item and do not reinsert/cached it.
🧹 Nitpick comments (1)
src/nsolid.h (1)
853-889: Pin the enum values explicitly.

The FastPush*Bucket implementations in src/nsolid/nsolid_api.cc cast raw uint32_t codes from JS into these enums. Right now that JS/native contract depends on implicit ordinal ordering, so a future reorder or insertion here will silently corrupt the exported attributes. Assign explicit numeric values that match the JS constants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nsolid.h` around lines 853 - 889, The enums HttpMethod, HttpProtocolVersion, and HttpUrlScheme must have explicit numeric values to prevent JS/native ordinal coupling; update each enum to assign explicit integer values for every enumerator (e.g., kGet=0, kHead=1, ... and similarly for HttpProtocolVersion and HttpUrlScheme) so the values match the JS constants used by the FastPush*Bucket code in src/nsolid/nsolid_api.cc; ensure no gaps or reorders are left that would change the numeric mapping expected by the JS side and keep HttpDatapointAttrs and DatapointAttrs unchanged aside from relying on the pinned enum values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/internal/nsolid_diag.js`:
- Around line 156-171: The socket version is being stored as raw strings ('1.1'
/ '2') which breaks histogram bucket attribution—update the undici handlers to
store the canonical kHttpVersion* constants instead: in the
'undici:client:connected' subscriber (where socketVersionMap is set) map
connectParams.version to kHttpVersion11 or kHttpVersion2 (use the existing
constant names) and store those on socketVersionMap; in the
'undici:client:sendHeaders' subscriber, set request[nsolid_net_prot_s] to the
same kHttpVersion* constant retrieved from socketVersionMap so downstream
pushClientBucket() and other histogram paths receive the canonical constants
rather than raw strings.
- Around line 3-5: The code imports Number from primordials but never uses it
and later constructs a regular WeakMap; change the import to pull SafeWeakMap
from primordials instead of Number (remove Number) and replace the instantiation
new WeakMap() with new SafeWeakMap() (referencing the SafeWeakMap symbol and the
place where WeakMap is constructed) so the module uses the primordials-safe
WeakMap implementation.
- Around line 392-401: The handler registered in
dc.subscribe('http.server.response.finish') assumes request.route.path is a
string and uses startsWith/slice; guard against non-string Express routes by
checking typeof routePath === 'string' (where routePath = request.route.path)
before using string helpers, and if it's not a string skip composing fullPath or
set request[nsolid_route_s] to a safe fallback; ensure you still reference
baseUrl and assign request[nsolid_route_s] only when routePath is a string to
avoid TypeError.
In `@src/nsolid/nsolid_api.cc`:
- Around line 3565-3569: The macro NSOLID_HTTP_VERSION_CONSTS has a line for
V(kHttpVersion20, MetricsStream::HttpProtocolVersion::k2) that exceeds 80
characters; split that macro entry into two physical lines so the long qualified
enum name is wrapped (e.g., break after the comma with a backslash so the entry
becomes V(kHttpVersion20, \n MetricsStream::HttpProtocolVersion::k2) ) to
keep the line length ≤80 while preserving the same token sequence and
identifiers.
In `@test/agents/test-grpc-http-histograms.mjs`:
- Around line 200-258: runTest currently never cleans up if histograms never
arrive; wrap the metrics-wait logic in a timeout and ensure teardown always runs
in a finally block: start a Promise.race between the existing success path and a
timeout that rejects/flags failure, and move clearInterval(trafficTimer),
child.shutdown(...), and grpcServer.close() into a finally so they run whether
the test succeeds, times out, or errors; modify references in the runTest
function (GRPCServer, TestClient, trafficTimer, mustCallAtLeast handler, and the
code that calls checkHistogramMetrics/collectHistogramWindows) so cleanup always
executes and the Promise resolves or rejects when the timeout fires.
- Around line 124-128: The helpers currently only inspect
resourceMetrics[0].scopeMetrics[0] and metrics[0], which misses histograms in
later resource/scope batches; update both histogram helper functions in this
file to iterate over all resourceMetrics and for each resource iterate all
scopeMetrics, then iterate each scope's metrics array to find and process the
histogram metrics (use the existing variable names resourceMetrics,
scopeMetrics, metrics to locate the code), aggregating/returning the first
matching histogram or the expected collection instead of assuming index 0; apply
the same change to the other helper (the block around lines 167-175) so both
walk every resource/scope/metric batch.
In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 134-183: The test currently awaits metricsPromise and only runs
teardown after it resolves, which can hang; modify the test so the wait for
metrics (metricsPromise) is raced against a timeout Promise (e.g., Promise.race
with a timer that rejects or resolves after a configurable ms) and ensure
cleanup always runs by moving connection teardown and server/grpcServer close
logic into a finally block that executes regardless of whether metricsPromise
resolved or timed out; reference the existing symbols metricsPromise,
grpcServer, server, connections and the fetch request sequence so the timeout
covers the same wait and the finally performs conn.destroy() for each
connections and calls server.close()/grpcServer.close().
In `@test/integrations/fastify/v5/test-http-route.mjs`:
- Around line 19-23: Replace the silent successful exit when integration deps
are missing: in the block that checks existsSync(join(__dirname,
'node_modules')) (the pre-check in test-http-route.mjs), stop calling
process.exit(0) and instead fail the test by throwing an Error or using an
assertion (e.g., throw new Error or assert.fail) with a clear message like
"SKIP: node_modules not found. Run 'make test-integrations-prereqs' first." so
CI surfaces the missing prerequisites as a failure rather than a silent pass.
---
Duplicate comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-983: The loop exporting ExtMetricsStor from
agent->thr_metrics_msg_q_.dequeue must skip any queued sample whose thread was
deleted; before calling otlp::fill_env_metrics, otlp::fill_http_histograms, and
before inserting into agent->thr_metrics_hist_prev_end_ts_ms_ or
agent->thr_metrics_cache_, check that the thread_id is still live (e.g., verify
agent->IsThreadLive(thread_id) or agent->thr_live_map_.contains(thread_id)); if
the thread is not live, continue to the next dequeued item and do not
re-insert or cache it.
In `@src/nsolid/nsolid_api.h`:
- Around line 414-417: The hard cap of 100 used when constructing the two HTTP
attribute maps (http_client_hashmap_ and http_server_hashmap_) makes
AttributesHashMap(100) act as a strict cardinality limit and will cause new
{method,route,scheme,host,port} series to fold into the overflow bucket; update
the code that constructs http_client_hashmap_ and http_server_hashmap_ to use a
much larger default (or read a configurable limit) instead of the literal 100 so
the constructor argument reflects the intended max-size/cardinality, e.g.
replace the fixed 100 with a constant or config value and wire that into the
AttributesHashMap(...) calls so both maps no longer use the hard-coded 100
limit.
---
Nitpick comments:
In `@src/nsolid.h`:
- Around line 853-889: The enums HttpMethod, HttpProtocolVersion, and
HttpUrlScheme must have explicit numeric values to prevent JS/native ordinal
coupling; update each enum to assign explicit integer values for every
enumerator (e.g., kGet=0, kHead=1, ... and similarly for HttpProtocolVersion and
HttpUrlScheme) so the values match the JS constants used by the FastPush*Bucket
code in src/nsolid/nsolid_api.cc; ensure no gaps or reorders are left that would
change the numeric mapping expected by the JS side and keep HttpDatapointAttrs
and DatapointAttrs unchanged aside from relying on the pinned enum values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9a5b38f9-22a3-48a4-ac37-821ad46773ee
📒 Files selected for processing (30)
- .gitignore
- Makefile
- agents/grpc/src/grpc_agent.cc
- agents/grpc/src/grpc_agent.h
- agents/otlp/src/otlp_common.cc
- agents/otlp/src/otlp_common.h
- deps/opentelemetry-cpp/otlp-http-exporter.gyp
- lib/_http_client.js
- lib/_http_server.js
- lib/internal/nsolid_diag.js
- lib/internal/nsolid_http_consts.js
- src/nsolid.h
- src/nsolid/nsolid_api.cc
- src/nsolid/nsolid_api.h
- src/nsolid/nsolid_bindings.h
- src/nsolid/nsolid_metrics_types.h
- src/nsolid/nsolid_trace.h
- test/addons/nsolid-dispatchqueue/binding.gyp
- test/addons/nsolid-eventloop-cmd/binding.gyp
- test/addons/nsolid-log-hooks/binding.gyp
- test/addons/nsolid-statsdagent/binding.gyp
- test/agents/test-grpc-http-histograms.mjs
- test/integrations/express/v4/package.json
- test/integrations/express/v4/test-http-route.mjs
- test/integrations/express/v5/package.json
- test/integrations/express/v5/test-http-route.mjs
- test/integrations/fastify/v5/package.json
- test/integrations/fastify/v5/test-http-route.mjs
- test/integrations/testcfg.py
- test/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (12)
- test/integrations/fastify/v5/package.json
- test/integrations/testcfg.py
- lib/_http_client.js
- test/addons/nsolid-eventloop-cmd/binding.gyp
- lib/_http_server.js
- test/addons/nsolid-log-hooks/binding.gyp
- test/addons/nsolid-statsdagent/binding.gyp
- test/integrations/express/v4/package.json
- deps/opentelemetry-cpp/otlp-http-exporter.gyp
- test/integrations/express/v4/test-http-route.mjs
- test/addons/nsolid-dispatchqueue/binding.gyp
- test/integrations/express/v5/package.json
const {
  Number,
} = primordials;
🧩 Analysis chain
🏁 Scripts executed against repository nodesource/nsolid (outputs elided):

head -70 lib/internal/nsolid_diag.js

# Check if SafeWeakMap exists in the codebase
rg -r 'SafeWeakMap' --type js -A 2 | head -50

# Search for how primordials are typically used in the codebase
rg 'const \{[^}]*\} = primordials' --type js | head -20

# Check for WeakMap usage patterns
rg 'new WeakMap' --type js | head -20

# Check if SafeWeakMap exists and how it's used
rg 'SafeWeakMap' --type js -B 2 -A 2

# Check lines 59-60 specifically in the file
sed -n '55,65p' lib/internal/nsolid_diag.js

# Look for what primordials exports, especially Number
fd primordials.js --type js | head -5

# Check eslint rules related to primordials
fd -e js -e json . | xargs grep -l "prefer-primordials\|alphabetize-primordials" | grep -E "(eslint|rules)" | head -10

# Check if there's a SafeWeakMap export or definition
rg 'SafeWeakMap' --type js

# Check what's actually at lines 59-60
sed -n '59,60p' lib/internal/nsolid_diag.js

# Look at primordials export to see if Number is exported
cat lib/internal/per_context/primordials.js | grep -E "(Number|exports)" | head -30

# Search for Number in primordials more broadly
rg 'Number' lib/internal/per_context/primordials.js -B 1 -A 1

# Check eslint rules configuration
find . -name ".eslintrc*" -o -name "eslint.config.*" | head -5

# Check the eslint rules file to see if Number primordial usage is flagged
cat ./agents/.eslintrc.yaml | head -100

# Check if there's a specific rule that flags Number specifically
rg 'Number' tools/eslint-rules/prefer-primordials.js -B 3 -A 3

# Check the eslint rules directory
find . -path ./deps -prune -o -path ./test -prune -o -name "*prefer-primordials*" -type f -print

# Examine the prefer-primordials rule to understand what it flags
cat tools/eslint-rules/prefer-primordials.js | head -150

# Search for specific handling of Number in the rule
rg 'Number' tools/eslint-rules/prefer-primordials.js -B 2 -A 2

# Check if nsolid_diag.js file has any lint exemptions
head -20 lib/internal/nsolid_diag.js | grep -E "(eslint|lint)"

# Search for how other files use Number from primordials
rg 'const \{[^}]*Number[^}]*\} = primordials' --type js -A 1 -B 1

# Check if Number specifically is flagged in any test cases
rg 'Number.*primordials\|primordials.*Number' test/parallel/test-eslint-prefer-primordials.js -B 2 -A 2

# Look for what the prefer-primordials rule config is for this project
rg 'prefer-primordials' .eslintrc.js -B 5 -A 5

# Check the root directory for eslint config
ls -la | grep eslint

# Find the actual eslint configuration
find . -maxdepth 2 -name "*eslint*" -o -name ".eslintignore"

# Check the eslint.config.mjs for prefer-primordials rules
cat eslint.config.mjs | head -200

# Look more carefully at the test case to understand the Number issue
cat test/parallel/test-eslint-prefer-primordials.js | grep -A 15 "const { Number } = primordials"

# Check if nsolid_diag.js is even subject to eslint
cat lib/eslint.config_partial.mjs | head -50

# Search for how Number is actually used in nsolid_diag.js
rg 'Number\.' lib/internal/nsolid_diag.js -B 2 -A 2

# Get the full context around lines 3-10 to see the actual usage
sed -n '1,100p' lib/internal/nsolid_diag.js | grep -A 50 "const {"
Replace new WeakMap() with new SafeWeakMap() at line 56.
The WeakMap constructor should use the safe version from primordials, consistent with the codebase pattern. Separately, the Number import is unused and can be removed.
🧰 Tools
🪛 Biome (2.4.6)
[error] 4-4: Do not shadow the global "Number" property.
(lint/suspicious/noShadowRestrictedNames)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/internal/nsolid_diag.js` around lines 3 - 5, The code imports Number from
primordials but never uses it and later constructs a regular WeakMap; change the
import to pull SafeWeakMap from primordials instead of Number (remove Number)
and replace the instantiation new WeakMap() with new SafeWeakMap() (referencing
the SafeWeakMap symbol and the place where WeakMap is constructed) so the module
uses the primordials-safe WeakMap implementation.
dc.subscribe('undici:client:connected', ({ connectParams, socket }) => {
  const api = getApi();
  const span = api.trace.getSpan(api.context.active());
  const version = connectParams.version === 'h1' ? '1.1' : '2';
  // Store version on socket for later retrieval in sendHeaders
  socketVersionMap.set(socket, version);
  if (span && span.type === kSpanHttpClient) {
    const version = connectParams.version === 'h1' ? '1.1' : '2';
    span._pushSpanDataString(kSpanHttpProtocolVersion, version);
    return api.context.active();
  }
});

dc.subscribe('undici:client:sendHeaders', ({ request, socket }) => {
  const version = socketVersionMap.get(socket);
  request[nsolid_net_prot_s] = version;
});
Don't cache span version strings in the histogram path.
This stores '1.1' / '2' on the socket and then passes that value into pushClientBucket(), while the other bucket paths use kHttpVersion* constants. That makes Undici protocol attribution depend on coercion and is also why kHttpVersion11 is currently dead.
Suggested fix
dc.subscribe('undici:client:connected', ({ connectParams, socket }) => {
const api = getApi();
const span = api.trace.getSpan(api.context.active());
const version = connectParams.version === 'h1' ? '1.1' : '2';
+ const versionEnum = connectParams.version === 'h1' ?
+ kHttpVersion11 :
+ kHttpVersion2;
// Store version on socket for later retrieval in sendHeaders
- socketVersionMap.set(socket, version);
+ socketVersionMap.set(socket, versionEnum);
if (span && span.type === kSpanHttpClient) {
span._pushSpanDataString(kSpanHttpProtocolVersion, version);
return api.context.active();
}
});

Also applies to: 195-201
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/internal/nsolid_diag.js` around lines 156 - 171, The socket version is
being stored as raw strings ('1.1' / '2') which breaks histogram bucket
attribution—update the undici handlers to store the canonical kHttpVersion*
constants instead: in the 'undici:client:connected' subscriber (where
socketVersionMap is set) map connectParams.version to kHttpVersion11 or
kHttpVersion2 (use the existing constant names) and store those on
socketVersionMap; in the 'undici:client:sendHeaders' subscriber, set
request[nsolid_net_prot_s] to the same kHttpVersion* constant retrieved from
socketVersionMap so downstream pushClientBucket() and other histogram paths
receive the canonical constants rather than raw strings.
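The step above can be sketched in isolation. The numeric values and constant names below are illustrative stand-ins for the real `kHttpVersion*` constants exported from `lib/internal/nsolid_http_consts.js`; the actual values may differ:

```javascript
// Hypothetical stand-ins for the kHttpVersion* constants from
// lib/internal/nsolid_http_consts.js; the real numeric values may differ.
const kHttpVersion11 = 1;
const kHttpVersion2 = 2;

// Map undici's connect-time version string to the canonical constant once,
// so every downstream histogram path sees the same value.
function toVersionEnum(undiciVersion) {
  return undiciVersion === 'h1' ? kHttpVersion11 : kHttpVersion2;
}

// Storing the constant (not '1.1'/'2') keeps bucket attribution consistent
// with the other histogram paths described in the comment above.
console.log(toVersionEnum('h1'), toVersionEnum('h2'));
```

The span path can keep using the human-readable string; only the value handed to the histogram path needs to be the canonical constant.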
dc.subscribe('http.server.response.finish', ({ request, response }) => {
  if (request?.route?.path && !request[nsolid_route_s]) {
    // Compose full route path including mounted router prefixes
    const baseUrl = request.baseUrl || '';
    const routePath = request.route.path;
    // Avoid double slashes when concatenating (e.g., baseUrl ends with / and routePath starts with /)
    const fullPath = baseUrl.endsWith('/') && routePath.startsWith('/')
      ? baseUrl + routePath.slice(1)
      : baseUrl + routePath;
    request[nsolid_route_s] = fullPath;
🧩 Analysis chain
🏁 Scripts executed against repository nodesource/nsolid (outputs elided):

# First, let's look at the actual code in the file
cat -n lib/internal/nsolid_diag.js | sed -n '385,410p'

🌐 Web query: Express.js req.route.path type regex array
💡 Result:
- In Express, `req.route` is the currently matched route object, and its `path` is the original route "pattern" (e.g. `'/user/:id?'`); that is, `req.route.path` is the route path you registered, not the URL path (`req.path`).
- If you need the compiled regular expression Express matched with, use `req.route.regexp` (it is shown alongside `path` in the `req.route` object).
- Express route "paths" (what you pass as the first argument to `app.get`, `router.get`, etc.) can be strings or regular expressions.
- Express also allows that "path" argument to be an array of strings/regexes (`string | RegExp | Array<string | RegExp>` in the TypeScript defs).
- In practice, once a request is matched, `req.route.path` refers to the single matched route's path pattern (not an array).

Citations:
- 1: https://express.js.cn/en/4x/api.html
- 2: https://expressjs.com/en/guide/routing.html
- 3: https://www.jsdocs.io/package/%40types/express-serve-static-core

🌐 Web query: Express req.route.path RegExp type when route registered regex
💡 Result: In Express, `req.route.path` is the original "path" you registered the route with. Since Express route paths can be regular expressions, if you registered the route with a regex, then `req.route.path` will be a RegExp object (so `req.route.path instanceof RegExp === true`). [1], [3]
Type-wise (TypeScript), Express route "path" params are defined as `string | RegExp | Array<string | RegExp>`, which matches that behavior. [2]

Citations:
- 1: https://expressjs.com/en/guide/routing.html
- 2: https://app.unpkg.com/%40types/express-serve-static-core%404.17.41/files/index.d.ts
- 3: https://gist.github.com/7126223

🏁 Script executed:
# Check if there are any type definitions or tests in the nsolid repo that might clarify this expectation
rg -t js "route\.path" lib/internal/nsolid_diag.js -B 3 -A 3
Guard against non-string Express routes before using string helpers.
req.route.path can be a RegExp object if the route was registered with a regex pattern. Calling .startsWith() and .slice() on a RegExp will throw a TypeError. Check that routePath is a string before calling these methods, or skip non-string routes entirely.
🧰 Tools
🪛 GitHub Check: lint-js-and-md
[failure] 400-400:
':' should be placed at the end of the line
[failure] 399-399:
'?' should be placed at the end of the line
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/internal/nsolid_diag.js` around lines 392 - 401, The handler registered
in dc.subscribe('http.server.response.finish') assumes request.route.path is a
string and uses startsWith/slice; guard against non-string Express routes by
checking typeof routePath === 'string' (where routePath = request.route.path)
before using string helpers, and if it's not a string skip composing fullPath or
set request[nsolid_route_s] to a safe fallback; ensure you still reference
baseUrl and assign request[nsolid_route_s] only when routePath is a string to
avoid TypeError.
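The guard described above can be sketched as follows; `joinRoute` is an illustrative helper name, not the actual subscriber code, which inlines this logic:

```javascript
// Only compose a route string when Express handed us one; routes registered
// with a RegExp (or an array) are skipped instead of throwing.
function joinRoute(baseUrl, routePath) {
  if (typeof routePath !== 'string') return null; // RegExp/array route
  // Avoid double slashes when baseUrl ends with / and routePath starts with /
  return baseUrl.endsWith('/') && routePath.startsWith('/') ?
    baseUrl + routePath.slice(1) :
    baseUrl + routePath;
}

console.log(joinRoute('/api/', '/users/:id')); // '/api/users/:id'
console.log(joinRoute('', /^\/v\d+/));         // null, no TypeError
```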
#define NSOLID_HTTP_VERSION_CONSTS(V) \
  V(kHttpVersion10, MetricsStream::HttpProtocolVersion::k10) \
  V(kHttpVersion11, MetricsStream::HttpProtocolVersion::k11) \
  V(kHttpVersion20, MetricsStream::HttpProtocolVersion::k2) \
  V(kHttpVersionOther, MetricsStream::HttpProtocolVersion::kOther)
Fix line length to pass CI.
Line 3568 exceeds 80 characters, causing the make lint-cpp pipeline failure.
🔧 Proposed fix

  #define NSOLID_HTTP_VERSION_CONSTS(V) \
    V(kHttpVersion10, MetricsStream::HttpProtocolVersion::k10) \
    V(kHttpVersion11, MetricsStream::HttpProtocolVersion::k11) \
- V(kHttpVersion20, MetricsStream::HttpProtocolVersion::k2) \
+ V(kHttpVersion20, \
+   MetricsStream::HttpProtocolVersion::k2) \
    V(kHttpVersionOther, MetricsStream::HttpProtocolVersion::kOther)

🧰 Tools
🪛 GitHub Actions: Linters
[error] 3568-3568: make lint-cpp failed: cpplint reported line_length error in src/nsolid/nsolid_api.cc at line 3568 (Lines should be <= 80 characters long [whitespace/line_length]).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/nsolid/nsolid_api.cc` around lines 3565 - 3569, The macro
NSOLID_HTTP_VERSION_CONSTS has a line for V(kHttpVersion20,
MetricsStream::HttpProtocolVersion::k2) that exceeds 80 characters; split that
macro entry into two physical lines so the long qualified enum name is wrapped
(e.g., break after the comma with a backslash so the entry becomes
V(kHttpVersion20, \n MetricsStream::HttpProtocolVersion::k2) ) to keep the
line length ≤80 while preserving the same token sequence and identifiers.
const scopeMetrics = resourceMetrics[0].scopeMetrics;
if (!scopeMetrics || scopeMetrics.length === 0) return null;

const metrics = scopeMetrics[0].metrics;
if (!metrics) return null;
Walk every resource/scope batch in both histogram helpers.
Both helpers stop at resourceMetrics[0].scopeMetrics[0]. A valid export can place the HTTP histograms in a later resource or scope group, which makes this test miss real data and either false-fail or hang waiting for continuity.
Also applies to: 167-175
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/agents/test-grpc-http-histograms.mjs` around lines 124 - 128, The
helpers currently only inspect resourceMetrics[0].scopeMetrics[0] and
metrics[0], which misses histograms in later resource/scope batches; update both
histogram helper functions in this file to iterate over all resourceMetrics and
for each resource iterate all scopeMetrics, then iterate each scope's metrics
array to find and process the histogram metrics (use the existing variable names
resourceMetrics, scopeMetrics, metrics to locate the code),
aggregating/returning the first matching histogram or the expected collection
instead of assuming index 0; apply the same change to the other helper (the
block around lines 167-175) so both walk every resource/scope/metric batch.
// Skip test if dependencies not installed
if (!existsSync(join(__dirname, 'node_modules'))) {
  console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
  process.exit(0);
}
Fail fast when integration deps are missing.
Use an assertion or throw here instead of process.exit(0). Exiting successfully turns a broken Fastify integration setup into a green run, so CI can silently stop covering this path. Based on learnings, the nsolid test suite expects prerequisites to be prepared ahead of time rather than silently skipped with fs.existsSync guards.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 19 - 23,
Replace the silent successful exit when integration deps are missing: in the
block that checks existsSync(join(__dirname, 'node_modules')) (the pre-check in
test-http-route.mjs), stop calling process.exit(0) and instead fail the test by
throwing an Error or using an assertion (e.g., throw new Error or assert.fail)
with a clear message like "SKIP: node_modules not found. Run 'make
test-integrations-prereqs' first." so CI surfaces the missing prerequisites as a
failure rather than a silent pass.
Actionable comments posted: 3
♻️ Duplicate comments (12)
test/integrations/fastify/v5/test-http-route.mjs (2)
19-23: ⚠️ Potential issue | 🟡 Minor
Fail missing integration prereqs instead of exiting successfully.
`process.exit(0)` turns a broken Fastify setup into a green run, so CI can silently stop exercising this coverage. Throw or assert here instead of skipping.
Based on learnings, in the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add `fs.existsSync` guards to skip when bindings are missing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 19 - 23, The current check using existsSync(join(__dirname, 'node_modules')) calls process.exit(0) which hides missing integration prerequisites; replace the exit with a failing test path: throw an Error or use an assertion so the test fails when node_modules is absent. Update the guard around existsSync(...) to throw(new Error(...)) or assert.fail(...) with a clear message instead of calling process.exit(0), referencing the existsSync and process.exit usage in this test file so CI fails visibly when prerequisites are missing.
115-156: ⚠️ Potential issue | 🟠 Major
Bound the route wait and always tear down both servers.
If route export regresses, or one of the request assertions fails first, `await metricsPromise` never reaches the cleanup block and both servers stay open. Put the wait behind a timeout and move `fastify.close()`/`grpcServer.close()` into `finally`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 115 - 156, The test currently awaits metricsPromise without a timeout and performs cleanup after that, which can hang and leave servers running; change the waiting to a bounded wait (e.g., replace await metricsPromise with await Promise.race([metricsPromise, new Promise((_, rej) => setTimeout(() => rej(new Error('metrics timeout')), 5000))]) or similar) and move shutdown into a finally block so both fastify.close() and grpcServer.close() are always called; update the code around metricsPromise, the await of that promise, and the cleanup to ensure the listener/Promise created for metricsPromise still resolves/rejects properly and both fastify.close() and grpcServer.close() are invoked in the finally.
test/integrations/express/v5/test-http-route.mjs (2)
19-23: ⚠️ Potential issue | 🟡 Minor
Fail missing integration prereqs instead of exiting successfully.
This still turns a broken Express v5 setup into a green run, so CI can silently stop covering the new route test. Throw or assert here instead of exiting with `0`.
Based on learnings, in the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add `fs.existsSync` guards to skip when bindings are missing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v5/test-http-route.mjs` around lines 19 - 23, The current prereq guard using existsSync(join(__dirname, 'node_modules')) logs and calls process.exit(0), which masks missing integration prerequisites; replace that exit-with-success behavior by throwing an Error or using an assertion (e.g., throw new Error(...) or assert.fail(...)) when the node_modules check fails so the test fails instead of passing; update the block that references existsSync, join and process.exit to raise the failure with a clear message about running the integration prereqs.
134-183: ⚠️ Potential issue | 🟠 Major
Bound the route wait and move teardown into `finally`.
Only the happy path reaches `conn.destroy()`, `server.close()`, and `grpcServer.close()`. If metric export regresses or a fetch/assert fails first, this test can hang the job with open handles.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v5/test-http-route.mjs` around lines 134 - 183, The test currently awaits metricsPromise and performs cleanup only on the happy path; wrap the request/assert/await sequence in try/finally, bound waiting on metricsPromise with a timeout (e.g., race metricsPromise against a Promise that rejects after N ms) so failures/timeouts don't hang, and move teardown into the finally block to always run connections.forEach(conn => conn.destroy()), server.close(...), and grpcServer.close(); update references to metricsPromise, connections, server, and grpcServer to implement the bounded wait and guaranteed cleanup.
test/agents/test-grpc-http-histograms.mjs (2)
200-258: ⚠️ Potential issue | 🟠 Major
Add a timeout and guaranteed teardown around the histogram wait.
If the histograms never show up, `trafficTimer` keeps firing and neither `child.shutdown()` nor `grpcServer.close()` runs. Race the wait against a timeout and move the teardown path into `finally`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/agents/test-grpc-http-histograms.mjs` around lines 200 - 258, The test's runTest function can hang because trafficTimer keeps running and child/grpcServer teardown only happens on successful histogram validation; wrap the histogram wait in a Promise that races the existing success path against a timeout and move teardown into a finally block so it always runs: create a timeout Promise (e.g. new Promise((_, reject) => setTimeout(() => reject(new Error('histogram timeout')), TIMEOUT_MS))) and use Promise.race to await either success or timeout, ensure you clearInterval(trafficTimer) and call child.shutdown(...) and grpcServer.close() in a finally handler (or in both success and timeout catch paths) and set shutdownCalled so the metrics handler no-ops after teardown; update references to trafficTimer, shutdownCalled, grpcServer, child, and the mustCallAtLeast metrics handler accordingly.
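The bounded-wait pattern that prompt describes can be sketched like this; `runWithTimeout` and the `'histogram timeout'` message are illustrative names, not the test's actual identifiers:

```javascript
// Race the work against a timer and run teardown in finally, so cleanup
// happens on success, timeout, and error alike.
async function runWithTimeout(work, teardown, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('histogram timeout')), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive
    await teardown();    // always runs, even after a timeout
  }
}

// Usage: the work never resolves, yet teardown still fires before the
// rejection reaches the caller.
runWithTimeout(new Promise(() => {}), async () => console.log('closed'), 50)
  .catch((err) => console.log(err.message));
```

Clearing the timer in `finally` matters in tests: a live `setTimeout` handle can itself keep the process from exiting.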
120-159: ⚠️ Potential issue | 🟠 Major
Walk every OTLP resource/scope batch in both helpers.
Both helpers still inspect only `resourceMetrics[0].scopeMetrics[0]`. A valid export can place the HTTP histograms in later resource or scope groups, so this can miss real data and false-fail the test.

Also applies to: 161-198
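In sketch form, the fix is a triple loop over the export payload rather than indexing the first batch. The payload shape below mirrors OTLP JSON; `findHistograms` is an illustrative name, not the test's actual helper:

```javascript
// Collect the named metrics across *all* resource and scope batches,
// instead of only resourceMetrics[0].scopeMetrics[0].
function findHistograms(resourceMetrics, names) {
  const remaining = new Set(names);
  const found = [];
  for (const resource of resourceMetrics) {
    for (const scope of resource.scopeMetrics ?? []) {
      for (const metric of scope.metrics ?? []) {
        if (remaining.has(metric.name)) {
          found.push(metric);
          remaining.delete(metric.name);
        }
      }
    }
  }
  // remaining being non-empty means some expected metric never appeared.
  return { found, remaining };
}
```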
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/agents/test-grpc-http-histograms.mjs` around lines 120 - 159, The helper checkHistogramMetrics only inspects resourceMetrics[0].scopeMetrics[0], so it can miss histograms in later resource/scope batches; change it to iterate over all resourceMetrics and for each resource iterate over all scopeMetrics and then over each scope's metrics, applying the same matching/validation logic (unit, aggregationTemporality, dataPoints, dp check, and removal from remaining). Make the identical change to the other histogram helper in this file so both walk every resource and every scope instead of only the first entries.

test/integrations/express/v4/test-http-route.mjs (1)
134-183: ⚠️ Potential issue | 🟠 Major
Bound the route wait and always tear down the servers.
Right now cleanup only runs after `await metricsPromise`. If the metrics never expose all routes, or one of the request assertions throws first, this test can hang the job with open HTTP and gRPC handles.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integrations/express/v4/test-http-route.mjs` around lines 134 - 183, The test can hang because await metricsPromise may never resolve and cleanup only runs after it; change the test to bound waiting and always perform teardown: wrap the request/assert logic and awaiting of metricsPromise in a try/finally (or use Promise.race with a timeout) so that if metricsPromise times out or an assertion throws you still run the cleanup code that destroys connections and closes server and grpcServer; reference the metricsPromise, grpcServer.on('metrics'), server, grpcServer, and connections identifiers when making the change.

lib/internal/nsolid_diag.js (3)
392-401: ⚠️ Potential issue | 🟠 Major
Guard `routePath` before using string helpers.
`request.route.path` can be non-string for some Express route definitions. In that case `startsWith()`/`slice()` throws on the response-finish path.

🛡️ Safe guard
  if (request?.route?.path && !request[nsolid_route_s]) {
    const baseUrl = request.baseUrl || '';
    const routePath = request.route.path;
+   if (typeof routePath !== 'string')
+     return;
    const fullPath = baseUrl.endsWith('/') && routePath.startsWith('/') ?
      baseUrl + routePath.slice(1) :
      baseUrl + routePath;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/internal/nsolid_diag.js` around lines 392 - 401, The code in the dc.subscribe('http.server.response.finish', ...) handler assumes request.route.path is a string and calls startsWith()/slice(), which throws if route.path is a RegExp or array; guard routePath by checking typeof routePath === 'string' (or coerce to string safely) before using string helpers and only compute fullPath when routePath is a string; update the logic around routePath, baseUrl and request[nsolid_route_s] (inside the dc.subscribe callback) to skip or fallback (e.g., use request.route.path.toString() or simply set request[nsolid_route_s] = baseUrl) for non-string route.path values.
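As a standalone sketch of that guard (`resolveRoute` is an illustrative name; the real code works on `request.baseUrl` and `request.route.path` inline):

```javascript
// Express can expose req.route.path as a RegExp or an array of paths,
// so string helpers must be gated on a typeof check.
function resolveRoute(baseUrl, routePath) {
  if (typeof routePath !== 'string')
    return baseUrl || '';
  // Avoid a doubled slash when joining baseUrl and the route template.
  return baseUrl.endsWith('/') && routePath.startsWith('/') ?
    baseUrl + routePath.slice(1) :
    baseUrl + routePath;
}
```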
156-171: ⚠️ Potential issue | 🟠 Major
Store the protocol enum, not the display string.
`nsolid_net_prot_s` now feeds a binding that takes `uint32_t protocol_version`. Writing `'1.1'`/`'2'` here makes Undici histogram attribution depend on coercion and can misbucket HTTP/1.1 traffic.

🔧 Minimal fix
  dc.subscribe('undici:client:connected', ({ connectParams, socket }) => {
    const api = getApi();
    const span = api.trace.getSpan(api.context.active());
    const version = connectParams.version === 'h1' ? '1.1' : '2';
+   const versionEnum =
+     connectParams.version === 'h1' ? kHttpVersion11 : kHttpVersion2;
    // Store version on socket for later retrieval in sendHeaders
-   socketVersionMap.set(socket, version);
+   socketVersionMap.set(socket, versionEnum);
    if (span && span.type === kSpanHttpClient) {
      span._pushSpanDataString(kSpanHttpProtocolVersion, version);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/internal/nsolid_diag.js` around lines 156 - 171, The code stores the protocol as the display string ('1.1'/'2') on the socket and writes that into request[nsolid_net_prot_s], but the binding expects a uint32_t protocol enum; change the logic in the dc.subscribe handlers (the 'undici:client:connected' callback that sets socketVersionMap and the 'undici:client:sendHeaders' callback that writes request[nsolid_net_prot_s]) to store and write an integer enum constant (e.g., 1 for HTTP/1.1, 2 for HTTP/2) instead of the strings, and continue pushing the human-readable value into the span via span._pushSpanDataString(kSpanHttpProtocolVersion) if needed so telemetry and the binding receive the correct numeric protocol_version.
3-5: ⚠️ Potential issue | 🟠 Major
Fix the new primordials/lint blockers.
CI is already red here: the `Number` binding trips `noShadowRestrictedNames`, and `new WeakMap()` trips the primordials-safe `SafeWeakMap` requirement for lib code.

Also applies to: 59-60
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/internal/nsolid_diag.js` around lines 3 - 5, The code currently destructures Number from primordials (creating a shadowed restricted name) and uses new WeakMap() in lib code; update the module to stop binding restricted global names and to use the primordials-safe wrapper: remove the const { Number } = primordials pattern and reference the global Number directly (or use a uniquely named local alias if needed), and replace any instantiation of new WeakMap() with the primordials-safe SafeWeakMap (e.g., use SafeWeakMap when creating maps in nsolid_diag.js, including the occurrences around the later block at the same file mentioned in the comment). Ensure identifiers referenced in the file (e.g., any existing WeakMap usages and the Number binding) are updated accordingly so lint rule noShadowRestrictedNames and primordials safety are satisfied.

src/nsolid/nsolid_api.h (1)
414-417: ⚠️ Potential issue | 🟠 Major
Reconsider the hard `100` series cap here.
This looks like a hard cardinality cap rather than a reserve size. With `{method, route, scheme, host, port}` keys, 100 distinct combinations per interval is easy to exhaust, which would push new traffic into the overflow bucket and blunt the new route-level breakdown.
Run this read-only check to confirm the ctor and overflow semantics in the vendored OTEL code. Expect matches showing overflow handling plus both new HTTP maps pinned to `100`.

#!/bin/bash
set -euo pipefail
hdr="$(fd '^attributes_hashmap\.h$' deps/opentelemetry-cpp | head -n1)"
if [ -z "$hdr" ]; then
  echo "attributes_hashmap.h not found" >&2
  exit 1
fi
echo "== AttributesHashMap constructor / overflow semantics =="
rg -n -C3 'AttributesHashMap\s*\(|GetOrSetDefault|overflow|max_size_|cardinality' \
  "$hdr" deps/opentelemetry-cpp/sdk/src
echo
echo "== Current NSolid call sites =="
rg -n -C1 'AttributesHashMap\s*\(\s*100\s*\)' src/nsolid

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nsolid/nsolid_api.h` around lines 414 - 417, The two HTTP attribute maps (http_client_hashmap_ and http_server_hashmap_) are constructed with a hard-coded cardinality cap of 100 which can be easily exhausted; inspect the AttributesHashMap constructor/overflow semantics in the vendored OpenTelemetry code (AttributesHashMap) and replace the literal 100 in the http_client_hashmap_ and http_server_hashmap_ initializations with a safer approach: either use the AttributesHashMap default ctor (if it represents an unbounded/reservable map), or introduce a named constant (e.g., kHttpAttributesReserveSize) read from configuration/feature-flag and set a much larger default (or expose it to runtime config) so the maps won't overflow quickly; ensure the change is applied to both http_client_hashmap_ and http_server_hashmap_ and that you verify the AttributesHashMap overflow behavior to decide whether reserve vs cap semantics require increasing or removing the limit.

agents/grpc/src/grpc_agent.cc (1)
955-984: ⚠️ Potential issue | 🟠 Major
Skip processing queued metrics for threads that no longer exist.
When a thread is deleted, `env_msg_cb_` erases its entries from all caches. However, `thr_metrics_msg_q_` may already contain an `ExtMetricsStor` harvested just before the deletion callback ran. If the deletion async is processed first, this loop will re-insert the dead thread into `thr_metrics_hist_prev_end_ts_ms_` and `thr_metrics_cache_`.
Check whether the thread still exists in `env_metrics_map_` before exporting and caching:

🛠️ Proposed fix
  ExtMetricsStor ext_stor;
  while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
    auto& stor = ext_stor.stor;
+   uint64_t thread_id = stor.thread_id;
+   // Skip metrics for threads that have been deleted
+   if (agent->env_metrics_map_.find(thread_id) == agent->env_metrics_map_.end()) {
+     continue;
+   }
    otlp::fill_env_metrics(metrics, stor, false);
-   uint64_t thread_id = stor.thread_id;
    uint64_t start_ts_ms = 0;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/grpc/src/grpc_agent.cc` around lines 955 - 984, The loop that processes ExtMetricsStor from thr_metrics_msg_q_ should skip entries for threads that have been deleted: before calling otlp::fill_env_metrics, otlp::fill_http_histograms or inserting into thr_metrics_hist_prev_end_ts_ms_ and thr_metrics_cache_, check whether agent->env_metrics_map_ contains stor.thread_id (the thread exists); if not, continue and drop this ExtMetricsStor. This prevents resurrecting deleted threads that env_msg_cb_ already removed; update the loop around thr_metrics_msg_q_.dequeue handling to guard on agent->env_metrics_map_.find(thread_id) != end() and only then perform the export and cache insert_or_assign operations.
🧹 Nitpick comments (1)
src/nsolid/nsolid_api.cc (1)
2098-2100: Pass `DatapointAttrs` by const reference on this hot path.
`add_metric_datapoint_()` only inspects `attrs`, but the current signature copies the full variant for every datapoint on the NSolid thread. That is extra churn once `http.route` strings start flowing through here.

♻️ Small refactor
  void EnvInst::add_metric_datapoint_(MetricsStream::Type type,
                                      double value,
-                                     MetricsStream::DatapointAttrs attrs) {
+                                     const MetricsStream::DatapointAttrs& attrs) {

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nsolid/nsolid_api.cc` around lines 2098 - 2100, The function signature EnvInst::add_metric_datapoint_(MetricsStream::Type type, double value, MetricsStream::DatapointAttrs attrs) copies the DatapointAttrs variant on a hot path; change the parameter to take const MetricsStream::DatapointAttrs& attrs instead and update all callers to pass their attrs by const reference (or const lvalue reference) so no unnecessary copies occur; ensure any places that previously relied on move semantics still behave correctly and that the header/declaration and definition signatures match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/_http_server.js`:
- Around line 322-332: The code currently always sends res.statusCode (default
200) which emits a synthetic 200 for aborted requests; change the call site that
invokes binding.pushServerBucket so it checks res._headerSent and passes the
actual statusCode only if headers were sent, otherwise pass a sentinel value (0)
to indicate unknown; update the corresponding native handling in nsolid_api.cc
to treat 0 as “no status code” and skip emitting http.response.status_code.
Reference the call to binding.pushServerBucket and the
res._headerSent/res.statusCode fields, and the native nsolid_api.cc logic that
writes http.response.status_code.
In `@Makefile`:
- Around line 1695-1699: The distclean target does not remove the new
integration node_modules directories added under test/integrations; update the
Makefile so the distclean target also removes
test/integrations/express/v4/node_modules,
test/integrations/express/v5/node_modules, and
test/integrations/fastify/v5/node_modules (same removals as in the
test-integrations-prereqs-clean target) so distclean truly removes all build and
test artifacts.
In `@test/integrations/express/v4/test-http-route.mjs`:
- Around line 19-23: The current guard uses process.exit(0) so missing
integration prereqs mark the test as a success; replace this with a failure
signal by throwing an Error or using an assertion instead of exiting. Locate the
existsSync(join(__dirname, 'node_modules')) check in the test and change the
process.exit(0) branch to either throw new Error('SKIP: node_modules not found.
Run "make test-integrations-prereqs" first.') or call assert.fail(...) so the
test fails/flags missing prerequisites instead of exiting cleanly. Ensure the
log message remains descriptive and includes the original guidance.
---
Duplicate comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-984: The loop that processes ExtMetricsStor from
thr_metrics_msg_q_ should skip entries for threads that have been deleted:
before calling otlp::fill_env_metrics, otlp::fill_http_histograms or inserting
into thr_metrics_hist_prev_end_ts_ms_ and thr_metrics_cache_, check whether
agent->env_metrics_map_ contains stor.thread_id (the thread exists); if not,
continue and drop this ExtMetricsStor. This prevents resurrecting deleted
threads that env_msg_cb_ already removed; update the loop around
thr_metrics_msg_q_.dequeue handling to guard on
agent->env_metrics_map_.find(thread_id) != end() and only then perform the
export and cache insert_or_assign operations.
In `@lib/internal/nsolid_diag.js`:
- Around line 392-401: The code in the
dc.subscribe('http.server.response.finish', ...) handler assumes
request.route.path is a string and calls startsWith()/slice(), which throws if
route.path is a RegExp or array; guard routePath by checking typeof routePath
=== 'string' (or coerce to string safely) before using string helpers and only
compute fullPath when routePath is a string; update the logic around routePath,
baseUrl and request[nsolid_route_s] (inside the dc.subscribe callback) to skip
or fallback (e.g., use request.route.path.toString() or simply set
request[nsolid_route_s] = baseUrl) for non-string route.path values.
- Around line 156-171: The code stores the protocol as the display string
('1.1'/'2') on the socket and writes that into request[nsolid_net_prot_s], but
the binding expects a uint32_t protocol enum; change the logic in the
dc.subscribe handlers (the 'undici:client:connected' callback that sets
socketVersionMap and the 'undici:client:sendHeaders' callback that writes
request[nsolid_net_prot_s]) to store and write an integer enum constant (e.g., 1
for HTTP/1.1, 2 for HTTP/2) instead of the strings, and continue pushing the
human-readable value into the span via
span._pushSpanDataString(kSpanHttpProtocolVersion) if needed so telemetry and
the binding receive the correct numeric protocol_version.
- Around line 3-5: The code currently destructures Number from primordials
(creating a shadowed restricted name) and uses new WeakMap() in lib code; update
the module to stop binding restricted global names and to use the
primordials-safe wrapper: remove the const { Number } = primordials pattern and
reference the global Number directly (or use a uniquely named local alias if
needed), and replace any instantiation of new WeakMap() with the
primordials-safe SafeWeakMap (e.g., use SafeWeakMap when creating maps in
nsolid_diag.js, including the occurrences around the later block at the same
file mentioned in the comment). Ensure identifiers referenced in the file (e.g.,
any existing WeakMap usages and the Number binding) are updated accordingly so
lint rule noShadowRestrictedNames and primordials safety are satisfied.
In `@src/nsolid/nsolid_api.h`:
- Around line 414-417: The two HTTP attribute maps (http_client_hashmap_ and
http_server_hashmap_) are constructed with a hard-coded cardinality cap of 100
which can be easily exhausted; inspect the AttributesHashMap
constructor/overflow semantics in the vendored OpenTelemetry code
(AttributesHashMap) and replace the literal 100 in the http_client_hashmap_ and
http_server_hashmap_ initializations with a safer approach: either use the
AttributesHashMap default ctor (if it represents an unbounded/reservable map),
or introduce a named constant (e.g., kHttpAttributesReserveSize) read from
configuration/feature-flag and set a much larger default (or expose it to
runtime config) so the maps won’t overflow quickly; ensure the change is applied
to both http_client_hashmap_ and http_server_hashmap_ and that you verify the
AttributesHashMap overflow behavior to decide whether reserve vs cap semantics
require increasing or removing the limit.
In `@test/agents/test-grpc-http-histograms.mjs`:
- Around line 200-258: The test's runTest function can hang because trafficTimer
keeps running and child/grpcServer teardown only happens on successful histogram
validation; wrap the histogram wait in a Promise that races the existing success
path against a timeout and move teardown into a finally block so it always runs:
create a timeout Promise (e.g. new Promise((_, reject) => setTimeout(() =>
reject(new Error('histogram timeout')), TIMEOUT_MS))) and use Promise.race to
await either success or timeout, ensure you clearInterval(trafficTimer) and call
child.shutdown(...) and grpcServer.close() in a finally handler (or in both
success and timeout catch paths) and set shutdownCalled so the metrics handler
no-ops after teardown; update references to trafficTimer, shutdownCalled,
grpcServer, child, and the mustCallAtLeast metrics handler accordingly.
- Around line 120-159: The helper checkHistogramMetrics only inspects
resourceMetrics[0].scopeMetrics[0], so it can miss histograms in later
resource/scope batches; change it to iterate over all resourceMetrics and for
each resource iterate over all scopeMetrics and then over each scope's metrics,
applying the same matching/validation logic (unit, aggregationTemporality,
dataPoints, dp check, and removal from remaining). Make the identical change to
the other histogram helper in this file so both walk every resource and every
scope instead of only the first entries.
In `@test/integrations/express/v4/test-http-route.mjs`:
- Around line 134-183: The test can hang because await metricsPromise may never
resolve and cleanup only runs after it; change the test to bound waiting and
always perform teardown: wrap the request/assert logic and awaiting of
metricsPromise in a try/finally (or use Promise.race with a timeout) so that if
metricsPromise times out or an assertion throws you still run the cleanup code
that destroys connections and closes server and grpcServer; reference the
metricsPromise, grpcServer.on('metrics'), server, grpcServer, and connections
identifiers when making the change.
In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 19-23: The current prereq guard using existsSync(join(__dirname,
'node_modules')) logs and calls process.exit(0), which masks missing integration
prerequisites; replace that exit-with-success behavior by throwing an Error or
using an assertion (e.g., throw new Error(...) or assert.fail(...)) when the
node_modules check fails so the test fails instead of passing; update the block
that references existsSync, join and process.exit to raise the failure with a
clear message about running the integration prereqs.
- Around line 134-183: The test currently awaits metricsPromise and performs
cleanup only on the happy path; wrap the request/assert/await sequence in
try/finally, bound waiting on metricsPromise with a timeout (e.g., race
metricsPromise against a Promise that rejects after N ms) so failures/timeouts
don't hang, and move teardown into the finally block to always run
connections.forEach(conn => conn.destroy()), server.close(...), and
grpcServer.close(); update references to metricsPromise, connections, server,
and grpcServer to implement the bounded wait and guaranteed cleanup.
In `@test/integrations/fastify/v5/test-http-route.mjs`:
- Around line 19-23: The current check using existsSync(join(__dirname,
'node_modules')) calls process.exit(0) which hides missing integration
prerequisites; replace the exit with a failing test path: throw an Error or use
an assertion so the test fails when node_modules is absent. Update the guard
around existsSync(...) to throw(new Error(...)) or assert.fail(...) with a clear
message instead of calling process.exit(0), referencing the existsSync and
process.exit usage in this test file so CI fails visibly when prerequisites are
missing.
- Around line 115-156: The test currently awaits metricsPromise without a
timeout and performs cleanup after that, which can hang and leave servers
running; change the waiting to a bounded wait (e.g., replace await
metricsPromise with await Promise.race([metricsPromise, new Promise((_, rej) =>
setTimeout(() => rej(new Error('metrics timeout')), 5000))]) or similar) and
move shutdown into a finally block so both fastify.close() and
grpcServer.close() are always called; update the code around metricsPromise, the
await of that promise, and the cleanup to ensure the listener/Promise created
for metricsPromise still resolves/rejects properly and both fastify.close() and
grpcServer.close() are invoked in the finally.
---
Nitpick comments:
In `@src/nsolid/nsolid_api.cc`:
- Around line 2098-2100: The function signature
EnvInst::add_metric_datapoint_(MetricsStream::Type type, double value,
MetricsStream::DatapointAttrs attrs) copies the DatapointAttrs variant on a hot
path; change the parameter to take const MetricsStream::DatapointAttrs& attrs
instead and update all callers to pass their attrs by const reference (or const
lvalue reference) so no unnecessary copies occur; ensure any places that
previously relied on move semantics still behave correctly and that the
header/declaration and definition signatures match.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8764ef7b-e637-496e-ba08-d19afab3a7f0
📒 Files selected for processing (30)
- .gitignore
- Makefile
- agents/grpc/src/grpc_agent.cc
- agents/grpc/src/grpc_agent.h
- agents/otlp/src/otlp_common.cc
- agents/otlp/src/otlp_common.h
- deps/opentelemetry-cpp/otlp-http-exporter.gyp
- lib/_http_client.js
- lib/_http_server.js
- lib/internal/nsolid_diag.js
- lib/internal/nsolid_http_consts.js
- src/nsolid.h
- src/nsolid/nsolid_api.cc
- src/nsolid/nsolid_api.h
- src/nsolid/nsolid_bindings.h
- src/nsolid/nsolid_metrics_types.h
- src/nsolid/nsolid_trace.h
- test/addons/nsolid-dispatchqueue/binding.gyp
- test/addons/nsolid-eventloop-cmd/binding.gyp
- test/addons/nsolid-log-hooks/binding.gyp
- test/addons/nsolid-statsdagent/binding.gyp
- test/agents/test-grpc-http-histograms.mjs
- test/integrations/express/v4/package.json
- test/integrations/express/v4/test-http-route.mjs
- test/integrations/express/v5/package.json
- test/integrations/express/v5/test-http-route.mjs
- test/integrations/fastify/v5/package.json
- test/integrations/fastify/v5/test-http-route.mjs
- test/integrations/testcfg.py
- test/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (8)
- test/integrations/testcfg.py
- test/addons/nsolid-log-hooks/binding.gyp
- test/integrations/express/v4/package.json
- lib/_http_client.js
- test/addons/nsolid-statsdagent/binding.gyp
- .gitignore
- test/integrations/fastify/v5/package.json
- src/nsolid.h
const req = res.req;
// Extract route template from unified nsolid_route_s symbol
// Both Express (via diagnostic channel) and Fastify (via diagnostic channel) set this
const route = req?.[nsolid_route_s] || '';
binding.pushServerBucket(
  now() - res[nsolid_tracer_s],
  kHttpMethodMap[req?.method] ?? kHttpMethodOther,
  res.statusCode,
  req?.socket?.encrypted ? kHttpSchemeHttps : kHttpSchemeHttp,
  kHttpVersionMap[req?.httpVersion] ?? kHttpVersionOther,
  route);
Don't emit a synthetic 200 for aborted responses.
This close-path always pushes res.statusCode, but ServerResponse.prototype.statusCode defaults to 200. If the request aborts before headers are sent, the histogram gets a fake success code even though no response was written. Gate this on res._headerSent and treat the status code as unknown otherwise.
🔧 Minimal fix
const req = res.req;
// Extract route template from unified nsolid_route_s symbol
// Both Express (via diagnostic channel) and Fastify (via diagnostic channel) set this
const route = req?.[nsolid_route_s] || '';
+ const statusCode = res._headerSent ? res.statusCode : 0;
binding.pushServerBucket(
now() - res[nsolid_tracer_s],
kHttpMethodMap[req?.method] ?? kHttpMethodOther,
- res.statusCode,
+ statusCode,
req?.socket?.encrypted ? kHttpSchemeHttps : kHttpSchemeHttp,
kHttpVersionMap[req?.httpVersion] ?? kHttpVersionOther,
  route);

Then skip `http.response.status_code` in src/nsolid/nsolid_api.cc when the sentinel 0 is passed.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const req = res.req;
// Extract route template from unified nsolid_route_s symbol
// Both Express (via diagnostic channel) and Fastify (via diagnostic channel) set this
const route = req?.[nsolid_route_s] || '';
const statusCode = res._headerSent ? res.statusCode : 0;
binding.pushServerBucket(
  now() - res[nsolid_tracer_s],
  kHttpMethodMap[req?.method] ?? kHttpMethodOther,
  statusCode,
  req?.socket?.encrypted ? kHttpSchemeHttps : kHttpSchemeHttp,
  kHttpVersionMap[req?.httpVersion] ?? kHttpVersionOther,
  route);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/_http_server.js` around lines 322 - 332, The code currently always sends
res.statusCode (default 200) which emits a synthetic 200 for aborted requests;
change the call site that invokes binding.pushServerBucket so it checks
res._headerSent and passes the actual statusCode only if headers were sent,
otherwise pass a sentinel value (0) to indicate unknown; update the
corresponding native handling in nsolid_api.cc to treat 0 as “no status code”
and skip emitting http.response.status_code. Reference the call to
binding.pushServerBucket and the res._headerSent/res.statusCode fields, and the
native nsolid_api.cc logic that writes http.response.status_code.
.PHONY: test-integrations-prereqs-clean
test-integrations-prereqs-clean:
	$(RM) -r test/integrations/express/v4/node_modules
	$(RM) -r test/integrations/express/v5/node_modules
	$(RM) -r test/integrations/fastify/v5/node_modules
Also clean these deps from distclean.
distclean still says it removes all build and test artifacts, but these new integration node_modules trees only disappear through clean.
🧹 Minimal follow-up
distclean: ## Remove all build and test artifacts.
+ $(MAKE) test-integrations-prereqs-clean
	$(RM) -r out

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 1695 - 1699, The distclean target does not remove the
new integration node_modules directories added under test/integrations; update
the Makefile so the distclean target also removes
test/integrations/express/v4/node_modules,
test/integrations/express/v5/node_modules, and
test/integrations/fastify/v5/node_modules (same removals as in the
test-integrations-prereqs-clean target) so distclean truly removes all build and
test artifacts.
// Skip test if dependencies not installed
if (!existsSync(join(__dirname, 'node_modules'))) {
  console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
  process.exit(0);
}
Fail missing integration prereqs instead of exiting successfully.
process.exit(0) makes a broken integration setup look green, so this new coverage can silently disappear from CI. Throw or assert here instead.
Based on learnings, in the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/integrations/express/v4/test-http-route.mjs` around lines 19 - 23, The
current guard uses process.exit(0) so missing integration prereqs mark the test
as a success; replace this with a failure signal by throwing an Error or using
an assertion instead of exiting. Locate the existsSync(join(__dirname,
'node_modules')) check in the test and change the process.exit(0) branch to
either throw new Error('SKIP: node_modules not found. Run "make
test-integrations-prereqs" first.') or call assert.fail(...) so the test
fails/flags missing prerequisites instead of exiting cleanly. Ensure the log
message remains descriptive and includes the original guidance.
HTTP Latency Histogram Metrics with Route Attribution
Summary
This PR introduces HTTP latency histogram aggregation with exponential bucket distribution, adds route-level attribution for Express and Fastify frameworks, and improves metrics reliability by cleaning up dead thread data.
Changes
1. lib: simplify metrics interval handling (21ac403a478)
   - Removes the separate `setMetricsInterval` API entry point
2. agents: delete cached metrics for dead threads (62129809bd1)
   - Cleans up `thr_metrics_cache_` entries when threads are deleted
3. src: add HTTP latency histogram aggregation (68f0246a315)
   - Updates `binding.gyp` files for new opentelemetry-cpp dependencies
4. lib: capture http.route for HTTP metrics (daaf80445a9)
   - Adds the `http.route` attribute for Express and Fastify routes
5. test: cover HTTP histogram metrics via gRPC (36d565f4cf6)
6. test: add http.route integration coverage (9dc155805c8)
   - New `test-*.mjs` files exercising `http.route` patterns

Metrics Added
- `http.client.request.duration`, with attributes `http.method`, `http.route`, `http.scheme`, `net.host.name`, `net.host.port`
- `http.server.request.duration`, with attributes `http.method`, `http.route`, `http.scheme`, `net.host.name`, `net.host.port`

Framework Support
- Express: route template from `req.route.path`
- Fastify: route template from `req.routerPath`

Summary by CodeRabbit
New Features
Tests
Chores