
HTTP Latency Histogram Metrics with Route Attribution#431

Open

santigimeno wants to merge 6 commits into node-v24.x-nsolid-v6.x from
santi/exp_histogram_metrics

Conversation

@santigimeno
Member

@santigimeno santigimeno commented Mar 9, 2026

HTTP Latency Histogram Metrics with Route Attribution

Summary

This PR introduces HTTP latency histogram aggregation with exponential bucket distribution, adds route-level attribution for Express and Fastify frameworks, and improves metrics reliability by cleaning up dead thread data.


Changes

1. lib: simplify metrics interval handling (21ac403a478)

  • Refactor NSolid metrics interval handling so configuration and updates are managed in a single place in JS
  • Remove native setMetricsInterval API entry point
  • Reduce duplicated logic between JS and C++ layers

2. agents: delete cached metrics for dead threads (62129809bd1)

  • Clean up thr_metrics_cache_ entries when threads are deleted
  • Prevent exporting stale metrics from dead worker threads

3. src: add HTTP latency histogram aggregation (68f0246a315)

  • Introduce base2 exponential histograms for HTTP client and server latency metrics
  • Maintain per-attribute aggregation using OTLP attributes hash maps
  • Harvest histogram points for OTLP export
  • Update test addon binding.gyp files for new opentelemetry-cpp dependencies
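As context for the bucketing in item 3: in a base2 exponential histogram, a sample's bucket index is derived from its base-2 logarithm scaled by 2^scale. The sketch below mirrors the formula from the OpenTelemetry metrics data model, not this PR's opentelemetry-cpp internals:

```javascript
// Sketch of base2 exponential bucketing per the OTel metrics data model.
// At scale s, bucket i covers (base^i, base^(i+1)] with base = 2^(2^-s).
function bucketIndex(value, scale) {
  return Math.ceil(Math.log2(value) * 2 ** scale) - 1;
}

function bucketBounds(index, scale) {
  const base = 2 ** (2 ** -scale);
  return [base ** index, base ** (index + 1)];
}

// At scale 0 the base is 2, so a 150 ms sample lands in bucket 7: (128, 256].
console.log(bucketIndex(150, 0)); // 7
console.log(bucketBounds(7, 0));  // [ 128, 256 ]
```

Raising the scale doubles the number of buckets per power of two, which is how exponential histograms keep relative error bounded across a wide latency range without preconfigured bucket boundaries.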

4. lib: capture http.route for HTTP metrics (daaf80445a9)

  • Extract and record http.route attribute for Express and Fastify routes
  • Normalize framework route patterns
  • Attach routes to datapoints for HTTP latency histogram breakdown by logical route

5. test: cover HTTP histogram metrics via gRPC (36d565f4cf6)

  • End-to-end test generating HTTP client/server traffic
  • Assert on exported histogram data over gRPC/OTLP
  • Validate bucket counts and attributes

6. test: add http.route integration coverage (9dc155805c8)

  • Extend integrations test harness to discover nested test-*.mjs files
  • Add Express v4/v5 and Fastify v5 scenarios
  • Wire integrations into Makefile test targets
  • Verify HTTP server latency histograms include expected http.route patterns

Metrics Added

| Metric | Type | Attributes |
| --- | --- | --- |
| http.client.request.duration | Exponential Histogram | http.method, http.route, http.scheme, net.host.name, net.host.port |
| http.server.request.duration | Exponential Histogram | http.method, http.route, http.scheme, net.host.name, net.host.port |

Framework Support

  • Express v4/v5: Route pattern extraction via req.route.path
  • Fastify v5: Route pattern extraction via req.routerPath
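The extraction described above can be sketched as follows; the helper is hypothetical and the PR's actual hook points in lib/internal/nsolid_diag.js differ, but the sources are the ones named in the list:

```javascript
// Hypothetical sketch of route-pattern extraction for Express and Fastify.
function extractRoute(req) {
  if (req.route && typeof req.route.path === 'string') {
    // Express: include the mount prefix so '/api' + '/users/:id' survives
    // for routers mounted via app.use('/api', router).
    return (req.baseUrl || '') + req.route.path;
  }
  if (typeof req.routerPath === 'string') {
    return req.routerPath; // Fastify
  }
  return undefined; // no framework route matched
}

console.log(extractRoute({ baseUrl: '/api', route: { path: '/users/:id' } }));
// '/api/users/:id'
console.log(extractRoute({ routerPath: '/items/:id' })); // '/items/:id'
```

Keeping the parameterized pattern (:id) rather than the concrete URL is what bounds the cardinality of the http.route attribute.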

Summary by CodeRabbit

  • New Features

    • Capture richer HTTP metrics (method, status, host/port, scheme, protocol, route) and emit per-attribute exponential histograms; expose HTTP constants and route symbol to JS; add http.route to spans and metrics.
  • Tests

    • Added integration tests for Express v4/v5 and Fastify v5 validating route observability and a gRPC histogram validation test.
  • Chores

    • Added test integration prereqs/cleanup targets, Docker-gated OpenSSL generation, updated test discovery, and updated .gitignore to ignore integration node_modules.

Refactor NSolid metrics interval handling so configuration
and updates are managed in a single place in JS, and the
native API no longer exposes a separate setMetricsInterval
entry point. This reduces duplicated logic between JS and
C++ and makes interval changes easier to reason about.
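A rough sketch of what "managed in a single place in JS" can look like; the class and names are hypothetical, not the PR's actual lib/nsolid.js wiring:

```javascript
// Illustrative sketch: one JS-side owner for the metrics interval that
// reschedules a single timer on config updates, instead of a separate
// native setMetricsInterval entry point.
class MetricsScheduler {
  constructor(onHarvest) {
    this.onHarvest = onHarvest;
    this.timer = null;
    this.interval = 0;
  }
  update(intervalMs) {
    if (intervalMs === this.interval) return false; // duplicate update: no-op
    this.interval = intervalMs;
    if (this.timer) clearInterval(this.timer);
    this.timer = intervalMs > 0 ? setInterval(this.onHarvest, intervalMs) : null;
    if (this.timer && this.timer.unref) this.timer.unref(); // don't hold the loop open
    return true;
  }
}

const s = new MetricsScheduler(() => {});
console.log(s.update(3000)); // true: interval changed, timer scheduled
console.log(s.update(3000)); // false: same value, nothing to do
s.update(0);                 // disable
```

Centralizing the timer like this is what removes the duplicated JS/C++ interval logic the commit message mentions.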
@santigimeno santigimeno requested a review from RafaelGSS March 9, 2026 09:03
@santigimeno santigimeno self-assigned this Mar 9, 2026
@coderabbitai

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds HTTP datapoint attributes and per-attribute exponential histogram collection: new types and attributes in public API, propagation of HTTP metadata through JS/native/agent paths, OTLP/gRPC histogram emission, integration tests, and Makefile/test infra updates.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Core public types | src/nsolid.h, src/nsolid/nsolid_metrics_types.h | Add HTTP enums (method/version/scheme), HttpDatapointAttrs, DatapointAttrs (std::variant), Datapoint.attrs, and shared PointData/Attributes type aliases. |
| Native API & bindings | src/nsolid/nsolid_api.h, src/nsolid/nsolid_api.cc, src/nsolid/nsolid_bindings.h | Propagate HTTP attrs through PushClient/ServerBucket APIs, add per-attribute hashmaps and harvesting/flush logic, expose HTTP constants/route symbol, remove SetMetricsInterval binding. |
| gRPC agent | agents/grpc/src/grpc_agent.h, agents/grpc/src/grpc_agent.cc | Introduce ExtMetricsStor/CachedThreadMetrics; change queue/cache types to carry per-thread HTTP histogram points and window timestamps through the export path. |
| OTLP agent | agents/otlp/src/otlp_common.h, agents/otlp/src/otlp_common.cc | Add fill_http_histograms and related imports; emit per-attribute exponential histogram MetricData (delta) for HTTP client/server using shared point attributes. |
| JS HTTP instrumentation | lib/internal/nsolid_http_consts.js, lib/internal/nsolid_diag.js, lib/_http_client.js, lib/_http_server.js, lib/nsolid.js | New HTTP constants module; capture/forward method, status, host, port, version, scheme, route; HTTP/2 handling and Fastify/Express route extraction; store config.interval in JS without an immediate native call. |
| Tracing | src/nsolid/nsolid_trace.h | Add kSpanHttpRoute span property (include http.route in span attributes). |
| Build & deps | Makefile, deps/opentelemetry-cpp/otlp-http-exporter.gyp, test/addons/*/binding.gyp | Add test-integrations-prereqs/clean targets and docker-gated gen-openssl; include exponential histogram source files in the OTEL gyp and add OTEL include paths to test addon gyps. |
| Integration tests & test infra | test/integrations/..., test/integrations/testcfg.py, test/agents/test-grpc-http-histograms.mjs, test/parallel/test-bootstrap-modules.js | Add Express v4/v5 and Fastify v5 integration packages and tests for http.route, an integration discovery helper, a gRPC histogram validation test, and a test runtime expectation for the new native module. |
| Repository config | .gitignore | Ignore test/integrations/.../node_modules. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant JS as JavaScript Layer
    participant Bind as Native Bindings
    participant Queue as Datapoint Queue
    participant Cache as Per-thread Cache
    participant Agg as Histogram Aggregator
    participant Agent as gRPC/OTLP Agent
    participant Collector as Collector

    JS->>Bind: Fast/SlowPushClient/ServerBucket(method,status,addr,port,scheme,version,route)
    Bind->>Queue: enqueue ExtMetricsStor (metrics stor + http client/server points)
    Queue->>Cache: accumulate per-thread stor and http point attributes
    Cache->>Agg: on PTiles interval flush attribute hashmaps -> PointDataAttributesVector
    Agg->>Agent: build exponential histogram MetricData (delta) with attributes and emit
    Agent->>Collector: deliver metrics to collector / test listener
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • RafaelGSS
  • EHortua
  • juanarbol

Poem

🐰
Hopping bytes and metrics bright,
Routes and methods take their flight,
Histograms count each tiny beat,
From JS burrow to C++ seat,
Rabbit cheers — observability delights!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'HTTP Latency Histogram Metrics with Route Attribution' accurately and specifically describes the primary change: exponential histogram metrics for HTTP requests with route-level attribution. |


@santigimeno santigimeno changed the title from "Santi/exp histogram metrics" to "HTTP Latency Histogram Metrics with Route Attribution" Mar 9, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/nsolid/nsolid_api.cc (1)

2095-2130: ⚠️ Potential issue | 🟠 Major

Convert HTTP samples to seconds before feeding the exponential histograms.

The same raw value still backs the millisecond summaries in agents/otlp/src/otlp_common.cc (lines 354-367), but the new OTel histograms are exported with unit seconds (lines 420-425). Aggregating `value` unchanged here makes the histogram sum/min/max/buckets 1000× too large.

🩹 Proposed fix
-  auto aggregate_into_hashmap = [&value](
-      AttributesHashMap* hashmap,
-      MetricAttributes&& metric_attrs) {
+  auto aggregate_into_hashmap = [](
+      AttributesHashMap* hashmap,
+      MetricAttributes&& metric_attrs,
+      double sample_seconds) {
     auto factory = []() -> std::unique_ptr<Aggregation> {
       return std::make_unique<Base2ExponentialHistogramAggregation>();
     };
     auto* agg = hashmap->GetOrSetDefault(std::move(metric_attrs), factory);
-    agg->Aggregate(value);
+    agg->Aggregate(sample_seconds);
   };
@@
         aggregate_into_hashmap(
             http_client_hashmap_.get(),
-            BuildMetricAttributes(*http, type));
+            BuildMetricAttributes(*http, type),
+            value / 1e3);
@@
         aggregate_into_hashmap(
             http_server_hashmap_.get(),
-            BuildMetricAttributes(*http, type));
+            BuildMetricAttributes(*http, type),
+            value / 1e3);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.cc` around lines 2095 - 2130, The HTTP datapoints are
being aggregated into exponential histograms using the raw millisecond `value`,
causing a 1000× scale error; update the places that call aggregate_into_hashmap
for MetricsStream::Type::kHttpClient and kHttpServer so they convert `value`
from milliseconds to seconds (e.g., compute a double seconds_value = value /
1000.0) and pass that converted value into the aggregation instead of the
original `value` when invoking aggregate_into_hashmap (leave other uses of
`value` for ms summaries unchanged); locate usages by the symbols
aggregate_into_hashmap, BuildMetricAttributes, http_client_hashmap_, and
http_server_hashmap_.
🧹 Nitpick comments (5)
test/addons/nsolid-statsdagent/binding.gyp (1)

10-11: Move these include directories to test/addons/common.gypi. The OpenTelemetry include paths (../../../deps/opentelemetry-cpp/api/include and ../../../deps/opentelemetry-cpp/sdk/include) are duplicated across four addon binding files (nsolid-log-hooks, nsolid-statsdagent, nsolid-eventloop-cmd, nsolid-dispatchqueue). Centralizing them in the shared config file will reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/addons/nsolid-statsdagent/binding.gyp` around lines 10 - 11, The two
OpenTelemetry include paths '../../../deps/opentelemetry-cpp/api/include' and
'../../../deps/opentelemetry-cpp/sdk/include' should be removed from this
binding.gyp and added into the shared test/addons/common.gypi include_dirs list;
update common.gypi to include those two paths (so build configs reference them
centrally) and then remove the duplicated entries from the other addon
binding.gyp files (nsolid-log-hooks, nsolid-eventloop-cmd, nsolid-dispatchqueue)
so all addons rely on the shared common.gypi include_dirs instead of repeating
the same paths here.
test/integrations/express/v5/package.json (1)

5-6: The package.json versions are overridden by the Makefile test setup, but consider aligning them for clarity.

The Makefile (test-integrations-prereqs) installs these fixtures using explicit pinned versions (express@4, express@5, fastify@5), not the floating ranges in package.json. Since the package.json versions are bypassed in CI, the stated concern about floating ranges doesn't impact test stability. However, this discrepancy is confusing—if the package.json is not meant to be used, either remove the dependencies or update them to match the Makefile's pinned versions (e.g., "express": "5" instead of "^5.0.0"). Tests already skip gracefully when dependencies are missing (via existsSync guards).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/package.json` around lines 5 - 6, The
package.json dependency version for express is inconsistent with the Makefile's
pinned install in test-integrations-prereqs; update the "dependencies" entry in
package.json (the "express" key) to match the pinned version used by the
Makefile (e.g., change "express": "^5.0.0" to "express": "5") or remove the
dependency entirely if package.json is not intended to be authoritative for
these fixtures so the declared versions no longer conflict with the test setup.
test/integrations/express/v4/test-http-route.mjs (1)

122-137: Consider adding a timeout to prevent indefinite test hangs.

The metricsPromise waits for all routes to be validated but has no timeout. If metrics never arrive (e.g., due to a bug or misconfiguration), the test will hang indefinitely.

⏱️ Proposed fix to add timeout
   // Listen for metrics
   const metricsPromise = new Promise((resolve, reject) => {
+    const timeout = setTimeout(() => {
+      reject(new Error(`Timeout: Only found routes ${[...foundRoutes].join(', ')}`));
+    }, 30000);
+
     grpcServer.on('metrics', (data) => {
       const found = checkHttpRouteAttribute(data, expectedRoutes);
       for (const route of found) {
         if (!foundRoutes.has(route)) {
           foundRoutes.add(route);
           console.log(`Validated route: ${route} (${foundRoutes.size}/${expectedRoutes.size})`);
         }
       }

       if (foundRoutes.size === expectedRoutes.size) {
         console.log('All routes validated!');
+        clearTimeout(timeout);
         resolve();
       }
     });
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v4/test-http-route.mjs` around lines 122 - 137, The
metricsPromise can hang indefinitely if metrics never arrive; update the Promise
created for metricsPromise (the grpcServer.on('metrics', ...) handler) to
include a timeout (e.g., setTimeout) that rejects the promise after a reasonable
interval, ensure you clear that timer and remove the 'metrics' listener when
either resolved or rejected, and make sure the rejection message includes
context (mentioning checkHttpRouteAttribute, foundRoutes and expectedRoutes) so
the test fails fast instead of hanging.
test/integrations/express/v5/test-http-route.mjs (2)

122-137: Same timeout concern as Express v4 test.

The metricsPromise lacks a timeout, which could cause indefinite hangs if metrics don't arrive as expected.

⏱️ Proposed fix to add timeout
   // Listen for metrics
   const metricsPromise = new Promise((resolve, reject) => {
+    const timeout = setTimeout(() => {
+      reject(new Error(`Timeout: Only found routes ${[...foundRoutes].join(', ')}`));
+    }, 30000);
+
     grpcServer.on('metrics', (data) => {
       const found = checkHttpRouteAttribute(data, expectedRoutes);
       for (const route of found) {
         if (!foundRoutes.has(route)) {
           foundRoutes.add(route);
           console.log(`Validated route: ${route} (${foundRoutes.size}/${expectedRoutes.size})`);
         }
       }

       if (foundRoutes.size === expectedRoutes.size) {
         console.log('All routes validated!');
+        clearTimeout(timeout);
         resolve();
       }
     });
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/test-http-route.mjs` around lines 122 - 137, The
metricsPromise created around grpcServer.on('metrics') can hang forever; update
the Promise logic in test-http-route.mjs (the metricsPromise that listens to
'metrics' and uses checkHttpRouteAttribute, foundRoutes, and expectedRoutes) to
implement a configurable timeout: start a timer when the Promise is created that
rejects the Promise with a clear error after e.g. 5-10s, and ensure you clear
the timer and remove the 'metrics' listener on both resolve and reject to avoid
leaks; keep existing route-found logic but add the timeout and proper cleanup.

31-67: Consider extracting shared test utilities.

The getAttr and checkHttpRouteAttribute functions are duplicated across Express v4, v5, and likely Fastify v5 tests. Consider extracting to a shared test helper module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/test-http-route.mjs` around lines 31 - 67,
Extract the duplicated test helpers into a shared test utility module and import
them into the Express v4, Express v5, and Fastify v5 tests: move the getAttr
function and checkHttpRouteAttribute function logic into a new/shared helper
(e.g., export getAttr and checkHttpRouteAttribute from the helper), replace the
local implementations in test-http-route.mjs (and the other duplicated test
files) with imports from that helper, and update tests to call the imported
getAttr and checkHttpRouteAttribute to remove duplication while preserving
existing behavior and signatures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Makefile`:
- Around line 1688-1698: The new integration setup targets
test-integrations-prereqs and test-integrations-prereqs-clean are not wired into
the normal lifecycle; update the Makefile so that jstest, test, and test-ci
depend on test-integrations-prereqs (so integration deps are installed before
running tests) and make clean depend on test-integrations-prereqs-clean (so
integration node_modules get removed on clean); ensure you add these targets as
prerequisites to the existing jstest, test, test-ci, and clean targets rather
than duplicating install/clean logic.

In `@src/nsolid/nsolid_api.cc`:
- Around line 136-152: The server branch in the MetricsStream handling (when
type != MetricsStream::Type::kHttpClient) only sets url.scheme and http.route,
so server histogram points still lack host/port; update the server-side branch
to also insert {"server.address", http.server_address} when http.server_address
is non-empty and {"server.port", static_cast<int64_t>(http.server_port)} when
http.server_port != 0 (matching the client branch) in the same block that sets
url.scheme/http.route, and ensure these attributes propagate through
FastPushServerBucket / PushServerBucketImpl paths so
http.server.request.duration includes host/port.
- Around line 1276-1283: UpdateConfig() must not call gen_ptiles_timer_.start()
directly; instead store the new interval into gen_ptiles_interval (or a pending
variable) and enqueue a restart command to the loop thread using the existing
async queue pattern (e.g., push a command onto eloop_cmds_q_ or
on_config_string_q_). Implement the actual stop/start of gen_ptiles_timer_
(using gen_ptiles_timer_.stop() then gen_ptiles_timer_.start(gen_ptiles_cb_,
gen_ptiles_interval, gen_ptiles_interval)) inside the loop thread’s handler
(env_list_routine_ or the async callback) so the libuv handle is always
manipulated on thread_loop_. Ensure the queued command uses the loop thread
context and checks/uses the stored interval value.

In `@test/integrations/testcfg.py`:
- Around line 8-22: Ls currently returns nested test paths as a single string
(e.g., "fastify/v5/test-http-route.mjs") which prevents
SimpleTestConfiguration.ListTests (which builds candidates as current_path +
[t]) from matching via Contains; modify Ls so it returns path components instead
of a single joined string: compute rel_dir = os.path.relpath(root, path),
normalize it, and if rel_dir == '.' return the filename as a single-element list
[f], otherwise return rel_dir.split(os.sep) + [f] (or equivalent sequence of
path components) so ListTests' concatenation and Contains(path, tst) will work
correctly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99eb215f-fea3-440a-a0dd-c8947a5772c4

📥 Commits

Reviewing files that changed from the base of the PR and between 369ce34 and 9dc1558.

📒 Files selected for processing (30)
  • .gitignore
  • Makefile
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/grpc_agent.h
  • agents/otlp/src/otlp_common.cc
  • agents/otlp/src/otlp_common.h
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • lib/_http_client.js
  • lib/_http_server.js
  • lib/internal/nsolid_diag.js
  • lib/internal/nsolid_http_consts.js
  • lib/nsolid.js
  • src/nsolid.h
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • src/nsolid/nsolid_bindings.h
  • src/nsolid/nsolid_trace.h
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/addons/nsolid-eventloop-cmd/binding.gyp
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/addons/nsolid-statsdagent/binding.gyp
  • test/agents/test-grpc-http-histograms.mjs
  • test/integrations/express/v4/package.json
  • test/integrations/express/v4/test-http-route.mjs
  • test/integrations/express/v5/package.json
  • test/integrations/express/v5/test-http-route.mjs
  • test/integrations/fastify/v5/package.json
  • test/integrations/fastify/v5/test-http-route.mjs
  • test/integrations/testcfg.py
  • test/parallel/test-bootstrap-modules.js

@santigimeno santigimeno force-pushed the santi/exp_histogram_metrics branch from 9dc1558 to 4759af6 on March 9, 2026 11:18

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-983: The loop processing dequeued ExtMetricsStor items
currently re-inserts metrics for threads that were removed (env_msg_cb_ erased
thr_metrics_cache_) because queued items can arrive after deletion; before
calling otlp::fill_env_metrics, otlp::fill_http_histograms or updating
thr_metrics_cache_ and thr_metrics_hist_prev_end_ts_ms_, check whether
ext_stor.stor.thread_id exists in env_metrics_map_ (or equivalent live-thread
set) and if not simply skip/purge this dequeued item (continue) so dead threads
are not exported or re-added to thr_metrics_cache_; apply this check around the
body handling ExtMetricsStor (using ExtMetricsStor, thr_metrics_msg_q_.dequeue,
thr_metrics_cache_, thr_metrics_hist_prev_end_ts_ms_, and env_metrics_map_ names
to locate the code).

In `@lib/internal/nsolid_diag.js`:
- Around line 381-385: The current handler in
dc.subscribe('http.server.response.finish', ...) sets request[nsolid_route_s] to
request.route.path which omits mounted router prefixes; change it to compose the
full route by using (request.baseUrl || '') + request.route.path (or equivalent)
and assign that to request[nsolid_route_s] so mounted routes like
app.use('/api', router) are preserved; ensure you handle undefined baseUrl and
avoid accidental duplicate slashes when concatenating, and keep the check that
skips setting when request[nsolid_route_s] already exists.
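The baseUrl composition this comment asks for might look like the following; the helper name is hypothetical:

```javascript
// Illustrative helper: join Express baseUrl and route.path without
// producing '//' when both sides contribute a slash, and tolerating
// an undefined baseUrl for routes registered directly on the app.
function composeRoute(baseUrl, routePath) {
  const base = baseUrl || '';
  if (base.endsWith('/') && routePath.startsWith('/')) {
    return base.slice(0, -1) + routePath;
  }
  return base + routePath;
}

console.log(composeRoute('/api', '/users/:id'));  // '/api/users/:id'
console.log(composeRoute('/api/', '/users/:id')); // '/api/users/:id'
console.log(composeRoute(undefined, '/health'));  // '/health'
```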

In `@lib/internal/nsolid_http_consts.js`:
- Around line 21-26: The kHttpVersionMap is missing the '2.0' entry so raw
httpVersion === "2.0" falls through to kHttpVersionOther; add a '2.0':
nsolid_consts.kHttpVersion2 mapping to kHttpVersionMap (next to the existing '2'
key) so both "2" and "2.0" resolve to nsolid_consts.kHttpVersion2; leave the
call sites (where httpVersion strings are passed directly into kHttpVersionMap
in lib/_http_server.js and lib/_http_client.js) unchanged.
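The missing-entry fix can be illustrated as follows; the constant values here are made up, since the real ones come from nsolid_consts:

```javascript
// Illustrative constants (hypothetical values, not the real nsolid_consts).
const kHttpVersion1_0 = 1;
const kHttpVersion1_1 = 2;
const kHttpVersion2 = 3;
const kHttpVersionOther = 0;

// Node reports httpVersion as '2.0' for HTTP/2 streams, so both '2' and
// '2.0' must resolve to the HTTP/2 constant.
const kHttpVersionMap = {
  '1.0': kHttpVersion1_0,
  '1.1': kHttpVersion1_1,
  '2': kHttpVersion2,
  '2.0': kHttpVersion2, // the entry the review found missing
};

function httpVersionConst(httpVersion) {
  return kHttpVersionMap[httpVersion] ?? kHttpVersionOther;
}

console.log(httpVersionConst('2.0')); // 3
console.log(httpVersionConst('0.9')); // 0
```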

In `@Makefile`:
- Line 315: The CI path `test-ci-js` is missing the new
`test-integrations-prereqs` dependency and can skip integration setup; update
the Makefile so `test-ci-js` depends on `test-integrations-prereqs` (or ensure
`$(CI_JS_SUITES)` target pulls that prereq) so it runs the same integration
prerequisites as `jstest` and `test-ci`; modify the `test-ci-js` target to
include `test-integrations-prereqs` alongside existing deps.

In `@src/nsolid/nsolid_api.cc`:
- Around line 1276-1281: The code writes gen_ptiles_interval in UpdateConfig()
without synchronization while gen_ptiles_cb_() (run on thread_loop_) reads it,
causing a data race; fix by making gen_ptiles_interval access thread-safe —
either change the variable to std::atomic<uint64_t> and use atomic loads/stores,
or protect both writers and readers with the same mutex used by thread_loop_ (or
a new mutex) so UpdateConfig() updates and gen_ptiles_cb_() reads the value
under the same lock; update all references (writes in UpdateConfig and reads in
gen_ptiles_cb_) to use the chosen synchronization primitive.
- Around line 124-153: BuildMetricAttributes currently only inserts "url.scheme"
for server metrics, so client metrics lack the URL scheme; update
BuildMetricAttributes (function BuildMetricAttributes) to insert {"url.scheme",
std::string(scheme)} in the client branch as well, where scheme =
HttpUrlSchemeToString(http.url_scheme). Then add a url_scheme parameter to
PushClientBucketImpl (and propagate it through SlowPushClientBucket and
FastPushClientBucket constructors/signatures) so the client-side code can pass
the real scheme instead of the hardcoded HttpUrlScheme::kOther, and update all
C++ callers accordingly; finally update the three JS call sites
(lib/internal/nsolid_diag.js and lib/_http_client.js locations that construct
client bucket instances) to pass the actual URL scheme value instead of kOther
so PushClientBucketImpl receives and forwards the correct url_scheme.

In `@src/nsolid/nsolid_api.h`:
- Around line 414-417: The hard cap of 100 for route-level histograms is too
low: replace the fixed AttributesHashMap(100) usages for http_client_hashmap_
and http_server_hashmap_ with a configurable limit, e.g., use
aggregation_config_->cardinality_limit_ (or fall back to a higher default like
2000) when constructing AttributesHashMap in the code that initializes
http_client_hashmap_ and http_server_hashmap_, mirroring the approach in
sync_metric_storage.cc so the cardinality can be tuned instead of being
hardcoded.
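The cardinality-cap idea can be sketched as a capped map with an overflow bucket; this is illustrative JS, while the real AttributesHashMap lives in opentelemetry-cpp:

```javascript
// Illustrative sketch: once the attribute-set limit is reached, new keys
// collapse into a single overflow bucket instead of growing without bound.
// The limit and overflow key are hypothetical.
class CappedAttributesMap {
  constructor(limit = 2000) {
    this.limit = limit;
    this.map = new Map();
  }
  getOrSetDefault(key, factory) {
    if (!this.map.has(key)) {
      if (this.map.size >= this.limit) key = '__overflow__';
      if (!this.map.has(key)) this.map.set(key, factory());
    }
    return this.map.get(key);
  }
}

const m = new CappedAttributesMap(2);
m.getOrSetDefault('GET /a', () => []).push(1);
m.getOrSetDefault('GET /b', () => []).push(2);
m.getOrSetDefault('GET /c', () => []).push(3); // over limit -> overflow bucket
console.log([...m.map.keys()]); // [ 'GET /a', 'GET /b', '__overflow__' ]
```

Making the limit configurable (as the comment suggests) lets deployments with many routes trade memory for per-route resolution.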

In `@test/integrations/express/v4/test-http-route.mjs`:
- Around line 69-171: The test's runTest function can leak servers on failures —
wrap the main test body (everything after starting grpcServer and Express server
up to awaiting metricsPromise) in a try/finally so cleanup always runs; replace
awaiting metricsPromise directly with a raced promise (e.g.,
Promise.race([metricsPromise, timeoutPromise]) with a short timeout) so the test
fails fast if metrics never arrive; in the finally block always destroy tracked
connections and close the Express server and grpcServer (references: runTest,
grpcServer, server, connections, metricsPromise) to ensure no handles are left
open.

In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 31-66: The test currently only inspects
resourceMetrics[0].scopeMetrics[0].metrics so it can miss the expected histogram
when multiple resource or scope batches are present; update
checkHttpRouteAttribute to iterate over all resourceMetrics and, within each,
all scopeMetrics and their metrics (use nested for-loops over resourceMetrics
and scopeMetrics instead of indexing [0][0]) and then process each
scopeMetrics[i].metrics array as before (keep the existing metric name/type
checks and dataPoints handling and retain use of getAttr and foundRoutes logic).
- Around line 19-23: The current prereq guard uses existsSync(join(__dirname,
'node_modules')) and does a console.log plus process.exit(0), which masks
missing integration dependencies; replace this with a failing assertion or
non-zero exit so the test fails loudly (e.g., use assert.ok or throw new Error
when existsSync(...) is false) so the test run does not silently skip Express v5
coverage; update the block around existsSync, join and __dirname to assert the
prereq rather than calling process.exit(0).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2386a9f0-037b-485b-aeaa-0305b220b52f

📥 Commits

Reviewing files that changed from the base of the PR and between 9dc1558 and 4759af6.

📒 Files selected for processing (30)
  • .gitignore
  • Makefile
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/grpc_agent.h
  • agents/otlp/src/otlp_common.cc
  • agents/otlp/src/otlp_common.h
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • lib/_http_client.js
  • lib/_http_server.js
  • lib/internal/nsolid_diag.js
  • lib/internal/nsolid_http_consts.js
  • src/nsolid.h
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • src/nsolid/nsolid_bindings.h
  • src/nsolid/nsolid_metrics_types.h
  • src/nsolid/nsolid_trace.h
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/addons/nsolid-eventloop-cmd/binding.gyp
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/addons/nsolid-statsdagent/binding.gyp
  • test/agents/test-grpc-http-histograms.mjs
  • test/integrations/express/v4/package.json
  • test/integrations/express/v4/test-http-route.mjs
  • test/integrations/express/v5/package.json
  • test/integrations/express/v5/test-http-route.mjs
  • test/integrations/fastify/v5/package.json
  • test/integrations/fastify/v5/test-http-route.mjs
  • test/integrations/testcfg.py
  • test/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (14)
  • .gitignore
  • test/integrations/express/v4/package.json
  • test/parallel/test-bootstrap-modules.js
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/integrations/fastify/v5/package.json
  • test/addons/nsolid-eventloop-cmd/binding.gyp
  • test/integrations/testcfg.py
  • test/addons/nsolid-statsdagent/binding.gyp
  • lib/_http_client.js
  • test/agents/test-grpc-http-histograms.mjs
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/integrations/fastify/v5/test-http-route.mjs
  • test/integrations/express/v5/package.json
  • src/nsolid/nsolid_bindings.h

Comment on lines +955 to +983
ExtMetricsStor ext_stor;
while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
auto& stor = ext_stor.stor;
otlp::fill_env_metrics(metrics, stor, false);
agent->thr_metrics_cache_.insert_or_assign(stor.thread_id, std::move(stor));
uint64_t thread_id = stor.thread_id;
uint64_t start_ts_ms = 0;
auto prev_it = agent->thr_metrics_hist_prev_end_ts_ms_.find(thread_id);
if (prev_it != agent->thr_metrics_hist_prev_end_ts_ms_.end()) {
start_ts_ms = prev_it->second;
}
// Add exponential histogram metrics for HTTP latency.
otlp::fill_http_histograms(
metrics,
stor,
ext_stor.http_client_points,
ext_stor.http_server_points,
false,
start_ts_ms);
agent->thr_metrics_hist_prev_end_ts_ms_.insert_or_assign(
thread_id,
stor.timestamp);
agent->thr_metrics_cache_.insert_or_assign(
thread_id,
CachedThreadMetrics{
std::move(stor),
start_ts_ms,
ext_stor.http_client_points,
ext_stor.http_server_points
});
⚠️ Potential issue | 🟠 Major

Drop queued samples for threads that are already gone.

env_msg_cb_() now erases thr_metrics_cache_ on thread removal, but thr_metrics_msg_q_ can already contain an ExtMetricsStor harvested just before that delete callback ran. If the delete async is processed first, this loop exports the sample and inserts the dead thread right back into thr_metrics_cache_. Skip or purge queued entries whose thread_id is no longer present in env_metrics_map_ before exporting/caching them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/grpc/src/grpc_agent.cc` around lines 955 - 983, The loop processing
dequeued ExtMetricsStor items currently re-inserts metrics for threads that were
removed (env_msg_cb_ erased thr_metrics_cache_) because queued items can arrive
after deletion; before calling otlp::fill_env_metrics,
otlp::fill_http_histograms or updating thr_metrics_cache_ and
thr_metrics_hist_prev_end_ts_ms_, check whether ext_stor.stor.thread_id exists
in env_metrics_map_ (or equivalent live-thread set) and if not simply skip/purge
this dequeued item (continue) so dead threads are not exported or re-added to
thr_metrics_cache_; apply this check around the body handling ExtMetricsStor
(using ExtMetricsStor, thr_metrics_msg_q_.dequeue, thr_metrics_cache_,
thr_metrics_hist_prev_end_ts_ms_, and env_metrics_map_ names to locate the
code).

Comment on lines +21 to +26
const kHttpVersionMap = {
'__proto__': null,
'1.0': nsolid_consts.kHttpVersion10,
'1.1': nsolid_consts.kHttpVersion11,
'2': nsolid_consts.kHttpVersion2,
};
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== kHttpVersionMap call sites =="
rg -n -C2 'kHttpVersionMap\s*\[[^\]]+\]' lib

echo
echo "== Nearby HTTP version normalization logic =="
rg -n -C2 '\bhttpVersion(Major|Minor)?\b' lib/_http_client.js lib/_http_server.js

Repository: nodesource/nsolid

Length of output: 2830


🌐 Web query:

In the Node.js HTTP/2 compatibility API, what exact string does `httpVersion` expose on request/response objects: "2" or "2.0"?

💡 Result:

In Node.js’s HTTP/2 Compatibility API, req.httpVersion / res.httpVersion exposes the string "2.0" (not "2"). [1][2]


Add '2.0' key to kHttpVersionMap for HTTP/2 compatibility.

The Node.js HTTP/2 Compatibility API exposes httpVersion as "2.0" on request/response objects. Without this key, HTTP/2 traffic will fall through to kHttpVersionOther, losing protocol attribution. Both call sites in lib/_http_server.js:331 and lib/_http_client.js:872 pass raw httpVersion strings directly to the map without normalization.

Required change
 const kHttpVersionMap = {
   '__proto__': null,
   '1.0': nsolid_consts.kHttpVersion10,
   '1.1': nsolid_consts.kHttpVersion11,
   '2': nsolid_consts.kHttpVersion2,
+  '2.0': nsolid_consts.kHttpVersion2,
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_http_consts.js` around lines 21 - 26, The kHttpVersionMap
is missing the '2.0' entry so raw httpVersion === "2.0" falls through to
kHttpVersionOther; add a '2.0': nsolid_consts.kHttpVersion2 mapping to
kHttpVersionMap (next to the existing '2' key) so both "2" and "2.0" resolve to
nsolid_consts.kHttpVersion2; leave the call sites (where httpVersion strings are
passed directly into kHttpVersionMap in lib/_http_server.js and
lib/_http_client.js) unchanged.
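For illustration, the complete map with the suggested '2.0' entry, using hypothetical constant values in place of the real `nsolid_consts`:

```javascript
// Hypothetical stand-in values; the real ones come from nsolid_consts.
const nsolid_consts = {
  kHttpVersion10: 1,
  kHttpVersion11: 2,
  kHttpVersion2: 3,
  kHttpVersionOther: 0,
};

const kHttpVersionMap = {
  '__proto__': null,
  '1.0': nsolid_consts.kHttpVersion10,
  '1.1': nsolid_consts.kHttpVersion11,
  '2': nsolid_consts.kHttpVersion2,
  '2.0': nsolid_consts.kHttpVersion2,  // HTTP/2 compat API reports "2.0"
};

// Unknown versions fall back to kHttpVersionOther.
function httpVersionToConst(httpVersion) {
  return kHttpVersionMap[httpVersion] ?? nsolid_consts.kHttpVersionOther;
}
```

With the extra key, both "2" (raw HTTP/2 sessions) and "2.0" (compatibility API) resolve to the same HTTP/2 enum value instead of falling through to the "other" bucket.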

Comment on lines +1276 to +1281
it = config.find("interval");
if (it != config.end() && !it->is_null()) {
// If interval changes, next timer cb the timer is reset.
uint64_t interval = it->get<uint64_t>();
gen_ptiles_interval = interval;
}
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "# Declaration and all accesses to gen_ptiles_interval"
rg -n -C2 '\bgen_ptiles_interval\b' src/nsolid/nsolid_api.cc

echo
echo "# Timer ownership and callback context"
rg -n -C2 'env_list_routine_|gen_ptiles_cb_|UpdateConfig\(' src/nsolid/nsolid_api.cc

Repository: nodesource/nsolid

Length of output: 3240


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check for existing atomic usage in src/nsolid/nsolid_api.cc ==="
rg -n 'std::atomic|atomic_utils' src/nsolid/nsolid_api.cc | head -20

echo
echo "=== Check atomic_utils.h availability in src/nsolid/ ==="
fd -t f 'atomic_utils' src/nsolid/

echo
echo "=== Check mutex/lock patterns in UpdateConfig function ==="
sed -n '1226,1315p' src/nsolid/nsolid_api.cc | head -50

Repository: nodesource/nsolid

Length of output: 2102


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full UpdateConfig function starting at line 1226 ==="
sed -n '1226,1310p' src/nsolid/nsolid_api.cc

Repository: nodesource/nsolid

Length of output: 2641


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Verify gen_ptiles_cb_ is called on thread_loop_ ==="
sed -n '1738,1760p' src/nsolid/nsolid_api.cc
echo
echo "=== Verify timer callback context ==="
sed -n '1833,1845p' src/nsolid/nsolid_api.cc

Repository: nodesource/nsolid

Length of output: 1496


Synchronize gen_ptiles_interval across threads.

UpdateConfig() writes gen_ptiles_interval at line 1280 without holding any lock, while gen_ptiles_cb_() reads it at lines 1885 and 1887 on thread_loop_. This is a data race; on 32-bit builds the value can tear.

🔧 Minimal fix
-uint64_t gen_ptiles_interval = 5000;
+std::atomic<uint64_t> gen_ptiles_interval{5000};
-      uint64_t interval = it->get<uint64_t>();
-      gen_ptiles_interval = interval;
+      uint64_t interval = it->get<uint64_t>();
+      gen_ptiles_interval.store(interval, std::memory_order_relaxed);
-  if (gen_ptiles_interval != interval) {
+  uint64_t next = gen_ptiles_interval.load(std::memory_order_relaxed);
+  if (next != interval) {
     int er = envlist->gen_ptiles_timer_.start(
-      gen_ptiles_cb_, gen_ptiles_interval, gen_ptiles_interval);
+      gen_ptiles_cb_, next, next);
     CHECK_EQ(er, 0);
   }

Also applies to: 1882-1888

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.cc` around lines 1276 - 1281, The code writes
gen_ptiles_interval in UpdateConfig() without synchronization while
gen_ptiles_cb_() (run on thread_loop_) reads it, causing a data race; fix by
making gen_ptiles_interval access thread-safe — either change the variable to
std::atomic<uint64_t> and use atomic loads/stores, or protect both writers and
readers with the same mutex used by thread_loop_ (or a new mutex) so
UpdateConfig() updates and gen_ptiles_cb_() reads the value under the same lock;
update all references (writes in UpdateConfig and reads in gen_ptiles_cb_) to
use the chosen synchronization primitive.

Comment on lines +414 to +417
UniqueAttributesHashMap
http_client_hashmap_ = std::make_unique<AttributesHashMap>(100);
UniqueAttributesHashMap
http_server_hashmap_ = std::make_unique<AttributesHashMap>(100);
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

hdr="$(fd '^attributes_hashmap\.h$' deps/opentelemetry-cpp | head -n1)"
if [ -z "$hdr" ]; then
  echo "attributes_hashmap.h not found" >&2
  exit 1
fi

echo "== Constructor / size semantics =="
rg -n -C4 'class AttributesHashMap|AttributesHashMap\s*\(' "$hdr"

echo
echo "== Limit / overflow behavior =="
rg -n -C4 'overflow|max_size_|cardinality|GetOrSetDefault' "$hdr" deps/opentelemetry-cpp/sdk/src

echo
echo "== NSolid call sites =="
rg -n -C2 'AttributesHashMap\s*\(\s*100\s*\)' src agents

Repository: nodesource/nsolid

Length of output: 16762


Reconsider the cardinality limit for route-level HTTP histograms.

AttributesHashMap(100) is a hard cardinality cap, not preallocation. With {method, route, scheme, host, port} as the key, 100 distinct combinations per interval is easy to exceed. When exceeded, new series route to an overflow bucket (otel.metrics.overflow=true), which defeats the route-level breakdown. The OpenTelemetry default is 2000; either make this configurable via aggregation_config_->cardinality_limit_ (like sync_metric_storage.cc does) or raise the limit if route-level metrics are a core feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.h` around lines 414 - 417, The hard cap of 100 for
route-level histograms is too low: replace the fixed AttributesHashMap(100)
usages for http_client_hashmap_ and http_server_hashmap_ with a configurable
limit, e.g., use aggregation_config_->cardinality_limit_ (or fall back to a
higher default like 2000) when constructing AttributesHashMap in the code that
initializes http_client_hashmap_ and http_server_hashmap_, mirroring the
approach in sync_metric_storage.cc so the cardinality can be tuned instead of
being hardcoded.

Comment on lines +19 to +23
// Skip test if dependencies not installed
if (!existsSync(join(__dirname, 'node_modules'))) {
console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
process.exit(0);
}
⚠️ Potential issue | 🟡 Minor

Fail fast when integration deps are missing.

This turns a broken integration setup into a green run, so CI can silently stop exercising the Express v5 coverage added in this PR. Assert the prereq instead of exiting with 0.

Suggested change
-// Skip test if dependencies not installed
-if (!existsSync(join(__dirname, 'node_modules'))) {
-  console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
-  process.exit(0);
-}
+assert.ok(
+  existsSync(join(__dirname, 'node_modules')),
+  'Missing integration deps. Run "make test-integrations-prereqs" first.',
+);

Based on learnings, the nsolid test suite expects prerequisites to be prepared ahead of time rather than silently skipped with fs.existsSync guards.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Skip test if dependencies not installed
if (!existsSync(join(__dirname, 'node_modules'))) {
console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
process.exit(0);
}
// Test will fail if dependencies not installed
assert.ok(
existsSync(join(__dirname, 'node_modules')),
'Missing integration deps. Run "make test-integrations-prereqs" first.',
);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/test-http-route.mjs` around lines 19 - 23, The
current prereq guard uses existsSync(join(__dirname, 'node_modules')) and does a
console.log plus process.exit(0), which masks missing integration dependencies;
replace this with a failing assertion or non-zero exit so the test fails loudly
(e.g., use assert.ok or throw new Error when existsSync(...) is false) so the
test run does not silently skip Express v5 coverage; update the block around
existsSync, join and __dirname to assert the prereq rather than calling
process.exit(0).

RafaelGSS previously approved these changes Mar 9, 2026
}
});

dc.subscribe('tracing:fastify.request.handler:start', ({ request, route }) => {
shouldn't we map it per-library? express, fastify, ...

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (8)
Makefile (1)

608-608: ⚠️ Potential issue | 🟠 Major

Keep the JS-only CI path in sync with the new integration prereqs.

Line 608 wires test-ci, but test-ci-js still doesn't depend on test-integrations-prereqs. On a clean worker, the new Express/Fastify route tests can hit their node_modules skip path instead of exercising the coverage.

🧩 Suggested fix
-test-ci-js: | clear-stalled ## Build and test JavaScript with building anything else.
+test-ci-js: | clear-stalled test-integrations-prereqs ## Build and test JavaScript with building anything else.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 608, The JS-only CI target test-ci-js is missing the new
integration prereq so update the Makefile to add test-integrations-prereqs to
test-ci-js's dependency list (mirror what you did for test-ci); locate the
test-ci-js target in the Makefile and append test-integrations-prereqs to its
prerequisites so the Express/Fastify route tests run real installs instead of
hitting the node_modules skip path.
lib/internal/nsolid_http_consts.js (1)

21-26: ⚠️ Potential issue | 🟠 Major

Map "2.0" to the HTTP/2 enum as well.

The HTTP/2 compatibility API reports httpVersion as "2.0". Without that key, callers that pass the raw version string fall back to kHttpVersionOther and lose protocol attribution.

🧩 Suggested fix
 const kHttpVersionMap = {
   '__proto__': null,
   '1.0': nsolid_consts.kHttpVersion10,
   '1.1': nsolid_consts.kHttpVersion11,
   '2': nsolid_consts.kHttpVersion2,
+  '2.0': nsolid_consts.kHttpVersion2,
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_http_consts.js` around lines 21 - 26, kHttpVersionMap is
missing the "2.0" key so raw httpVersion values of "2.0" map to
kHttpVersionOther; add a "2.0": nsolid_consts.kHttpVersion2 entry to
kHttpVersionMap (the object defined as kHttpVersionMap) so both "2" and "2.0"
resolve to the HTTP/2 enum value (nsolid_consts.kHttpVersion2).
lib/internal/nsolid_diag.js (1)

381-385: ⚠️ Potential issue | 🟠 Major

Compose the mounted prefix into Express http.route.

request.route.path is only the leaf pattern. With mounted routers, Line 384 drops request.baseUrl, so different handlers can collapse into the same route bucket.

🧩 Suggested fix
-    request[nsolid_route_s] = request.route.path;
+    request[nsolid_route_s] =
+      `${request.baseUrl || ''}${request.route.path}`;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_diag.js` around lines 381 - 385, The current dc.subscribe
handler stores only request.route.path into the nsolid_route_s symbol, losing
mounted router prefixes from request.baseUrl; update the handler in
dc.subscribe('http.server.response.finish', ...) so it composes the full route
by prepending request.baseUrl (if present) to request.route.path before
assigning request[nsolid_route_s], e.g., combine (request.baseUrl || '') with
request.route.path and normalize to avoid duplicate slashes so mounted routes
are preserved in the unified nsolid_route_s value.
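A sketch of the suggested composition, with a hypothetical `joinRoute` helper for the slash normalization the prompt mentions (the helper name is illustrative, not part of the PR):

```javascript
// Hypothetical helper: prepend the mounted prefix (request.baseUrl)
// to the leaf pattern (request.route.path) without doubling slashes.
function joinRoute(baseUrl, routePath) {
  const prefix = baseUrl || '';
  if (prefix.endsWith('/') && routePath.startsWith('/'))
    return prefix.slice(0, -1) + routePath;
  return prefix + routePath;
}
```

For example, a handler mounted at `/api` with leaf pattern `/users/:id` yields `/api/users/:id` instead of just `/users/:id`, so handlers on different routers no longer collapse into the same route bucket.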
src/nsolid/nsolid_api.cc (2)

1276-1281: ⚠️ Potential issue | 🔴 Critical

Synchronize gen_ptiles_interval across threads.

Line 1280 writes this global from EnvList::UpdateConfig(), while Lines 1885-1887 read it on thread_loop_ without any lock or atomic. That is a real C++ data race, and on 32-bit builds the value can tear.

Also applies to: 1882-1888

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.cc` around lines 1276 - 1281, The global
gen_ptiles_interval is written in EnvList::UpdateConfig() and read in
thread_loop_, causing a data race and potential tearing; make access thread-safe
by either changing gen_ptiles_interval to std::atomic<uint64_t> and using atomic
load/store where EnvList::UpdateConfig() assigns it and where thread_loop_ reads
it, or protect both the writer in EnvList::UpdateConfig() and the readers in
thread_loop_ with the same mutex; update all references to use the chosen
synchronization (atomic methods or lock/unlock) so both update and read sites
consistently use the same mechanism.

136-146: ⚠️ Potential issue | 🟠 Major

Propagate the client URL scheme end-to-end.

Line 144 only emits url.scheme in the server branch, and Line 2452 still hardcodes HttpUrlScheme::kOther for client buckets. HTTP and HTTPS client requests therefore collapse into the same http.client.request.duration series.

Also applies to: 2440-2456

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.cc` around lines 136 - 146, The client-side code is not
propagating the HTTP URL scheme: ensure you insert url.scheme for client
requests the same way as server requests by using
HttpUrlSchemeToString(http.url_scheme) and attrs.insert({"url.scheme",
std::string(scheme)}) when handling client metrics
(MetricsStream::Type::kHttpClient branch or the client-bucket creation path),
and when creating client buckets replace the hardcoded HttpUrlScheme::kOther
with the actual http.url_scheme so that http.client.request.duration series
distinguish HTTP vs HTTPS.
test/integrations/express/v5/test-http-route.mjs (2)

19-23: ⚠️ Potential issue | 🟡 Minor

Fail fast when the integration deps are missing.

Exiting 0 here turns a broken setup into a green run, so CI can silently stop exercising the Express v5 route coverage added in this PR.

Minimal fix
-// Skip test if dependencies not installed
-if (!existsSync(join(__dirname, 'node_modules'))) {
-  console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
-  process.exit(0);
-}
+assert.ok(
+  existsSync(join(__dirname, 'node_modules')),
+  'Missing integration deps. Run "make test-integrations-prereqs" first.',
+);

Based on learnings, the nsolid test suite expects prerequisites to be prepared ahead of time rather than silently skipped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/test-http-route.mjs` around lines 19 - 23, The
test currently checks for the presence of node_modules using
existsSync(join(__dirname, 'node_modules')) and calls process.exit(0) to skip
when missing; change this to fail fast by exiting with a non-zero status (e.g.,
process.exit(1)) or throw an Error instead so CI marks the run as failing when
integration dependencies are not installed; update the branch that uses
existsSync/join/__dirname to call process.exit(1) (or throw new Error('Missing
integration prerequisites: run "make test-integrations-prereqs"')) and ensure
the log message remains descriptive.

31-66: ⚠️ Potential issue | 🟡 Minor

Walk every OTLP resource/scope batch here.

This helper only inspects resourceMetrics[0].scopeMetrics[0], so it will miss the expected histogram when another resource or scope batch is emitted first.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/test-http-route.mjs` around lines 31 - 66, The
helper checkHttpRouteAttribute only inspects resourceMetrics[0].scopeMetrics[0],
which misses metrics in other resource/scope batches; update
checkHttpRouteAttribute to loop over all resourceMetrics and for each resource
loop over its scopeMetrics before iterating metrics, keeping existing logic that
filters metric.name === 'http.server.request.duration' and metric.data ===
'exponentialHistogram', then process metric.exponentialHistogram.dataPoints, use
getAttr to find 'http.route' and add unique route strings into foundRoutes (use
the same expectedRoutes and foundRoutes checks).
agents/grpc/src/grpc_agent.cc (1)

955-983: ⚠️ Potential issue | 🟠 Major

Drop queued samples for threads that are already gone.

env_msg_cb_() now erases the cache on deletion, but this loop will still export and reinsert any ExtMetricsStor that was already queued for the same thread_id. That resurrects dead threads in thr_metrics_cache_ and can leak stale histogram points into later metrics responses.

Minimal fix
   while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
     auto& stor = ext_stor.stor;
+    if (agent->env_metrics_map_.find(stor.thread_id) ==
+        agent->env_metrics_map_.end()) {
+      continue;
+    }
     otlp::fill_env_metrics(metrics, stor, false);
     uint64_t thread_id = stor.thread_id;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/grpc/src/grpc_agent.cc` around lines 955 - 983, The loop must drop
queued ExtMetricsStor samples for thread_ids that were deleted (to avoid
resurrecting them); after dequeuing ext_stor and extracting thread_id, check
whether agent->thr_metrics_cache_.contains(thread_id) (or find(thread_id) !=
end()) and if it is not present, continue (skip processing and do not insert
into thr_metrics_hist_prev_end_ts_ms_ or thr_metrics_cache_). Update the block
around agent->thr_metrics_msg_q_.dequeue/auto& stor to bail out early for
missing thread IDs so we never reinsert or update CachedThreadMetrics for dead
threads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/internal/nsolid_diag.js`:
- Around line 184-190: The negotiated protocol (h1 vs 2) captured in the
undici:client:connected handler is only stored on the span but not propagated to
requests, causing nsolidApi.pushClientBucket in undici:request:headers to always
use kHttpVersion11; update the undici:request:create path to copy the negotiated
protocol (e.g. set request.__nsolidNegotiatedHttpVersion or similar) onto the
request object when the client connected handler determines it, then read that
property in the undici:request:headers handler and pass the corresponding
protocol constant (instead of kHttpVersion11) into nsolidApi.pushClientBucket so
HTTP/2 requests are categorized correctly.

In `@test/integrations/fastify/v5/test-http-route.mjs`:
- Around line 110-125: The metrics wait currently never times out and only
closes servers after resolve; wrap the existing metricsPromise (the code that
listens on grpcServer.on('metrics'...) and resolves when foundRoutes.size ===
expectedRoutes.size) in a Promise.race against a timeout promise that rejects
after a sensible timeout, and ensure you await the raced promise inside a
try/catch/finally where the finally always calls fastify.close() and
grpcServer.close() (and handles errors from close). Apply the same pattern to
the second similar block around lines 145-151 so both metric waits will timeout
and always teardown the servers.
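The race-with-timeout plus try/finally teardown pattern can be sketched with generic names; `awaitWithTimeout` and the `cleanup` callback are placeholders for the test's `metricsPromise` and its server-close calls, not code from the PR:

```javascript
// Await a promise but fail fast after `ms`, always running cleanup.
async function awaitWithTimeout(promise, ms, cleanup) {
  let timer;
  const timeoutPromise = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeoutPromise]);
  } finally {
    clearTimeout(timer);
    await cleanup();  // e.g. close the Fastify server and grpcServer
  }
}
```

If the metrics never arrive, the timeout rejects and the test fails fast, while the finally block still tears down every server handle in both the success and failure paths.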

---

Duplicate comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-983: The loop must drop queued ExtMetricsStor samples for
thread_ids that were deleted (to avoid resurrecting them); after dequeuing
ext_stor and extracting thread_id, check whether
agent->thr_metrics_cache_.contains(thread_id) (or find(thread_id) != end()) and
if it is not present, continue (skip processing and do not insert into
thr_metrics_hist_prev_end_ts_ms_ or thr_metrics_cache_). Update the block around
agent->thr_metrics_msg_q_.dequeue/auto& stor to bail out early for missing
thread IDs so we never reinsert or update CachedThreadMetrics for dead threads.

In `@lib/internal/nsolid_diag.js`:
- Around line 381-385: The current dc.subscribe handler stores only
request.route.path into the nsolid_route_s symbol, losing mounted router
prefixes from request.baseUrl; update the handler in
dc.subscribe('http.server.response.finish', ...) so it composes the full route
by prepending request.baseUrl (if present) to request.route.path before
assigning request[nsolid_route_s], e.g., combine (request.baseUrl || '') with
request.route.path and normalize to avoid duplicate slashes so mounted routes
are preserved in the unified nsolid_route_s value.

In `@lib/internal/nsolid_http_consts.js`:
- Around line 21-26: kHttpVersionMap is missing the "2.0" key so raw httpVersion
values of "2.0" map to kHttpVersionOther; add a "2.0":
nsolid_consts.kHttpVersion2 entry to kHttpVersionMap (the object defined as
kHttpVersionMap) so both "2" and "2.0" resolve to the HTTP/2 enum value
(nsolid_consts.kHttpVersion2).

In `@Makefile`:
- Line 608: The JS-only CI target test-ci-js is missing the new integration
prereq so update the Makefile to add test-integrations-prereqs to test-ci-js's
dependency list (mirror what you did for test-ci); locate the test-ci-js target
in the Makefile and append test-integrations-prereqs to its prerequisites so the
Express/Fastify route tests run real installs instead of hitting the
node_modules skip path.

In `@src/nsolid/nsolid_api.cc`:
- Around line 1276-1281: The global gen_ptiles_interval is written in
EnvList::UpdateConfig() and read in thread_loop_, causing a data race and
potential tearing; make access thread-safe by either changing
gen_ptiles_interval to std::atomic<uint64_t> and using atomic load/store where
EnvList::UpdateConfig() assigns it and where thread_loop_ reads it, or protect
both the writer in EnvList::UpdateConfig() and the readers in thread_loop_ with
the same mutex; update all references to use the chosen synchronization (atomic
methods or lock/unlock) so both update and read sites consistently use the same
mechanism.
- Around line 136-146: The client-side code is not propagating the HTTP URL
scheme: ensure you insert url.scheme for client requests the same way as server
requests by using HttpUrlSchemeToString(http.url_scheme) and
attrs.insert({"url.scheme", std::string(scheme)}) when handling client metrics
(MetricsStream::Type::kHttpClient branch or the client-bucket creation path),
and when creating client buckets replace the hardcoded HttpUrlScheme::kOther
with the actual http.url_scheme so that http.client.request.duration series
distinguish HTTP vs HTTPS.

In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 19-23: The test currently checks for the presence of node_modules
using existsSync(join(__dirname, 'node_modules')) and calls process.exit(0) to
skip when missing; change this to fail fast by exiting with a non-zero status
(e.g., process.exit(1)) or throw an Error instead so CI marks the run as failing
when integration dependencies are not installed; update the branch that uses
existsSync/join/__dirname to call process.exit(1) (or throw new Error('Missing
integration prerequisites: run "make test-integrations-prereqs"')) and ensure
the log message remains descriptive.
- Around line 31-66: The helper checkHttpRouteAttribute only inspects
resourceMetrics[0].scopeMetrics[0], which misses metrics in other resource/scope
batches; update checkHttpRouteAttribute to loop over all resourceMetrics and for
each resource loop over its scopeMetrics before iterating metrics, keeping
existing logic that filters metric.name === 'http.server.request.duration' and
metric.data === 'exponentialHistogram', then process
metric.exponentialHistogram.dataPoints, use getAttr to find 'http.route' and add
unique route strings into foundRoutes (use the same expectedRoutes and
foundRoutes checks).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f0e376c6-6a33-4adc-ac3f-db1b928c37c0

📥 Commits

Reviewing files that changed from the base of the PR and between 4759af6 and 760c373.

📒 Files selected for processing (30)
  • .gitignore
  • Makefile
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/grpc_agent.h
  • agents/otlp/src/otlp_common.cc
  • agents/otlp/src/otlp_common.h
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • lib/_http_client.js
  • lib/_http_server.js
  • lib/internal/nsolid_diag.js
  • lib/internal/nsolid_http_consts.js
  • src/nsolid.h
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • src/nsolid/nsolid_bindings.h
  • src/nsolid/nsolid_metrics_types.h
  • src/nsolid/nsolid_trace.h
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/addons/nsolid-eventloop-cmd/binding.gyp
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/addons/nsolid-statsdagent/binding.gyp
  • test/agents/test-grpc-http-histograms.mjs
  • test/integrations/express/v4/package.json
  • test/integrations/express/v4/test-http-route.mjs
  • test/integrations/express/v5/package.json
  • test/integrations/express/v5/test-http-route.mjs
  • test/integrations/fastify/v5/package.json
  • test/integrations/fastify/v5/test-http-route.mjs
  • test/integrations/testcfg.py
  • test/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (13)
  • test/integrations/fastify/v5/package.json
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/integrations/express/v4/test-http-route.mjs
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • lib/_http_server.js
  • test/agents/test-grpc-http-histograms.mjs
  • .gitignore
  • src/nsolid/nsolid_bindings.h
  • agents/otlp/src/otlp_common.h
  • test/integrations/testcfg.py
  • src/nsolid/nsolid_metrics_types.h
  • test/parallel/test-bootstrap-modules.js

Comment on lines +110 to +125
const metricsPromise = new Promise((resolve, reject) => {
grpcServer.on('metrics', (data) => {
const found = checkHttpRouteAttribute(data, expectedRoutes);
for (const route of found) {
if (!foundRoutes.has(route)) {
foundRoutes.add(route);
console.log(`Validated route: ${route} (${foundRoutes.size}/${expectedRoutes.size})`);
}
}

if (foundRoutes.size === expectedRoutes.size) {
console.log('All routes validated!');
resolve();
}
});
});

⚠️ Potential issue | 🟠 Major

Add a timeout and guaranteed teardown.

metricsPromise never rejects or times out, and fastify.close() / grpcServer.close() only run after a successful resolve. If metric export regresses, this test can hang the job indefinitely and leave both servers open. Please race the wait against a timeout and move teardown into a finally block.

Also applies to: 145-151

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 110 - 125, The
metrics wait currently never times out and only closes servers after resolve;
wrap the existing metricsPromise (the code that listens on
grpcServer.on('metrics'...) and resolves when foundRoutes.size ===
expectedRoutes.size) in a Promise.race against a timeout promise that rejects
after a sensible timeout, and ensure you await the raced promise inside a
try/catch/finally where the finally always calls fastify.close() and
grpcServer.close() (and handles errors from close). Apply the same pattern to
the second similar block around lines 145-151 so both metric waits will time
out and always tear down the servers.
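The timeout-plus-guaranteed-teardown pattern that prompt asks for can be sketched as follows; `withTimeout` and `runWithTeardown` are illustrative names, not code from the test file:

```javascript
'use strict';
// Sketch: race a wait against a timeout and run teardown in finally.
function withTimeout(promise, ms, message = 'timed out') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Always clear the timer so a resolved wait does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function runWithTeardown(metricsPromise, closers) {
  try {
    await withTimeout(metricsPromise, 30_000, 'metrics never arrived');
  } finally {
    // Teardown runs whether the wait resolved, rejected, or timed out.
    for (const close of closers) {
      try {
        await close();
      } catch {
        // Ignore errors from closing already-torn-down servers.
      }
    }
  }
}
```

In the test, `closers` would be something like `[() => fastify.close(), () => grpcServer.close()]`, so both servers close even when metric export regresses.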

@santigimeno santigimeno force-pushed the santi/exp_histogram_metrics branch from 760c373 to 988f7a6 Compare March 9, 2026 20:46
Introduce base2 exponential histograms for HTTP client and
server latency metrics in the NSolid core and wire them into
the OTLP exporter. Maintain per-attribute aggregation using
OTLP attributes hash maps and expose harvested histogram
points for export. Update test addon binding.gyp files so
they link correctly against the new opentelemetry-cpp
dependencies pulled in by nsolid_api.cc.
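The commit above introduces base2 exponential histograms; the bucketing rule they rely on can be sketched with the generic OTLP mapping (this is the spec-level formula, not the nsolid/opentelemetry-cpp implementation):

```javascript
'use strict';
// Generic OTLP base2 exponential-histogram bucket mapping: at scale s the
// bucket base is 2^(2^-s) and bucket i covers (base^i, base^(i+1)].
function bucketIndex(value, scale) {
  if (value <= 0) throw new RangeError('positive values only');
  // log_base(value) = log2(value) * 2^scale; round up and shift by one so
  // exact powers of the base land in the lower bucket.
  return Math.ceil(Math.log2(value) * 2 ** scale) - 1;
}
```

Higher scales subdivide each power-of-two range into finer buckets, which is what lets a single integer `scale` trade resolution for series size.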
Extend internal NSolid diagnostics to extract and record the
http.route attribute for routes handled by Express and Fastify.
Normalize framework route patterns and attach them to the
datapoints sent to the native metrics layer so HTTP latency
histograms can be broken down by logical route.
Add a dedicated test that generates HTTP client and server
traffic, then asserts on the exported histogram data over
gRPC/OTLP. Validate bucket counts and attributes to ensure
the new HTTP latency histogram pipeline is exercised end to
end.
Extend the integrations test harness to discover nested
test-*.mjs files and add Express v4/v5 and Fastify v5
scenarios. Update the Makefile to wire these integrations
into the existing test targets. For each framework, exercise
multiple routes and verify that HTTP server latency
histograms include the expected http.route patterns.
@santigimeno santigimeno force-pushed the santi/exp_histogram_metrics branch from 988f7a6 to 00e8e75 Compare March 9, 2026 20:57

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (2)
src/nsolid/nsolid_api.h (1)

414-417: ⚠️ Potential issue | 🟠 Major

Raise the route histogram cardinality limit.

AttributesHashMap(100) is a hard series cap, not just preallocation. With {method, route, scheme, host, port} keys, 100 distinct combinations is easy to hit; once that happens, new series fold into the overflow bucket and the route breakdown this PR adds degrades quickly. Please make this configurable or raise it substantially.

Expected: the constructor argument maps to the cardinality/max-size limit, and the search finds both HTTP maps pinned at 100.

🔎 Verification script
#!/bin/bash
set -euo pipefail

mapfile -t files < <(fd 'attributes_hashmap\.(h|cc)$')
if [ "${`#files`[@]}" -eq 0 ]; then
  echo "attributes_hashmap.{h,cc} not found" >&2
  exit 1
fi

echo "== AttributesHashMap definition =="
printf '%s\n' "${files[@]}" | xargs rg -n -C4 'class AttributesHashMap|AttributesHashMap\s*\('

echo
echo "== Overflow / cardinality handling =="
printf '%s\n' "${files[@]}" | xargs rg -n -C4 'overflow|max_size_|cardinality|GetOrSetDefault'

echo
echo "== NSolid hard-coded limits =="
rg -n -C2 'AttributesHashMap\s*\(\s*100\s*\)' src
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.h` around lines 414 - 417, The hard cap of 100 used
when constructing the two HTTP attribute maps (http_client_hashmap_ and
http_server_hashmap_) makes AttributesHashMap(100) act as a strict cardinality
limit and will cause new {method,route,scheme,host,port} series to fold into the
overflow bucket; update the code that constructs http_client_hashmap_ and
http_server_hashmap_ to use a much larger default (or read a configurable limit)
instead of the literal 100 so the constructor argument reflects the intended
max-size/cardinality, e.g. replace the fixed 100 with a constant or config value
and wire that into the AttributesHashMap(...) calls so both maps no longer use
the hard-coded 100 limit.
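The cardinality concern can be illustrated with a toy model of a capped series map (a simplified stand-in, not the opentelemetry-cpp `AttributesHashMap`):

```javascript
'use strict';
// Toy model of a cardinality-capped attributes map with an overflow bucket,
// showing why a low cap folds new series together.
class CappedSeriesMap {
  constructor(maxSeries) {
    this.maxSeries = maxSeries;
    this.series = new Map();
    this.overflowCount = 0;
  }

  record(attrKey) {
    if (this.series.has(attrKey)) {
      this.series.set(attrKey, this.series.get(attrKey) + 1);
    } else if (this.series.size < this.maxSeries) {
      this.series.set(attrKey, 1);
    } else {
      // New attribute combinations beyond the cap lose their identity.
      this.overflowCount++;
    }
  }
}
```

With `{method, route, scheme, host, port}` keys, even a modest route count multiplied by a few methods and schemes exceeds 100 combinations, at which point every new series lands in the overflow bucket and the per-route breakdown disappears.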
agents/grpc/src/grpc_agent.cc (1)

955-983: ⚠️ Potential issue | 🟠 Major

Drop queued samples for deleted threads.

env_msg_cb_() erases the dead thread from the live maps, but this loop can immediately export and reinsert it if an ExtMetricsStor harvested just before deletion is still queued. Guard the export/cache path with a live-thread check.

🛠️ Minimal fix
   while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
     auto& stor = ext_stor.stor;
+    uint64_t thread_id = stor.thread_id;
+    if (agent->env_metrics_map_.find(thread_id) ==
+        agent->env_metrics_map_.end()) {
+      continue;
+    }
+
     otlp::fill_env_metrics(metrics, stor, false);
-    uint64_t thread_id = stor.thread_id;
     uint64_t start_ts_ms = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/grpc/src/grpc_agent.cc` around lines 955 - 983, The loop exporting
ExtMetricsStor from agent->thr_metrics_msg_q_.dequeue must skip any queued
sample whose thread was deleted; before calling otlp::fill_env_metrics,
otlp::fill_http_histograms, and before inserting into
agent->thr_metrics_hist_prev_end_ts_ms_ or agent->thr_metrics_cache_, check that
the thread_id is still live (e.g., verify agent->IsThreadLive(thread_id) or
agent->thr_live_map_.contains(thread_id)); if the thread is not live, continue
to the next dequeued item and do not reinsert or cache it.
🧹 Nitpick comments (1)
src/nsolid.h (1)

853-889: Pin the enum values explicitly.

The FastPush*Bucket implementations in src/nsolid/nsolid_api.cc cast raw uint32_t codes from JS into these enums. Right now that JS/native contract depends on implicit ordinal ordering, so a future reorder or insertion here will silently corrupt the exported attributes. Assign explicit numeric values that match the JS constants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid.h` around lines 853 - 889, The enums HttpMethod,
HttpProtocolVersion, and HttpUrlScheme must have explicit numeric values to
prevent JS/native ordinal coupling; update each enum to assign explicit integer
values for every enumerator (e.g., kGet=0, kHead=1, ... and similarly for
HttpProtocolVersion and HttpUrlScheme) so the values match the JS constants used
by the FastPush*Bucket code in src/nsolid/nsolid_api.cc; ensure no gaps or
reorders are left that would change the numeric mapping expected by the JS side
and keep HttpDatapointAttrs and DatapointAttrs unchanged aside from relying on
the pinned enum values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/internal/nsolid_diag.js`:
- Around line 156-171: The socket version is being stored as raw strings ('1.1'
/ '2') which breaks histogram bucket attribution—update the undici handlers to
store the canonical kHttpVersion* constants instead: in the
'undici:client:connected' subscriber (where socketVersionMap is set) map
connectParams.version to kHttpVersion11 or kHttpVersion2 (use the existing
constant names) and store those on socketVersionMap; in the
'undici:client:sendHeaders' subscriber, set request[nsolid_net_prot_s] to the
same kHttpVersion* constant retrieved from socketVersionMap so downstream
pushClientBucket() and other histogram paths receive the canonical constants
rather than raw strings.
- Around line 3-5: The code imports Number from primordials but never uses it
and later constructs a regular WeakMap; change the import to pull SafeWeakMap
from primordials instead of Number (remove Number) and replace the instantiation
new WeakMap() with new SafeWeakMap() (referencing the SafeWeakMap symbol and the
place where WeakMap is constructed) so the module uses the primordials-safe
WeakMap implementation.
- Around line 392-401: The handler registered in
dc.subscribe('http.server.response.finish') assumes request.route.path is a
string and uses startsWith/slice; guard against non-string Express routes by
checking typeof routePath === 'string' (where routePath = request.route.path)
before using string helpers, and if it's not a string skip composing fullPath or
set request[nsolid_route_s] to a safe fallback; ensure you still reference
baseUrl and assign request[nsolid_route_s] only when routePath is a string to
avoid TypeError.

In `@src/nsolid/nsolid_api.cc`:
- Around line 3565-3569: The macro NSOLID_HTTP_VERSION_CONSTS has a line for
V(kHttpVersion20, MetricsStream::HttpProtocolVersion::k2) that exceeds 80
characters; split that macro entry into two physical lines so the long qualified
enum name is wrapped (e.g., break after the comma with a backslash so the entry
becomes V(kHttpVersion20, \n    MetricsStream::HttpProtocolVersion::k2) ) to
keep the line length ≤80 while preserving the same token sequence and
identifiers.

In `@test/agents/test-grpc-http-histograms.mjs`:
- Around line 200-258: runTest currently never cleans up if histograms never
arrive; wrap the metrics-wait logic in a timeout and ensure teardown always runs
in a finally block: start a Promise.race between the existing success path and a
timeout that rejects/flags failure, and move clearInterval(trafficTimer),
child.shutdown(...), and grpcServer.close() into a finally so they run whether
the test succeeds, times out, or errors; modify references in the runTest
function (GRPCServer, TestClient, trafficTimer, mustCallAtLeast handler, and the
code that calls checkHistogramMetrics/collectHistogramWindows) so cleanup always
executes and the Promise resolves or rejects when the timeout fires.
- Around line 124-128: The helpers currently only inspect
resourceMetrics[0].scopeMetrics[0] and metrics[0], which misses histograms in
later resource/scope batches; update both histogram helper functions in this
file to iterate over all resourceMetrics and for each resource iterate all
scopeMetrics, then iterate each scope's metrics array to find and process the
histogram metrics (use the existing variable names resourceMetrics,
scopeMetrics, metrics to locate the code), aggregating/returning the first
matching histogram or the expected collection instead of assuming index 0; apply
the same change to the other helper (the block around lines 167-175) so both
walk every resource/scope/metric batch.

In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 134-183: The test currently awaits metricsPromise and only runs
teardown after it resolves, which can hang; modify the test so the wait for
metrics (metricsPromise) is raced against a timeout Promise (e.g., Promise.race
with a timer that rejects or resolves after a configurable ms) and ensure
cleanup always runs by moving connection teardown and server/grpcServer close
logic into a finally block that executes regardless of whether metricsPromise
resolved or timed out; reference the existing symbols metricsPromise,
grpcServer, server, connections and the fetch request sequence so the timeout
covers the same wait and the finally performs conn.destroy() for each
connections and calls server.close()/grpcServer.close().

In `@test/integrations/fastify/v5/test-http-route.mjs`:
- Around line 19-23: Replace the silent successful exit when integration deps
are missing: in the block that checks existsSync(join(__dirname,
'node_modules')) (the pre-check in test-http-route.mjs), stop calling
process.exit(0) and instead fail the test by throwing an Error or using an
assertion (e.g., throw new Error or assert.fail) with a clear message like
"SKIP: node_modules not found. Run 'make test-integrations-prereqs' first." so
CI surfaces the missing prerequisites as a failure rather than a silent pass.

---

Duplicate comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-983: The loop exporting ExtMetricsStor from
agent->thr_metrics_msg_q_.dequeue must skip any queued sample whose thread was
deleted; before calling otlp::fill_env_metrics, otlp::fill_http_histograms, and
before inserting into agent->thr_metrics_hist_prev_end_ts_ms_ or
agent->thr_metrics_cache_, check that the thread_id is still live (e.g., verify
agent->IsThreadLive(thread_id) or agent->thr_live_map_.contains(thread_id)); if
the thread is not live, continue to the next dequeued item and do not
reinsert or cache it.

In `@src/nsolid/nsolid_api.h`:
- Around line 414-417: The hard cap of 100 used when constructing the two HTTP
attribute maps (http_client_hashmap_ and http_server_hashmap_) makes
AttributesHashMap(100) act as a strict cardinality limit and will cause new
{method,route,scheme,host,port} series to fold into the overflow bucket; update
the code that constructs http_client_hashmap_ and http_server_hashmap_ to use a
much larger default (or read a configurable limit) instead of the literal 100 so
the constructor argument reflects the intended max-size/cardinality, e.g.
replace the fixed 100 with a constant or config value and wire that into the
AttributesHashMap(...) calls so both maps no longer use the hard-coded 100
limit.

---

Nitpick comments:
In `@src/nsolid.h`:
- Around line 853-889: The enums HttpMethod, HttpProtocolVersion, and
HttpUrlScheme must have explicit numeric values to prevent JS/native ordinal
coupling; update each enum to assign explicit integer values for every
enumerator (e.g., kGet=0, kHead=1, ... and similarly for HttpProtocolVersion and
HttpUrlScheme) so the values match the JS constants used by the FastPush*Bucket
code in src/nsolid/nsolid_api.cc; ensure no gaps or reorders are left that would
change the numeric mapping expected by the JS side and keep HttpDatapointAttrs
and DatapointAttrs unchanged aside from relying on the pinned enum values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9a5b38f9-22a3-48a4-ac37-821ad46773ee

📥 Commits

Reviewing files that changed from the base of the PR and between 760c373 and 988f7a6.

📒 Files selected for processing (30)
  • .gitignore
  • Makefile
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/grpc_agent.h
  • agents/otlp/src/otlp_common.cc
  • agents/otlp/src/otlp_common.h
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • lib/_http_client.js
  • lib/_http_server.js
  • lib/internal/nsolid_diag.js
  • lib/internal/nsolid_http_consts.js
  • src/nsolid.h
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • src/nsolid/nsolid_bindings.h
  • src/nsolid/nsolid_metrics_types.h
  • src/nsolid/nsolid_trace.h
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/addons/nsolid-eventloop-cmd/binding.gyp
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/addons/nsolid-statsdagent/binding.gyp
  • test/agents/test-grpc-http-histograms.mjs
  • test/integrations/express/v4/package.json
  • test/integrations/express/v4/test-http-route.mjs
  • test/integrations/express/v5/package.json
  • test/integrations/express/v5/test-http-route.mjs
  • test/integrations/fastify/v5/package.json
  • test/integrations/fastify/v5/test-http-route.mjs
  • test/integrations/testcfg.py
  • test/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (12)
  • test/integrations/fastify/v5/package.json
  • test/integrations/testcfg.py
  • lib/_http_client.js
  • test/addons/nsolid-eventloop-cmd/binding.gyp
  • lib/_http_server.js
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/addons/nsolid-statsdagent/binding.gyp
  • test/integrations/express/v4/package.json
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • test/integrations/express/v4/test-http-route.mjs
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/integrations/express/v5/package.json

Comment on lines +3 to +5
const {
Number,
} = primordials;

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

head -70 lib/internal/nsolid_diag.js

Repository: nodesource/nsolid

Length of output: 1569


🏁 Script executed:

# Check if SafeWeakMap exists in the codebase
rg -r 'SafeWeakMap' --type js -A 2 | head -50

Repository: nodesource/nsolid

Length of output: 457


🏁 Script executed:

# Search for how primordials are typically used in the codebase
rg 'const \{[^}]*\} = primordials' --type js | head -20

Repository: nodesource/nsolid

Length of output: 2294


🏁 Script executed:

# Check for WeakMap usage patterns
rg 'new WeakMap' --type js | head -20

Repository: nodesource/nsolid

Length of output: 1788


🏁 Script executed:

# Check if SafeWeakMap exists and how it's used
rg 'SafeWeakMap' --type js -B 2 -A 2

Repository: nodesource/nsolid

Length of output: 8359


🏁 Script executed:

# Check lines 59-60 specifically in the file
sed -n '55,65p' lib/internal/nsolid_diag.js

Repository: nodesource/nsolid

Length of output: 424


🏁 Script executed:

# Look for what primordials exports, especially Number
fd primordials.js --type js | head -5

Repository: nodesource/nsolid

Length of output: 230


🏁 Script executed:

# Check eslint rules related to primordials
fd -e js -e json . | xargs grep -l "prefer-primordials\|alphabetize-primordials" | grep -E "(eslint|rules)" | head -10

Repository: nodesource/nsolid

Length of output: 781


🏁 Script executed:

# Check if there's a SafeWeakMap export or definition
rg 'SafeWeakMap' --type js

Repository: nodesource/nsolid

Length of output: 1983


🏁 Script executed:

# Check what's actually at lines 59-60
sed -n '59,60p' lib/internal/nsolid_diag.js

Repository: nodesource/nsolid

Length of output: 172


🏁 Script executed:

# Look at primordials export to see if Number is exported
cat lib/internal/per_context/primordials.js | grep -E "(Number|exports)" | head -30

Repository: nodesource/nsolid

Length of output: 72


🏁 Script executed:

# Search for Number in primordials more broadly
rg 'Number' lib/internal/per_context/primordials.js -B 1 -A 1

Repository: nodesource/nsolid

Length of output: 93


🏁 Script executed:

# Check eslint rules configuration
find . -name ".eslintrc*" -o -name "eslint.config.*" | head -5

Repository: nodesource/nsolid

Length of output: 267


🏁 Script executed:

# Check the eslint rules file to see if Number primordial usage is flagged
cat ./agents/.eslintrc.yaml | head -100

Repository: nodesource/nsolid

Length of output: 6418


🏁 Script executed:

# Check if there's a specific rule that flags Number specifically
rg 'Number' tools/eslint-rules/prefer-primordials.js -B 3 -A 3

Repository: nodesource/nsolid

Length of output: 43


🏁 Script executed:

# Check the eslint rules directory
find . -path ./deps -prune -o -path ./test -prune -o -name "*prefer-primordials*" -type f -print

Repository: nodesource/nsolid

Length of output: 103


🏁 Script executed:

# Examine the prefer-primordials rule to understand what it flags
cat tools/eslint-rules/prefer-primordials.js | head -150

Repository: nodesource/nsolid

Length of output: 4109


🏁 Script executed:

# Search for specific handling of Number in the rule
rg 'Number' tools/eslint-rules/prefer-primordials.js -B 2 -A 2

Repository: nodesource/nsolid

Length of output: 43


🏁 Script executed:

# Check if nsolid_diag.js file has any lint exemptions
head -20 lib/internal/nsolid_diag.js | grep -E "(eslint|lint)"

Repository: nodesource/nsolid

Length of output: 43


🏁 Script executed:

# Search for how other files use Number from primordials
rg 'const \{[^}]*Number[^}]*\} = primordials' --type js -A 1 -B 1

Repository: nodesource/nsolid

Length of output: 1136


🏁 Script executed:

# Check if Number specifically is flagged in any test cases
rg 'Number.*primordials\|primordials.*Number' test/parallel/test-eslint-prefer-primordials.js -B 2 -A 2

Repository: nodesource/nsolid

Length of output: 43


🏁 Script executed:

# Look for what the prefer-primordials rule config is for this project
rg 'prefer-primordials' .eslintrc.js -B 5 -A 5

Repository: nodesource/nsolid

Length of output: 113


🏁 Script executed:

# Check the root directory for eslint config
ls -la | grep eslint

Repository: nodesource/nsolid

Length of output: 130


🏁 Script executed:

# Find the actual eslint configuration
find . -maxdepth 2 -name "*eslint*" -o -name ".eslintignore"

Repository: nodesource/nsolid

Length of output: 275


🏁 Script executed:

# Check the eslint.config.mjs for prefer-primordials rules
cat eslint.config.mjs | head -200

Repository: nodesource/nsolid

Length of output: 7187


🏁 Script executed:

# Look more carefully at the test case to understand the Number issue
cat test/parallel/test-eslint-prefer-primordials.js | grep -A 15 "const { Number } = primordials"

Repository: nodesource/nsolid

Length of output: 560


🏁 Script executed:

# Check if nsolid_diag.js is even subject to eslint
cat lib/eslint.config_partial.mjs | head -50

Repository: nodesource/nsolid

Length of output: 2112


🏁 Script executed:

# Search for how Number is actually used in nsolid_diag.js
rg 'Number\.' lib/internal/nsolid_diag.js -B 2 -A 2

Repository: nodesource/nsolid

Length of output: 43


🏁 Script executed:

# Get the full context around lines 3-10 to see the actual usage
sed -n '1,100p' lib/internal/nsolid_diag.js | grep -A 50 "const {"

Repository: nodesource/nsolid

Length of output: 2762


Replace new WeakMap() with new SafeWeakMap() at line 56.

The WeakMap constructor should use the safe version from primordials, consistent with the codebase pattern. Separately, the Number import is unused and can be removed.

🧰 Tools
🪛 Biome (2.4.6)

[error] 4-4: Do not shadow the global "Number" property.

(lint/suspicious/noShadowRestrictedNames)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_diag.js` around lines 3 - 5, The code imports Number from
primordials but never uses it and later constructs a regular WeakMap; change the
import to pull SafeWeakMap from primordials instead of Number (remove Number)
and replace the instantiation new WeakMap() with new SafeWeakMap() (referencing
the SafeWeakMap symbol and the place where WeakMap is constructed) so the module
uses the primordials-safe WeakMap implementation.

Comment on lines +156 to +171
dc.subscribe('undici:client:connected', ({ connectParams, socket }) => {
const api = getApi();
const span = api.trace.getSpan(api.context.active());
const version = connectParams.version === 'h1' ? '1.1' : '2';
// Store version on socket for later retrieval in sendHeaders
socketVersionMap.set(socket, version);
if (span && span.type === kSpanHttpClient) {
const version = connectParams.version === 'h1' ? '1.1' : '2';
span._pushSpanDataString(kSpanHttpProtocolVersion, version);
return api.context.active();
}
});

dc.subscribe('undici:client:sendHeaders', ({ request, socket }) => {
const version = socketVersionMap.get(socket);
request[nsolid_net_prot_s] = version;
});

⚠️ Potential issue | 🟠 Major

Don't cache span version strings in the histogram path.

This stores '1.1' / '2' on the socket and then passes that value into pushClientBucket(), while the other bucket paths use kHttpVersion* constants. That makes Undici protocol attribution depend on coercion and is also why kHttpVersion11 is currently dead.

Suggested fix
 dc.subscribe('undici:client:connected', ({ connectParams, socket }) => {
   const api = getApi();
   const span = api.trace.getSpan(api.context.active());
   const version = connectParams.version === 'h1' ? '1.1' : '2';
+  const versionEnum = connectParams.version === 'h1' ?
+    kHttpVersion11 :
+    kHttpVersion2;
   // Store version on socket for later retrieval in sendHeaders
-  socketVersionMap.set(socket, version);
+  socketVersionMap.set(socket, versionEnum);
   if (span && span.type === kSpanHttpClient) {
     span._pushSpanDataString(kSpanHttpProtocolVersion, version);
     return api.context.active();
   }
 });

Also applies to: 195-201

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_diag.js` around lines 156 - 171, The socket version is
being stored as raw strings ('1.1' / '2') which breaks histogram bucket
attribution—update the undici handlers to store the canonical kHttpVersion*
constants instead: in the 'undici:client:connected' subscriber (where
socketVersionMap is set) map connectParams.version to kHttpVersion11 or
kHttpVersion2 (use the existing constant names) and store those on
socketVersionMap; in the 'undici:client:sendHeaders' subscriber, set
request[nsolid_net_prot_s] to the same kHttpVersion* constant retrieved from
socketVersionMap so downstream pushClientBucket() and other histogram paths
receive the canonical constants rather than raw strings.
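The mapping that prompt describes can be sketched as a single conversion at connect time; the numeric values below are placeholders, since the real codes live in lib/internal/nsolid_http_consts.js:

```javascript
'use strict';
// Sketch of the suggested fix: map undici's connect version to the canonical
// kHttpVersion* constants once, then reuse that value in sendHeaders.
// Placeholder values -- not the actual nsolid_http_consts.js codes.
const kHttpVersion11 = 1;
const kHttpVersion2 = 2;

function protocolVersionEnum(connectVersion) {
  // undici reports 'h1' for HTTP/1.1 and 'h2' for HTTP/2.
  return connectVersion === 'h1' ? kHttpVersion11 : kHttpVersion2;
}
```

Storing the enum (rather than `'1.1'`/`'2'` strings) on `socketVersionMap` means every downstream histogram path receives the same canonical constant the other bucket paths already use.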

Comment on lines +392 to +401
dc.subscribe('http.server.response.finish', ({ request, response }) => {
if (request?.route?.path && !request[nsolid_route_s]) {
// Compose full route path including mounted router prefixes
const baseUrl = request.baseUrl || '';
const routePath = request.route.path;
// Avoid double slashes when concatenating (e.g., baseUrl ends with / and routePath starts with /)
const fullPath = baseUrl.endsWith('/') && routePath.startsWith('/')
? baseUrl + routePath.slice(1)
: baseUrl + routePath;
request[nsolid_route_s] = fullPath;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code in the file
cat -n lib/internal/nsolid_diag.js | sed -n '385,410p'

Repository: nodesource/nsolid

Length of output: 1033


🌐 Web query:

Express.js req.route.path type regex array

💡 Result:

  • In Express, req.route is the currently matched route object, and its path is the original route “pattern” (e.g. '/user/:id?') — i.e. req.route.path is the route path you registered, not the URL path (req.path). (express.js.cn)
  • If you need the compiled regular expression Express matched with, use req.route.regexp (it’s shown alongside path in the req.route object). (express.js.cn)
  • Express route “paths” (what you pass as the first argument to app.get, router.get, etc.) can be strings or regular expressions. (expressjs.com)
  • Express also allows that “path” argument to be an array of strings/regexes (string | RegExp | Array<string | RegExp> in the TypeScript defs). (jsdocs.io)
    • In practice, once a request is matched, req.route.path refers to the single matched route’s path pattern (not an array). (express.js.cn)

Citations:


🌐 Web query:

Express req.route.path RegExp type when route registered regex

💡 Result:

In Express, req.route.path is the original “path” you registered the route with. Since Express route paths can be regular expressions, if you registered the route with a regex, then req.route.path will be a RegExp object (so req.route.path instanceof RegExp === true). [1], [3]

Type-wise (TypeScript), Express route “path” params are defined as string | RegExp | Array<string | RegExp>, which matches that behavior. [2]

Sources
[1] (expressjs.com)
[2] (app.unpkg.com)
[3] (gist.github.com)

Citations:


🏁 Script executed:

# Let's also check if there are any type definitions or tests in the nsolid repo
# that might clarify this expectation
rg -t js "route\.path" lib/internal/nsolid_diag.js -B 3 -A 3

Repository: nodesource/nsolid

Length of output: 706


Guard against non-string Express routes before using string helpers.

req.route.path can be a RegExp object if the route was registered with a regex pattern. Calling .startsWith() and .slice() on a RegExp will throw a TypeError. Check that routePath is a string before calling these methods, or skip non-string routes entirely.

🧰 Tools
🪛 GitHub Check: lint-js-and-md

[failure] 400-400:
':' should be placed at the end of the line


[failure] 399-399:
'?' should be placed at the end of the line

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_diag.js` around lines 392 - 401, The handler registered
in dc.subscribe('http.server.response.finish') assumes request.route.path is a
string and uses startsWith/slice; guard against non-string Express routes by
checking typeof routePath === 'string' (where routePath = request.route.path)
before using string helpers, and if it's not a string skip composing fullPath or
set request[nsolid_route_s] to a safe fallback; ensure you still reference
baseUrl and assign request[nsolid_route_s] only when routePath is a string to
avoid TypeError.

Comment on lines +3565 to +3569
#define NSOLID_HTTP_VERSION_CONSTS(V) \
V(kHttpVersion10, MetricsStream::HttpProtocolVersion::k10) \
V(kHttpVersion11, MetricsStream::HttpProtocolVersion::k11) \
V(kHttpVersion20, MetricsStream::HttpProtocolVersion::k2) \
V(kHttpVersionOther, MetricsStream::HttpProtocolVersion::kOther)
⚠️ Potential issue | 🟡 Minor

Fix line length to pass CI.

Line 3568 exceeds 80 characters, causing the make lint-cpp pipeline failure.

🔧 Proposed fix
 #define NSOLID_HTTP_VERSION_CONSTS(V)                                         \
   V(kHttpVersion10, MetricsStream::HttpProtocolVersion::k10)                   \
   V(kHttpVersion11, MetricsStream::HttpProtocolVersion::k11)                   \
-  V(kHttpVersion20, MetricsStream::HttpProtocolVersion::k2)                     \
+  V(kHttpVersion20, MetricsStream::HttpProtocolVersion::k2)                    \
   V(kHttpVersionOther, MetricsStream::HttpProtocolVersion::kOther)
🧰 Tools
🪛 GitHub Actions: Linters

[error] 3568-3568: make lint-cpp failed: cpplint reported line_length error in src/nsolid/nsolid_api.cc at line 3568 (Lines should be <= 80 characters long [whitespace/line_length]).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.cc` around lines 3565 - 3569, The macro
NSOLID_HTTP_VERSION_CONSTS has a line for V(kHttpVersion20,
MetricsStream::HttpProtocolVersion::k2) that exceeds 80 characters; split that
macro entry into two physical lines so the long qualified enum name is wrapped
(e.g., break after the comma with a backslash so the entry becomes
V(kHttpVersion20, \n    MetricsStream::HttpProtocolVersion::k2) ) to keep the
line length ≤80 while preserving the same token sequence and identifiers.

Comment on lines +124 to +128
const scopeMetrics = resourceMetrics[0].scopeMetrics;
if (!scopeMetrics || scopeMetrics.length === 0) return null;

const metrics = scopeMetrics[0].metrics;
if (!metrics) return null;

⚠️ Potential issue | 🟠 Major

Walk every resource/scope batch in both histogram helpers.

Both helpers stop at resourceMetrics[0].scopeMetrics[0]. A valid export can place the HTTP histograms in a later resource or scope group, which makes this test miss real data and either fail spuriously or hang waiting for data that never reaches the first batch.

Also applies to: 167-175
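A sketch of the nested walk (the `findMetric` helper name and the OTLP payload shape are assumed from the review comment, not taken from the test file):

```javascript
// Walk every resourceMetrics -> scopeMetrics -> metrics entry instead of
// hard-coding index 0, returning the first metric matching a predicate.
function findMetric(resourceMetrics, predicate) {
  for (const rm of resourceMetrics ?? []) {
    for (const sm of rm.scopeMetrics ?? []) {
      for (const metric of sm.metrics ?? []) {
        if (predicate(metric))
          return metric;
      }
    }
  }
  return null;
}
```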

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/agents/test-grpc-http-histograms.mjs` around lines 124 - 128, The
helpers currently only inspect resourceMetrics[0].scopeMetrics[0] and
metrics[0], which misses histograms in later resource/scope batches; update both
histogram helper functions in this file to iterate over all resourceMetrics and
for each resource iterate all scopeMetrics, then iterate each scope's metrics
array to find and process the histogram metrics (use the existing variable names
resourceMetrics, scopeMetrics, metrics to locate the code),
aggregating/returning the first matching histogram or the expected collection
instead of assuming index 0; apply the same change to the other helper (the
block around lines 167-175) so both walk every resource/scope/metric batch.

Comment on lines +19 to +23
// Skip test if dependencies not installed
if (!existsSync(join(__dirname, 'node_modules'))) {
console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
process.exit(0);
}

⚠️ Potential issue | 🟡 Minor

Fail fast when integration deps are missing.

Use an assertion or throw here instead of process.exit(0). Exiting successfully turns a broken Fastify integration setup into a green run, so CI can silently stop covering this path. Based on learnings, the nsolid test suite expects prerequisites to be prepared ahead of time rather than silently skipped with fs.existsSync guards.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 19 - 23,
Replace the silent successful exit when integration deps are missing: in the
block that checks existsSync(join(__dirname, 'node_modules')) (the pre-check in
test-http-route.mjs), stop calling process.exit(0) and instead fail the test by
throwing an Error or using an assertion (e.g., throw new Error or assert.fail)
with a clear message like "SKIP: node_modules not found. Run 'make
test-integrations-prereqs' first." so CI surfaces the missing prerequisites as a
failure rather than a silent pass.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (12)
test/integrations/fastify/v5/test-http-route.mjs (2)

19-23: ⚠️ Potential issue | 🟡 Minor

Fail missing integration prereqs instead of exiting successfully.

process.exit(0) turns a broken Fastify setup into a green run, so CI can silently stop exercising this coverage. Throw or assert here instead of skipping.

Based on learnings, in the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 19 - 23, The
current check using existsSync(join(__dirname, 'node_modules')) calls
process.exit(0) which hides missing integration prerequisites; replace the exit
with a failing test path: throw an Error or use an assertion so the test fails
when node_modules is absent. Update the guard around existsSync(...) to
throw(new Error(...)) or assert.fail(...) with a clear message instead of
calling process.exit(0), referencing the existsSync and process.exit usage in
this test file so CI fails visibly when prerequisites are missing.

115-156: ⚠️ Potential issue | 🟠 Major

Bound the route wait and always tear down both servers.

If route export regresses, or one of the request assertions fails first, await metricsPromise never reaches the cleanup block and both servers stay open. Put the wait behind a timeout and move fastify.close() / grpcServer.close() into finally.
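The pattern can be sketched as a small helper (the `waitWithTeardown` name and the 5000 ms default are illustrative; the teardown callback would call fastify.close() and grpcServer.close()):

```javascript
// Bound the metrics wait with a timeout and guarantee teardown runs on
// every path (success, timeout, or an earlier assertion failure).
async function waitWithTeardown(metricsPromise, teardown, timeoutMs = 5000) {
  let timer;
  try {
    return await Promise.race([
      metricsPromise,
      new Promise((_, reject) => {
        timer = setTimeout(
          () => reject(new Error('metrics timeout')), timeoutMs);
      }),
    ]);
  } finally {
    clearTimeout(timer);
    await teardown();  // e.g. close the app server and the gRPC server
  }
}
```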

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/fastify/v5/test-http-route.mjs` around lines 115 - 156, The
test currently awaits metricsPromise without a timeout and performs cleanup
after that, which can hang and leave servers running; change the waiting to a
bounded wait (e.g., replace await metricsPromise with await
Promise.race([metricsPromise, new Promise((_, rej) => setTimeout(() => rej(new
Error('metrics timeout')), 5000))]) or similar) and move shutdown into a finally
block so both fastify.close() and grpcServer.close() are always called; update
the code around metricsPromise, the await of that promise, and the cleanup to
ensure the listener/Promise created for metricsPromise still resolves/rejects
properly and both fastify.close() and grpcServer.close() are invoked in the
finally.
test/integrations/express/v5/test-http-route.mjs (2)

19-23: ⚠️ Potential issue | 🟡 Minor

Fail missing integration prereqs instead of exiting successfully.

This still turns a broken Express v5 setup into a green run, so CI can silently stop covering the new route test. Throw or assert here instead of exiting with 0.

Based on learnings, in the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/test-http-route.mjs` around lines 19 - 23, The
current prereq guard using existsSync(join(__dirname, 'node_modules')) logs and
calls process.exit(0), which masks missing integration prerequisites; replace
that exit-with-success behavior by throwing an Error or using an assertion
(e.g., throw new Error(...) or assert.fail(...)) when the node_modules check
fails so the test fails instead of passing; update the block that references
existsSync, join and process.exit to raise the failure with a clear message
about running the integration prereqs.

134-183: ⚠️ Potential issue | 🟠 Major

Bound the route wait and move teardown into finally.

Only the happy path reaches conn.destroy(), server.close(), and grpcServer.close(). If metric export regresses or a fetch/assert fails first, this test can hang the job with open handles.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v5/test-http-route.mjs` around lines 134 - 183, The
test currently awaits metricsPromise and performs cleanup only on the happy
path; wrap the request/assert/await sequence in try/finally, bound waiting on
metricsPromise with a timeout (e.g., race metricsPromise against a Promise that
rejects after N ms) so failures/timeouts don't hang, and move teardown into the
finally block to always run connections.forEach(conn => conn.destroy()),
server.close(...), and grpcServer.close(); update references to metricsPromise,
connections, server, and grpcServer to implement the bounded wait and guaranteed
cleanup.
test/agents/test-grpc-http-histograms.mjs (2)

200-258: ⚠️ Potential issue | 🟠 Major

Add a timeout and guaranteed teardown around the histogram wait.

If the histograms never show up, trafficTimer keeps firing and neither child.shutdown() nor grpcServer.close() runs. Race the wait against a timeout and move the teardown path into finally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/agents/test-grpc-http-histograms.mjs` around lines 200 - 258, The test's
runTest function can hang because trafficTimer keeps running and
child/grpcServer teardown only happens on successful histogram validation; wrap
the histogram wait in a Promise that races the existing success path against a
timeout and move teardown into a finally block so it always runs: create a
timeout Promise (e.g. new Promise((_, reject) => setTimeout(() => reject(new
Error('histogram timeout')), TIMEOUT_MS))) and use Promise.race to await either
success or timeout, ensure you clearInterval(trafficTimer) and call
child.shutdown(...) and grpcServer.close() in a finally handler (or in both
success and timeout catch paths) and set shutdownCalled so the metrics handler
no-ops after teardown; update references to trafficTimer, shutdownCalled,
grpcServer, child, and the mustCallAtLeast metrics handler accordingly.

120-159: ⚠️ Potential issue | 🟠 Major

Walk every OTLP resource/scope batch in both helpers.

Both helpers still inspect only resourceMetrics[0].scopeMetrics[0]. A valid export can place the HTTP histograms in later resource or scope groups, so this can miss real data and fail the test spuriously.

Also applies to: 161-198

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/agents/test-grpc-http-histograms.mjs` around lines 120 - 159, The helper
checkHistogramMetrics only inspects resourceMetrics[0].scopeMetrics[0], so it
can miss histograms in later resource/scope batches; change it to iterate over
all resourceMetrics and for each resource iterate over all scopeMetrics and then
over each scope's metrics, applying the same matching/validation logic (unit,
aggregationTemporality, dataPoints, dp check, and removal from remaining). Make
the identical change to the other histogram helper in this file so both walk
every resource and every scope instead of only the first entries.
test/integrations/express/v4/test-http-route.mjs (1)

134-183: ⚠️ Potential issue | 🟠 Major

Bound the route wait and always tear down the servers.

Right now cleanup only runs after await metricsPromise. If the metrics never expose all routes, or one of the request assertions throws first, this test can hang the job with open HTTP and gRPC handles.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v4/test-http-route.mjs` around lines 134 - 183, The
test can hang because await metricsPromise may never resolve and cleanup only
runs after it; change the test to bound waiting and always perform teardown:
wrap the request/assert logic and awaiting of metricsPromise in a try/finally
(or use Promise.race with a timeout) so that if metricsPromise times out or an
assertion throws you still run the cleanup code that destroys connections and
closes server and grpcServer; reference the metricsPromise,
grpcServer.on('metrics'), server, grpcServer, and connections identifiers when
making the change.
lib/internal/nsolid_diag.js (3)

392-401: ⚠️ Potential issue | 🟠 Major

Guard routePath before using string helpers.

request.route.path can be non-string for some Express route definitions. In that case startsWith() / slice() throws on the response-finish path.

🛡️ Safe guard
   if (request?.route?.path && !request[nsolid_route_s]) {
     const baseUrl = request.baseUrl || '';
     const routePath = request.route.path;
+    if (typeof routePath !== 'string')
+      return;
     const fullPath = baseUrl.endsWith('/') && routePath.startsWith('/')
       ? baseUrl + routePath.slice(1)
       : baseUrl + routePath;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_diag.js` around lines 392 - 401, The code in the
dc.subscribe('http.server.response.finish', ...) handler assumes
request.route.path is a string and calls startsWith()/slice(), which throws if
route.path is a RegExp or array; guard routePath by checking typeof routePath
=== 'string' (or coerce to string safely) before using string helpers and only
compute fullPath when routePath is a string; update the logic around routePath,
baseUrl and request[nsolid_route_s] (inside the dc.subscribe callback) to skip
or fallback (e.g., use request.route.path.toString() or simply set
request[nsolid_route_s] = baseUrl) for non-string route.path values.

156-171: ⚠️ Potential issue | 🟠 Major

Store the protocol enum, not the display string.

nsolid_net_prot_s now feeds a binding that takes uint32_t protocol_version. Writing '1.1' / '2' here makes Undici histogram attribution depend on coercion and can misbucket HTTP/1.1 traffic.

🔧 Minimal fix
 dc.subscribe('undici:client:connected', ({ connectParams, socket }) => {
   const api = getApi();
   const span = api.trace.getSpan(api.context.active());
   const version = connectParams.version === 'h1' ? '1.1' : '2';
+  const versionEnum =
+    connectParams.version === 'h1' ? kHttpVersion11 : kHttpVersion20;
   // Store version on socket for later retrieval in sendHeaders
-  socketVersionMap.set(socket, version);
+  socketVersionMap.set(socket, versionEnum);
   if (span && span.type === kSpanHttpClient) {
     span._pushSpanDataString(kSpanHttpProtocolVersion, version);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_diag.js` around lines 156 - 171, The code stores the
protocol as the display string ('1.1'/'2') on the socket and writes that into
request[nsolid_net_prot_s], but the binding expects a uint32_t protocol enum;
change the logic in the dc.subscribe handlers (the 'undici:client:connected'
callback that sets socketVersionMap and the 'undici:client:sendHeaders' callback
that writes request[nsolid_net_prot_s]) to store and write an integer enum
constant (e.g., 1 for HTTP/1.1, 2 for HTTP/2) instead of the strings, and
continue pushing the human-readable value into the span via
span._pushSpanDataString(kSpanHttpProtocolVersion) if needed so telemetry and
the binding receive the correct numeric protocol_version.

3-5: ⚠️ Potential issue | 🟠 Major

Fix the new primordials/lint blockers.

CI is already red here: the Number binding trips noShadowRestrictedNames, and new WeakMap() trips the primordials-safe SafeWeakMap requirement for lib code.

Also applies to: 59-60

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/internal/nsolid_diag.js` around lines 3 - 5, The code currently
destructures Number from primordials (creating a shadowed restricted name) and
uses new WeakMap() in lib code; update the module to stop binding restricted
global names and to use the primordials-safe wrapper: remove the const { Number
} = primordials pattern and reference the global Number directly (or use a
uniquely named local alias if needed), and replace any instantiation of new
WeakMap() with the primordials-safe SafeWeakMap (e.g., use SafeWeakMap when
creating maps in nsolid_diag.js, including the occurrences around the later
block at the same file mentioned in the comment). Ensure identifiers referenced
in the file (e.g., any existing WeakMap usages and the Number binding) are
updated accordingly so lint rule noShadowRestrictedNames and primordials safety
are satisfied.
src/nsolid/nsolid_api.h (1)

414-417: ⚠️ Potential issue | 🟠 Major

Reconsider the hard 100 series cap here.

This looks like a hard cardinality cap rather than a reserve size. With {method, route, scheme, host, port} keys, 100 distinct combinations per interval is easy to exhaust, which would push new traffic into the overflow bucket and blunt the new route-level breakdown.

Run this read-only check to confirm the ctor and overflow semantics in the vendored OTEL code. Expect matches showing overflow handling plus both new HTTP maps pinned to 100.

#!/bin/bash
set -euo pipefail

hdr="$(fd '^attributes_hashmap\.h$' deps/opentelemetry-cpp | head -n1)"
if [ -z "$hdr" ]; then
  echo "attributes_hashmap.h not found" >&2
  exit 1
fi

echo "== AttributesHashMap constructor / overflow semantics =="
rg -n -C3 'AttributesHashMap\s*\(|GetOrSetDefault|overflow|max_size_|cardinality' \
  "$hdr" deps/opentelemetry-cpp/sdk/src

echo
echo "== Current NSolid call sites =="
rg -n -C1 'AttributesHashMap\s*\(\s*100\s*\)' src/nsolid
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.h` around lines 414 - 417, The two HTTP attribute maps
(http_client_hashmap_ and http_server_hashmap_) are constructed with a
hard-coded cardinality cap of 100 which can be easily exhausted; inspect the
AttributesHashMap constructor/overflow semantics in the vendored OpenTelemetry
code (AttributesHashMap) and replace the literal 100 in the http_client_hashmap_
and http_server_hashmap_ initializations with a safer approach: either use the
AttributesHashMap default ctor (if it represents an unbounded/reservable map),
or introduce a named constant (e.g., kHttpAttributesReserveSize) read from
configuration/feature-flag and set a much larger default (or expose it to
runtime config) so the maps won’t overflow quickly; ensure the change is applied
to both http_client_hashmap_ and http_server_hashmap_ and that you verify the
AttributesHashMap overflow behavior to decide whether reserve vs cap semantics
require increasing or removing the limit.
agents/grpc/src/grpc_agent.cc (1)

955-984: ⚠️ Potential issue | 🟠 Major

Skip processing queued metrics for threads that no longer exist.

When a thread is deleted, env_msg_cb_ erases its entries from all caches. However, thr_metrics_msg_q_ may already contain an ExtMetricsStor harvested just before the deletion callback ran. If the deletion async is processed first, this loop will re-insert the dead thread into thr_metrics_hist_prev_end_ts_ms_ and thr_metrics_cache_.

Check whether the thread still exists in env_metrics_map_ before exporting and caching:

🛠️ Proposed fix
   ExtMetricsStor ext_stor;
   while (agent->thr_metrics_msg_q_.dequeue(ext_stor)) {
     auto& stor = ext_stor.stor;
+    uint64_t thread_id = stor.thread_id;
+    // Skip metrics for threads that have been deleted
+    if (agent->env_metrics_map_.find(thread_id) == agent->env_metrics_map_.end()) {
+      continue;
+    }
     otlp::fill_env_metrics(metrics, stor, false);
-    uint64_t thread_id = stor.thread_id;
     uint64_t start_ts_ms = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/grpc/src/grpc_agent.cc` around lines 955 - 984, The loop that
processes ExtMetricsStor from thr_metrics_msg_q_ should skip entries for threads
that have been deleted: before calling otlp::fill_env_metrics,
otlp::fill_http_histograms or inserting into thr_metrics_hist_prev_end_ts_ms_
and thr_metrics_cache_, check whether agent->env_metrics_map_ contains
stor.thread_id (the thread exists); if not, continue and drop this
ExtMetricsStor. This prevents resurrecting deleted threads that env_msg_cb_
already removed; update the loop around thr_metrics_msg_q_.dequeue handling to
guard on agent->env_metrics_map_.find(thread_id) != end() and only then perform
the export and cache insert_or_assign operations.
🧹 Nitpick comments (1)
src/nsolid/nsolid_api.cc (1)

2098-2100: Pass DatapointAttrs by const reference on this hot path.

add_metric_datapoint_() only inspects attrs, but the current signature copies the full variant for every datapoint on the NSolid thread. That is extra churn once http.route strings start flowing through here.

♻️ Small refactor
 void EnvInst::add_metric_datapoint_(MetricsStream::Type type,
                                     double value,
-                                    MetricsStream::DatapointAttrs attrs) {
+                                    const MetricsStream::DatapointAttrs& attrs) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nsolid/nsolid_api.cc` around lines 2098 - 2100, The function signature
EnvInst::add_metric_datapoint_(MetricsStream::Type type, double value,
MetricsStream::DatapointAttrs attrs) copies the DatapointAttrs variant on a hot
path; change the parameter to take const MetricsStream::DatapointAttrs& attrs
instead and update all callers to pass their attrs by const reference (or const
lvalue reference) so no unnecessary copies occur; ensure any places that
previously relied on move semantics still behave correctly and that the
header/declaration and definition signatures match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/_http_server.js`:
- Around line 322-332: The code currently always sends res.statusCode (default
200) which emits a synthetic 200 for aborted requests; change the call site that
invokes binding.pushServerBucket so it checks res._headerSent and passes the
actual statusCode only if headers were sent, otherwise pass a sentinel value (0)
to indicate unknown; update the corresponding native handling in nsolid_api.cc
to treat 0 as “no status code” and skip emitting http.response.status_code.
Reference the call to binding.pushServerBucket and the
res._headerSent/res.statusCode fields, and the native nsolid_api.cc logic that
writes http.response.status_code.

In `@Makefile`:
- Around line 1695-1699: The distclean target does not remove the new
integration node_modules directories added under test/integrations; update the
Makefile so the distclean target also removes
test/integrations/express/v4/node_modules,
test/integrations/express/v5/node_modules, and
test/integrations/fastify/v5/node_modules (same removals as in the
test-integrations-prereqs-clean target) so distclean truly removes all build and
test artifacts.

In `@test/integrations/express/v4/test-http-route.mjs`:
- Around line 19-23: The current guard uses process.exit(0) so missing
integration prereqs mark the test as a success; replace this with a failure
signal by throwing an Error or using an assertion instead of exiting. Locate the
existsSync(join(__dirname, 'node_modules')) check in the test and change the
process.exit(0) branch to either throw new Error('SKIP: node_modules not found.
Run "make test-integrations-prereqs" first.') or call assert.fail(...) so the
test fails/flags missing prerequisites instead of exiting cleanly. Ensure the
log message remains descriptive and includes the original guidance.

---

Duplicate comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 955-984: The loop that processes ExtMetricsStor from
thr_metrics_msg_q_ should skip entries for threads that have been deleted:
before calling otlp::fill_env_metrics, otlp::fill_http_histograms or inserting
into thr_metrics_hist_prev_end_ts_ms_ and thr_metrics_cache_, check whether
agent->env_metrics_map_ contains stor.thread_id (the thread exists); if not,
continue and drop this ExtMetricsStor. This prevents resurrecting deleted
threads that env_msg_cb_ already removed; update the loop around
thr_metrics_msg_q_.dequeue handling to guard on
agent->env_metrics_map_.find(thread_id) != end() and only then perform the
export and cache insert_or_assign operations.

In `@lib/internal/nsolid_diag.js`:
- Around line 392-401: The code in the
dc.subscribe('http.server.response.finish', ...) handler assumes
request.route.path is a string and calls startsWith()/slice(), which throws if
route.path is a RegExp or array; guard routePath by checking typeof routePath
=== 'string' (or coerce to string safely) before using string helpers and only
compute fullPath when routePath is a string; update the logic around routePath,
baseUrl and request[nsolid_route_s] (inside the dc.subscribe callback) to skip
or fallback (e.g., use request.route.path.toString() or simply set
request[nsolid_route_s] = baseUrl) for non-string route.path values.
- Around line 156-171: The code stores the protocol as the display string
('1.1'/'2') on the socket and writes that into request[nsolid_net_prot_s], but
the binding expects a uint32_t protocol enum; change the logic in the
dc.subscribe handlers (the 'undici:client:connected' callback that sets
socketVersionMap and the 'undici:client:sendHeaders' callback that writes
request[nsolid_net_prot_s]) to store and write an integer enum constant (e.g., 1
for HTTP/1.1, 2 for HTTP/2) instead of the strings, and continue pushing the
human-readable value into the span via
span._pushSpanDataString(kSpanHttpProtocolVersion) if needed so telemetry and
the binding receive the correct numeric protocol_version.
- Around line 3-5: The code currently destructures Number from primordials
(creating a shadowed restricted name) and uses new WeakMap() in lib code; update
the module to stop binding restricted global names and to use the
primordials-safe wrapper: remove the const { Number } = primordials pattern and
reference the global Number directly (or use a uniquely named local alias if
needed), and replace any instantiation of new WeakMap() with the
primordials-safe SafeWeakMap (e.g., use SafeWeakMap when creating maps in
nsolid_diag.js, including the occurrences around the later block at the same
file mentioned in the comment). Ensure identifiers referenced in the file (e.g.,
any existing WeakMap usages and the Number binding) are updated accordingly so
lint rule noShadowRestrictedNames and primordials safety are satisfied.

In `@src/nsolid/nsolid_api.h`:
- Around line 414-417: The two HTTP attribute maps (http_client_hashmap_ and
http_server_hashmap_) are constructed with a hard-coded cardinality cap of 100
which can be easily exhausted; inspect the AttributesHashMap
constructor/overflow semantics in the vendored OpenTelemetry code
(AttributesHashMap) and replace the literal 100 in the http_client_hashmap_ and
http_server_hashmap_ initializations with a safer approach: either use the
AttributesHashMap default ctor (if it represents an unbounded/reservable map),
or introduce a named constant (e.g., kHttpAttributesReserveSize) read from
configuration/feature-flag and set a much larger default (or expose it to
runtime config) so the maps won’t overflow quickly; ensure the change is applied
to both http_client_hashmap_ and http_server_hashmap_ and that you verify the
AttributesHashMap overflow behavior to decide whether reserve vs cap semantics
require increasing or removing the limit.

In `@test/agents/test-grpc-http-histograms.mjs`:
- Around line 200-258: The test's runTest function can hang because trafficTimer
keeps running and child/grpcServer teardown only happens on successful histogram
validation; wrap the histogram wait in a Promise that races the existing success
path against a timeout and move teardown into a finally block so it always runs:
create a timeout Promise (e.g. new Promise((_, reject) => setTimeout(() =>
reject(new Error('histogram timeout')), TIMEOUT_MS))) and use Promise.race to
await either success or timeout, ensure you clearInterval(trafficTimer) and call
child.shutdown(...) and grpcServer.close() in a finally handler (or in both
success and timeout catch paths) and set shutdownCalled so the metrics handler
no-ops after teardown; update references to trafficTimer, shutdownCalled,
grpcServer, child, and the mustCallAtLeast metrics handler accordingly.
- Around line 120-159: The helper checkHistogramMetrics only inspects
resourceMetrics[0].scopeMetrics[0], so it can miss histograms in later
resource/scope batches; change it to iterate over all resourceMetrics and for
each resource iterate over all scopeMetrics and then over each scope's metrics,
applying the same matching/validation logic (unit, aggregationTemporality,
dataPoints, dp check, and removal from remaining). Make the identical change to
the other histogram helper in this file so both walk every resource and every
scope instead of only the first entries.

In `@test/integrations/express/v4/test-http-route.mjs`:
- Around line 134-183: The test can hang because await metricsPromise may never
resolve and cleanup only runs after it; change the test to bound waiting and
always perform teardown: wrap the request/assert logic and awaiting of
metricsPromise in a try/finally (or use Promise.race with a timeout) so that if
metricsPromise times out or an assertion throws you still run the cleanup code
that destroys connections and closes server and grpcServer; reference the
metricsPromise, grpcServer.on('metrics'), server, grpcServer, and connections
identifiers when making the change.

In `@test/integrations/express/v5/test-http-route.mjs`:
- Around line 19-23: The current prereq guard using existsSync(join(__dirname,
'node_modules')) logs and calls process.exit(0), which masks missing integration
prerequisites; replace that exit-with-success behavior by throwing an Error or
using an assertion (e.g., throw new Error(...) or assert.fail(...)) when the
node_modules check fails so the test fails instead of passing; update the block
that references existsSync, join and process.exit to raise the failure with a
clear message about running the integration prereqs.
- Around line 134-183: The test currently awaits metricsPromise and performs
cleanup only on the happy path; wrap the request/assert/await sequence in
try/finally, bound waiting on metricsPromise with a timeout (e.g., race
metricsPromise against a Promise that rejects after N ms) so failures/timeouts
don't hang, and move teardown into the finally block to always run
connections.forEach(conn => conn.destroy()), server.close(...), and
grpcServer.close(); update references to metricsPromise, connections, server,
and grpcServer to implement the bounded wait and guaranteed cleanup.

In `@test/integrations/fastify/v5/test-http-route.mjs`:
- Around line 19-23: The current check using existsSync(join(__dirname,
'node_modules')) calls process.exit(0) which hides missing integration
prerequisites; replace the exit with a failing test path: throw an Error or use
around existsSync(...) to throw new Error(...) or assert.fail(...) with a clear
around existsSync(...) to throw(new Error(...)) or assert.fail(...) with a clear
message instead of calling process.exit(0), referencing the existsSync and
process.exit usage in this test file so CI fails visibly when prerequisites are
missing.
- Around line 115-156: The test currently awaits metricsPromise without a
timeout and performs cleanup after that, which can hang and leave servers
running; change the waiting to a bounded wait (e.g., replace await
metricsPromise with await Promise.race([metricsPromise, new Promise((_, rej) =>
setTimeout(() => rej(new Error('metrics timeout')), 5000))]) or similar) and
move shutdown into a finally block so both fastify.close() and
grpcServer.close() are always called; update the code around metricsPromise, the
await of that promise, and the cleanup to ensure the listener/Promise created
for metricsPromise still resolves/rejects properly and both fastify.close() and
grpcServer.close() are invoked in the finally.

---

Nitpick comments:
In `@src/nsolid/nsolid_api.cc`:
- Around line 2098-2100: The function signature
EnvInst::add_metric_datapoint_(MetricsStream::Type type, double value,
MetricsStream::DatapointAttrs attrs) copies the DatapointAttrs variant on a hot
path; change the parameter to const MetricsStream::DatapointAttrs& attrs so no
unnecessary copies occur, update all callers accordingly, ensure any places
that previously relied on move semantics still behave correctly, and keep the
header declaration and the definition signatures in sync.
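The by-value versus const-reference difference the nitpick describes can be demonstrated with a copy-counting stand-in. `CopyCounter` below is a hypothetical proxy for `MetricsStream::DatapointAttrs`, not the real type:

```cpp
#include <cassert>

// Hypothetical stand-in for MetricsStream::DatapointAttrs that
// counts how many times it is copy-constructed.
struct CopyCounter {
  static int copies;
  CopyCounter() = default;
  CopyCounter(const CopyCounter&) { ++copies; }
};
int CopyCounter::copies = 0;

// By-value parameter: every call copies the argument on the hot path.
void add_by_value(CopyCounter attrs) { (void)attrs; }

// Const-reference parameter: the argument is never copied.
void add_by_ref(const CopyCounter& attrs) { (void)attrs; }
```

Callers need no change in syntax when switching to the const-reference form; only call sites that deliberately moved a temporary into the parameter need rechecking.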

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8764ef7b-e637-496e-ba08-d19afab3a7f0

📥 Commits

Reviewing files that changed from the base of the PR and between 988f7a6 and 00e8e75.

📒 Files selected for processing (30)
  • .gitignore
  • Makefile
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/grpc_agent.h
  • agents/otlp/src/otlp_common.cc
  • agents/otlp/src/otlp_common.h
  • deps/opentelemetry-cpp/otlp-http-exporter.gyp
  • lib/_http_client.js
  • lib/_http_server.js
  • lib/internal/nsolid_diag.js
  • lib/internal/nsolid_http_consts.js
  • src/nsolid.h
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • src/nsolid/nsolid_bindings.h
  • src/nsolid/nsolid_metrics_types.h
  • src/nsolid/nsolid_trace.h
  • test/addons/nsolid-dispatchqueue/binding.gyp
  • test/addons/nsolid-eventloop-cmd/binding.gyp
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/addons/nsolid-statsdagent/binding.gyp
  • test/agents/test-grpc-http-histograms.mjs
  • test/integrations/express/v4/package.json
  • test/integrations/express/v4/test-http-route.mjs
  • test/integrations/express/v5/package.json
  • test/integrations/express/v5/test-http-route.mjs
  • test/integrations/fastify/v5/package.json
  • test/integrations/fastify/v5/test-http-route.mjs
  • test/integrations/testcfg.py
  • test/parallel/test-bootstrap-modules.js
🚧 Files skipped from review as they are similar to previous changes (8)
  • test/integrations/testcfg.py
  • test/addons/nsolid-log-hooks/binding.gyp
  • test/integrations/express/v4/package.json
  • lib/_http_client.js
  • test/addons/nsolid-statsdagent/binding.gyp
  • .gitignore
  • test/integrations/fastify/v5/package.json
  • src/nsolid.h

Comment on lines +322 to +332
  const req = res.req;
  // Extract route template from unified nsolid_route_s symbol
  // Both Express (via diagnostic channel) and Fastify (via diagnostic channel) set this
  const route = req?.[nsolid_route_s] || '';
  binding.pushServerBucket(
    now() - res[nsolid_tracer_s],
    kHttpMethodMap[req?.method] ?? kHttpMethodOther,
    res.statusCode,
    req?.socket?.encrypted ? kHttpSchemeHttps : kHttpSchemeHttp,
    kHttpVersionMap[req?.httpVersion] ?? kHttpVersionOther,
    route);

⚠️ Potential issue | 🟠 Major

Don't emit a synthetic 200 for aborted responses.

This close-path always pushes res.statusCode, but ServerResponse.prototype.statusCode defaults to 200. If the request aborts before headers are sent, the histogram gets a fake success code even though no response was written. Gate this on res._headerSent and treat the status code as unknown otherwise.

🔧 Minimal fix
   const req = res.req;
   // Extract route template from unified nsolid_route_s symbol
   // Both Express (via diagnostic channel) and Fastify (via diagnostic channel) set this
   const route = req?.[nsolid_route_s] || '';
+  const statusCode = res._headerSent ? res.statusCode : 0;
   binding.pushServerBucket(
     now() - res[nsolid_tracer_s],
     kHttpMethodMap[req?.method] ?? kHttpMethodOther,
-    res.statusCode,
+    statusCode,
     req?.socket?.encrypted ? kHttpSchemeHttps : kHttpSchemeHttp,
     kHttpVersionMap[req?.httpVersion] ?? kHttpVersionOther,
     route);

Then skip http.response.status_code in src/nsolid/nsolid_api.cc when the sentinel 0 is passed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  const req = res.req;
  // Extract route template from unified nsolid_route_s symbol
  // Both Express (via diagnostic channel) and Fastify (via diagnostic channel) set this
  const route = req?.[nsolid_route_s] || '';
  const statusCode = res._headerSent ? res.statusCode : 0;
  binding.pushServerBucket(
    now() - res[nsolid_tracer_s],
    kHttpMethodMap[req?.method] ?? kHttpMethodOther,
    statusCode,
    req?.socket?.encrypted ? kHttpSchemeHttps : kHttpSchemeHttp,
    kHttpVersionMap[req?.httpVersion] ?? kHttpVersionOther,
    route);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/_http_server.js` around lines 322 - 332, The code currently always sends
res.statusCode (default 200) which emits a synthetic 200 for aborted requests;
change the call site that invokes binding.pushServerBucket so it checks
res._headerSent and passes the actual statusCode only if headers were sent,
otherwise pass a sentinel value (0) to indicate unknown; update the
corresponding native handling in nsolid_api.cc to treat 0 as “no status code”
and skip emitting http.response.status_code. Reference the call to
binding.pushServerBucket and the res._headerSent/res.statusCode fields, and the
native nsolid_api.cc logic that writes http.response.status_code.

Comment on lines +1695 to +1699
.PHONY: test-integrations-prereqs-clean
test-integrations-prereqs-clean:
$(RM) -r test/integrations/express/v4/node_modules
$(RM) -r test/integrations/express/v5/node_modules
$(RM) -r test/integrations/fastify/v5/node_modules

⚠️ Potential issue | 🟡 Minor

Also clean these deps from distclean.

distclean still says it removes all build and test artifacts, but these new integration node_modules trees only disappear through clean.

🧹 Minimal follow-up
 distclean: ## Remove all build and test artifacts.
+	$(MAKE) test-integrations-prereqs-clean
 	$(RM) -r out
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 1695 - 1699, The distclean target does not remove the
new integration node_modules directories added under test/integrations; update
the Makefile so the distclean target also removes
test/integrations/express/v4/node_modules,
test/integrations/express/v5/node_modules, and
test/integrations/fastify/v5/node_modules (same removals as in the
test-integrations-prereqs-clean target) so distclean truly removes all build and
test artifacts.

Comment on lines +19 to +23
// Skip test if dependencies not installed
if (!existsSync(join(__dirname, 'node_modules'))) {
console.log('SKIP: node_modules not found. Run "make test-integrations-prereqs" first.');
process.exit(0);
}

⚠️ Potential issue | 🟡 Minor

Fail missing integration prereqs instead of exiting successfully.

process.exit(0) makes a broken integration setup look green, so this new coverage can silently disappear from CI. Throw or assert here instead.

Based on learnings, in the nsolid test suite, native addon bindings are expected to be built beforehand; tests do not add fs.existsSync guards to skip when bindings are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integrations/express/v4/test-http-route.mjs` around lines 19 - 23, The
current guard uses process.exit(0), so missing integration prereqs mark the
test as a success; replace this with a failure signal by throwing an Error or
using an assertion instead of exiting. Locate the existsSync(join(__dirname,
'node_modules')) check in the test and change the process.exit(0) branch to
either throw new Error('node_modules not found. Run "make
test-integrations-prereqs" first.') or call assert.fail(...) so the test
fails when prerequisites are missing instead of exiting cleanly. Ensure the
message remains descriptive and includes the original guidance.
