61 changes: 30 additions & 31 deletions src/health/monitor.ts
@@ -1,39 +1,38 @@
import type { ISdk } from "iii-sdk";
import type { HealthSnapshot } from "../types.js";
import type { StateKV } from "../state/kv.js";
import { KV } from "../state/schema.js";
import { evaluateHealth } from "./thresholds.js";
const endMark = performance.now();
const eventLoopLag = endMark - startMark;
Comment on lines +1 to +2

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that startMark is defined before the event loop lag calculation

rg -n -B 20 'const endMark = performance\.now\(\)' src/health/monitor.ts | rg 'startMark'

Repository: rohitg00/agentmemory

Length of output: 46


🏁 Script executed:

# Check if file exists and get basic info
fd -t f "monitor\.ts" src/

# Read the file to see actual content
head -100 src/health/monitor.ts

Repository: rohitg00/agentmemory

Length of output: 3011


🏁 Script executed:

# Search for startMark definition/usage in the entire file
rg -n 'startMark' src/health/monitor.ts

# Also get the full file to understand structure
wc -l src/health/monitor.ts
cat -n src/health/monitor.ts

Repository: rohitg00/agentmemory

Length of output: 4100


🏁 Script executed:

# Search for function declarations that might contain startMark
rg -n 'function|const.*=.*function|const.*=.*\(\)' src/health/monitor.ts | head -20

# Also search for registerHealthMonitor or collectHealth function
rg -n 'registerHealthMonitor|collectHealth' src/health/monitor.ts

# Search for where performance.now is called to capture startMark
rg -n 'performance\.now\(\)' src/health/monitor.ts

Repository: rohitg00/agentmemory

Length of output: 887


🏁 Script executed:

# Get the beginning of the file to find function declaration
head -50 src/health/monitor.ts | tail -40

# Also try to find the function signature that contains collectHealth
ast-grep --pattern 'const collectHealth = $_'

# Or search in broader context
rg -n 'const startMark' src/health/monitor.ts

Repository: rohitg00/agentmemory

Length of output: 1209


🏁 Script executed:

# Try to get the very beginning of the file
head -1 src/health/monitor.ts

# Get more context around line 1
sed -n '1,30p' src/health/monitor.ts

# Search for the actual function definition that contains this code
rg -B 50 'const endMark = performance\.now' src/health/monitor.ts | head -60

Repository: rohitg00/agentmemory

Length of output: 965


🏁 Script executed:

# Get the entire file to understand full context
cat src/health/monitor.ts

# Also search for where startMark might be initialized
rg 'startMark\s*=' src/health/monitor.ts

Repository: rohitg00/agentmemory

Length of output: 3195


🏁 Script executed:

# Let me check the raw file with more context - search for function definitions
rg -n 'export.*function|function.*registerHealthMonitor|async function' src/health/monitor.ts

# Check if startMark might be a parameter or closure variable
rg -B 100 'const endMark = performance.now' src/health/monitor.ts | head -120

Repository: rohitg00/agentmemory

Length of output: 146


🏁 Script executed:

# Search for all variable declarations including startMark context
rg -A 5 -B 5 'startMark' src/health/monitor.ts

# Look for function parameters that might include startMark
rg '\(.*startMark.*\)' src/health/monitor.ts

# Check if maybe the file content is incomplete - look for git diff info
git diff src/health/monitor.ts 2>/dev/null | head -100

Repository: rohitg00/agentmemory

Length of output: 587


🏁 Script executed:

# Try to find the complete function - search for "registerHealthMonitor" or similar
rg -n 'registerHealthMonitor|export.*function.*Health' src/health/monitor.ts

# Search for where the collectHealth function is defined
rg -B 30 'const endMark = performance.now' src/health/monitor.ts | grep -E '(function|const|let|var)' | head -20

# Try ast-grep to find function definitions
ast-grep --pattern 'function $_($_) { $$$ }'

Repository: rohitg00/agentmemory

Length of output: 50501


Define startMark before using it in the event loop lag calculation.

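The variable startMark is referenced on line 2 but is never defined anywhere in the function. Evaluating endMark - startMark will therefore throw a ReferenceError at runtime, and the TypeScript compiler will reject the undeclared name. Capture the start timestamp with const startMark = performance.now(); at the start of collectHealth(), before the code yields to the event loop and before endMark is computed.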
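
As an illustration of the ordering this comment asks for, here is a minimal sketch of an event loop lag measurement. The helper name measureEventLoopLag is hypothetical and not part of the PR; it only assumes Node.js built-ins (performance.now() and setImmediate).

```typescript
import { performance } from "node:perf_hooks";
import { setImmediate } from "node:timers";

// Hypothetical helper (not in the PR): resolves with the time, in
// milliseconds, between scheduling a setImmediate callback and the
// callback actually running.
function measureEventLoopLag(): Promise<number> {
  // Capture the start timestamp BEFORE yielding to the event loop.
  const startMark = performance.now();
  return new Promise<number>((resolve) => {
    setImmediate(() => {
      // The callback runs once the event loop gets to it; the
      // difference approximates how busy the loop currently is.
      const endMark = performance.now();
      resolve(endMark - startMark);
    });
  });
}
```

Inside collectHealth() the same pattern would read `const eventLoopLag = await measureEventLoopLag();`, which keeps startMark and endMark in one scope so neither can be used before it is defined.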

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/health/monitor.ts` around lines 1 - 2, In collectHealth(), startMark is
never defined before it's used to compute eventLoopLag; add capturing of the
start timestamp (e.g., const startMark = performance.now()) at the beginning of
the collectHealth() execution path so that the existing lines const endMark =
performance.now(); and const eventLoopLag = endMark - startMark; use a valid
startMark; ensure the symbol names startMark, endMark, and eventLoopLag remain
consistent with the current function.


Comment on lines +1 to 3
export function registerHealthMonitor(
sdk: ISdk,
kv: StateKV,
): { stop: () => void } {
let connectionState = "connected";
let prevCpuUsage = process.cpuUsage();
let prevCpuTime = Date.now();
const snapshot: HealthSnapshot = {
cpuUsage: cpuPercent,
memoryRss: mem.rss / 1024 / 1024,
memoryHeapUsed: mem.heapUsed / 1024 / 1024,
eventLoopLag,
uptime,
connectionState,
timestamp: now,
};
Comment on lines +4 to +12

if (typeof sdk.on === "function") {
sdk.on("connection_state", (state?: unknown) => {
connectionState = state as string;
});
}
const status = evaluateHealth(snapshot);

// Feature: Persistence & State-Aware Alerting
const lastStatus = await kv.get(KV.LAST_HEALTH_STATUS);
if (status.isCritical && lastStatus !== "critical") {
sdk.emit?.("health_alert", { snapshot, status });
}
Comment on lines +17 to +20

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle first-run case in alerting logic.

On the first monitor run, lastStatus will be null or undefined. The condition lastStatus !== "critical" will be true, causing an alert to be emitted immediately if the system starts in a critical state. Consider whether you want to alert on initial critical status or only on transitions from non-critical to critical states.

🛡️ Proposed fix to skip alerting on first run
 const lastStatus = await kv.get(KV.LAST_HEALTH_STATUS);
-if (status.isCritical && lastStatus !== "critical") {
+if (status.isCritical && lastStatus && lastStatus !== "critical") {
   sdk.emit?.("health_alert", { snapshot, status });
 }

Or explicitly handle first run:

 const lastStatus = await kv.get(KV.LAST_HEALTH_STATUS);
-if (status.isCritical && lastStatus !== "critical") {
+if (status.isCritical && lastStatus != null && lastStatus !== "critical") {
   sdk.emit?.("health_alert", { snapshot, status });
 }
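
To make the first-run decision explicit and testable, the transition check could be pulled into a small pure function. This is only a sketch: shouldEmitAlert, the alertOnFirstRun flag, and the "healthy" level used in the usage note are hypothetical names, and it assumes kv.get returns null or undefined when no status has been stored yet.

```typescript
// Hypothetical helper (not in the PR). `lastStatus` is whatever was read
// from KV.LAST_HEALTH_STATUS; null/undefined means no status stored yet.
type StoredLevel = string | null | undefined;

function shouldEmitAlert(
  isCritical: boolean,
  lastStatus: StoredLevel,
  alertOnFirstRun = false, // assumption: stay silent on first run by default
): boolean {
  // Never alert while the system is not critical.
  if (!isCritical) return false;
  // First run: no previous status, so defer to the configured policy.
  if (lastStatus == null) return alertOnFirstRun;
  // Otherwise alert only on a transition into the critical state.
  return lastStatus !== "critical";
}
```

With this shape, `shouldEmitAlert(true, "healthy")` is true, `shouldEmitAlert(true, "critical")` is false, and the first-run behaviour is a single visible flag rather than a side effect of how null compares.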
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/health/monitor.ts` around lines 17 - 20, The current alerting emits a
health_alert on first run if the system is already critical because lastStatus
from kv.get(KV.LAST_HEALTH_STATUS) can be null/undefined; update the condition
that triggers sdk.emit("health_alert", { snapshot, status }) to skip the
first-run case by ensuring lastStatus is not null/undefined (e.g., check
lastStatus != null) before comparing to "critical" so only transitions from a
known non-critical state emit an alert; make this change near the code using
kv.get, KV.LAST_HEALTH_STATUS, lastStatus, status.isCritical, and
sdk.emit("health_alert").


async function collectHealth(): Promise<HealthSnapshot> {
const mem = process.memoryUsage();
const currentCpu = process.cpuUsage();
const now = Date.now();
const uptime = process.uptime();
await kv.set(KV.LATEST_HEALTH, snapshot);
await kv.set(KV.LAST_HEALTH_STATUS, status.level);

const elapsedMs = now - prevCpuTime;
const userDelta = currentCpu.user - prevCpuUsage.user;
const systemDelta = currentCpu.system - prevCpuUsage.system;
const cpuPercent =
elapsedMs > 0 ? ((userDelta + systemDelta) / 1000 / elapsedMs) * 100 : 0;
prevCpuUsage = currentCpu;
prevCpuTime = now;
return snapshot;
}

const startMark = performance.now();
await new Promise((resolve) => setImmediate(resolve));
const interval = setInterval(collectHealth, 5000);
return {
stop: () => {
clearInterval(interval);
sdk.emit?.("monitor_stopped", { at: Date.now() });
}
};
Comment on lines +28 to +34

⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

Add error handling, initial call, and interval.unref().

Three concerns with the monitoring loop:

  1. Missing error handling: collectHealth is async, but setInterval discards the promise it returns, so nothing handles a rejection. On current Node.js versions an unhandled rejection terminates the process by default; older versions only logged a warning.
  2. Missing interval.unref(): The old implementation (line 99) called interval.unref() to prevent the interval from keeping the process alive. Without it, the process won't exit cleanly when shutdown is initiated.
  3. No initial collection: The old code (line 95) called collectHealth() immediately on startup. The new code has a 5-second delay before the first health check, meaning no health data is available initially.
🔧 Proposed fix
+collectHealth().catch(() => {});
 const interval = setInterval(collectHealth, 5000);
+interval.unref();
 return { 
   stop: () => {
     clearInterval(interval);
     sdk.emit?.("monitor_stopped", { at: Date.now() });
   } 
 };

For better error handling, consider logging errors instead of silently catching:

-collectHealth().catch(() => {});
+collectHealth().catch((err) => {
+  sdk.emit?.("monitor_error", { error: String(err) });
+});
 const interval = setInterval(() => {
-  collectHealth();
+  collectHealth().catch((err) => {
+    sdk.emit?.("monitor_error", { error: String(err) });
+  });
 }, 5000);
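
The three points above can be combined in one small wrapper. This is a sketch under stated assumptions, not the project's implementation: startPolling, collect, and onError are hypothetical names, and collect is assumed to be an async function such as collectHealth, so failures surface as rejected promises rather than synchronous throws.

```typescript
import { clearInterval, setInterval } from "node:timers";

// Hypothetical wrapper (not in the PR). `collect` stands in for
// collectHealth and `onError` for whatever error reporting is preferred.
function startPolling(
  collect: () => Promise<unknown>,
  onError: (err: unknown) => void,
  intervalMs = 5000,
): { stop: () => void } {
  // Every run attaches a rejection handler, so no promise goes unhandled.
  const run = (): void => {
    collect().catch(onError);
  };

  // Initial collection: data is available without waiting one interval.
  run();

  const interval = setInterval(run, intervalMs);
  // Do not let this timer alone keep the process alive during shutdown.
  interval.unref();

  return { stop: () => clearInterval(interval) };
}
```

In registerHealthMonitor this could be called as `startPolling(collectHealth, (err) => sdk.emit?.("monitor_error", { error: String(err) }))`, with the existing monitor_stopped emit kept in the stop handler.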
🧰 Tools
🪛 Biome (2.4.15)

[error] 29-34: Illegal return statement outside of a function

(parse)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/health/monitor.ts` around lines 28 - 34, Call collectHealth immediately
on startup and ensure both the immediate call and the scheduled calls handle
rejections: invoke collectHealth() once before setting the interval and attach
.catch(...) to log/emit errors; then create the interval with const interval =
setInterval(() => collectHealth().catch(err => { /* log via processLogger or
sdk.emit("monitor_error", {err}) */ }), 5000); and call interval.unref() after
creating the timer so it won’t keep the process alive; keep the existing stop
implementation that clearInterval(interval) and sdk.emit("monitor_stopped", {
at: Date.now() }).

}
const eventLoopLagMs = performance.now() - startMark;

let workers: HealthSnapshot["workers"] = [];