
feat(tool-chain): ref-aware schema validation and invalid tool call observability #2795

Open
mike-inkeep wants to merge 1 commit into feature/dev-tools-server from feature/tool-chain-ref-aware-validation

Conversation

@mike-inkeep
Contributor

@mike-inkeep mike-inkeep commented Mar 21, 2026

Summary

Tool chaining — where one tool call's arguments reference the output of a prior tool via sentinel objects like { $tool: "call_id", $path: "result.field" } — had two silent failure modes:

  1. Schema rejection at call time. The AI SDK validates tool arguments before calling execute(). Sentinel refs are not valid values for most schema types, so chained tool calls were silently rejected before reaching the executor — with no OTEL span, no session event, nothing in conversation history.
  2. No post-resolution validation. After sentinel refs were resolved to real values, there was no check that the resolved values actually satisfied the original schema.

This PR fixes both, and adds full observability for SDK-level schema failures.
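To make the sentinel format concrete, here is a minimal sketch of how a `{ $tool, $path }` ref could be swapped for a prior tool's output. It is not the PR's ArtifactParser; `resolveRefs`, `getByPath`, `isToolRef`, and the `ToolResults` map are hypothetical names, and treating `$path` as a dotted path into the stored output is an assumption.

```typescript
// Hypothetical shapes for illustration; the PR's real types are not shown in this thread.
type ToolRef = { $tool: string; $path?: string };
type ToolResults = Record<string, unknown>;

function isToolRef(value: unknown): value is ToolRef {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as Record<string, unknown>).$tool === 'string'
  );
}

// Walk a dotted path such as "result.field" through a nested value.
function getByPath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split('.')) {
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Recursively replace every sentinel ref in args with the value it points at.
function resolveRefs(args: unknown, results: ToolResults): unknown {
  if (isToolRef(args)) {
    const output = results[args.$tool];
    return args.$path ? getByPath(output, args.$path) : output;
  }
  if (Array.isArray(args)) return args.map((item) => resolveRefs(item, results));
  if (typeof args === 'object' && args !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(args)) {
      out[key] = resolveRefs(value, results);
    }
    return out;
  }
  return args;
}
```

A ref that sits where the schema expects a string is an object at call time, which is why a strict schema rejects it before resolution can happen.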

Key decisions

Schema widening strategy: ref-aware-schema.ts recursively transforms each tool's JSON Schema so every value-position node (properties, items, additionalProperties) becomes anyOf: [<original>, <sentinel-ref-schema>]. Structural combinator nodes (anyOf/oneOf/allOf) are traversed but not widened — this prevents double-wrapping. The widened schema is what the AI SDK sees.
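The widening rule can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code in ref-aware-schema.ts: `makeRefAware`, `widen`, `traverse`, and `REF_SCHEMA` are hypothetical names, and the sentinel schema shown is a simplified guess at its shape.

```typescript
type JsonSchema = Record<string, unknown>;

// Assumed sentinel-ref schema: an object with a required string $tool.
const REF_SCHEMA: JsonSchema = {
  type: 'object',
  properties: { $tool: { type: 'string' }, $path: { type: 'string' } },
  required: ['$tool'],
};

function isRecord(value: unknown): value is JsonSchema {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recurse into a node's children without wrapping the node itself.
function traverse(node: unknown): unknown {
  if (!isRecord(node)) return node;
  const out: JsonSchema = { ...node };
  if (isRecord(node.properties)) {
    const props: JsonSchema = {};
    for (const [key, child] of Object.entries(node.properties)) props[key] = widen(child);
    out.properties = props;
  }
  if (isRecord(node.items)) out.items = widen(node.items);
  if (isRecord(node.additionalProperties)) {
    out.additionalProperties = widen(node.additionalProperties);
  }
  for (const combinator of ['anyOf', 'oneOf', 'allOf']) {
    const branches = node[combinator];
    // Combinator branches are traversed but not wrapped, to avoid double-wrapping.
    if (Array.isArray(branches)) out[combinator] = branches.map(traverse);
  }
  return out;
}

// A value-position node becomes anyOf: [<original>, <sentinel-ref-schema>].
function widen(node: unknown): JsonSchema {
  return { anyOf: [traverse(node), REF_SCHEMA] };
}

// The root object schema itself is not a value position, so it is only traversed.
function makeRefAware(root: JsonSchema): JsonSchema {
  return traverse(root) as JsonSchema;
}
```

For `{ type: 'object', properties: { city: { type: 'string' } } }`, the `city` node becomes an `anyOf` of the original string schema and the ref schema, while the root stays an object schema.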

Two-schema model: The original (pre-widened) schema is preserved as baseInputSchema on the tool definition alongside the widened inputSchema. After ArtifactParser resolves sentinel refs, tool-wrapper validates resolved args against baseInputSchema — the strict constraint that refs bypassed at call time.

Observability via onStepFinish: SDK-level schema validation failures never reach execute(), so tool-wrapper can't catch them. The only hook is onStepFinish in generate.ts, where tool-error content parts surface these failures. Each one now gets an OTEL span, session events (for the SSE feed), and a conversation history message — same visibility as a normal execution-time failure.

Image encoding field removed: The encoding: 'base64' field on ImageInput was always the literal string 'base64' — it carried no information. Removed from the type, schema, and server docs.

Manual QA

Manually tested with the Zendesk MCP + image agent.

…bservability

- Add ref-aware JSON schema transformer (ref-aware-schema.ts) that wraps
  every value-position node with anyOf[original, sentinel-ref] so the AI SDK
  accepts $tool/$artifact/$path sentinel refs without rejecting tool calls at
  parse time
- Preserve the original schema as baseInputSchema on each tool definition;
  validate resolved args against it post-resolution in tool-wrapper to catch
  invalid resolved values
- Add onStepFinish callback in generate.ts that catches tool-error content
  parts (AI SDK schema-validation failures) and emits OTEL spans, session
  events, and conversation history messages for full observability
- Remove redundant encoding: 'base64' field from ImageInput schema

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@changeset-bot

changeset-bot bot commented Mar 21, 2026

⚠️ No Changeset found

Latest commit: cf35c70

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@vercel

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
agents-api | Ready | Preview, Comment | Mar 21, 2026 3:04am
agents-docs | Ready | Preview, Comment | Mar 21, 2026 3:04am
agents-manage-ui | Ready | Preview, Comment | Mar 21, 2026 3:04am


@mike-inkeep mike-inkeep marked this pull request as ready for review March 21, 2026 03:06
@pullfrog
Contributor

pullfrog bot commented Mar 21, 2026

TL;DR — Tool-chaining via sentinel refs ({ $tool, $path }) was silently rejected by the AI SDK's schema validation before reaching execution, and resolved args were validated against the wrong (widened) schema. This PR introduces ref-aware schema widening so chained tool calls pass validation at call time, preserves the original schema for post-resolution validation, and adds full observability (OTEL spans, session events, conversation history) for tool calls that the SDK rejects before execute() is ever called.

Key changes

  • Ref-aware JSON Schema transformer — Recursively widens every value-position node in a tool's input schema with anyOf: [<original>, <sentinel ref>] so the AI SDK accepts $tool/$path refs at parse time.
  • baseInputSchema preservation and post-resolution validation — Retains the original strict schema alongside the widened one and validates resolved args against it, catching invalid values that slip through after ref substitution.
  • Invalid tool call observability — Hooks into onStepFinish to detect tool-error content parts the SDK emits for schema-validation failures and surfaces them via OTEL spans, SSE session events, and persisted conversation messages.
  • Image schema simplification — Removes the redundant encoding: 'base64' field from the MCP media image type and schema.

Summary | 18 files | 1 commit | base: feature/dev-tools-server ← feature/tool-chain-ref-aware-validation


Ref-aware JSON Schema transformer for tool-chain sentinel refs

Before: Tool input schemas only accepted concrete values; sentinel refs like { $tool: "call_123", $path: "result.city" } failed AI SDK schema validation and the tool call was silently dropped.
After: A recursive transformer wraps every value-position node with anyOf: [<original>, <sentinel ref schema>], allowing the SDK to accept both real values and chaining refs.

The new makeRefAwareJsonSchema function in ref-aware-schema.ts walks the JSON Schema tree and applies withToolCallArgRef at value positions (properties, items, additionalProperties, patternProperties) but not at structural combinators (anyOf/oneOf/allOf) to avoid double-wrapping. The sentinel ref schema accepts objects containing $tool (required) plus optional $artifact and $path.

Both MCP tools and function tools now use this transformer:

Tool type | Where schema is built | Entry point
MCP (overridden) | AgentMcpManager.buildOverriddenTool | makeRefAwareJsonSchema on the raw/override JSON
MCP (standard) | getMcpTools | buildRefAwareInputSchema (Zod → JSON → transform → Zod)
Function tools | getFunctionTools | makeRefAwareJsonSchema on functionData.inputSchema

ref-aware-schema.ts · mcp-tools.ts · function-tools.ts · AgentMcpManager.ts


baseInputSchema preservation and post-resolution validation

Before: After ArtifactParser.resolveArgs() swapped sentinel refs for real values, resolved args were validated against the ref-aware (widened) schema — which would accept almost anything.
After: The original strict schema is preserved as baseInputSchema on each tool definition, and wrapToolWithStreaming validates resolved args against it.

A new optional baseInputSchema field is added to AiSdkToolDefinition. The tool wrapper's resolution path now prefers baseInputSchema over parameters for post-resolution validation, so only structurally valid concrete values reach execute().

How does the two-schema approach work? Each tool carries two schemas: `parameters` (the ref-aware widened schema the AI SDK uses at call time) and `baseInputSchema` (the original strict schema). At execution time in `wrapToolWithStreaming`, if the args changed during resolution (i.e. contained refs), the resolved args are validated against `baseInputSchema`. If validation fails, execution is blocked with a descriptive error.
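The execution-time flow described above can be sketched as follows. This is a simplified, synchronous illustration (the real resolver appears to be async), and `ChainableTool`, `SafeParser`, and `executeWithResolvedArgs` are hypothetical names rather than the PR's wrapToolWithStreaming.

```typescript
// Hypothetical minimal interfaces; the real tool-wrapper types are not shown in this thread.
type ParseResult = { success: true } | { success: false; error: string };

interface SafeParser {
  safeParse(value: unknown): ParseResult;
}

interface ChainableTool {
  baseInputSchema?: SafeParser; // original strict schema, if it could be built
  execute(args: unknown): unknown;
}

function executeWithResolvedArgs(
  tool: ChainableTool,
  rawArgs: unknown,
  resolveArgs: (args: unknown) => unknown
): unknown {
  const resolvedArgs = resolveArgs(rawArgs);
  // Same change check that the review thread quotes from tool-wrapper.ts.
  const resolvedChanged = JSON.stringify(rawArgs) !== JSON.stringify(resolvedArgs);
  if (resolvedChanged && tool.baseInputSchema) {
    const validation = tool.baseInputSchema.safeParse(resolvedArgs);
    if (!validation.success) {
      // Block execution: the resolved value violates the strict schema.
      throw new Error(`Resolved tool args failed validation: ${validation.error}`);
    }
  }
  return tool.execute(resolvedArgs);
}
```

If `baseInputSchema` is undefined, this sketch executes without the strict check, which mirrors the fallback behavior the reviewers ask about elsewhere in the thread.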

tool-wrapper.ts · agent-types.ts · toolWrapperRefAwareValidation.test.ts


Invalid tool call observability via onStepFinish

Before: When the AI SDK rejected a tool call due to schema validation failure, the error was invisible — no OTEL span, no session event, no conversation history entry.
After: A new onStepFinish callback in buildBaseGenerationConfig inspects each step's content for tool-error parts and emits full observability signals.

The handleInvalidToolResultsFromStep function iterates over step.content, filters for tool-error parts with string errors (AI SDK converts schema validation errors to strings via getErrorMessage()), and for each one:

  1. Emits an OTEL span with SPAN_NAMES.AI_TOOL_CALL including tool name, args, and error status
  2. Records tool_call and tool_result session events for the SSE streaming feed
  3. Persists an internal tool-result message to conversation history via createMessage
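The filtering step that feeds those three signals might look like the sketch below. The `ContentPart` union is a simplified stand-in for the AI SDK's content-part types, and `collectInvalidToolCalls` is a hypothetical helper, not the PR's handleInvalidToolResultsFromStep.

```typescript
// Hypothetical, simplified content-part shapes; the AI SDK's real types are richer.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'tool-error'; toolCallId: string; toolName: string; input: unknown; error: unknown };

type InvalidToolCall = {
  toolCallId: string;
  toolName: string;
  input: unknown;
  error: string;
};

function collectInvalidToolCalls(content: ContentPart[]): InvalidToolCall[] {
  const invalid: InvalidToolCall[] = [];
  for (const part of content) {
    if (part.type !== 'tool-error') continue;
    // Per the thread, schema-validation failures arrive as strings, while
    // execute() errors stay Error objects and are reported elsewhere.
    if (typeof part.error !== 'string') continue;
    invalid.push({
      toolCallId: part.toolCallId,
      toolName: part.toolName,
      input: part.input,
      error: part.error,
    });
  }
  return invalid;
}
```

Each returned entry would then receive one span, the two session events, and one persisted message.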

generate.ts · tool-result-for-conversation-history.ts


Image schema simplification: remove redundant encoding field

Before: ImageInput type and imageInputSchema required an encoding: 'base64' literal field, adding schema surface with zero information since it was always 'base64'.
After: The encoding field is removed; encoding is implicit.

This simplifies the MCP media tools' input contract and removes encoding from NormalizedContentItem, unwrapToolResult, and related test assertions.

image.ts · artifact-utils.ts · server.ts

Contributor

@pullfrog pullfrog bot left a comment


Well-structured PR that solves two real silent-failure modes. The two-schema approach (widened for SDK acceptance, base for post-resolution validation) is sound, the recursive transformer is clean, and the onStepFinish observability hook fills a genuine gap. Three items worth addressing below — one medium-severity correctness issue, one minor consistency point, and one edge-case question.



function withToolCallArgRef(schemaNode: unknown): Record<string, unknown> {
return {
anyOf: [schemaNode, TOOL_CALL_ARG_REF_SCHEMA],

TOOL_CALL_ARG_REF_SCHEMA is itself an anyOf with two variants. withToolCallArgRef wraps it in another anyOf: { anyOf: [schemaNode, { anyOf: [...] }] }. This nested anyOf is semantically correct but may confuse some JSON Schema validators or produce harder-to-read error messages. Consider flattening:

function withToolCallArgRef(schemaNode: unknown): Record<string, unknown> {
  return {
    anyOf: [schemaNode, ...TOOL_CALL_ARG_REF_VARIANTS],
  };
}

where TOOL_CALL_ARG_REF_VARIANTS is the inner array. Not a blocker — just a readability/diagnostics improvement.

import { parseEmbeddedJson, unwrapError } from '@inkeep/agents-core';
import { SpanStatusCode, trace } from '@opentelemetry/api';
import { type ToolSet, tool } from 'ai';
import { z } from 'zod';

This file imports z from 'zod' while the other tool files (function-tools.ts, ref-aware-schema.ts, default-tools.ts) import from '@hono/zod-openapi'. They resolve to the same z at runtime, but z.fromJSONSchema and z.toJSONSchema are Zod 4 methods that must come from the same Zod instance to guarantee type compatibility. This works today because hono's zod-openapi re-exports the same singleton, but it's fragile if the dependency graph ever diverges. Consider aligning to @hono/zod-openapi for consistency.

if (functionData.inputSchema) {
try {
baseInputSchema = makeBaseInputSchema(functionData.inputSchema);
} catch {

The empty catch {} silently swallows the error for makeBaseInputSchema. The AgentMcpManager equivalent (line ~391) logs a warning with tool name and error details. This should do the same for debuggability — a silent catch here means you'll never know if baseInputSchema is consistently failing for a specific tool's schema.

Suggested change
- } catch {
+ } catch (schemaError) {
+   logger.warn(
+     {
+       functionToolName: functionToolDef.name,
+       schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError),
+     },
+     'Failed to build baseInputSchema; skipping resolved-args validation for this tool'
+   );
+ }

// AI SDK v6 converts schema-validation errors to strings via getErrorMessage()
// before storing them in step.content. execute() errors remain Error objects and
// are already handled (OTEL span + session events) by executeToolCall / tool-wrapper.
if (typeof toolError.error !== 'string') continue;

The typeof toolError.error !== 'string' guard is a smart way to distinguish SDK-level schema failures (stringified) from execute() errors (Error objects). But this is implicitly coupling to AI SDK v6 internal behavior (getErrorMessage() stringification). If a future SDK version changes this — e.g., wraps schema errors in a typed object — this filter would silently start skipping them. Consider adding a brief comment noting this SDK version dependency, or alternatively checking for the absence of an Error prototype rather than presence of string type.
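The two guards discussed in this comment can be compared side by side. Both functions below are hypothetical illustrations; neither name comes from the PR.

```typescript
// Guard in the style of the current code: only string errors count as SDK-level failures.
function isStringError(error: unknown): error is string {
  return typeof error === 'string';
}

// Alternative suggested in the review: anything that is not an Error instance.
function isNotErrorInstance(error: unknown): boolean {
  return !(error instanceof Error);
}

// A hypothetical future SDK shape that wraps schema errors in a typed object.
const wrappedSchemaError = { kind: 'schema-validation', message: 'Invalid input' };

const caughtByStringGuard = isStringError(wrappedSchemaError); // false
const caughtByInstanceGuard = isNotErrorInstance(wrappedSchemaError); // true
```

The trade-off is that the instance-based guard would also match any non-Error value thrown from execute(), so it is broader rather than strictly safer.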

visibility: 'internal',
messageType: 'tool-result',
fromSubAgentId: ctx.config.id,
metadata: {

The JSON.stringify(parsedArgsForResolution) comparison at tool-wrapper level (line 161-162 of tool-wrapper.ts) is duplicated from the old code, but worth noting: JSON.stringify is not a reliable deep-equality check when property ordering differs between the input and resolved output. If resolveArgs returns the same values but in a different key order, this comparison would report resolvedChanged = true and trigger unnecessary validation. For correctness this is fine (extra validation is harmless), but it could produce confusing OTEL noise if validation fails on what's effectively a no-op resolution. This is pre-existing, not introduced by this PR — just flagging it.
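The key-order behavior is easy to demonstrate in isolation. In the sketch below, `stableStringify` is a hypothetical helper (not part of the PR) that sorts object keys at every level; it assumes JSON-compatible values only.

```typescript
const original = { city: 'Paris', units: 'metric' };
const reordered = { units: 'metric', city: 'Paris' };

// Same data, different insertion order: the serialized strings differ,
// so a naive comparison reports a change.
const naiveChanged = JSON.stringify(original) !== JSON.stringify(reordered);

// Order-insensitive serialization: sort object keys at every level.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (typeof value === 'object' && value !== null) {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

const stableChanged = stableStringify(original) !== stableStringify(reordered);
```

Here `naiveChanged` is true and `stableChanged` is false, which matches the comment's point that reordering alone triggers an extra validation pass.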

baseInputSchema: ReturnType<typeof z.fromJSONSchema> | undefined;
} {
try {
const rawJson = z.toJSONSchema(inputSchema as z.ZodType) as Record<string, unknown>;

z.toJSONSchema(inputSchema as z.ZodType) will throw if inputSchema is already a raw JSON Schema object (not a Zod type). The catch block handles this gracefully by falling back to the original schema. However, in that fallback path baseInputSchema is undefined, meaning post-resolution validation is silently skipped for any MCP tool whose inputSchema isn't a Zod type. Is that the intended behavior? If so, it's fine — just want to confirm the fallback is deliberate rather than a gap.

@pullfrog
Contributor

pullfrog bot commented Mar 21, 2026

TL;DR — Tool-chaining sentinel refs ({ $tool, $path }) were being rejected by the AI SDK's schema validation before execution, silently dropping chained tool calls. This PR introduces a recursive JSON Schema transformer that widens tool input schemas to accept sentinel refs at call time, preserves the original strict schema for post-resolution validation, and adds full observability (OTEL, session events, DB messages) for tool calls the SDK rejects.

Key changes

  • Add ref-aware-schema.ts — recursive JSON Schema transformer — Wraps every value-position node with anyOf: [<original>, <sentinel ref>] so the AI SDK accepts $tool/$path refs without rejecting at parse time.
  • Preserve baseInputSchema on tool definitions — Retains the original strict schema alongside the widened one; after ArtifactParser resolves refs, resolved args are validated against baseInputSchema in tool-wrapper.ts.
  • Add handleInvalidToolResultsFromStep for invalid tool call observability — An onStepFinish callback in generate.ts catches tool-error content parts and emits OTEL spans, session events, and conversation history messages for SDK-level rejections.
  • Remove redundant encoding: 'base64' from image schemas — The field was always a literal 'base64' and added schema surface with zero information.

Summary | 18 files | 1 commit | base: feature/dev-tools-server ← feature/tool-chain-ref-aware-validation


Ref-aware schema widening for tool-chain sentinel refs

Before: Sentinel ref objects like { $tool: "call_id", $path: "result.field" } failed the AI SDK's input schema validation, silently preventing chained tool calls from executing.
After: A recursive transformer wraps each value-position JSON Schema node with anyOf: [<original>, <ref schema>], so the SDK always accepts valid sentinel refs.

makeRefAwareJsonSchema walks the JSON Schema tree and applies withToolCallArgRef at value positions (properties, array items, additionalProperties, and patternProperties) while leaving structural combinators (anyOf, oneOf, allOf) unwrapped to prevent double-wrapping. Both the function-tool path (raw JSON Schema → z.fromJSONSchema) and the MCP-tool path (Zod → z.toJSONSchema → transform → z.fromJSONSchema) converge on the same transformer.

How does the MCP-tool path differ from function tools? MCP tools start with a Zod schema from the MCP client. buildRefAwareInputSchema in mcp-tools.ts round-trips through z.toJSONSchema, applies the ref-aware transform, then converts back with z.fromJSONSchema. Function tools start with raw JSON Schema from the tool config and call makeRefAwareJsonSchema directly.

ref-aware-schema.ts · mcp-tools.ts · function-tools.ts


baseInputSchema preservation for post-resolution validation

Before: Resolved args were validated against the widened ref-accepting schema, so invalid resolved values could slip through to execution.
After: The original strict schema is retained as baseInputSchema on every tool definition. After ArtifactParser.resolveArgs() swaps refs for real values, tool-wrapper.ts validates against baseInputSchema and throws if the resolved values don't match.

The AiSdkToolDefinition type gains an optional baseInputSchema field. Each tool construction site — AgentMcpManager.buildOverriddenTool, getMcpTools, and getFunctionTools — attaches it via object spread on the tool() result.

tool-wrapper.ts · agent-types.ts · AgentMcpManager.ts


Invalid tool call observability via onStepFinish

Before: When the AI SDK rejected a tool call due to schema validation, no trace appeared in OTEL, session events, or conversation history — the failure was invisible.
After: handleInvalidToolResultsFromStep inspects each step's content array for tool-error parts with string errors (SDK validation failures) and emits an OTEL span, two session events (tool_call + tool_result), and a persisted conversation message.

The handler specifically filters for typeof error === 'string' to distinguish SDK validation errors from execute() errors (which are Error objects and already have their own observability path). The conversation message uses a new formatInvalidToolCallForHistory helper.

generate.ts · tool-result-for-conversation-history.ts


Image schema simplification — drop encoding: 'base64'

Before: ImageInput carried an encoding: z.literal('base64') field, adding a required schema property that was always the same value.
After: The field is removed from the type, Zod schema, MCP server docs, and all test fixtures.

image.ts · artifact-utils.ts

Contributor

@claude claude bot left a comment


PR Review Summary

(4) Total Issues | Risk: Medium

🟠⚠️ Major (2) 🟠⚠️

🟠 1) mcp-tools.ts:52-78 Non-session MCP tools bypass ref-aware schema widening

Issue: The !sessionId early-return path does not call buildRefAwareInputSchema, meaning tools returned via this code path use the original inputSchema. Sentinel refs like { $tool: 'call_id' } will fail schema validation at AI SDK call time — exactly the silent failure mode this PR aims to fix.

Why: This creates inconsistent behavior: the same tool accepts refs when sessionId is present but rejects them when absent. This violates the two-schema model invariant established by this PR and reintroduces the original bug for a subset of tool invocations. The SPEC.md mentions A2A proxies as explicitly excluded, but the non-session MCP path is not documented as an intended exclusion.

Fix: Apply buildRefAwareInputSchema in the non-session branch, mirroring the pattern at lines 100-102:

if (!sessionId) {
  const wrappedTools: ToolSet = {};
  for (const toolSet of toolSets) {
    for (const [toolName, toolDef] of Object.entries(toolSet.tools)) {
      const needsApproval = toolSet.toolPolicies?.[toolName]?.needsApproval || false;
      
      // Add ref-aware schema transformation
      const { refAwareInputSchema, baseInputSchema } = buildRefAwareInputSchema(
        (toolDef as any).inputSchema
      );
      
      const enhancedTool = {
        ...(toolDef || {}),
        inputSchema: refAwareInputSchema,
        baseInputSchema,
        needsApproval,
      };
      // ... rest unchanged
    }
  }
}


🟠 2) generate.ts handleInvalidToolResultsFromStep has no direct unit tests

Issue: The handleInvalidToolResultsFromStep function (lines 84-188) is the core observability feature of this PR — it emits OTEL spans, session events, and persists to conversation history for SDK-level schema failures. However, there are no unit tests covering this function directly.

Why: This is critical new functionality that should have test coverage for: (1) processing tool-error parts with string errors, (2) skipping Error objects (handled elsewhere), (3) handling missing streamRequestId or conversationId gracefully, (4) catching and logging DB persistence failures.

Fix: Add unit tests for handleInvalidToolResultsFromStep:

describe('handleInvalidToolResultsFromStep', () => {
  it('emits OTEL span, session events, and DB message for string tool-error', async () => { ... });
  it('skips Error object errors (already handled by tool-wrapper)', async () => { ... });
  it('handles missing streamRequestId gracefully', async () => { ... });
  it('handles missing conversationId gracefully', async () => { ... });
  it('logs warning on DB persistence failure without throwing', async () => { ... });
});


Inline Comments:

  • 🟠 Major: mcp-tools.ts:62 Non-session MCP tools bypass ref-aware schema widening
  • 🟠 Major: mcp-tools.ts:28 Silent catch swallows schema conversion errors with no visibility

🟡 Minor (2) 🟡

🟡 1) refAwareSchema.test.ts Missing test coverage for nested arrays and JSON Schema combinators

Issue: The makeRefAwareJsonSchema function handles array items (lines 63-68) and explicitly traverses anyOf/oneOf/allOf combinators (lines 75-80) without double-wrapping. However, the test file only covers simple object properties.

Why: Tool schemas in practice often have arrays of objects or union types. Without tests for these patterns, regressions in array item handling or combinator traversal could go undetected.

Fix: Add test cases for array items and combinators:

it('handles array items with ref support', () => {
  const rawJson = {
    type: 'object',
    properties: {
      images: { type: 'array', items: { type: 'object', properties: { data: { type: 'string' } } } }
    }
  };
  const schema = z.fromJSONSchema(makeRefAwareJsonSchema(rawJson));
  expect(schema.safeParse({ images: [{ $tool: 'toolu_123' }] }).success).toBe(true);
});

it('preserves anyOf/oneOf combinator structure without double-wrapping', () => {
  const rawJson = {
    type: 'object',
    properties: { value: { anyOf: [{ type: 'string' }, { type: 'number' }] } }
  };
  const schema = z.fromJSONSchema(makeRefAwareJsonSchema(rawJson));
  expect(schema.safeParse({ value: { $tool: 'toolu_123' } }).success).toBe(true);
});


🟡 2) toolWrapperRefAwareValidation.test.ts Missing test for validation skipped when args unchanged

Issue: The tool-wrapper only validates when resolvedChanged is true (lines 161-162). Current tests always mock resolveArgs to return different values. No test verifies validation is correctly skipped when args are unchanged.

Why: If validation were incorrectly applied even when args are unchanged, it could cause false positives. The inverse case (validation incorrectly skipped) is also worth testing.

Fix: Add test verifying validation is skipped for unchanged args:

it('skips validation when resolved args match original args', async () => {
  const originalArgs = { city: 'San Francisco' };
  vi.mocked(agentSessionManager.getArtifactParser).mockReturnValue({
    resolveArgs: vi.fn().mockResolvedValue(originalArgs),
  } as any);
  const baseInputSchema = { safeParse: vi.fn() };
  const executeSpy = vi.fn().mockResolvedValue({ ok: true });
  // ... wrap and execute ...
  expect(baseInputSchema.safeParse).not.toHaveBeenCalled();
});


Inline Comments:

  • 🟡 Minor: function-tools.ts:84-86 Silent catch provides no visibility
  • 🟡 Minor: tool-wrapper.ts:161-162 JSON.stringify comparison may have false negatives

💭 Consider (1) 💭

💭 1) mcp-tools.ts:19 buildRefAwareInputSchema exported from mcp-tools.ts rather than ref-aware-schema.ts

Issue: The PR introduces ref-aware-schema.ts as the canonical module for ref-aware utilities (makeRefAwareJsonSchema, makeBaseInputSchema), but buildRefAwareInputSchema is exported from mcp-tools.ts. Tests import from both files.

Why: This creates a split export surface. The peer pattern for schema utilities (packages/agents-core/src/utils/schema-conversion.ts) consolidates related functions in a single module.

Fix: Consider moving buildRefAwareInputSchema to ref-aware-schema.ts alongside the other ref-aware helpers for discoverability.

Inline Comments:

  • 💭 Consider: ref-aware-schema.ts:103-107 makeBaseInputSchema is a trivial wrapper

💡 APPROVE WITH SUGGESTIONS

Summary: This PR is a well-designed solution to a real problem (silent schema rejection of sentinel refs), with solid observability additions via onStepFinish. The main issue to address is that the non-session MCP tools path (lines 52-78) bypasses ref-aware schema widening; this should either be documented as intentional or fixed to maintain consistency. The new handleInvalidToolResultsFromStep function should also have unit test coverage, given that it is the core observability feature.

Discarded (7)
Location | Issue | Reason Discarded
agent-types.ts:218 | baseInputSchema type uses inline safeParse signature | Type is correct for usage; using Zod's actual types would add import complexity
ref-aware-schema.ts:34 | isObjectRecord not exported | Intentionally module-private helper; no evidence of duplication
ref-aware-schema.ts:38 | Schema transformation does not handle $ref | JSON Schema $ref resolution is complex; MCP/function tool schemas don't typically use $ref
generate.ts:107 | Tool args logged to OTEL may contain sensitive data | Pre-existing pattern in tool-wrapper; redaction is an exporter-level concern
generate.ts:145 | DB write lacks transaction boundary | Partial persistence is acceptable; warn logging provides visibility
generate.ts:71 | ToolErrorPart type extraction assumes SDK structure | Acceptable coupling with SDK; runtime type guard provides defense
AgentMcpManager.ts:388 | baseInputSchema failure log missing schema details | Log is already informative; adding full schema could bloat logs
Reviewers (7)
Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded
pr-review-architecture | 4 | 1 | 0 | 0 | 1 | 0 | 2
pr-review-tests | 6 | 2 | 0 | 0 | 0 | 0 | 4
pr-review-errors | 4 | 1 | 0 | 0 | 2 | 0 | 1
pr-review-types | 5 | 0 | 0 | 0 | 0 | 0 | 5
pr-review-consistency | 5 | 0 | 1 | 0 | 1 | 0 | 3
pr-review-llm | 4 | 0 | 0 | 0 | 1 | 0 | 3
pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0
Total | 28 | 4 | 1 | 0 | 5 | 0 | 18

Note: pr-review-standards returned no findings, indicating core code quality is solid.

const baseInputSchema = makeBaseInputSchema(rawJson);
const refAwareInputSchema = z.fromJSONSchema(makeRefAwareJsonSchema(rawJson));
return { refAwareInputSchema, baseInputSchema };
} catch {

🟠 MAJOR: Silent catch swallows schema conversion errors with no visibility

Issue: When z.toJSONSchema() or subsequent transformation fails, the catch block silently returns the original inputSchema and sets baseInputSchema to undefined. This disables post-resolution validation with zero logging.

Why: Operators have no way to know that resolved-args validation is disabled for specific tools, making debugging difficult when invalid resolved values slip through to execution.

Fix: Add logging to surface the failure:

Suggested change
- } catch {
+ } catch (err) {
+   logger.warn(
+     { inputSchema: typeof inputSchema, error: err instanceof Error ? err.message : String(err) },
+     'Failed to build ref-aware schema; falling back to original schema without post-resolution validation'
+   );
+   return {
+     refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>,
+     baseInputSchema: undefined,
+   };
+ }


Comment on lines +84 to +86
} catch {
// baseInputSchema stays undefined
}

🟡 Minor: Silent catch provides no visibility into baseInputSchema failures

Issue: When makeBaseInputSchema throws, the error is silently caught with only a comment. Resolved-args validation is disabled for this tool with no logging.

Why: Debugging production issues becomes difficult when operators don't know why certain function tools skip post-resolution validation.

Fix:

Suggested change
- } catch {
-   // baseInputSchema stays undefined
- }
+ } catch (schemaError) {
+   logger.debug(
+     { functionToolName: functionToolDef.name, schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError) },
+     'Failed to build baseInputSchema; resolved-args validation disabled for this tool'
+   );
+ }


Comment on lines +161 to +162
const resolvedChanged =
  JSON.stringify(parsedArgsForResolution) !== JSON.stringify(resolvedArgs);

🟡 Minor: JSON.stringify comparison may have false negatives for key ordering

Issue: JSON.stringify doesn't guarantee consistent key ordering across different object constructions. If resolution produces an equivalent but differently-ordered object, validation may be incorrectly skipped.

Why: While rare, this could allow invalid resolved values to bypass validation when the only difference is object key order.

Fix: Consider using deep equality or always validating when artifactParser exists:

// Option 1: Always validate when artifactParser exists (simplest, safest)
if (artifactParser && validationSchema?.safeParse) {
  const validation = validationSchema.safeParse(resolvedArgs);
  // ...
}

// Option 2: Use lodash isEqual for accurate comparison
import isEqual from 'lodash/isEqual';
const resolvedChanged = !isEqual(parsedArgsForResolution, resolvedArgs);

The validation cost is minimal compared to the false-negative risk.
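
To make the trade-off concrete, the sketch below shows what a JSON.stringify comparison reports for a reordered object and for a field set to undefined, next to a structural comparison. `deepEqual` is a hypothetical stand-in for lodash `isEqual`, restricted to JSON-like data (primitives, arrays, plain objects); the sample objects are invented for illustration.

```typescript
// Hypothetical stand-in for lodash isEqual, restricted to JSON-like values.
// Not a general-purpose deep equality (no Map, Set, Date, or cycle handling).
function deepEqual(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aRec = a as Record<string, unknown>;
  const bRec = b as Record<string, unknown>;
  const aKeys = Object.keys(aRec);
  const bKeys = Object.keys(bRec);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (key) => Object.prototype.hasOwnProperty.call(bRec, key) && deepEqual(aRec[key], bRec[key])
  );
}

// The comparison style used in the reviewed lines.
const stringifyChanged = (before: unknown, after: unknown): boolean =>
  JSON.stringify(before) !== JSON.stringify(after);

// Case 1: same content, different key insertion order.
const original = { city: 'Paris', units: 'metric' };
const reordered = { units: 'metric', city: 'Paris' };
const reorderReportedChanged = stringifyChanged(original, reordered); // serialized strings differ
const reorderActuallyChanged = !deepEqual(original, reordered); // content is the same

// Case 2: a property set to undefined is omitted by JSON.stringify.
const withUndefined = { city: 'Paris', limit: undefined };
const withoutField = { city: 'Paris' };
const undefinedReportedChanged = stringifyChanged(withUndefined, withoutField); // serialized strings match
const undefinedActuallyChanged = !deepEqual(withUndefined, withoutField); // key sets differ
```

Option 1 above sidesteps both cases, since validation no longer depends on any change check.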

Comment on lines +103 to +107
export function makeBaseInputSchema(
schema: Record<string, unknown>
): ReturnType<typeof z.fromJSONSchema> {
return z.fromJSONSchema(schema);
}

💭 Consider: makeBaseInputSchema is a trivial wrapper

Issue: This function is a single-line wrapper around z.fromJSONSchema(schema) with no additional behavior.

Why: The indirection adds cognitive overhead without providing abstraction value. The function exists alongside makeRefAwareJsonSchema which does meaningful transformation, creating asymmetry.

Fix: Either inline z.fromJSONSchema calls at call sites, or add a comment explaining the wrapper's purpose (e.g., future error handling, consistency with makeRefAwareJsonSchema API).

github-actions bot deleted a comment from claude bot on Mar 21, 2026

itoqa bot commented Mar 21, 2026

Ito Test Report ❌

14 test cases ran. 2 failed, 12 passed.

Overall, 12 of 14 test cases passed, confirming stable behavior across SSE streaming chat completions, sentinel-ref handling (including adversarial isolation), tool-error persistence and conversation read APIs, MCP media image contract chaining, MCP auth boundary enforcement, and UI robustness for rapid submits and mobile tool-error flows. Both failures are medium-severity, pre-existing defects. First, non-streaming /run/api/chat can return HTTP 200 without the required conversationId and with empty assistant content. Second, refresh or back/forward navigation in the playground during streaming can orphan conversation state and clear visible history, because the client-generated playground conversationId is reset on unmount and not persisted.

❌ Failed (2)

| Category | Summary | Screenshot |
| --- | --- | --- |
| Edge | 🟠 Refresh plus back/forward navigation causes playground conversation continuity loss, and the chat history appears empty after reopening Try it. | EDGE-5 |
| Happy-path | 🟠 Non-streaming /run/api/chat response omits conversationId and may return empty assistant content in successful responses. | ROUTE-1 |

🟠 Refresh and back-forward during streaming preserves conversation continuity
  • What failed: Previously visible conversation content disappears after the navigation sequence and the history panel shows no prior conversation, but expected behavior is to preserve the same conversation context.
  • Impact: Users can lose active conversation continuity when they refresh or use browser navigation during streaming. This makes debugging and iterative prompting unreliable in normal UI navigation flows.
  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)
  • Steps to reproduce:
    1. Open an agent's playground and click Try it.
    2. Send a prompt that produces a streaming response.
    3. Refresh the page during streaming, then navigate back and forward in the browser.
    4. Reopen Try it and open Chat history; observe the prior conversation is missing.
  • Code analysis: I inspected the playground conversation lifecycle in agents-manage-ui and found the conversation ID is generated client-side and reset on playground unmount, then only that ephemeral ID is passed into the embedded chat. The persisted Zustand slice also excludes playgroundConversationId, so refresh/navigation can orphan the prior conversation from the UI. The run's recorded temporary code edits do not touch these playground files, so this is not explained by test-only patching.
  • Why this is likely a bug: The UI intentionally rotates a non-persisted conversation ID on unmount/refresh, which directly breaks the continuity guarantee expected for refresh/back-forward during an in-progress conversation.

Relevant code:

agents-manage-ui/src/components/agent/playground/playground.tsx (lines 42-51)

const { resetPlaygroundConversationId } = useAgentActions();
const conversationId = useAgentStore(({ playgroundConversationId }) => playgroundConversationId);

useEffect(() => {
  // when the playground is closed the chat widget is unmounted so we need to reset the conversation id
  return () => resetPlaygroundConversationId();
}, []);

agents-manage-ui/src/features/agent/state/use-agent-store.ts (lines 121-152)

playgroundConversationId: generateId(),

reset() {
  const { isSidebarSessionOpen: _, ...state } = initialAgentState;
  set({ ...state, playgroundConversationId: generateId() });
},

agents-manage-ui/src/features/agent/state/use-agent-store.ts (lines 328-335)

persist(agentState, {
  name: 'inkeep:agent',
  partialize(state) {
    return {
      jsonSchemaMode: state.jsonSchemaMode,
      isSidebarPinnedOpen: state.isSidebarPinnedOpen,
      hasTextWrap: state.hasTextWrap,
    };
  },
})
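
The effect described in the code analysis can be reproduced with a small simulation. This is a sketch under stated assumptions, not the store's real code: the field types, `createInitialState`, `reload`, and the counter-based `generateId` are hypothetical, and `reload` only approximates rehydration as "fresh initial state overlaid with the persisted slice". `persistingPartialize` is a hypothetical variant shown for contrast, not something this PR or the report proposes.

```typescript
// Subset of the store state named in the excerpts above; field types are assumed.
interface AgentState {
  jsonSchemaMode: boolean;
  isSidebarPinnedOpen: boolean;
  hasTextWrap: boolean;
  playgroundConversationId: string;
}

// Stand-in ID generator so each call yields a distinct, predictable value.
let counter = 0;
const generateId = (): string => `conv_${++counter}`;

const createInitialState = (): AgentState => ({
  jsonSchemaMode: false,
  isSidebarPinnedOpen: false,
  hasTextWrap: true,
  playgroundConversationId: generateId(),
});

type Partialize = (state: AgentState) => Partial<AgentState>;

// Mirrors the partialize shown above: the conversation ID is not persisted.
const currentPartialize: Partialize = (state) => ({
  jsonSchemaMode: state.jsonSchemaMode,
  isSidebarPinnedOpen: state.isSidebarPinnedOpen,
  hasTextWrap: state.hasTextWrap,
});

// Hypothetical variant that also persists the conversation ID.
const persistingPartialize: Partialize = (state) => ({
  ...currentPartialize(state),
  playgroundConversationId: state.playgroundConversationId,
});

// Simulates a page reload: serialize the persisted slice, then overlay it on fresh state.
function reload(state: AgentState, partialize: Partialize): AgentState {
  const stored = JSON.stringify(partialize(state));
  const persisted = JSON.parse(stored) as Partial<AgentState>;
  return Object.assign(createInitialState(), persisted);
}

const beforeReload = createInitialState();
const afterCurrent = reload(beforeReload, currentPartialize);
const afterPersisting = reload(beforeReload, persistingPartialize);
```

With the current slice the reloaded state carries a newly generated ID, which matches the orphaned-conversation symptom. Persisting the ID would not be sufficient by itself, because the unmount effect in playground.tsx still calls resetPlaygroundConversationId.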

🟠 Vercel stream chat baseline remains functional
  • What failed: The response returns HTTP 200 but does not include conversationId, and choices[0].message.content is populated directly from buffered text with no fallback when that text is empty.
  • Impact: Clients that depend on conversationId cannot reliably continue or fetch the created conversation after a successful non-streaming call. Users can also receive an apparently successful response with empty assistant content, which degrades chat reliability and breaks downstream expectations.
  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)
  • Steps to reproduce:
    1. Send a POST request to /run/api/chat with stream=false and a simple user message.
    2. Inspect the JSON response payload for a top-level conversationId field.
    3. Check choices[0].message.content and observe it can be empty even when the status is 200.
  • Code analysis: I inspected the non-streaming response construction in agents-api/src/domains/run/routes/chatDataStream.ts and verified that conversationId is created and used server-side but never serialized in the JSON response; I also verified assistant content is assigned directly from captured.text without an empty-content guard.
  • Why this is likely a bug: The handler creates a conversation ID but omits it from the non-streaming success payload, and it treats empty buffered text as valid assistant output, which directly explains the observed contract and content failures without relying on test-only behavior.

Relevant code:

agents-api/src/domains/run/routes/chatDataStream.ts (lines 245-246)

const conversationId = body.conversationId ?? getConversationId();
const activeSpan = trace.getActiveSpan();

agents-api/src/domains/run/routes/chatDataStream.ts (lines 410-430)

return c.json({
  id: `chat-${Date.now()}`,
  object: 'chat.completion',
  created: Math.floor(Date.now() / 1000),
  model: agentName,
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: captured.hasError ? captured.errorMessage : captured.text,
      },
      finish_reason: result.success && !captured.hasError ? 'stop' : 'error',
    },
  ],
  usage: {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
  },
});
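
For illustration, the two gaps named in the code analysis (no conversationId in the payload, no guard for empty content) could be closed along the following lines. This is a minimal sketch, not the handler's code: `buildNonStreamingPayload`, `CapturedOutput`, `EMPTY_CONTENT_FALLBACK`, the top-level placement of `conversationId`, and the choice to report empty content as an error are all assumptions. The `usage` block is omitted for brevity.

```typescript
// Fields read from the captured output in the excerpt above.
interface CapturedOutput {
  text: string;
  hasError: boolean;
  errorMessage: string;
}

interface NonStreamingPayload {
  id: string;
  object: 'chat.completion';
  created: number;
  model: string;
  conversationId: string; // hypothetical addition: absent from the current response
  choices: Array<{
    index: number;
    message: { role: 'assistant'; content: string };
    finish_reason: 'stop' | 'error';
  }>;
}

// Hypothetical fallback text; the real wording would be a product decision.
const EMPTY_CONTENT_FALLBACK = 'The agent returned no content.';

function buildNonStreamingPayload(params: {
  conversationId: string;
  agentName: string;
  captured: CapturedOutput;
  success: boolean;
  nowMs: number;
}): NonStreamingPayload {
  const { conversationId, agentName, captured, success, nowMs } = params;
  // Treat whitespace-only text as empty (an assumption about the desired contract).
  const isEmpty = !captured.hasError && captured.text.trim().length === 0;
  const content = captured.hasError
    ? captured.errorMessage
    : isEmpty
      ? EMPTY_CONTENT_FALLBACK
      : captured.text;
  const ok = success && !captured.hasError && !isEmpty;
  return {
    id: `chat-${nowMs}`,
    object: 'chat.completion',
    created: Math.floor(nowMs / 1000),
    model: agentName,
    conversationId,
    choices: [
      { index: 0, message: { role: 'assistant', content }, finish_reason: ok ? 'stop' : 'error' },
    ],
  };
}

// One empty reply and one normal reply, with a fixed timestamp for repeatability.
const emptyReply = buildNonStreamingPayload({
  conversationId: 'conv_123',
  agentName: 'demo-agent',
  captured: { text: '', hasError: false, errorMessage: '' },
  success: true,
  nowMs: 1_700_000_000_000,
});
const normalReply = buildNonStreamingPayload({
  conversationId: 'conv_123',
  agentName: 'demo-agent',
  captured: { text: 'Hello!', hasError: false, errorMessage: '' },
  success: true,
  nowMs: 1_700_000_000_000,
});
```

Passing the timestamp in as `nowMs` keeps the builder a pure function, which makes the response contract easy to assert in a unit test.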

✅ Passed (12)

| Category | Summary | Screenshot |
| --- | --- | --- |
| Adversarial | Sentinel payload with prototype-pollution style keys remained inert during ref resolution. | ADV-1 |
| Adversarial | Cross-conversation sentinel reference did not resolve unrelated tool outputs. | ADV-2 |
| Adversarial | Missing/malformed Authorization and scope mismatch requests returned deterministic 401 auth-error responses with no tool-result payload. | ADV-3 |
| Adversarial | Tenant/project header mismatches against a valid token were blocked with 401 auth-error responses only. | ADV-4 |
| Edge | Rapid Enter burst produced one in-flight submission with one persisted user/error pair and no duplicate or dropped tool-error artifacts. | EDGE-4 |
| Edge | On 390x844 viewport, the composer and controls stayed usable through a tool-error flow and accepted a successful follow-up prompt. | EDGE-6 |
| Happy-path | SSE chat completions stream returned HTTP 200 with data chunks and [DONE] terminator. | ROUTE-2 |
| Happy-path | Sentinel refs are accepted during function tool input-schema validation before strict post-resolution checks. | ROUTE-3 |
| Happy-path | Verified invalid SDK tool-error content is persisted with TOOL_CALL_ID, input, and error text in conversation history. | ROUTE-5 |
| Happy-path | Verified conversation list/detail endpoints remain readable after internal tool-result persistence. | ROUTE-6 |
| Happy-path | image_info accepted {data,mimeType} input with valid MCP auth and returned image metadata without requiring encoding. | ROUTE-7 |
| Happy-path | image_crop output was chained into image_resize in one MCP session and succeeded while preserving the {data,mimeType} contract. | ROUTE-8 |

Commit: cf35c70
