feat(tool-chain): ref-aware schema validation and invalid tool call observability#2795
feat(tool-chain): ref-aware schema validation and invalid tool call observability#2795mike-inkeep wants to merge 1 commit intofeature/dev-tools-serverfrom
Conversation
…bservability - Add ref-aware JSON schema transformer (ref-aware-schema.ts) that wraps every value-position node with anyOf[original, sentinel-ref] so the AI SDK accepts $tool/$artifact/$path sentinel refs without rejecting tool calls at parse time - Preserve the original schema as baseInputSchema on each tool definition; validate resolved args against it post-resolution in tool-wrapper to catch invalid resolved values - Add onStepFinish callback in generate.ts that catches tool-error content parts (AI SDK schema-validation failures) and emits OTEL spans, session events, and conversation history messages for full observability - Remove redundant encoding: 'base64' field from ImageInput schema Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
TL;DR — Tool-chaining via sentinel refs ( Key changes
Summary | 18 files | 1 commit | base: Ref-aware JSON Schema transformer for tool-chain sentinel refs
The new Both MCP tools and function tools now use this transformer:
|
There was a problem hiding this comment.
Well-structured PR that solves two real silent-failure modes. The two-schema approach (widened for SDK acceptance, base for post-resolution validation) is sound, the recursive transformer is clean, and the onStepFinish observability hook fills a genuine gap. Three items worth addressing below — one medium-severity correctness issue, one minor consistency point, and one edge-case question.
|
|
||
| function withToolCallArgRef(schemaNode: unknown): Record<string, unknown> { | ||
| return { | ||
| anyOf: [schemaNode, TOOL_CALL_ARG_REF_SCHEMA], |
There was a problem hiding this comment.
TOOL_CALL_ARG_REF_SCHEMA is itself an anyOf with two variants. withToolCallArgRef wraps it in another anyOf: { anyOf: [schemaNode, { anyOf: [...] }] }. This nested anyOf is semantically correct but may confuse some JSON Schema validators or produce harder-to-read error messages. Consider flattening:
function withToolCallArgRef(schemaNode: unknown): Record<string, unknown> {
return {
anyOf: [schemaNode, ...TOOL_CALL_ARG_REF_VARIANTS],
};
}where TOOL_CALL_ARG_REF_VARIANTS is the inner array. Not a blocker — just a readability/diagnostics improvement.
| import { parseEmbeddedJson, unwrapError } from '@inkeep/agents-core'; | ||
| import { SpanStatusCode, trace } from '@opentelemetry/api'; | ||
| import { type ToolSet, tool } from 'ai'; | ||
| import { z } from 'zod'; |
There was a problem hiding this comment.
This file imports z from 'zod' while the other tool files (function-tools.ts, ref-aware-schema.ts, default-tools.ts) import from '@hono/zod-openapi'. They resolve to the same z at runtime, but z.fromJSONSchema and z.toJSONSchema are Zod 4 methods that must come from the same Zod instance to guarantee type compatibility. This works today because hono's zod-openapi re-exports the same singleton, but it's fragile if the dependency graph ever diverges. Consider aligning to @hono/zod-openapi for consistency.
| if (functionData.inputSchema) { | ||
| try { | ||
| baseInputSchema = makeBaseInputSchema(functionData.inputSchema); | ||
| } catch { |
There was a problem hiding this comment.
The empty catch {} silently swallows the error for makeBaseInputSchema. The AgentMcpManager equivalent (line ~391) logs a warning with tool name and error details. This should do the same for debuggability — a silent catch here means you'll never know if baseInputSchema is consistently failing for a specific tool's schema.
| } catch { | |
| } catch (schemaError) { | |
| logger.warn( | |
| { | |
| functionToolName: functionToolDef.name, | |
| schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError), | |
| }, | |
| 'Failed to build baseInputSchema; skipping resolved-args validation for this tool' | |
| ); | |
| } |
| // AI SDK v6 converts schema-validation errors to strings via getErrorMessage() | ||
| // before storing them in step.content. execute() errors remain Error objects and | ||
| // are already handled (OTEL span + session events) by executeToolCall / tool-wrapper. | ||
| if (typeof toolError.error !== 'string') continue; |
There was a problem hiding this comment.
The typeof toolError.error !== 'string' guard is a smart way to distinguish SDK-level schema failures (stringified) from execute() errors (Error objects). But this is implicitly coupling to AI SDK v6 internal behavior (getErrorMessage() stringification). If a future SDK version changes this — e.g., wraps schema errors in a typed object — this filter would silently start skipping them. Consider adding a brief comment noting this SDK version dependency, or alternatively checking for the absence of an Error prototype rather than presence of string type.
| visibility: 'internal', | ||
| messageType: 'tool-result', | ||
| fromSubAgentId: ctx.config.id, | ||
| metadata: { |
There was a problem hiding this comment.
The JSON.stringify(parsedArgsForResolution) comparison at tool-wrapper level (line 161-162 of tool-wrapper.ts) is duplicated from the old code, but worth noting: JSON.stringify is not a reliable deep-equality check when property ordering differs between the input and resolved output. If resolveArgs returns the same values but in a different key order, this comparison would report resolvedChanged = true and trigger unnecessary validation. For correctness this is fine (extra validation is harmless), but it could produce confusing OTEL noise if validation fails on what's effectively a no-op resolution. This is pre-existing, not introduced by this PR — just flagging it.
| baseInputSchema: ReturnType<typeof z.fromJSONSchema> | undefined; | ||
| } { | ||
| try { | ||
| const rawJson = z.toJSONSchema(inputSchema as z.ZodType) as Record<string, unknown>; |
There was a problem hiding this comment.
z.toJSONSchema(inputSchema as z.ZodType) will throw if inputSchema is already a raw JSON Schema object (not a Zod type). The catch block handles this gracefully by falling back to the original schema. However, in that fallback path baseInputSchema is undefined, meaning post-resolution validation is silently skipped for any MCP tool whose inputSchema isn't a Zod type. Is that the intended behavior? If so, it's fine — just want to confirm the fallback is deliberate rather than a gap.
|
TL;DR — Tool-chaining sentinel refs ( Key changes
Summary | 18 files | 1 commit | base: Ref-aware schema widening for tool-chain sentinel refs
|
There was a problem hiding this comment.
PR Review Summary
(4) Total Issues | Risk: Medium
🟠⚠️ Major (2) 🟠⚠️
🟠 1) mcp-tools.ts:52-78 Non-session MCP tools bypass ref-aware schema widening
Issue: The !sessionId early-return path does not call buildRefAwareInputSchema, meaning tools returned via this code path use the original inputSchema. Sentinel refs like { $tool: 'call_id' } will fail schema validation at AI SDK call time — exactly the silent failure mode this PR aims to fix.
Why: This creates inconsistent behavior: the same tool accepts refs when sessionId is present but rejects them when absent. This violates the two-schema model invariant established by this PR and reintroduces the original bug for a subset of tool invocations. The SPEC.md mentions A2A proxies as explicitly excluded, but the non-session MCP path is not documented as an intended exclusion.
Fix: Apply buildRefAwareInputSchema in the non-session branch, mirroring the pattern at lines 100-102:
if (!sessionId) {
const wrappedTools: ToolSet = {};
for (const toolSet of toolSets) {
for (const [toolName, toolDef] of Object.entries(toolSet.tools)) {
const needsApproval = toolSet.toolPolicies?.[toolName]?.needsApproval || false;
// Add ref-aware schema transformation
const { refAwareInputSchema, baseInputSchema } = buildRefAwareInputSchema(
(toolDef as any).inputSchema
);
const enhancedTool = {
...(toolDef || {}),
inputSchema: refAwareInputSchema,
baseInputSchema,
needsApproval,
};
// ... rest unchanged
}
}
}Refs:
🟠 2) generate.ts handleInvalidToolResultsFromStep has no direct unit tests
Issue: The handleInvalidToolResultsFromStep function (lines 84-188) is the core observability feature of this PR — it emits OTEL spans, session events, and persists to conversation history for SDK-level schema failures. However, there are no unit tests covering this function directly.
Why: This is critical new functionality that should have test coverage for: (1) processing tool-error parts with string errors, (2) skipping Error objects (handled elsewhere), (3) handling missing streamRequestId or conversationId gracefully, (4) catching and logging DB persistence failures.
Fix: Add unit tests for handleInvalidToolResultsFromStep:
describe('handleInvalidToolResultsFromStep', () => {
it('emits OTEL span, session events, and DB message for string tool-error', async () => { ... });
it('skips Error object errors (already handled by tool-wrapper)', async () => { ... });
it('handles missing streamRequestId gracefully', async () => { ... });
it('handles missing conversationId gracefully', async () => { ... });
it('logs warning on DB persistence failure without throwing', async () => { ... });
});Refs:
Inline Comments:
- 🟠 Major:
mcp-tools.ts:62Non-session MCP tools bypass ref-aware schema widening - 🟠 Major:
mcp-tools.ts:28Silent catch swallows schema conversion errors with no visibility
🟡 Minor (2) 🟡
🟡 1) refAwareSchema.test.ts Missing test coverage for nested arrays and JSON Schema combinators
Issue: The makeRefAwareJsonSchema function handles array items (lines 63-68) and explicitly traverses anyOf/oneOf/allOf combinators (lines 75-80) without double-wrapping. However, the test file only covers simple object properties.
Why: Tool schemas in practice often have arrays of objects or union types. Without tests for these patterns, regressions in array item handling or combinator traversal could go undetected.
Fix: Add test cases for array items and combinators:
it('handles array items with ref support', () => {
const rawJson = {
type: 'object',
properties: {
images: { type: 'array', items: { type: 'object', properties: { data: { type: 'string' } } } }
}
};
const schema = z.fromJSONSchema(makeRefAwareJsonSchema(rawJson));
expect(schema.safeParse({ images: [{ $tool: 'toolu_123' }] }).success).toBe(true);
});
it('preserves anyOf/oneOf combinator structure without double-wrapping', () => {
const rawJson = {
type: 'object',
properties: { value: { anyOf: [{ type: 'string' }, { type: 'number' }] } }
};
const schema = z.fromJSONSchema(makeRefAwareJsonSchema(rawJson));
expect(schema.safeParse({ value: { $tool: 'toolu_123' } }).success).toBe(true);
});Refs:
🟡 2) toolWrapperRefAwareValidation.test.ts Missing test for validation skipped when args unchanged
Issue: The tool-wrapper only validates when resolvedChanged is true (lines 161-162). Current tests always mock resolveArgs to return different values. No test verifies validation is correctly skipped when args are unchanged.
Why: If validation were incorrectly applied even when args are unchanged, it could cause false positives. The inverse case (validation incorrectly skipped) is also worth testing.
Fix: Add test verifying validation is skipped for unchanged args:
it('skips validation when resolved args match original args', async () => {
const originalArgs = { city: 'San Francisco' };
vi.mocked(agentSessionManager.getArtifactParser).mockReturnValue({
resolveArgs: vi.fn().mockResolvedValue(originalArgs),
} as any);
const baseInputSchema = { safeParse: vi.fn() };
const executeSpy = vi.fn().mockResolvedValue({ ok: true });
// ... wrap and execute ...
expect(baseInputSchema.safeParse).not.toHaveBeenCalled();
});Refs:
Inline Comments:
- 🟡 Minor:
function-tools.ts:84-86Silent catch provides no visibility - 🟡 Minor:
tool-wrapper.ts:161-162JSON.stringify comparison may have false negatives
💭 Consider (1) 💭
💭 1) mcp-tools.ts:19 buildRefAwareInputSchema exported from mcp-tools.ts rather than ref-aware-schema.ts
Issue: The PR introduces ref-aware-schema.ts as the canonical module for ref-aware utilities (makeRefAwareJsonSchema, makeBaseInputSchema), but buildRefAwareInputSchema is exported from mcp-tools.ts. Tests import from both files.
Why: This creates a split export surface. The peer pattern for schema utilities (packages/agents-core/src/utils/schema-conversion.ts) consolidates related functions in a single module.
Fix: Consider moving buildRefAwareInputSchema to ref-aware-schema.ts alongside the other ref-aware helpers for discoverability.
Inline Comments:
- 💭 Consider:
ref-aware-schema.ts:103-107makeBaseInputSchemais a trivial wrapper
💡 APPROVE WITH SUGGESTIONS
Summary: This PR is a well-designed solution to a real problem (silent schema rejection of sentinel refs), with solid observability additions via onStepFinish. The main issue to address is that the non-session MCP tools path (lines 52-78) bypasses ref-aware schema widening, which should be intentional and documented or fixed to maintain consistency. The new handleInvalidToolResultsFromStep function should also have unit test coverage given it's the core observability feature.
Discarded (7)
| Location | Issue | Reason Discarded |
|---|---|---|
agent-types.ts:218 |
baseInputSchema type uses inline safeParse signature | Type is correct for usage; using Zod's actual types would add import complexity |
ref-aware-schema.ts:34 |
isObjectRecord not exported | Intentionally module-private helper; no evidence of duplication |
ref-aware-schema.ts:38 |
Schema transformation does not handle $ref | JSON Schema $ref resolution is complex; MCP/function tool schemas don't typically use $ref |
generate.ts:107 |
Tool args logged to OTEL may contain sensitive data | Pre-existing pattern in tool-wrapper; redaction is an exporter-level concern |
generate.ts:145 |
DB write lacks transaction boundary | Partial persistence is acceptable; warn logging provides visibility |
generate.ts:71 |
ToolErrorPart type extraction assumes SDK structure | Acceptable coupling with SDK; runtime type guard provides defense |
AgentMcpManager.ts:388 |
baseInputSchema failure log missing schema details | Log is already informative; adding full schema could bloat logs |
Reviewers (7)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-architecture |
4 | 1 | 0 | 0 | 1 | 0 | 2 |
pr-review-tests |
6 | 2 | 0 | 0 | 0 | 0 | 4 |
pr-review-errors |
4 | 1 | 0 | 0 | 2 | 0 | 1 |
pr-review-types |
5 | 0 | 0 | 0 | 0 | 0 | 5 |
pr-review-consistency |
5 | 0 | 1 | 0 | 1 | 0 | 3 |
pr-review-llm |
4 | 0 | 0 | 0 | 1 | 0 | 3 |
pr-review-standards |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 28 | 4 | 1 | 0 | 5 | 0 | 18 |
Note: pr-review-standards returned no findings, indicating core code quality is solid.
| const baseInputSchema = makeBaseInputSchema(rawJson); | ||
| const refAwareInputSchema = z.fromJSONSchema(makeRefAwareJsonSchema(rawJson)); | ||
| return { refAwareInputSchema, baseInputSchema }; | ||
| } catch { |
There was a problem hiding this comment.
🟠 MAJOR: Silent catch swallows schema conversion errors with no visibility
Issue: When z.toJSONSchema() or subsequent transformation fails, the catch block silently returns the original inputSchema and sets baseInputSchema to undefined. This disables post-resolution validation with zero logging.
Why: Operators have no way to know that resolved-args validation is disabled for specific tools, making debugging difficult when invalid resolved values slip through to execution.
Fix: Add logging to surface the failure:
| } catch { | |
| } catch (err) { | |
| logger.warn( | |
| { inputSchema: typeof inputSchema, error: err instanceof Error ? err.message : String(err) }, | |
| 'Failed to build ref-aware schema; falling back to original schema without post-resolution validation' | |
| ); | |
| return { | |
| refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>, | |
| baseInputSchema: undefined, | |
| }; | |
| } |
Refs:
| } catch { | ||
| // baseInputSchema stays undefined | ||
| } |
There was a problem hiding this comment.
🟡 Minor: Silent catch provides no visibility into baseInputSchema failures
Issue: When makeBaseInputSchema throws, the error is silently caught with only a comment. Resolved-args validation is disabled for this tool with no logging.
Why: Debugging production issues becomes difficult when operators don't know why certain function tools skip post-resolution validation.
Fix:
| } catch { | |
| // baseInputSchema stays undefined | |
| } | |
| } catch (schemaError) { | |
| logger.debug( | |
| { functionToolName: functionToolDef.name, schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError) }, | |
| 'Failed to build baseInputSchema; resolved-args validation disabled for this tool' | |
| ); | |
| } |
Refs:
| const resolvedChanged = | ||
| JSON.stringify(parsedArgsForResolution) !== JSON.stringify(resolvedArgs); |
There was a problem hiding this comment.
🟡 Minor: JSON.stringify comparison may have false negatives for key ordering
Issue: JSON.stringify doesn't guarantee consistent key ordering across different object constructions. If resolution produces an equivalent but differently-ordered object, validation may be incorrectly skipped.
Why: While rare, this could allow invalid resolved values to bypass validation when the only difference is object key order.
Fix: Consider using deep equality or always validating when artifactParser exists:
// Option 1: Always validate when artifactParser exists (simplest, safest)
if (artifactParser && validationSchema?.safeParse) {
const validation = validationSchema.safeParse(resolvedArgs);
// ...
}
// Option 2: Use lodash isEqual for accurate comparison
import isEqual from 'lodash/isEqual';
const resolvedChanged = !isEqual(parsedArgsForResolution, resolvedArgs);The validation cost is minimal compared to the false-negative risk.
| export function makeBaseInputSchema( | ||
| schema: Record<string, unknown> | ||
| ): ReturnType<typeof z.fromJSONSchema> { | ||
| return z.fromJSONSchema(schema); | ||
| } |
There was a problem hiding this comment.
💭 Consider: makeBaseInputSchema is a trivial wrapper
Issue: This function is a single-line wrapper around z.fromJSONSchema(schema) with no additional behavior.
Why: The indirection adds cognitive overhead without providing abstraction value. The function exists alongside makeRefAwareJsonSchema which does meaningful transformation, creating asymmetry.
Fix: Either inline z.fromJSONSchema calls at call sites, or add a comment explaining the wrapper's purpose (e.g., future error handling, consistency with makeRefAwareJsonSchema API).
Ito Test Report ❌14 test cases ran. 2 failed, 12 passed. Overall, 12 of 14 test cases passed, confirming stable behavior across SSE streaming chat completions, sentinel-ref handling (including adversarial isolation), tool-error persistence and conversation read APIs, MCP media image contract chaining, MCP auth boundary enforcement, and UI robustness for rapid submits and mobile tool-error flows. The two medium-severity, pre-existing defects are that non-streaming /run/api/chat can return HTTP 200 without a required conversationId and with empty assistant content, and playground refresh/back-forward during streaming can orphan conversation state and clear visible history because the client-generated playground conversationId is reset on unmount and not persisted. ❌ Failed (2)🟠 Refresh and back-forward during streaming preserves conversation continuity
Relevant code:
const { resetPlaygroundConversationId } = useAgentActions();
const conversationId = useAgentStore(({ playgroundConversationId }) => playgroundConversationId);
useEffect(() => {
// when the playground is closed the chat widget is unmounted so we need to reset the conversation id
return () => resetPlaygroundConversationId();
}, []);
playgroundConversationId: generateId(),
reset() {
const { isSidebarSessionOpen: _, ...state } = initialAgentState;
set({ ...state, playgroundConversationId: generateId() });
},
persist(agentState, {
name: 'inkeep:agent',
partialize(state) {
return {
jsonSchemaMode: state.jsonSchemaMode,
isSidebarPinnedOpen: state.isSidebarPinnedOpen,
hasTextWrap: state.hasTextWrap,
};
},
})🟠 Vercel stream chat baseline remains functional
Relevant code:
const conversationId = body.conversationId ?? getConversationId();
const activeSpan = trace.getActiveSpan();
return c.json({
id: `chat-${Date.now()}`,
object: 'chat.completion',
created: Math.floor(Date.now() / 1000),
model: agentName,
choices: [
{
index: 0,
message: {
role: 'assistant',
content: captured.hasError ? captured.errorMessage : captured.text,
},
finish_reason: result.success && !captured.hasError ? 'stop' : 'error',
},
],
usage: {
prompt_tokens: 0,
completion_tokens: 0,
total_tokens: 0,
},
});✅ Passed (12)Commit: Tell us how we did: Give Ito Feedback |















Summary
Tool chaining — where one tool call's arguments reference the output of a prior tool via sentinel objects like
{ $tool: "call_id", $path: "result.field" }— had two silent failure modes:execute(). Sentinel refs are not valid values for most schema types, so chained tool calls were silently rejected before reaching the executor — with no OTEL span, no session event, nothing in conversation history.This PR fixes both, and adds full observability for SDK-level schema failures.
Key decisions
Schema widening strategy:
ref-aware-schema.tsrecursively transforms each tool's JSON Schema so every value-position node (properties, items, additionalProperties) becomesanyOf: [<original>, <sentinel-ref-schema>]. Structural combinator nodes (anyOf/oneOf/allOf) are traversed but not widened — this prevents double-wrapping. The widened schema is what the AI SDK sees.Two-schema model: The original (pre-widened) schema is preserved as
baseInputSchemaon the tool definition alongside the widenedinputSchema. AfterArtifactParserresolves sentinel refs,tool-wrappervalidates resolved args againstbaseInputSchema— the strict constraint that refs bypassed at call time.Observability via
onStepFinish: SDK-level schema validation failures never reachexecute(), sotool-wrappercan't catch them. The only hook isonStepFinishingenerate.ts, wheretool-errorcontent parts surface these failures. Each one now gets an OTEL span, session events (for the SSE feed), and a conversation history message — same visibility as a normal execution-time failure.Image
encodingfield removed: Theencoding: 'base64'field onImageInputwas always the literal string'base64'— it carried no information. Removed from the type, schema, and server docs.Manual QA
Manually tested with the zendesk MCP + image agent.