feat(ai): AI Testing Framework — consolidation staging branch [0/7 → master] #411
Draft
ianwhitedeveloper wants to merge 3 commits into master
* feat(ai): add error types, constants, and Zod schemas (PR 1/7)

Foundation layer for the AI testing framework. Introduces structured error handling via error-causes and runtime-validated configuration constants via Zod schemas. Updates eslint ecmaVersion to 2022 to support numeric separators and optional chaining used throughout the framework source.

Files:
- source/ai-errors.js — named error types (ParseError, ValidationError, etc.)
- source/ai-errors.test.js — full coverage for error descriptors and createError
- source/constants.js — defaults, constraints, and Zod schemas
- source/constants.test.js — 26 tests covering all schemas and boundaries
- eslint.config.js — bump ecmaVersion 2017 → 2022 (prerequisite)
- package.json — add error-causes and zod to production dependencies

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(config): bring working configs from feature branch

Adds vitest.config.js e2e exclusion (source/e2e.test.js uses Riteway/Tape, not Vitest) alongside the eslint ecmaVersion 2022 bump already in place. Both changes are sourced from the working feature branch.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 1 review findings

- constants.js: lazy process.cwd() default (z.string().default(() => process.cwd())) prevents stale value when cwd changes after module load
- constants.js: add concurrencyMax (50) to constraints + enforce in concurrencySchema
- constants.js: remove JSDoc from internal constants (not public API)
- constants.test.js: add full aiTestOptionsSchema coverage (valid input, missing filePath, empty filePath, invalid agent, lazy cwd default, optional agentConfigPath)
- constants.test.js: add concurrencySchema upper-bound tests
- ai-errors.test.js: replace for..of loops with test.each (one named test per case)
- ai-errors.test.js: expand createError integration to cover two error types
- ai-errors.test.js: replace typeof handleAIErrors check with behavioral routing tests
- ai-errors.js: remove forward-reference comment (extraction-parser.js not yet in scope)
- eslint.config.js: Object.assign -> spread operator

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
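The lazy process.cwd() default from the PR 1 review fix can be illustrated without Zod. The following is a minimal sketch, not the contents of constants.js: the names cwdAtLoad, resolveEager, and resolveLazy are hypothetical and exist only to contrast a default captured at module load with one evaluated on each call.

```javascript
// Hypothetical illustration of eager vs. lazy defaults; not the framework's code.

// Eager: the working directory is captured once, when this module loads.
const cwdAtLoad = process.cwd();
const resolveEager = ({ cwd = cwdAtLoad } = {}) => cwd;

// Lazy: the default expression runs on every call that omits `cwd`,
// so a later process.chdir() is reflected. This mirrors the intent of
// z.string().default(() => process.cwd()) quoted in the commit message.
const resolveLazy = ({ cwd = process.cwd() } = {}) => cwd;
```

After a process.chdir() call, resolveLazy() follows the new directory while resolveEager() keeps returning the directory that was current at load time, which is the stale-value problem the commit describes.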
… 2/7] (#408)

* feat(ai): Utilities — Debug Logger, Concurrency Limiter, TAP YAML [PR 2/7]

- Add createDebugLogger: console + file logging with buffer/flush
- Add limitConcurrency: sliding-window async concurrency limiter
- Add parseTAPYAML: parse judge agent TAP YAML diagnostic blocks
- Add limit-concurrency.test.js (missing from PR #394)
- Apply js.mdc cleanup: flush loop → single write, for-of → reduce pipeline
- Replace @paralleldrive/cuid2 (not in deps) with mkdtempSync in debug-logger.test.js

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(ai): apply PR 2 review suggestions

- Collapse formatMessage to concise arrow expression
- Add comment to limit-concurrency for-of loop (justified async pattern)
- Add flush no-op test when logFile is not configured
- Use vi.useFakeTimers() in concurrency-cap test for determinism

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
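The sliding-window limiter named in the PR 2 commit can be sketched roughly as below. The signature (an array of task functions plus a numeric limit) and the internal bookkeeping are assumptions made for illustration; the real limitConcurrency may be shaped differently.

```javascript
// A sketch of a sliding-window concurrency limiter, under assumed inputs:
// `tasks` is an array of functions that each return a promise (or a value),
// and `limit` is the maximum number of tasks allowed in flight at once.
const limitConcurrency = async (tasks, limit) => {
  const results = new Array(tasks.length);
  const executing = new Set();

  // A for-of loop is used deliberately: each iteration may need to await
  // a free slot before the next task is started.
  for (const [index, task] of tasks.entries()) {
    const promise = Promise.resolve()
      .then(task)
      .then((value) => {
        results[index] = value; // keep results in input order
      })
      .finally(() => executing.delete(promise));
    executing.add(promise);

    if (executing.size >= limit) {
      // Wait until any in-flight task settles, which frees one slot.
      await Promise.race(executing);
    }
  }

  // Drain whatever is still running. The first rejection propagates.
  await Promise.all(executing);
  return results;
};
```

Calling limitConcurrency(tasks, 2) with five tasks would start two immediately and start each remaining task as soon as a running one settles, rather than waiting for a whole batch to finish.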
Add agent-parser, extraction-parser, aggregation, and execute-agent modules with full unit test coverage.

- agent-parser: parseStringResult, parseOpenCodeNDJSON, unwrapEnvelope (new shared export), unwrapAgentResult. Shared unwrapEnvelope breaks duplication between agent-parser and execute-agent (WIP fix #9).
- extraction-parser: parseExtractionResult with multi-strategy JSON parsing (direct, markdown fence, pre-parsed object), and resolveImportPaths for prompt file resolution.
- aggregation: normalizeJudgment, calculateRequiredPasses, aggregatePerAssertionResults with Zod validation.
- execute-agent: extracted from ai-runner.js to break the circular dependency (ai-runner ↔ test-extractor). Logger injected at executeAgent call site rather than created inside spawnProcess (WIP fix #8). Uses shared unwrapEnvelope from agent-parser.
- Test files use test.each for all table-driven cases per convention.

164 tests pass, 0 lint errors, TypeScript checks pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 code review findings

- aggregation.js: validate once in aggregatePerAssertionResults — capture the Zod-validated result and compute Math.ceil inline, eliminating the redundant second schema parse inside calculateRequiredPasses
- aggregation.js: remove misleading optional chaining (raw?.passed etc.) after the null-guard throw; use plain property access
- agent-parser.js: replace acc.push() with [...acc, text] in reduce accumulator to prefer immutability per JS style guide
- agent-parser.test.js: drop redundant "parsed object:" prefix from unwrapEnvelope test.each given fields; remove duplicate standalone "no result key" test that overlapped with test.each row
- aggregation.test.js: remove redundant export-existence assertion for normalizeJudgment; add empty perAssertionResults edge case (vacuous truth — every() on [] returns true)
- execute-agent.test.js: strengthen parseOutput test to verify stdout and logger are threaded through as expected (documents WIP fix #8)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 author review findings

- aggregation.js: rename `raw` param to `judgeResponse` and fold into single options object for normalizeJudgment; removes the two-argument signature (breaking change, callers updated)
- aggregation.js: remove calculateRequiredPasses — math is inlined in aggregatePerAssertionResults, eliminating double schema parse
- aggregation.test.js: remove calculateRequiredPasses describe block; fix Try() usage (direct fn ref, not arrow wrapper); update all normalizeJudgment call sites to new single-options signature
- execute-agent.js: extract magic number 500 to maxOutputPreviewLength constant (camelCase per javascript.mdc); applied to all 3 truncation sites
- execute-agent.test.js: replace try/catch antipatterns with await Try(); add Try import from riteway.js
- extraction-parser.test.js: strengthen weak typeof assertions to check specific fields; strengthen cause !== undefined to cause.name === SyntaxError

151 tests pass, 0 lint errors, TypeScript clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 follow-up review findings

- constants.js: rename calculateRequiredPassesSchema to aggregationParamsSchema — name now reflects what the schema validates (aggregation input params) rather than the deleted calculateRequiredPasses function; update all import sites
- aggregation.test.js: add 6 missing Zod validation edge cases for aggregatePerAssertionResults (zero runs, negative runs, non-integer runs, NaN runs, negative threshold, NaN threshold) — coverage gap introduced when calculateRequiredPasses and its tests were removed; all cases now exercised via aggregatePerAssertionResults test.each

157 tests pass, 0 lint errors, TypeScript clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(test): complete PR review remediation

🐛 - Remove weak instanceof Error assertions
🔄 - Add threshold calculation verification tests

Tests now verify threshold-based pass/fail logic directly

164 tests passing, 0 lint errors, TypeScript clean

Co-authored-by: Ian White <ian.white.developer@gmail.com>

* fix(ai): remove implementation detail from test

- execute-agent.test.js: remove logger type assertion from parseOutput test — typeof checks violate tdd.mdc:64 and logger threading is an implementation detail; the three remaining assertions (call count, stdout arg, parsed result) collectively verify correct integration

164 tests pass, 0 lint errors, TypeScript clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
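The threshold-based aggregation these PR 3 commits discuss might look roughly like the sketch below. Several details are assumptions rather than facts from the PR: the threshold is treated as a percentage from 0 to 100, each entry in perAssertionResults is assumed to carry a requirement string and a runResults array of { passed } objects, and plain checks stand in for the Zod schema. Only the inline Math.ceil, the validated runs and threshold inputs, and the vacuous-truth behavior of every() on an empty array come from the commit messages.

```javascript
// A sketch of per-assertion aggregation under assumed input shapes.
// Not the framework's aggregation.js; plain checks replace Zod validation.
const aggregatePerAssertionResults = ({ perAssertionResults, threshold, runs }) => {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new Error('runs must be a positive integer');
  }
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 100) {
    throw new Error('threshold must be a number between 0 and 100');
  }

  // Validate once, then compute the required pass count inline.
  const requiredPasses = Math.ceil((runs * threshold) / 100);

  const assertions = perAssertionResults.map(({ requirement, runResults }) => {
    const passCount = runResults.filter(({ passed }) => passed).length;
    return { requirement, passCount, totalRuns: runs, passed: passCount >= requiredPasses };
  });

  // every() on an empty array returns true, so an empty input passes vacuously.
  return { passed: assertions.every(({ passed }) => passed), assertions };
};
```

With runs set to 4 and threshold set to 75, an assertion needs at least 3 passing runs under this sketch, so 3 of 4 passes and 2 of 4 fails.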
Collaborator
I'm okay with the strategies here.
Context
This is the staging branch for a structured consolidation of draft PR #394 — the Riteway AI Testing Framework. Per Eric's consolidation request, PR #394 (80+ commits, 104 files, ~21K lines, ~60% docs/planning) is being decomposed into 7 small, focused PRs — one module per PR, in dependency order — each with functional requirements and unit tests, ruthlessly reviewed before merging here.
This branch is NOT ready to merge to master. It will be when all 7 PRs are merged into it and a final review passes.
Epic
Enable riteway ai <promptfile>, a CLI command that reads SudoLang test files, delegates execution to AI agents, and outputs results in TAP format. It treats prompts as first-class testable units, supporting configurable runs, pass thresholds, parallel execution, and rich TAP markdown output.
Full requirements: tasks/2026-01-22-riteway-ai-testing-framework.md
Why Not Cherry-Pick or Rebase PR #394?
The circular dependency (ai-runner.js ↔ test-extractor.js) needed to be resolved first.
Approach: Create fresh branches from this consolidation base, copy files from the feature branch, fix WIP issues during consolidation, and review each PR independently before merging here.
Dependency Graph (module architecture)
No cycles. Every module has a colocated test file.
7-PR Progress
- ai-errors.js, constants.js + tests
- agent-parser, extraction-parser, aggregation, execute-agent + tests
- agent-config, validation + tests + fixtures
- test-extractor, ai-runner + tests
- test-output, ai-command, bin/riteway + tests
- e2e.test.js, fixtures, vitest/eslint config
Current test count: 163 tests merged (PRs 1–3) + 23 in PR 4 review = 186 passing.
WIP Issues From Original PR (13 total)
- for (const loops in tests
- formatMedia dead code
- Try(() => fn(args)) syntax
- unwrapRawEnvelope duplication (… unwrapEnvelope)
- --trust flag
Open Architectural Questions (surfaced in PR 4, flagged for Eric)
Two design issues in agent-config.js are not regressions from the feature branch, but consolidation is the right moment to decide on them, before PR 6 (CLI integration) wires up --agent-config.
1. Built-in agent configs hardcode third-party CLI flags
getAgentConfig() returns hardcoded flag arrays for claude, opencode, and cursor. If any of those CLIs renames a flag, every riteway user breaks until we ship an update.
Proposed: riteway ai init writes a starter config file to the project. Built-in defaults stay for first-run convenience; teams who want stability own their config file. The library stops being the source of truth for third-party CLI interfaces.
2. parseOutput function can't live in a JSON config file
Custom agents loaded via --agent-config my-agent.json are silently locked to default JSON stdout parsing. A custom OpenCode-compatible agent can't declare NDJSON output format in its config, because parseOutput is a JS function and JSON can't serialize functions.
Proposed: Replace the parseOutput function field with a declarative "outputFormat": "json" | "ndjson" | "text" string. Riteway maps strategy names to parsers. The schema becomes fully serializable; custom agents become fully capable.
These two changes should land as a PR 4 follow-up before PR 6 merges.
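The declarative outputFormat proposal in question 2 could be sketched as a small registry that maps strategy names to parser functions. Everything here is illustrative: the outputParsers map, the resolveParser helper, and the "command" field in the sample config are hypothetical, and the ndjson parser simply parses one JSON value per line, which may be simpler than what parseOpenCodeNDJSON does.

```javascript
// Hypothetical registry keyed by the proposed "outputFormat" values.
const outputParsers = new Map([
  ['json', (stdout) => JSON.parse(stdout)],
  [
    'ndjson',
    (stdout) =>
      stdout
        .split('\n')
        .filter((line) => line.trim() !== '')
        .map((line) => JSON.parse(line)),
  ],
  ['text', (stdout) => stdout.trim()],
]);

// Look up the parser for a config object; defaults to "json" when the
// field is omitted, and fails loudly on an unknown strategy name.
const resolveParser = ({ outputFormat = 'json' } = {}) => {
  const parser = outputParsers.get(outputFormat);
  if (!parser) {
    throw new Error(`Unknown outputFormat: ${outputFormat}`);
  }
  return parser;
};

// The config stays plain JSON, so it can be loaded from a file as-is.
// The "command" field is an assumed example, not a documented schema key.
const agentConfig = JSON.parse('{ "command": "my-agent", "outputFormat": "ndjson" }');
const parseOutput = resolveParser(agentConfig);
```

Because the config carries only a string, a custom agent can opt into NDJSON parsing without shipping any JavaScript, and the set of supported formats stays under the library's control.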
Merge Plan