feat(ai): AI Testing Framework — consolidation staging branch [0/7 → master] by ianwhitedeveloper · Pull Request #411 · paralleldrive/riteway

ianwhitedeveloper · 2026-02-19T15:45:20Z

Context

This is the staging branch for a structured consolidation of draft PR #394 — the Riteway AI Testing Framework. Per Eric's consolidation request, PR #394 (80+ commits, 104 files, ~21K lines, ~60% docs/planning) is being decomposed into 7 small, focused PRs — one module per PR, in dependency order — each with functional requirements and unit tests, ruthlessly reviewed before merging here.

This branch is NOT ready to merge to master. It will be when all 7 PRs are merged into it and a final review passes.

Epic

Enable riteway ai <promptfile>, a CLI command that reads SudoLang test files, delegates execution to AI agents, and outputs results in TAP format. The command treats prompts as first-class testable units and supports configurable runs, pass thresholds, parallel execution, and rich TAP markdown output.

Full requirements: tasks/2026-01-22-riteway-ai-testing-framework.md
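
As a rough illustration of how configurable runs and a pass threshold could turn into TAP lines, here is a minimal sketch. It is not the framework's implementation: the helper names (`requiredPasses`, `toTapLine`), the percentage-based threshold, and the sample assertions are all assumptions made for this example. Only the `Math.ceil` rounding is taken from the aggregation commit notes in this PR.

```javascript
// Hypothetical helper: assumes threshold is a percentage in the range 0-100.
const requiredPasses = ({ runs, threshold }) =>
  Math.ceil((runs * threshold) / 100);

// Hypothetical formatter: one TAP test line per assertion.
const toTapLine = ({ index, description, passCount, runs, threshold }) => {
  const passed = passCount >= requiredPasses({ runs, threshold });
  const status = passed ? 'ok' : 'not ok';
  return `${status} ${index} - ${description} (${passCount}/${runs} runs passed)`;
};

// Sample data invented for the example.
const runs = 4;
const threshold = 75;
const results = [
  { description: 'Given a greeting prompt, should respond politely', passCount: 3 },
  { description: 'Given a math prompt, should show its work', passCount: 2 },
];

const tap = [
  'TAP version 13',
  ...results.map((result, i) =>
    toTapLine({ ...result, index: i + 1, runs, threshold })
  ),
  `1..${results.length}`,
].join('\n');

console.log(tap);
// TAP version 13
// ok 1 - Given a greeting prompt, should respond politely (3/4 runs passed)
// not ok 2 - Given a math prompt, should show its work (2/4 runs passed)
// 1..2
```

With 4 runs and a 75% threshold, 3 passes are required, so the first assertion passes and the second fails.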

Why Not Cherry-Pick or Rebase PR #394?

- 80+ commits interleave multiple modules — no clean per-module slices
- Duplicate commits from prior rebases make cherry-pick impractical
- ~60% of changed files are docs/planning that must stay out of production PRs
- Circular dependency (ai-runner.js ↔ test-extractor.js) needed to be resolved first

Approach: create fresh branches from this consolidation base, copy files from the feature branch, fix WIP issues during consolidation, and review each PR independently before merging here.

Dependency Graph (module architecture)

ai-errors.js  (leaf)       constants.js  (leaf)
    ↓                           ↓
debug-logger.js            tap-yaml.js
    ↓
agent-parser.js  ←  ai-errors
extraction-parser.js  ←  ai-errors
execute-agent.js  ←  ai-errors, debug-logger, agent-parser
aggregation.js  ←  ai-errors, constants
    ↓
agent-config.js  ←  ai-errors, agent-parser      [PR 4]
validation.js  ←  ai-errors, debug-logger         [PR 4]
    ↓
test-extractor.js  ←  execute-agent               [PR 5]
ai-runner.js  ←  all prior                        [PR 5]
    ↓
test-output.js                                    [PR 6]
ai-command.js  ←  all prior                       [PR 6]
bin/riteway.js  (modifications)                   [PR 6]
    ↓
e2e.test.js  +  fixtures  +  config               [PR 7]

No cycles. Every module has a colocated test file.
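
The "no cycles" claim can be checked mechanically. The sketch below runs a depth-first search over the graph above. Two assumptions apply: only the explicit `←` edges are encoded (the vertical arrows are read as layering, not imports), and "all prior" is expanded to every module listed earlier. `findCycle` is a hypothetical helper written for this example, not part of the codebase.

```javascript
// Module -> modules it imports, transcribed from the explicit "←" edges.
const graph = {
  'ai-errors': [],
  'constants': [],
  'debug-logger': [],
  'tap-yaml': [],
  'agent-parser': ['ai-errors'],
  'extraction-parser': ['ai-errors'],
  'execute-agent': ['ai-errors', 'debug-logger', 'agent-parser'],
  'aggregation': ['ai-errors', 'constants'],
  'agent-config': ['ai-errors', 'agent-parser'],
  'validation': ['ai-errors', 'debug-logger'],
  'test-extractor': ['execute-agent'],
};
// Assumption: "all prior" means every module declared before this point.
graph['ai-runner'] = Object.keys(graph);
graph['test-output'] = [];
graph['ai-command'] = Object.keys(graph);

// Returns the first cycle found as a path (e.g. ['a', 'b', 'a']), or null.
const findCycle = (deps) => {
  const state = new Map(); // node -> 'visiting' | 'done'

  const visit = (node, path) => {
    if (state.get(node) === 'done') return null;
    if (state.get(node) === 'visiting') {
      return [...path.slice(path.indexOf(node)), node];
    }
    state.set(node, 'visiting');
    for (const dep of deps[node] ?? []) {
      const cycle = visit(dep, [...path, node]);
      if (cycle) return cycle;
    }
    state.set(node, 'done');
    return null;
  };

  for (const node of Object.keys(deps)) {
    const cycle = visit(node, []);
    if (cycle) return cycle;
  }
  return null;
};

console.log(findCycle(graph)); // null

// The circular dependency that blocked cherry-picking, for comparison.
const before = {
  'ai-runner': ['test-extractor'],
  'test-extractor': ['ai-runner'],
};
console.log(findCycle(before)); // [ 'ai-runner', 'test-extractor', 'ai-runner' ]
```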

7-PR Progress

| # | PR | Files | Status |
|---|----|-------|--------|
| 1 | Foundation — Error Types + Constants | `ai-errors.js`, `constants.js` + tests | ✅ Merged (#407) |
| 2 | Utilities — Debug Logger, Concurrency, TAP YAML | 5 files | ✅ Merged (#408) |
| 3 | Parsers + Execute Agent | `agent-parser`, `extraction-parser`, `aggregation`, `execute-agent` + tests | ✅ Merged (#409) |
| 4 | Config + Validation | `agent-config`, `validation` + tests + fixtures | 🔍 In review (#410) |
| 5 | Test Extractor + Core Runner | `test-extractor`, `ai-runner` + tests | ⏳ Pending |
| 6 | Test Output + CLI Integration | `test-output`, `ai-command`, `bin/riteway` + tests | ⏳ Pending |
| 7 | E2E Tests + Fixtures + Config | `e2e.test.js`, fixtures, vitest/eslint config | ⏳ Pending |

Current test count: 163 tests merged (PRs 1–3) + 23 in PR 4 review = 186 passing.

WIP Issues From Original PR (13 total)

| # | Issue | Status |
|---|-------|--------|
| 1 | `for (const` loops in tests | ✅ Zero instances — resolved |
| 2 | agent-config schema comment verbose | ✅ Resolved in PR 4 |
| 3 | Fixtures README outdated | ⏳ PR 7 |
| 4 | `formatMedia` dead code | ✅ Decision documented — remove in PR 6 |
| 5 | test-output.js dead call | ✅ Removed with #4 |
| 6 | Redundant test comments | ✅ None found in PRs 1–4 |
| 7 | `Try(() => fn(args))` syntax | ✅ Valid — no change |
| 8 | ai-runner logger coupling | ✅ Resolved in PR 3 (injected logger) |
| 9 | `unwrapRawEnvelope` duplication | ✅ Resolved in PR 3 (shared `unwrapEnvelope`) |
| 10 | Cursor agent `--trust` flag | ✅ Resolved in PR 4 |
| 11 | Hardcoded defaults in tests | ✅ Explicit per TDD rules |
| 12 | Error handling/Zod placement | ⏳ PR 5 |
| 13 | Re-exports in test-extractor | ⏳ PR 5 |

Open Architectural Questions (surfaced in PR 4, flagged for Eric)

Two design issues in agent-config.js are not regressions from the feature branch, but consolidation is the right moment to decide before PR 6 (CLI integration) wires up --agent-config.

1. Built-in agent configs hardcode third-party CLI flags

getAgentConfig() returns hardcoded flag arrays for claude, opencode, and cursor. If any of those CLIs renames a flag, riteway breaks for every user of that agent until we ship an update.

Proposed: riteway ai init writes a starter config file to the project. Built-in defaults stay for first-run convenience; teams who want stability own their config file. The library stops being the source of truth for third-party CLI interfaces.

2. parseOutput function can't live in a JSON config file

Custom agents loaded via --agent-config my-agent.json are silently locked to default JSON stdout parsing. A custom OpenCode-compatible agent can't declare NDJSON output format in its config — because parseOutput is a JS function and JSON can't serialize functions.

Proposed: Replace the parseOutput function field with a declarative "outputFormat": "json" | "ndjson" | "text" string. Riteway maps strategy names to parsers. Schema becomes fully serializable; custom agents become fully capable.

These two changes should land as a PR 4 follow-up before PR 6 merges.
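
The declarative outputFormat proposal in question 2 could look roughly like the sketch below. Everything here is hypothetical: the `parsers` registry, the `resolveParser` helper, the simplified parser bodies, and the `command` and `args` config fields are illustrative stand-ins, not the existing agent-config.js API.

```javascript
// Hypothetical registry mapping strategy names to parser functions.
const parsers = {
  json: (stdout) => JSON.parse(stdout),
  ndjson: (stdout) =>
    stdout
      .split('\n')
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line)),
  text: (stdout) => stdout.trim(),
};

// Look up the parser for a config, defaulting to JSON as the description says
// custom agents do today. Object.hasOwn avoids matching inherited keys such
// as "constructor".
const resolveParser = ({ outputFormat = 'json' } = {}) => {
  if (!Object.hasOwn(parsers, outputFormat)) {
    const known = Object.keys(parsers).join(', ');
    throw new Error(
      `Unknown outputFormat "${outputFormat}". Expected one of: ${known}`
    );
  }
  return parsers[outputFormat];
};

// A config with no function fields survives a JSON round trip, so it can live
// in a file such as my-agent.json.
const customAgent = JSON.parse(
  '{"command": "my-agent", "args": ["run"], "outputFormat": "ndjson"}'
);

const parse = resolveParser(customAgent);
console.log(parse('{"type":"text","text":"hi"}\n{"type":"done"}\n'));
// [ { type: 'text', text: 'hi' }, { type: 'done' } ]
```

Because the strategy is a plain string, schema validation can reject unknown values up front instead of failing later at parse time.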

Merge Plan

- Each topic PR targets this branch (not master)
- Agent + human review before each merge
- When all 7 are merged here and tests are green: final review, then PR this → master

* feat(ai): add error types, constants, and Zod schemas (PR 1/7)

  Foundation layer for the AI testing framework. Introduces structured error handling via error-causes and runtime-validated configuration constants via Zod schemas.

  Updates eslint ecmaVersion to 2022 to support numeric separators and optional chaining used throughout the framework source.

  Files:
  - source/ai-errors.js — named error types (ParseError, ValidationError, etc.)
  - source/ai-errors.test.js — full coverage for error descriptors and createError
  - source/constants.js — defaults, constraints, and Zod schemas
  - source/constants.test.js — 26 tests covering all schemas and boundaries
  - eslint.config.js — bump ecmaVersion 2017 → 2022 (prerequisite)
  - package.json — add error-causes and zod to production dependencies

  Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(config): bring working configs from feature branch

  Adds vitest.config.js e2e exclusion (source/e2e.test.js uses Riteway/Tape, not Vitest) alongside the eslint ecmaVersion 2022 bump already in place. Both changes are sourced from the working feature branch.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 1 review findings

  - constants.js: lazy process.cwd() default (z.string().default(() => process.cwd())) prevents stale value when cwd changes after module load
  - constants.js: add concurrencyMax (50) to constraints + enforce in concurrencySchema
  - constants.js: remove JSDoc from internal constants (not public API)
  - constants.test.js: add full aiTestOptionsSchema coverage (valid input, missing filePath, empty filePath, invalid agent, lazy cwd default, optional agentConfigPath)
  - constants.test.js: add concurrencySchema upper-bound tests
  - ai-errors.test.js: replace for..of loops with test.each (one named test per case)
  - ai-errors.test.js: expand createError integration to cover two error types
  - ai-errors.test.js: replace typeof handleAIErrors check with behavioral routing tests
  - ai-errors.js: remove forward-reference comment (extraction-parser.js not yet in scope)
  - eslint.config.js: Object.assign -> spread operator

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

… 2/7] (#408)

* feat(ai): Utilities — Debug Logger, Concurrency Limiter, TAP YAML [PR 2/7]

  - Add createDebugLogger: console + file logging with buffer/flush
  - Add limitConcurrency: sliding-window async concurrency limiter
  - Add parseTAPYAML: parse judge agent TAP YAML diagnostic blocks
  - Add limit-concurrency.test.js (missing from PR #394)
  - Apply js.mdc cleanup: flush loop → single write, for-of → reduce pipeline
  - Replace @paralleldrive/cuid2 (not in deps) with mkdtempSync in debug-logger.test.js

  Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(ai): apply PR 2 review suggestions

  - Collapse formatMessage to concise arrow expression
  - Add comment to limit-concurrency for-of loop (justified async pattern)
  - Add flush no-op test when logFile is not configured
  - Use vi.useFakeTimers() in concurrency-cap test for determinism

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
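
For readers unfamiliar with the pattern, a sliding-window limiter like the one PR 2 describes can be sketched with a small worker pool: a new task starts as soon as any running task finishes. This is a generic illustration, not the code in limit-concurrency.js. The name `runLimited`, its `(tasks, limit)` signature, and the demo tasks are assumptions for this example.

```javascript
// Hypothetical sketch: run task functions with at most `limit` in flight.
// Results keep the order of the input tasks.
const runLimited = async (tasks, limit) => {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unclaimed task until none remain.
  const worker = async () => {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  };

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
};

// Demo: track how many tasks are in flight at once.
let active = 0;
let peak = 0;
const makeTask = (value) => async () => {
  active += 1;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 5));
  active -= 1;
  return value * 2;
};

const demo = runLimited([1, 2, 3, 4, 5].map(makeTask), 2).then((results) => {
  console.log(results, 'peak concurrency:', peak);
  // [ 2, 4, 6, 8, 10 ] peak concurrency: 2
  return results;
});
```

The `while` loop inside each worker is sequential by design, which is the kind of justified async loop the review note about the for-of comment refers to.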

Add agent-parser, extraction-parser, aggregation, and execute-agent modules with full unit test coverage.

  - agent-parser: parseStringResult, parseOpenCodeNDJSON, unwrapEnvelope (new shared export), unwrapAgentResult. Shared unwrapEnvelope breaks duplication between agent-parser and execute-agent (WIP fix #9).
  - extraction-parser: parseExtractionResult with multi-strategy JSON parsing (direct, markdown fence, pre-parsed object), and resolveImportPaths for prompt file resolution.
  - aggregation: normalizeJudgment, calculateRequiredPasses, aggregatePerAssertionResults with Zod validation.
  - execute-agent: extracted from ai-runner.js to break the circular dependency (ai-runner ↔ test-extractor). Logger injected at executeAgent call site rather than created inside spawnProcess (WIP fix #8). Uses shared unwrapEnvelope from agent-parser.
  - Test files use test.each for all table-driven cases per convention.

  164 tests pass, 0 lint errors, TypeScript checks pass.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 code review findings

  - aggregation.js: validate once in aggregatePerAssertionResults — capture the Zod-validated result and compute Math.ceil inline, eliminating the redundant second schema parse inside calculateRequiredPasses
  - aggregation.js: remove misleading optional chaining (raw?.passed etc.) after the null-guard throw; use plain property access
  - agent-parser.js: replace acc.push() with [...acc, text] in reduce accumulator to prefer immutability per JS style guide
  - agent-parser.test.js: drop redundant "parsed object:" prefix from unwrapEnvelope test.each given fields; remove duplicate standalone "no result key" test that overlapped with test.each row
  - aggregation.test.js: remove redundant export-existence assertion for normalizeJudgment; add empty perAssertionResults edge case (vacuous truth — every() on [] returns true)
  - execute-agent.test.js: strengthen parseOutput test to verify stdout and logger are threaded through as expected (documents WIP fix #8)

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 author review findings

  - aggregation.js: rename `raw` param to `judgeResponse` and fold into single options object for normalizeJudgment; removes the two-argument signature (breaking change, callers updated)
  - aggregation.js: remove calculateRequiredPasses — math is inlined in aggregatePerAssertionResults, eliminating double schema parse
  - aggregation.test.js: remove calculateRequiredPasses describe block; fix Try() usage (direct fn ref, not arrow wrapper); update all normalizeJudgment call sites to new single-options signature
  - execute-agent.js: extract magic number 500 to maxOutputPreviewLength constant (camelCase per javascript.mdc); applied to all 3 truncation sites
  - execute-agent.test.js: replace try/catch antipatterns with await Try(); add Try import from riteway.js
  - extraction-parser.test.js: strengthen weak typeof assertions to check specific fields; strengthen cause !== undefined to cause.name === SyntaxError

  151 tests pass, 0 lint errors, TypeScript clean.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 follow-up review findings

  - constants.js: rename calculateRequiredPassesSchema to aggregationParamsSchema — name now reflects what the schema validates (aggregation input params) rather than the deleted calculateRequiredPasses function; update all import sites
  - aggregation.test.js: add 6 missing Zod validation edge cases for aggregatePerAssertionResults (zero runs, negative runs, non-integer runs, NaN runs, negative threshold, NaN threshold) — coverage gap introduced when calculateRequiredPasses and its tests were removed; all cases now exercised via aggregatePerAssertionResults test.each

  157 tests pass, 0 lint errors, TypeScript clean.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(test): complete PR review remediation

  🐛 - Remove weak instanceof Error assertions
  🔄 - Add threshold calculation verification tests

  Tests now verify threshold-based pass/fail logic directly

  164 tests passing, 0 lint errors, TypeScript clean

  Co-authored-by: Ian White <ian.white.developer@gmail.com>

* fix(ai): remove implementation detail from test

  - execute-agent.test.js: remove logger type assertion from parseOutput test — typeof checks violate tdd.mdc:64 and logger threading is an implementation detail; the three remaining assertions (call count, stdout arg, parsed result) collectively verify correct integration

  164 tests pass, 0 lint errors, TypeScript clean.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
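
The multi-strategy JSON parsing mentioned for extraction-parser (direct, markdown fence, pre-parsed object) can be illustrated with a short sketch. This is not parseExtractionResult itself. The function name `parseAgentJson`, the fence regex, and the error message are assumptions, and the pre-parsed check runs first here purely so the string strategies only ever see strings.

```javascript
// Build the fence marker programmatically to keep this example readable
// inside a fenced block.
const fence = '`'.repeat(3);
const fencePattern = new RegExp(fence + '(?:json)?\\s*([\\s\\S]*?)' + fence);

// Wrap JSON.parse so failures become values instead of exceptions.
const tryParse = (text) => {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (cause) {
    return { ok: false, cause };
  }
};

// Hypothetical multi-strategy parser.
const parseAgentJson = (input) => {
  // Strategy: pre-parsed object, nothing to do.
  if (input !== null && typeof input === 'object') return input;
  if (typeof input !== 'string') {
    throw new TypeError('Expected a string or an object');
  }

  // Strategy: the whole string is valid JSON.
  const direct = tryParse(input);
  if (direct.ok) return direct.value;

  // Strategy: JSON wrapped in a markdown code fence.
  const fenced = input.match(fencePattern);
  if (fenced) {
    const inner = tryParse(fenced[1]);
    if (inner.ok) return inner.value;
  }

  // Keep the original SyntaxError as the cause for debugging.
  throw new Error('Could not parse agent output as JSON', {
    cause: direct.cause,
  });
};

const wrapped = `Here you go:\n${fence}json\n[{"id": 1}]\n${fence}`;
console.log(parseAgentJson('[{"id": 1}]')); // [ { id: 1 } ]
console.log(parseAgentJson(wrapped)); // [ { id: 1 } ]
console.log(parseAgentJson([{ id: 1 }])); // [ { id: 1 } ]
```

Attaching the first SyntaxError as `cause` lets a test check the cause's name rather than only that some cause exists, which mirrors the stronger assertion described in the commit notes.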

ericelliott · 2026-02-20T05:10:10Z

I'm okay with the strategies here.

ianwhitedeveloper and others added 3 commits February 18, 2026 08:07

ianwhitedeveloper mentioned this pull request Feb 21, 2026

fix(ai): Retroactive review remediation — PR 1-3 findings #412

Draft

ianwhitedeveloper wants to merge 3 commits into master from ai-testing-framework-implementation-consolidation
