
feat(ai): Utilities — Debug Logger, Concurrency Limiter, TAP YAML [PR 2/7]#408

Merged
ianwhitedeveloper merged 2 commits into ai-testing-framework-implementation-consolidation from pr/ai-utilities on Feb 18, 2026

@ianwhitedeveloper
Collaborator

Summary

PR 2 of 7 in the staged consolidation of draft PR #394 — the AI Testing Framework implementation.

This PR adds three foundational utility modules that all higher-level modules (parsers, runner, CLI) depend on. It is self-contained with no dependencies beyond PR 1 (already merged into the consolidation branch).

Consolidation strategy: PR #394 has been marked draft and its work is being decomposed into 7 small, focused PRs in dependency order, each targeting this consolidation branch. See PR #394 for full context and history.


Files Added

| File | Purpose |
|---|---|
| source/debug-logger.js | Factory that creates a debug logger writing to console.error and/or a log file, with buffered flush |
| source/debug-logger.test.js | 8 tests covering console output, file writing, timestamps, object serialization, structured methods, and no-op flush |
| source/limit-concurrency.js | Sliding-window async concurrency limiter: runs up to `limit` tasks concurrently and preserves result order |
| source/limit-concurrency.test.js | 5 tests (written fresh; the file was missing from PR #394); uses vi.useFakeTimers() for deterministic verification of the concurrency cap |
| source/tap-yaml.js | Parser for judge agent TAP YAML diagnostic blocks (--- … ---); throws ParseError (from PR 1 error-causes) on invalid input |
| source/tap-yaml.test.js | 6 tests covering the passed boolean, score number coercion, quoted/unquoted strings, missing fields, and the error path |

Total: 582 lines added across 6 files; excluding blank lines, 53 are source and 472 are tests.


Epic Requirements Covered

From the AI Testing Framework epic:

  • ✅ Debug logging infrastructure for agent subprocess tracing
  • ✅ Parallel execution with concurrency limits (supports --runs N with bounded parallelism)
  • ✅ TAP YAML parsing for judge agent response evaluation

WIP Issues Fixed (from consolidation plan)

| # | Issue | Resolution |
|---|---|---|
| 6 | Redundant test comments | Cleaned throughout |
| | @paralleldrive/cuid2 imported in test but not in package.json | Replaced with mkdtempSync from Node fs |
| | limit-concurrency.test.js missing entirely from PR #394 | Written from scratch |

Cleanup Applied (js.mdc)

  • debug-logger.js: flush() loop → single appendFileSync(logFile, buffer.join('')) (one I/O call instead of N)
  • debug-logger.js: formatMessage collapsed to concise arrow expression
  • tap-yaml.js: for...of YAML parse loop → functional .map().filter().reduce() pipeline
  • limit-concurrency.js: for...of with await retained (sequential task launch is required to throttle pool entry — functional reduce would serialize, not parallelize); added clarifying comment
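A minimal sketch of the sliding-window pattern described above, assuming limitConcurrency(tasks, limit) takes an array of zero-argument functions that return promises (the actual signature and body of source/limit-concurrency.js are not shown in this PR page). The for...of loop only awaits Promise.race when the pool is full, so new tasks wait for a free slot while tasks already in flight keep running in parallel.

```javascript
// Sketch of a sliding-window concurrency limiter (assumed signature:
// limitConcurrency(tasks, limit), where each task is () => Promise).
const limitConcurrency = async (tasks, limit) => {
  const results = []; // one promise per task, kept in input order
  const executing = []; // promises for tasks currently in flight

  for (const task of tasks) {
    // Promise.resolve().then(task) also turns a synchronous throw into a rejection
    const promise = Promise.resolve().then(task);
    results.push(promise);

    // Remove the task from the pool once it settles, whether it succeeds or fails;
    // failures are still reported through Promise.all(results) below.
    const tracked = promise
      .catch(() => {})
      .then(() => {
        executing.splice(executing.indexOf(tracked), 1);
      });
    executing.push(tracked);

    // Pool is full: pause the loop (not the running tasks) until one slot frees up
    if (executing.length >= limit) {
      await Promise.race(executing);
    }
  }

  // Fail-fast: rejects with the first task error, otherwise resolves in input order
  return Promise.all(results);
};
```

A reduce that awaited each task before starting the next would run them strictly one at a time, which is the serialization the cleanup note warns about.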

Test Results

✓ source/debug-logger.test.js      (8 tests)
✓ source/limit-concurrency.test.js (5 tests)
✓ source/tap-yaml.test.js          (6 tests)
✓ source/ai-errors.test.js         (31 tests)  ← PR 1, unaffected
✓ source/constants.test.js         (34 tests)  ← PR 1, unaffected
✓ source/vitest.test.jsx           (6 tests)

Tests: 90 passed | Lint: clean | TypeScript: clean

Dependencies

  • Depends on: PR 1 (ai-errors.js, constants.js) — already merged into consolidation branch
  • Required by: PR 3 (parsers + execute-agent), PR 4 (config), PR 5 (runner), PR 6 (CLI)

PR Sequence

| PR | Module | Status |
|---|---|---|
| 1 | Foundation: Error Types + Constants | ✅ Merged |
| 2 | Utilities: Debug Logger, Concurrency Limiter, TAP YAML | 👆 This PR |
| 3 | Parsers + Execute Agent | ⏳ Pending |
| 4 | Config + Validation | ⏳ Pending |
| 5 | Test Extractor + Core Runner | ⏳ Pending |
| 6 | Test Output + CLI | ⏳ Pending |
| 7 | E2E Tests + Fixtures + Config | ⏳ Pending |

Made with Cursor

ianwhitedeveloper and others added 2 commits February 18, 2026 08:15
… 2/7]

- Add createDebugLogger: console + file logging with buffer/flush
- Add limitConcurrency: sliding-window async concurrency limiter
- Add parseTAPYAML: parse judge agent TAP YAML diagnostic blocks
- Add limit-concurrency.test.js (missing from PR #394)
- Apply js.mdc cleanup: flush loop → single write, for-of → reduce pipeline
- Replace @paralleldrive/cuid2 (not in deps) with mkdtempSync in debug-logger.test.js

Co-authored-by: Cursor <cursoragent@cursor.com>
- Collapse formatMessage to concise arrow expression
- Add comment to limit-concurrency for-of loop (justified async pattern)
- Add flush no-op test when logFile is not configured
- Use vi.useFakeTimers() in concurrency-cap test for determinism

Co-authored-by: Cursor <cursoragent@cursor.com>
@ianwhitedeveloper
Collaborator Author

@cursor please /review

cursor bot commented Feb 18, 2026

🔬 Code Review Complete

I've conducted a thorough review of PR 2/7 (Utilities — Debug Logger, Concurrency Limiter, TAP YAML). Here's my assessment:

✅ Overall Status: APPROVED — High Quality, Production-Ready

Test Results:

  • ✓ 90 tests passing (8 + 5 + 6 new utility tests + 71 existing)
  • ✓ Lint clean
  • ✓ TypeScript checks clean

Code Quality Assessment

debug-logger.js — Excellent

  • Functional programming: Good use of map(), arrow functions, destructuring
  • Clear API: Well-documented JSDoc with proper parameter defaults
  • Performance: Single appendFileSync call in flush() is optimal
  • Naming: Clear verb-based method names (log, command, flush)
  • Immutability: Proper use of const; buffer mutation is an acceptable optimization for internal state

limit-concurrency.js — Excellent

  • Async patterns: Clean use of Promise.race() and Promise.all()
  • Performance-critical: The for...of loop with await is correct—sequential task launch is required to throttle pool entry (functional reduce would serialize, not parallelize)
  • Documentation: Clear comment explaining why imperative loop is retained
  • Edge cases: Properly handles empty arrays and task failures

tap-yaml.js — Excellent

  • Functional pipeline: Beautiful .split().map().filter().reduce() chain
  • Error handling: Proper use of createError with structured diagnostic data
  • Type coercion: Clean handling of passed (boolean) and score (number)
  • Regex: the strict /^---\s*\n...\n---\s*$/m pattern only accepts a block delimited by --- lines, so malformed judge responses are rejected instead of being loosely parsed
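The pipeline and type coercion described above can be sketched as follows. This is an illustration under stated assumptions, not the merged tap-yaml.js: the capture-group version of the delimiter regex, the one-line key: value grammar, and the createParseError stand-in (the real module throws ParseError built with error-causes from PR 1) are all hypothetical.

```javascript
// Stand-in for the project's ParseError (assumption: the real one comes from
// error-causes in PR 1); a plain Error with name 'ParseError' keeps this dependency-free.
const createParseError = (message, input) => {
  const error = new Error(message);
  error.name = 'ParseError';
  error.input = input;
  return error;
};

// Strip one matching pair of surrounding single or double quotes
const unquote = (raw) => raw.replace(/^(['"])(.*)\1$/, '$2');

// Coerce the two typed fields; everything else stays a string
const coerceValue = (key, raw) => {
  const value = unquote(raw);
  if (key === 'passed') return value === 'true';
  if (key === 'score') return Number(value);
  return value;
};

const parseTAPYAML = (output) => {
  // Lazily capture everything between an opening and closing --- line
  const match = output.match(/^---\s*\n([\s\S]*?)\n---\s*$/m);
  if (!match) throw createParseError('No TAP YAML block found', output);

  return match[1]
    .split('\n')
    .map((line) => line.match(/^\s*(\w+):\s*(.*?)\s*$/)) // "key: value" lines
    .filter(Boolean) // drop lines that are not key: value pairs
    .reduce((acc, [, key, raw]) => ({ ...acc, [key]: coerceValue(key, raw) }), {});
};
```

Missing fields simply stay undefined on the result object, which matches the "missing fields" test case listed for tap-yaml.test.js.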

Test Quality Assessment

All tests follow Riteway TDD principles and answer the 5 questions:

debug-logger.test.js (8 tests):

  • ✅ Proper isolation with beforeEach/afterEach spies
  • ✅ Cleanup with rmSync in finally blocks
  • ✅ Tests both console and file output paths
  • ✅ Validates timestamp format and JSON serialization

limit-concurrency.test.js (5 tests):

  • ✅ Excellent use of vi.useFakeTimers() for deterministic concurrency verification
  • ✅ Tests concurrency cap enforcement (max 2 concurrent)
  • ✅ Tests result ordering preservation
  • ✅ Tests error propagation with Try

tap-yaml.test.js (6 tests):

  • ✅ Tests boolean parsing (passed: true/false)
  • ✅ Tests type coercion (score → number)
  • ✅ Tests quoted/unquoted strings
  • ✅ Tests error path with ParseError validation

Adherence to Project Standards

javascript.mdc compliance:

  • Functional programming patterns throughout
  • Minimal mutation (only for performance-critical internal state)
  • Clear, verb-based naming
  • Proper destructuring and defaults in function signatures

tdd.mdc compliance:

  • Tests are Readable, Isolated, Thorough, Explicit (RITE)
  • All tests use given/should assertions
  • No shared mutable state between tests

Cleanup from PR description:

  • flush() → single appendFileSync (not N calls) ✓
  • tap-yaml.js → functional pipeline ✓
  • limit-concurrency → retained for...of with explanatory comment ✓
  • @paralleldrive/cuid2 → replaced with mkdtempSync ✓

Minor Observations (Not Blockers)

  1. Buffer mutation in debug-logger.js (line 49): buffer.length = 0 is a micro-optimization. A more functional approach would be reassigning the buffer, but this is acceptable for internal state management.

  2. Mutation of the executing array in limit-concurrency.js (line 15): executing.splice() mutates the array in place. Given the performance-critical nature of concurrency management, this is acceptable.

These are intentional performance optimizations and do not violate the spirit of the functional programming guidelines.
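The trade-off in observation 1 can be made concrete. Setting length to 0 and calling splice both empty the same array object, so every reference to it sees the change; reassigning instead requires a let binding (itself a mutation, of the variable rather than the array) and leaves any other reference pointing at the old entries. The variable names below are illustrative only.

```javascript
// In-place clearing via length: the same array object is reused
const viaLength = ['a', 'b'];
const lengthAlias = viaLength;
viaLength.length = 0;

// In-place clearing via splice: same observable effect, also a mutation
const viaSplice = ['a', 'b'];
const spliceAlias = viaSplice;
viaSplice.splice(0, viaSplice.length);

// Reassignment: rebinds the variable; the alias still holds the old entries
let viaReassign = ['a', 'b'];
const reassignAlias = viaReassign;
viaReassign = [];
```

Inside a closure-based logger the buffer has no outside aliases, so the choice between these is largely stylistic; reassignment would only change behavior if other code held a reference to the array.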

Recommendation

✅ LGTM — This PR is ready to merge. The code is clean, well-tested, follows all project standards, and represents the quality expected from a top-tier software engineering team.

The PR successfully delivers the three foundational utilities needed for the AI Testing Framework while maintaining excellent code quality and comprehensive test coverage.


ianwhitedeveloper merged commit 5bcb456 into ai-testing-framework-implementation-consolidation on Feb 18, 2026
ianwhitedeveloper deleted the pr/ai-utilities branch on February 18, 2026 14:41
Comment on lines +12 to +18
beforeEach(() => {
  consoleErrorSpy = vi.spyOn(console, 'error').mockImplementation(() => {});
});

afterEach(() => {
  consoleErrorSpy.mockRestore();
});
Collaborator

NIT: Should we use onTestFinished instead, to keep tests self-contained and explicit?

Collaborator Author

Remediated in PR #412: replaced the shared beforeEach/afterEach spy pattern with per-test onTestFinished(() => consoleErrorSpy.mockRestore()) in every test that creates a spy. The file-based tests' try/finally { rmSync(...) } cleanup has also been migrated to onTestFinished for full consistency. See: #412

Collaborator

Good things: Clean factory function (no class), destructured options with defaults, buffered writes, composable design. Solid overall.

Issues to address:

  1. Mutation via buffer.length = 0 — This is a subtle mutation pattern. Consider replacing the buffer entirely or using a more explicit approach:
const flush = () => {
  if (!logFile || buffer.length === 0) return;
  appendFileSync(logFile, buffer.join(''));
  buffer.splice(0, buffer.length);
};

Though honestly, since buffer is internal mutable state by design (accumulating log entries), this is a pragmatic trade-off. If you wanted to go fully immutable you'd need a different architecture (e.g., returning new logger instances). The length = 0 trick is just less idiomatic than splice(0) — both mutate.

  2. formatMessage does not guard JSON.stringify: circular references will throw a TypeError. Consider:
const formatMessage = (parts) =>
  parts.map(part => {
    if (typeof part !== 'object') return String(part);
    try { return JSON.stringify(part); }
    catch { return '[Circular]'; }
  }).join(' ');
  3. process naming collision: exporting as process shadows Node's global process. The internal rename to logProcess shows awareness, but consumers doing const { process } = createDebugLogger() will shadow the global. Consider naming it logProcess in the public API too, or something like output.

  4. writeToFile is misleading: it doesn't write to a file; it writes to a buffer. Call it bufferMessage or enqueue.

  5. No flush on exit: if the process crashes, buffered logs are lost. Consider registering process.on('exit', flush), or document that the consumer is responsible for calling flush.

  6. Missing logFile validation: if someone passes logFile: 123, appendFileSync will interpret it as a file descriptor and fail (typically with EBADF) at flush time, far from the source of the bug. A guard in the factory would surface this earlier.

Minor nits:

  • The docblock says @returns {Object} — could be more specific with @returns {{ log, command, process, result, flush }} or a @typedef.
  • The args = [] default in command is fine, but the function could just use rest params: const command = (cmd, ...args) => log(`Command: ${cmd} ${args.join(' ')}`).
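Taken together, the suggestions above could look roughly like the following sketch. It is not the code merged in PR #412: the timestamp format, the logProcess and result payloads, and the injected appendToFile writer (used so the sketch needs no imports; in Node you would pass fs.appendFileSync) are all assumptions for illustration.

```javascript
// Hypothetical sketch combining the review suggestions; not the merged code.
const formatMessage = (parts) =>
  parts
    .map((part) => {
      if (typeof part !== 'object') return String(part);
      try {
        return JSON.stringify(part);
      } catch {
        // JSON.stringify throws a TypeError on circular structures
        return '[Circular]';
      }
    })
    .join(' ');

const createDebugLogger = ({ debug = false, logFile, appendToFile } = {}) => {
  // Fail at construction time instead of at flush time (logFile validation)
  if (logFile !== undefined && typeof logFile !== 'string') {
    throw new TypeError(`logFile must be a string, received ${typeof logFile}`);
  }

  const buffer = []; // pending log lines, written out by flush()

  // Named for what it does: queue the line, no file I/O yet
  const bufferEntry = (line) => {
    if (logFile) buffer.push(line);
  };

  const log = (...parts) => {
    const line = `[${new Date().toISOString()}] ${formatMessage(parts)}\n`;
    if (debug) console.error(line.trimEnd());
    bufferEntry(line);
  };

  // Rest params instead of an args = [] default
  const command = (cmd, ...args) => log(`Command: ${cmd} ${args.join(' ')}`);
  // Exported as logProcess so destructuring never shadows Node's global process
  const logProcess = (info) => log('Process:', info);
  const result = (value) => log('Result:', value);

  // The caller is responsible for calling flush() before exit
  const flush = () => {
    if (!logFile || buffer.length === 0) return;
    appendToFile(logFile, buffer.join('')); // one I/O call for all entries
    buffer.length = 0;
  };

  return { log, command, logProcess, result, flush };
};
```

Injecting the writer also keeps the factory free of module-level side effects, which fits the decision to document flush-on-exit rather than register process.on('exit') inside the factory.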

Collaborator Author

Remediated in PR #412; all 6 issues addressed:

  1. buffer.length = 0 → kept (pragmatic internal state; noted as a known deviation from the immutability preference)
  2. Circular-reference guard added to formatMessage (try/catch returns '[Circular]')
  3. process export renamed to logProcess, so there is no more global shadowing
  4. writeToFile renamed to bufferEntry, which accurately describes deferred buffering
  5. Flush-on-exit responsibility documented in JSDoc; process.on('exit') avoided (it would add a side effect to the factory)
  6. logFile type guard added: throws TypeError at factory call time, not at flush time

Minor nits: @returns updated with the specific method list; command(cmd, args = []) converted to rest params command(cmd, ...args).
See: fix(ai): Retroactive review remediation — PR 1-3 findings #412

}
}

return Promise.all(results);
Collaborator

If one task throws, Promise.all rejects and all other results are lost. Should we consider Promise.allSettled or document the fail-fast behavior explicitly?

Collaborator Author

Remediated in PR #412: limitConcurrency JSDoc now explicitly documents the fail-fast semantics — "if any task rejects, the returned promise rejects immediately and remaining results are discarded." Behavior kept as-is (intentional fail-fast for the test runner use case). See: #412
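If a caller ever needs every outcome despite the fail-fast contract, one option is to wrap each task so it never rejects, mirroring the result shape of Promise.allSettled. This is a sketch: settle and runAll are hypothetical helpers, and the wrapping assumes limitConcurrency accepts an array of zero-argument, promise-returning functions, so the same wrapped tasks could be passed to it instead of runAll.

```javascript
// Hypothetical helper: wrap a task so it always resolves to an
// allSettled-style outcome instead of rejecting.
const settle = (task) => () =>
  Promise.resolve()
    .then(task)
    .then(
      (value) => ({ status: 'fulfilled', value }),
      (reason) => ({ status: 'rejected', reason }),
    );

// Fail-fast runner for comparison: Promise.all rejects on the first failure
const runAll = (tasks) => Promise.all(tasks.map((task) => task()));
```

Keeping fail-fast as the default and opting into settled outcomes per call site leaves limitConcurrency itself unchanged.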

results.push(promise);
executing.push(promise);

if (executing.length >= limit) {
Collaborator

A negative or zero limit would cause sequential rather than concurrent execution, wouldn't it?

Collaborator Author

Remediated in PR #412: added a RangeError guard at the top of limitConcurrency: if (!Number.isInteger(limit) || limit < 1) throw new RangeError(...). This covers zero, negative, non-integer, and NaN inputs. Two separate tests were added: 'throws RangeError for zero limit' and 'throws RangeError for negative limit'. See: #412
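Why the guard matters: with limit ≤ 0, the check executing.length >= limit is true after every push, so the loop awaits Promise.race after each task and only one task is ever in flight. The guard can be isolated as below; assertValidLimit is a hypothetical name, and the real error message is elided in the comment above, so the message here is illustrative.

```javascript
// Hypothetical extraction of the guard described above.
const assertValidLimit = (limit) => {
  // Rejects 0, negatives, non-integers, NaN, Infinity, and non-numbers
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, received ${limit}`);
  }
  return limit;
};
```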

ianwhitedeveloper added a commit that referenced this pull request Feb 21, 2026
- test(ai-errors): remove error-causes API tests; keep only
  handleAIErrors behavioral routing (ericelliott/janhesters #407)
- test(constants): remove defaults/constraints value-only blocks;
  replace tautological expected: defaults.X with literals (ericelliott #407)
- fix(debug-logger): rename writeToFile→bufferEntry, process→logProcess
  export; add logFile type guard; circular ref safety in formatMessage;
  command() rest params; improved JSDoc (janhesters #408)
- test(debug-logger): onTestFinished for all teardown; add circular
  ref and logFile TypeError tests; flush no-op debug:false (janhesters #408)
- fix(limit-concurrency): guard non-positive limit with RangeError;
  onTestFinished for fake timer teardown; document fail-fast (janhesters #408)
- test(agent-parser): replace partial assertions with full expected
  values including ndjsonLength (janhesters #409)
- test(extraction-parser): replace 4x multi-assert blocks with single
  full-object assertions (janhesters #409)

Co-authored-by: Cursor <cursoragent@cursor.com>

Labels

None yet

Projects

None yet

2 participants