sprint/WINTEST: functional Pester test suite for Windows engine #369

Closed
muunkky wants to merge 95 commits into PeonPing:main from muunkky:sprint/WINTEST

muunkky commented Mar 16, 2026

Stacks on #365 (SMARTPACK) and #366 (SMARTPACKDEBT). Review scope is the test suite only — 8 files, ~2,800 lines.

Motivation

The Windows hook engine (peon.ps1 embedded in install.ps1) had zero functional tests. Structural syntax checks existed in adapters-windows.Tests.ps1, but nothing verified that events actually route to the correct CESP categories, that config toggles suppress sounds, that state management (debounce, no-repeat, spam detection, TTL expiry) works, or that security boundaries hold. The SMARTPACKDEBT fixes were written and reviewed without any way to run them on Windows. This sprint builds the test infrastructure that proves the engine works.

What changed

Shared test harness (tests/windows-setup.ps1)

New-PeonTestEnvironment extracts the embedded peon.ps1 from install.ps1 using AST parsing, creates an isolated temp directory with mock win-play.ps1 (logs calls instead of playing audio), mock pack manifests with real CESP category structure, default config, and empty state. Supports ConfigOverrides and StateOverrides for per-test customization. Invoke-PeonHook pipes CESP JSON to the extracted script and captures exit code + audio log. New-CespJson generates well-formed event payloads.
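The extraction step can be illustrated with a minimal sketch. It assumes the embedded script lives in a single-quoted here-string and is identified by a marker comment; the `$installer` text, the variable names, and the selection logic are hypothetical stand-ins, not the harness's actual code.

```powershell
# Hypothetical stand-in for install.ps1: a script that embeds another script
# in a single-quoted here-string. The real file is much larger.
$installer = @(
    '$hookScript = @'''
    '# peon-ping hook for Claude Code'
    'Write-Output "hello from hook"'
    "'@"
    'Set-Content -Path peon.ps1 -Value $hookScript'
) -join "`n"

# Parse the text into an AST without executing any of it.
$tokens = $null
$parseErrors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($installer, [ref]$tokens, [ref]$parseErrors)

# Collect every single-quoted here-string in the parsed script.
$hereStrings = $ast.FindAll({
    param($node)
    $node -is [System.Management.Automation.Language.StringConstantExpressionAst] -and
    $node.StringConstantType -eq 'SingleQuotedHereString'
}, $true)

# Pick the here-string that carries the marker comment (assumed marker text).
$hook = $hereStrings |
    Where-Object { $_.Value -like '*# peon-ping hook for Claude Code*' } |
    Select-Object -First 1

# The here-string body is the embedded script source.
$hookSource = $hook.Value
```

Selecting by marker rather than by position keeps the sketch from silently picking the wrong block if a second here-string is added later.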

Engine tests (tests/peon-engine.Tests.ps1 — 17 scenarios)

Event routing (SessionStart, Stop, PermissionRequest, PostToolUseFailure, SubagentStart), notification suppression (permission_prompt, idle_prompt), Cursor camelCase remapping, config behavior (enabled toggle, category toggles, volume passthrough, missing config resilience), and state management (Stop debounce, no-repeat logic, spam detection threshold, session TTL expiry, corrupted state recovery, empty stdin).

Adapter tests (tests/peon-adapters.Tests.ps1 — 12 scenarios)

Functional tests for all 12 Windows PowerShell adapters: each adapter is invoked with a mock event and verified to produce correct CESP JSON output with proper hook_event_name mapping. Also validates daemon mode flags, FileSystemWatcher usage patterns, and absence of ExecutionPolicy Bypass.

Security tests (tests/peon-security.Tests.ps1 — 15 scenarios)

hook-handle-use.ps1: pack name injection, path traversal, nonexistent pack handling, CLI vs hook mode behavior. win-play.ps1: path traversal in file argument, volume clamping (negative, >1.0, non-numeric), missing file handling, backend-specific argument validation (ffplay, mpv, vlc, pwsh MediaPlayer).
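The kind of input handling these tests probe can be sketched as follows. This is an illustration under assumptions, not the code in win-play.ps1 or hook-handle-use.ps1: the function names, the 0.5 fallback volume, and the allowed character set are all hypothetical.

```powershell
# Hypothetical helper: clamp a raw volume argument into the range 0.0 to 1.0.
# Non-numeric input falls back to an assumed default instead of throwing.
function Get-ClampedVolume {
    param([string]$Raw, [double]$Default = 0.5)
    $parsed = 0.0
    $style = [System.Globalization.NumberStyles]::Float
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    if (-not [double]::TryParse($Raw, $style, $culture, [ref]$parsed)) { return $Default }
    if ([double]::IsNaN($parsed)) { return $Default }
    return [Math]::Max(0.0, [Math]::Min(1.0, $parsed))
}

# Hypothetical helper: accept only names built from letters, digits, '_' and '-'.
# That rejects path separators, dots, and shell metacharacters in one check.
# \z (not $) is used so a trailing newline cannot slip through.
function Test-PackName {
    param([string]$Name)
    return $Name -match '^[A-Za-z0-9_-]+\z'
}

$volumes = '-3', '7', 'abc', '0.25' | ForEach-Object { Get-ClampedVolume $_ }
$names = 'peon', '..\..\Windows', 'peon; calc' | ForEach-Object { Test-PackName $_ }
```

An allow-list is used here because it is easier to reason about than enumerating every dangerous character.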

Pack selection tests (tests/peon-packs.Tests.ps1 — 7 scenarios)

Pack selection hierarchy: default_pack config, default_pack with active_pack legacy fallback, session_override via state, round-robin rotation, random rotation, session_override priority over rotation. Scenarios 4-7 (path_rules) deferred pending engine port.
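As a rough illustration of the precedence these scenarios exercise, the sketch below resolves a pack from a session override, then rotation, then default_pack, then the legacy active_pack, then a hardcoded "peon". The function name, parameter shapes, and pack names are assumptions made for the example; path_rules is left out because it has not been ported to peon.ps1.

```powershell
# Hypothetical resolver; not the engine's actual Get-ActivePack implementation.
function Resolve-PackSketch {
    param(
        [hashtable]$Config = @{},
        [string]$SessionOverride = ''
    )

    # 1. A per-session override wins over everything else.
    if ($SessionOverride) { return $SessionOverride }

    # 2. Rotation: pick one entry from a non-empty list (random mode only here).
    $rotation = @($Config.pack_rotation | Where-Object { $_ })
    if ($rotation.Count -gt 0) { return $rotation | Get-Random }

    # 3. Current config key, then 4. the legacy key, then 5. the hardcoded default.
    if ($Config.default_pack) { return $Config.default_pack }
    if ($Config.active_pack) { return $Config.active_pack }
    return 'peon'
}

$fromDefault = Resolve-PackSketch -Config @{ default_pack = 'glados' }
$fromLegacy = Resolve-PackSketch -Config @{ active_pack = 'peasant' }
$fromOverride = Resolve-PackSketch -Config @{ pack_rotation = @('a', 'b') } -SessionOverride 'murloc'
```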

CI integration

.github/workflows/test.yml: switched from explicit file list to Get-ChildItem -Filter *.Tests.ps1 auto-discovery so new test files are picked up automatically.
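The discovery filter can be tried in isolation. The sketch below builds a throwaway directory with hypothetical file names and shows that only `*.Tests.ps1` files are selected, so helper scripts such as windows-setup.ps1 are left out. How the resulting list is handed to Pester in the real workflow is not shown in this description, so the final commented lines are an assumption based on the Pester 5 configuration object.

```powershell
# Create a temporary directory that mimics a tests/ folder (hypothetical contents).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('peon-discovery-' + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $dir | Out-Null
foreach ($name in 'peon-engine.Tests.ps1', 'peon-packs.Tests.ps1', 'windows-setup.ps1') {
    New-Item -ItemType File -Path (Join-Path $dir $name) | Out-Null
}

# Auto-discovery: anything ending in .Tests.ps1 is a test file, everything else is ignored.
$discovered = Get-ChildItem -Path $dir -Filter '*.Tests.ps1' |
    Sort-Object Name |
    Select-Object -ExpandProperty Name

# Clean up the temporary directory.
Remove-Item -LiteralPath $dir -Recurse -Force

# In CI, the full paths could then be passed to Pester 5 (assumed wiring):
#   $config = New-PesterConfiguration
#   $config.Run.Path = $testFilePaths
#   Invoke-Pester -Configuration $config
```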

Fixes found during testing

  • install.ps1: removed stray backticks from merge artifacts causing parse errors
  • install.ps1: hardened config serialization extraction regex in the test harness
  • install.ps1: fixed stale active_pack test assertions from SMARTPACK renames
  • install.ps1: fixed session_override rotation mode handling

Verification

Real Windows 10 run (PowerShell 5.1):

Tests Passed: 46, Failed: 0, Skipped: 1 (74.14s)

The 1 skipped test is Scenario 14 (spam detection after 3 rapid UserPromptSubmit events) — blocked by a known ConvertTo-Hashtable bug where PS 5.1 pipeline semantics unwrap single-element arrays. Tracked as card 8ny6qr.
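The pipeline behavior behind this class of bug can be reproduced in a few lines. This is a generic demonstration of PowerShell output semantics, not the ConvertTo-Hashtable code itself; both function names are hypothetical.

```powershell
# Returning an array writes its elements to the pipeline one by one.
# With a single element, the caller receives a scalar, not an array.
function Get-ItemsUnwrapped {
    param($Items)
    return $Items
}

# The unary comma wraps the array in an outer array, so the pipeline
# unrolls only the wrapper and the original array arrives intact.
function Get-ItemsPreserved {
    param($Items)
    return , $Items
}

$timestamps = @(1710000000)

$unwrapped = Get-ItemsUnwrapped $timestamps
$preserved = Get-ItemsPreserved $timestamps

$unwrapped -is [array]   # False: the single element arrived as a scalar
$preserved -is [array]   # True: the array survived the function boundary
```

Code that later appends to the value, as a timestamp list would need, behaves differently depending on which of the two shapes it receives.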

Risks and limitations

  • The test harness extracts peon.ps1 by parsing the install.ps1 here-string. If the embedding format changes, Extract-PeonHookScript will need updating.
  • Scenario 14 (spam detection) is skipped, not passing. The underlying engine bug exists in production.
  • path_rules test scenarios 4-7 are deferred because the matching engine has not been ported to peon.ps1 yet (card rd6fu4).

Deferred work

Card    Description
8ny6qr  Fix ConvertTo-Hashtable array corruption (unblocks Scenario 14)
rd6fu4  Port path_rules to peon.ps1 + test scenarios 4-7
d3c6b0  Remove duplicate deepagents structural tests from peon-adapters
n5uqeo  Tighten security test assertion precision (VLC gain regex, exit code check)

muunkky added 30 commits March 13, 2026 18:45
…erride

Completes the config key migration across all components:
- Adapters (kilo, opencode): config templates use default_pack
- TypeScript plugins: PeonConfig interface uses default_pack with
  active_pack as optional legacy fallback
- install.ps1: all CLI commands and hook runtime read default_pack
  first with active_pack fallback, regex replacements write default_pack
- install.sh: test sound lookup uses default_pack with fallback
- hook-handle-use scripts: write session_override instead of agentskill
- Skills docs: updated terminology throughout
- Tests: updated assertions to match new key names, added legacy
  fallback test for TypeScript resolveActivePack

Reviewer approved commit 3f5a1f0. Routed executor close-out instructions
and 1 planner card (DRY install.ps1 pack resolution) for sprint SMARTPACK.

All acceptance criteria verified as pre-existing: fnmatch-based
path_rules matching in peon.sh, config.json template, override
hierarchy, CLI commands (bind/unbind/bindings), and 9 BATS tests.
peon.ps1 marked N/A (file does not exist in repo).

Verification-only card. All acceptance criteria confirmed as pre-existing
in peon.sh, config.json, and BATS tests. No blockers found.

When a path_rule matches the current working directory, `peon status`
now displays the matching rule (e.g., `path rule: */work/* -> glados`)
in addition to the total count of configured rules.

Remove synchronous MediaPlayer/PresentationCore from peon.ps1 hook and
win-play.ps1 to eliminate P0 deadlock caused by WPF dispatcher in
headless PowerShell processes.

- peon.ps1: replace inline audio block with Start-Process delegation
  to win-play.ps1 in a detached hidden window
- peon.ps1: add 8-second System.Timers.Timer self-timeout before any
  I/O as safety net against unforeseen blocking
- win-play.ps1: keep SoundPlayer for WAV, replace MediaPlayer with
  CLI player priority chain (ffplay -> mpv -> vlc) for non-WAV
- install.ps1: print ffmpeg recommendation post-install if ffplay
  not found on PATH
- Update Pester tests: assert zero MediaPlayer/PresentationCore refs,
  verify Start-Process delegation, Timer, and CLI player chain

Review 1 approved at commit 57964e9. Routed executor close-out
instructions and 1 backlog card (audio diagnostic logging) to planner.

Add write_state()/read_state() Python helpers (tempfile + os.replace)
and Write-StateAtomic/Read-StateWithRetry PowerShell functions
(PID-based temp + [System.IO.File]::Move). Replace all raw
json.dump/Set-Content state writes and json.load/Get-Content state
reads across peon.sh (main block + trainer blocks) and install.ps1
(embedded peon.ps1). Retry-on-read uses 50/100/200ms backoff with
graceful fallback to empty defaults on corruption. BATS tests added
for corrupted state recovery and concurrent Stop event safety.
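
The write-then-move and retry-on-read pattern described in this commit might look roughly like the sketch below. It is an illustration only: the function names carry a Sketch suffix to mark them as hypothetical, and the delete-before-move step leaves a brief window in which the file does not exist, so it is weaker than a true atomic replace.

```powershell
# Hypothetical writer: serialize to a PID-based temp file, then move it into place.
function Write-StateAtomicSketch {
    param([string]$Path, [hashtable]$State)
    $tmp = "$Path.$PID.tmp"
    $State | ConvertTo-Json -Depth 10 | Set-Content -LiteralPath $tmp -Encoding UTF8
    # The two-argument File.Move fails if the target exists, so remove it first.
    # This is the non-atomic part of the sketch.
    if (Test-Path -LiteralPath $Path) { Remove-Item -LiteralPath $Path -Force }
    [System.IO.File]::Move($tmp, $Path)
}

# Hypothetical reader: retry with 50/100/200 ms backoff, then fall back to an empty object.
function Read-StateWithRetrySketch {
    param([string]$Path)
    foreach ($delayMs in 50, 100, 200) {
        try {
            $raw = Get-Content -LiteralPath $Path -Raw -ErrorAction Stop
            if ([string]::IsNullOrWhiteSpace($raw)) { throw 'state file is empty' }
            return ConvertFrom-Json -InputObject $raw -ErrorAction Stop
        } catch {
            Start-Sleep -Milliseconds $delayMs
        }
    }
    return [pscustomobject]@{}
}

# Usage with a temporary state file.
$statePath = Join-Path ([System.IO.Path]::GetTempPath()) ('peon-state-' + [guid]::NewGuid().ToString('N') + '.json')
Write-StateAtomicSketch -Path $statePath -State @{ count = 3 }
$state = Read-StateWithRetrySketch -Path $statePath

# Simulate corruption: the reader should give up and return empty defaults.
Set-Content -LiteralPath $statePath -Value '{not json'
$recovered = Read-StateWithRetrySketch -Path $statePath
Remove-Item -LiteralPath $statePath -Force
```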

Add Pack Selection Hierarchy table documenting the 5-layer override
system (session_override > path_rules > pack_rotation > default_pack >
hardcoded). Add Per-Project Pack Assignment section with bind/unbind CLI
examples and manual config. Add bind/unbind/bindings CLI commands to
Chinese README. Update llms.txt with hierarchy and bind/unbind context.

…tall.ps1

Consolidate the repeated default_pack -> active_pack -> "peon" fallback
chain into a single Get-ActivePack helper function. Replaces ~10 inline
expressions across both the installer script and the embedded peon.ps1
hook with calls to the helper. Also migrates the installer's initial
config creation from active_pack to default_pack, aligning with the
rename completed in peon.sh.

# Conflicts:
#	install.ps1
#	tests/adapters-windows.Tests.ps1

…ack refactor

Worktree merge for z0c9fd used --theirs which reverted HOOKBUG sprint
changes (atomic state, audio delegation, safety timer). Restored
install.ps1 from pre-merge state and manually applied Get-ActivePack
helper extraction. 204/204 Pester tests pass.

APPROVAL routed to executor for close-out. Two BACKLOG items
(Write-StateAtomic atomicity, ffplay install guidance) routed
to planner for card creation.

Approved at commit 0a67a57. Executor gets close-out instructions.
Planner gets 1 BACKLOG card (2 items: atomic state I/O hardening).

muunkky added 26 commits March 15, 2026 13:49
- opencode/kilo adapter tests: active_pack -> default_pack
- hook-handle-use test: agentskill -> session_override

Add tests/windows-setup.ps1 with reusable helper functions:
- Extract-PeonHookScript: extracts peon.ps1 from install.ps1 here-string
- New-PeonTestEnvironment: creates isolated temp dir with config, state,
  mock packs, and mock win-play.ps1 audio logger
- Invoke-PeonHook: pipes CESP JSON to peon.ps1 via Process API
- New-CespJson, Get-PeonState, Get-PeonConfig, Get-AudioLog helpers

Add tests/peon-engine.Tests.ps1 with 25 smoke tests validating the
harness infrastructure and core peon.ps1 functional behavior:
- Extraction produces valid PS syntax
- Test env creates all required files and accepts overrides
- SessionStart/Stop events play correct sounds
- Disabled config skips audio
- Mock win-play.ps1 logs calls without playing real audio

Update CI workflow to run both Pester test files.

Add clarifying comment in tests/windows-setup.ps1 explaining that
CLAUDE_PEON_DIR and PEON_TEST env vars exist for structural parity
with the BATS harness and are not consumed by peon.ps1.

16 integration tests covering:
- Pack name input validation (path traversal, shell injection, charset)
- Session ID sanitization (malicious IDs fallback to "default")
- Config/state mutation correctness (agentskill mode, pack_rotation)
- Hook mode vs CLI mode behavior (stdin JSON vs arg)
- win-play.ps1 WAV/MP3 branching (SoundPlayer vs CLI players)
- Volume clamping at boundaries (0.0, 1.0)
- Player priority chain (ffplay -> mpv -> vlc -> silent exit)

15 Pester tests covering the full pack selection override hierarchy:
- Default pack fallback (active_pack, empty fallback to "peon")
- Session override mode (per-session pack from state, agentskill alias)
- Session override fallback (unmatched session, missing pack cleanup)
- Default key for Cursor users without conversation_id
- Pack rotation (random selection from array, single-pack array)
- Edge cases (empty rotation, missing mode key, legacy string format)

path_rules tests are deferred as the feature is not yet implemented
in peon.ps1 (Windows) -- only exists in peon.sh (Unix).

New test file tests/peon-adapters.Tests.ps1 with 48 tests that actually
execute adapter scripts with controlled input and verify JSON output shape.

Category A (simple translators): codex, gemini, copilot, windsurf, kiro,
openclaw, deepagents -- event mapping verified via mock peon.ps1 stdin capture.

Category B (filesystem watchers): amp, antigravity, kimi -- pure functions
(Emit-Event, Process-WireLine) extracted and tested in isolation.

Category C (structural): deepagents.ps1 added to syntax validation and
ExecutionPolicy Bypass checks in adapters-windows.Tests.ps1.

Also adds edge case tests (missing peon.ps1, unknown events, no stdin)
and CESP JSON shape validation across all Category A adapters.

Implements all test scenarios from card 1dnbzv covering:
- Event routing: SessionStart, Stop, PermissionRequest, PostToolUseFailure,
  SubagentStart, Notification suppression, Cursor camelCase remap
- Config behavior: enabled:false, category toggles, volume passthrough,
  missing config resilience
- State management: Stop debounce, no-repeat sound selection, session TTL
  expiry, corrupted state recovery, empty stdin handling

Scenario 14 (spam detection) is skipped due to a production bug in
ConvertTo-Hashtable that corrupts prompt_timestamps arrays when reading
state back from JSON -- single-element arrays become hashtables, preventing
accumulation across invocations.

# Conflicts:
#	tests/peon-engine.Tests.ps1
#	tests/windows-setup.ps1

Adds the new functional adapter test file to the Pester Run.Path array
in test.yml so all 48 tests execute in CI on windows-latest.

Scenarios 1 and 7 asserted "agentskill" but hook-handle-use.ps1 sets
"session_override". Updated assertions and scenario 7 description to
match the actual source behavior. All 16 tests pass.

# Conflicts:
#	.github/workflows/test.yml

Item A: Replace brittle regex (?<=\d),(?=\d) with InvariantCulture
enforcement before ConvertTo-Json. The regex corrupted integer arrays
like [1,2,3] -> [1.2.3] on non-English locales. Now we save/restore
CurrentCulture around the serialization call.

Item B: Anchor here-string extraction on the unique marker comment
"# peon-ping hook for Claude Code" inside install.ps1. Previously the
regex hookScript = @'(.+?)'@ assumed exactly one here-string, which
would silently misextract if a second were added.

Card: WINTEST-xk4ymm

# Conflicts:
#	tests/windows-setup.ps1

Change $config.Run.Path from an explicit array of 5 test files to
"tests/" so Pester auto-discovers all *.Tests.ps1 files. This ensures
new test files are picked up without CI workflow edits.

windows-setup.ps1 and hookbug-integration.ps1 are not *.Tests.ps1
files, so Pester correctly ignores them.

- Step 2A-2D: 4 parallel test cards (event routing, adapters, security, packs)
- Step 2.5: harness hardening (locale serialization, extraction regex)
- Step 3: CI auto-discovery for all Pester test files
- Archived superseded card gtb6dm
- Umbrella card j30alo checkboxes updated

vercel bot commented Mar 16, 2026

@muunkky is attempting to deploy a commit to the Gary Sheng's projects Team on Vercel.

A member of the Team first needs to authorize it.

muunkky marked this pull request as draft March 16, 2026 05:01

muunkky commented Mar 16, 2026

Superseded by a clean PR that excludes .gitban/ project management content.

muunkky closed this Mar 16, 2026