docs: audit conditional-llm bypass justifications for pr #85 by petterlindstrom79 · Pull Request #86 · strale-io/strale

petterlindstrom79 · 2026-05-11T08:19:58Z

Summary

Follow-up audit to PR #85. Traces each of the 11 capabilities in `CONDITIONAL_LLM_CAPABILITIES` through its `known_answer.input` fixture to verify whether the bypass justification holds (i.e. the scheduled-test path does NOT invoke the Anthropic SDK).

Result: 9 CLEAN, 2 LEAKY, 0 AMBIGUOUS.

LEAKY: `us-company-data` (fixture is `"AAPL"` — ticker, not CIK; `findCik` regex fails → LLM fires), `website-to-company` (LLM is the primary extraction path, not a fallback; also chains into `norwegian-company-data` which triggers a second LLM call).
CLEAN: 8 country-data caps (numeric registry-code fixtures match regex, direct API path) + `container-track` (fixture `"test_value"` triggers invalid-format early-return).

Documentation hygiene found in 2 of the CLEAN caps (bypass comment doesn't match the actual code mechanism) — `brazilian-company-data` has unreachable SDK code; `container-track`'s comment claims a well-known carrier prefix but the fixture is a placeholder string.

Impact for T+48h Anthropic Console check

LEAKY caps' contribution: ~35K Haiku tokens/day, ~1.4% of pre-PR-#85 daily volume. Negligible relative to the main fix. Expected total residual ~10–32% (the wide range reflects uncertainty about `/v1/suggest` traffic — that's the next leverage point, not the LEAKY conditional caps).
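
As a back-of-envelope check on the figures above, the quoted token count and percentage imply a pre-PR-#85 baseline, and the LEAKY share can be compared against the expected residual range. This is a sketch only: the baseline is derived rather than stated in the audit, and it assumes the ~10–32% residual is expressed relative to the same pre-PR-#85 daily volume.

```typescript
// Figures quoted in the PR body.
const leakyTokensPerDay = 35_000;    // ~35K Haiku tokens/day from the two LEAKY caps
const leakyShareOfBaseline = 0.014;  // ~1.4% of pre-PR-#85 daily volume
const residualRange = { low: 0.10, high: 0.32 }; // expected total residual

// Implied pre-PR-#85 baseline (derived here, not stated in the audit).
const impliedBaselineTokensPerDay = Math.round(
  leakyTokensPerDay / leakyShareOfBaseline,
);

// Fraction of the expected residual attributable to the LEAKY caps,
// at each end of the residual range (assumes both use the same baseline).
const leakyShareOfResidual = {
  ifResidualHigh: leakyShareOfBaseline / residualRange.high,
  ifResidualLow: leakyShareOfBaseline / residualRange.low,
};

console.log({ impliedBaselineTokensPerDay, leakyShareOfResidual });
```

Under these assumptions the LEAKY caps explain only a small slice of the residual at either end of the range, which is consistent with treating `/v1/suggest` as the next leverage point.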

Recommended next steps (separate PR, not this one)

Path A (recommended): Promote `us-company-data` and `website-to-company` to `ALWAYS_LLM_CAPABILITY_COSTS`, add a small startup-migration block (0065) mirroring PR #85's block 0064 pattern. Trivial.

Path B (fixture hygiene): Fix the `us-company-data` manifest to use a numeric CIK (`"0000320193"` for Apple) and refactor `website-to-company` to put LLM behind a structured-data-first fallback. More work, doesn't generalize.

Test plan

This is a read-only audit. No code touched. Review:

- Confirm the 11-cap exclusion list count matches `CONDITIONAL_LLM_CAPABILITIES` in `llm-capability-costs.ts`.
- Confirm the `us-company-data` fixture `"AAPL"` does NOT match `/^\d{1,10}$/`, proving the LLM path fires.
- Confirm the `website-to-company` LLM call at line 103 is reached for any non-empty meta-extract result.
- Decide Path A vs Path B. Open a follow-up PR.
- Do not merge this PR. Audit findings need chat/Petter review before the LEAKY caps get promoted.
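
The regex check in the test plan can be reproduced in isolation. A minimal sketch: the pattern is the one quoted above, while `fixturePath` is a hypothetical helper for illustration and not the real `findCik`, whose implementation is not shown in this audit.

```typescript
// Pattern quoted in the test plan: 1 to 10 digits and nothing else.
const NUMERIC_CIK = /^\d{1,10}$/;

// Hypothetical helper (not the real findCik): reports which path a fixture
// would take if the direct-API path is gated on the pattern above.
function fixturePath(input: string): "direct-api" | "llm-fallback" {
  return NUMERIC_CIK.test(input) ? "direct-api" : "llm-fallback";
}

console.log(fixturePath("AAPL"));       // ticker, contains no digits
console.log(fixturePath("0000320193")); // zero-padded CIK proposed in Path B
```

The ticker falls through to the fallback branch because it contains no digits, while the ten-digit zero-padded CIK from Path B fits within the `{1,10}` bound.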

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up to audit PR #86. Closes the two LEAKY caps and ships the three adjacent doc/hygiene fixes that PR #86 surfaced.

- website-to-company promoted to ALWAYS_LLM_CAPABILITY_COSTS (1¢). PR #86's trace showed llmExtractCompanyName fires on every real site URL — the bypass premise was wrong. New migration block 0065 bumps cost; CI gate updated.
- us-company-data fixture swapped "AAPL" → "320193" (numeric CIK, Apple). Manifest updated; block 0065 also UPDATEs the prod test_suites row whose input was populated by onboard.ts at capability-creation time. Cap stays in CONDITIONAL_LLM_CAPABILITIES with corrected bypass comment.
- brazilian-company-data: extractCompanyName + @anthropic-ai/sdk import removed. Function had zero callers anywhere in apps/api/src/ (verified by grep). Cap drops out of CONDITIONAL_LLM_CAPABILITIES naturally (no SDK import → CI gate doesn't apply).
- container-track: bypass comment rewritten to match the actual invalid-format early-return mechanism (the fixture is "test_value", not a well-known carrier prefix).
- norwegian-company-data manifest: health_check_input.org_number fixed from Swedish format (556703-7485) to Norwegian format (923609016, Equinor ASA).

Block 0065 is idempotent (filter clauses + dual post-condition checks). 28 startup-migration + cost-map tests pass; both negative tests on the CI gate still fire.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
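
The "filter clauses + post-condition checks" pattern described for block 0065 can be sketched over in-memory rows. Everything here is illustrative: the `SuiteRow` shape, the `applyBlock` function, and the plain-string `input` are assumptions, and reading "dual" as one check per update is also an assumption. The real block operates on the production `test_suites` table.

```typescript
// Hypothetical, simplified row; the real test_suites schema is not shown in the PR.
interface SuiteRow {
  capability: string;
  input: string;
  costCents: number;
}

// Applies both updates with filter clauses so a second run is a no-op.
function applyBlock(rows: SuiteRow[]): number {
  let updated = 0;
  for (const row of rows) {
    // Filter clause 1: only rows still holding the old ticker fixture.
    if (row.capability === "us-company-data" && row.input === "AAPL") {
      row.input = "320193";
      updated++;
    }
    // Filter clause 2: only rows whose cost has not been bumped yet.
    if (row.capability === "website-to-company" && row.costCents === 0) {
      row.costCents = 1;
      updated++;
    }
  }
  // Post-condition 1: no us-company-data row still uses the ticker.
  if (rows.some((r) => r.capability === "us-company-data" && r.input === "AAPL")) {
    throw new Error("post-condition failed: stale AAPL fixture");
  }
  // Post-condition 2: no website-to-company row is still free.
  if (rows.some((r) => r.capability === "website-to-company" && r.costCents === 0)) {
    throw new Error("post-condition failed: website-to-company still at cost 0");
  }
  return updated;
}

const demoRows: SuiteRow[] = [
  { capability: "us-company-data", input: "AAPL", costCents: 0 },
  { capability: "website-to-company", input: "https://example.com", costCents: 0 },
];
const firstRun = applyBlock(demoRows);
const secondRun = applyBlock(demoRows);
console.log({ firstRun, secondRun });
```

Because each update is guarded by a filter on the old value, re-running the block at every startup changes nothing once the rows are in their target state, and the post-conditions hold on both runs.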

…idge (pr a) (#88)

Decouple billing data (`external_cost_cents`) from the scheduling signal that drives the hourly test scheduler. PR A introduces the new column and a derivation bridge; PR B (separately tracked) will force explicit declarations at INSERT sites and remove the bridge.

The May 2026 Haiku token spike (PRs #84/#85/#86/#87) was structurally possible because `external_cost_cents = 0` did double duty as billing data and scheduling signal. A compound-PR pattern (cadence flip + deferred cost-bump) silently turned billing-data lag into a scheduling regression for 73 LLM caps for 7 days. Decoupling these concerns makes that class of failure impossible at the data model.

Changes

- Schema: add `test_suites.scheduled_testing_eligible BOOLEAN NOT NULL DEFAULT FALSE` to schema.ts.
- Migration 0063: backfill `eligible = TRUE` where `cost = 0`. Preserves current dispatch behavior exactly. Post-condition DO block asserts parity between the two filters.
- Scheduler: `findOverdueCapabilities`, `countOverdueCapabilities`, and `countPaidSkipped` swap to read the new column.
- Diagnostic script `investigate-singapore.ts` swaps the scheduling-pool-proxy reader.
- Startup block 0066 `runMigration0066_reconcileEligibilityFromCost`: re-derives eligibility from cost at every boot as the interim derivation bridge. Catches any INSERT site that lands a row without setting eligibility explicitly. PR B removes this block when INSERT sites are forced explicit.

PR A intentionally does NOT touch the 12 INSERT call sites. The default FALSE + block 0066's `cost = 0 ⇒ eligible = TRUE` derivation means new free caps land at eligible = TRUE on next boot.

Refines DEC-20260503-B without superseding. See DEC-20260511-B.

Closing-steps rule walk:

- Rule 4 (source-health integrity): the dispatch swap is the named carve-out from the structural follow-up (PR #85 explicit reference).
- Rule 7 (bug-fix framework): not invoked. Structural prevention, not a bug fix. The May 2026 leak is contained.
- Rules 8 + 9: hand-written Drizzle migration + schema.ts sync in the same commit. Per DEC-20260420-A.
- Rule 11 (DEC supersession): does NOT supersede DEC-20260503-B. Refines. No sweep needed.
- Rule 12 (audit follow-up tests): new code paths covered — dispatch swap by parity (backfill makes pre/post sets identical), block 0066 by post-condition DO block, observability swap by same parity.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
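
The derivation bridge described above (`cost = 0 ⇒ eligible = TRUE`, re-applied at every boot) can be sketched as follows. This is an illustration under assumptions, not the body of `runMigration0066_reconcileEligibilityFromCost`: only the two column names come from the commit message, the `TestSuiteRow` shape and `reconcile` function are hypothetical, and treating the derivation as two-way (paid rows become ineligible) is a reading of the "parity" wording rather than something the message states.

```typescript
// Hypothetical row shape; only the two column names come from the commit message.
interface TestSuiteRow {
  id: number;
  external_cost_cents: number;
  scheduled_testing_eligible: boolean;
}

// Bridge rule: a row is eligible for scheduled testing exactly when it is free.
// (The reverse direction, cost > 0 means ineligible, is an assumption.)
function deriveEligibility(costCents: number): boolean {
  return costCents === 0;
}

// Re-derives eligibility in place and returns the ids that changed.
function reconcile(rows: TestSuiteRow[]): number[] {
  const changedIds: number[] = [];
  for (const row of rows) {
    const derived = deriveEligibility(row.external_cost_cents);
    if (row.scheduled_testing_eligible !== derived) {
      row.scheduled_testing_eligible = derived;
      changedIds.push(row.id);
    }
  }
  // Post-condition: the old cost-based filter and the new column agree on every row.
  const parity = rows.every(
    (r) => r.scheduled_testing_eligible === (r.external_cost_cents === 0),
  );
  if (!parity) throw new Error("post-condition failed: filters disagree");
  return changedIds;
}

const bootRows: TestSuiteRow[] = [
  // A new free cap inserted without setting eligibility: lands at the default FALSE.
  { id: 1, external_cost_cents: 0, scheduled_testing_eligible: false },
  { id: 2, external_cost_cents: 5, scheduled_testing_eligible: false },
  { id: 3, external_cost_cents: 0, scheduled_testing_eligible: true },
];
const changedOnFirstBoot = reconcile(bootRows);
const changedOnSecondBoot = reconcile(bootRows);
console.log({ changedOnFirstBoot, changedOnSecondBoot });
```

In this sketch the row inserted at the default is picked up on the first boot and nothing changes on the second, which is the behavior the bridge relies on until INSERT sites declare eligibility explicitly.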

Two reference-content handoffs from earlier 2026-05-11 sessions. Per Rule G (handoff note hygiene, DEC-20260510-A), notes with substantive "Landed" / "Outcome" / runbook content are promoted; pure session narration is deleted.

Promoted:

- 2026-05-11-decouple-scheduled-testing-eligible-pr-a.md (PR #88 PR A shipped state — landed-list, decisions locked, follow-ups).
- 2026-05-11-haiku-cost-leak-audit-contain-cleanup.md (4-PR incident arc: #84/#85/#86/#87 with outcomes, audits, decisions).

Deleted (not in this commit; rm'd locally) the third 2026-05-11 note:

- 2026-05-11-pr88-deploy-recovery-and-phase3-halt.md

Pure session-progress narration. Phase 3 halt was resolved by PR #89 (merged 2026-05-11); content superseded by DEC-20260511-C + PR #89's handoff note (2026-05-11-in-ts-migrations-convention-pr88-phase3.md).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: audit conditional-llm bypass justifications for pr #85

4a5c3f1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

petterlindstrom79 mentioned this pull request May 11, 2026

fix: close leaky-cap residual + bypass-set hygiene cleanup #87

Merged

6 tasks

petterlindstrom79 mentioned this pull request May 11, 2026

refactor: scheduled_testing_eligible column + derivation bridge (PR A) #88

Merged

7 tasks

petterlindstrom79 mentioned this pull request May 11, 2026

chore: promote 2 handoff notes from 2026-05-11 #90

Merged

2 tasks

petterlindstrom79 wants to merge 1 commit into main from audit/conditional-llm-bypass-may-2026
