docs: audit conditional-llm bypass justifications for pr #85 #86
Open
petterlindstrom79 wants to merge 1 commit into
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
petterlindstrom79 added a commit that referenced this pull request on May 11, 2026:
Follow-up to audit PR #86. Closes the two LEAKY caps and ships the three adjacent doc/hygiene fixes that PR #86 surfaced.

- website-to-company promoted to ALWAYS_LLM_CAPABILITY_COSTS (1¢). PR #86's trace showed llmExtractCompanyName fires on every real site URL, so the bypass premise was wrong. New migration block 0065 bumps the cost; CI gate updated.
- us-company-data fixture swapped "AAPL" → "320193" (numeric CIK, Apple). Manifest updated; block 0065 also UPDATEs the prod test_suites row whose input was populated by onboard.ts at capability-creation time. The cap stays in CONDITIONAL_LLM_CAPABILITIES with a corrected bypass comment.
- brazilian-company-data: extractCompanyName and the @anthropic-ai/sdk import removed. The function had zero callers anywhere in apps/api/src/ (verified by grep). The cap drops out of CONDITIONAL_LLM_CAPABILITIES naturally (no SDK import → CI gate doesn't apply).
- container-track: bypass comment rewritten to match the actual invalid-format early-return mechanism (the fixture is "test_value", not a well-known carrier prefix).
- norwegian-company-data manifest: health_check_input.org_number fixed from Swedish format (556703-7485) to Norwegian format (923609016, Equinor ASA).

Block 0065 is idempotent (filter clauses + dual post-condition checks). 28 startup-migration + cost-map tests pass; both negative tests on the CI gate still fire.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
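The gate rule behind the brazilian-company-data item ("no SDK import → CI gate doesn't apply") can be sketched as below. This is a minimal sketch under assumptions: that the gate scans each capability's source for a static `@anthropic-ai/sdk` import and requires every importer to be declared in either `ALWAYS_LLM_CAPABILITY_COSTS` or `CONDITIONAL_LLM_CAPABILITIES`. The function name `ungatedLlmCapabilities`, the data shapes, and the regex-based import detection are hypothetical, not the repository's actual gate.

```typescript
// Hypothetical sketch of the CI gate rule. The constant names come from the
// commit message; the data shapes and regex import detection are assumptions.
const SDK_IMPORT = /from\s+["']@anthropic-ai\/sdk["']/; // static imports only

// Returns SDK-importing capabilities declared in neither list.
// A non-empty result would fail the gate.
function ungatedLlmCapabilities(
  sources: Record<string, string>, // capability name -> handler source text
  alwaysLlmCosts: Record<string, number>, // stand-in for ALWAYS_LLM_CAPABILITY_COSTS (cents)
  conditionalLlm: ReadonlySet<string>, // stand-in for CONDITIONAL_LLM_CAPABILITIES
): string[] {
  return Object.entries(sources)
    .filter(([, src]) => SDK_IMPORT.test(src)) // gate only applies to SDK importers
    .map(([name]) => name)
    .filter(
      (name) =>
        !Object.prototype.hasOwnProperty.call(alwaysLlmCosts, name) &&
        !conditionalLlm.has(name),
    )
    .sort();
}

// Usage: removing the unused SDK import takes the capability out of scope.
const withImport = {
  "brazilian-company-data": 'import Anthropic from "@anthropic-ai/sdk";',
};
const withoutImport = {
  "brazilian-company-data": "export async function handler() {}",
};
console.log(JSON.stringify(ungatedLlmCapabilities(withImport, {}, new Set<string>()))); // ["brazilian-company-data"]
console.log(JSON.stringify(ungatedLlmCapabilities(withoutImport, {}, new Set<string>()))); // []
```

Declaring the capability in either list is enough to satisfy this sketch; the real gate may also enforce that a capability is not declared in both.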
petterlindstrom79 added a commit that referenced this pull request on May 11, 2026:
…idge (pr a) (#88)

Decouple billing data (`external_cost_cents`) from the scheduling signal that drives the hourly test scheduler. PR A introduces the new column and a derivation bridge; PR B (separately tracked) will force explicit declarations at INSERT sites and remove the bridge.

The May 2026 Haiku token spike (PRs #84/#85/#86/#87) was structurally possible because `external_cost_cents = 0` did double duty as billing data and scheduling signal. A compound-PR pattern (cadence flip + deferred cost-bump) silently turned billing-data lag into a scheduling regression for 73 LLM caps for 7 days. Decoupling these concerns makes that class of failure impossible at the data-model level.

Changes
- Schema: add `test_suites.scheduled_testing_eligible BOOLEAN NOT NULL DEFAULT FALSE` to schema.ts.
- Migration 0063: backfill `eligible = TRUE` where `cost = 0`. Preserves current dispatch behavior exactly. A post-condition DO block asserts parity between the two filters.
- Scheduler: `findOverdueCapabilities`, `countOverdueCapabilities`, and `countPaidSkipped` switch to reading the new column.
- Diagnostic script `investigate-singapore.ts` switches its scheduling-pool-proxy reader to the new column.
- Startup block 0066 `runMigration0066_reconcileEligibilityFromCost`: re-derives eligibility from cost at every boot as the interim derivation bridge. Catches any INSERT site that lands a row without setting eligibility explicitly. PR B removes this block once INSERT sites are forced explicit.

PR A intentionally does NOT touch the 12 INSERT call sites. The DEFAULT FALSE plus block 0066's `cost = 0 ⇒ eligible = TRUE` derivation means new free caps land at eligible = TRUE on the next boot.

Refines DEC-20260503-B without superseding it. See DEC-20260511-B.

Closing-steps rule walk:
- Rule 4 (source-health integrity): the dispatch swap is the named carve-out from the structural follow-up (PR #85 explicit reference).
- Rule 7 (bug-fix framework): not invoked. Structural prevention, not a bug fix. The May 2026 leak is contained.
- Rules 8 + 9: hand-written Drizzle migration + schema.ts sync in the same commit, per DEC-20260420-A.
- Rule 11 (DEC supersession): does NOT supersede DEC-20260503-B; refines it. No sweep needed.
- Rule 12 (audit follow-up tests): new code paths are covered: the dispatch swap by parity (the backfill makes the pre/post sets identical), block 0066 by its post-condition DO block, and the observability swap by the same parity.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
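The parity argument and the interim bridge can be modelled in memory to show why the backfill leaves the dispatch set unchanged and why block 0066 catches an INSERT site that forgets the new column. This is a sketch under assumptions: `TestSuiteRow`, `oldPool`, `newPool`, and `reconcileEligibilityFromCost` are hypothetical names; the real block 0066 runs SQL against `test_suites` at boot, and the real scheduler queries also filter on overdue timestamps, which are omitted here.

```typescript
// Hypothetical in-memory model of test_suites; only the two relevant columns.
interface TestSuiteRow {
  capability: string;
  externalCostCents: number; // billing data
  scheduledTestingEligible: boolean; // scheduling signal (column defaults to FALSE)
}

// Old scheduler filter: cost doubled as the scheduling signal.
const oldPool = (rows: TestSuiteRow[]): string[] =>
  rows.filter((r) => r.externalCostCents === 0).map((r) => r.capability);

// New scheduler filter: read the dedicated column.
const newPool = (rows: TestSuiteRow[]): string[] =>
  rows.filter((r) => r.scheduledTestingEligible).map((r) => r.capability);

// Sketch of the interim bridge: cost = 0 ⇒ eligible = TRUE, as stated in the
// commit message. Idempotent: a second run finds nothing to change.
function reconcileEligibilityFromCost(rows: TestSuiteRow[]): number {
  let updated = 0;
  for (const r of rows) {
    if (r.externalCostCents === 0 && !r.scheduledTestingEligible) {
      r.scheduledTestingEligible = true;
      updated++;
    }
  }
  return updated;
}

// Usage: a free cap inserted without an explicit eligibility value.
const rows: TestSuiteRow[] = [
  { capability: "free-cap", externalCostCents: 0, scheduledTestingEligible: false },
  { capability: "llm-cap", externalCostCents: 1, scheduledTestingEligible: false },
];
console.log(reconcileEligibilityFromCost(rows)); // 1
console.log(reconcileEligibilityFromCost(rows)); // 0
console.log(JSON.stringify(oldPool(rows)) === JSON.stringify(newPool(rows))); // true
```

Parity only holds while no paid row is marked eligible; that is the invariant the post-condition DO blocks are described as checking.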
petterlindstrom79 added a commit that referenced this pull request on May 11, 2026:
Two reference-content handoffs from earlier 2026-05-11 sessions. Per Rule G (handoff note hygiene, DEC-20260510-A), notes with substantive "Landed" / "Outcome" / runbook content are promoted; pure session narration is deleted.

Promoted:
- 2026-05-11-decouple-scheduled-testing-eligible-pr-a.md (PR #88 PR A shipped state: landed-list, decisions locked, follow-ups).
- 2026-05-11-haiku-cost-leak-audit-contain-cleanup.md (4-PR incident arc: #84/#85/#86/#87 with outcomes, audits, decisions).

Deleted (not in this commit; rm'd locally) the third 2026-05-11 note:
- 2026-05-11-pr88-deploy-recovery-and-phase3-halt.md: pure session-progress narration. The Phase 3 halt was resolved by PR #89 (merged 2026-05-11); its content is superseded by DEC-20260511-C + PR #89's handoff note (2026-05-11-in-ts-migrations-convention-pr88-phase3.md).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Follow-up audit to PR #85. Traces each of the 11 capabilities in `CONDITIONAL_LLM_CAPABILITIES` through its `known_answer.input` fixture to verify whether the bypass justification holds (i.e. the scheduled-test path does NOT invoke the Anthropic SDK).
Result: 9 CLEAN, 2 LEAKY, 0 AMBIGUOUS.
Documentation-hygiene issues found in 2 of the CLEAN caps, where the bypass comment doesn't match the actual code mechanism: `brazilian-company-data` has unreachable SDK code, and `container-track`'s comment claims a well-known carrier prefix while the fixture is a placeholder string.
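The audit traced each fixture by reading the code. The same CLEAN/LEAKY question can also be posed at runtime: run the handler once on its `known_answer.input` with a counting stand-in for the LLM client and check whether it was called. Everything below is hypothetical: the `Handler` signature, the `LlmClient` interface, and the toy `usCompanyData` handler, which assumes (as the follow-up fixture swap implies) that numeric CIKs bypass the LLM while tickers such as "AAPL" are resolved through it. The sketch only yields CLEAN or LEAKY; it does not model the AMBIGUOUS bucket.

```typescript
// Hypothetical minimal LLM client interface injected into handlers.
interface LlmClient {
  complete(prompt: string): Promise<string>;
}

type Handler = (input: string, llm: LlmClient) => Promise<string>;
type Verdict = "CLEAN" | "LEAKY";

// Toy stand-in for a conditional-LLM capability: numeric CIKs skip the LLM,
// anything else (e.g. a ticker) is resolved through it.
const usCompanyData: Handler = async (input, llm) => {
  if (/^\d{1,10}$/.test(input)) return `cik:${input.padStart(10, "0")}`;
  const cik = await llm.complete(`Resolve "${input}" to a SEC CIK`);
  return `cik:${cik}`;
};

// Run the handler once on its fixture with a counting fake client.
async function auditFixture(handler: Handler, fixture: string): Promise<Verdict> {
  let calls = 0;
  const spy: LlmClient = {
    complete: async () => {
      calls++;
      return "0000320193";
    },
  };
  await handler(fixture, spy);
  return calls === 0 ? "CLEAN" : "LEAKY";
}

// Usage: the old ticker fixture leaks; the numeric-CIK fixture does not.
void (async () => {
  console.log(await auditFixture(usCompanyData, "AAPL")); // LEAKY
  console.log(await auditFixture(usCompanyData, "320193")); // CLEAN
})();
```

A runtime check like this only proves the single fixture path; the static trace is still what shows whether other inputs can reach the SDK.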
Impact for T+48h Anthropic Console check
LEAKY caps' contribution: ~35K Haiku tokens/day, ~1.4% of pre-PR-#85 daily volume. Negligible relative to the main fix. Expected total residual: ~10–32%. The wide range reflects uncertainty about `/v1/suggest` traffic, which is the next leverage point, not the LEAKY conditional caps.
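As a rough cross-check, the two quoted figures imply the size of the pre-PR-#85 baseline. The arithmetic below assumes both figures are rounded estimates and that the 10–32% residual is measured against the same baseline; the results are order-of-magnitude only.

```typescript
// Back-of-the-envelope check using the approximate figures quoted above.
const leakyTokensPerDay = 35_000; // LEAKY caps' contribution (~35K/day)
const leakyShare = 0.014; // ~1.4% of pre-PR-#85 daily volume

// Implied pre-PR-#85 baseline: 35,000 / 0.014 ≈ 2,500,000 tokens/day.
const baselineTokensPerDay = Math.round(leakyTokensPerDay / leakyShare);

// Expected residual range (~10–32%, assumed relative to the same baseline).
const residualLow = Math.round(baselineTokensPerDay * 0.1); // ≈ 250,000
const residualHigh = Math.round(baselineTokensPerDay * 0.32); // ≈ 800,000

console.log(baselineTokensPerDay, residualLow, residualHigh); // 2500000 250000 800000
```

On these assumptions the LEAKY caps account for roughly 4–14% of the expected residual, consistent with the point that `/v1/suggest` is the larger lever.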
Recommended next steps (separate PR, not this one)
Path A (recommended): Promote `us-company-data` and `website-to-company` to `ALWAYS_LLM_CAPABILITY_COSTS`, add a small startup-migration block (0065) mirroring PR #85's block 0064 pattern. Trivial.
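Because startup blocks run on every boot, block 0065 has to be a no-op on re-run. A minimal in-memory sketch of an idempotent cost bump, assuming the 1¢ target that the follow-up commit used for `website-to-company`: the `CostRow` shape, `runBlock0065`, and the two post-conditions are illustrative assumptions, and the real block is SQL executed by the startup-migration runner whose exact checks may differ.

```typescript
// Hypothetical in-memory model of the rows block 0065 touches.
interface CostRow {
  capability: string;
  externalCostCents: number;
}

// Models `UPDATE ... SET cost = target WHERE capability IN (...) AND cost <> target`.
// The `cost <> target` filter clause is what makes a re-run a no-op.
function runBlock0065(
  rows: CostRow[],
  promoted: ReadonlySet<string>,
  targetCents: number,
): number {
  let updated = 0;
  for (const r of rows) {
    if (promoted.has(r.capability) && r.externalCostCents !== targetCents) {
      r.externalCostCents = targetCents;
      updated++;
    }
  }
  // Post-condition 1: every promoted capability has a row (an UPDATE that
  // matches nothing would otherwise succeed silently).
  const missing = Array.from(promoted).filter(
    (c) => !rows.some((r) => r.capability === c),
  );
  // Post-condition 2: every promoted row now carries the target cost.
  const wrongCost = rows.filter(
    (r) => promoted.has(r.capability) && r.externalCostCents !== targetCents,
  );
  if (missing.length > 0 || wrongCost.length > 0) {
    throw new Error(
      `block 0065 post-condition failed: missing=[${missing.join(", ")}] ` +
        `wrongCost=[${wrongCost.map((r) => r.capability).join(", ")}]`,
    );
  }
  return updated;
}

// Usage
const costRows: CostRow[] = [
  { capability: "website-to-company", externalCostCents: 0 },
  { capability: "other-cap", externalCostCents: 0 },
];
const promoted = new Set<string>(["website-to-company"]);
console.log(runBlock0065(costRows, promoted, 1)); // 1
console.log(runBlock0065(costRows, promoted, 1)); // 0 (re-run is a no-op)
```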
Path B (fixture hygiene): Fix the `us-company-data` manifest to use a numeric CIK (`"0000320193"` for Apple) and refactor `website-to-company` to put LLM behind a structured-data-first fallback. More work, doesn't generalize.
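Path B's structured-data-first fallback for `website-to-company` would try cheap, deterministic signals in the page before paying for an LLM call. The sketch assumes the handler already has the page HTML and uses regexes instead of a real HTML parser purely to stay dependency-free. `companyNameFromHtml`, the helper functions, and the `LlmExtract` signature are hypothetical; the injected callback stands in for `llmExtractCompanyName`, whose real signature the PR does not show.

```typescript
// Hypothetical structured-data-first name extraction for website-to-company.
// The LLM call is injected so it only runs when cheap signals are absent.
type LlmExtract = (html: string) => Promise<string>;

// Look for a schema.org Organization in JSON-LD script tags.
// Simplification: only an exact "@type": "Organization" is recognised.
function nameFromJsonLd(html: string): string | undefined {
  const re = /<script[^>]*application\/ld\+json[^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    try {
      const data = JSON.parse(m[1]); // JSON.parse returns `any`
      const items: any[] = Array.isArray(data) ? data : [data];
      for (const item of items) {
        if (item && item["@type"] === "Organization" && typeof item.name === "string") {
          return item.name;
        }
      }
    } catch {
      // Malformed JSON-LD: ignore this block and keep looking.
    }
  }
  return undefined;
}

// og:site_name meta tag. Simplification: assumes `property` precedes `content`.
function nameFromOgSiteName(html: string): string | undefined {
  const m = html.match(
    /<meta[^>]*property=["']og:site_name["'][^>]*content=["']([^"']+)["']/i,
  );
  return m?.[1];
}

async function companyNameFromHtml(
  html: string,
  llmExtractCompanyName: LlmExtract,
): Promise<{ name: string; usedLlm: boolean }> {
  const structured = nameFromJsonLd(html) ?? nameFromOgSiteName(html);
  if (structured && structured.trim() !== "") {
    return { name: structured.trim(), usedLlm: false };
  }
  // No structured signal: fall back to the (paid) LLM path.
  return { name: await llmExtractCompanyName(html), usedLlm: true };
}

// Usage
const page = '<head><meta property="og:site_name" content="Equinor"></head>';
void companyNameFromHtml(page, async () => "unused").then((r) =>
  console.log(JSON.stringify(r)), // {"name":"Equinor","usedLlm":false}
);
```

As Path B notes, this only narrows the LLM path rather than removing it: pages without structured data still reach the LLM, which is one reason it does not generalize the way Path A's cost promotion does.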
Test plan
This is a read-only audit. No code touched. Review:
🤖 Generated with Claude Code