docs: audit conditional-llm bypass justifications for pr #85 #86

Open
petterlindstrom79 wants to merge 1 commit into main from audit/conditional-llm-bypass-may-2026

@petterlindstrom79 (Member)

Summary

Follow-up audit to PR #85. Traces each of the 11 capabilities in `CONDITIONAL_LLM_CAPABILITIES` through its `known_answer.input` fixture to verify whether the bypass justification holds (i.e. the scheduled-test path does NOT invoke the Anthropic SDK).

Result: 9 CLEAN, 2 LEAKY, 0 AMBIGUOUS.

  • LEAKY: `us-company-data` (the fixture is `"AAPL"`, a ticker rather than a CIK, so the `findCik` regex fails and the LLM fires) and `website-to-company` (the LLM is the primary extraction path, not a fallback; it also chains into `norwegian-company-data`, which triggers a second LLM call).
  • CLEAN: 8 country-data caps (numeric registry-code fixtures match regex, direct API path) + `container-track` (fixture `"test_value"` triggers invalid-format early-return).
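The `us-company-data` verdict turns entirely on whether the fixture matches the CIK pattern. A minimal sketch of that trace, assuming `findCik` hands off to the LLM whenever its regex check fails; `tracePath` and the path labels are hypothetical, and only the pattern `/^\d{1,10}$/` comes from the audit:

```typescript
// Hypothetical stand-in for the CIK check inside `findCik`.
// Only the regex is taken from the audit; the rest is illustrative.
const CIK_PATTERN = /^\d{1,10}$/;

type Path = "direct-api" | "llm-fallback";

function tracePath(fixture: string): Path {
  // A numeric CIK of 1-10 digits goes straight to the registry API;
  // anything else (e.g. a ticker) falls through to the LLM.
  return CIK_PATTERN.test(fixture) ? "direct-api" : "llm-fallback";
}

for (const fixture of ["AAPL", "320193", "0000320193"]) {
  console.log(`${fixture} -> ${tracePath(fixture)}`);
}
// AAPL -> llm-fallback
// 320193 -> direct-api
// 0000320193 -> direct-api
```

Both numeric forms of Apple's CIK (zero-padded and unpadded) stay within the 10-digit bound, so under this sketch either would keep the scheduled test on the direct API path.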

Documentation-hygiene issues also turned up in 2 of the CLEAN caps, where the bypass comment doesn't match the actual code mechanism: `brazilian-company-data` has unreachable SDK code, and `container-track`'s comment claims a well-known carrier prefix when the fixture is a placeholder string.

Impact for T+48h Anthropic Console check

LEAKY caps' contribution: ~35K Haiku tokens/day, ~1.4% of pre-PR-#85 daily volume, which is negligible relative to the main fix. Expected total residual: ~10–32%. The wide range reflects uncertainty about `/v1/suggest` traffic; that endpoint, not the LEAKY conditional caps, is the next leverage point.
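The 1.4% figure also pins down the baseline it was measured against. A back-of-envelope check using only the PR's rounded estimates; the second calculation assumes the 10–32% residual is expressed against the same pre-PR-#85 baseline, which the text does not state explicitly:

```typescript
// Back-of-envelope check using the PR's rounded estimates; outputs are approximate.
const leakyTokensPerDay = 35_000;   // ~35K Haiku tokens/day from the 2 LEAKY caps
const leakyShareOfBaseline = 0.014; // ~1.4% of pre-PR-#85 daily volume

// Implied pre-PR-#85 baseline: 35K / 1.4%.
const impliedBaseline = leakyTokensPerDay / leakyShareOfBaseline;

// ASSUMPTION: the 10–32% residual is measured against that same baseline.
// Ordered from high residual to low, giving the LEAKY share's low and high ends.
const residualShares = [0.32, 0.10];
const leakyShareOfResidual = residualShares.map((r) => leakyShareOfBaseline / r);

console.log(`implied baseline ≈ ${Math.round(impliedBaseline)} tokens/day`);
console.log(
  "LEAKY share of residual ≈ " +
    leakyShareOfResidual.map((s) => `${(s * 100).toFixed(1)}%`).join(" to "),
);
```

On those assumptions the baseline is about 2.5M tokens/day and the LEAKY caps make up roughly 4–14% of the expected residual, which is consistent with pointing at `/v1/suggest` rather than the conditional caps.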

Recommended next steps (separate PR, not this one)

Path A (recommended): Promote `us-company-data` and `website-to-company` to `ALWAYS_LLM_CAPABILITY_COSTS`, add a small startup-migration block (0065) mirroring PR #85's block 0064 pattern. Trivial.

Path B (fixture hygiene): Fix the `us-company-data` manifest to use a numeric CIK (`"0000320193"` for Apple) and refactor `website-to-company` so structured data is tried first and the LLM runs only as a fallback. More work, and doesn't generalize.
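Path A amounts to moving a slug between the two registries and giving it a real cost. A minimal sketch of that move as a pure function, assuming `ALWAYS_LLM_CAPABILITY_COSTS` maps slugs to cents and `CONDITIONAL_LLM_CAPABILITIES` is a set of slugs (the real types in llm-capability-costs.ts may differ); `promoteToAlwaysLlm` and `Registries` are hypothetical, and the 1¢ value is the one the follow-up commit uses for `website-to-company`:

```typescript
// Hypothetical shapes for the two registries in llm-capability-costs.ts.
type CostMap = Readonly<Record<string, number>>; // capability slug -> cents per call

interface Registries {
  always: CostMap;                  // stand-in for ALWAYS_LLM_CAPABILITY_COSTS
  conditional: ReadonlySet<string>; // stand-in for CONDITIONAL_LLM_CAPABILITIES
}

// Move a LEAKY cap out of the conditional set and give it a non-zero cost.
// Returns new registries; the inputs are left untouched.
function promoteToAlwaysLlm(reg: Registries, slug: string, costCents: number): Registries {
  if (!reg.conditional.has(slug)) {
    throw new Error(`${slug} is not in CONDITIONAL_LLM_CAPABILITIES`);
  }
  if (!(costCents > 0)) {
    // Cost 0 doubles as "free to schedule", which is the leak being closed.
    throw new Error(`${slug} needs a non-zero cost to leave the free pool`);
  }
  const conditional = new Set(reg.conditional);
  conditional.delete(slug);
  return { always: { ...reg.always, [slug]: costCents }, conditional };
}

const before: Registries = {
  always: {},
  conditional: new Set(["us-company-data", "website-to-company"]),
};
const after = promoteToAlwaysLlm(before, "website-to-company", 1);
console.log(after.always, Array.from(after.conditional));
```

Keeping the move pure makes the invariant (no slug in both registries, no zero-cost ALWAYS cap) easy to unit-test; the database side still needs its own startup-migration block, as with PR #85's block 0064.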

Test plan

This is a read-only audit. No code touched. Review:

  • Confirm the 11-cap exclusion list count matches `CONDITIONAL_LLM_CAPABILITIES` in llm-capability-costs.ts.
  • Confirm the `us-company-data` fixture `"AAPL"` does NOT match `/^\d{1,10}$/` — proving the LLM path fires.
  • Confirm the `website-to-company` LLM call at line 103 is reached for any non-empty meta-extract result.
  • Decide Path A vs Path B. Open a follow-up PR.
  • Do not merge this PR. Audit findings need chat/Petter review before the LEAKY caps get promoted.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
petterlindstrom79 added a commit that referenced this pull request May 11, 2026
Follow-up to audit PR #86. Closes the two LEAKY caps and ships the
three adjacent doc/hygiene fixes that PR #86 surfaced.

- website-to-company promoted to ALWAYS_LLM_CAPABILITY_COSTS (1¢).
  PR #86's trace showed llmExtractCompanyName fires on every real
  site URL — the bypass premise was wrong. New migration block 0065
  bumps cost; CI gate updated.
- us-company-data fixture swapped "AAPL" → "320193" (numeric CIK,
  Apple). Manifest updated; block 0065 also UPDATEs the prod
  test_suites row whose input was populated by onboard.ts at
  capability-creation time. Cap stays in CONDITIONAL_LLM_CAPABILITIES
  with corrected bypass comment.
- brazilian-company-data: extractCompanyName + @anthropic-ai/sdk
  import removed. Function had zero callers anywhere in apps/api/src/
  (verified by grep). Cap drops out of CONDITIONAL_LLM_CAPABILITIES
  naturally (no SDK import → CI gate doesn't apply).
- container-track: bypass comment rewritten to match the actual
  invalid-format early-return mechanism (the fixture is "test_value",
  not a well-known carrier prefix).
- norwegian-company-data manifest: health_check_input.org_number
  fixed from Swedish format (556703-7485) to Norwegian format
  (923609016, Equinor ASA).

Block 0065 is idempotent (filter clauses + dual post-condition
checks). 28 startup-migration + cost-map tests pass; both negative
tests on the CI gate still fire.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
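"Idempotent" here means a second boot finds nothing to change and both post-conditions still pass. A minimal in-memory sketch of that shape, assuming both fixes can be modelled on one simplified row; the `CapRow` fields, `runBlock0065`, and the `cost = 0` guard on the bump are hypothetical stand-ins for the real SQL filter clauses, not the shipped block:

```typescript
// Simplified row; the real data spans more than one table and column set.
interface CapRow {
  capability: string;
  externalCostCents: number;
  testInput: string;
}

// Returns how many changes were applied on this run.
function runBlock0065(rows: CapRow[]): number {
  let changed = 0;
  for (const row of rows) {
    // Filter clause: bump only while the cap is still at the free cost,
    // so re-running never overwrites a later, deliberate price change.
    if (row.capability === "website-to-company" && row.externalCostCents === 0) {
      row.externalCostCents = 1;
      changed++;
    }
    // Filter clause: swap only the exact ticker fixture written at onboarding.
    if (row.capability === "us-company-data" && row.testInput === "AAPL") {
      row.testInput = "320193";
      changed++;
    }
  }
  // Post-condition 1: the promoted cap is no longer priced as free.
  if (rows.some((r) => r.capability === "website-to-company" && r.externalCostCents === 0)) {
    throw new Error("block 0065: website-to-company still at 0 cents");
  }
  // Post-condition 2: no ticker fixture remains for us-company-data.
  if (rows.some((r) => r.capability === "us-company-data" && r.testInput === "AAPL")) {
    throw new Error("block 0065: us-company-data fixture still AAPL");
  }
  return changed;
}

const rows: CapRow[] = [
  { capability: "website-to-company", externalCostCents: 0, testInput: "https://example.com" },
  { capability: "us-company-data", externalCostCents: 0, testInput: "AAPL" },
];
console.log(runBlock0065(rows)); // 2 (first boot)
console.log(runBlock0065(rows)); // 0 (second boot is a no-op)
```

The filter clauses make the block safe to run on every boot, while the post-conditions fail loudly if a row was missed, for example because its stored value differs from what the filter expects.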
petterlindstrom79 added a commit that referenced this pull request May 11, 2026
…idge (pr a) (#88)

Decouple billing data (`external_cost_cents`) from the scheduling signal
that drives the hourly test scheduler. PR A introduces the new column
and a derivation bridge; PR B (separately tracked) will force explicit
declarations at INSERT sites and remove the bridge.

The May 2026 Haiku token spike (PRs #84/#85/#86/#87) was structurally
possible because `external_cost_cents = 0` did double duty as billing
data and scheduling signal. A compound-PR pattern (cadence flip + deferred
cost-bump) silently turned billing-data lag into a scheduling regression
for 73 LLM caps for 7 days. Decoupling these concerns makes that class
of failure impossible at the data model.

Changes
- Schema: add `test_suites.scheduled_testing_eligible BOOLEAN NOT NULL
  DEFAULT FALSE` to schema.ts.
- Migration 0063: backfill `eligible = TRUE` where `cost = 0`. Preserves
  current dispatch behavior exactly. Post-condition DO block asserts
  parity between the two filters.
- Scheduler: `findOverdueCapabilities`, `countOverdueCapabilities`, and
  `countPaidSkipped` swap to read the new column.
- Diagnostic script `investigate-singapore.ts` swaps the
  scheduling-pool-proxy reader.
- Startup block 0066 `runMigration0066_reconcileEligibilityFromCost`:
  re-derives eligibility from cost at every boot as the interim
  derivation bridge. Catches any INSERT site that lands a row without
  setting eligibility explicitly. PR B removes this block when INSERT
  sites are forced explicit.

PR A intentionally does NOT touch the 12 INSERT call sites. Combined with
the column's DEFAULT FALSE, block 0066's `cost = 0 ⇒ eligible = TRUE`
derivation means a new free cap is inserted at eligible = FALSE and flips
to TRUE on the next boot.

Refines DEC-20260503-B without superseding. See DEC-20260511-B.

Closing-steps rule walk:
- Rule 4 (source-health integrity): the dispatch swap is the named
  carve-out from the structural follow-up (PR #85 explicit reference).
- Rule 7 (bug-fix framework): not invoked. Structural prevention, not
  a bug fix. The May 2026 leak is contained.
- Rules 8 + 9: hand-written Drizzle migration + schema.ts sync in
  the same commit. Per DEC-20260420-A.
- Rule 11 (DEC supersession): does NOT supersede DEC-20260503-B.
  Refines. No sweep needed.
- Rule 12 (audit follow-up tests): new code paths covered — dispatch
  swap by parity (backfill makes pre/post sets identical), block 0066
  by post-condition DO block, observability swap by same parity.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
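Because the new column defaults to FALSE, backfilling `cost = 0 ⇒ eligible = TRUE` in migration 0063 makes the old and new dispatch filters select exactly the same rows, and block 0066 reapplies the same rule at every boot. A minimal in-memory sketch of that derivation and the parity check, assuming cost and eligibility sit on one simplified row; `deriveEligibility`, `oldPool`, `newPool`, and `assertParity` are hypothetical:

```typescript
interface SuiteRow {
  id: number;
  externalCostCents: number;
  scheduledTestingEligible: boolean; // NOT NULL DEFAULT FALSE
}

// Shared rule: run once as the 0063 backfill, then at every boot as block 0066.
function deriveEligibility(rows: SuiteRow[]): number {
  let updated = 0;
  for (const r of rows) {
    if (r.externalCostCents === 0 && !r.scheduledTestingEligible) {
      r.scheduledTestingEligible = true;
      updated++;
    }
  }
  return updated;
}

// Old dispatch filter (cost as scheduling signal) vs new (explicit column).
const oldPool = (rows: SuiteRow[]) =>
  rows.filter((r) => r.externalCostCents === 0).map((r) => r.id);
const newPool = (rows: SuiteRow[]) =>
  rows.filter((r) => r.scheduledTestingEligible).map((r) => r.id);

function assertParity(rows: SuiteRow[]): void {
  const a = oldPool(rows).join(",");
  const b = newPool(rows).join(",");
  if (a !== b) throw new Error(`parity broken: cost=0 {${a}} vs eligible {${b}}`);
}

// Existing rows right after ADD COLUMN: every flag starts FALSE.
const suites: SuiteRow[] = [
  { id: 1, externalCostCents: 0, scheduledTestingEligible: false },
  { id: 2, externalCostCents: 1, scheduledTestingEligible: false },
];
deriveEligibility(suites); // migration 0063 backfill
assertParity(suites);

// A new free cap inserted without setting eligibility lands at the default...
suites.push({ id: 3, externalCostCents: 0, scheduledTestingEligible: false });
console.log(newPool(suites)); // [ 1 ] (id 3 waits for the next boot)
deriveEligibility(suites); // ...until block 0066 runs on boot
console.log(newPool(suites)); // [ 1, 3 ]
assertParity(suites);
```

The sketch only ever sets the flag to TRUE, matching the direction the commit message states; whether the real block 0066 also clears eligibility for caps whose cost later rises above zero is not stated here.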
petterlindstrom79 added a commit that referenced this pull request May 11, 2026
Two reference-content handoffs from earlier 2026-05-11 sessions. Per
Rule G (handoff note hygiene, DEC-20260510-A), notes with substantive
"Landed" / "Outcome" / runbook content are promoted; pure session
narration is deleted.

Promoted:
- 2026-05-11-decouple-scheduled-testing-eligible-pr-a.md (PR #88 PR A
  shipped state — landed-list, decisions locked, follow-ups).
- 2026-05-11-haiku-cost-leak-audit-contain-cleanup.md (4-PR incident
  arc: #84/#85/#86/#87 with outcomes, audits, decisions).

Deleted the third 2026-05-11 note (rm'd locally; not part of this commit):
- 2026-05-11-pr88-deploy-recovery-and-phase3-halt.md
  Pure session-progress narration. Phase 3 halt was resolved by PR #89
  (merged 2026-05-11); content superseded by DEC-20260511-C + PR #89's
  handoff note (2026-05-11-in-ts-migrations-convention-pr88-phase3.md).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>