[audit-workflows] Agentic Workflow Audit — 2026-05-30 (56 runs, 91.1% success, 5 distinct failures) #36002
Replies: 4 comments
💥 KA-POW! 🦸 The Smoke Test Agent ZOOMED through here! WHOOSH! ⚡ All systems checked, all gizmos GO! Claude engine nominal. THWIP! 🕸️ Stay heroic, gh-aw! 🚀
Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
Smoke goblin was here.
Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #36085.
Daily audit of the last 24h of agentic workflow runs in github/gh-aw. 56 completed runs, 91.1% success (51/56): a healthy full-window result. The 5 failures are all distinct one-off classes (no systemic regression), and yesterday's critical cache-memory git bug did not recur. The standout concerns are a token-budget 429 on Linter Miner (ties to open #35661) and the Failure Investigator itself timing out at 60 min / $7.15 after finding zero failures.
Summary
Critical / Actionable Findings
1. 🔴 Token-budget 429: Linter Miner (#35661 recurrence) · §26690626184
CAPIError: 429 Maximum effective tokens exceeded (25.13M / 25M). After 59 turns the run hit the 25M effective-token cap, then burned 4 --continue retries (each re-hitting the cap, ~92s apart) before giving up with exitCode=1. Continuation cannot recover from a hard budget cap. Fix: reduce Linter Miner's per-run scope (chunk the linter aggregation) and make the harness fail fast on the budget-429 signature instead of retrying. This maps to the open token-budget issue #35661, which is therefore not resolved.
2. 🟠 Failure Investigator timed out at 60 min / $7.15 (most expensive run) · §26692427111
Ironically, this was the day's only "100% failure-rate" workflow. It concluded "zero failures in the 6h window" early (~19:11 UTC) yet kept running for the full 60 minutes (7.8M tokens, $7.15) before hitting the Claude CLI step timeout. A clean window should converge in minutes. Fix: investigate why it doesn't early-exit on a no-failure window; confirm the 60-min ceiling is intended.
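The fail-fast fix proposed in finding 1 can be sketched as a retry classifier. This is only an illustration in Python, not the harness's actual code: the `Attempt` type, the `should_retry` function, and the default of 4 retries are hypothetical names and values chosen to mirror the run described above. The regular expression is taken from the error text quoted in the finding.

```python
import re
from dataclasses import dataclass

# Signature of the hard budget cap, copied from the error quoted in finding 1.
BUDGET_429 = re.compile(r"429 Maximum effective tokens exceeded")


@dataclass
class Attempt:
    """Outcome of one agent invocation (hypothetical shape)."""
    exit_code: int
    stderr: str


def should_retry(attempt: Attempt, retries_used: int, max_retries: int = 4) -> bool:
    """Return True only when a --continue retry could plausibly help."""
    if attempt.exit_code == 0:
        return False  # success: nothing to retry
    if BUDGET_429.search(attempt.stderr):
        return False  # hard budget cap: a continuation re-hits the same limit
    return retries_used < max_retries  # other failures keep the normal retry budget


if __name__ == "__main__":
    err = "CAPIError: 429 Maximum effective tokens exceeded (25.13M / 25M)"
    print(should_retry(Attempt(1, err), retries_used=0))  # False
```

With a check like this, the Linter Miner run would have stopped after the first 429 instead of spending four further retries at the cap.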
3. 🟠 add_comment target="*" with no number: Contribution Check (REACTIVATED) · §26689948645
Agent succeeded, but the safe_outputs step hard-failed the whole job with the error: Target is "*" but no item_number/... specified in add_comment item. This discarded a successful create_issue. On a schedule event there's no triggering PR, so a target=* comment is structurally unsatisfiable, yet the model emitted one despite an explicit prompt guardrail. Fix: (a) make Process Safe Outputs skip-with-warning on one bad item when ≥1 succeeded; (b) validate target=* at the MCP emit boundary so the agent self-corrects in-loop. (Also: create_issue with temporary_id registered 0 temp ids, breaking add_labels chaining.) Was in "watch"; reactivated on this full-window recurrence.
4 & 5: golangci-lint download flake + Smoke Claude timeout (lower priority)
4. 🟡 golangci-lint non-gzip download: PR Sous Chef · §26690499578
Failed before the agent ran (0 turns): make install-golangci-lint piped a non-gzip release body into tar (exit 2) in the shared Install development dependencies step. Transient, but it exposes every workflow using that setup. Fix: download to a temp file, verify HTTP 200 + gzip magic bytes, retry with backoff, and pin a checksum (Makefile ~403). The Failure Investigator already filed a durable tracking issue for this.
5. 🟡 Smoke Claude 10-min CLI timeout + MCP EOF (monitor) · §26690105542
Execute Claude Code CLI timed out after 10 minutes with 21× MCP error 0: client is closing: EOF. This was run_attempt 2 on the now-merged firewall/gateway-bump PR #35973; awf-squid was Healthy (not #35780). PR-branch smoke artifact: monitor whether the MCP-EOF cluster recurs on main after the gateway bump.
✅ Resolved / Recovered
gpt-5.4 (2nd clean cycle after the 9-day alpha-routing 404). Closed.
📊 Trend Charts (30-day window)
Success rate recovered to 91.1%, above the recent band and well clear of the 05-23 dip (41.6%). The 12-day trend is stable in the high-80s/low-90s; today's 5 failures are independent one-offs rather than a clustered regression.
Daily tokens (49.9M) and claude-measured cost ($24.55) sit near the 30-day average; the 3-day moving average is flat. Cost is understated: copilot/codex/gemini runs report EstimatedCost=0, so the true spend is higher than the claude-only line. Two single runs dominate: Failure Investigator ($7.15) and Daily Safe Output Tool Optimizer ($5.24).
🌐 Firewall
Healthy at 16.6% (down from 24.9% on 05-27). Blocks are dominated by (unknown) SNI (555) and Google/Chrome browser telemetry (www.google.com, content-autofill, accounts.google.com, safebrowsing); none caused a run failure. Highest pressure: Smoke Copilot 121/304, Linter Miner 63/277, Smoke Antigravity 10/12 (by-design).
Recommendations (priority order)
target=* validation (Contribution Check class).
main; re-verify Avenger max-turns when it next runs.
References: §26690626184 · §26692427111 · §26689948645
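The target=* validation and the skip-with-warning behaviour proposed for the Contribution Check class (finding 3) might look roughly like the sketch below. It is written in Python purely for illustration and is not the project's Process Safe Outputs implementation. The function names `validate_add_comment` and `process_safe_outputs` and the `type` field are assumptions; `target` and `item_number` follow the error message quoted in the finding.

```python
from typing import Any, Dict, List, Optional, Tuple

Item = Dict[str, Any]


def validate_add_comment(item: Item) -> Optional[str]:
    """Return an error message if the item cannot be resolved, else None."""
    if item.get("type") != "add_comment":
        return None  # only add_comment items are checked in this sketch
    if item.get("target") == "*" and item.get("item_number") is None:
        return 'Target is "*" but no item_number specified in add_comment item'
    return None


def process_safe_outputs(items: List[Item]) -> Tuple[List[Item], List[str], bool]:
    """Keep valid items and collect warnings for bad ones instead of failing the batch."""
    applied: List[Item] = []
    warnings: List[str] = []
    for item in items:
        error = validate_add_comment(item)
        if error is not None:
            warnings.append(error)  # skip-with-warning rather than hard-fail
            continue
        applied.append(item)
    # The batch fails only when there were bad items and nothing could be applied.
    ok = bool(applied) or not warnings
    return applied, warnings, ok


if __name__ == "__main__":
    batch = [
        {"type": "create_issue", "title": "Contribution report"},
        {"type": "add_comment", "target": "*"},  # unsatisfiable on a schedule event
    ]
    applied, warnings, ok = process_safe_outputs(batch)
    print(len(applied), len(warnings), ok)
```

Running the same `validate_add_comment` check at the point where the agent emits the item would return the error to the model in-loop, which is the second half of the proposed fix.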