Rewrite: agentic failure-triage, self-healing & RCA engine (0.4) by manykarim · Pull Request #13 · manykarim/robotframework-heal

manykarim · 2026-06-15T15:34:02Z

Ground-up rewrite of robotframework-heal from a locator-only healer into a failure-triage, self-healing and root-cause-analysis engine built on pydantic-ai. Every change in this PR was developed spec-first (OpenSpec) and validated with real experiments against live model backends.

Highlights

Failure taxonomy — 7 classes (locator-drift, timing, viewport, overlay, form-state, assertion-drift, unknown). Deterministic detectors run first; an LLM triage agent only when needed.
Tiered locator healing — deterministic candidates → LLM index-pick → full-DOM generation fallback. Experiment-measured ~65–70% fewer tokens at equal accuracy, +27 accuracy points on 8B-class models.
Verified, never guessed — every proposed fix is checked against the live page before the keyword reruns; verification lives in output validators so it works on every model output mode.
Capability-tiered models — any OpenAI-compatible endpoint (vLLM, Ollama, LiteLLM, MiniMax, OpenRouter) or a pydantic-ai provider string; capability is probed (heal doctor), not assumed. Small models work via a prompted-JSON floor.
Frames & shadow DOM — open shadow roots and same-origin iframes heal via frame >>> inner selectors; an interaction-target blocklist fixes a demonstrated iframe false-heal.
Cross-run heal memory — known fixes warm-start from history.sqlite; repeat heals cost zero tokens.
Fix engine — read-only healed copies + word-highlighted HTML diffs by default (sources never touched); AST-based origin resolution (literal / variable / prefix${VAR}suffix / user-keyword argument call-site tracing) with blast-radius safety; opt-in patch/in-place tiers.
Reporting — crash-safe JSONL run store, self-contained HTML dashboard, summary.json for CI gates, cross-run history.
Surfaces — RF listener, heal CLI (triage/report/apply/doctor/mcp/corpus/history), MCP server + agent skill, SeleniumLibrary support ([selenium] extra), Appium.
Comprehensive docs site — Material for MkDocs, Diátaxis structure, config/CLI reference auto-generated from code with a drift guard, live at the GitHub Pages URL.
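
The tiered locator flow from the Highlights (deterministic candidates, then an LLM index-pick, then full-DOM generation, with every proposal verified before use) could be sketched roughly as follows. This is a minimal illustration, not the project's code: `heal_locator`, `verify`, `pick_index` and `generate` are hypothetical names, and the two LLM steps are plain callables so the sketch runs without a model.

```python
from typing import Callable, Optional, Sequence


def heal_locator(
    candidates: Sequence[str],
    verify: Callable[[str], bool],
    pick_index: Callable[[Sequence[str]], Optional[int]],
    generate: Callable[[], Optional[str]],
) -> tuple[Optional[str], str]:
    """Return (locator, tier) from the cheapest tier that verifies."""
    # Tier 1: deterministic. Keep only candidates that verify on the page;
    # a single survivor needs no model call at all.
    verified = [c for c in candidates if verify(c)]
    if len(verified) == 1:
        return verified[0], "deterministic"

    # Tier 2: the model only answers with an index into a short list,
    # which is far cheaper than generating a selector from the DOM.
    if verified:
        idx = pick_index(verified)
        if idx is not None and 0 <= idx < len(verified):
            return verified[idx], "index-pick"

    # Tier 3: full generation fallback. The proposal is still verified,
    # never trusted as-is.
    proposal = generate()
    if proposal is not None and verify(proposal):
        return proposal, "generation"
    return None, "unhealed"
```

Passing the verifier in as a callable keeps the "verified, never guessed" rule independent of which tier produced the proposal.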

Breaking changes

Configuration moves to HEAL_* env vars (pydantic-settings); LLM_* no longer read.
Packaging migrates to PEP 621 + uv (hatchling); new heal package and heal console script.
New canonical listener import Library Heal. Back-compat preserved: Library SelfHealing (deprecated shim) and Library heal.rf.HealListener both still work.

See docs/reference/migration.md for the full 0.3 → 0.4 migration.

Testing

172 unit tests (no LLM, no browser) green via invoke heal-utests.
Acceptance tests heal on a real browser through Library Heal (invoke heal-atests; locator suites via --live-llm).
Replay/eval harness over a 60-fixture corpus grades healing quality per model tier.
Every architectural assumption is backed by a recorded experiment (experiments/*/FINDINGS.md): the threading model (4/4 spike), small-model output modes, selection-vs-generation, frame piercing, Selenium primitives.

Notes

All four OpenSpec changes are archived; openspec/specs/ holds the deployed 19-capability baseline.
Merging triggers the new docs workflow to keep the published site current.
Legacy demo atests intentionally remain on the SelfHealing shim to exercise the deprecation path.

🤖 Generated with Claude Code

…sign Proposal, design, 12 capability specs and phased tasks for rewriting robotframework-heal as a pydantic-ai based failure-triage and RCA agent.

Includes MiniMax/OpenRouter probe suite (experiments/minimax-probe):
- root cause of small-model tool failures: forced tool_choice (fix: openai_supports_tool_choice_required=False, 311s -> 14s)
- ModelRetry verification works in prompted mode on all reachable models
- per-model capability matrix recorded in FINDINGS.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ved output modes Encodes probe findings: MiniMax tool_choice fix, vLLM strict-stripping, prompted floor for unknown backends, tool gating by reliability tier. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Ports simplify_dom, unique-selector generation, candidate predicates and fuzzy filtering from SelfHealing.utils; fixes nth-of-type identity bug and fuzz-median score/item misalignment. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Deterministic detectors (timing/locator-drift/overlay/viewport) resolve without LLM; triage agent fallback on silence; budget suppression and per-failure timeout; timing plugin heals by wait+rerun. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
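
A rough sketch of the "deterministic detectors first, triage agent on silence" order described above. The seven class names come from the PR description; the detector heuristics, function names and the `triage` callable are assumptions made for illustration only.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class FailureClass(str, Enum):
    LOCATOR_DRIFT = "locator-drift"
    TIMING = "timing"
    VIEWPORT = "viewport"
    OVERLAY = "overlay"
    FORM_STATE = "form-state"
    ASSERTION_DRIFT = "assertion-drift"
    UNKNOWN = "unknown"


Detector = Callable[[str], Optional[FailureClass]]


def detect_timing(message: str) -> Optional[FailureClass]:
    # Hypothetical heuristic: a timeout in the error text suggests timing.
    return FailureClass.TIMING if "timeout" in message.lower() else None


def detect_overlay(message: str) -> Optional[FailureClass]:
    # Hypothetical heuristic: another element swallowed the click.
    hit = "intercepts pointer events" in message.lower()
    return FailureClass.OVERLAY if hit else None


def classify(
    message: str,
    detectors: Sequence[Detector],
    triage: Callable[[str], FailureClass],
) -> tuple[FailureClass, str]:
    """Run cheap detectors in order; call the LLM triage only if all stay silent."""
    for detector in detectors:
        result = detector(message)
        if result is not None:
            return result, "deterministic"
    return triage(message), "llm-triage"
```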

TransactionRuntime: persistent healer loop, main-thread call marshalling, abandonment with grace. Spike (4/4 PASS) proves rerun, assignment, parallel async work and timeout unblock inside listener v3. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
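
The idea of main-thread call marshalling with a timeout can be illustrated with the standard library alone. This is a simplified sketch under stated assumptions, not `TransactionRuntime` itself: `MainThreadPump` and its methods are hypothetical names, the healer runs in a plain worker thread rather than a persistent loop, and abandonment simply leaves the daemon thread behind.

```python
import queue
import threading
import time
from concurrent.futures import Future
from typing import Any, Callable


class MainThreadPump:
    """Lets a worker thread ask the main thread to run driver calls."""

    def __init__(self) -> None:
        self._calls: "queue.Queue[tuple[Callable[[], Any], Future]]" = queue.Queue()

    def call_on_main(self, fn: Callable[[], Any]) -> Any:
        # Called from the worker: enqueue fn and block until main has run it.
        fut: Future = Future()
        self._calls.put((fn, fut))
        return fut.result()

    def run(self, work: Callable[["MainThreadPump"], Any], timeout: float) -> Any:
        # Called on the main thread: start the work, then serve its
        # marshalled calls until it finishes or the deadline passes.
        outcome: Future = Future()

        def target() -> None:
            try:
                outcome.set_result(work(self))
            except BaseException as exc:
                outcome.set_exception(exc)

        threading.Thread(target=target, daemon=True).start()
        deadline = time.monotonic() + timeout
        while not outcome.done():
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # abandon: the keyword is unblocked, worker left behind
            try:
                fn, fut = self._calls.get(timeout=min(remaining, 0.05))
            except queue.Empty:
                continue
            try:
                fut.set_result(fn())
            except Exception as exc:
                fut.set_exception(exc)
        return outcome.result()
```

The short queue poll is what lets the main thread notice both completion and the deadline without busy-waiting.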

…iniMax preset fixed to prompted End-to-end via SelfHealing shim: locator drift healed (MiniMax 18.6s, gpt-4.1-nano 6.9s; greedy reuse 0.2s, no LLM), timing healed deterministically. NativeOutput proved unreliable under ModelRetry loops on MiniMax -> preset now resolves prompted; findings recorded. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…rough), overlay dismissal Mobile specifics: page source covers only the current screen, so absence may mean off-screen; swipe search heals it and falls through to locator healing (single engine hop) when nothing is found. Locator proposals on viewport-limited drivers verify via swipe search before rejection. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… tiers resolve_fix handles literal / variable / variable+suffix origins incl. imported resources; cross-file usage scan drives blast radius; unified patches verified git-appliable; in-place refused on dirty trees; listener enriches event proposals and writes heal.patch + healed copies at close. Validated live: gpt-4.1-nano heal produced a correct patch for the atest suite. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
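
As an illustration of the patch tier, a unified diff for a healed locator can be produced without touching the source file. This sketch uses a naive string replacement where the commit describes AST-based origin resolution, so treat it as a stand-in; `locator_patch` is a hypothetical name.

```python
import difflib


def locator_patch(path: str, source: str, old: str, new: str) -> str:
    """Build a unified diff that swaps `old` for `new`; the file is never written."""
    before = source.splitlines(keepends=True)
    # Naive replacement, for illustration only: real origin resolution
    # would locate the exact literal or variable definition instead.
    after = [line.replace(old, new) for line in before]
    diff = difflib.unified_diff(
        before, after, fromfile=f"a/{path}", tofile=f"b/{path}"
    )
    return "".join(diff)
```

The `a/` and `b/` prefixes follow the usual git diff layout; an empty string means nothing would change.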

… configured HealSettings loads the nearest .env (cwd upwards) via python-dotenv with override=True, exporting non-HEAL_ keys too. Unconfigured-model healing now suppresses with an actionable message instead of 'Engine error'. Verified end-to-end: legacy atest heals with config purely from a local .env. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
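
The "nearest .env, cwd upwards" lookup can be pictured with a standard-library stand-in. The commit says the real loading goes through python-dotenv with override=True; the functions below (`find_nearest_env`, `parse_env`) are hypothetical and only approximate that behaviour for simple `KEY=value` lines.

```python
from pathlib import Path
from typing import Optional


def find_nearest_env(start: Path, name: str = ".env") -> Optional[Path]:
    """Walk from `start` up to the filesystem root; return the first match."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Applying the result with `os.environ.update(parse_env(...))` would mimic the override semantics, since values from the file replace anything already set, including non-HEAL_ keys.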

- assertion parsing handles RF's "Text 'a' (str) should be 'b' (str)" shape
- ambiguous locators (count>1, strict-mode violations) detected as drift; driver-level disambiguation (:visible >> nth=0 legacy parity)
- overlay heal without dismiss control falls through to locator healing
- argument-aware select verification: proposal must contain wanted options
- richer validator feedback for multi-match rejections

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
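
For the assertion-message shape named in the first item, a parser might look like the sketch below. Only the `Text 'a' (str) should be 'b' (str)` shape is taken from the commit message; the regex, the `AssertionFailure` tuple and `parse_assertion` are illustrative assumptions and cover no other operators.

```python
import re
from typing import NamedTuple, Optional


class AssertionFailure(NamedTuple):
    subject: str
    actual: str
    actual_type: str
    expected: str
    expected_type: str


# Matches: <subject> '<actual>' (<type>) should be '<expected>' (<type>)
_ASSERTION = re.compile(
    r"^(?P<subject>.+?) '(?P<actual>.*)' \((?P<actual_type>\w+)\) "
    r"should be '(?P<expected>.*)' \((?P<expected_type>\w+)\)$"
)


def parse_assertion(message: str) -> Optional[AssertionFailure]:
    """Return the parsed parts, or None if the message has another shape."""
    match = _ASSERTION.match(message.strip())
    if match is None:
        return None
    return AssertionFailure(**match.groupdict())
```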

GLOBAL listener scope shares fixed_locators across suites; the proactive swap in start_keyword now skips keywords inside expected-failure wrappers (Run Keyword And Return Status etc.), matching the end_keyword rule. Found via run_keywork_tests atest after a heal leaked from ait_llm. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Frame-safe healing (false-heal fix + pierced proposals), tiered locator selection (experiment-backed: -68% tokens, +27pts on 8B models), cross-run heal memory, SeleniumLibrary driver, self-growing eval corpus. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
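
Cross-run heal memory can be pictured as a small SQLite lookup that is consulted before any model call. The table layout and function names below are assumptions for illustration; the actual schema of history.sqlite is not shown in this PR text.

```python
import sqlite3
from typing import Callable, Optional


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: one remembered fix per broken locator.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fixes ("
        "broken TEXT PRIMARY KEY, healed TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, broken: str, healed: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO fixes (broken, healed) VALUES (?, ?)",
            (broken, healed),
        )


def recall(
    conn: sqlite3.Connection, broken: str, verify: Callable[[str], bool]
) -> Optional[str]:
    """Warm start: reuse a known fix only if it still verifies on the page."""
    row = conn.execute(
        "SELECT healed FROM fixes WHERE broken = ?", (broken,)
    ).fetchone()
    if row is not None and verify(row[0]):
        return row[0]
    return None
```

A hit costs zero tokens; a stale entry fails verification and falls through to normal healing.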

…deploy flow The repository Pages toggle and live-URL verification are one-time manual steps (no repo-admin access from here) documented in docs/CONTRIBUTING.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…onsistency The docs taught 'Library heal.rf.HealListener' (verbose, never CI-tested) while all atests used the deprecated 'Library SelfHealing' shim. Add a short, idiomatic top-level 'Heal' module (Heal.py — a file, not a 'Heal/' dir, to avoid a case-insensitive-FS collision with the 'heal/' package) re-exporting the listener. Convert the 6 heal atests and all docs examples to 'Library Heal' so the public API is dogfooded and CI-covered; keep SelfHealing and heal.rf.HealListener working. Verified in a real RF run + new unit test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…test step

- ci.yml ran 'invoke tests' (the legacy task), which executes the entire tests/atest/ dir (all live-LLM and external-site demo suites) against no configured model, so every seeded-failure test failed. Replace with the deterministic heal-utests + heal-atests (no LLM, no external sites).
- heal-utests now writes results/pytest.xml so the Test Report step has input (it was failing on a missing file).
- Remove the dead LLM_* env (superseded by HEAL_*); install --all-extras.
- Fix a duplicate 'Force Tags' line in heal_dom_edge_cases.robot (invalid RF).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The deterministic PR CI alone doesn't exercise the core value (LLM-driven healing). Add a dedicated e2e workflow that runs the live-llm acceptance suites (locator drift, keyword-arg call-site fixing, Selenium, shadow DOM / iframes) on push to main, weekly, and on manual dispatch. It does not run on PRs: secrets are unavailable to fork PRs, and each run costs tokens.

- Reuses the existing 0.3 secrets (LLM_API_KEY/LLM_API_BASE/LLM_LOCATOR_MODEL, stripping the litellm 'openai/' prefix) with new HEAL_* overrides; skips cleanly when no key is configured.
- Restructure heal-atests to tag-driven sets: deterministic 'heal-atest' (both timing suites) for PR CI, 'live-llm' for e2e.
- Validated end-to-end: 8/8 live suites heal via Library Heal with a real model (shadow DOM, iframe pierce, keyword-arg trace, Selenium all pass).
- Flagged in docs: the 0.3 grok model id 404s on OpenRouter; set a current HEAL_MODEL variable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
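
The secret-reuse rule in this commit (legacy LLM_* values as fallback, HEAL_* overrides winning, the litellm 'openai/' prefix stripped) might be expressed as below. Of the HEAL_* names only HEAL_MODEL appears in this PR; `HEAL_API_KEY` and `HEAL_BASE_URL` are hypothetical placeholders, as is the function name.

```python
from typing import Mapping


def resolve_model_env(env: Mapping[str, str]) -> dict[str, str]:
    """Map 0.3-era secrets onto HEAL_* settings; explicit HEAL_* values win."""
    # litellm-style ids carry a provider prefix that is dropped here.
    legacy_model = env.get("LLM_LOCATOR_MODEL", "").removeprefix("openai/")
    fallbacks = {
        "HEAL_MODEL": legacy_model,
        "HEAL_API_KEY": env.get("LLM_API_KEY", ""),    # hypothetical name
        "HEAL_BASE_URL": env.get("LLM_API_BASE", ""),  # hypothetical name
    }
    resolved: dict[str, str] = {}
    for name, fallback in fallbacks.items():
        value = env.get(name) or fallback
        if value:
            resolved[name] = value
    return resolved
```

An empty result corresponds to the "skips cleanly when no key is configured" case.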

Pre-checked all three models against the live MiniMax API: each resolves via heal doctor and heals the canonical fixture correctly (M2.5 827tok, M2.7 671tok, M3 727tok), and MiniMax-M2.5 heals through the full real-browser listener path. Add a matrix job to e2e.yml running the live-llm suites per model on schedule/manual dispatch (token-cost controlled); skips without the secret. Documented the required MINIMAX_API_KEY. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Per request, the MiniMax e2e matrix (M2.5/M2.7/M3 x all live suites) now runs on pull requests as well as push-main/schedule/dispatch — it skips cleanly on fork PRs without the secret. The generic live-heal job becomes opt-in (runs only when a HEAL_MODEL repo variable is set) so it never fails PRs on an unconfigured default. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…acts Add grouped configuration tables (model + per-role overrides, behaviour options, budgets, reporting) verified against the settings schema; document the canonical 'Library Heal' import and its alternatives; list the CLI commands and report artifacts; add the --live-llm dev command. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Markdown/docs/mkdocs changes no longer trigger the token-heavy 3-model live sweep on pull requests. Code changes still run it; scheduled and manual dispatch always run fully. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

manykarim and others added 30 commits June 10, 2026 17:21

Task 1.1: Migrate to PEP 621 pyproject with uv and hatchling

c9b361f

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 1.2: Create heal package skeleton with import-boundary test

199ed5d

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 1.3: HealSettings with per-role model resolution and legacy kwarg mapping

a091a11

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 1.4: Core schemas with JSONL round-trip and austerity guard tests

8c3ff51

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 2.5: RunLedger with per-transaction budgets and run-wide degradation

a8963ac

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 2.6: doctor probe library; fix preset profile merge (live-validated on MiniMax)

6437e4a

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 3.2: lazy cost-tagged evidence collectors and git context

058a6bd

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 4.3: locator-drift healing with validator-verified proposals and rerun fallback

6b7f020

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Tasks 4.1+4.2: HealListener and SelfHealing deprecation shim

1d4cc5e

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 4.5 (assets): heal atest suites — locator drift via shim, timing via slow server

ffe70ab

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Mark tasks 4.5+4.6 complete

7dcd6d8

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Phase 5: run store, HTML dashboard, summary/GHA annotations, SQLite history

80d4868

Verified live: timing atest produces events.jsonl, heal_report.html, summary.json and history.sqlite on close.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
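
One way to make a JSONL run store like events.jsonl crash-safe is to append one flushed, fsynced line per event and have readers skip a torn final line. This is a sketch of that general technique, not the project's implementation; `append_event` and `read_events` are hypothetical names.

```python
import json
import os
from pathlib import Path
from typing import Any


def append_event(path: Path, event: dict[str, Any]) -> None:
    """Append one event as a single line and force it to disk."""
    line = json.dumps(event, separators=(",", ":")) + "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # survive a crash right after the write


def read_events(path: Path) -> list[dict[str, Any]]:
    """Read all complete events, ignoring a line cut short by a crash."""
    events: list[dict[str, Any]] = []
    with path.open(encoding="utf-8") as fh:
        for raw in fh:
            try:
                events.append(json.loads(raw))
            except json.JSONDecodeError:
                continue  # torn or blank line: skip it, keep the rest
    return events
```

Because each event is independent, a crash loses at most the line being written and every earlier event stays readable for the report.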

Task 6.4: vision probe — all reference backends pass; gate open for form/assertion healing

a02688f

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Tasks 6.5-6.7: form diagnosis, assertion healing with semantic guard, RCA agent enrichment

06e958a

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Phase 8: heal CLI, MCP server, agent skill, replay/eval harness

58ea19a

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 9.1: delete superseded legacy modules and tests; prune dependencies

5944060

src/SelfHealing reduced to the deprecation shim; tests/utest superseded by tests/unit; removed pyautogui/tinydb/cssify/parsimonious/litellm/opencv/lxml. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Tasks 9.2+9.3: README/docs rewrite, config reference, migration table, pipeline diagram, changelog

b02231a

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Explore: shadow DOM/iframe edge-case and selection-mode experiments with findings

20767a8

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

manykarim and others added 29 commits June 15, 2026 15:29

Task 1.1: docs toolchain — Material theme config, plugins, mike provider

3f90b70

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 1.2: Diataxis nav skeleton and placeholder pages; strict build green

6522395

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 2.1: describe all per-role settings fields for the generated reference

14e91f1

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Tasks 2.2+2.3: generate config+CLI reference from code with drift guard

842cedb

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 2.4: unit-test the reference generator; extract pure logic to docs/_refgen

55b3a1b

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Phase 3: reference content — failure classes, drivers, reports, migration

4dbcab2

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 4.1: getting-started tutorial

1b85b0b

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 4.2: model provider how-to (tabbed per backend)

1eb66bc

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 4.3: Selenium and Appium how-to guides

eb32f77

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 4.4: workflow how-to guides (CI gating, fixing files, warm start, MCP)

4c48ad6

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Tasks 5.1+5.2: failure-taxonomy and tiered-locator explanation pages

9234e28

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Task 5.3: model-tiers, threading, and RCA/fixes explanation pages (restores green build)

c5def84

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Phase 6: diagrams, real screenshots, What-you-get page, landing page

024322b

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Tasks 7.1+7.2: mike versioning initialized and GitHub Pages deploy workflow

228ac47

PRs validate with strict build; main deploys rolling 'latest'; release tags add a pinned version. mike verified locally (one version, default set).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

docs: Pages already active on gh-pages — drop the one-time-setup note

9adccdd

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Archive comprehensive-docs-site; sync documentation-site + generated-reference specs

164c15b

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Archive healed-files-diff-report; sync fix-engine + healing-report spec deltas

16c05eb

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Clean PR: untrack .claude local tooling, drop generated run artifacts, harden .gitignore

2204032

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TEMP: run MiniMax e2e sweep on this PR to validate the secret (revert before merge)

98ecf6b

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TEMP: enable minimax job on PR event (validate sweep)

b27d005

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Revert temporary PR trigger: e2e back to schedule/dispatch/push-main only

4aaaf0e

MiniMax sweep validated green on real runners (M2.5/M2.7/M3 all passed).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

manykarim merged commit 56752b2 into main Jun 15, 2026
7 checks passed

manykarim merged 81 commits into main from rewrite/agentic-heal
