Rewrite: agentic failure-triage, self-healing & RCA engine (0.4) #13
Merged
…sign Proposal, design, 12 capability specs and phased tasks for rewriting robotframework-heal as a pydantic-ai based failure-triage and RCA agent. Includes MiniMax/OpenRouter probe suite (experiments/minimax-probe):
- root cause of small-model tool failures: forced tool_choice (fix: openai_supports_tool_choice_required=False, 311s -> 14s)
- ModelRetry verification works in prompted mode on all reachable models
- per-model capability matrix recorded in FINDINGS.md
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…g mapping
…ved output modes Encodes probe findings: MiniMax tool_choice fix, vLLM strict-stripping, prompted floor for unknown backends, tool gating by reliability tier.
…tion
…ted on MiniMax)
Ports simplify_dom, unique-selector generation, candidate predicates and fuzzy filtering from SelfHealing.utils; fixes nth-of-type identity bug and fuzz-median score/item misalignment.
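A score/item misalignment of the kind mentioned above typically arises when scores and candidates live in two parallel lists and only one of them is sorted or filtered. The following is a minimal sketch of a median-cutoff fuzzy filter that avoids the problem by keeping each score paired with its candidate. It is not the project's code: `fuzzy_filter` is a hypothetical name, and `difflib.SequenceMatcher` stands in for whatever fuzzy scorer the port actually uses.

```python
from difflib import SequenceMatcher
from statistics import median


def fuzzy_filter(query: str, candidates: list[str]) -> list[str]:
    """Keep candidates whose similarity to `query` is at or above the median.

    Hypothetical helper for illustration; not part of robotframework-heal.
    """
    if not candidates:
        return []
    # Score and candidate travel together as one tuple, so filtering or
    # sorting can never pair a score with the wrong candidate.
    scored = [(SequenceMatcher(None, query, c).ratio(), c) for c in candidates]
    cutoff = median(score for score, _ in scored)
    kept = [(score, c) for score, c in scored if score >= cutoff]
    # Best match first; the candidate is looked up from the same tuple.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]
```

Because the cutoff is computed from the same tuples that are filtered, the candidate returned at position 0 is always the one that earned the highest score.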
Deterministic detectors (timing/locator-drift/overlay/viewport) resolve without LLM; triage agent fallback on silence; budget suppression and per-failure timeout; timing plugin heals by wait+rerun.
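The "detectors first, agent on silence" ordering described above can be sketched as a simple chain: each deterministic detector either names a failure class or stays silent, and the LLM fallback is invoked only if all of them stay silent. Everything below is illustrative. The `Failure` type, the detector functions and the message substrings they match are assumptions, not the engine's real detectors; only the class labels come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Failure:
    """Hypothetical minimal failure record: just the error message."""
    message: str


Detector = Callable[[Failure], Optional[str]]


def detect_timing(failure: Failure) -> Optional[str]:
    # Assumed heuristic: a timeout message signals a timing failure.
    return "timing" if "timeout" in failure.message.lower() else None


def detect_overlay(failure: Failure) -> Optional[str]:
    # Assumed heuristic: another element swallowing the click signals an overlay.
    return "overlay" if "intercepts pointer events" in failure.message.lower() else None


def triage(
    failure: Failure,
    detectors: list[Detector],
    llm_fallback: Callable[[Failure], str],
) -> tuple[str, str]:
    """Return (failure_class, source); the fallback runs only on detector silence."""
    for detector in detectors:
        label = detector(failure)
        if label is not None:
            return label, "deterministic"
    return llm_fallback(failure), "llm"
```

In this shape the fallback is an ordinary callable, so a test can pass a stub and assert that it is never called when a detector resolves the failure.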
TransactionRuntime: persistent healer loop, main-thread call marshalling, abandonment with grace. Spike (4/4 PASS) proves rerun, assignment, parallel async work and timeout unblock inside listener v3.
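Main-thread call marshalling, as named above, generally means that a background healer thread never touches the test libraries directly; it queues a call and blocks until the main thread has executed it. The sketch below shows one way to do that with the standard library, including a deadline after which the main thread stops waiting. `MainThreadMarshal`, `call` and `pump` are hypothetical names, and the real TransactionRuntime may be structured quite differently.

```python
import queue
import threading
import time
from concurrent.futures import Future


class MainThreadMarshal:
    """Hypothetical sketch: run callables on the main thread for a worker."""

    def __init__(self) -> None:
        self._calls: queue.Queue = queue.Queue()

    def call(self, fn, *args):
        """Called from the healer thread; blocks until the main thread ran fn."""
        future: Future = Future()
        self._calls.put((fn, args, future))
        return future.result()

    def pump(self, worker: threading.Thread, timeout: float) -> bool:
        """Called on the main thread; returns False if the deadline passes."""
        deadline = time.monotonic() + timeout
        while worker.is_alive() or not self._calls.empty():
            if time.monotonic() > deadline:
                return False  # abandon the worker instead of blocking forever
            try:
                fn, args, future = self._calls.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                # Forward the error to the waiting worker thread.
                future.set_exception(exc)
        return True
```

A worker thread would call `marshal.call(some_keyword, arg)` while the main thread sits in `marshal.pump(worker, timeout=...)`. The short `get` timeout is what lets the main thread notice both worker completion and the deadline.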
… rerun fallback
… via slow server
…iniMax preset fixed to prompted End-to-end via SelfHealing shim: locator drift healed (MiniMax 18.6s, gpt-4.1-nano 6.9s; greedy reuse 0.2s, no LLM), timing healed deterministically. NativeOutput proved unreliable under ModelRetry loops on MiniMax -> preset now resolves prompted; findings recorded.
…istory Verified live: timing atest produces events.jsonl, heal_report.html, summary.json and history.sqlite on close.
…rough), overlay dismissal Mobile specifics: page source covers only the current screen, so absence may mean off-screen; swipe search heals it and falls through to locator healing (single engine hop) when nothing is found. Locator proposals on viewport-limited drivers verify via swipe search before rejection.
…orm/assertion healing
… RCA agent enrichment
… tiers resolve_fix handles literal / variable / variable+suffix origins incl. imported resources; cross-file usage scan drives blast radius; unified patches verified git-appliable; in-place refused on dirty trees; listener enriches event proposals and writes heal.patch + healed copies at close. Validated live: gpt-4.1-nano heal produced a correct patch for the atest suite.
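For the simplest origin mentioned above, a literal locator written directly in a suite file, a unified patch can be produced with the standard library alone. The sketch below is an illustration under assumptions: `make_patch` is a hypothetical helper, it handles only the literal case, it replaces every occurrence in the file, and it assumes the file ends with a newline. The `a/` and `b/` path prefixes follow git's default diff layout.

```python
import difflib


def make_patch(path: str, old_text: str, old_locator: str, new_locator: str) -> str:
    """Build a unified diff that swaps a literal locator (hypothetical helper)."""
    new_text = old_text.replace(old_locator, new_locator)
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",  # git-style prefixes
        tofile=f"b/{path}",
    )
    return "".join(diff)


suite = "*** Test Cases ***\nLogin\n    Click    id=old-btn\n"
patch = make_patch("tests/login.robot", suite, "id=old-btn", "id=new-btn")
```

Locators that originate in a variable or in an imported resource need the edit to land in the defining file rather than at the call site, which is the part this sketch does not attempt.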
src/SelfHealing reduced to the deprecation shim; tests/utest superseded by tests/unit; removed pyautogui/tinydb/cssify/parsimonious/litellm/opencv/lxml.
…, pipeline diagram, changelog
… configured HealSettings loads the nearest .env (cwd upwards) via python-dotenv with override=True, exporting non-HEAL_ keys too. Unconfigured-model healing now suppresses with an actionable message instead of 'Engine error'.  Verified end-to-end: legacy atest heals with config purely from a local .env.
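The "nearest .env, cwd upwards" lookup described above amounts to checking the starting directory and then each parent in turn. Here is a stdlib-only sketch of that search; `find_nearest_env` is a hypothetical name used for illustration, not a function from HealSettings or python-dotenv. The path it returns could then be handed to python-dotenv's `load_dotenv(path, override=True)`.

```python
from pathlib import Path
from typing import Optional


def find_nearest_env(start: Path, filename: str = ".env") -> Optional[Path]:
    """Return the closest `filename` at or above `start`, or None.

    Hypothetical helper; the real lookup is done by the settings layer.
    """
    start = start.resolve()
    # Check the starting directory first, then walk towards the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Typical use would be `find_nearest_env(Path.cwd())`, with a `None` result treated as "no local configuration".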
- assertion parsing handles RF's "Text 'a' (str) should be 'b' (str)" shape
- ambiguous locators (count>1, strict-mode violations) detected as drift; driver-level disambiguation (:visible >> nth=0 legacy parity)
- overlay heal without dismiss control falls through to locator healing
- argument-aware select verification: proposal must contain wanted options
- richer validator feedback for multi-match rejections
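The assertion message shape quoted in the first item above can be matched with a single regular expression. The sketch below is one possible pattern, written only against that quoted example; the real parser may accept more variants, and `parse_assertion` and the group names are hypothetical.

```python
import re
from typing import Optional

# Matches messages shaped like: Text 'a' (str) should be 'b' (str)
_ASSERTION = re.compile(
    r"^(?P<subject>\w+) '(?P<actual>.*)' \((?P<actual_type>\w+)\) "
    r"should be '(?P<expected>.*)' \((?P<expected_type>\w+)\)$"
)


def parse_assertion(message: str) -> Optional[dict[str, str]]:
    """Split an assertion failure into its parts, or return None if it
    does not have the expected shape (hypothetical helper)."""
    match = _ASSERTION.match(message)
    return match.groupdict() if match else None
```

Returning `None` for anything that does not match keeps the caller free to fall through to another detector instead of guessing.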
GLOBAL listener scope shares fixed_locators across suites; the proactive swap in start_keyword now skips keywords inside expected-failure wrappers (Run Keyword And Return Status etc.), matching the end_keyword rule. Found via run_keywork_tests atest after a heal leaked from ait_llm.
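The rule above, "do not proactively swap a locator while inside an expected-failure wrapper", can be expressed as a check over the stack of currently running keywords. The sketch below assumes the listener tracks that stack as a list of keyword names. The wrapper set is an assumption: the commit names only Run Keyword And Return Status, and the other two BuiltIn keywords are added here for illustration.

```python
# Assumed wrapper set; only the first entry is named in the commit message.
EXPECTED_FAILURE_WRAPPERS = frozenset({
    "run keyword and return status",
    "run keyword and expect error",
    "run keyword and ignore error",
})


def _normalize(name: str) -> str:
    # Drop an optional library prefix such as "BuiltIn." and ignore case.
    return name.rsplit(".", 1)[-1].strip().lower()


def inside_expected_failure(keyword_stack: list[str]) -> bool:
    """True if any enclosing keyword is an expected-failure wrapper."""
    return any(_normalize(name) in EXPECTED_FAILURE_WRAPPERS for name in keyword_stack)


def should_swap_locator(keyword_stack: list[str], locator: str, fixed_locators: dict[str, str]) -> bool:
    """Swap only known-fixed locators, and never inside an expected failure."""
    return locator in fixed_locators and not inside_expected_failure(keyword_stack)
```

Applying the same predicate in both the start and end hooks is what keeps a heal recorded in one suite from changing the outcome of a test that expects the failure.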
…ith findings
Frame-safe healing (false-heal fix + pierced proposals), tiered locator selection (experiment-backed: -68% tokens, +27pts on 8B models), cross-run heal memory, SeleniumLibrary driver, self-growing eval corpus.
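Cross-run heal memory, as mentioned above, is essentially a persistent lookup consulted before any model call. The sketch below shows the idea with `sqlite3`. The table layout, `open_memory` and `heal_locator` are hypothetical and do not reflect the real history.sqlite schema, and a real engine would presumably re-verify a remembered locator against the page before trusting it.

```python
import sqlite3
from typing import Callable


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a heal-memory database with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS heals ("
        "broken TEXT PRIMARY KEY, healed TEXT NOT NULL)"
    )
    return conn


def heal_locator(
    conn: sqlite3.Connection,
    broken: str,
    propose: Callable[[str], str],
) -> tuple[str, bool]:
    """Return (healed_locator, from_memory); `propose` stands in for the LLM."""
    row = conn.execute(
        "SELECT healed FROM heals WHERE broken = ?", (broken,)
    ).fetchone()
    if row is not None:
        return row[0], True  # remembered heal: no model call needed
    healed = propose(broken)
    with conn:  # commit the new heal so later runs can reuse it
        conn.execute(
            "INSERT INTO heals (broken, healed) VALUES (?, ?)", (broken, healed)
        )
    return healed, False
```

With a file path instead of `:memory:`, the second run of a suite would hit the `SELECT` branch and skip the proposal step entirely.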
…reen
…erence
…cs/_refgen
…tion
…, MCP)
…stores green build)
…rkflow PRs validate with strict build; main deploys rolling 'latest'; release tags add a pinned version. mike verified locally (one version, default set).
…deploy flow The repository Pages toggle and live-URL verification are one-time manual steps (no repo-admin access from here) documented in docs/CONTRIBUTING.md.
…onsistency The docs taught 'Library heal.rf.HealListener' (verbose, never CI-tested) while all atests used the deprecated 'Library SelfHealing' shim. Add a short, idiomatic top-level 'Heal' module (Heal.py: a file, not a 'Heal/' dir, to avoid a case-insensitive-FS collision with the 'heal/' package) re-exporting the listener. Convert the 6 heal atests and all docs examples to 'Library Heal' so the public API is dogfooded and CI-covered; keep SelfHealing and heal.rf.HealListener working. Verified in a real RF run + new unit test.
…reference specs
…ec deltas
…, harden .gitignore
…test step
- ci.yml ran 'invoke tests' (the legacy task), which executes the entire tests/atest/ dir (all live-LLM and external-site demo suites) against no configured model, so every seeded-failure test failed. Replace with the deterministic heal-utests + heal-atests (no LLM, no external sites).
- heal-utests now writes results/pytest.xml so the Test Report step has input (it was failing on a missing file).
- Remove the dead LLM_* env (superseded by HEAL_*); install --all-extras.
- Fix a duplicate 'Force Tags' line in heal_dom_edge_cases.robot (invalid RF).
The deterministic PR CI alone doesn't exercise the core value (LLM-driven healing). Add a dedicated e2e workflow that runs the live-llm acceptance suites (locator drift, keyword-arg call-site fixing, Selenium, shadow DOM / iframes) on push to main, weekly, and manual dispatch (not on PRs: secrets are unavailable to fork PRs, and live runs cost tokens).
- Reuses the existing 0.3 secrets (LLM_API_KEY/LLM_API_BASE/LLM_LOCATOR_MODEL, stripping the litellm 'openai/' prefix) with new HEAL_* overrides; skips cleanly when no key is configured.
- Restructure heal-atests to tag-driven sets: deterministic 'heal-atest' (both timing suites) for PR CI, 'live-llm' for e2e.
- Validated end-to-end: 8/8 live suites heal via Library Heal with a real model (shadow DOM, iframe pierce, keyword-arg trace, Selenium all pass).
- Flagged in docs: the 0.3 grok model id 404s on OpenRouter; set a current HEAL_MODEL variable.
Pre-checked all three models against the live MiniMax API: each resolves via heal doctor and heals the canonical fixture correctly (M2.5 827tok, M2.7 671tok, M3 727tok), and MiniMax-M2.5 heals through the full real-browser listener path. Add a matrix job to e2e.yml running the live-llm suites per model on schedule/manual dispatch (token-cost controlled); skips without the secret. Documented the required MINIMAX_API_KEY.
… before merge)
…only MiniMax sweep validated green on real runners (M2.5/M2.7/M3 all passed).
Per request, the MiniMax e2e matrix (M2.5/M2.7/M3 x all live suites) now runs on pull requests as well as push-main/schedule/dispatch; it skips cleanly on fork PRs without the secret. The generic live-heal job becomes opt-in (runs only when a HEAL_MODEL repo variable is set) so it never fails PRs on an unconfigured default.
…acts Add grouped configuration tables (model + per-role overrides, behaviour options, budgets, reporting) verified against the settings schema; document the canonical 'Library Heal' import and its alternatives; list the CLI commands and report artifacts; add the --live-llm dev command.
Markdown/docs/mkdocs changes no longer trigger the token-heavy 3-model live sweep on pull requests. Code changes still run it; scheduled and manual dispatch always run fully.
Ground-up rewrite of robotframework-heal from a locator-only healer into a failure-triage, self-healing and root-cause-analysis engine built on pydantic-ai. Every change in this PR was developed spec-first (OpenSpec) and validated with real experiments against live model backends.

Highlights
- … locator-drift, timing, viewport, overlay, form-state, assertion-drift, unknown). Deterministic detectors run first; an LLM triage agent only when needed.
- … heal doctor), not assumed. Small models work via a prompted-JSON floor.
- … frame >>> inner selectors; an interaction-target blocklist fixes a demonstrated iframe false-heal.
- … history.sqlite; repeat heals cost zero tokens.
- … prefix${VAR}suffix / user-keyword argument call-site tracing) with blast-radius safety; opt-in patch/in-place tiers.
- … summary.json for CI gates, cross-run history.
- heal CLI (triage/report/apply/doctor/mcp/corpus/history), MCP server + agent skill, SeleniumLibrary support ([selenium] extra), Appium.

Breaking changes
- HEAL_* env vars (pydantic-settings); LLM_* no longer read.
- … heal package and heal console script.
- … Library Heal. Back-compat preserved: Library SelfHealing (deprecated shim) and Library heal.rf.HealListener both still work.

See docs/reference/migration.md for the full 0.3 → 0.4 migration.

Testing
- … invoke heal-utests.
- … Library Heal (invoke heal-atests; locator suites via --live-llm).
- … experiments/*/FINDINGS.md): the threading model (4/4 spike), small-model output modes, selection-vs-generation, frame piercing, Selenium primitives.

Notes
- openspec/specs/ holds the deployed 19-capability baseline.
- … docs workflow to keep the published site current.
- … SelfHealing shim to exercise the deprecation path.

🤖 Generated with Claude Code