From 00a4b362c13466d71e93072b57e1add332bb4edd Mon Sep 17 00:00:00 2001
From: Pratyush Sharma <56130065+pratyush618@users.noreply.github.com>
Date: Tue, 10 Mar 2026 17:59:04 +0530
Subject: [PATCH 1/2] Add requesting-human-help skill for structured
 human-in-the-loop collaboration

Implements the skill from issue #594: structured, evidence-driven requests
for capability limits and high-risk actions, with a validation step and an
audit trail from request through human action to agent decision.
---
 skills/requesting-human-help/SKILL.md | 119 ++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 skills/requesting-human-help/SKILL.md
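
Note (not part of the skill file): the "Validating the Human Response"
checklist in SKILL.md is written as pseudocode. A minimal Python sketch of
the same loop follows, in case a machine-checkable form is useful. The names
`CriterionResult` and `validate_response` are hypothetical and do not exist
in this repository; the message wording mirrors the skill text but is
otherwise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CriterionResult:
    """How the human's response fares against one acceptance criterion.

    Hypothetical type for illustration; not defined by the skill itself.
    """
    criterion: str
    addressed: bool           # did the response speak to this criterion?
    evidence: Optional[str]   # pasted log, screenshot path, command output
    confirmed: bool           # does the evidence actually support it?

    def is_met(self) -> bool:
        # All three checks from the skill must pass for a criterion to count.
        return self.addressed and bool(self.evidence) and self.confirmed


def validate_response(results: List[CriterionResult]) -> str:
    """Return a minimal follow-up, or the confirmation message."""
    if not results:
        # A request without acceptance criteria cannot be validated.
        raise ValueError("no acceptance criteria to validate")
    unmet = [r.criterion for r in results if not r.is_met()]
    if unmet:
        # Ask only for the missing pieces, never the whole request again.
        return "Still needed: " + "; ".join(unmet)
    met = ", ".join(r.criterion for r in results)
    return f"Confirmed: {met}. Proceeding."


results = [
    CriterionResult("login page renders", True, "screenshot.png", True),
    CriterionResult("no console errors", True, None, False),
]
print(validate_response(results))  # Still needed: no console errors
```

Treating a missing artifact as an unmet criterion is what enforces the
"never accept 'done' without artifacts" rule in this sketch.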

diff --git a/skills/requesting-human-help/SKILL.md b/skills/requesting-human-help/SKILL.md
new file mode 100644
index 000000000..c52040590
--- /dev/null
+++ b/skills/requesting-human-help/SKILL.md
@@ -0,0 +1,119 @@
+---
+name: requesting-human-help
+description: Use when blocked by capability limits (UI testing, local execution, VPN-only systems, MFA/captcha), or before irreversible/high-risk actions (deleting data, deploying to production, sending external messages, handling credentials) that require human judgment or approval
+---
+
+# Requesting Human Help
+
+## Overview
+
+Ad hoc help requests fail: they're inconsistent, lack context, and return unverifiable responses.
+
+**Core principle:** Turn human collaboration into a structured, evidence-driven, auditable request with explicit acceptance criteria.
+
+## When to Use
+
+**Capability/access boundaries:**
+- Testing UI on a real device or browser you cannot control
+- Running commands on a local machine or VPN-only system
+- Completing flows requiring MFA, CAPTCHA, or physical hardware
+- Subjective visual checks ("does this look right?")
+
+**High-risk / high-uncertainty steps:**
+- Deleting data, dropping tables, wiping storage
+- Deploying to production or staging environments
+- Sending external emails, Slack messages, or notifications
+- Handling or rotating sensitive credentials
+- Any irreversible action where being wrong is costly
+
+**Do NOT use for:**
+- Questions you can answer by reading files, docs, or web search
+- Low-risk, reversible local actions you can attempt yourself
+- Anything recoverable that you can simply try first
+
+## The Request Format
+
+Present every help request as a structured block. Include ALL fields — missing fields are the top cause of execution errors.
+
+```
+## Human Help Needed
+
+**Goal:** [One sentence: what outcome is needed]
+
+**Why I can't do this:** [Specific blocker — capability limit or risk reason]
+
+**Context:**
+- [Relevant state: what has already been done, what the system looks like]
+- [File paths, URLs, service names, environment]
+
+**Prerequisites before starting:**
+- [ ] [What must be true / set up before the human begins]
+
+**Steps:**
+1. [Explicit, numbered, unambiguous instruction]
+2. [Each step should be doable without guessing]
+3. ...
+
+**Expected output / evidence needed:**
+- [What to capture: screenshot, log output, command result, confirmation text]
+- [Format: paste text output, attach screenshot, confirm yes/no]
+
+**Acceptance criteria:**
+- [ ] [Specific, verifiable condition that means "this worked"]
+- [ ] [What distinguishes success from partial success]
+
+**If something goes wrong:** [Who to contact or how to escalate]
+```
+
+## Validating the Human Response
+
+When the human responds, verify before proceeding:
+
+```
+FOR EACH acceptance criterion:
+  - Is it addressed in the response?
+  - Is evidence provided (log, screenshot, output)?
+  - Does the evidence confirm the criterion?
+
+IF any criterion unmet:
+  → Request ONLY the missing piece (minimal follow-up)
+  → Do NOT re-ask everything
+
+IF all criteria met:
+  → State: "Confirmed: [criterion 1], [criterion 2]. Proceeding."
+  → Continue workflow
+```
+
+**Never accept "looks good" or "done" without artifacts.** A screenshot or pasted output is the minimum bar for irreversible actions.
+
+## The Audit Chain
+
+Every request creates a record:
+
+```
+REQUEST → [structured block above]
+HUMAN ACTION → [what they did]
+EVIDENCE → [artifact they returned]
+AGENT DECISION → [what you decided based on evidence]
+```
+
+Log this chain in your response so future debugging has a clear trail.
+
+## Red Flags — STOP
+
+- Attempting irreversible action without explicit human approval
+- Proceeding because human said "go ahead" with no evidence
+- Asking for help without prerequisites listed (human will get stuck)
+- Accepting partial confirmation and assuming the rest is fine
+- Re-asking the entire request when only one piece is missing
+
+## Common Mistakes
+
+| Mistake | Fix |
+|---------|-----|
+| Vague goal ("deploy the thing") | One-sentence outcome with system + environment |
+| Missing prerequisites | List what must be true before step 1 |
+| Ambiguous steps ("configure it") | Exact commands, menu paths, field values |
+| No evidence requested | Always specify what to capture and how |
+| Accepting "done" without artifact | Ask for the specific log or screenshot |
+| Over-escalating routine actions | Only escalate capability limits and irreversible risks |

From 57d42ccff9f93a76c68255d96bf1ada08fc1cf6f Mon Sep 17 00:00:00 2001
From: Pratyush Sharma <56130065+pratyush618@users.noreply.github.com>
Date: Tue, 10 Mar 2026 18:34:06 +0530
Subject: [PATCH 2/2] Address review: add involvement level field and fix
 fenced block language tags

- Add explicit involvement level (clarification, execution, or
  approval/takeover) to the request template so the audit trail records
  what authority the human response grants
- Add 'text' language tag to all fenced blocks to satisfy MD040
---
 skills/requesting-human-help/SKILL.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
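
Note (not part of the skill file): one way to picture how the new
involvement level could travel with the audit chain is the Python sketch
below. `Involvement` and `AuditRecord` are hypothetical names, and appending
the level to the REQUEST line is an assumption of this sketch; SKILL.md only
defines the four chain entries and the three level values.

```python
from dataclasses import dataclass
from enum import Enum


class Involvement(Enum):
    """The three levels named in the request template."""
    CLARIFICATION = "clarification"
    EXECUTION = "execution"
    APPROVAL = "approval/takeover"


@dataclass(frozen=True)
class AuditRecord:
    """One pass through the chain: request, action, evidence, decision."""
    request: str
    involvement: Involvement
    human_action: str
    evidence: str
    agent_decision: str

    def grants_approval(self) -> bool:
        # Only an approval/takeover request gives the agent authority to
        # proceed with the action itself.
        return self.involvement is Involvement.APPROVAL

    def render(self) -> str:
        # Same four entries as the audit chain in SKILL.md; the level in
        # parentheses is this sketch's addition.
        return "\n".join([
            f"REQUEST → {self.request} ({self.involvement.value})",
            f"HUMAN ACTION → {self.human_action}",
            f"EVIDENCE → {self.evidence}",
            f"AGENT DECISION → {self.agent_decision}",
        ])


record = AuditRecord(
    request="Confirm staging deploy",
    involvement=Involvement.APPROVAL,
    human_action="Approved in chat",
    evidence="pasted deploy log",
    agent_decision="Proceed with deploy",
)
print(record.render())
```

Keeping the level on the record lets a later reader tell an answered
question apart from an actual approval.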

diff --git a/skills/requesting-human-help/SKILL.md b/skills/requesting-human-help/SKILL.md
index c52040590..ceaba26d4 100644
--- a/skills/requesting-human-help/SKILL.md
+++ b/skills/requesting-human-help/SKILL.md
@@ -35,11 +35,16 @@ Ad hoc help requests fail: they're inconsistent, lack context, and return unveri
 
 Present every help request as a structured block. Include ALL fields — missing fields are the top cause of execution errors.
 
-```
+```text
 ## Human Help Needed
 
 **Goal:** [One sentence: what outcome is needed]
 
+**Involvement level:** [clarification | execution | approval/takeover]
+- clarification: human answers a question so the agent can continue
+- execution: human performs steps the agent cannot
+- approval/takeover: human must approve or own the action before agent proceeds
+
 **Why I can't do this:** [Specific blocker — capability limit or risk reason]
 
 **Context:**
@@ -69,7 +74,7 @@ Present every help request as a structured block. Include ALL fields — missing
 
 When the human responds, verify before proceeding:
 
-```
+```text
 FOR EACH acceptance criterion:
   - Is it addressed in the response?
   - Is evidence provided (log, screenshot, output)?
@@ -90,7 +95,7 @@ IF all criteria met:
 
 Every request creates a record:
 
-```
+```text
 REQUEST → [structured block above]
 HUMAN ACTION → [what they did]
 EVIDENCE → [artifact they returned]