AI pull request reviews with any OpenAI- or Anthropic-compatible model — cloud or self-hosted.
Point it at your llama.cpp box or your Anthropic key. Either way, every PR gets a real review.
Quick start · How it works · Inputs · Recipes · Troubleshooting
The action gathers PR metadata, diff context, linked issue context from PR-closing references, linked sources, optional evidence provider output, optional tool harness output, image digest provenance, basic repository impact/history, and an optional standards file such as CLAUDE.md. It returns a structured verdict and markdown review body, and it can publish the result as a sticky comment or a native GitHub review.
- 🏠 Local-model-first — works with ollama, llama.cpp, vLLM, LiteLLM, or any OpenAI/Anthropic-compatible endpoint, with a cloud fallback if you want one
- 🧭 Deterministic PR classification — rule-based risk flags and required checklists keep small models focused and honest
- ⚡ Fast/smart model routing — boring PRs go to a cheap model, scary ones escalate to a smarter one automatically
- 🔍 Structured findings — severity-tagged findings, optional line-anchored inline comments, and a severity-gated verdict policy
- 💸 Token-saving by design — unchanged-diff skip, incremental re-reviews, and carry-forward of unresolved findings
- 🛡️ Safe by default — approvals off, fork enrichment off, read-only tool allowlists, secret redaction, link sanitization
- 🧰 Extensible — repo-defined evidence providers, a bounded read-only tool harness, repo-local rules (
AGENTS.md/CLAUDE.md) and prompt overrides
name: AI PR Review
on:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
permissions:
contents: read
pull-requests: write
jobs:
review:
if: ${{ !github.event.pull_request.draft }}
runs-on: self-hosted
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1
ai_model: qwen3-32b
publish_review_comment: "true"Requirements: the repository under review is already checked out; the runner has gh, jq, curl, git, and python3; the workflow runs on pull_request events (or passes explicit repo and pr_number inputs).
- How it works
- Platform support
- Review pipeline features
- Inputs
- Outputs
- Usage recipes
- Publishing & verdicts
- Routing & escalation
- Token-saving with incremental reviews
- Local model troubleshooting
- Notes
- Validation
- Version pinning and releases
- Security
flowchart LR
A[Precheck<br/>diff fingerprint] -->|unchanged| Z[Skip review 💤]
A -->|changed| B[Wait for CI<br/><i>optional</i>]
B --> C[Collect context<br/>diff · issues · sources · evidence · tools]
C --> D[Classify PR<br/>rule-based risk flags]
D --> E[Route & call model<br/>primary / smart / fallback]
E --> F[Validate & enforce<br/>required checks · findings · carry-forward]
F --> G[Publish<br/>comment / native review]
What it supports:
| ✅ Self-hosted OpenAI-compatible endpoints | ✅ Native Anthropic-compatible /messages endpoints |
| ✅ Cloud OpenAI- or Anthropic-compatible subscriptions | ✅ Optional fallback model/endpoint |
| ✅ Evidence providers for repo-specific checks | ✅ Read-only tool harness (single-round or iterative planning) |
| ✅ Managed PR comment publishing | ✅ Automatic skip when the effective PR diff is unchanged |
✅ Linked issue ingestion (Fixes #123, Closes owner/repo#456) |
✅ Repo-provided rules via CLAUDE.md, AGENTS.md, or a custom file |
| ✅ Upstream link sanitizer for published reviews | ✅ Incremental re-reviews with carried-forward findings |
The action works on GitHub and Forgejo (1.4.x). Set platform: auto (default) to detect automatically from GITHUB_SERVER_URL and FORGEJO_API_URL, or set explicitly to forgejo / github.
| Feature | GitHub | Forgejo |
|---|---|---|
| PR diff, files, metadata | ✅ Full | ✅ Full (REST backend) |
| Managed sticky comment | ✅ Full | ✅ Full |
Native review comments (review_comment) |
✅ Full | |
Native review verdicts (review_verdict) |
✅ Full | |
| Cleanup: dismiss stale reviews | ✅ Full | ✅ Full (REST) |
| Cleanup: minimizeComment (hide outdated) | ✅ Full | ❌ Skipped (no GraphQL) |
Thread resolution (resolveReviewThread) |
✅ Full | ❌ Skipped (no GraphQL) |
Thread follow-up replies (in_reply_to) |
✅ Full | ❌ Skipped — suppression-file dedup only |
| CI status check polling | ✅ Full | ✅ Commit-status polling (Forgejo REST) |
| Evidence providers | ✅ Full | ✅ Full |
| Tool harness | ✅ Full | ✅ Full |
| Incremental reviews + carry-forward | ✅ Full | ✅ Full |
| Fast/smart model routing | ✅ Full | ✅ Full |
Note: On Forgejo, features requiring GitHub's GraphQL API (thread resolution, review minimization, thread follow-up replies) are skipped with a clear log line. The core review pipeline and all REST-based features work fully.
Before invoking the AI model, the action runs a deterministic classification step that analyzes changed file paths, diff content, and linked issue context to produce structured metadata about the PR. This helps smaller/weaker reviewer models stay focused and reduces the chance of being misled by irrelevant context.
The classification is purely rule-based — no model calls are involved. It uses pattern matching on file paths, diff content, and linked issue metadata to determine the PR type and associated risk flags.
Classification output (injected into the review corpus):
| Field | Description |
|---|---|
pr_kind |
One of: renovate_digest_only, dependency_upgrade, app_code, k8s_manifest, auth_changes, public_route_changes, file_serving_changes, path_handling_changes, secret_handling_changes, db_or_migration_changes |
risk_flags |
Detected risk indicators such as linked_security_issue, linked_audit_issue, linked_priority_p0, linked_priority_p1, file_serving_changes, path_handling_changes, auth_changes, secret_handling_changes |
changed_files_summary |
List of changed file paths (truncated to 50) |
linked_issue_labels |
Labels from linked issues when available |
must_check |
Explicit checklist items derived from the classification (e.g., "review auth flow for regression" for auth_changes) |
Default required checks per risk class (must_check is the union of the checks for the pr_kind and every detected risk flag — a PR classified as app_code that still trips the auth_changes flag gets the auth checklist):
| Risk class | Required checks |
|---|---|
renovate_digest_only |
verify no functional changes beyond lockfile hashes |
dependency_upgrade |
breaking API changes in updated dependencies; run full test suite after upgrade |
k8s_manifest |
validate manifest against target cluster version; resource quota / limit changes |
auth_changes |
review auth flow for regression; session token handling |
public_route_changes |
route access controls; unintended public endpoints |
file_serving_changes |
file path sanitization; directory traversal |
path_handling_changes |
path traversal; edge-case paths (null bytes, symlinks) |
secret_handling_changes |
secrets not logged/exposed; secret rotation impact |
db_or_migration_changes |
migration data-loss risk; test on a copy of production schema |
linked_security_issue / linked_audit_issue / linked_priority_p0/p1 |
explicitly address the linked issue / verify thoroughly |
These checklists exist to keep weaker local models honest on high-risk PRs: the items are injected into the model's instructions ("address EACH of these"), and the review is then validated against them.
After the model returns, the action deterministically checks whether review_markdown actually discussed each must_check item (shallow keyword matching — it catches reviews that never mentioned a required check, not incorrect discussion). Controlled by validate_required_checks (auto = validate when must_check is non-empty) and required_check_validation_mode:
warn(default): an Unaddressed required checks section listing the missing items is appended to the published review, so a human sees exactly what the model skipped. The verdict is not changed.fail: additionally forces arequest_changesverdict.metadata_only: records the result without touching the published review — for downstream automation.
The result is exposed as the required_checks output (complete / incomplete / none), written to the run's step summary, and recorded in the managed metadata marker for future runs. Low-risk PRs (empty must_check) produce no validation noise.
Before publishing, the action runs scripts/sanitize_review_markdown.py on the review markdown to neutralize upstream GitHub references (PR URLs, issue URLs, commit URLs, compare URLs, cross-repo owner/repo#123 references, and bare #123 references). This prevents GitHub from auto-linking them into the reviewed repository, which would create notification noise and misleading linkbacks to unrelated projects. Sanitization is documented as P0 hygiene in issue #132.
Only three inputs are required: github_token, ai_base_url, and ai_model. Everything else has a sensible default. Inputs are grouped by topic below — expand the sections you need.
Core — token, repo, PR targeting
| Input | Description | Required | Default |
|---|---|---|---|
github_token |
GitHub token for PR and API access | Yes | - |
repo |
Repository in owner/name format |
No | current repository |
pr_number |
Pull request number | No | current pull_request number |
Primary model — endpoint, format, sampling, structured output
| Input | Description | Required | Default |
|---|---|---|---|
ai_base_url |
Base URL of the primary AI API | Yes | - |
ai_api_format |
Primary API request/response format: openai or anthropic |
No | openai |
ai_model |
Model name for the primary analysis pass | Yes | - |
ai_api_key |
Optional API key for the primary AI endpoint. OpenAI format sends Authorization: Bearer; Anthropic format sends x-api-key |
No | "" |
ai_max_tokens |
Maximum completion tokens for primary and fallback final review calls. Required by Anthropic-compatible APIs. Reasoning models (those that emit a thinking channel, e.g. Gemma) need this headroom — too low a cap is spent on reasoning, leaving empty content (finish_reason=length) so the verdict JSON fails to parse and the review needlessly escalates; raise to 16000+ for verbose reasoners |
No | 8192 |
ai_temperature |
Sampling temperature for the review model. Empty string omits the field (some newer cloud models reject non-default temperature) | No | 0.1 |
ai_response_format |
Structured-output mode for OpenAI-compatible endpoints (incl. LiteLLM): off, json_object, or json_schema (enforces the verdict/review_markdown schema). Ignored for anthropic. Improves reliability with smaller local models |
No | off |
ai_tokens_param |
Token-limit field name for OpenAI-compatible requests: max_tokens or max_completion_tokens (newer OpenAI reasoning models). Ignored for anthropic |
No | max_tokens |
anthropic_version |
anthropic-version header used for Anthropic-compatible requests |
No | 2023-06-01 |
Fallback & failure handling — fallback endpoint, retries, failure behavior
| Input | Description | Required | Default |
|---|---|---|---|
ai_fallback_base_url |
Optional fallback AI API base URL | No | "" |
ai_fallback_api_format |
Fallback API request/response format; defaults to ai_api_format when blank |
No | "" |
ai_fallback_model |
Optional fallback model name | No | "" |
ai_fallback_api_key |
Optional API key for the fallback AI endpoint | No | "" |
ai_primary_retries |
Number of retries for the primary model | No | 8 |
ai_primary_retry_delay_sec |
Delay between retries in seconds | No | 15 |
on_model_failure |
Behavior when primary and fallback models fail: fail (fail the step) or notice (post a visible request_changes notice explaining the review could not run — never auto-approves) |
No | fail |
Verdicts & findings — verdict policy, inline comments, required-check validation
| Input | Description | Required | Default |
|---|---|---|---|
verdict_policy |
How the final verdict is decided: model (the model's own verdict) or findings_severity_gated (derived from structured findings: request_changes iff any blocker finding; falls back to the model verdict when no findings). Enforcement settings still apply afterwards |
No | model |
inline_findings |
Attach diff-anchorable structured findings as native line-anchored review comments in review_comment/review_verdict modes. Ignored for comment mode |
No | false |
inline_findings_max |
Maximum inline review comments per review when inline_findings=true |
No | 20 |
validate_required_checks |
Validate the final review against the classifier's must_check items: auto (when must_check is non-empty), true, or false |
No | auto |
required_check_validation_mode |
Action on unaddressed required checks: warn (append a section to the review), fail (also force request_changes), or metadata_only |
No | warn |
Routing & escalation — primary/smart model split and escalation triggers
| Input | Description | Required | Default |
|---|---|---|---|
review_routing_mode |
Route reviews between the primary and smart models from the classification: off (existing primary/fallback behavior) or auto |
No | off |
ai_primary_model |
Model for the primary route in auto mode; defaults to ai_model |
No | "" |
ai_primary_base_url |
Base URL for the primary route model; defaults to ai_base_url |
No | "" |
ai_primary_api_format |
API format for the primary route model; defaults to ai_api_format |
No | "" |
ai_primary_api_key |
API key for the primary route model; defaults to ai_api_key |
No | "" |
ai_fast_model |
Deprecated alias for ai_primary_model (route renamed fast → primary) |
No | "" |
ai_fast_base_url |
Deprecated alias for ai_primary_base_url |
No | "" |
ai_fast_api_format |
Deprecated alias for ai_primary_api_format |
No | "" |
ai_fast_api_key |
Deprecated alias for ai_primary_api_key |
No | "" |
ai_smart_model |
Smart model for high-risk reviews in auto mode; defaults to ai_fallback_model |
No | "" |
ai_smart_base_url |
Base URL for the smart model; defaults to ai_fallback_base_url |
No | "" |
ai_smart_api_format |
API format for the smart model; defaults to ai_fallback_api_format, then ai_api_format |
No | "" |
ai_smart_api_key |
API key for the smart model; defaults to ai_fallback_api_key |
No | "" |
escalate_on_risk_flags |
Comma-separated pr_kind/risk_flag names that route to the smart model in auto mode |
No | security/priority/auth/route/file-serving/path/secret/db list |
escalate_on_incomplete_required_checks |
Escalate primary-route reviews with unaddressed required checks to the smart model (auto mode) |
No | true |
escalate_on_fast_request_changes |
Escalate primary-route reviews whose verdict is request_changes (auto mode) |
No | true |
escalate_on_fast_low_confidence |
Escalate low-confidence primary-route reviews (short relative to the diff, or populated Unknowns section) (auto mode) |
No | true |
escalate_on_tool_or_evidence_blockers |
Escalate when evidence blockers exist or every executed tool request failed (auto mode) |
No | true |
escalate_on_tool_planning_failure |
Escalate when the tool-harness planning call failed (auto mode). Off by default: a planning failure degrades the review to no-tools, it does not signal risk |
No | false |
escalate_on_dirty_baseline |
Escalate incremental reviews whose baseline review found issues (auto mode) |
No | true |
Publishing — comment vs. native review, approval guardrails, cleanup
| Input | Description | Required | Default |
|---|---|---|---|
publish_review_comment |
Publish or update a managed PR comment | No | false |
publish_mode |
Publish mode for the review verdict: comment (sticky PR comment, default), review_comment (non-blocking native PR review comment), review_verdict (native approve/request_changes). Requires pull-requests: write for review_comment and review_verdict |
No | comment |
allow_approve |
If true and publish_mode=review_verdict, the model's approve verdict can be submitted as a native approval. Defaults to false — approval is blocked unless explicitly enabled. WARNING: native approvals can affect branch protection rules and automerge pipelines. | No | false |
approve_forks |
If true and publish_mode=review_verdict with allow_approve=true, native approvals are also allowed for cross-repository (fork) PRs. Defaults to false — fork PRs are blocked from approval even when allow_approve is set. | No | false |
cleanup_previous_native_reviews |
Mark previous managed native PR reviews as outdated/superseded before publishing a new native review. Accepted values: auto (default, enables cleanup for review_comment and review_verdict modes), true, or false. Cleanup only targets reviews created by this action carrying the managed marker. Dismissal of old approval/request-changes reviews is attempted when permissions allow but is secondary to visual cleanup. |
No | auto |
comment_marker |
HTML marker for the managed PR comment | No | <!-- ai-pr-reviewer --> |
Prompt & standards — system prompt overrides and repo-local rules
| Input | Description | Required | Default |
|---|---|---|---|
system_prompt |
Optional system prompt override | No | bundled prompt |
system_prompt_file |
File in the reviewed repo to use as the full system prompt | No | "" |
standards_file |
Explicit standards file path; takes priority over candidates | No | "" |
standards_file_candidates |
Candidate files checked in order; first found is used | No | AGENTS.md,agents.md,CLAUDE.md,claude.md,.github/ai-review-rules.md,.github/ai-review-rules.txt |
Context & enrichment budgets — context window sizing, enrichment time limits
| Input | Description | Required | Default |
|---|---|---|---|
context_limit_mode |
Context budget mode: normal (140k/70k/220k), low (80k/40k/120k), minimal (40k/20k/60k) |
No | normal |
model_context_tokens |
The model's real context window in tokens (e.g. 8192, 32768). When set, corpus/diff/file byte budgets are derived from it (reserving ai_max_tokens for output) instead of context_limit_mode. Recommended for local models. Empty uses context_limit_mode |
No | "" |
enrichment_budget_sec |
Maximum seconds to spend on enrichment (linked source fetching, release metadata, ghcr.io lookups). Exceeding the budget stops further enrichment. | No | 60 |
image_digest_budget_sec |
Maximum seconds to spend on image digest provenance lookups (registry tokens, manifests, revision compares). 0 disables the budget. | No | 60 |
allowed_source_hosts |
Comma-separated allowlist for linked URL fetching | No | github.com,api.github.com,gitlab.com,registry.terraform.io,artifacthub.io |
Evidence providers — repo-defined check commands
| Input | Description | Required | Default |
|---|---|---|---|
evidence_providers_file |
Optional JSON file in the reviewed repo defining evidence provider commands | No | "" |
evidence_provider_timeout_sec |
Default timeout in seconds for each evidence provider command | No | 30 |
evidence_provider_max_output_bytes |
Max stdout or stderr bytes captured per provider command | No | 20000 |
evidence_provider_parallelism |
Max evidence provider commands run concurrently (set 1 to force serial execution) |
No | 4 |
evidence_blocker_enforcement |
Force request_changes when any provider reports blocker severity |
No | false |
evidence_enable_for_forks |
Allow evidence providers on cross-repository PRs | No | false |
Tool harness — read-only model-planned evidence gathering
| Input | Description | Required | Default |
|---|---|---|---|
tool_mode |
Tool harness mode: off, plan_execute_once, plan_execute_loop, or native_loop |
No | off |
tool_max_requests |
Maximum tool requests executed in one harness run (total across rounds in loop mode) | No | 4 |
tool_max_rounds |
Maximum planning rounds for tool_mode=plan_execute_loop; for native_loop, up to twice this (capped at 8) since a round is one model turn |
No | 3 |
tool_loop_wall_clock_sec |
Wall-clock ceiling in seconds for the whole tool_mode=native_loop exchange. Ignored for other modes |
No | 120 |
tool_loop_summarize |
When true, native_loop folds the oldest tool results into a model-generated evidence digest once the conversation outgrows its context budget, instead of blunt-truncating them (costs one extra model call per compaction). Off = truncation. Ignored for other modes |
No | false |
tool_loop_summarize_max_tokens |
Maximum completion tokens for each result-summarization call when tool_loop_summarize is enabled |
No | 512 |
tool_evidence_memory |
Carry the evidence a native_loop review gathers across incremental reviews of the same PR: a compact digest of what it read/fetched is stored in the metadata marker and reused by the next incremental review (re-verifying only what the delta touched) instead of re-gathering. On by default; false to disable. No effect on full reviews or non-native modes |
No | true |
tool_planning_timeout_sec |
Timeout in seconds for tool harness planning model call | No | 60 |
tool_planning_max_context_bytes |
Maximum corpus bytes passed to planning | No | 50000 |
tool_planning_max_tokens |
Maximum completion tokens for tool harness planning call | No | 400 |
tool_max_response_bytes |
Maximum bytes captured from each tool response | No | 12000 |
tool_allowed_gh_api_repos |
Comma-separated owner/repo allowlist for gh_api; use * to allow any repo endpoint still permitted by the tool path allowlist (empty = current repo only) |
No | "" |
tool_request_timeout_sec |
Timeout in seconds for each tool execution request | No | 20 |
search_url |
Search-engine endpoint (e.g. a SearXNG /search URL) that enables the read-only web_search tool in the native tool loop. When set, the model can search for a page and then web_fetch the best result; empty leaves web_search un-advertised. The model supplies only the query — the host is fixed by this setting. Subject to the same fork gating as the rest of the tool harness |
No | "" |
tool_max_search_results |
Maximum results returned per web_search call |
No | 5 |
tool_failure_enforcement |
Force request_changes when tool harness planning fails |
No | false |
tool_min_successful_requests |
Minimum successful tool requests required when tool_failure_enforcement=true |
No | 0 |
tool_enable_for_forks |
Allow tool harness on cross-repository PRs | No | false |
tool_mcp_servers |
Allowlist of read-only MCP servers for tool_mode: native_loop, as a newline/comma list of name=url. Read-verb tools are advertised as mcp__<name>__<tool>; write-verb tools are refused. Empty = off. Fork-gated like the rest of the harness |
No | "" |
tool_mcp_token |
Optional bearer token sent to every configured MCP server | No | "" |
Timeouts & streaming — request/connect timeouts, streaming toggles
| Input | Description | Required | Default |
|---|---|---|---|
ai_request_timeout_sec |
Timeout in seconds for the primary model API request (curl --max-time) |
No | 300 |
ai_connect_timeout_sec |
Timeout in seconds for the primary model API connection (curl --connect-timeout) |
No | 30 |
ai_fallback_request_timeout_sec |
Timeout in seconds for the fallback model API request (curl --max-time). Defaults to ai_request_timeout_sec when blank. |
No | "" |
ai_fallback_connect_timeout_sec |
Timeout in seconds for the fallback model API connection (curl --connect-timeout). Defaults to ai_connect_timeout_sec when blank. |
No | "" |
ai_stream |
If true, use streaming responses to avoid timeouts behind proxies with short read timeouts (e.g. Cloudflare 100s edge timer) | No | "true" |
ai_fallback_stream |
If set, overrides ai_stream for the fallback model; defaults to ai_stream value when blank | No | "" |
Review scope & CI gating — incremental reviews, diff skip, waiting for CI
| Input | Description | Required | Default |
|---|---|---|---|
review_scope |
Controls whether the action reviews the full PR or only changes since the last managed review. Accepted values: auto (default, full on first run, incremental on later safe updates), full (always full review), incremental (delta review, falls back to full if prior metadata unavailable) |
No | auto |
platform |
Target hosting platform for API capability gating and backend selection. auto (default) detects from GITHUB_SERVER_URL and FORGEJO_API_URL: non-github.com hosts or FORGEJO_API_URL set resolves to forgejo; otherwise github. Set forgejo or github explicitly to override auto-detection. On Forgejo, features requiring GitHub GraphQL (thread resolution, review minimization) degrade gracefully with a log line; the REST backend handles core PR operations. Linked-source enrichment always targets github.com. |
No | auto |
forgejo_api_url |
Base URL for the Forgejo REST backend. Optional on Forgejo Actions runners when github.server_url is the Forgejo instance; set it when running from another host or when GITHUB_SERVER_URL is unavailable. |
No | "" |
forgejo_token |
Optional Forgejo API token. Defaults to github_token when blank; set it when the token used for GitHub-compatible operations is not valid for the Forgejo REST API. |
No | "" |
skip_if_diff_unchanged |
Skip the LLM review when the current PR patch matches the last managed review fingerprint | No | true |
force_review |
Bypass the diff-unchanged guard and review even when the fingerprint matches. Set automatically by the rereview_label; also drivable from workflow_dispatch/repository_dispatch. When the last managed review was not clean, a forced re-review runs at full scope to re-establish a clean baseline (recovers a PR wedged in Request changes) |
No | false |
rereview_label |
Label that, when added to a PR, forces a fresh review (add labeled to the workflow's pull_request types to enable). Self-authorizing — only write/triage can label. The label is removed after, so re-adding re-triggers |
No | ai-review |
ci_status_check |
Wait for all CI checks to reach a terminal state before starting the AI review. Default false — immediate review. | No | false |
ci_timeout_sec |
Maximum seconds to wait for CI checks to complete when ci_status_check=true. | No | 300 |
ci_interval_sec |
Seconds between CI status polls when ci_status_check=true. | No | 15 |
ci_skip_on_timeout |
If true, proceed with review after timeout instead of failing. | No | true |
| Output | Description |
|---|---|
verdict |
approve or request_changes |
verdict_source |
model, findings (per verdict_policy), or carry_forward (a carried-forward blocker survived an incremental review) |
required_checks |
Required-check validation status: complete, incomplete, or none (validation did not run) |
review_route |
Model route used: legacy (routing off), primary, smart, or escalated |
escalation_reason |
Comma-separated escalation trigger names when review_route is escalated (empty otherwise) |
findings |
Normalized structured findings as a JSON array ([] when the model produced none) |
review_markdown |
Full markdown review body |
analysis_engine |
Model and endpoint that produced the final result, annotated with how it was chosen: — fast route, — routed smart (risk match: …), — escalated (…), or — fallback (primary failed). Unannotated when routing is off |
should_review |
true when a new LLM review was run |
skip_reason |
Skip reason such as diff-unchanged |
diff_fingerprint |
Stable fingerprint of the current PR patch |
ci_status_skipped |
true if CI status check was skipped, false if it completed |
ci_status_final |
Final CI state (success/failure) when ci_status_check completed |
effective_review_scope |
Effective scope used: full or incremental |
previous_head_sha |
Previous head SHA when scope is incremental |
baseline_clean |
Whether the full-review baseline was clean (for verdict safety) |
name: AI PR Review
on:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
permissions:
contents: read
pull-requests: write
jobs:
review:
if: ${{ !github.event.pull_request.draft }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.openai.com/v1
ai_model: gpt-4.1
ai_api_key: ${{ secrets.OPENAI_API_KEY }}
standards_file: CLAUDE.md
publish_review_comment: "true"- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.anthropic.com/v1
ai_api_format: anthropic
ai_model: claude-sonnet-4-5
ai_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
ai_max_tokens: "8192"
publish_review_comment: "true"When ai_api_format: anthropic is set, the action posts to /messages, sends the x-api-key and anthropic-version headers, and parses only Anthropic text content blocks. Non-text blocks such as thinking are ignored so private reasoning is not copied into PR comments.
- uses: misospace/pr-reviewer-action@v1
id: review
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1
ai_model: qwen3-32b
ai_fallback_base_url: https://api.openai.com/v1
ai_fallback_api_format: openai
ai_fallback_model: gpt-4.1-mini
ai_fallback_api_key: ${{ secrets.OPENAI_API_KEY }}Set ci_status_check: true to wait for all CI checks to reach a terminal state before starting the AI review. This ensures the review considers the final CI results rather than running against in-progress checks.
The per-check outcomes (name, status, conclusion) are also folded into the review corpus as a CI Check Results section, so the model cites real test/lint results instead of reporting them as "not verifiable". The reviewer never runs your test suite itself — that would mean executing untrusted PR code with the bot's token — it consumes the results your CI already produced in its own sandbox.
- uses: misospace/pr-reviewer-action@v1
id: review
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1
ai_model: qwen3-32b
ci_status_check: "true"
ci_timeout_sec: "300"
ci_interval_sec: "15"
ci_skip_on_timeout: "true"When ci_skip_on_timeout: true (the default), the action proceeds with the review after ci_timeout_sec even if checks are still running. Set it to false to fail the action on timeout instead. The ci_status_skipped and ci_status_final outputs indicate whether the CI wait completed and what the final state was.
By default an unchanged PR is not re-reviewed — the precheck skips when the diff + config fingerprint matches the last managed review (skip_if_diff_unchanged). You'll want to re-review anyway when something the fingerprint can't see has changed: a model repointed behind a stable alias, updated standards, or a flaky first pass.
Add the ai-review label to a PR and it re-reviews. That's it. Enable it with a one-word addition to your existing review workflow:
on:
pull_request:
types: [opened, reopened, synchronize, ready_for_review, labeled] # ← add `labeled`If your workflow uses
concurrencywithcancel-in-progress: true(common), givelabeledevents their own group — otherwise an auto-applied label (e.g. Renovate adding labels at PR creation) spawns alabeledrun that cancels the in-flightopenedreview, and the PR goes unreviewed:concurrency: group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}${{ github.event.action == 'labeled' && format('-label-{0}', github.run_id) || '' }} cancel-in-progress: true
Nothing else changes. The action detects the label event itself: if the added label is the rereview_label (default ai-review) it forces a fresh review and then removes the label so adding it again re-triggers; any other label is ignored. There's no second workflow, no command parsing, and no checkout/authorization dance — labels are inherently maintainer-only (only users with write/triage permission can apply them), and the trigger rides pull_request, so there's no privileged-checkout exposure.
Rename the trigger label with the rereview_label input if ai-review collides with an existing label. This repository's own ai-pr-review.yaml uses exactly this wiring.
For non-interactive callers, force_review: "true" bypasses the guard directly — useful from a workflow_dispatch input or a repository_dispatch payload (both already require a write-scoped token to fire).
Recovering a "stuck" PR. With publish_mode: review_verdict, an incremental review can only approve on top of a trusted clean full baseline (verdict safety). So once any full review records issues, later pushes — which are incremental — cannot approve until a clean full review re-establishes the baseline. A forced re-review (the ai-review label, workflow_dispatch, or repository_dispatch) now handles this automatically: when the last managed review was not clean, the forced re-review runs at full scope so it re-examines the whole PR and can clear the baseline. (A forced re-review on an already-clean baseline stays incremental — there's nothing to reset.) So a PR wedged in Request changes after its findings are fixed recovers with a single re-label.
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1
ai_model: qwen3-32b
evidence_providers_file: .github/pr-review-providers.json
evidence_provider_timeout_sec: "30"
evidence_provider_max_output_bytes: "20000"
evidence_blocker_enforcement: "true"Example provider config (.github/pr-review-providers.json):
{
"providers": [
{
"id": "version-compat",
"command": ["python3", "scripts/check_version_compat.py"],
"timeout_sec": 45,
"max_output_bytes": 15000
}
]
}Provider commands can print plain text, or JSON with fields such as severity and findings. If evidence_blocker_enforcement is true, any provider output with blocker severity forces a request_changes verdict.
Evidence providers execute in the context of the checked-out pull request code. The command runs from the repository root with full access to the PR's working tree, environment variables, and installed tools. This means:
- Provider scripts reference files relative to the PR branch being reviewed, not the base branch.
- Commands have access to all repository files staged or committed in the PR.
- Environment variables set by the GitHub Actions runner (such as
GITHUB_TOKEN,HOME, etc.) are available to provider commands.
Argv arrays are strongly recommended over shell strings. When command is an array like ["python3", "scripts/check.py"], the action invokes the process directly via subprocess.run with no shell interpretation. When command is a string, it runs through bash -lc, which introduces shell injection risks if any part of the command or environment is influenced by untrusted PR content.
Evidence providers are disabled by default on cross-repository pull requests (evidence_enable_for_forks=false). This prevents forked PRs from executing arbitrary scripts defined in the destination repository's config. Set evidence_enable_for_forks: "true" only when you trust fork contributors or run reviews in an isolated environment.
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1
ai_model: qwen3-32b
tool_mode: plan_execute_once
tool_max_requests: "4"
tool_planning_timeout_sec: "30"
tool_planning_max_context_bytes: "50000"
tool_planning_max_tokens: "400"
tool_max_response_bytes: "12000"
tool_allowed_gh_api_repos: "siderolabs/kubelet,siderolabs/talos"
tool_request_timeout_sec: "20"
tool_failure_enforcement: "true"
tool_min_successful_requests: "1"In plan_execute_once mode, the model first plans up to tool_max_requests read-only evidence calls, then the action executes those calls and appends the results to the final review corpus.
In plan_execute_loop mode the planning iterates: after each round's tools run, the planner sees the results (clearly fenced as untrusted data) and may request follow-ups — "the diff touches auth/session.go → read it → it calls validateToken → grep for other callers". The loop stops when the planner replies {"requests": []} (or DONE), the tool_max_requests total budget is spent, tool_max_rounds is reached, or a later-round response fails to parse (the review proceeds with the evidence gathered so far — a planning hiccup never fails the review). Requests identical to ones already executed are deduplicated so weak models cannot burn the budget re-fetching the same evidence. Each round is an extra planning model call, so latency grows with depth; the executor, allowlists, and size caps are identical to single-round mode.
In native_loop mode the reviewing model uses its provider's native tool-calling API (OpenAI tool_calls / Anthropic tool_use) instead of a JSON-in-prose planner. The tool schemas are sent with the request and the model holds the conversation: it issues a call, sees the result appended as a real tool-result turn, and decides the next call from what came back — so a chain like "read the machineconfig → extract the platform version → fetch that version's published compatibility matrix" is expressed natively, with each hop conditioned on the previous one's content rather than guessed up front. The loop stops when the model replies with no further tool calls, the tool_max_requests total budget is spent, the round cap is hit (native_loop allows up to 2 × tool_max_rounds, capped at 8, since one model turn is one round), or tool_loop_wall_clock_sec elapses. Malformed arguments and duplicate calls are answered with a corrective tool-result the model can react to (duplicates don't cost budget); a transport error mid-loop keeps the evidence already gathered. When the conversation outgrows its context budget the oldest tool results are compacted before the next turn — blunt-truncated by default, or (with tool_loop_summarize) folded into a model-generated evidence digest that keeps the salient facts in fewer tokens while the newest results stay verbatim. A model that never emits a tool call degrades automatically to plan_execute_loop, so native_loop is safe to enable on a model whose tool-calling support is uncertain. Non-streaming requests only. The executor, allowlists, size caps, and tool catalog are identical across all three modes. Supported tools are:
gh_apiwith a repo-local path likerepos/owner/repo/pulls/123/filesread_filefor files inside the checked-out repositoryweb_fetchfor allowlisted hosts fromallowed_source_hostsgit_grepfor local repository content searchrun_commandfor a fixed catalog of named read-only commands
run_command never executes model-supplied shell text. The planner may only pick a command name from the built-in catalog, and the action runs the corresponding fixed argv (no shell involved):
| Command name | Executes |
|---|---|
git_status_short |
git status --short |
git_diff_stat |
git diff --stat HEAD |
git_diff_name_only |
git diff --name-only HEAD |
Any other command name is rejected with an error listing the catalog. Output is secret-masked and truncated to tool_max_response_bytes like every other tool result.
By default, tool harness execution is skipped on cross-repository PRs unless tool_enable_for_forks is set to true.
If the destination repo has a CLAUDE.md, claude.md, AGENTS.md, or .github/ai-review-rules.md, the action can use that as review policy context.
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.openai.com/v1
ai_model: gpt-4.1
ai_api_key: ${{ secrets.OPENAI_API_KEY }}
standards_file: ""You can also pin a specific rules file:
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.openai.com/v1
ai_model: gpt-4.1
ai_api_key: ${{ secrets.OPENAI_API_KEY }}
standards_file: .github/review-rules.mdIf PRs are driven by detailed GitHub issues, include closing references such as Fixes #40 or Closes owner/repo#12 in the PR body. The action will fetch those issue bodies and include them in the review corpus so the model can compare the implementation against issue guidance and acceptance criteria.
If a repo wants more than policy context and needs to fully control the reviewer behavior, it can provide a prompt file:
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.openai.com/v1
ai_model: gpt-4.1
ai_api_key: ${{ secrets.OPENAI_API_KEY }}
system_prompt_file: .github/pr-review-prompt.mdThe action supports three publish modes via the publish_mode input:
| Mode | Behavior | Branch protection impact |
|---|---|---|
💬 comment |
Posts a sticky PR comment with <!-- ai-pr-reviewer --> markers. The default mode. |
🟢 None — comments are advisory only |
📝 review_comment |
Submits a non-blocking native PR review comment via gh pr review --comment. |
🟢 None — review comments don't affect status checks |
⚖️ review_verdict |
Submits a native PR review verdict (approve or request_changes) via gh pr review. Affects branch protection and status checks. |
🔴 Yes — counts as a real review |
Each publish mode requires different GitHub token permissions in your workflow:
| Publish mode | Required permissions | Notes |
|---|---|---|
comment |
contents: read, pull-requests: write |
The action posts a managed comment using the existing sticky-comment behavior. pull-requests: write is needed for the token to create/edit PR comments. |
review_comment |
contents: read, pull-requests: write |
Submits non-blocking native review comments via gh pr review --comment. The token must have pull-requests: write. |
review_verdict |
contents: read, pull-requests: write |
Submits native approve or request-changes verdicts. Requires pull-requests: write and may additionally require the Allow GitHub Actions to create and approve pull requests setting (see below). |
All modes require contents: read for the action to access repository files during review.
When publish_mode=review_verdict is set, the action submits a native GitHub PR review (approve or request_changes) instead of posting a comment. This integrates with branch protection rules and status checks.
Approval guardrails:
allow_approvedefaults tofalse. The model's approve verdict will be blocked unless this input is explicitly set totrue.approve_forksdefaults tofalse. Even whenallow_approve=true, native approvals are blocked for cross-repository (fork) PRs unless this is also set totrue.- If evidence provider enforcement or tool harness failure enforcement modified the verdict to
request_changes, approval is automatically blocked. - The review body must be non-empty for an approval to be submitted.
Warning
Native approvals can affect branch protection rules and automerge pipelines. Enable allow_approve only when you understand the implications for your repository's merge policy.
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.openai.com/v1
ai_model: gpt-4.1
ai_api_key: ${{ secrets.OPENAI_API_KEY }}
publish_mode: review_verdict
allow_approve: "true"name: AI PR Review (native verdicts)
on:
pull_request:
types: [opened, reopened, synchronize, ready_for_review]
permissions:
contents: read
pull-requests: write
jobs:
review:
if: ${{ !github.event.pull_request.draft }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.openai.com/v1
ai_model: gpt-4.1
ai_api_key: ${{ secrets.OPENAI_API_KEY }}
publish_review_comment: "true"
publish_mode: review_verdict
allow_approve: "true"Note
This workflow requires the Allow GitHub Actions to create and approve pull requests setting to be enabled for your repository or organization. Without it, native approvals will fail with a 403 error even though pull-requests: write is granted.
This configuration allows the AI to submit native approvals when its verdict is approve. Fork PRs are still blocked from approval unless approve_forks is also set to "true".
Even when your workflow grants pull-requests: write, native PR review verdicts (approve/request-changes) may fail silently or error out because of GitHub Actions repository settings:
-
Allow GitHub Actions to create and approve pull requests — This organization or repository setting must be enabled for Actions to submit native approvals. Without it, the
gh pr review --approvecommand will fail with a 403 error from the GitHub API. You can find this setting under:- Repository: Settings → Actions → General → "Allow GitHub Actions to create and approve pull requests"
- Organization: Settings → Actions → Organization permissions → "Allow GitHub Actions to create and approve pull requests"
-
Branch protection rules — If branch protection requires a review from a specific user or team, the AI's approval may not satisfy that requirement. The PR will still show
request_changesuntil the required reviewer approves. -
Fork PRs without
approve_forks: true— Approvals from fork PRs are blocked by default unlessapprove_forksis explicitly set to"true".
When approval is blocked, the action always submits a request_changes verdict with an explanation in the review body rather than failing silently.
When publish_mode=review_comment is set, the action submits a non-blocking native PR review comment via gh pr review --comment. This gives you a GitHub-native review entry in the PR's conversation thread without affecting branch protection or status checks.
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: https://api.openai.com/v1
ai_model: gpt-4.1
ai_api_key: ${{ secrets.OPENAI_API_KEY }}
publish_mode: review_commentWhen publish_mode is set to review_comment or review_verdict, the action creates a new submitted native PR review on every run. Without cleanup, old AI reviews pile up in the PR timeline and make the conversation noisy.
By default, the action automatically cleans up previous managed native reviews for review_comment and review_verdict modes. The cleanup_previous_native_reviews input controls this behavior:
auto(default): enables cleanup forreview_commentandreview_verdictmodes, disabled forcommentmode (which already edits one sticky comment in place).true: always enable cleanup regardless of publish mode.false: disable cleanup entirely.
The cleanup process:
- Identifies previous managed AI reviews from the current authenticated actor that carry the
<!-- ai-pr-reviewer -->marker. - Dismisses old approval or request-changes verdict reviews when permissions allow, so stale verdicts stop counting toward branch protection.
- Updates the body of old managed reviews to a compact "Outdated: superseded by a newer automated review." stub.
Old reviews may still exist in the PR timeline, but they are visually minimized and explicitly marked as outdated. Human reviews and unmarked bot reviews are never modified.
Cleanup and dismissal failures produce warnings but do not prevent posting the new review. If you need to grant additional permissions for dismissal:
permissions:
contents: read
pull-requests: writeThe pull-requests: write permission is required for both posting reviews and dismissing them. On protected branches or stricter repositories, GitHub may require repository admin permissions or explicit review-dismissal settings to be enabled for the app/token.
The model may return an optional findings array alongside the verdict — concrete, located issues:
{
"verdict": "request_changes",
"review_markdown": "...",
"findings": [
{"severity": "blocker", "category": "security", "file": "app/serve.py", "line": 42,
"message": "Resolved path is not checked against the data root before opening."}
]
}Findings are normalized (severities mapped to blocker/major/minor/info, malformed entries dropped) and exposed as the findings output. Absence is fine — weaker local models that only produce verdict/review_markdown keep exactly the previous behavior.
With verdict_policy: findings_severity_gated, the verdict is derived deterministically from the findings instead of trusting the model's headline call: request_changes iff any blocker-severity finding exists, otherwise approve. When no findings were produced, the model's verdict is used (the verdict_source output tells you which path applied). Enforcement settings (evidence_blocker_enforcement, tool-failure enforcement) still run afterwards and can force request_changes.
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1
ai_model: qwen3-32b
ai_response_format: json_schema # schema includes the findings array
verdict_policy: findings_severity_gatedWith inline_findings: "true" and a native publish mode, findings that carry a file + line anchoring to the PR diff are attached as line-anchored review comments:
publish_mode: review_verdict— the approve/request_changes review itself carries the inline comments (comments[]on the reviews API). If GitHub rejects the payload (e.g. an anchor raced a new push), the action falls back to the plain review, so publishing never fails because of inline findings.publish_mode: review_comment— the sticky summary comment is published as usual, plus a separate nativeCOMMENTreview carrying the inline comments. That review includes the managed marker, so the next run's cleanup marks it superseded.publish_mode: comment— ignored.
Anchors are validated against the diff before submission (GitHub only accepts comments on lines present in the diff); findings without a valid anchor stay in the review body. Comment bodies are secret-masked and @-mention-neutralized like all published output, and capped by inline_findings_max (default 20).
Thread lifecycle on re-review. Each inline comment carries a hidden content fingerprint of its finding. On a later incremental review, the action matches existing review threads by that fingerprint and keeps them alive instead of stacking duplicates:
- A carried finding the model answered with
resolution: resolved(the same fail-closed rule that drives the verdict) gets its thread resolved via the GraphQLresolveReviewThreadmutation. - A carried finding that survives (
still_open,not_verifiable_from_delta, or unanswered) gets a short reply on its existing thread ("Still open after this push…") instead of a fresh duplicate anchored comment. Replies are stamped with the head SHA, so a re-run on the same push never posts the same follow-up twice, and are capped byinline_findings_max. - A still-open carried finding whose thread no longer exists falls back to a fresh anchored comment as before.
Best-effort throughout: API failures (e.g. read-only tokens on fork PRs) warn and never fail the publish.
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1
ai_model: qwen3-32b
publish_mode: review_verdict
verdict_policy: findings_severity_gated
inline_findings: "true"With review_routing_mode: auto, the deterministic classification decides which model reviews the PR — boring PRs go to a fast/local model, scary ones go straight to a smarter model:
- uses: misospace/pr-reviewer-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
ai_base_url: http://llama-server.internal:8080/v1 # fast (default = primary)
ai_model: qwen3-32b
ai_smart_base_url: https://api.anthropic.com/v1
ai_smart_api_format: anthropic
ai_smart_model: claude-sonnet-4-6
ai_smart_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
review_routing_mode: autoRouting rules:
- A PR whose
pr_kindor anyrisk_flagsentry matchesescalate_on_risk_flagsroutes to the smart model; everything else routes to the fast model. - The fast config defaults to the primary
ai_*inputs; the smart config defaults to theai_fallback_*inputs. If a risky PR is detected but no smart/fallback model is configured, the review stays on the fast model (logged, never fails). off(the default) preserves the existing primary/fallback behavior exactly (review_routeoutput reportslegacy).- The retry and failure-fallback machinery is unchanged — routing only picks which model it talks to.
- The chosen route appears in the
review_routeoutput, the step summary, and the managed metadata marker; routing config is part of the precheck fingerprint, so changing it forces a fresh review.
In auto mode, a fast review can also be escalated after the fact: the action evaluates the raw fast output and re-runs the review on the smart model when any enabled trigger fires:
escalate_on_fast_request_changes— the fast model wants changes; let the smart model confirm or overturn before a human is summoned.escalate_on_incomplete_required_checks— the fast review never discussed one of the classifier's required checks.escalate_on_fast_low_confidence— the review is very short or carries a populated "Unknowns or Needs Verification" section. The length floor is diff-aware: a diff of ≤10 changed lines only needs an 80-char review (a correct review of a one-line renovate bump is short — escalating it wastes a smart-model run on exactly the PRs least worth one), while larger diffs keep the 200-char floor.escalate_on_tool_or_evidence_blockers— evidence providers reported a blocker, or tool requests executed and every one failed.escalate_on_tool_planning_failure(default false) — the harness planning call failed before any tools ran. Off by default because a planning failure means the review proceeded with less evidence (the same situation astool_mode: off), not that the PR is risky; the failure is still recorded in the step summary.escalate_on_dirty_baseline— this is an incremental review and the previous review found issues; judging whether the delta resolves them is run on the smart model.
Only the final review is published. The primary result is kept on the runner as ai-output.primary.json for debugging; if the smart model fails, the primary review is published instead (never a failed run because of escalation). review_route reports escalated and escalation_reason lists the trigger names; both also land in the step summary and the managed metadata marker, and the published review's _Analysis engine:_ line carries the same story in human-readable form (— routed smart (risk match: …) vs — escalated (…) vs — fallback (primary failed)), so you can tell a deliberate smart review from an escalation or an availability fallback at a glance. Worst case is two model calls per review — the unchanged-diff skip and incremental scope keep that bounded.
When review_scope: auto (the default), the action performs a full PR review on the first run. On subsequent pushes to the same PR, it attempts an incremental review that only analyzes the delta since the last managed review. This can significantly reduce token usage for large PRs with multiple commits.
Key behaviors:
- First run: Full PR review (same as before).
- Later pushes: Incremental review of only new changes.
- Fallback: Automatically falls back to full review when incremental comparison is unsafe (force-push, rebase, base branch change, missing metadata, etc.).
- Verdict safety: With
publish_mode: review_verdict, approvals based on incremental reviews require a trusted clean full-review baseline. If the baseline is dirty, a forced re-review (theai-reviewlabel) escalates to full scope to re-establish it — see Forcing a re-review. - Carried-forward findings (cumulative verdict): when a review requests changes, its findings are persisted in the managed metadata marker (
open_findings). The next incremental review receives them as a high-priority corpus section and must answer each with aresolution:resolved,still_open, ornot_verifiable_from_delta. Findings the model does not convincingly resolve survive into the new review'sfindingsoutput, and a surviving blocker forcesrequest_changes(verdict_source: carry_forward) — fixing one of three blockers cannot rubber-stamp the other two. The published review lists what this push resolved and what is still open, so the latest review always reflects total PR state (useful since superseded reviews are dismissed and hidden). - Cross-run evidence memory (
tool_mode: native_loop): a native_loop review gathers evidence with read-only tools (reading configs, fetching support matrices). A compact digest of that evidence is persisted in the same metadata marker (evidence_digest, tagged with the head SHA it was gathered at). The next incremental review receives it as a corpus section and reuses it — re-verifying only what the delta touched — instead of re-running the same reads and fetches. On by default (tool_evidence_memory); the framing is fail-safe (prior evidence is context, not ground truth, and may be stale). - Header: incremental reviews are titled
# AI Automated Review (incremental).
You can force specific behavior:
# Always do full reviews (original behavior)
- uses: misospace/pr-reviewer-action@vX.Y.Z
with:
review_scope: full
# Always attempt incremental (falls back safely)
- uses: misospace/pr-reviewer-action@vX.Y.Z
with:
review_scope: incrementalThe action is designed local-model-first (ollama, llama.cpp, vLLM, or anything behind an OpenAI/Anthropic-compatible proxy like LiteLLM). The settings below cover the failure modes that come up most often with self-hosted endpoints.
ai_base_url must point at the OpenAI-compatible base (the action appends /chat/completions, or /messages for ai_api_format: anthropic):
# ollama on the same runner/host (note the /v1 — ollama's native API is not OpenAI-compatible)
ai_base_url: http://localhost:11434/v1
# ollama on another host on your network
ai_base_url: http://192.168.1.50:11434/v1
# llama.cpp llama-server
ai_base_url: http://llama-server.internal:8080/v1
# vLLM
ai_base_url: http://vllm.internal:8000/v1
# LiteLLM proxy (set ai_api_format to match the route's format; openai is typical)
ai_base_url: http://litellm.internal:4000/v1Self-hosted runners must be able to reach the endpoint — GitHub-hosted runners cannot reach localhost or LAN addresses on your network. Leave ai_api_key unset if the endpoint is unauthenticated; nothing is sent in that case.
The named context_limit_mode budgets assume large cloud-model windows (normal is roughly 55–70k tokens of corpus). Local models commonly run 8k–32k windows, and an overflowing prompt fails in confusing ways: the server returns context length exceeded (visible in the action log thanks to error-body preservation), or worse, silently truncates the prompt and the model returns malformed or irrelevant JSON.
Set model_context_tokens to the window you actually serve the model with (e.g. ollama's num_ctx, llama.cpp's --ctx-size, vLLM's --max-model-len):
model_context_tokens: "16384" # derive corpus/diff/file budgets from the real window
ai_max_tokens: "2048" # reserved for the model's reply within that windowThe action reserves ai_max_tokens plus prompt headroom and converts the rest to byte budgets conservatively (~3 bytes/token). Check the run's step summary: it shows the active budget and whether the diff/corpus were truncated.
Small models often wrap their JSON in prose or markdown fences. The parser tolerates a lot, but structured output is more reliable when the server supports it:
ai_response_format: json_object # broad support: ollama, vLLM, llama.cpp server, LiteLLM
# or, where supported (enforces the exact verdict/review_markdown schema):
ai_response_format: json_schema # vLLM guided decoding, llama.cpp grammars, newer serversIf the endpoint rejects the request after enabling this (HTTP 400 mentioning response_format), the server does not support that mode — drop back to json_object or off. Ignored entirely for ai_api_format: anthropic.
- Slow prompt eval (big corpus, CPU offload): raise
ai_request_timeout_sec(default 300). The tool-planning call is non-streaming and has its owntool_planning_timeout_sec— raise it too if planning times out. - Proxies with idle-read timeouts (e.g. Cloudflare's ~100s edge timer): keep
ai_stream: "true"(the default) so bytes flow before the timer fires. - Models that reject sampling params: set
ai_temperature: ""to omit the field entirely; setai_tokens_param: max_completion_tokensfor newer OpenAI reasoning models. - Endpoint not always up (homelab): configure
ai_fallback_base_url/ai_fallback_model(e.g. a small cloud model) or seton_model_failure: noticeso the PR gets a visible explanation instead of a bare red check. - Don't burn 10 minutes on a dead endpoint: the defaults (
ai_primary_retries: "8", 15s delay with backoff, 300s request timeout) are tuned for flaky-but-alive endpoints and can spend ~10 minutes before giving up. If your endpoint is either up or down (typical homelab), use a low-retry profile:
ai_primary_retries: "2"
ai_primary_retry_delay_sec: "5"
ai_connect_timeout_sec: "10"
on_model_failure: notice # visible explanation instead of a long red check| Symptom | Likely cause | Fix |
|---|---|---|
curl transport error (exit 7) in logs |
endpoint unreachable from the runner | check ai_base_url, runner network, server is listening |
| HTTP 404 from the endpoint | base URL missing /v1 (ollama) or wrong ai_api_format |
use the OpenAI-compatible base path |
context length exceeded in the logged error body |
corpus exceeds the served window | set model_context_tokens (and/or lower ai_max_tokens) |
| Verdict parse failures, retries, then fallback | model wraps JSON in prose | set ai_response_format: json_object |
| Reviews time out behind a proxy | idle-read timer on non-streamed response | keep ai_stream: "true" |
HTTP 400 mentioning temperature |
model rejects non-default sampling | ai_temperature: "" |
-
Reserved comment markers: The managed PR comment uses HTML comment markers for internal metadata. These are reserved and must not appear in model-generated review markdown:
<!-- ai-pr-review-fingerprint:<value> -->— stable patch + config fingerprint used by the precheck to skip unchanged diffs.<!-- ai-pr-review-sha:<sha> -->— PR head SHA used to detect out-of-date reviews.
The action strips any matching markers from model output before publishing (see
scripts/strip_metadata_markers.py). The precheck parser reads only the first occurrence of each marker for defense in depth. -
ai_api_format=openaiposts to/chat/completionsand parseschoices[0].message.content. -
ai_api_format=anthropicposts to/messagesand parses onlycontent[]blocks wheretype == "text". -
The tool harness planner uses the primary
ai_api_format; fallback settings apply only to the final review call. -
system_prompttakes precedence oversystem_prompt_file. -
system_prompt_filetakes precedence over the bundled generic prompt. -
standards_fileis optional; if blank, the action checksstandards_file_candidatesin order and uses the first file found.AGENTS.mdis checked first by default, thenCLAUDE.md, making the action compatible with both Claude Code and non-Claude Code setups. -
By default, the action computes a stable patch fingerprint with
git patch-id --stableand skips the LLM call when that fingerprint matches the most recent managed review comment. This avoids token spend on rebases and other history-only changes. -
publish_review_commentusesgh pr comment --edit-last --create-if-none, so the comment is managed by the token identity used in the workflow. -
context_limit_modereduces the amount of PR data sent to the LLM. Useminimalfor models with very small context windows. This skips nothing but truncates more aggressively. -
evidence_providers_fileaccepts JSON only. It can be either an object withproviders: []or a top-level provider array. -
Provider
commandaccepts either a shell string (executed viabash -lc) or an argument array (invoked directly). Argv arrays are strongly recommended to avoid shell injection risks. Each provider can overridetimeout_secandmax_output_bytes. -
Provider output is appended to the review corpus under an
Evidence Providerssection. -
tool_mode=plan_execute_onceadds a single planning-and-execution tool round before final review synthesis;plan_execute_loopiterates planning (bounded bytool_max_roundsand the totaltool_max_requestsbudget) with results fed back as untrusted data. -
Tool harness output is appended to the review corpus under
Tool Harness Findings. -
Tool harness planning treats corpus content as untrusted data and uses strict tool/path/host allowlists with output redaction. The
run_commandtool does not execute arbitrary shell text; it accepts only named read-only command definitions (git_status_short,git_diff_stat,git_diff_name_only) and runs them argv-only withoutbash -lc. -
Evidence providers and tool harness are both disabled by default on cross-repository PRs (
*_enable_for_forks=false). -
gh_apidefaults to current-repo scope only. Usetool_allowed_gh_api_reposto allow specific upstream repos, or*to allow any repository while keeping the path denylist and endpoint allowlist active. -
For local models, reduce
tool_planning_max_context_bytesandtool_planning_max_tokens, and increasetool_planning_timeout_secas needed. -
Set
tool_failure_enforcement=trueto fail closed when tool harness planning fails or when every tool request fails. -
Use
tool_min_successful_requests(for example1) to enforce a minimum successful tool-evidence threshold when the planner attempted tool requests. -
Model requests use
curl -qso user-level.curlrctimeouts do not unexpectedly cancel long-running local model calls.
This repo includes a local smoke test that exercises the action logic against a real GitHub pull request while using a mock OpenAI/Anthropic-compatible API server.
Run it with a specific PR:
PR_NUMBER=6757 tests/smoke_test.shOr let it pick the most recent open PR in misospace/pr-reviewer-action:
tests/smoke_test.shThe smoke test validates:
- GitHub PR data collection through
gh - review corpus assembly
- OpenAI-compatible
chat/completionsand Anthropic-compatiblemessagesresponse parsing - output parsing and action output generation
Copyable workflows are included in examples/:
The action is versioned via Git tags (e.g., v1.2.4). The examples in this README use @v1 as a shorthand; in production workflows, pin to a specific version tag or commit SHA for reproducible runs:
# Pin to a specific release tag (recommended)
- uses: misospace/pr-reviewer-action@v1.2.4
# Pin to a specific commit (most stable during development)
- uses: misospace/pr-reviewer-action@d1a7753252a7d9d1e999ae53824e5e43587c8130 # v1.2.4This repository's own self-review workflow (.github/workflows/ai-pr-review.yaml) pins the action to a specific commit SHA rather than @v1 or @main. This ensures the self-review process uses a known-good, tested version while new changes are developed on main. After a release is cut and tagged, the self-review workflow is updated to pin the new tag.
Releases are cut when features or fixes are ready. The v1.x.y scheme follows semver:
- Patch (
y): bug fixes and minor improvements - Minor (
x): new features, backward-compatible changes - Major (
v1→v2): breaking changes to inputs/outputs or behavior
To stay current, subscribe to GitHub Releases or enable Renovate to track the misospace/pr-reviewer-action dependency.
See SECURITY.md for the threat model, controls, and operational guidance.
Built for homelabs and production alike — if this action reviews your PRs well, consider starring the repo ⭐