Proposal: Load Agent Threat Rules (ATR) detection patterns as Colang rails #1872

## Proposal — Load Agent Threat Rules (ATR) detection patterns as Colang rails

Hi NeMo-Guardrails team,

I maintain Agent Threat Rules (ATR), an open detection standard for AI agent attacks (Apache 2.0, https://github.com/Agent-Threat-Rule/agent-threat-rules). I'm filing this as a proposal because the integration fits cleanly with how NeMo-Guardrails composes rails today, and I'd like to know whether you would accept a PR before I open one.

### What ATR is
- 338 YAML detection rules across 10 attack categories: prompt-injection, tool-poisoning, context-exfiltration, excessive-autonomy, privilege-escalation, agent-manipulation, data-poisoning, model-abuse, skill-compromise, model-security
- 97.1% recall on NVIDIA garak (independent benchmark)
- 100% recall / 97% precision / 0.20% FP on 498 real-world SKILL.md samples
- Already adopted (merged): Cisco AI Defense (skill-scanner #79), Microsoft (agent-governance-toolkit #908), OWASP Agentic Top 10 (precize repo #14). Submitted on 2026-05-10: MISP taxonomy + galaxy (#323 + #1207)
- 96,096 skills scanned in the wild, with 751 confirmed malicious skills found in production ecosystems
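For reviewers comparing these figures, here is a minimal sketch of how recall, precision, and FP rate are conventionally derived from a confusion matrix. Precision divides by everything that was flagged, while FP rate divides by all benign samples, so 97% precision and a 0.20% FP rate can both hold on a sample that is mostly benign. The `Confusion` class and the `within_budget` helper are hypothetical names for illustration, not part of ATR's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from running a rule set over labelled samples (hypothetical helper)."""
    tp: int  # malicious samples that were flagged
    fp: int  # benign samples that were flagged
    fn: int  # malicious samples that were missed
    tn: int  # benign samples that passed

    @property
    def precision(self) -> float:
        # Share of flagged samples that were actually malicious.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Share of malicious samples that were flagged.
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def fp_rate(self) -> float:
        # Share of benign samples that were wrongly flagged.
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0


def within_budget(c: Confusion, max_fp_rate: float) -> bool:
    """CI-style gate: pass only if the FP rate stays at or under the threshold."""
    return c.fp_rate <= max_fp_rate
```

The same `within_budget` check is the kind of gate a CI job could apply, with the threshold supplied by the NeMo maintainers rather than chosen by the rule author.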

### Why NeMo-Guardrails specifically
You already model rails as composable Colang flows. ATR provides a curated catalog of severity-tagged detection patterns, cross-walked to MITRE ATLAS, the OWASP Agentic Top 10, and SAFE-MCP. Loading ATR rules as a Colang library would:

1. Cut the time from a new threat disclosure to a deployable rail. Example: for Microsoft Semantic Kernel CVE-2026-26030 (lambda+eval RCE), ATR rules were merged within 4 days of the MSRC disclosure (5/7 → 5/11) and shipped as `agent-threat-rules` v2.1.2 on npm. NeMo users on `nemoguardrails[atr]` would inherit those rails on their next install.
2. Cover MCP-specific surfaces (tool poisoning, skill compromise, excessive autonomy) that don't have first-party rails today.
3. Map cleanly to OWASP Agentic-Top-10 categories — useful when users ask "which rails cover LLM06 sensitive info disclosure?"
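Answering "which rails cover category X?" amounts to an inverted index from framework tags to rules. A minimal sketch, assuming each rule carries a hypothetical `references` field keyed by framework name (the real ATR field name and tag format may differ, and the rule IDs below are placeholders):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[Tuple[str, str], List[str]]


def build_coverage_index(rules: List[dict]) -> Index:
    """Map (framework, tag) -> ids of the rules that declare that tag."""
    index: Index = defaultdict(list)
    for rule in rules:
        # `references` is a hypothetical field shaped like {framework: [tag, ...]}.
        for framework, tags in rule.get("references", {}).items():
            for tag in tags:
                index[(framework, tag)].append(rule["id"])
    return dict(index)


def rails_covering(index: Index, framework: str, tag: str) -> List[str]:
    """Sorted rule ids (and hence generated flows) claiming coverage for a tag."""
    return sorted(index.get((framework, tag), []))
```

Because each generated flow is derived from one rule, the rule IDs returned here map directly to the rails a user would enable.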

### Proposed integration shape

Option A — optional extra: `pip install nemoguardrails[atr]` pulls our Python loader that compiles each ATR YAML into a Colang `define flow` block. Configurable per category/severity.
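A minimal sketch of what Option A's compile step could look like. It assumes simplified rule dicts with `id`, `category`, and `severity` keys, already parsed from YAML (not the full ATR schema), plus a hypothetical `atr_match` custom action that does the actual pattern matching at runtime. Each selected rule becomes a Colang 1.0 input-rail flow that refuses and stops when the action reports a match. `bot refuse to respond` is the refusal message used in the NeMo-Guardrails self-check input examples, but the exact flow shape is a guess pending maintainer feedback.

```python
from typing import Iterable, List, Set

# Assumed severity scale; the real ATR levels may differ.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def select_rules(rules: Iterable[dict], categories: Set[str], min_severity: str) -> List[dict]:
    """Keep rules whose category is enabled and whose severity meets the floor."""
    floor = SEVERITY_ORDER[min_severity]
    return [
        r for r in rules
        if r["category"] in categories and SEVERITY_ORDER[r["severity"]] >= floor
    ]


def to_colang_flow(rule: dict) -> str:
    """Emit one Colang 1.0 input-rail flow that delegates matching to an action."""
    rid = rule["id"]
    # Colang 1.0 flow names are space-separated words; derive one from the rule id.
    name = "atr check " + rid.lower().replace("-", " ")
    return (
        f"define flow {name}\n"
        f'  $matched = execute atr_match(rule_id="{rid}")\n'  # hypothetical action
        "  if $matched\n"
        "    bot refuse to respond\n"
        "    stop\n"
    )


def compile_rules(rules: Iterable[dict], categories: Set[str], min_severity: str = "medium") -> str:
    """Compile the selected rules into one Colang source string."""
    return "\n".join(to_colang_flow(r) for r in select_rules(rules, categories, min_severity))
```

Emitting one flow per rule keeps each rail individually listable under `rails.input.flows` in `config.yml`; one aggregated flow per category would be a reasonable alternative if per-rule flows prove too granular.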

Option B — example library: ship as `examples/atr_rails/` reference with a tutorial. Lower lift, also lower discoverability.

I lean toward A, but happy to start with B if that matches your roadmap better.

### What I'd contribute
- Loader (Python, MIT) that maps `condition` / `agent_source` / `response` from the ATR schema to Colang flows
- 10 example flows (one per category) shipped in the repo
- CI test against the existing benchmark in `nvidia/aegis-ai-content-safety-test`, so a NeMo PR can show that the FP rate stays under your thresholds
- Maintenance: ATR ships patch releases when wild scans surface new patterns (the rule count went from 26 to 338 in the last 30 days). Cisco has pinned the rules in its own ATR mirror, and I'd do the same for NeMo so the version pin stays the user's choice.
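A minimal sketch of the runtime side of the loader mapping. A rule applies only when its `agent_source` matches where the text came from, its `condition` patterns are tried as case-insensitive regexes, and the first hit returns the rule's `response`. The nesting of these fields (`condition["patterns"]`, a plain-string `response`) and the `evaluate`/`first_match` names are assumptions for illustration; real ATR conditions may use other operators. In NeMo-Guardrails this logic would sit behind a custom action registered with `LLMRails.register_action`, which is not shown here.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Verdict:
    matched: bool
    rule_id: Optional[str] = None
    response: Optional[str] = None


def evaluate(rule: dict, text: str, source: str) -> Verdict:
    """Check one rule against text that arrived from `source`."""
    # Skip rules aimed at a different surface (e.g. tool output vs. user input).
    if rule["agent_source"] != source:
        return Verdict(matched=False)
    for pattern in rule["condition"]["patterns"]:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return Verdict(matched=True, rule_id=rule["id"], response=rule["response"])
    return Verdict(matched=False)


def first_match(rules: Iterable[dict], text: str, source: str) -> Verdict:
    """Return the first matching rule's verdict, or a non-match."""
    for rule in rules:
        verdict = evaluate(rule, text, source)
        if verdict.matched:
            return verdict
    return Verdict(matched=False)
```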

### What I need from you
- Yes / no on the integration angle
- A pointer to the right Colang flow primitive, if Option A's loader output should look different from what I'd infer from the docs (for example, Colang 1.0 `define flow` versus Colang 2.0 `flow` syntax)

Not asking for prioritization or maintainer time beyond review. If this isn't a fit, "not now" is a fine answer — I'll close.

Refs:
- ATR repo: https://github.com/Agent-Threat-Rule/agent-threat-rules
- npm: https://www.npmjs.com/package/agent-threat-rules
- Recent Microsoft CVE disclosure loop closed by ATR: AGT#1981 (Copilot SWE Agent triage 2026-05-11)
- Wild scan results: https://github.com/Agent-Threat-Rule/agent-threat-rules/tree/main/data/skill-benchmark

Thanks for the time. Will hold off on opening a PR until I hear back.

— Adam Lin (linkedin/eeee2345)

