I maintain Agent Threat Rules (ATR), an open detection standard for AI agent attacks (Apache 2.0, https://github.com/Agent-Threat-Rule/agent-threat-rules). I'm filing this as a proposal because the integration looks like a clean fit with how NeMo-Guardrails composes rails today, and I'd like to know whether you'd accept a PR before I open one.
What ATR is
96,096 skills wild-scanned; 751 confirmed malware skills found in production ecosystems.
Why NeMo-Guardrails specifically
You already model rails as composable Colang flows. ATR provides a curated, severity-tagged, MITRE ATLAS / OWASP Agentic-Top-10 / SAFE-MCP cross-walked catalog of detection patterns. Loading ATR rules as a Colang library would:
Cut the time from new threat disclosure to deployable rail. Example: Microsoft Semantic Kernel CVE-2026-26030 (lambda+eval RCE) had ATR rules merged within 4 days of MSRC disclosure (5/7 → 5/11), shipped as @agent-threat-rules v2.1.2 on npm. NeMo users on nemoguardrails[atr] would inherit those rails on next install.
Cover MCP-specific surfaces (tool poisoning, skill compromise, excessive autonomy) that don't have first-party rails today.
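Map cleanly to OWASP Agentic-Top-10 categories, which helps when users ask questions like "which rails cover LLM06 sensitive info disclosure?" (LLM06 is the identifier used in the 2023 edition of the OWASP Top 10 for LLM Applications.)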
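To illustrate the category lookup in the last point, here is a minimal Python sketch. The rule IDs, the field names (`id`, `severity`, `owasp`), and the category tags are hypothetical placeholders chosen for illustration. They are not taken from the actual ATR schema.

```python
from collections import defaultdict

# Hypothetical rule metadata. Real ATR rules are YAML files, and their
# field names may differ from the ones assumed here.
RULES = [
    {"id": "EXAMPLE-001", "severity": "high", "owasp": ["LLM06"]},
    {"id": "EXAMPLE-002", "severity": "critical", "owasp": ["LLM01", "LLM06"]},
    {"id": "EXAMPLE-003", "severity": "medium", "owasp": ["LLM01"]},
]


def index_by_category(rules):
    """Map each category tag to the sorted list of rule IDs that carry it."""
    index = defaultdict(list)
    for rule in rules:
        for category in rule.get("owasp", []):
            index[category].append(rule["id"])
    return {category: sorted(ids) for category, ids in index.items()}


def rails_covering(category, rules=RULES):
    """Answer 'which rails cover <category>?' from rule metadata alone."""
    return index_by_category(rules).get(category, [])
```

Calling `rails_covering("LLM06")` on the sample data returns the two rules tagged with that category. An unknown category returns an empty list rather than raising.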
Proposed integration shape
Option A (optional extra): pip install nemoguardrails[atr] pulls in our Python loader, which compiles each ATR YAML rule into a Colang define flow block. Rules can be enabled per category and severity.
Option B — example library: ship as examples/atr_rails/ reference with a tutorial. Lower lift, also lower discoverability.
I lean toward A, but happy to start with B if that matches your roadmap better.
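As a rough picture of what the Option A loader might emit, here is a sketch that filters rules by category and severity and renders each one as Colang 1.0 text. Several things here are assumptions: the rule fields (`id`, `category`, `severity`), the severity scale, the flow naming scheme, and the custom action name `atr_match`. The shape of the generated flow is my guess from the docs and may not be the right primitive.

```python
import re

# Assumed severity scale; the real ATR schema may use different labels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Hypothetical rules, already parsed from YAML into dicts.
RULES = [
    {"id": "EXAMPLE-001", "category": "tool-poisoning", "severity": "critical"},
    {"id": "EXAMPLE-002", "category": "tool-poisoning", "severity": "low"},
    {"id": "EXAMPLE-003", "category": "skill-compromise", "severity": "high"},
]


def select_rules(rules, categories=None, min_severity="low"):
    """Keep rules at or above min_severity, optionally limited to some categories."""
    floor = SEVERITY_ORDER[min_severity]
    return [
        rule for rule in rules
        if SEVERITY_ORDER[rule["severity"]] >= floor
        and (categories is None or rule["category"] in categories)
    ]


def flow_name(rule_id):
    """Turn a rule ID into a lower-case, space-separated flow name."""
    return "atr " + re.sub(r"[^a-z0-9]+", " ", rule_id.lower()).strip()


def to_colang(rule):
    """Render one rule as a Colang 1.0 flow calling a hypothetical atr_match action."""
    rule_id = rule["id"]
    return "\n".join([
        f"define flow {flow_name(rule_id)}",
        f'  $hit = execute atr_match(rule_id="{rule_id}")',
        "  if $hit",
        "    bot refuse to respond",
        "    stop",
    ])


def compile_library(rules, categories=None, min_severity="low"):
    """Render every selected rule and join the flows into one Colang source string."""
    selected = select_rules(rules, categories, min_severity)
    return "\n\n".join(to_colang(rule) for rule in selected)
```

For example, `compile_library(RULES, categories={"tool-poisoning"}, min_severity="high")` renders only the flow for `EXAMPLE-001`. Generating plain text keeps the loader independent of NeMo-Guardrails internals, at the cost of not validating the output against the Colang parser.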
What I'd contribute
Loader (Python, MIT) that maps condition / agent_source / response from the ATR schema to Colang flows
10 example flows (one per category) shipped in the repo
CI test against the existing benchmark in nvidia/aegis-ai-content-safety-test, so that a NeMo PR can show the false-positive (FP) rate stays under your thresholds
Maintenance: ATR ships patch releases when wild-scans find new patterns (last 30 days: 26 → 338 rules). Cisco has pinned the rules in its own ATR mirror, and I'd do the same for NeMo so that the version pin is the user's choice.
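The CI check in the list above could look roughly like the following sketch. It computes a false-positive rate over benign samples and compares it to a budget. The regex pattern, the benign samples, and the 5% budget are all made-up stand-ins. The real test would load the benchmark data and use whatever thresholds the NeMo-Guardrails team specifies.

```python
import re

# Hypothetical detection pattern; a stand-in for compiled ATR rules.
PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]

# Hypothetical benign samples; a stand-in for the benchmark's safe prompts.
BENIGN = [
    "What is the weather in Taipei tomorrow?",
    "Summarize this article about gardening.",
    "Please ignore previous instructions in the manual and use the new guide.",
    "Translate 'good morning' into French.",
]


def is_flagged(text, patterns=PATTERNS):
    """Return True if any detection pattern matches the text."""
    return any(pattern.search(text) for pattern in patterns)


def false_positive_rate(benign_samples, patterns=PATTERNS):
    """Fraction of benign samples that the patterns flag."""
    if not benign_samples:
        raise ValueError("need at least one benign sample")
    flagged = sum(is_flagged(sample, patterns) for sample in benign_samples)
    return flagged / len(benign_samples)


def check_fp_budget(benign_samples, max_fp_rate, patterns=PATTERNS):
    """Return (within_budget, measured_rate) so CI can fail with a useful message."""
    rate = false_positive_rate(benign_samples, patterns)
    return rate <= max_fp_rate, rate
```

On the sample data, the third sentence is benign but matches the pattern, so `check_fp_budget(BENIGN, 0.05)` reports the budget as exceeded. That is the kind of regression the CI job would be meant to catch before a rule update ships.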
What I need from you
Yes / no on the integration angle
Pointer to the right Colang flow primitive if option A's loader output should look different from what I'd guess from the docs
Not asking for prioritization or maintainer time beyond review. If this isn't a fit, "not now" is a fine answer — I'll close.
Proposal — Load Agent Threat Rules (ATR) detection patterns as Colang rails
Hi NeMo-Guardrails team,
Thanks for the time. Will hold off on opening a PR until I hear back.
— Adam Lin (linkedin/eeee2345)