Proposal: Load Agent Threat Rules (ATR) detection patterns as Colang rails #1872

@eeee2345

Proposal — Load Agent Threat Rules (ATR) detection patterns as Colang rails

Hi NeMo-Guardrails team,

I maintain Agent Threat Rules (ATR), an open detection standard for AI agent attacks (Apache 2.0, https://github.com/Agent-Threat-Rule/agent-threat-rules). Filing this as a proposal because the integration is a clean fit with how NeMo-Guardrails composes rails today and I want to know if you'd accept the PR before I open it.

What ATR is

Why NeMo-Guardrails specifically

You already model rails as composable Colang flows. ATR provides a curated, severity-tagged, MITRE ATLAS / OWASP Agentic-Top-10 / SAFE-MCP cross-walked catalog of detection patterns. Loading ATR rules as a Colang library would:

  1. Cut the time from new threat disclosure to deployable rail. Example: Microsoft Semantic Kernel CVE-2026-26030 (lambda+eval RCE) had ATR rules merged within 4 days of MSRC disclosure (disclosed 5/7, merged 5/11) and shipped as @agent-threat-rules v2.1.2 on npm. NeMo users on nemoguardrails[atr] would inherit those rails on their next install.
  2. Cover MCP-specific surfaces (tool poisoning, skill compromise, excessive autonomy) that don't have first-party rails today.
  3. Map cleanly to OWASP Agentic-Top-10 categories — useful when users ask "which rails cover LLM06 sensitive info disclosure?"

Proposed integration shape

Option A (optional extra): pip install nemoguardrails[atr] pulls in our Python loader, which compiles each ATR YAML rule into a Colang define flow block. Rules can be enabled per category and per severity.

Option B (example library): ship as an examples/atr_rails/ reference with a tutorial. Lower lift, but also lower discoverability.

I lean toward A, but happy to start with B if that matches your roadmap better.
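
To make Option A concrete, here is a rough sketch of what the loader's output could look like. It is illustrative only: the rule fields (`id`, `severity`, `category`), the severity scale, and the `atr_match` action name are placeholders invented for this example rather than the published ATR schema, and the generated flow shape (Colang 1.0 syntax, with matching delegated to a custom action) is exactly the part I'd want feedback on.

```python
import re

# Hypothetical severity scale, lowest to highest (an assumption for this sketch).
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def flow_name(rule_id: str) -> str:
    """Turn a rule id such as 'ATR-2026-001' into a space-separated flow name."""
    return re.sub(r"[^a-z0-9]+", " ", rule_id.lower()).strip()


def rule_to_colang(rule: dict) -> str:
    """Render one rule as a Colang 1.0 flow.

    Matching is delegated to a custom action, hypothetically named
    `atr_match`, which would have to be registered with the rails app.
    """
    rid = rule["id"]
    return "\n".join([
        f"define flow {flow_name(rid)}",
        f'  $hit = execute atr_match(rule_id="{rid}")',
        "  if $hit",
        "    bot refuse to respond",
        "    stop",
    ])


def compile_rules(rules, min_severity="medium", categories=None) -> str:
    """Keep rules at or above min_severity (and in categories, if given)."""
    floor = SEVERITY_ORDER.index(min_severity)
    selected = [
        r for r in rules
        if SEVERITY_ORDER.index(r["severity"]) >= floor
        and (categories is None or r["category"] in categories)
    ]
    return "\n\n".join(rule_to_colang(r) for r in selected)


if __name__ == "__main__":
    # Made-up rules, for illustration only.
    example_rules = [
        {"id": "ATR-2026-001", "severity": "high", "category": "tool-poisoning"},
        {"id": "ATR-2026-002", "severity": "low", "category": "prompt-injection"},
    ]
    print(compile_rules(example_rules, min_severity="medium"))
```

Delegating to an action keeps the generated Colang small and leaves pattern evaluation in Python; if you'd rather see patterns expressed directly in Colang, the renderer is the only piece that would change.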

What I'd contribute

  • Loader (Python, MIT) that maps condition / agent_source / response from the ATR schema to Colang flows
  • 10 example flows (one per category) shipped in the repo
  • CI test against the existing benchmark in nvidia/aegis-ai-content-safety-test so a NeMo PR can prove FP rate stays under their thresholds
  • Maintenance: ATR ships patch releases when wild scans find new patterns (last 30 days: 26 → 338 rules). Cisco has pinned the rules in its own ATR mirror, and I'd set up the same for NeMo so that the version pin is the user's choice.
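
The CI check in the list above could be as simple as the following sketch. It assumes rule conditions can be evaluated as regular expressions, and the pattern, the benign prompts, and the 0.3 threshold are all placeholders; the real test would read benign samples from the benchmark and use whatever threshold you consider acceptable.

```python
import re


def false_positive_rate(patterns, benign_texts):
    """Fraction of benign texts flagged by at least one pattern."""
    if not benign_texts:
        raise ValueError("benign_texts must not be empty")
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    flagged = sum(
        1 for text in benign_texts
        if any(c.search(text) for c in compiled)
    )
    return flagged / len(benign_texts)


if __name__ == "__main__":
    # Placeholder pattern and benign prompts, for illustration only.
    patterns = [r"ignore (all )?previous instructions"]
    benign = [
        "What is the weather today?",
        "Summarize this article.",
        "Please ignore previous instructions in the old manual and use the new one.",
        "Translate hello to French.",
    ]
    rate = false_positive_rate(patterns, benign)
    # Hypothetical threshold; CI would fail the PR if the rate exceeds it.
    assert rate <= 0.3, f"false-positive rate too high: {rate:.2%}"
```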

What I need from you

  • Yes / no on the integration angle
  • Pointer to the right Colang flow primitive if option A's loader output should look different from what I'd guess from the docs

Not asking for prioritization or maintainer time beyond review. If this isn't a fit, "not now" is a fine answer — I'll close.

Refs:

Thanks for the time. Will hold off on opening a PR until I hear back.

— Adam Lin (linkedin/eeee2345)
