feature: optional export of guardrail decisions for audit and compliance #1786

@mohdibrahimaiml

Did you check the docs?

  • I have read all the NeMo-Guardrails docs

Is your feature request related to a problem? Please describe.

NeMo Guardrails enforces conversational policies at runtime. However, guardrail decisions (allow, block, modify) are primarily observable through logs or tracing systems.

For audit and compliance workflows, teams need portable and verifiable records of these decisions that can be:

  • Shared across systems
  • Reviewed independently
  • Preserved in a tamper-evident format

Why Current Approaches Are Insufficient

Logs and traces are useful for debugging, but they are not sufficient for audit workflows because they are:

  • Not standardized
  • Not portable
  • Difficult to verify independently
  • Dependent on access to the original infrastructure for review

Real Use Case

A lending bot denies a loan, with a fairness guardrail applied to the decision. The customer requests proof that the decision was fair and auditable. The company can produce OpenTelemetry traces, which are technical and hard for non-engineers to read, but it cannot easily give a regulator or auditor a portable, independently verifiable artifact that proves:

"On this date, for this customer, this policy was evaluated, this decision was made, and no one modified it afterward."
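One way a downstream consumer could make the "no one modified it afterward" claim checkable is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below is illustrative only and is not part of this request; the helpers `append_record` and `verify_chain`, the record fields, and the policy name `fair-lending` are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> dict:
    """Append an entry whose hash covers its record and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"record": record, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Return False if any entry was altered in place or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        body = {"record": entry["record"], "prev_hash": entry["prev_hash"]}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_record(chain, {"policy": "fair-lending", "decision": "block"})
append_record(chain, {"policy": "stay-on-topic", "decision": "allow"})
print(verify_chain(chain))  # True

chain[0]["record"]["decision"] = "allow"  # tamper with the first record
print(verify_chain(chain))  # False
```

A chain like this detects edits to existing entries but not truncation of the tail or wholesale regeneration, so a real deployment would also sign or externally anchor the latest hash.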

Describe the solution you'd like

Introduce an optional evidence export layer that exposes guardrail decisions as structured data for downstream processing.

Key Principles

  • Does not change runtime behavior
  • Optional and non-invasive
  • Can be implemented as an extension point
  • Teams choose their own destination for evidence

Implementation Options

Option A: Post-Execution Hook

def on_guardrail_decision(
    decision: str,        # "allow" | "block" | "modify"
    rail_id: str,         # e.g. "content-safety", "topic-control"
    policy_name: str,     # e.g. "no-jailbreak", "stay-on-topic"
    reason: str,          # e.g. "matched rule: 'politics'"
    confidence: float,    # e.g. 0.95
    input_context: dict,  # user message, conversation state
    output_action: str,   # what happened (allow/block/rephrase)
) -> None:
    """Proposed hook, called after each guardrail decision."""

Option B: Export API Endpoint

GET /api/guardrails/decisions/{session_id}

Returns structured JSON with all decisions in a session.

Option C: Plugin/Middleware Interface

from abc import ABC, abstractmethod


class EvidenceExporter(ABC):
    @abstractmethod
    def export(self, decision: "GuardrailDecision") -> None:
        """
        Called for each guardrail decision.
        Implementers decide where to send the evidence.
        """
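To show how Option C might be used, here is a sketch of one concrete exporter that writes JSON lines to any text stream. The `JsonlExporter` class and the three-field `GuardrailDecision` shown here are hypothetical and chosen only for illustration.

```python
import io
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import TextIO


@dataclass(frozen=True)
class GuardrailDecision:
    # Hypothetical record type; fields chosen for illustration.
    rail_id: str
    decision: str
    reason: str


class EvidenceExporter(ABC):
    @abstractmethod
    def export(self, decision: GuardrailDecision) -> None: ...


class JsonlExporter(EvidenceExporter):
    """Writes one JSON object per line to any text stream."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def export(self, decision: GuardrailDecision) -> None:
        self._stream.write(json.dumps(asdict(decision), sort_keys=True) + "\n")


buffer = io.StringIO()  # a file, socket wrapper, or queue adapter in practice
exporter = JsonlExporter(buffer)
exporter.export(GuardrailDecision("content-safety", "block", "matched rule: 'politics'"))
print(buffer.getvalue(), end="")
```

Because the destination is injected as a stream, the same exporter can target a file, a compliance dashboard client, or a test buffer without changes to the guardrail runtime.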

Decision Record Structure

Each exported decision record should include:

  • Policy/rule applied — which guardrail evaluated this input
  • Decision outcome — allow, block, or modify
  • Timestamp and session identifier — when and in which conversation
  • Input/output context — relevant message content and conversation state
  • Reason and confidence score — why the decision was made, with confidence
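The fields above could be captured in a small immutable record type. The following is a sketch, not a proposed schema: the field names, the 0 to 1 confidence range, and the validation rules are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class GuardrailDecision:
    rail_id: str        # which guardrail evaluated the input
    policy_name: str    # policy or rule applied
    decision: str       # "allow" | "block" | "modify"
    session_id: str     # conversation identifier
    reason: str         # why the decision was made
    confidence: float   # assumed to lie in [0.0, 1.0]
    context: Dict[str, Any] = field(default_factory=dict)  # input/output context
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        if self.decision not in ("allow", "block", "modify"):
            raise ValueError(f"unknown decision: {self.decision!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


record = GuardrailDecision(
    rail_id="topic-control",
    policy_name="stay-on-topic",
    decision="block",
    session_id="sess-1",
    reason="matched rule: 'politics'",
    confidence=0.95,
    context={"user_message": "Who should I vote for?"},
)
print(record.to_json())
```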

Describe alternatives you've considered

Current Approaches

Teams currently use:

  • Application logs — useful for debugging, not suitable for audit
  • Tracing systems (e.g., OpenTelemetry) — provide observability, not portability
  • Custom audit pipelines — inconsistent, per-team implementation

Why These Are Insufficient

Approach               Portable?   Standardized?            Independently verifiable?
Logs                   No          No                       No
OpenTelemetry traces   No          ⚠️ (vendor-specific)     No
Custom pipelines       No          No                       No

All require additional infrastructure and access to the original system to review evidence.

Additional context

Broader Pattern in AI Infrastructure

This reflects a gap in AI systems:

  • Runtime enforcement (what NeMo does well) is well-supported
  • Portable audit artifacts are not

Rationale for Non-Invasive Design

This request is for a downstream extension point, not a core change:

  • Runtime behavior stays unchanged
  • No added latency on the request path, provided export runs after the decision or asynchronously
  • Teams decide whether to use it
  • Teams choose their own destination format

Related Work

EPI Recorder is an example of this pattern: it captures AI execution into portable, verifiable artifacts for debugging, review, and verification.

This request does NOT propose adopting any specific format.

Instead, it asks for a structured export interface. Teams can:

  • Use a portable evidence format (like EPI)
  • Send data to compliance dashboards
  • Store in internal databases
  • Integrate with custom workflows

Expected Outcome

By adding an evidence export layer, NeMo Guardrails becomes:

  1. The runtime gold standard for policy enforcement ✅ (already true)
  2. Compliance-workflow-friendly ✅ (enabled by this feature)

This positions NeMo as the choice for regulated industries where audit trails are mandatory.


Implementation Notes

  • No changes to core runtime
  • Optional feature (teams opt-in)
  • Can be added as a post-execution stage
  • Should support async/streaming scenarios
  • Consider traceability (request ID, session ID)
  • Should be compatible with existing LLM providers (OpenAI, Azure, NIM, etc.)
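For the async case, one way to keep export off the request path is a bounded queue drained by a background task. The `AsyncEvidenceQueue` class below is a hypothetical helper written for this sketch, not an existing NeMo Guardrails component.

```python
import asyncio
from typing import Awaitable, Callable


class AsyncEvidenceQueue:
    """Hypothetical helper: buffers decisions and exports them in the background."""

    def __init__(self, export: Callable[[dict], Awaitable[None]], maxsize: int = 1000) -> None:
        self._export = export
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0

    def submit(self, decision: dict) -> None:
        """Called on the request path; never blocks or raises."""
        try:
            self._queue.put_nowait(decision)
        except asyncio.QueueFull:
            self.dropped += 1  # count overflow rather than lose it silently

    async def run(self) -> None:
        """Background worker; cancel the task to stop it."""
        while True:
            decision = await self._queue.get()
            try:
                await self._export(decision)
            except Exception:
                pass  # an exporter failure must not affect the runtime
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every submitted decision has been handled."""
        await self._queue.join()


async def main() -> list:
    exported = []

    async def export(decision: dict) -> None:
        exported.append(decision)

    evidence = AsyncEvidenceQueue(export)
    worker = asyncio.create_task(evidence.run())
    evidence.submit({"rail_id": "content-safety", "decision": "allow"})
    evidence.submit({"rail_id": "topic-control", "decision": "block"})
    await evidence.drain()
    worker.cancel()
    return exported


result = asyncio.run(main())
print(len(result))  # 2
```

Dropping records when the queue is full protects latency at the cost of completeness; a compliance deployment might prefer to apply backpressure or spill to disk, which is a trade-off the maintainers would need to decide.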

Questions for Maintainers

  1. Does this align with NeMo's direction toward enterprise/compliance workflows?
  2. Which implementation option (A, B, or C) would be most compatible with the current architecture?
  3. Should evidence export be enabled per-rail, per-configuration, or globally?
  4. How should this interact with existing tracing/logging systems?

Closing

Happy to help with implementation, examples, or a PR if this direction is of interest.

Thank you for considering this feature request.

Labels: enhancement, status: needs triage