Feature Request: shadow traffic mirroring for AIGatewayRoute with per-mirror model rewriting #2137

@mtparet

Summary

  • What: Add a mirrors field to AIGatewayRouteRule that sends a fire-and-forget copy of each matched request to one or more shadow backends, with the AI Gateway's upstream ExtProc pipeline applied to the mirror leg (model rewrite, header/body mutation).
  • Why: Evaluate a new model, provider, or region against live production traffic without affecting client responses — a common ask for canary, regression, and A/B-eval workflows.
  • Where: New field on AIGatewayRouteRule in api/v1beta1, plus follow-through in the controller and extension server.
  • Impact: Backward compatible (opt-in field). LLMRequestCosts is suppressed on the mirror leg, so shadow traffic is not double-counted in cost accounting.

Problem

Shadow-traffic evaluation is the standard way to qualify a new LLM backend before flipping production routing. Today, an operator can technically add an HTTPRequestMirrorFilter by editing the controller-generated HTTPRoute out-of-band, but:

  1. The change is overwritten on the next reconcile — AIGatewayRoute has no surface for it.
  2. The mirror cluster does not receive the AI Gateway's upstream ExtProc filter chain, so model rewriting (x-ai-eg-model, body model field) and header/body mutations do not run; the shadow backend sees a request shaped for the primary, not for itself.
  3. LLMRequestCosts dynamic metadata is emitted for the mirror leg, double-counting tokens in access logs and downstream billing pipelines.

Proposal

Add a mirrors field to AIGatewayRouteRule. Each entry wraps an AIGatewayRouteRuleBackendRef (so it inherits modelNameOverride, headerMutation, bodyMutation) plus an HTTPRequestMirrorFilter-style percent for sampling:

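type AIGatewayRouteRule struct {
    // ...existing fields...

    // Mirrors lists shadow backends that each receive a fire-and-forget copy
    // of requests matched by this rule (subject to Percent). Mirror responses
    // are discarded.
    // +optional
    // +kubebuilder:validation:MaxItems=16
    Mirrors []AIGatewayRouteRuleMirror `json:"mirrors,omitempty"`
}

// AIGatewayRouteRuleMirror is a single shadow target. Embedding
// AIGatewayRouteRuleBackendRef carries over modelNameOverride,
// headerMutation and bodyMutation for the mirror leg.
type AIGatewayRouteRuleMirror struct {
    AIGatewayRouteRuleBackendRef `json:",inline"`

    // Percent samples the share of matched requests sent to this mirror,
    // using the same Fraction type as HTTPRequestMirrorFilter.
    // +optional
    Percent *gwapiv1.Fraction `json:"percent,omitempty"`
}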

Semantics: responses from mirror backends are always discarded; only the primary backendRefs respond to the client. Each entry maps to one Gateway API HTTPRequestMirrorFilter.
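To make the one-entry-to-one-filter mapping and the sampling semantics concrete, here is a self-contained sketch. Fraction, Mirror and MirrorFilter are simplified, hypothetical local stand-ins for gwapiv1.Fraction, AIGatewayRouteRuleMirror and HTTPRequestMirrorFilter. The validation assumes Gateway API's Fraction rules (denominator defaults to 100, 0 <= numerator <= denominator) and that an unset fraction mirrors every request; the 16-entry cap comes from the MaxItems marker in the proposed API.

```go
package main

import (
	"errors"
	"fmt"
)

// maxMirrors matches the MaxItems=16 marker in the proposed API.
const maxMirrors = 16

// Fraction is a local stand-in shaped like Gateway API's Fraction.
type Fraction struct {
	Numerator   int32
	Denominator *int32 // assumed to default to 100 when nil
}

// Mirror is a hypothetical, cut-down AIGatewayRouteRuleMirror.
type Mirror struct {
	BackendName string
	Percent     *Fraction
}

// MirrorFilter is a hypothetical, cut-down HTTPRequestMirrorFilter.
type MirrorFilter struct {
	BackendName string
	Fraction    *Fraction
}

// ratio returns the sampled share in [0, 1]. A nil fraction is assumed to
// mean "mirror every request".
func ratio(f *Fraction) (float64, error) {
	if f == nil {
		return 1, nil
	}
	den := int32(100)
	if f.Denominator != nil {
		den = *f.Denominator
	}
	if den <= 0 || f.Numerator < 0 || f.Numerator > den {
		return 0, errors.New("need 0 <= numerator <= denominator and denominator > 0")
	}
	return float64(f.Numerator) / float64(den), nil
}

// toMirrorFilters emits exactly one filter per mirror entry, in order,
// rejecting the whole list if any entry is invalid.
func toMirrorFilters(mirrors []Mirror) ([]MirrorFilter, error) {
	if len(mirrors) > maxMirrors {
		return nil, fmt.Errorf("at most %d mirrors allowed, got %d", maxMirrors, len(mirrors))
	}
	filters := make([]MirrorFilter, 0, len(mirrors))
	for i, m := range mirrors {
		if _, err := ratio(m.Percent); err != nil {
			return nil, fmt.Errorf("mirrors[%d]: %w", i, err)
		}
		filters = append(filters, MirrorFilter{BackendName: m.BackendName, Fraction: m.Percent})
	}
	return filters, nil
}

func main() {
	den := int32(1000)
	filters, err := toMirrorFilters([]Mirror{
		{BackendName: "candidate-provider"},
		{BackendName: "candidate-region", Percent: &Fraction{Numerator: 5, Denominator: &den}},
	})
	if err != nil {
		panic(err)
	}
	for _, f := range filters {
		r, _ := ratio(f.Fraction)
		fmt.Printf("%s: %.1f%%\n", f.BackendName, r*100)
	}
}
```

Keeping the translation a pure function of the mirror list makes it easy to unit-test independently of the reconcile loop.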

What this enables

  • Side-by-side evaluation of a new model or provider on real traffic.
  • Region-failover dry runs (mirror to a candidate region, compare error rates).
  • Regression testing of modelNameOverride / translation changes against production payloads.
  • Per-tenant audits of model behavior without billing the tenant twice for the same request.

Open questions

  1. Per-mirror translation shape: should mirrors carry full modelNameOverride / headerMutation / bodyMutation, or reference an AIServiceBackend only and inherit its config? Inheriting is cleaner but forces operators to create a dedicated AIServiceBackend per shadow target.
  2. Cost emission policy: always-suppress LLMRequestCosts on mirrors, opt-in via a field on the mirror entry, or controlled at BackendSecurityPolicy / GatewayConfig level?
  3. Mirror-cluster naming contract: the extension server needs to identify mirror clusters in the xDS push to install (or skip) the upstream filter chain. Envoy Gateway currently emits names of the form httproute///rule/-mirror- with 1-based mirror indices. Is it acceptable to depend on that wire format, or should AI Gateway negotiate a stable contract / metadata-based marker with Envoy Gateway?
  4. Failure handling: should mirror-leg ExtProc failures be silently absorbed (current Envoy mirror semantics) or surface as gen_ai.* error metrics tagged is_mirror=true?
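Questions 2 and 3 interact: the extension server has to recognize a mirror cluster before it can install the upstream filter chain or suppress cost metadata. The sketch below assumes, hypothetically, that only the trailing "-mirror-<n>" part of the cluster name is relied on, with n 1-based; that assumption is exactly the wire-format dependency question 3 asks about. isMirrorCluster, costPolicy and shouldEmitCosts are illustrative names, not existing AI Gateway APIs, and the cluster names in main are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const mirrorInfix = "-mirror-"

// isMirrorCluster reports whether name looks like a mirror cluster and returns
// its 1-based index. ASSUMPTION: names end in "-mirror-<n>"; the rest of the
// Envoy Gateway name format is deliberately not parsed.
func isMirrorCluster(name string) (int, bool) {
	i := strings.LastIndex(name, mirrorInfix)
	if i < 0 {
		return 0, false
	}
	n, err := strconv.Atoi(name[i+len(mirrorInfix):])
	if err != nil || n < 1 {
		return 0, false
	}
	return n, true
}

// costPolicy models two of the options listed in question 2.
type costPolicy int

const (
	alwaysSuppress costPolicy = iota // never emit LLMRequestCosts on mirrors
	optIn                            // emit only if the mirror entry opts in
)

// shouldEmitCosts decides whether LLMRequestCosts metadata is emitted for a
// request routed to cluster. Primary clusters are always billed.
func shouldEmitCosts(cluster string, policy costPolicy, mirrorOptIn bool) bool {
	if _, ok := isMirrorCluster(cluster); !ok {
		return true
	}
	if policy == optIn {
		return mirrorOptIn
	}
	return false
}

func main() {
	for _, c := range []string{"example-primary", "example-mirror-1", "example-mirror-x"} {
		n, ok := isMirrorCluster(c)
		fmt.Println(c, ok, n, shouldEmitCosts(c, alwaysSuppress, false))
	}
}
```

A metadata-based marker (the alternative in question 3) would replace isMirrorCluster with a lookup on cluster metadata while leaving shouldEmitCosts unchanged, which is one argument for keeping the two decisions separate.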

Related

We have a working PoC on a fork with e2e coverage and would be happy to upstream it once the design questions above are resolved.

Labels: design, enhancement
