Summary
- What: Add a mirrors field to AIGatewayRouteRule that sends a fire-and-forget copy of each matched request to one or more shadow backends, with the AI Gateway's upstream ExtProc pipeline applied to the mirror leg (model rewrite, header/body mutation).
- Why: Evaluate a new model, provider, or region against live production traffic without affecting client responses — a common ask for canary, regression, and A/B-eval workflows.
- Where: New field on AIGatewayRouteRule in api/v1beta1, plus follow-through in the controller and extension server.
- Impact: Backward compatible (opt-in field). Mirror leg billing is suppressed so shadow traffic does not double-count in LLMRequestCosts.
Problem
Shadow-traffic evaluation is the standard way to qualify a new LLM backend before flipping production routing. Today, an operator can technically add an HTTPRequestMirrorFilter by editing the controller-generated HTTPRoute out-of-band, but:
- The change is overwritten on the next reconcile — AIGatewayRoute has no surface for it.
- The mirror cluster does not receive the AI Gateway's upstream ExtProc filter chain, so model rewriting (x-ai-eg-model, body model field) and header/body mutations do not run. The shadow backend sees a request shaped for the primary, not for itself.
- LLMRequestCosts dynamic metadata is emitted for the mirror leg, double-counting tokens in access logs and downstream billing pipelines.
Proposal
Add a mirrors field to AIGatewayRouteRule. Each entry wraps an AIGatewayRouteRuleBackendRef (so it inherits modelNameOverride, headerMutation, bodyMutation) plus an HTTPRequestMirrorFilter-style percent for sampling:
type AIGatewayRouteRule struct {
	// ...existing fields...

	// +optional
	// +kubebuilder:validation:MaxItems=16
	Mirrors []AIGatewayRouteRuleMirror `json:"mirrors,omitempty"`
}

type AIGatewayRouteRuleMirror struct {
	AIGatewayRouteRuleBackendRef `json:",inline"`

	// +optional
	Percent *gwapiv1.Fraction `json:"percent,omitempty"`
}
Semantics: responses from mirror backends are always discarded; only the primary backendRefs respond to the client. Each entry maps to one Gateway API HTTPRequestMirrorFilter.
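The one-entry-to-one-filter mapping and the sampling semantics can be sketched roughly as follows. This is a minimal illustration, not the PoC's code: Fraction, BackendRef, Mirror, and MirrorFilter are simplified local stand-ins for gwapiv1.Fraction, AIGatewayRouteRuleBackendRef, the proposed AIGatewayRouteRuleMirror, and the generated HTTPRequestMirrorFilter, and toMirrorFilters and sampleRatio are hypothetical helper names. It assumes the Gateway API conventions that an unset fraction mirrors every request and that an unset denominator defaults to 100.

```go
package main

import "fmt"

// Fraction is a local stand-in for gwapiv1.Fraction so the sketch compiles
// without the Gateway API module. A nil Denominator is treated as 100.
type Fraction struct {
	Numerator   int32
	Denominator *int32
}

// BackendRef is a simplified, hypothetical stand-in for
// AIGatewayRouteRuleBackendRef.
type BackendRef struct {
	Name              string
	ModelNameOverride string
}

// Mirror follows the proposed AIGatewayRouteRuleMirror shape.
type Mirror struct {
	BackendRef
	Percent *Fraction
}

// MirrorFilter is a hypothetical stand-in for one generated
// HTTPRequestMirrorFilter.
type MirrorFilter struct {
	BackendName string
	Fraction    *Fraction
}

// toMirrorFilters emits exactly one filter per mirror entry, in order.
func toMirrorFilters(mirrors []Mirror) []MirrorFilter {
	filters := make([]MirrorFilter, 0, len(mirrors))
	for _, m := range mirrors {
		filters = append(filters, MirrorFilter{BackendName: m.Name, Fraction: m.Percent})
	}
	return filters
}

// sampleRatio returns the share of matched requests that would be mirrored.
// A nil fraction means every matched request is mirrored.
func sampleRatio(f *Fraction) float64 {
	if f == nil {
		return 1.0
	}
	den := int32(100)
	if f.Denominator != nil {
		den = *f.Denominator
	}
	if den <= 0 {
		return 0 // invalid input; API validation would normally reject this
	}
	return float64(f.Numerator) / float64(den)
}

func main() {
	den := int32(1000)
	mirrors := []Mirror{
		{BackendRef: BackendRef{Name: "shadow-a"}},
		{
			BackendRef: BackendRef{Name: "shadow-b", ModelNameOverride: "candidate-model"},
			Percent:    &Fraction{Numerator: 5, Denominator: &den},
		},
	}
	for _, f := range toMirrorFilters(mirrors) {
		fmt.Printf("%s mirrors %.3f of matched requests\n", f.BackendName, sampleRatio(f.Fraction))
	}
}
```

In this sketch, shadow-a receives a copy of every matched request, while shadow-b receives roughly 5 in 1000.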
What this enables
- Side-by-side evaluation of a new model or provider on real traffic.
- Region-failover dry runs (mirror to a candidate region, compare error rates).
- Regression testing of modelNameOverride / translation changes against production payloads.
- Per-tenant audit of model behavior without paying for it twice.
Open questions
- Per-mirror translation shape: should mirrors carry full modelNameOverride / headerMutation / bodyMutation, or reference an AIServiceBackend only and inherit its config? Inheriting is cleaner but forces operators to create a dedicated AIServiceBackend per shadow target.
- Cost emission policy: always-suppress LLMRequestCosts on mirrors, opt-in via a field on the mirror entry, or controlled at BackendSecurityPolicy / GatewayConfig level?
- Mirror-cluster naming contract: the extension server needs to identify mirror clusters in the xDS push to install (or skip) the upstream filter chain. Envoy Gateway currently emits names of the form httproute///rule/-mirror- with 1-based mirror indices. Is it acceptable to depend on that wire format, or should AI Gateway negotiate a stable contract / metadata-based marker with Envoy Gateway?
- Failure handling: should mirror-leg ExtProc failures be silently absorbed (current Envoy mirror semantics) or surface as gen_ai.* error metrics tagged is_mirror=true?
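To make the cost emission question concrete, the three candidate policies could be compared with a small decision function like the one below. Everything here is hypothetical and for discussion only: CostPolicy, its constants, and emitCosts are illustrative names, not existing AI Gateway API, and the sketch assumes the primary leg always emits LLMRequestCosts.

```go
package main

import "fmt"

// CostPolicy enumerates the three options raised in the open question.
// All names are hypothetical.
type CostPolicy int

const (
	AlwaysSuppress CostPolicy = iota // never emit costs for mirror legs
	PerMirrorOptIn                   // emit only when the mirror entry opts in
	GatewayWide                      // follow a gateway-level or policy-level switch
)

// emitCosts reports whether LLMRequestCosts metadata would be emitted for a
// request leg under the given policy.
func emitCosts(isMirror bool, policy CostPolicy, mirrorOptIn, gatewayEmit bool) bool {
	if !isMirror {
		return true // assumption: the primary leg always emits costs
	}
	switch policy {
	case PerMirrorOptIn:
		return mirrorOptIn
	case GatewayWide:
		return gatewayEmit
	default: // AlwaysSuppress
		return false
	}
}

func main() {
	fmt.Println(emitCosts(false, AlwaysSuppress, false, false)) // primary leg
	fmt.Println(emitCosts(true, AlwaysSuppress, true, true))    // mirror, always suppressed
	fmt.Println(emitCosts(true, PerMirrorOptIn, true, false))   // mirror, entry opted in
}
```

Always-suppress keeps the decision out of the API entirely, while the other two options each add one boolean that operators have to reason about.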
Related
We have a working PoC on a fork with e2e coverage and would be happy to upstream it once the design questions above are resolved.