
ext_proc: two FULL_DUPLEX_STREAMED filters in chain causes request body truncation #43983

@nerdalert

Description

While validating #43175 for @nirrozenbaum to confirm the ext_proc chaining fix works with BBR + EPP on Istio 1.29.1 (gateway-api-inference-extension#2115), I found that the fix might not resolve the truncation bug when both ext_proc filters use FULL_DUPLEX_STREAMED mode — which is the default BBR + EPP configuration.

Description:

PR #43175 fixed the two-ext_proc-filter chain bug (#41654) for the case where one filter processes the body in FULL_DUPLEX_STREAMED mode and the other processes headers only. However, the fix does not resolve body truncation when both filters use FULL_DUPLEX_STREAMED mode, which is the default configuration for BBR + EPP in the Gateway API Inference Extension.

What should happen: all responses should return the full ~130KB streaming body. What actually happens: ~80% of responses are truncated (partial body or empty), returning HTTP 200 with anywhere from 0 to ~94KB instead of ~130KB. The failure rate is the same with and without the fix.

The root cause (my best guess; if it isn't obvious to the maintainers I'm happy to hack on a fix): PR #43175 updates observed_decode_end_stream_ during injectDecodedDataToFilterChain(), but this only helps when Filter 0's inject fires before Filter 1's commonContinue(). When both filters use FULL_DUPLEX_STREAMED, the ordering is a race: when commonContinue() wins, it reads a stale observedEndStream() == true (set when the client data arrived) and sends partial data upstream with end_of_stream=true.

The existing integration test (TwoExtProcFiltersInResponseProcessing) only covers the one-body-one-header configuration with controlled ordering. The two-FULL_DUPLEX_STREAMED configuration is not tested.

Repro steps:

Full deployment and reproduction: https://github.com/nerdalert/envoy-ext-proc-bug-repro

Minimal reproduction:

  1. Deploy two ext_proc filters in chain, both with request_body_mode: FULL_DUPLEX_STREAMED and response_body_mode: FULL_DUPLEX_STREAMED
  2. Send a request with body > 6.2KB and stream: true
  3. Run 10 times — most responses will be truncated
# Generate 8KB request body
dd if=/dev/zero bs=1 count=8000 2>/dev/null | tr '\0' 'x' > /tmp/pad.txt
printf '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"system","content":"' > /tmp/big.json
cat /tmp/pad.txt >> /tmp/big.json
printf '"},{"role":"user","content":"Hi"}],"max_tokens":500,"stream":true}' >> /tmp/big.json

for i in $(seq 1 10); do
  curl -s -o /dev/null -w "size=%{size_download}\n" \
    -X POST "http://$GW_IP:$GW_PORT/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d @/tmp/big.json --max-time 30
done

Typical output (full response should be ~130KB):

size=134921
size=134961
size=16470
size=14551
size=5436
size=6196
size=2452
size=4073

Results at scale (100 concurrent requests, 8KB body, streaming):

┌─────────────────────────┬───────────────────────────┬────────────────────────────────────┐
│                         │ Envoy 1.36.3-dev (no fix) │ Envoy 1.37.1-dev (with fix #43175) │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Full responses (>100KB) │ 18/100                    │ 11/100                             │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Truncated (1-100KB)     │ 73/100                    │ 66/100                             │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Empty (0 bytes)         │ 9/100                     │ 23/100                             │
└─────────────────────────┴───────────────────────────┴────────────────────────────────────┘

Admin and Stats Output:

N/A — the truncation produces no errors in stats. All requests return HTTP 200.

Config:

Both ext_proc filters configured via Istio (BBR via EnvoyFilter, EPP via inference extension per-route override):

# Filter 0 (BBR) — inserted by EnvoyFilter
processing_mode:
  request_header_mode: SEND
  response_header_mode: SEND
  request_body_mode: FULL_DUPLEX_STREAMED
  response_body_mode: FULL_DUPLEX_STREAMED
  request_trailer_mode: SEND
  response_trailer_mode: SEND

# Filter 1 (EPP) — configured via per-route override
processing_mode:
  request_header_mode: SEND
  response_header_mode: SEND
  request_body_mode: FULL_DUPLEX_STREAMED
  response_body_mode: FULL_DUPLEX_STREAMED
  request_trailer_mode: SEND
  response_trailer_mode: SEND
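For anyone reproducing without Istio, a minimal raw-Envoy sketch of the same chain is two ext_proc entries in http_filters (the cluster names bbr_cluster and epp_cluster are placeholders; the processing_mode blocks mirror the Istio configs above):

```yaml
http_filters:
- name: envoy.filters.http.ext_proc        # Filter 0 (BBR)
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: bbr_cluster          # placeholder
    processing_mode:
      request_header_mode: SEND
      response_header_mode: SEND
      request_body_mode: FULL_DUPLEX_STREAMED
      response_body_mode: FULL_DUPLEX_STREAMED
      request_trailer_mode: SEND
      response_trailer_mode: SEND
- name: envoy.filters.http.ext_proc        # Filter 1 (EPP)
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: epp_cluster          # placeholder
    processing_mode:
      request_header_mode: SEND
      response_header_mode: SEND
      request_body_mode: FULL_DUPLEX_STREAMED
      response_body_mode: FULL_DUPLEX_STREAMED
      request_trailer_mode: SEND
      response_trailer_mode: SEND
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```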

Full config and deployment: https://github.com/nerdalert/envoy-ext-proc-bug-repro

Logs:

Envoy ext_proc debug logs for a truncated request (response was 9983 bytes instead of ~130KB):

  # BBR receives 8150 bytes from client in two parts:
  Sending a body chunk of 8150 bytes, end_stream false   (to BBR)
  Sending a body chunk of 0 bytes, end_stream true        (to BBR)

  # BBR processes and injects back — EPP receives it as one chunk:
  Sending a body chunk of 8150 bytes, end_stream true     (to EPP)

  # EPP processes, commonContinue() fires with stale observedEndStream()=true
  Finish external processing call. Next state: 0
  Continuing processing

  # Backend receives truncated request, returns short streaming response:
  Sending a body chunk of 9983 bytes, end_stream false    (response, filter 0)
  Sending a body chunk of 0 bytes, end_stream true        (response, filter 0)

No warnings or errors are logged. The truncation is silent.

Call Stack:

N/A — no crash. The issue is a logic race in commonContinue() at filter_manager.cc:132:

doData(observedEndStream() && !had_trailers_before_data);

Environment:

Istio 1.29.1 with Gateway API Inference Extension (BBR + EPP); tested against Envoy 1.36.3-dev (without the fix) and Envoy 1.37.1-dev (with fix #43175).

Related:

#41654 (original two-filter chain truncation bug), #43175 (fix for the one-body-one-header case), gateway-api-inference-extension#2115.

Tyvm!
