
ext_proc: two FULL_DUPLEX_STREAMED filters in chain causes request body truncation #43983

@nerdalert

Description

While validating #43175 for @nirrozenbaum to confirm the ext_proc chaining fix works with BBR + EPP on Istio 1.29.1 (gateway-api-inference-extension#2115), I found that the fix might not resolve the truncation bug when both ext_proc filters use FULL_DUPLEX_STREAMED mode — which is the default BBR + EPP configuration.

Description:

PR #43175 fixed the two-ext_proc-filter chain bug (#41654) for the case where one filter processes the body in FULL_DUPLEX_STREAMED mode and the other processes headers only. However, the fix does not resolve body truncation when both filters use FULL_DUPLEX_STREAMED mode, which is the default configuration for BBR + EPP in the Gateway API Inference Extension.

What should happen: all responses should return the full ~130KB streaming body. What actually happens: ~80% of responses are truncated (partial body or empty), returning HTTP 200 with anywhere from 0 to ~94KB instead of ~130KB. The failure rate is the same with and without the fix.

The root cause (my best guess; if it isn't obvious to the maintainers I'm happy to hack on a fix): PR #43175 updates observed_decode_end_stream_ during injectDecodedDataToFilterChain(), but this only helps when Filter 0's inject fires before Filter 1's commonContinue(). When both filters use FULL_DUPLEX_STREAMED, the ordering is a race: when commonContinue() wins, it reads a stale observedEndStream() == true (set when the client data arrived) and sends partial data upstream with end_of_stream=true.

The existing integration test (TwoExtProcFiltersInResponseProcessing) only covers the one-body-one-header configuration with controlled ordering. The two-FULL_DUPLEX_STREAMED configuration is not tested.

Repro steps:

Full deployment and reproduction: https://github.com/nerdalert/envoy-ext-proc-bug-repro

Minimal reproduction:

  1. Deploy two ext_proc filters in chain, both with request_body_mode: FULL_DUPLEX_STREAMED and response_body_mode: FULL_DUPLEX_STREAMED
  2. Send a request with body > 6.2KB and stream: true
  3. Run 10 times — most responses will be truncated
# Generate 8KB request body
dd if=/dev/zero bs=1 count=8000 2>/dev/null | tr '\0' 'x' > /tmp/pad.txt
printf '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"system","content":"' > /tmp/big.json
cat /tmp/pad.txt >> /tmp/big.json
printf '"},{"role":"user","content":"Hi"}],"max_tokens":500,"stream":true}' >> /tmp/big.json

for i in $(seq 1 10); do
  curl -s -o /dev/null -w "size=%{size_download}\n" \
    -X POST "http://$GW_IP:$GW_PORT/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d @/tmp/big.json --max-time 30
done

Typical output (full response should be ~130KB):

size=134921
size=134961
size=16470
size=14551
size=5436
size=6196
size=2452
size=4073

Results at scale (100 concurrent requests, 8KB body, streaming):

┌─────────────────────────┬───────────────────────────┬────────────────────────────────────┐
│                         │ Envoy 1.36.3-dev (no fix) │ Envoy 1.37.1-dev (with fix #43175) │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Full responses (>100KB) │ 18/100                    │ 11/100                             │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Truncated (1-100KB)     │ 73/100                    │ 66/100                             │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Empty (0 bytes)         │ 9/100                     │ 23/100                             │
└─────────────────────────┴───────────────────────────┴────────────────────────────────────┘

Admin and Stats Output:

N/A — the truncation produces no errors in stats. All requests return HTTP 200.

Config:

Both ext_proc filters configured via Istio (BBR via EnvoyFilter, EPP via inference extension per-route override):

# Filter 0 (BBR) — inserted by EnvoyFilter
processing_mode:
  request_header_mode: SEND
  response_header_mode: SEND
  request_body_mode: FULL_DUPLEX_STREAMED
  response_body_mode: FULL_DUPLEX_STREAMED
  request_trailer_mode: SEND
  response_trailer_mode: SEND

# Filter 1 (EPP) — configured via per-route override
processing_mode:
  request_header_mode: SEND
  response_header_mode: SEND
  request_body_mode: FULL_DUPLEX_STREAMED
  response_body_mode: FULL_DUPLEX_STREAMED
  request_trailer_mode: SEND
  response_trailer_mode: SEND
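For anyone reproducing without Istio, a minimal raw-Envoy sketch of the same chain is two ext_proc entries in http_filters (the cluster names bbr_cluster and epp_cluster are placeholders; the processing_mode blocks mirror the Istio configs above):

```yaml
http_filters:
- name: envoy.filters.http.ext_proc        # Filter 0 (BBR)
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: bbr_cluster          # placeholder
    processing_mode:
      request_header_mode: SEND
      response_header_mode: SEND
      request_body_mode: FULL_DUPLEX_STREAMED
      response_body_mode: FULL_DUPLEX_STREAMED
      request_trailer_mode: SEND
      response_trailer_mode: SEND
- name: envoy.filters.http.ext_proc        # Filter 1 (EPP)
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: epp_cluster          # placeholder
    processing_mode:
      request_header_mode: SEND
      response_header_mode: SEND
      request_body_mode: FULL_DUPLEX_STREAMED
      response_body_mode: FULL_DUPLEX_STREAMED
      request_trailer_mode: SEND
      response_trailer_mode: SEND
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```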

Full config and deployment: https://github.com/nerdalert/envoy-ext-proc-bug-repro

Logs:

Envoy ext_proc debug logs for a truncated request (response was 9983 bytes instead of ~130KB):

  # BBR receives 8150 bytes from client in two parts:
  Sending a body chunk of 8150 bytes, end_stream false   (to BBR)
  Sending a body chunk of 0 bytes, end_stream true        (to BBR)

  # BBR processes and injects back — EPP receives it as one chunk:
  Sending a body chunk of 8150 bytes, end_stream true     (to EPP)

  # EPP processes, commonContinue() fires with stale observedEndStream()=true
  Finish external processing call. Next state: 0
  Continuing processing

  # Backend receives truncated request, returns short streaming response:
  Sending a body chunk of 9983 bytes, end_stream false    (response, filter 0)
  Sending a body chunk of 0 bytes, end_stream true        (response, filter 0)

No warnings or errors are logged. The truncation is silent.

Call Stack:

N/A — no crash. The issue is a logic race in commonContinue() at filter_manager.cc:132:

doData(observedEndStream() && !had_trailers_before_data);

Environment:

Istio 1.29.1 with Gateway API Inference Extension (BBR + EPP); tested against Envoy 1.36.3-dev (without the fix) and Envoy 1.37.1-dev (with fix #43175).

Related:

#41654 (original two-filter chain truncation bug), #43175 (fix for the one-body-one-header case), gateway-api-inference-extension#2115.

Tyvm!
