While validating #43175 for @nirrozenbaum to confirm the ext_proc chaining fix works with BBR + EPP on Istio 1.29.1 (gateway-api-inference-extension#2115), I found that the fix might not resolve the truncation bug when both ext_proc filters use FULL_DUPLEX_STREAMED mode — which is the default BBR + EPP configuration.
Description:
PR #43175 fixed the chained ext_proc filter bug (#41654) for the case where one filter processes the body in FULL_DUPLEX_STREAMED mode and the other processes headers only. However, the fix does not resolve body truncation when both filters use FULL_DUPLEX_STREAMED mode. This is the default configuration for BBR + EPP in the Gateway API Inference Extension.
What should happen: all responses should return the full ~130KB streaming body. What actually happens: ~80% of responses are truncated (partial body or empty), returning HTTP 200 with anywhere from 0 to ~94KB instead of ~130KB. The failure rate is the same with and without the fix.
The root cause (a general guess here; if it isn't obvious to maintainers, I'm happy to hack on a fix): PR #43175 updates observed_decode_end_stream_ during injectDecodedDataToFilterChain(), but this only helps when Filter-0's inject fires before Filter-1's commonContinue(). When both filters use FULL_DUPLEX_STREAMED, the ordering is a race. When commonContinue() wins, it reads a stale observedEndStream() = true (set when the client data arrived) and sends partial data upstream with end_of_stream=true.
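To make the suspected ordering concrete, here is a toy shell model of the hypothesis above. It is not Envoy code: `inject`, `common_continue`, and `run_order` are hypothetical stand-ins for injectDecodedDataToFilterChain(), commonContinue(), and the event ordering, and the model ignores had_trailers_before_data. It only shows that the forwarded end_stream value depends on which event runs first.

```shell
#!/usr/bin/env bash
# Toy model of the suspected race; names are illustrative, not Envoy APIs.
# observed_end_stream stands in for observed_decode_end_stream_.
observed_end_stream=true

# Filter-0 injects a processed chunk; per PR #43175 this refreshes the flag.
inject() { observed_end_stream="$1"; }

# Filter-1 resumes the chain and forwards data with the flag's current value.
common_continue() { echo "forward end_stream=$observed_end_stream"; }

# Replay a sequence of events, starting from the state right after the
# client's final body frame arrived (flag already true).
run_order() {
  observed_end_stream=true
  local step
  for step in "$@"; do
    case "$step" in
      inject)   inject false ;;      # a partial chunk, more data to follow
      continue) common_continue ;;
    esac
  done
}

run_order inject continue   # prints: forward end_stream=false
run_order continue inject   # prints: forward end_stream=true (premature)
```

If the hypothesis holds, the second ordering corresponds to the truncated requests: the upstream sees end_stream=true before all of the body has been forwarded.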
The existing integration test (TwoExtProcFiltersInResponseProcessing) only covers the one-body-one-header configuration with controlled ordering. The two-FULL_DUPLEX_STREAMED configuration is not tested.
Repro steps:
Full deployment and reproduction: https://github.com/nerdalert/envoy-ext-proc-bug-repro
Minimal reproduction:
- Deploy two ext_proc filters in a chain, both with request_body_mode: FULL_DUPLEX_STREAMED and response_body_mode: FULL_DUPLEX_STREAMED
- Send a request with body > 6.2KB and stream: true
- Run 10 times; most responses will be truncated
# Generate 8KB request body
dd if=/dev/zero bs=1 count=8000 2>/dev/null | tr '\0' 'x' > /tmp/pad.txt
printf '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"system","content":"' > /tmp/big.json
cat /tmp/pad.txt >> /tmp/big.json
printf '"},{"role":"user","content":"Hi"}],"max_tokens":500,"stream":true}' >> /tmp/big.json
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "size=%{size_download}\n" \
    -X POST "http://$GW_IP:$GW_PORT/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d @/tmp/big.json --max-time 30
done
Typical output (full response should be ~130KB):
size=134921
size=134961
size=16470
size=14551
size=5436
size=6196
size=2452
size=4073
Results at scale (100 concurrent requests, 8KB body, streaming):
┌─────────────────────────┬───────────────────────────┬────────────────────────────────────┐
│ │ Envoy 1.36.3-dev (no fix) │ Envoy 1.37.1-dev (with fix #43175) │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Full responses (>100KB) │ 18/100 │ 11/100 │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Truncated (1-100KB) │ 73/100 │ 66/100 │
├─────────────────────────┼───────────────────────────┼────────────────────────────────────┤
│ Empty (0 bytes) │ 9/100 │ 23/100 │
└─────────────────────────┴───────────────────────────┴────────────────────────────────────┘
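The three buckets in the table can be tallied directly from the `size=` lines that the repro loop prints. The helper below is a minimal sketch: `tally_sizes` and `FULL_MIN` are hypothetical names, and the 100000-byte cutoff is an assumed reading of the table's ">100KB" threshold.

```shell
#!/usr/bin/env bash
# Bucket "size=N" lines (the curl -w format used in the repro loop).
# FULL_MIN is an assumed cutoff for a "full" response; adjust as needed.
FULL_MIN=${FULL_MIN:-100000}

tally_sizes() {
  awk -F= -v full_min="$FULL_MIN" '
    /^size=[0-9]+$/ {
      n = $2 + 0
      if (n == 0)            empty++
      else if (n > full_min) full++
      else                   truncated++
    }
    END { printf "full=%d truncated=%d empty=%d\n", full, truncated, empty }
  '
}

# Sample sizes copied from the typical output earlier in this report.
printf 'size=%s\n' 134921 134961 16470 14551 5436 6196 2452 4073 | tally_sizes
# prints: full=2 truncated=6 empty=0
```

In practice the loop's output can be piped straight in, for example by appending `| tally_sizes` after `done`.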
Admin and Stats Output:
N/A — the truncation produces no errors in stats. All requests return HTTP 200.
Config:
Both ext_proc filters configured via Istio (BBR via EnvoyFilter, EPP via inference extension per-route override):
# Filter 0 (BBR): inserted by EnvoyFilter
processing_mode:
  request_header_mode: SEND
  response_header_mode: SEND
  request_body_mode: FULL_DUPLEX_STREAMED
  response_body_mode: FULL_DUPLEX_STREAMED
  request_trailer_mode: SEND
  response_trailer_mode: SEND
# Filter 1 (EPP): configured via per-route override
processing_mode:
  request_header_mode: SEND
  response_header_mode: SEND
  request_body_mode: FULL_DUPLEX_STREAMED
  response_body_mode: FULL_DUPLEX_STREAMED
  request_trailer_mode: SEND
  response_trailer_mode: SEND
Full config and deployment: https://github.com/nerdalert/envoy-ext-proc-bug-repro
Logs:
Envoy ext_proc debug logs for a truncated request (response was 9983 bytes instead of ~130KB):
# BBR receives 8150 bytes from client in two parts:
Sending a body chunk of 8150 bytes, end_stream false (to BBR)
Sending a body chunk of 0 bytes, end_stream true (to BBR)
# BBR processes and injects back — EPP receives it as one chunk:
Sending a body chunk of 8150 bytes, end_stream true (to EPP)
# EPP processes, commonContinue() fires with stale observedEndStream()=true
Finish external processing call. Next state: 0
Continuing processing
# Backend receives truncated request, returns short streaming response:
Sending a body chunk of 9983 bytes, end_stream false (response, filter 0)
Sending a body chunk of 0 bytes, end_stream true (response, filter 0)
No warnings or errors are logged. The truncation is silent.
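Because nothing is logged as an error, the quickest way to spot an early end_stream is to reduce the debug log to its body-chunk lines. The filter below is a rough sketch that assumes the "Sending a body chunk of N bytes, end_stream X" wording shown in the excerpt; `body_chunks` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print "<bytes> <end_stream>" for each ext_proc body-chunk log line.
# Assumes the log wording shown in the excerpt; other lines are dropped.
body_chunks() {
  sed -nE 's/.*Sending a body chunk of ([0-9]+) bytes, end_stream (true|false).*/\1 \2/p'
}

# Sample lines copied from the log excerpt (annotations included).
body_chunks <<'EOF'
Sending a body chunk of 8150 bytes, end_stream false (to BBR)
Sending a body chunk of 0 bytes, end_stream true (to BBR)
Sending a body chunk of 8150 bytes, end_stream true (to EPP)
Continuing processing
EOF
# prints:
# 8150 false
# 0 true
# 8150 true
```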
Call Stack:
N/A — no crash. The issue is a logic race in commonContinue() at filter_manager.cc:132:
doData(observedEndStream() && !had_trailers_before_data);
Environment:
- Envoy 1.37.1-dev (commit fca9ef95, includes fix PR #43175, "Support two ext_proc filters in the chain 2nd attempt")
- Istio 1.29.1
- BBR: bbr:main (k8s staging registry)
- EPP: epp:v1.4.0-rc.2 (registry.k8s.io)
- Kind cluster (v1.35.0)
Related:
- #41654 (ext_proc: FULL DUPLEX STREAMED response body truncated when multiple filters defined): original bug report
- #43175 (Support two ext_proc filters in the chain 2nd attempt): the fix, which covers the one-body-one-header case only
- gateway-api-inference-extension#2115
Tyvm!