VarSuren (Contributor)

Description
Requests with gzip compression were failing with EOF errors (the code tried to decompress incomplete chunks as if they were complete gzip streams).

Example:

Example of a request that failed prior to this change, where gcp-claude.json is any Claude request body.

gcp-claude.json:

{
  "model": "gcp.claude-3-5-haiku",
  "messages": [
    {"role": "user", "content": "Give me long story about love of NYC"}
  ],
  "temperature": 0.1, "max_tokens": 512, "stream": true
}

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" -H "accept-encoding: br, gzip, deflate" \
  -d @gcp-claude.json \
  "http://localhost:8080/anthropic/v1/messages" -v --compressed




Current approach:

Add a new function in utils for buffered streaming and call it once when the stream is done, so token usage is not counted twice across intermediate chunks.

Incomplete chunks are passed through as-is but appended to a buffer for subsequent checks.

Anthropic sends usage in the first and last chunks.


Test:

Tested both compressed and non-compressed requests, in both streaming and non-streaming modes.

@VarSuren requested a review from a team as a code owner September 29, 2025 21:44
@VarSuren changed the title from "buffer gzip data for anthropic messages" to "/fix buffer gzip data for anthropic messages" Sep 29, 2025
@VarSuren changed the title from "/fix buffer gzip data for anthropic messages" to "fix: buffer gzip data for anthropic messages" Sep 29, 2025
@codecov-commenter commented Sep 29, 2025

Codecov Report

❌ Patch coverage is 90.58824% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.21%. Comparing base (0d43423) to head (eb5d1fa).
⚠️ Report is 30 commits behind head on main.

Files with missing lines Patch % Lines
internal/extproc/messages_processor.go 82.60% 3 Missing and 1 partial ⚠️
...ernal/extproc/translator/anthropic_gcpanthropic.go 60.00% 4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1247   +/-   ##
=======================================
  Coverage   79.21%   79.21%           
=======================================
  Files          97       97           
  Lines       11125    11178   +53     
=======================================
+ Hits         8813     8855   +42     
- Misses       1919     1930   +11     
  Partials      393      393           


// Try to decompress the accumulated buffer
if len(*gzipBuffer) > 0 {
	gzipReader, err := gzip.NewReader(bytes.NewReader(*gzipBuffer))
	if err != nil {
Contributor
Do we want to check for a specific error like "unexpected EOF"?

Contributor (Author)
Not sure; I'm not that familiar with gzip errors, so I don't know what the other errors are or whether we can buffer through them.

@yuzisun
Contributor

yuzisun commented Sep 30, 2025

@mathetake do you remember why we do not use the decompression and compression filter?

 // ResponseBody implements [AnthropicMessagesTranslator.ResponseBody] for Anthropic to GCP Anthropic.
 // This is essentially a passthrough since both use the same Anthropic response format.
-func (a *anthropicToGCPAnthropicTranslator) ResponseBody(_ map[string]string, body io.Reader, endOfStream bool) (
+func (a *anthropicToGCPAnthropicTranslator) ResponseBody(_ map[string]string, body io.Reader, isStreaming bool) (
Contributor
Why rename this? endOfStream is different from isStreaming.

Contributor (Author)
Good question, but I do mean isStreaming rather than endOfStream. We basically have two cases:

  1. translation for streaming
  2. translation for regular requests

This boolean tells which one to use.

Since you've raised the question, let me elaborate: previously we did the translation while streaming; now, because we accumulate, we no longer care about endOfStream. Let me know if that makes sense.

Contributor
Can you leave a comment so other maintainers are aware of the difference?

@sukumargaonkar
Contributor

lgtm

@mathetake (Member) left a comment

Thanks. My main concern is the divergence between this implementation and the one in chat/completions. At the end of the day, the exact same logic should be applicable, so I would like to see the same logic appear here as well.

In particular, special-casing gzip and buffering on the processor side feels confusing when the underlying issue lives inside translator/anthropic_gcpanthropic.go, compared to the other chat/completions streaming code.

Let me push a fix to bring this in line with the others.


Edited: I am getting to understand the underlying problem more. I think I need more time to think about this one.

@mathetake
Member

mathetake commented Oct 1, 2025

Sorry, a couple of things I need to know:

  • Does this mean that Anthropic returns the entire giant event stream in a single gzipped response body containing potentially hundreds of events? How is that acceptable when it is supposed to return a "streaming" response?
  • If the above is true, how is the Anthropic SDK supposed to parse the response body in a streaming way? I see the current code waits until the end of the stream, but if the Anthropic SDK can perform the event parsing in a streaming way, then the current code looks incorrect to me. What happens when, hypothetically, we want to do the translation later into another input format (likely OpenAI)? Shouldn't we be able to parse event-by-event instead of waiting for the end of the stream?

@mathetake
Member

Had a quick offline chat with Dan and I'm getting more of the context. I think the problem here is bigger than Anthropic-specific, and we should be able to handle it at a fundamental level. Let me dig into this one more tomorrow.

@mathetake
Member

My comment in the chat with Dan:

Oh, I think we can use the decompressor filter in Envoy, since I now realize that response handling happens at the router level, not the upstream level, so we can use the decompress filter for sure.
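For context, Envoy's decompressor HTTP filter would handle the gunzipping before the response reaches the ext_proc processor. A rough sketch of what such a filter chain entry might look like, using the Envoy v3 API type names; the exact placement within the AI Gateway's generated configuration is an assumption:

```yaml
http_filters:
- name: envoy.filters.http.decompressor
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.decompressor.v3.Decompressor
    decompressor_library:
      name: gzip
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.compression.gzip.decompressor.v3.Gzip
```

With this in place, the processor would see plaintext SSE bytes and the gzip buffering in the translator would become unnecessary, which is the long-term direction discussed below.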

@VarSuren
Contributor (Author)

VarSuren commented Oct 2, 2025

Let's be clear: having buffering and gzip handling in the current code doesn't look right to me either. If compression/decompression filters work, I'd go that way, but we already have some gzip logic and I don't want to mess with the other processors. Additionally, there is time pressure to get this done; several teams are now blocked by this.

@VarSuren
Contributor (Author)

VarSuren commented Oct 2, 2025

> Sorry a couple of things I need to know
>
>   • Does this mean that Anthropic returns the entire giant event stream in a single gzipped response body containing potentially hundreds of events? How is that acceptable when it is supposed to return "streaming" response?
>   • If the above is true, how the Anthropic SDK is supposed to parse the response body in a streaming way? I see the current code is waiting until the end of stream but if the anthropic SDK can perform the event parsing in a streaming way, then the current code looks not correct to me? What happens when we want to do the translation later into another input format (openai likely) hypothetically? shouldn't we be able to parse event-by-event instead of waiting for the end of the stream?

Anthropic doesn't send all events in one response; they arrive one by one. But you never know when you will have a complete chunk to decompress, therefore we buffer. You could skip waiting until the end of the stream, but because I didn't want to accidentally double-parse usage in intermediate chunks, I do it at the end of the stream once the chunks are over. The other way would be: once a chunk is complete (can be decompressed), you translate it immediately.
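The event-by-event alternative mentioned here could look roughly like this: once a buffered chunk decompresses cleanly, split it on the SSE blank-line delimiter and translate each event individually. This is an illustrative sketch, not the PR's code; `splitSSE` and the sample events are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// splitSSE is a bufio.SplitFunc that yields one server-sent event per token,
// splitting on the blank line ("\n\n") that separates SSE events.
func splitSSE(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.Index(data, []byte("\n\n")); i >= 0 {
		return i + 2, data[:i], nil // complete event found
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil // final event without trailing delimiter
	}
	return 0, nil, nil // incomplete event; wait for more data
}

func main() {
	// Decompressed bytes from one or more buffered chunks (illustrative payload).
	decompressed := []byte("event: message_start\ndata: {\"usage\":{\"input_tokens\":10}}\n\n" +
		"event: message_delta\ndata: {\"usage\":{\"output_tokens\":42}}\n\n")

	sc := bufio.NewScanner(bytes.NewReader(decompressed))
	sc.Split(splitSSE)
	for sc.Scan() {
		// Each token is one SSE event that could be translated independently.
		fmt.Printf("event: %q\n", sc.Text())
	}
}
```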

@yuzisun
Contributor

yuzisun commented Oct 2, 2025

Synced with @mathetake offline, will get this fix in first and then explore a proper long term fix using the decompression filter. @VarSuren can you help create an issue to track that?

@VarSuren
Contributor (Author)

VarSuren commented Oct 2, 2025

> Synced with @mathetake offline, will get this fix in first and then explore a proper long term fix using the decompression filter. @VarSuren can you help create an issue to track that?

Sure, do we want an internal ticket or an open-source one?

@yuzisun
Contributor

yuzisun commented Oct 2, 2025

> > Synced with @mathetake offline, will get this fix in first and then explore a proper long term fix using the decompression filter. @VarSuren can you help create an issue to track that?
>
> Sure, do we want an internal ticket or an open-source one?

issues here https://github.com/envoyproxy/ai-gateway/issues
