
Changing response model results in Anthropic cache miss #1349

Open · ameade opened this issue Feb 15, 2025 · 0 comments
Labels: bug (Something isn't working)

ameade (Contributor) commented Feb 15, 2025

- [x] This is actually a bug report.
- [ ] I am not getting good LLM results.
- [ ] I have tried asking for help in the community on Discord or in Discussions and have not received a response.
- [ ] I have tried searching the documentation and have not found an answer.

What model are you using?

- [ ] gpt-3.5-turbo
- [ ] gpt-4-turbo
- [ ] gpt-4
- [x] Other (please specify): sonnet3.5

Describe the bug
It looks like Anthropic prompt caching always results in a cache miss when changing between response models.
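
My guess at the cause (an assumption on my part, not verified against the instructor internals): the response model is serialized into the tool/schema portion of the request, and Anthropic's cached prefix covers tools before the system and user messages, so any schema change invalidates the cache even when the messages themselves are identical. The two placeholder Pydantic models below (`Summary` and `ActionItems` are names I made up for illustration) show how different response models yield different JSON schemas, and therefore different request prefixes:

```python
from pydantic import BaseModel


class Summary(BaseModel):        # placeholder response model A
    summary: str


class ActionItems(BaseModel):    # placeholder response model B
    items: list[str]


# The JSON schema for each model differs, so any tool definition derived
# from it differs too -- the cached prefix can never match across the two
# requests if the schema is part of that prefix.
print(Summary.model_json_schema())
print(ActionItems.model_json_schema())
assert Summary.model_json_schema() != ActionItems.model_json_schema()
```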

Examples from my logs, as (cache_write, cache_read, input_tokens, output_tokens):

Different response models:

```
cache_write  cache_read  input_tokens  output_tokens
9246         0           2229          961
0            9246        2281          763
0            9246        2248          851
0            9246        1414          772
-- change response model, different prompt --
9046         0           2482          1235
-- change response model, different prompt --
8274         0           1087          477
```

Same response model:

```
cache_write  cache_read  input_tokens  output_tokens
9295         0           2233          1152
0            9295        2285          1027
0            9295        2252          935
0            9295        1418          1008
-- same response model, different prompt --
0            9083        2642          1652
-- same response model, different prompt --
0            9083        1482          1131
```

To Reproduce
Steps to reproduce the behavior:

1. Make two requests with different response models that share the same first part of their prompts:

```python
messages = [
    {
        "role": "system",
        "content": self.base_system_prompt.format(language=transcript_language),
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": f"""<transcript language="{transcript_language}"> {transcript_text} </transcript>""",
                "cache_control": {"type": "ephemeral"},
            },
            # ...
```

2. Make the requests and notice that the cache always misses (see the self-contained sketch after these steps).
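
For completeness, a minimal self-contained sketch of the repro. The model classes, the prompt strings, and the `claude-3-5-sonnet-20240620` model id are placeholders I picked for illustration; `instructor.from_anthropic` and the `cache_control` content block are used as in the instructor/Anthropic docs:

```python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(Anthropic())

# Placeholder; the cached block must exceed Anthropic's minimum
# cacheable length (~1024 tokens on Sonnet) to be written at all.
LONG_TRANSCRIPT = "<transcript> ... </transcript>"


class Summary(BaseModel):        # placeholder response model A
    summary: str


class ActionItems(BaseModel):    # placeholder response model B
    items: list[str]


def ask(response_model: type[BaseModel], question: str):
    # Same cached prefix (the transcript block) on every call; only the
    # trailing question and the response_model change between calls.
    return client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": LONG_TRANSCRIPT,
                        "cache_control": {"type": "ephemeral"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
        response_model=response_model,
    )


ask(Summary, "Summarize the transcript.")    # cache write
ask(Summary, "Summarize it again.")          # cache read, as expected
ask(ActionItems, "List the action items.")   # cache miss, despite the identical prefix
```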

Expected behavior
I expect the first part of the messages sent to Anthropic not to change based on the response model, so the cached prefix should still match.
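
To observe the cache counters directly, one option (assuming instructor's `create_with_completion`, which returns the parsed model together with the raw completion, is exposed on the patched Anthropic client) is to read the usage fields off the raw message, continuing the sketch above:

```python
# Assumption: create_with_completion returns (parsed_model, raw_message);
# Anthropic's Usage object carries the cache counters when caching is active.
parsed, raw = client.messages.create_with_completion(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": LONG_TRANSCRIPT,
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": "Summarize the transcript."},
            ],
        }
    ],
    response_model=Summary,
)
print(raw.usage.cache_creation_input_tokens, raw.usage.cache_read_input_tokens)
```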

github-actions bot added the bug label on Feb 15, 2025