Description
- [x] This is actually a bug report.
- [ ] I am not getting good LLM Results
- [ ] I have tried asking for help in the community on discord or discussions and have not received a response.
- [ ] I have tried searching the documentation and have not found an answer.
What Model are you using?
- [ ] gpt-3.5-turbo
- [ ] gpt-4-turbo
- [ ] gpt-4
- [x] Other (please specify): Claude 3.5 Sonnet
Describe the bug
It looks like Anthropic prompt caching always results in a cache miss when changing between response models.
Examples from my logs:

Different response models:

| cache_write | cache_read | input_tokens | output_tokens | notes |
| --- | --- | --- | --- | --- |
| 9246 | 0 | 2229 | 961 | |
| 0 | 9246 | 2281 | 763 | |
| 0 | 9246 | 2248 | 851 | |
| 0 | 9246 | 1414 | 772 | |
| 9046 | 0 | 2482 | 1235 | changed response model, different prompt |
| 8274 | 0 | 1087 | 477 | changed response model again, different prompt |

Same response model:

| cache_write | cache_read | input_tokens | output_tokens | notes |
| --- | --- | --- | --- | --- |
| 9295 | 0 | 2233 | 1152 | |
| 0 | 9295 | 2285 | 1027 | |
| 0 | 9295 | 2252 | 935 | |
| 0 | 9295 | 1418 | 1008 | |
| 0 | 9083 | 2642 | 1652 | same response model, different prompt |
| 0 | 9083 | 1482 | 1131 | same response model, different prompt |
To Reproduce
- Make two requests with different response models that share the same first part of their prompt:
```python
messages = [
    {
        "role": "system",
        "content": self.base_system_prompt.format(language=transcript_language),
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": f"""<transcript language="{transcript_language}"> {transcript_text} </transcript>""",
                "cache_control": {"type": "ephemeral"},
            },
            # ...
        ],
    },
]
```
- Make the requests and notice that the request after the response model change results in a cache write rather than a cache read, i.e. changing the response model always causes a cache miss (a minimal end-to-end sketch follows below).
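A minimal end-to-end sketch of the reproduction, assuming instructor's `from_anthropic` client. The model name, prompt strings, and the two Pydantic response models below are hypothetical placeholders, not my real code, and depending on the anthropic SDK version a prompt-caching beta header may also need to be enabled:

```python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel


class Summary(BaseModel):
    """Hypothetical response model A (stand-in for my real model)."""
    summary: str


class Chapters(BaseModel):
    """Hypothetical response model B (stand-in for my real model)."""
    titles: list[str]


client = instructor.from_anthropic(Anthropic())

shared_messages = [
    # Assumed stand-ins for base_system_prompt / the transcript in the snippet above.
    {"role": "system", "content": "You analyse transcripts written in English."},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": '<transcript language="English"> ...long transcript, well over the minimum cacheable length... </transcript>',
                "cache_control": {"type": "ephemeral"},
            },
        ],
    },
]

# First request: expected (and observed) to write the cache.
client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=shared_messages,
    response_model=Summary,
)

# Second request: identical messages, only the response model differs.
# I would expect a cache read here, but the logs show another full cache write.
client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=shared_messages,
    response_model=Chapters,
)
```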
Expected behavior
I expect the first part of the messages sent to Anthropic (the system prompt and the transcript block marked with `cache_control`) not to change based on the response model, so swapping response models should still produce a cache read.
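To illustrate that expectation, a small self-contained sketch (the helper name, prompt text, and sample values below are hypothetical, not from my code): the cacheable prefix is built purely from the transcript, with no dependency on the response model, so it serializes identically across the two requests.

```python
import json


def build_cacheable_prefix(transcript_language: str, transcript_text: str) -> list[dict]:
    """Hypothetical helper mirroring the snippet above: builds the prefix
    from the transcript only, independent of any response model."""
    base_system_prompt = "You analyse transcripts written in {language}."  # assumed placeholder
    return [
        {"role": "system", "content": base_system_prompt.format(language=transcript_language)},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f'<transcript language="{transcript_language}"> {transcript_text} </transcript>',
                    "cache_control": {"type": "ephemeral"},
                },
            ],
        },
    ]


# The serialized prefix is byte-identical no matter which response_model is
# passed to instructor afterwards, so I would expect Anthropic to serve it
# from the cache on the second request.
a = json.dumps(build_cacheable_prefix("English", "example transcript"), sort_keys=True)
b = json.dumps(build_cacheable_prefix("English", "example transcript"), sort_keys=True)
assert a == b
```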