Why is cache_creation_input_tokens not included in prompt_tokens? #14890
anthony-liner asked this question in Q&A (unanswered)
When using `"cache_control": {"type": "ephemeral"}` for Claude prompt caching, as described in the litellm docs, the `cache_creation_input_tokens` value reported for a cache write does not appear to be included in `prompt_tokens` on the response `Usage` object. Is this intended behavior? It currently leads to undercounting of input (prompt) tokens in downstream LLM tracing tools such as Langfuse.
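For reference, here is a minimal sketch (not from the original post) of how the observation can be reproduced: send a long system prompt marked with `cache_control` through litellm and inspect the usage fields on the first call, which triggers a cache write. The model name, prompt text, and the exact attribute names on the `Usage` object are assumptions and may need adjusting to your setup.

```python
# Sketch: reproduce the cache-write usage accounting described above.
# Assumes ANTHROPIC_API_KEY is set and the prompt exceeds the caching minimum.
import litellm

long_system_prompt = "You are a helpful assistant. " * 500

response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20241022",  # assumed model name
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": long_system_prompt,
                    # Marks this block for Anthropic prompt caching (cache write on first call)
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Say hello."},
    ],
)

usage = response.usage
print("prompt_tokens:", usage.prompt_tokens)
# Per the report, this is non-zero on a cache write but is not
# counted into prompt_tokens above. Attribute names are assumptions,
# so fall back to None if they are surfaced differently.
print("cache_creation_input_tokens:", getattr(usage, "cache_creation_input_tokens", None))
print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", None))
```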