Incorrect token count (usage_metadata) in streaming mode #30429
Hello @andrePankraz,

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI()
prompt = "write the recipe of tiramisu"
response = model.stream(prompt, stream_usage=True)
for s in response:
    print(s)
```

I get the expected usage_metadata at the end of the stream.
Can you please share your code?
I'm facing the same issue. In my case, I don't get the usage_metadata. Note: I have it enabled. Code:

```ts
import fs from "fs";

const stream = await this.graph?.stream(graphInput, {
  configurable: this.config?.configurable,
  streamMode: "messages",
});

for await (const [msg] of stream!) {
  // Write each streamed message to a JSON log file
  const logPath = `./logs/stream_${this.config?.configurable.thread_id}.json`;
  const logData = {
    timestamp: new Date().toISOString(),
    message: msg,
  };
  fs.mkdirSync("./logs", { recursive: true });
  let existingData = [];
  if (fs.existsSync(logPath)) {
    existingData = JSON.parse(fs.readFileSync(logPath, "utf8"));
  }
  existingData.push(logData);
  fs.writeFileSync(logPath, JSON.stringify(existingData, null, 2));
}
```
@Yogesh-Dubey-Ayesavi could you open a separate issue in https://github.com/langchain-ai/langchainjs?
Yes @ccurme, sure.
Hi, thank you for looking into this. It only happens in astream mode with a callback handler (which many chatbots use for step tracking), embedded into bigger agent graph structures. The final usage_metadata is wrong:

The problematic parts are called as described in the original error description. But the wrong count only appears in the final on_llm_end callback (with the aggregated generation message); otherwise astream() delivers the original usage_metadata per token (yield chunk.message).
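For illustration, a minimal sketch (assuming langchain-core's async callback API; the handler class and the print are mine, not from this thread) of where the aggregated usage surfaces:

```python
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.outputs import LLMResult


class UsageLoggingHandler(AsyncCallbackHandler):
    """Logs the aggregated usage_metadata that on_llm_end receives."""

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        for generation in response.generations[0]:
            # For chat models, each generation carries the aggregated message.
            message = getattr(generation, "message", None)
            if message is not None and message.usage_metadata:
                # With per-chunk input_tokens, this total is inflated to
                # roughly input_tokens * number_of_chunks (see below).
                print("final usage_metadata:", message.usage_metadata)
```

Passing callbacks=[UsageLoggingHandler()] into the astream() call would print the inflated totals described above.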
Hello @andrePankraz,
Note that to get this usage metadata I had to pass `stream_usage=True`. Maybe it is due to the LLM then; what model are you using?
I have passed `stream_usage=True`. But I see the problem with your log; here is mine:

As you can see, my OpenAI-compatible (!) API gives usage_metadata per token (!) with `stream_usage=True`. I use the vLLM Inference Server with the OpenAI API. In the end, the usage_metadata aggregation code in langchain_core/messages/ai.py doesn't make any sense at all: you cannot aggregate input_tokens and total_tokens like this in a stream. It kind of works for output_tokens, but even then the final message also doubles the expected output_tokens number with this addition. It just happens to work for OpenAI's original API, because they forward usage_metadata only once, in a "final output token" (more a synthetic technical final message than a real token; the stop token comes earlier in the stream).
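To make that concrete, a hedged sketch (plain Python; merge_usage is my own hypothetical helper, not langchain's code) of a merge rule that would handle per-chunk usage sanely, treating the repeated input_tokens as a snapshot and only summing the incremental output_tokens:

```python
def merge_usage(left: dict, right: dict) -> dict:
    """Merge two per-chunk usage dicts from a stream.

    input_tokens is repeated on every chunk, so taking the max keeps the
    prompt size constant; output_tokens arrive one per chunk, so they sum.
    """
    input_tokens = max(left["input_tokens"], right["input_tokens"])
    output_tokens = left["output_tokens"] + right["output_tokens"]
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }
```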
Ahh ok, I see. Yes, you are right.
Checked other resources
Example Code
Any LLM call with streaming.
The aggregated token usage is totally wrong and much too high.
See this method:
langchain/libs/core/langchain_core/messages/ai.py
Line 406 in b75573e
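Paraphrased from this description, the criticized aggregation behaves roughly like the following sketch (my own paraphrase, not the verbatim ai.py source):

```python
def add_usage_naive(left: dict, right: dict) -> dict:
    # Each field from consecutive chunks is simply summed, which is only
    # valid when at most one chunk in the stream carries usage at all.
    return {
        key: left.get(key, 0) + right.get(key, 0)
        for key in ("input_tokens", "output_tokens", "total_tokens")
    }
```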
For streaming we get usage_metadata for each token, e.g.:

```
'input_tokens' = 713
'output_tokens' = 1
'total_tokens' = 714
```
output_tokens is always 1 and adds up nicely.
input_tokens is always 713 for each chunk of the LLM token stream, so it adds up to input_tokens * count(tokens) (and total_tokens does the same, starting from 714).
This just adds the counts up to huge (totally useless) numbers.
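A toy illustration (plain Python, using the numbers from this report) of what the naive summation produces for a 100-token completion:

```python
# Every streamed chunk repeats the full prompt size (713) and adds one
# output token, as described above.
chunks = [
    {"input_tokens": 713, "output_tokens": 1, "total_tokens": 714}
    for _ in range(100)
]

total = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
for usage in chunks:
    for key in total:
        total[key] += usage[key]  # naive per-field addition

print(total)
# {'input_tokens': 71300, 'output_tokens': 100, 'total_tokens': 71400}
# output_tokens is correct; input_tokens should still be 713, not 71300.
```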
What is the strategy here? Should the LLM not report per-token usage metadata and only report it in the final chunk? Then langchain-openai has to change this for that call:
langchain/libs/partners/openai/langchain_openai/chat_models/base.py
Line 2805 in b75573e
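A hedged sketch (the helper name and shape are hypothetical, not langchain-openai's actual code) of the suggested behavior: pick up usage only from the final, synthetic usage chunk instead of summing it from every streamed chunk:

```python
def usage_for_chunk(raw_chunk: dict) -> dict | None:
    """Return usage only for the final usage-bearing chunk.

    OpenAI's streaming API sends the usage statistics on a last chunk with
    an empty choices list; per-token usage from other servers (e.g. vLLM)
    would be skipped here instead of being summed chunk by chunk.
    """
    if raw_chunk.get("choices"):
        return None  # a regular content chunk: skip its usage, if any
    return raw_chunk.get("usage")
```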
Error Message and Stack Trace (if applicable)
No response
Description
System Info
totally not relevant