[ML] Append all data to Chat Completion buffer #127658
Conversation
Moved the Chat Completion buffer into StreamingUnifiedChatCompletionResults so that all Chat Completion responses can benefit from it. Chat Completions is meant to adhere to OpenAI as closely as possible, and OpenAI only sends one response chunk at a time, so all implementations of Chat Completions will now buffer. This fixes a bug where, when a single item contained more than two chunks, the extra chunks would be dropped; now all chunks are added to the buffer. It also fixes a bug where onComplete would omit trailing items still in the buffer.
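The buffering behavior described above can be sketched as a minimal standalone class (illustrative names only, not the actual StreamingUnifiedChatCompletionResults code): every element of an incoming item is appended to the buffer, the downstream receives one element per request, and completion flushes whatever remains so trailing items are not lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the buffering idea: append ALL chunks of each item, emit one
// chunk per downstream request, and drain the remainder on completion.
class ChunkBuffer {
    private final Deque<String> buffer = new ArrayDeque<>();
    final List<String> downstream = new ArrayList<>();

    void onNext(List<String> chunks) {
        buffer.addAll(chunks);      // append every chunk, not just the first two
        emitOne();
    }

    void request() {
        emitOne();
    }

    void onComplete() {
        while (!buffer.isEmpty()) { // flush trailing items so none are omitted
            downstream.add(buffer.poll());
        }
    }

    private void emitOne() {
        String next = buffer.poll(); // poll-and-null-check: atomic take-or-null
        if (next != null) {
            downstream.add(next);
        }
    }
}
```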
Hi @prwhelan, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
Looks good, left some questions.
        subscription.request(n);
    }
} else {
    downstream.onNext(new Results(DequeUtils.of(buffer.poll())));
Is there only 1 thread accessing the buffer? Or is there a chance that we could check for isEmpty() and then some other thread picks up the item before this thread?
There is only 1 thread calling request, but we can be safe and change to buffer.poll() and check if it's null.
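The suggested change can be sketched like this (illustrative standalone code, not the PR's actual classes): poll() removes and checks in one atomic step, so there is no window between the emptiness check and the removal, unlike the isEmpty()-then-poll() pattern.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Consumer;

class SafeDrain {
    private final ConcurrentLinkedDeque<String> buffer = new ConcurrentLinkedDeque<>();

    void offer(String item) {
        buffer.offer(item);
    }

    // Racy under concurrency: another thread may take the item between
    // isEmpty() and poll(), so poll() can still return null here.
    String drainRacy() {
        if (!buffer.isEmpty()) {
            return buffer.poll();
        }
        return null;
    }

    // Safe: poll() is an atomic take-or-null, then we null-check.
    void drainSafe(Consumer<String> downstream) {
        String item = buffer.poll();
        if (item != null) {
            downstream.accept(item);
        }
    }
}
```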
    @Override
    public void onComplete() {}
});
assertThat(counter.get(), equalTo(2));
Just to make sure I understand: does this test that we only get 1 result even if we have multiple in a single item?
Yes, because we'll only call onNext once per chunk, so if we send a chunk of 2 elements then counter will equal 1. Let me change to mockito spies so that's easier to read (I think).
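The counting behavior being discussed can be sketched as follows (illustrative names, not the actual test code): onNext is invoked once per chunk, so a single two-element chunk bumps the counter once even though both elements land in the buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: one counter increment per onNext call (per chunk), while every
// element of the chunk is still appended to the buffer.
class OnNextCounter {
    final AtomicInteger counter = new AtomicInteger();
    final Deque<String> buffer = new ArrayDeque<>();

    void onNext(List<String> chunk) {
        counter.incrementAndGet(); // counted once per chunk...
        buffer.addAll(chunk);      // ...but all elements are buffered
    }
}
```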
💔 Backport failed
You can use sqren/backport to manually backport by running