[AQUA] Adding handler for streaming inference predict endpoint #1190
Conversation
    except Exception:
        return False


class AquaDeploymentStreamingInferenceHandler(AquaAPIhandler):
I didn't know that we use it at all
It was added before we had the kernel messaging solution for inference; since then it has been lying orphaned in the code.
Now we can use this handler to expose a streaming inference API that our client (the AQUA UI) can use for inference instead of keeping the script on the UI side.
Got it, let's use it, but with the AQUA Client.
I've added this PR to update the docs. In the UI playground it would be useful to add a checkbox where users can choose whether or not to use streaming; by default it can use the streaming endpoint.
]

@staticmethod
There is the AQUA client, which can be utilized:
https://accelerated-data-science.readthedocs.io/en/latest/user_guide/large_language_model/aqua_client.html#usage
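For reference, a minimal sketch of how a streaming call could look with that client, following the linked docs; the endpoint URL and payload values are placeholders, not taken from this PR:

```python
# Minimal sketch based on the linked AQUA client docs; the endpoint URL and
# payload values are placeholders.
from ads.aqua import Client

client = Client(
    endpoint="https://modeldeployment.<region>.oci.customer-oci.com/<MD_OCID>/predict"
)

# With stream=True the client yields response chunks as the server produces them.
for chunk in client.generate(
    prompt="Tell me a short joke.",
    payload={"model": "odsc-llm"},
    stream=True,
):
    print(chunk, end="", flush=True)
```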
I see, yeah, we can use the AQUA client instead of the model deployment client for streaming inference support. Will update. Thanks.
Updated
…ub.com/oracle/accelerated-data-science into ODSC-72334/streaming-inference-handler
@telemetry(entry_point="plugin=inference&action=get_response", name="aqua")
def get_model_deployment_response(
NIT: I think we probably don't need it at the API level; having this logic in the handler would be sufficient.
I might not have properly understood your statement, but we have business logic in this method based on the predict endpoint type (text/chat completions) and even the endpoint override feature. Do you mean adding this logic inside deployment_handler.py?
Yes, I mean: do we really need this logic in deployment.py? Will it be used outside of the UI? If not, it would probably be better to move it to the handlers.
Updated. Thanks
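For illustration, a rough sketch of what the handler-level version could look like; this is not the actual implementation, and the helper `_get_model_deployment_response` and the exact routing are hypothetical:

```python
# Illustrative sketch only; the real handler, client wiring, and error handling
# live in deployment_handler.py and may differ.
class AquaDeploymentStreamingInferenceHandler(AquaAPIhandler):
    async def post(self, model_deployment_id):
        input_data = self.get_json_body() or {}
        prompt = input_data.get("prompt")
        messages = input_data.get("messages")
        if not prompt and not messages:
            raise HTTPError(
                400, Errors.MISSING_REQUIRED_PARAMETER.format("prompt/messages")
            )

        # Pick text vs. chat completions based on the payload, then stream
        # each chunk back to the caller as it arrives.
        for chunk in self._get_model_deployment_response(
            model_deployment_id, input_data
        ):
            self.write(chunk)
            await self.flush()
        self.finish()
```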
prompt = input_data.get("prompt")
if not prompt:
    raise HTTPError(400, Errors.MISSING_REQUIRED_PARAMETER.format("prompt"))
messages = input_data.get("messages")
Can we accept chat_template as well, since we want to support it?
Yes, it will be accepted implicitly. The chat template will be optional, but a user can still provide it and the code will accept it and pass it on to the AQUA client.
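For illustration, assuming the handler simply forwards optional fields, the pass-through could look roughly like this (the names and the client call are illustrative, following the AQUA client docs):

```python
# Hypothetical pass-through: an optional field such as chat_template stays in
# the payload and is forwarded to the AQUA client as-is.
chat_template = input_data.get("chat_template")  # optional
payload = {
    k: v
    for k, v in input_data.items()
    if k not in ("prompt", "messages") and v is not None
}
response = client.chat(messages=messages, payload=payload, stream=True)
```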
Description
This PR is intended to enhance the model deployment inference experience within AQUA. With the introduction of the new predict endpoint (/predictWithResponseStream) with streaming support, AQUA can now use this native API to provide a streaming inference experience to users.
Sample payload
CURL
Unit tests