[AQUA] Adding handler for streaming inference predict endpoint #1190

kumar-shivam-ranjan · 2025-05-20T13:42:59Z

Description

This PR is intended to enhance the Model deployment inference experience within AQUA. With the introduction of new predict endpoint (/predictWithResponseStream) with streaming support , AQUA can now use this native API to provide streaming inference experience to users.

Sample payload

CURL

curl --location 'http://localhost:8888/aqua/inference/stream/<MD-OCID>' \
--header 'Content-Type: application/json' \
--data '{
    "max_tokens": 1024,
    "temperature": 0.5,
    "prompt": "what are some good skills deep learning expert. Give us some tips on how to structure interview with some coding example?",
    "top_p": 0.4,
    "top_k": 100,
    "model": "odsc-llm",
    "frequency_penalty": 1,
    "presence_penalty": 1,
    "stream": true
}'

Unit tests

github-actions · 2025-05-20T14:14:36Z

📌 Cov diff with main:

📌 Overall coverage:

github-actions · 2025-05-20T15:27:40Z

📌 Cov diff with main:

📌 Overall coverage:

mrDzurb · 2025-05-20T15:30:30Z

ads/aqua/extension/deployment_handler.py

-        except Exception:
-            return False
-
+class AquaDeploymentStreamingInferenceHandler(AquaAPIhandler):


I didn't know that we use it at all

It was added before we had kernel messaging solution for inference. Since then it has been lying orphaned in code.
Now we can use this handler to expose streaming inference API which our client (AQUA UI) can use for inference instead of keeping the script at UI.

Got it, let's use it but with AQUA Client.
I've added this PR to update the docs. I think In UI playground it would be useful to add a checkbox where users can choose if they want to use streaming or not. By default it can use streaming end-point.

mrDzurb · 2025-05-20T15:34:02Z

ads/aqua/modeldeployment/deployment.py

+        ]
+
+    @staticmethod


There is AQUA client, that can be utilized.
https://accelerated-data-science.readthedocs.io/en/latest/user_guide/large_language_model/aqua_client.html#usage

I see. yeah we can use aqua client instead of model deployment client for streaming inference support. will update. Thanks

…ub.com/oracle/accelerated-data-science into ODSC-72334/streaming-inference-handler

github-actions · 2025-05-21T15:05:36Z

📌 Cov diff with main:

📌 Overall coverage:

github-actions · 2025-05-21T16:21:51Z

📌 Cov diff with main:

📌 Overall coverage:

mrDzurb · 2025-05-21T23:12:02Z

ads/aqua/modeldeployment/deployment.py

+
+    @telemetry(entry_point="plugin=inference&action=get_response", name="aqua")
+    def get_model_deployment_response(


NIT: I think we probably don't need it on the API level, having this logic in handler would be sufficient.

I might not have properly understood your statement ..
but we have business logic based on predict endpoint type like text/chat completions and even endpoint override feature in this method.

Do you mean adding this logic inside deployment_handler.py ?

Yes, I mean do we really need this logic to be placed in the deployment.py? Will it be used outside of the UI? If not, then probably would be better to move this logic to the handlers.

Updated. Thanks

VipulMascarenhas · 2025-05-22T22:41:38Z

ads/aqua/extension/deployment_handler.py

        prompt = input_data.get("prompt")
-        if not prompt:
-            raise HTTPError(400, Errors.MISSING_REQUIRED_PARAMETER.format("prompt"))
+        messages = input_data.get("messages")


can we accept chat_template as well since we want to support it?

yes it will be accepted implicitly. Chat template will be optional but still still user can provide it and the code will accept and pass it to aqua client.

github-actions · 2025-05-26T20:23:17Z

📌 Cov diff with main:

📌 Overall coverage:

github-actions · 2025-05-26T20:56:02Z

📌 Cov diff with main:

📌 Overall coverage:

github-actions · 2025-05-26T21:25:54Z

📌 Cov diff with main:

📌 Overall coverage:

Adding AQUA handler for streaming inference API

177f888

kumar-shivam-ranjan requested review from darenr, mayoor, mrDzurb, VipulMascarenhas, qiuosier and ahosler as code owners May 20, 2025 13:43

oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 20, 2025

Updating payload format

d1d942d

kumar-shivam-ranjan changed the title ~~[WIP] [AQUA] Adding handler for streaming inference predict endpoint~~ [AQUA] Adding handler for streaming inference predict endpoint May 20, 2025

Merge branch 'main' into ODSC-72334/streaming-inference-handler

8b1ee7d

mrDzurb reviewed May 20, 2025

View reviewed changes

kumar-shivam-ranjan added 2 commits May 21, 2025 20:07

Adding aqua client

e68c215

Merge branch 'ODSC-72334/streaming-inference-handler' of https://gith…

2e6195d

…ub.com/oracle/accelerated-data-science into ODSC-72334/streaming-inference-handler

kumar-shivam-ranjan added 5 commits May 21, 2025 20:44

Adding endpoint override feature

c7b7a42

Fixing UTs

0a867c7

Removing MD client

83e34f6

Removing unused code

db2cc99

Removing MD client

600cb3e

kumar-shivam-ranjan requested a review from mrDzurb May 21, 2025 16:01

mrDzurb previously approved these changes May 21, 2025

View reviewed changes

VipulMascarenhas previously approved these changes May 22, 2025

View reviewed changes

Addressing review comments

934ddbe

kumar-shivam-ranjan dismissed stale reviews from VipulMascarenhas and mrDzurb via 934ddbe May 26, 2025 19:53

kumar-shivam-ranjan requested a review from VipulMascarenhas May 26, 2025 19:55

kumar-shivam-ranjan requested a review from mrDzurb May 26, 2025 19:55

Merge branch 'main' into ODSC-72334/streaming-inference-handler

3f0bbfa

kumar-shivam-ranjan self-assigned this May 27, 2025

mrDzurb approved these changes May 27, 2025

View reviewed changes

VipulMascarenhas approved these changes May 28, 2025

View reviewed changes

kumar-shivam-ranjan merged commit 33c9966 into main May 28, 2025
23 of 32 checks passed


		@telemetry(entry_point="plugin=inference&action=get_response", name="aqua")
		def get_model_deployment_response(

[AQUA] Adding handler for streaming inference predict endpoint #1190

[AQUA] Adding handler for streaming inference predict endpoint #1190

Uh oh!

Conversation

kumar-shivam-ranjan commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Sample payload

CURL

Unit tests

Uh oh!

github-actions bot commented May 20, 2025

Uh oh!

github-actions bot commented May 20, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented May 21, 2025

Uh oh!

github-actions bot commented May 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented May 26, 2025

Uh oh!

github-actions bot commented May 26, 2025

Uh oh!

github-actions bot commented May 26, 2025

Uh oh!

Uh oh!

Uh oh!

kumar-shivam-ranjan commented May 20, 2025 •

edited

Loading