Skip to content

[AQUA] Adding handler for streaming inference predict endpoint #1190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 28, 2025

Conversation

kumar-shivam-ranjan
Copy link
Member

@kumar-shivam-ranjan kumar-shivam-ranjan commented May 20, 2025

Description

This PR is intended to enhance the Model deployment inference experience within AQUA. With the introduction of new predict endpoint (/predictWithResponseStream) with streaming support , AQUA can now use this native API to provide streaming inference experience to users.

Sample payload

Screenshot 2025-05-20 at 8 29 19 PM

CURL

curl --location 'http://localhost:8888/aqua/inference/stream/<MD-OCID>' \
--header 'Content-Type: application/json' \
--data '{
    "max_tokens": 1024,
    "temperature": 0.5,
    "prompt": "what are some good skills deep learning expert. Give us some tips on how to structure interview with some coding example?",
    "top_p": 0.4,
    "top_k": 100,
    "model": "odsc-llm",
    "frequency_penalty": 1,
    "presence_penalty": 1,
    "stream": true
}'

Unit tests

Screenshot 2025-05-21 at 9 12 11 PM

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 20, 2025
Copy link

📌 Cov diff with main:

Coverage-8%

📌 Overall coverage:

Coverage-19.13%

@kumar-shivam-ranjan kumar-shivam-ranjan changed the title [WIP] [AQUA] Adding handler for streaming inference predict endpoint [AQUA] Adding handler for streaming inference predict endpoint May 20, 2025
Copy link

📌 Cov diff with main:

Coverage-5%

📌 Overall coverage:

Coverage-19.13%

except Exception:
return False

class AquaDeploymentStreamingInferenceHandler(AquaAPIhandler):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that we use it at all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added before we had kernel messaging solution for inference. Since then it has been lying orphaned in code.
Now we can use this handler to expose streaming inference API which our client (AQUA UI) can use for inference instead of keeping the script at UI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, let's use it but with AQUA Client.
I've added this PR to update the docs. I think In UI playground it would be useful to add a checkbox where users can choose if they want to use streaming or not. By default it can use streaming end-point.

]

@staticmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. yeah we can use aqua client instead of model deployment client for streaming inference support. will update. Thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link

📌 Cov diff with main:

Coverage-4%

📌 Overall coverage:

Coverage-19.12%

Copy link

📌 Cov diff with main:

Coverage-56%

📌 Overall coverage:

Coverage-58.62%

mrDzurb
mrDzurb previously approved these changes May 21, 2025

@telemetry(entry_point="plugin=inference&action=get_response", name="aqua")
def get_model_deployment_response(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I think we probably don't need it on the API level, having this logic in handler would be sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might not have properly understood your statement ..
but we have business logic based on predict endpoint type like text/chat completions and even endpoint override feature in this method.

Do you mean adding this logic inside deployment_handler.py ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mean do we really need this logic to be placed in the deployment.py? Will it be used outside of the UI? If not, then probably would be better to move this logic to the handlers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks

prompt = input_data.get("prompt")
if not prompt:
raise HTTPError(400, Errors.MISSING_REQUIRED_PARAMETER.format("prompt"))
messages = input_data.get("messages")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we accept chat_template as well since we want to support it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it will be accepted implicitly. Chat template will be optional but still still user can provide it and the code will accept and pass it to aqua client.

Copy link

📌 Cov diff with main:

Coverage-43%

📌 Overall coverage:

Coverage-56.85%

Copy link

📌 Cov diff with main:

Coverage-43%

📌 Overall coverage:

Coverage-56.85%

1 similar comment
Copy link

📌 Cov diff with main:

Coverage-43%

📌 Overall coverage:

Coverage-56.85%

@kumar-shivam-ranjan kumar-shivam-ranjan self-assigned this May 27, 2025
@kumar-shivam-ranjan kumar-shivam-ranjan merged commit 33c9966 into main May 28, 2025
23 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants