[WIP][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment #1186


Open · wants to merge 2 commits into main

@mrDzurb (Member) commented May 16, 2025

Description

The current implementation of Multi-Model Deployment in AQUA supports base models only. Fine-tuned models, however, are a critical part of many customer workflows, because they let customers adapt base models to domain-specific use cases.
This PR introduces support for deploying fine-tuned LLMs as part of a multi-model deployment group on the vLLM container.

Implementation

In the first iteration, we will treat each selected model, whether it is a base model or a fine-tuned variant, as an independent entity. Even if multiple fine-tuned models share the same base model, each one will be deployed in its own isolated vLLM instance.
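
A minimal sketch of this one-instance-per-model rule, assuming a hypothetical `ModelSpec` record and a simple sequential port assignment (neither is taken from the ADS SDK). What it illustrates is the absence of deduplication: two fine-tuned models built on the same base model still produce two separate instance plans.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    """A model selected for the group deployment (hypothetical structure)."""
    name: str                           # user-facing model name
    base_model: str                     # base model identifier or path
    adapter_path: Optional[str] = None  # LoRA weights; None for a base model


@dataclass(frozen=True)
class InstancePlan:
    """One isolated vLLM instance that serves exactly one selected model."""
    model: ModelSpec
    port: int  # hypothetical: one local port per instance


def plan_instances(models: List[ModelSpec], first_port: int = 8080) -> List[InstancePlan]:
    # First iteration: no sharing. Every selected model gets its own instance,
    # even when several fine-tuned models reference the same base model.
    return [InstancePlan(model=m, port=first_port + i) for i, m in enumerate(models)]


models = [
    ModelSpec("ft-sql", "base-a", "/adapters/ft-sql"),
    ModelSpec("ft-support", "base-a", "/adapters/ft-support"),
    ModelSpec("base-b", "base-b"),
]
plans = plan_instances(models)
for plan in plans:
    print(plan.port, plan.model.name, plan.model.base_model)
```

Keeping the plan flat like this trades GPU efficiency (the shared base weights are loaded twice) for simpler isolation and validation.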

On the SMC side, we will rely on vLLM's ability to apply LoRA adapter weights dynamically at runtime. Each vLLM instance will therefore load the base model and its corresponding fine-tuned weights independently.
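
As a rough illustration, an instance like this could be started with vLLM's OpenAI-compatible server. The sketch assumes the `vllm serve` entrypoint and its `--enable-lora`, `--lora-modules` and `--served-model-name` options; the actual AQUA/SMC container start-up logic is not part of this PR, and `ModelSpec` and `build_vllm_command` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str                           # user-facing model name (hypothetical)
    base_model: str                     # base model identifier or path
    adapter_path: Optional[str] = None  # LoRA adapter directory; None for a base model


def build_vllm_command(model: ModelSpec, port: int) -> List[str]:
    """Build the launch command for one isolated vLLM instance (sketch only)."""
    cmd = ["vllm", "serve", model.base_model, "--port", str(port)]
    if model.adapter_path is None:
        # Base model: expose it directly under its user-facing name.
        cmd += ["--served-model-name", model.name]
    else:
        # Fine-tuned model: load the base weights and register the LoRA adapter,
        # which vLLM applies when a request names the adapter.
        cmd += ["--enable-lora", "--lora-modules", f"{model.name}={model.adapter_path}"]
    return cmd


print(build_vllm_command(ModelSpec("ft-sql", "base-a", "/adapters/ft-sql"), 8080))
```

Note that a LoRA-enabled instance still answers requests addressed to the base model name itself, in addition to the adapter name.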

To avoid routing conflicts when several instances load the same base model, we will route the base model name to only one of those instances, and we will not advertise that base model as an endpoint to users (this matches the current behavior of Single Model Deployment).
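
The routing rule can be sketched as a small table builder. This is an illustration under assumptions: `Instance` and `build_routes` are hypothetical, and routes are modeled as a plain name-to-port mapping rather than whatever the deployment's real router uses.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instance(NamedTuple):
    """Hypothetical description of one running vLLM instance."""
    name: str         # user-facing model name served by this instance
    base_model: str   # base model loaded by this instance
    fine_tuned: bool  # True if the instance adds a LoRA adapter on top of base_model
    port: int


def build_routes(instances: List[Instance]) -> Tuple[Dict[str, int], List[str]]:
    """Return (model name -> port routes, names advertised to users). Sketch only."""
    routes: Dict[str, int] = {}
    advertised: List[str] = []
    # Every selected model (base or fine-tuned) routes to its own instance and is advertised.
    for inst in instances:
        routes[inst.name] = inst.port
        advertised.append(inst.name)
    # A base model name shared by fine-tuned instances is routed to the first such
    # instance only and is not advertised, unless it was itself selected above.
    for inst in instances:
        if inst.fine_tuned and inst.base_model not in routes:
            routes[inst.base_model] = inst.port
    return routes, advertised


routes, advertised = build_routes([
    Instance("ft-sql", "base-a", True, 8080),
    Instance("ft-support", "base-a", True, 8081),
    Instance("base-b", "base-b", False, 8082),
])
print(routes)      # {'ft-sql': 8080, 'ft-support': 8081, 'base-b': 8082, 'base-a': 8080}
print(advertised)  # ['ft-sql', 'ft-support', 'base-b']
```

Selected names are registered first so that a base model the user picked explicitly keeps its own instance and is never shadowed by a fine-tuned instance that happens to load the same weights.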

This configuration structure will prepare us for future enhancements, such as stacked fine-tuned deployments, where multiple fine-tuned variants are hosted under a single base model within one vLLM instance. However, this future enhancement will initially apply only to single-model deployments.
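
A stacked deployment could, for example, group adapters by base model and register all of them on one instance. The sketch below is hypothetical (`build_stacked_commands` is not an ADS SDK function) and assumes vLLM's `--lora-modules` option accepts several `name=path` pairs and that `--max-loras` limits how many adapters can be active in one batch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_stacked_commands(
    adapters: List[Tuple[str, str, str]], first_port: int = 8080
) -> List[List[str]]:
    """Group (fine_tuned_name, base_model, adapter_path) entries by base model and
    build one vLLM command per group (hypothetical helper, sketch only)."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, base_model, adapter_path in adapters:
        groups[base_model].append((name, adapter_path))

    commands = []
    for i, (base_model, members) in enumerate(groups.items()):
        cmd = ["vllm", "serve", base_model, "--port", str(first_port + i),
               "--enable-lora", "--lora-modules"]
        # Register every adapter of the group on the same base model instance.
        cmd += [f"{name}={path}" for name, path in members]
        # Allow all adapters of the group to be active in a single batch.
        cmd += ["--max-loras", str(len(members))]
        commands.append(cmd)
    return commands


for cmd in build_stacked_commands([
    ("ft-sql", "base-a", "/adapters/ft-sql"),
    ("ft-support", "base-a", "/adapters/ft-support"),
]):
    print(" ".join(cmd))
```

With a single base model, as in the initial single-model scope, the grouping simply yields one instance carrying every selected adapter.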

In a second iteration, we will explore expanding this capability to multi-model deployments, enabling grouped deployment of fine-tuned variants with shared GPU allocation. That enhancement will require additional work across the ADS SDK, AQUA UI, and validation logic.

The oracle-contributor-agreement bot added the OCA Verified label (all contributors have signed the Oracle Contributor Agreement) on May 16, 2025.
@mrDzurb requested review from elizjo and dipatidar on May 16, 2025, 21:40.

📌 Coverage diff with main: 24%

📌 Overall coverage: 58.63%
