This stack deploys an OKE cluster with two nodepools:
- one nodepool with flexible shapes
- one nodepool with GPU shapes
It also deploys several supporting applications using Helm:
- nginx
- cert-manager
- Qdrant vector DB
- JupyterHub
The goal of the stack is to deploy vLLM for LLM inferencing in OKE.
Note: For Helm deployments it is necessary to create a bastion and an operator host (with the associated policy for the operator to manage the cluster), or to configure the cluster with a public API endpoint.
If the bastion and operator hosts are not created, it is a prerequisite to have the following tools already installed and configured (a quick check script follows the list):
- bash
- helm
- jq
- kubectl
- oci-cli
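A minimal sketch for verifying the tooling; it only checks that each binary is on the PATH (note the OCI CLI binary is named `oci`):

```bash
#!/usr/bin/env bash
# Fail fast if any prerequisite tool is missing from the PATH.
for tool in bash helm jq kubectl oci; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "Missing required tool: $tool" >&2
    exit 1
  fi
done
echo "All prerequisite tools are available."
```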
Nginx is deployed and configured as the default ingress controller.
Cert-manager is deployed to handle the configuration of TLS certificates for the configured ingress resources. Currently it uses the staging Let's Encrypt endpoint.
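To verify that cert-manager issued the certificates (staging certificates are issued normally but are not trusted by browsers), you can inspect the Certificate resources:

```bash
# READY=True means cert-manager successfully issued the certificate.
kubectl get certificates -A
```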
JupyterHub will be accessible at: https://jupyter.a.b.c.d.nip.io, where a.b.c.d is the public IP address of the load balancer associated with the NGINX ingress controller (see the sketch below for retrieving it).
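One way to retrieve that IP is to query the ingress controller's LoadBalancer Service; a minimal sketch, assuming the Service carries the standard ingress-nginx chart labels (adjust the selector and namespace to your deployment):

```bash
# Print the external IP assigned to the NGINX ingress controller's Service.
kubectl get svc -A \
  -l app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/component=controller \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'
```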
JupyterHub uses a dummy authentication scheme (user/password), and access is secured using the variables:
- jupyter_admin_user
- jupyter_admin_password

It also supports automatically cloning a git repository when a user connects, making it available under the examples directory.
If you are looking to integrate JupyterHub with an Identity Provider, please take a look at the options available here: https://oauthenticator.readthedocs.io/en/latest/tutorials/provider-specific-setup/index.html
For integration with your OCI tenancy IDCS domain, you may go through the following steps:
- Setup a new Application in IDCS
  - Navigate to the following address: https://cloud.oracle.com/identity/domains/
  - Click on the OracleIdentityCloudService domain
  - Navigate to Integrated applications from the left-side menu
  - Click Add application
  - Select Confidential Application and click Launch workflow
- Application configuration
  - Under Add application details configure name: Jupyterhub (all the other fields are optional, you may leave them empty)
  - Under Configure OAuth:
    - Resource server configuration -> Skip for later
    - Client configuration -> Configure this application as a client now
    - Authorization: check the Authorization code check-box and leave the other check-boxes unchecked
    - Redirect URL: https://<jupyterhub-domain>/hub/oauth_callback
  - Under Configure policy: Web tier policy -> Skip for later
  - Click Finish
  - Scroll down to the General Information section
  - Copy the Client ID and Client secret
  - Click the Activate button at the top
- Connect to the OKE cluster and update the JupyterHub Helm deployment values.
- Create a file named `oauth2-values.yaml` with the following content (make sure to fill in the values relevant for your setup):

```yaml
hub:
  config:
    Authenticator:
      allow_all: true
    GenericOAuthenticator:
      client_id: <client-id>
      client_secret: <client-secret>
      authorize_url: <idcs-stripe-url>/oauth2/v1/authorize
      token_url: <idcs-stripe-url>/oauth2/v1/token
      userdata_url: <idcs-stripe-url>/oauth2/v1/userinfo
      scope:
        - openid
        - email
      username_claim: "email"
    JupyterHub:
      authenticator_class: generic-oauth
```
Note: IDCS stripe URL can be fetched from the OracleIdentityCloudService IDCS Domain Overview -> Domain Information -> Domain URL.
It should look something like this:
https://idcs-18bb6a27b33d416fb083d27a9bcede3b.identity.oraclecloud.com
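Alternatively, the domain URL can be listed with the OCI CLI; a sketch, assuming your CLI version supports identity domains and the domain lives in the tenancy's root compartment:

```bash
# List identity domains and their URLs (the tenancy OCID is the root compartment).
oci iam domain list \
  --compartment-id <tenancy-ocid> \
  --query 'data[].{name:"display-name",url:"url"}' \
  --output table
```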
- Execute the following command to update the JupyterHub Helm deployment:

```bash
helm upgrade jupyterhub jupyterhub --repo https://hub.jupyter.org/helm-chart/ --reuse-values -f oauth2-values.yaml
```
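After the upgrade you can confirm the hub restarted cleanly; a minimal check, assuming the chart's default `hub` deployment name and the `jupyterhub` namespace (adjust to your setup):

```bash
# Wait for the hub deployment rollout to complete after the values change.
kubectl rollout status deployment/hub -n jupyterhub
```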
The LLM is fetched from HuggingFace and deployed using vLLM.
Parameters:
- `HF_TOKEN` - required to pull the model from HuggingFace.
- `model` - the name of the LLM you intend to pull from HuggingFace. Make sure to accept the license for the model you intend to pull.
- `max_model_len` - override the default maximum context length. It may be required on shapes without enough GPU memory available.
- `LLM_API_KEY` - used to secure the endpoint exposed by vLLM for inferencing.
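Since vLLM exposes an OpenAI-compatible API, the secured endpoint can be exercised with a plain HTTP request; a sketch, where `<vllm-endpoint>` is a placeholder for however the service is exposed in your cluster:

```bash
# Call the vLLM OpenAI-compatible chat completions endpoint.
curl https://<vllm-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer <LLM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```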
- Deploy via ORM
- Create a new stack
- Upload the TF configuration files
- Configure the variables
- Apply
- Local deployment
- Create a file called `terraform.auto.tfvars` with the required values:
```hcl
# ORM injected values
region           = "us-ashburn-1"
tenancy_ocid     = "ocid1.tenancy.oc1..aaaaaaaaiyavtwbz4kyu7g7b6wglllccbflmjx2lzk5nwpbme44mv54xu7dq"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaqi3if6t4n24qyabx5pjzlw6xovcbgugcmatavjvapyq3jfb4diqq"

# OKE Terraform module values
create_iam_resources     = false
create_iam_tag_namespace = false
ssh_public_key           = "<ssh_public_key>"

## NodePool with non-GPU shape is created by default with size 1
simple_np_flex_shape = { "instanceShape" = "VM.Standard.E4.Flex", "ocpus" = 2, "memory" = 16 }

## NodePool with GPU shape is created by default with size 0
gpu_np_size  = 1
gpu_np_shape = "VM.GPU.A10.1"

## OKE Deployment values
cluster_name   = "oke"
vcn_name       = "oke-vcn"
compartment_id = "ocid1.compartment.oc1..aaaaaaaaqi3if6t4n24qyabx5pjzlw6xovcbgugcmatavjvapyq3jfb4diqq"

# Jupyter Hub deployment values
jupyter_admin_user     = "oracle-ai"
jupyter_admin_password = "<admin-password>"
playbooks_repo         = "https://github.com/robo-cap/llm-jupyter-notebooks.git"

# vLLM Deployment values
HF_TOKEN = "<my-HuggingFace-token>"
model    = "meta-llama/Meta-Llama-3-8B-Instruct"
```
- Execute the commands:

```bash
terraform init
terraform plan
terraform apply
```
If `terraform destroy` fails, manually remove the LoadBalancer resource configured for the NGINX Ingress Controller (see the sketch below).
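One way to unblock the destroy is to delete the ingress controller's Service first, which releases the underlying OCI load balancer; a sketch, assuming the standard ingress-nginx labels (adjust the namespace and selector to your deployment):

```bash
# Deleting the LoadBalancer-type Service releases the OCI load balancer it created.
kubectl delete service -n <ingress-namespace> \
  -l app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/component=controller
```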
After `terraform destroy`, the block volumes corresponding to the PVCs used by the applications in the cluster won't be removed. You have to remove them manually.
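The leftover volumes can be located with the OCI CLI before removing them from the console or CLI; a sketch, assuming they remained in the stack's compartment:

```bash
# List block volumes still present in the compartment after the destroy.
oci bv volume list \
  --compartment-id <compartment-ocid> \
  --lifecycle-state AVAILABLE \
  --output table
```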