This stack deploys an OKE cluster with two nodepools:
- one nodepool with flexible shapes
- one nodepool with GPU shapes
It also deploys several supporting applications using Helm:
- nginx
- cert-manager
- Qdrant vector DB
- JupyterHub
The goal of the stack is to deploy vLLM for LLM inferencing in OKE.
Note: For Helm deployments it is necessary to create a bastion and an operator host (with the associated policy for the operator to manage the cluster), or to configure the cluster with a public API endpoint.
If the bastion and operator hosts are not created, it is a prerequisite to have the following tools already installed and configured (a quick check script follows the list):
- bash
- helm
- jq
- kubectl
- oci-cli
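A minimal sketch for verifying the tooling; it only checks that each binary is on the PATH (note the OCI CLI binary is named `oci`):

```bash
#!/usr/bin/env bash
# Fail fast if any prerequisite tool is missing from the PATH.
for tool in bash helm jq kubectl oci; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "Missing required tool: $tool" >&2
    exit 1
  fi
done
echo "All prerequisite tools are available."
```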
Nginx is deployed and configured as the default ingress controller.
Cert-manager is deployed to handle the configuration of TLS certificates for the configured ingress resources. Currently it uses the staging Let's Encrypt endpoint.
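To verify that cert-manager issued the certificates (staging certificates are issued normally but are not trusted by browsers), you can inspect the Certificate resources:

```bash
# READY=True means cert-manager successfully issued the certificate.
kubectl get certificates -A
```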
JupyterHub will be accessible at: https://jupyter.a.b.c.d.nip.io, where a.b.c.d is the public IP address of the load balancer associated with the NGINX ingress controller (see the sketch below for retrieving it).
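One way to retrieve that IP is to query the ingress controller's LoadBalancer Service; a minimal sketch, assuming the Service carries the standard ingress-nginx chart labels (adjust the selector and namespace to your deployment):

```bash
# Print the external IP assigned to the NGINX ingress controller's Service.
kubectl get svc -A \
  -l app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/component=controller \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'
```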
JupyterHub uses a dummy authentication scheme (user/password), and access is secured using the variables:
- jupyter_admin_user
- jupyter_admin_password

It also supports automatically cloning a git repository when a user connects, making it available under the examples directory.
If you are looking to integrate JupyterHub with an Identity Provider, please take a look at the options available here: https://oauthenticator.readthedocs.io/en/latest/tutorials/provider-specific-setup/index.html
For integration with your OCI tenancy IDCS domain, you may go through the following steps:
- Setup a new Application in IDCS
  - Navigate to the following address: https://cloud.oracle.com/identity/domains/
  - Click on the OracleIdentityCloudService domain
  - Navigate to Integrated applications from the left-side menu
  - Click Add application
  - Select Confidential Application and click Launch workflow
- Application configuration
  - Under Add application details configure name: Jupyterhub (all the other fields are optional, you may leave them empty)
  - Under Configure OAuth:
    - Resource server configuration -> Skip for later
    - Client configuration -> Configure this application as a client now
    - Authorization: check the Authorization code check-box and leave the other check-boxes unchecked
    - Redirect URL: https://<jupyterhub-domain>/hub/oauth_callback
  - Under Configure policy: Web tier policy -> Skip for later
  - Click Finish
  - Scroll down to the General Information section
  - Copy the Client ID and Client secret
  - Click the Activate button at the top
- Connect to the OKE cluster and update the JupyterHub Helm deployment values.
- Create a file named `oauth2-values.yaml` with the following content (make sure to fill in the values relevant for your setup):

```yaml
hub:
  config:
    Authenticator:
      allow_all: true
    GenericOAuthenticator:
      client_id: <client-id>
      client_secret: <client-secret>
      authorize_url: <idcs-stripe-url>/oauth2/v1/authorize
      token_url: <idcs-stripe-url>/oauth2/v1/token
      userdata_url: <idcs-stripe-url>/oauth2/v1/userinfo
      scope:
        - openid
        - email
      username_claim: "email"
    JupyterHub:
      authenticator_class: generic-oauth
```
Note: IDCS stripe URL can be fetched from the OracleIdentityCloudService IDCS Domain Overview -> Domain Information -> Domain URL.
It should look something like this:
https://idcs-18bb6a27b33d416fb083d27a9bcede3b.identity.oraclecloud.com
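Alternatively, the domain URL can be listed with the OCI CLI; a sketch, assuming your CLI version supports identity domains and the domain lives in the tenancy's root compartment:

```bash
# List identity domains and their URLs (the tenancy OCID is the root compartment).
oci iam domain list \
  --compartment-id <tenancy-ocid> \
  --query 'data[].{name:"display-name",url:"url"}' \
  --output table
```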
- Execute the following command to update the JupyterHub Helm deployment:

```bash
helm upgrade jupyterhub jupyterhub --repo https://hub.jupyter.org/helm-chart/ --reuse-values -f oauth2-values.yaml
```
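After the upgrade you can confirm the hub restarted cleanly; a minimal check, assuming the chart's default `hub` deployment name and the `jupyterhub` namespace (adjust to your setup):

```bash
# Wait for the hub deployment rollout to complete after the values change.
kubectl rollout status deployment/hub -n jupyterhub
```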
The LLM is fetched from HuggingFace and deployed using vLLM.
Parameters:
- `HF_TOKEN` - required to pull the model from HuggingFace.
- `model` - the name of the LLM you intend to pull from HuggingFace. Make sure to accept the license for the model you intend to pull.
- `max_model_len` - override the default maximum context length. It may be required on shapes without enough GPU memory available.
- `LLM_API_KEY` - used to secure the endpoint exposed by vLLM for inferencing.
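Since vLLM exposes an OpenAI-compatible API, the secured endpoint can be exercised with a plain HTTP request; a sketch, where `<vllm-endpoint>` is a placeholder for however the service is exposed in your cluster:

```bash
# Call the vLLM OpenAI-compatible chat completions endpoint.
curl https://<vllm-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer <LLM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```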
- Deploy via ORM
- Create a new stack
- Upload the TF configuration files
- Configure the variables
- Apply
- Local deployment
- Create a file called `terraform.auto.tfvars` with the required values:
```hcl
# ORM injected values
region           = "us-ashburn-1"
tenancy_ocid     = "ocid1.tenancy.oc1..aaaaaaaaiyavtwbz4kyu7g7b6wglllccbflmjx2lzk5nwpbme44mv54xu7dq"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaqi3if6t4n24qyabx5pjzlw6xovcbgugcmatavjvapyq3jfb4diqq"

# OKE Terraform module values
create_iam_resources     = false
create_iam_tag_namespace = false
ssh_public_key           = "<ssh_public_key>"

## NodePool with non-GPU shape is created by default with size 1
simple_np_flex_shape = { "instanceShape" = "VM.Standard.E4.Flex", "ocpus" = 2, "memory" = 16 }

## NodePool with GPU shape is created by default with size 0
gpu_np_size  = 1
gpu_np_shape = "VM.GPU.A10.1"

## OKE Deployment values
cluster_name   = "oke"
vcn_name       = "oke-vcn"
compartment_id = "ocid1.compartment.oc1..aaaaaaaaqi3if6t4n24qyabx5pjzlw6xovcbgugcmatavjvapyq3jfb4diqq"

# Jupyter Hub deployment values
jupyter_admin_user     = "oracle-ai"
jupyter_admin_password = "<admin-password>"
playbooks_repo         = "https://github.com/robo-cap/llm-jupyter-notebooks.git"

# vLLM Deployment values
HF_TOKEN = "<my-HuggingFace-token>"
model    = "meta-llama/Meta-Llama-3-8B-Instruct"
```
- Execute the commands:

```bash
terraform init
terraform plan
terraform apply
```
If `terraform destroy` fails, manually remove the LoadBalancer resource configured for the NGINX Ingress Controller (see the sketch below).
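One way to unblock the destroy is to delete the ingress controller's Service first, which releases the underlying OCI load balancer; a sketch, assuming the standard ingress-nginx labels (adjust the namespace and selector to your deployment):

```bash
# Deleting the LoadBalancer-type Service releases the OCI load balancer it created.
kubectl delete service -n <ingress-namespace> \
  -l app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/component=controller
```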
After `terraform destroy`, the block volumes corresponding to the PVCs used by the applications in the cluster won't be removed. You have to remove them manually.
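The leftover volumes can be located with the OCI CLI before removing them from the console or CLI; a sketch, assuming they remained in the stack's compartment:

```bash
# List block volumes still present in the compartment after the destroy.
oci bv volume list \
  --compartment-id <compartment-ocid> \
  --lifecycle-state AVAILABLE \
  --output table
```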