Skip to content

Latest commit

 

History

History
212 lines (151 loc) · 6.33 KB

DEVELOPMENT.md

File metadata and controls

212 lines (151 loc) · 6.33 KB

Development

Developing and testing the Gateway API Inference Extension (GIE) is done by building your Endpoint Picker (EPP) image and attaching that to a Gateway on a development cluster, with some model serving backend to route traffic to.

We provide Makefile targets and development environment deployment manifests under the deploy/environments directory, which include support for multiple kinds of clusters:

We support multiple different model serving platforms for testing:

  • VLLM
  • VLLM-Simulator

In the following sections we will cover how to use the different development environment options.

Kubernetes In Docker (KIND)

A KIND cluster can be used for basic development and testing on a local system. This environment will generally be limited to using a model serving simulator and as such is very limited compared to clusters with full model serving resources.

Setup

WARNING: This current requires you to have manually built the vllm simulator separately on your local system. In a future iteration this will be handled automatically and will not be required.

Run the following:

make environment.dev.kind

This will create a kind cluster (or re-use an existing one) using the system's local container runtime and deploy the development stack into the default namespace. Instrutions will be provided on how to access the Gateway and send requests for testing.

NOTE: If you require significant customization of this environment beyond what the standard deployment provides, you can use the deploy/components with kustomize to build your own highly customized environment. You can use the deploy/environments/kind deployment as a reference for your own.

Development Cycle

To test your changes to the GIE in this environment, make your changes locally and then run the following:

make environment.dev.kind.update

This will build images with your recent changes and load the new images to the cluster. Then a rollout the Deployments will be performed so that your recent changes are refleted.

Kubernetes

A Kubernetes (or OpenShift) cluster can be used for development and testing. There is a cluster-level infrastructure deployment that needs to be managed, and then development environments can be created on a per-namespace basis to enable sharing the cluster with multiple developers (or feel free to just use the default namespace if the cluster is private/personal).

Setup - Infrastructure

WARNING: In shared cluster situations you should probably not be running this unless you're the cluster admin and you're certain it's you that should be running this, as this can be disruptive to other developers in the cluster.

The following will deploy all the infrastructure-level requirements (e.g. CRDs, Operators, etc) to support the namespace-level development environments:

make environment.dev.kubernetes.infrastructure

Whenever the deploy/environments/dev/kubernetes-infra deployment's components are updated, this will need to be re-deployed.

Setup - Developer Environment

WARNING: This setup is currently very manual in regards to container images for the VLLM simulator and the EPP. It is expected that you build and push images for both to your own private registry. In future iterations, we will be providing automation around this to make it simpler.

To deploy a development environment to the cluster you'll need to explicitly provide a namespace. This can be default if this is your personal cluster, but on a shared cluster you should pick something unique. For example:

export NAMESPACE=annas-dev-environment

NOTE: You could also use a tool like uuidgen to come up with a unique name (e.g. anna-0d03d66c-8880-4000-88b7-22f1d430f7d0).

Create the namespace:

kubectl create namespace ${NAMESPACE}

You'll need to provide a Secret with the login credentials for your private repository (e.g. quay.io). It should look something like this:

apiVersion: v1
kind: Secret
metadata:
  name: anna-pull-secret
data:
  .dockerconfigjson: <YOUR_ENCODED_CONFIG_HERE>
type: kubernetes.io/dockerconfigjson

Apply that to your namespace:

kubectl -n ${NAMESPACE} apply -f secret.yaml

Export the name of the Secret to the environment:

export REGISTRY_SECRET=anna-pull-secret

Now you need to provide several other environment variables. You'll need to indicate the location and tag of the vllm-sim image:

export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
export VLLM_SIM_TAG="<YOUR_TAG>"

The same thing will need to be done for the EPP:

export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
export EPP_TAG="<YOUR_TAG>"

Once all this is set up, you can deploy the environment:

make environment.dev.kubernetes

This will deploy the entire stack to whatever namespace you chose. You can test by exposing the inference Gateway via port-forward:

kubectl -n ${NAMESPACE} port-forward service/inference-gateway-istio 8080:80

And making requests with curl:

curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
  -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq

Development Cycle

WARNING: This is a very manual process at the moment. We expect to make this more automated in future iterations.

Make your changes locally and commit them. Then select an image tag based on the git SHA:

export EPP_TAG=$(git rev-parse HEAD)

Build the image:

DEV_VERSION=$EPP_TAG make image-build

Tag the image for your private registry and push it:

$CONTAINER_RUNTIME tag quay.io/vllm-d/gateway-api-inference-extension/epp:$TAG \
    <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
$CONTAINER_RUNTIME push <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG

NOTE: $CONTAINER_RUNTIME can be configured or replaced with whatever your environment's standard container runtime is (e.g. podman, docker).

Then you can re-deploy the environment with the new changes (don't forget all the required env vars):

make environment.dev.kubernetes

And test the changes.