Developing and testing the Gateway API Inference Extension (GIE) is done by
building your Endpoint Picker (EPP) image, attaching it to a `Gateway` on a
development cluster, and routing traffic to some model serving backend.

We provide `Makefile` targets and development environment deployment manifests
under the `deploy/environments` directory, which include support for multiple
kinds of clusters:

- Kind clusters for local development and testing
- Kubernetes (or OpenShift) clusters

We support multiple model serving platforms for testing:

- VLLM
- VLLM-Simulator

In the following sections we will cover how to use the different development
environment options.

A Kind cluster can be used for basic development and testing on a local system.
This environment is generally limited to using a model serving simulator, and
as such is quite constrained compared to clusters with full model serving
resources.

WARNING: This currently requires you to have manually built the vllm simulator
separately on your local system. In a future iteration this will be handled
automatically and will not be required.

Run the following:

```bash
make environment.dev.kind
```

This will create a `kind` cluster (or re-use an existing one) using the
system's local container runtime and deploy the development stack into the
`default` namespace. Instructions will be provided on how to access the
`Gateway` and send requests for testing.
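
For reference, access typically amounts to port-forwarding the `Gateway`'s
`Service` and sending a request through it. The `Service` name below is an
assumption (it varies by Gateway implementation), so use whatever the
instructions printed by the make target report:

```bash
# Forward the Gateway Service to localhost (Service name is an assumption;
# use the one reported by the make target's instructions).
kubectl -n default port-forward service/inference-gateway-istio 8080:80

# In a second terminal, send a test completion request through the Gateway.
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
  -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```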

NOTE: If you require significant customization of this environment beyond what
the standard deployment provides, you can use the `deploy/components`
directory with `kustomize` to build your own highly customized environment.
You can use the `deploy/environments/kind` deployment as a reference for your
own.
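
As a rough sketch of how such an overlay might be wired up (the component
path and overlay location below are assumptions, not a prescribed layout), you
could create your own kustomization and apply it with `kubectl apply -k`:

```bash
# Create a custom overlay that reuses the shared components
# (the referenced component path is an assumption).
mkdir -p deploy/environments/dev/my-custom-kind
cat <<'EOF' > deploy/environments/dev/my-custom-kind/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Reference whichever components you need, then layer your own patches on top.
  - ../../../components/<SOME_COMPONENT>
EOF

# Build and apply the overlay to your kind cluster.
kubectl apply -k deploy/environments/dev/my-custom-kind
```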

To test your changes to the GIE in this environment, make your changes locally
and then run the following:

```bash
make environment.dev.kind.update
```

This will build images with your recent changes and load the new images into
the cluster. Then a rollout of the `Deployments` will be performed so that
your recent changes are reflected.
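
If you want to confirm the rollout completed, a simple check (assuming the
stack was deployed to the `default` namespace as above) is to watch the pods
come back up on the new images:

```bash
# Watch pods in the default namespace until the new replicas are Running.
kubectl -n default get pods --watch
```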

A Kubernetes (or OpenShift) cluster can be used for development and testing.
There is a cluster-level infrastructure deployment that needs to be managed,
and then development environments can be created on a per-namespace basis to
enable sharing the cluster with multiple developers (or feel free to just use
the `default` namespace if the cluster is private/personal).

WARNING: In shared cluster situations you should probably not run this unless
you are the cluster admin and are certain you should be the one running it, as
it can be disruptive to other developers in the cluster.

The following will deploy all the infrastructure-level requirements (e.g. CRDs,
Operators, etc.) to support the namespace-level development environments:

```bash
make environment.dev.kubernetes.infrastructure
```

Whenever the `deploy/environments/dev/kubernetes-infra` deployment's components
are updated, this will need to be re-deployed.
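
A quick way to sanity-check that the infrastructure pieces landed (assuming
this deployment installs the standard Gateway API and inference extension
CRDs) is to list the relevant CRDs:

```bash
# Verify the Gateway API and inference extension CRDs were installed.
kubectl get crd | grep -E 'gateway|inference'
```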

WARNING: This setup is currently very manual with regard to container images
for the vllm simulator and the EPP. You are expected to build and push images
for both to your own private registry. In future iterations, we will provide
automation around this to make it simpler.

To deploy a development environment to the cluster you'll need to explicitly
provide a namespace. This can be `default` if this is your personal cluster,
but on a shared cluster you should pick something unique. For example:

```bash
export NAMESPACE=annas-dev-environment
```

NOTE: You could also use a tool like `uuidgen` to come up with a unique name
(e.g. `anna-0d03d66c-8880-4000-88b7-22f1d430f7d0`).
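
For example, a one-liner along those lines (the `anna-` prefix is just
illustrative) could be:

```bash
# Generate a unique, lowercase namespace name with a personal prefix.
export NAMESPACE="anna-$(uuidgen | tr '[:upper:]' '[:lower:]')"
```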

Create the namespace:

```bash
kubectl create namespace ${NAMESPACE}
```

You'll need to provide a `Secret` with the login credentials for your private
repository (e.g. quay.io). It should look something like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: anna-pull-secret
data:
  .dockerconfigjson: <YOUR_ENCODED_CONFIG_HERE>
type: kubernetes.io/dockerconfigjson
```

Apply that to your namespace:

```bash
kubectl -n ${NAMESPACE} apply -f secret.yaml
```
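
Alternatively, if you'd rather not hand-write the manifest, `kubectl` can
generate an equivalent `Secret` for you (the registry server and credentials
below are placeholders):

```bash
# Create the pull secret directly from your registry credentials.
kubectl -n ${NAMESPACE} create secret docker-registry anna-pull-secret \
  --docker-server=quay.io \
  --docker-username=<YOUR_USERNAME> \
  --docker-password=<YOUR_PASSWORD>
```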

Export the name of the `Secret` to the environment:

```bash
export REGISTRY_SECRET=anna-pull-secret
```

Now you need to provide several other environment variables. You'll need to
indicate the location and tag of the `vllm-sim` image:

```bash
export VLLM_SIM_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
export VLLM_SIM_TAG="<YOUR_TAG>"
```

The same thing will need to be done for the EPP:

```bash
export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
export EPP_TAG="<YOUR_TAG>"
```
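
For illustration only, if both images live in your own quay.io account the
values might look something like this (the repository names and tags are
assumptions):

```bash
# Illustrative values only; substitute your own registry, repositories, and tags.
export VLLM_SIM_IMAGE="quay.io/<your-username>/vllm-sim"
export VLLM_SIM_TAG="0.0.1"
export EPP_IMAGE="quay.io/<your-username>/epp"
export EPP_TAG="0.0.1"
```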

Once all this is set up, you can deploy the environment:

```bash
make environment.dev.kubernetes
```

This will deploy the entire stack to whatever namespace you chose. You can test
by exposing the inference `Gateway` via port-forward:

```bash
kubectl -n ${NAMESPACE} port-forward service/inference-gateway-istio 8080:80
```

And making requests with `curl`:

```bash
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
  -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

WARNING: This is a very manual process at the moment. We expect to make this
more automated in future iterations.

Make your changes locally and commit them. Then select an image tag based on
the `git` SHA:

```bash
export EPP_TAG=$(git rev-parse HEAD)
```

Build the image:

```bash
DEV_VERSION=$EPP_TAG make image-build
```

Tag the image for your private registry and push it:

```bash
$CONTAINER_RUNTIME tag quay.io/vllm-d/gateway-api-inference-extension/epp:$EPP_TAG \
  <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
$CONTAINER_RUNTIME push <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
```

NOTE: `$CONTAINER_RUNTIME` can be configured or replaced with whatever your
environment's standard container runtime is (e.g. `podman`, `docker`).
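
As a reminder of what "all the required env vars" means below, the redeploy
expects the same variables used earlier, now pointing at the image you just
pushed (the registry and image names are your own):

```bash
# Point the EPP at the freshly pushed image before redeploying.
export EPP_IMAGE="<MY_REGISTRY>/<MY_IMAGE>"
export EPP_TAG=$(git rev-parse HEAD)

# VLLM_SIM_IMAGE, VLLM_SIM_TAG, NAMESPACE, and REGISTRY_SECRET should still be
# set from the earlier steps.
```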

Then you can re-deploy the environment with the new changes (don't forget all
the required env vars):

```bash
make environment.dev.kubernetes
```
And test the changes.