@@ -17,7 +17,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
    ```bash
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/vllm/deployment.yaml
    ```
 
 1. **Install the Inference Extension CRDs:**
@@ -31,22 +31,22 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
    [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/inferencemodel.yaml
    ```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
 
    Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/gateway/enable_patch_policy.yaml
    kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
    ```
    Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
 
 1. **Deploy Gateway**
 
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/gateway/gateway.yaml
    ```
    > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
 
@@ -60,14 +60,14 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 1. **Deploy the Inference Extension and InferencePool**
 
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/ext_proc.yaml
    ```
 
 1. **Deploy Envoy Gateway Custom Policies**
 
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/gateway/extension_policy.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/gateway/patch_policy.yaml
    ```
    > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
 
@@ -76,7 +76,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
 
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.1.0-rc.1/pkg/manifests/gateway/traffic_policy.yaml
    ```
 
 1. **Try it out**
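
Review note: every changed line in this diff swaps the same `main` prefix for the same tag-pinned `raw.githubusercontent.com` prefix. As a rough sketch only, the pin could live in one place so that a later tag bump touches a single line. The names `RELEASE_TAG`, `BASE_URL`, and `manifest_url` are hypothetical and do not exist in the repository or the guide. The script prints the commands rather than running them.

```bash
#!/usr/bin/env bash
# Sketch only: RELEASE_TAG, BASE_URL and manifest_url are illustrative names,
# not part of the repository or the quickstart guide.
RELEASE_TAG="v0.1.0-rc.1"
BASE_URL="https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${RELEASE_TAG}/pkg/manifests"

# Print the tag-pinned raw URL for a path relative to pkg/manifests.
manifest_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

# Print (do not run) the apply commands, in the order they appear in this diff.
for path in \
  vllm/deployment.yaml \
  inferencemodel.yaml \
  gateway/enable_patch_policy.yaml \
  gateway/gateway.yaml \
  ext_proc.yaml \
  gateway/extension_policy.yaml \
  gateway/patch_policy.yaml \
  gateway/traffic_policy.yaml
do
  echo "kubectl apply -f $(manifest_url "$path")"
done
```

Printing the commands keeps the sketch safe to run without a cluster; piping the output to `bash` would execute them, assuming the earlier steps of the guide (secret, CRDs) are already done.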