-> **!!Important!!:** As of 10/07/2025, the `vllm/vllm-tpu:nightly` Docker image does not yet include the `tpu_inference` updates needed to support multi-modal models such as Qwen2.5-VL. The instructions below therefore require installing [vllm-tpu](https://docs.vllm.ai/en/latest/getting_started/installation/google_tpu.html#set-up-using-python) and [tpu-inference](https://github.com/vllm-project/tpu-inference) manually on the TPU VM and running directly from source (you can also build a local Docker image) instead of using the published Docker images. For production environments, we recommend waiting for an official `vllm-tpu` Docker image release that includes this support.
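
As a rough illustration, a manual from-source setup on the TPU VM might look like the sketch below. The requirements file, env var, and editable-install steps are assumptions based on the linked guides; defer to those guides for the exact, current commands.

```bash
# Sketch only -- consult the linked installation guides for the
# authoritative steps; repository layout and flags may have changed.

# Install vLLM from source with the TPU target (per the vLLM TPU guide).
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/tpu.txt     # assumed TPU requirements file
VLLM_TARGET_DEVICE="tpu" pip install -e .
cd ..

# Install tpu-inference from source (repository linked above).
git clone https://github.com/vllm-project/tpu-inference.git
cd tpu-inference
pip install -e .                        # assumed an editable install suffices
```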