[WIP] GKE Inference Gateway example #2448
base: main
Conversation
/gemini review

Hi @SinaChavoshi - Is this still WIP?
Code Review
This pull request introduces a new example for deploying the GKE Inference Gateway on a standard cluster. The configuration is comprehensive and utilizes modern practices. My review has identified a few areas for improvement, primarily concerning correctness and maintainability. Key findings include a resource allocation issue that would prevent some pods from scheduling, a fragile method for applying Kubernetes CRDs that depends on local tooling, and a missing provider definition that would cause Terraform to fail. I've provided specific suggestions to address these points and enhance the robustness and clarity of the example.
resource "null_resource" "apply_crds" {
  provisioner "local-exec" {
    command = "kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.0/manifests.yaml"
  }

  depends_on = [module.gke]
}
Using local-exec to apply Kubernetes manifests is fragile as it depends on kubectl being installed and configured on the machine running Terraform. The README instructs users to configure kubectl after terraform apply, which creates a chicken-and-egg problem. A more robust, Terraform-native approach is to use the http data source to fetch the manifest and the kubernetes_manifest resource to apply it. This ensures the process is self-contained within Terraform's execution.
data "http" "inference_gateway_crds" {
url = "https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.0/manifests.yaml"
}
resource "kubernetes_manifest" "apply_crds" {
for_each = { for i, v in split("\n---\n", data.http.inference_gateway_crds.body) : i => v if trimspace(v) != "" }
manifest = yamldecode(each.value)
depends_on = [module.gke]
}
Hi @SinaChavoshi - I would recommend using the kubectl module, which handles installation, auth, etc.:
module "kubectl" {
source = "terraform-google-modules/gcloud/google//modules/kubectl-wrapper"
project_id = var.project_id
cluster_name = var.cluster_name
cluster_location = var.cluster_location
kubectl_create_command = "kubectl create deploy nginx --image=nginx"
kubectl_destroy_command = "kubectl delete deploy nginx"
}
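For this example, the placeholder nginx commands could be swapped for the inference extension manifest itself. A sketch follows, under the assumption that the GKE module exposes name and location outputs:

module "apply_inference_crds" {
  source = "terraform-google-modules/gcloud/google//modules/kubectl-wrapper"

  project_id       = var.project_id
  cluster_name     = module.gke.name
  cluster_location = module.gke.location

  # Apply/remove the Gateway API inference extension CRDs from the upstream release.
  kubectl_create_command  = "kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.0/manifests.yaml"
  kubectl_destroy_command = "kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.0/manifests.yaml"
}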
| name = "healthCheckPolicy.create" | ||
| value = "false" | ||
| } | ||
| depends_on = [kubernetes_deployment.vllm, null_resource.apply_crds] |
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    google-beta = {
      source = "hashicorp/google-beta"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
The helm provider is used in main.tf but is not declared in the required_providers block, so Terraform falls back to an implicit hashicorp/helm source. Please add the helm provider to this block so the source is pinned explicitly alongside the other providers.
required_providers {
  google = {
    source = "hashicorp/google"
  }
  google-beta = {
    source = "hashicorp/google-beta"
  }
  kubernetes = {
    source = "hashicorp/kubernetes"
  }
  helm = {
    source = "hashicorp/helm"
  }
}
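Since the helm provider also needs credentials for the new cluster, its configuration could authenticate against the cluster directly. A minimal sketch, assuming the helm provider 2.x kubernetes block syntax and that the GKE module exposes endpoint and ca_certificate outputs:

data "google_client_config" "default" {}

provider "helm" {
  kubernetes {
    host                   = "https://${module.gke.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(module.gke.ca_certificate)
  }
}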
| name = "vllm-llama3.1-8b-instruct" | ||
| } | ||
| spec { | ||
| replicas = 3 |
The vllm deployment is configured with 3 replicas, but the gpupool node pool is configured with only one node (node_count = 1). The a3-highgpu-2g machine type has 2 GPUs, and each pod requests 1 GPU. This means only 2 of the 3 replicas can be scheduled, leaving one pod in a Pending state. To fix this, you should either increase the node_count to 2 in the node pool definition, or reduce the number of replicas to 2 to match the available GPUs on a single node.
replicas = 2
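The other fix mentioned above, scaling the GPU pool to two nodes, would look roughly like the following. This is an illustrative sketch using a standalone google_container_node_pool resource; the PR defines its pool through a module, so the resource name and cluster reference here are assumptions.

resource "google_container_node_pool" "gpupool" {
  name       = "gpupool"
  cluster    = module.gke.cluster_id
  node_count = 2 # 2 nodes x 2 H100s each = 4 GPUs, enough for 3 replicas requesting 1 GPU

  node_config {
    machine_type = "a3-highgpu-2g"

    guest_accelerator {
      type  = "nvidia-h100-80gb"
      count = 2
    }
  }
}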
| "model": "food-review", | ||
| "prompt": "What is a good recipe for a chicken curry?", | ||
| "max_tokens": 100, | ||
| "temperature": "0.7" |
The temperature parameter in the JSON payload is specified as a string ("0.7"). While some servers might be lenient, the OpenAI API specification (which vLLM aims to be compatible with) expects this to be a number. It's better to provide it as a numeric value for correctness and broader compatibility.
| "temperature": "0.7" | |
| "temperature": 0.7 |
  guest_accelerator = {
    type  = "nvidia-h100-80gb"
    count = 2
  }
  enable_service_links = false

  container {
    name  = "vllm"
    image = "vllm/vllm-openai:latest"
| name = "HUGGING_FACE_HUB_TOKEN" | ||
| value_from { | ||
| secret_key_ref { | ||
| name = kubernetes_secret.hf_secret.metadata[0].name |
Accessing metadata via metadata[0] is unnecessary and less readable. The Terraform Kubernetes provider schema defines metadata as a single block, not a list. You can access its attributes directly. Please use kubernetes_secret.hf_secret.metadata.name instead. This applies to other resource references in this file as well (e.g., lines 299, 341, 345, 355, 368, 380).
name = kubernetes_secret.hf_secret.metadata.name
  }

  container {
    name  = "lora-adapter-syncer"
    image = "us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/lora-syncer:main"
}

output "addons_config" {
  description = "The configuration for addons supported by GKE Autopilot."
This PR adds the gke_inference_gateway_standard_cluster example, which follows the latest GA (General Availability) guide for the GKE Inference Gateway. The configuration has been modernized to reflect current best practices and resource definitions.