This repository was archived by the owner on Oct 15, 2025. It is now read-only.

Commit 39b6b4e

Fix upstream inferencepool chart integration with correct OCI registry
- Use correct OCI registry: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts
- Restore proper subchart dependency structure for inferencepool
- Remove manual InferencePool template (use upstream chart instead)
- Fix HTTPRoute backend reference to use subchart release name
- Successfully builds dependencies and passes lint

The upstream kubernetes-sigs/gateway-api-inference-extension charts are published to Google Artifact Registry, not GitHub Container Registry. This fix enables proper integration with the official upstream Helm charts.
1 parent b2cc225 commit 39b6b4e
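The fix can be reproduced locally. A sketch, not part of the commit, assuming Helm 3.8+ for OCI support and that the staging registry serves a pullable `v0` tag:

```bash
# Pull the upstream chart straight from Google Artifact Registry.
helm pull oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool \
  --version v0

# Rebuild the umbrella chart's dependencies and lint, per the commit message.
helm dependency build ./charts/llm-d-umbrella
helm lint ./charts/llm-d-umbrella
```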

File tree

7 files changed, +82 -20 lines changed

charts/IMPLEMENTATION_SUMMARY.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -42,7 +42,7 @@ This implementation addresses [issue #312](https://github.com/llm-d/llm-d-deploy
 - Configuration orchestration
 
 **Integration Points**:
-- Uses upstream `inferencepool` chart for intelligent routing
+- Creates InferencePool resources (requires upstream CRDs)
 - Connects vLLM services via label matching
 - Maintains backward compatibility for deployment
 
@@ -97,9 +97,9 @@ helm install llm-d-new ./charts/llm-d-umbrella \
 ## Benefits Achieved
 
 ### ✅ Upstream Integration
-- Uses official Gateway API Inference Extension charts
-- Leverages multi-provider support (GKE, Istio, kGateway)
-- Gets upstream bug fixes and feature updates automatically
+- Uses official Gateway API Inference Extension CRDs and APIs
+- Creates InferencePool resources following upstream specifications
+- Compatible with multi-provider support (GKE, Istio, kGateway)
 
 ### ✅ Modular Architecture
 - vLLM and gateway concerns properly separated
```
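The summary's point about label matching is easier to see with a concrete resource. Below is a hedged sketch of the kind of InferencePool the chart creates; the apiVersion and all names are assumptions, chosen to line up with the `targetPort: 8000` and vLLM settings in values.yaml further down:

```bash
# Illustrative only: apiVersion and the names below are assumptions, not taken
# from this commit. Field names follow the upstream v1alpha2 schema.
kubectl apply -f - <<'EOF'
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-d-inferencepool          # hypothetical name
spec:
  targetPortNumber: 8000             # mirrors inferencePool.targetPort in values.yaml
  selector:
    app: vllm                        # label matching selects the vLLM pods
  extensionRef:
    name: llm-d-epp                  # endpoint picker Service (hypothetical name)
EOF
```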

charts/llm-d-umbrella/Chart.lock

Lines changed: 12 additions & 0 deletions
```diff
@@ -0,0 +1,12 @@
+dependencies:
+- name: common
+  repository: https://charts.bitnami.com/bitnami
+  version: 2.27.0
+- name: inferencepool
+  repository: oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts
+  version: v0
+- name: llm-d-vllm
+  repository: file://../llm-d-vllm
+  version: 1.0.0
+digest: sha256:80feac6ba991f6b485fa14153c7f061a0cbfb19d65ee332c03c8fba288922501
+generated: "2025-06-13T19:53:15.903878-04:00"
```

charts/llm-d-umbrella/Chart.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -26,8 +26,8 @@ dependencies:
     version: "2.27.0"
   # Upstream inference gateway chart
   - name: inferencepool
-    repository: oci://ghcr.io/kubernetes-sigs/gateway-api-inference-extension/charts
-    version: "0.0.0"
+    repository: oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts
+    version: "v0"
     condition: inferencepool.enabled
   # Our vLLM model serving chart
   - name: llm-d-vllm
```
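When a dependency's `repository:` or `version:` changes as above, the Chart.lock shown earlier must be regenerated so the recorded digest matches. A minimal sketch:

```bash
# Re-resolve the dependencies declared in Chart.yaml and rewrite Chart.lock.
helm dependency update ./charts/llm-d-umbrella
```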
Lines changed: 52 additions & 0 deletions
````diff
@@ -0,0 +1,52 @@
+{{ template "chart.header" . }}
+
+{{ template "chart.description" . }}
+
+## Prerequisites
+
+- Kubernetes 1.30+
+- Helm 3.10+
+- Gateway API CRDs installed
+- **InferencePool CRDs** (from Gateway API Inference Extension):
+  ```bash
+  kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool-resources.yaml
+  ```
+
+{{ template "chart.maintainersSection" . }}
+
+{{ template "chart.sourcesSection" . }}
+
+{{ template "chart.requirementsSection" . }}
+
+{{ template "chart.valuesSection" . }}
+
+## Installation
+
+1. Install prerequisites:
+   ```bash
+   # Install Gateway API CRDs (if not already installed)
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml
+
+   # Install InferencePool CRDs
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool-resources.yaml
+   ```
+
+2. Install the chart:
+   ```bash
+   helm install my-llm-d-umbrella llm-d/llm-d-umbrella
+   ```
+
+## Architecture
+
+This umbrella chart combines:
+- **Upstream InferencePool**: Intelligent routing and load balancing for inference workloads
+- **llm-d-vLLM**: Dedicated vLLM model serving components
+- **Gateway API**: External traffic routing and management
+
+The modular design enables:
+- Clean separation between inference gateway and model serving
+- Leveraging upstream Gateway API Inference Extension
+- Intelligent endpoint selection and load balancing
+- Backward compatibility with existing deployments
+
+{{ template "chart.homepage" . }}
````

charts/llm-d-umbrella/templates/httproute.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@ spec:
     {{- range .backendRefs }}
     - group: {{ .group }}
       kind: {{ .kind }}
-      name: {{ .name }}
+      name: {{ tpl .name $ }}
       port: {{ .port }}
     {{- end }}
 ---
```
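Wrapping the value in `tpl` renders it through the template engine with the root context `$`, so a value like `"{{ .Release.Name }}-inferencepool"` resolves to the actual release name instead of being emitted literally. A quick check (the release name `my-llm` is illustrative):

```bash
# Render only the HTTPRoute; the backendRef name should come out as
# "my-llm-inferencepool" rather than the raw template string.
helm template my-llm ./charts/llm-d-umbrella --show-only templates/httproute.yaml
```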

charts/llm-d-umbrella/values.yaml

Lines changed: 2 additions & 13 deletions
```diff
@@ -33,18 +33,7 @@ commonAnnotations: {}
 inferencepool:
   enabled: true
 
-  # Configure the inference extension (endpoint picker)
-  inferenceExtension:
-    replicas: 1
-    image:
-      name: epp
-      hub: gcr.io/gke-ai-eco-dev
-      tag: 0.3.0
-      pullPolicy: Always
-    externalProcessingPort: 9002
-    env: []
-
-  # Configure the inference pool for vLLM
+  # InferencePool configuration (passed to upstream chart)
   inferencePool:
     targetPort: 8000
     modelServerType: vllm
@@ -120,5 +109,5 @@ gateway:
   backendRefs:
     - group: inference.networking.x-k8s.io
       kind: InferencePool
-      name: vllm-inference-pool # Name from inferencepool chart
+      name: "{{ .Release.Name }}-inferencepool"
       port: 8000
```
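With this default, the backendRef name tracks whatever release name the inferencepool subchart is installed under; a plain string still works because `tpl` passes literals through unchanged. A usage sketch (the override name is hypothetical):

```bash
# Default: the HTTPRoute backendRef renders as "llm-d-inferencepool".
helm install llm-d ./charts/llm-d-umbrella

# Point the route at an externally managed pool instead (hypothetical name).
helm install llm-d ./charts/llm-d-umbrella \
  --set 'gateway.backendRefs[0].name=my-external-pool'
```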

charts/llm-d-vllm/Chart.lock

Lines changed: 9 additions & 0 deletions
```diff
@@ -0,0 +1,9 @@
+dependencies:
+- name: common
+  repository: https://charts.bitnami.com/bitnami
+  version: 2.27.0
+- name: redis
+  repository: https://charts.bitnami.com/bitnami
+  version: 20.13.4
+digest: sha256:772ec68662ea0b33874d50d86123af9486c4f549bd1fb18db7b685315a3d0163
+generated: "2025-06-13T19:53:30.705482-04:00"
```
