Description
Describe the change you'd like to see
Add a documentation page for the LLMInferenceService label and annotation propagation feature, as requested by @sivanantha321 in a comment on kserve/kserve#5009.
This feature (merged in kserve/kserve#5009) enables users to propagate Kubernetes labels and annotations from an LLMInferenceService resource to the workload pods it manages. It supports all deployment modes: single-node Deployments, multi-node LeaderWorkerSets, disaggregated prefill-decode workloads, and the scheduler (EPP) Deployment.
The documentation should cover:
- Two propagation layers: top-level metadata (prefix-filtered via an approved allowlist) vs. spec-level fields (`spec.labels`, `spec.annotations`, and per-component equivalents), which propagate all keys without filtering.
- Approved prefix allowlists: which annotation prefixes (`k8s.v1.cni.cncf.io`, `kueue.x-k8s.io`, `prometheus.io`) and label prefixes (`kueue.x-k8s.io`) are propagated from `.metadata`.
- Per-component spec fields: `spec.prefill.labels`/`spec.prefill.annotations` for prefill pods and `spec.router.scheduler.labels`/`spec.router.scheduler.annotations` for the scheduler pod.
- Multi-node behaviour: propagation to both leader and worker pod templates.
- Precedence rules: spec-level values override top-level metadata when the same key appears in both.
- Practical examples: Kueue queue assignment, Multus CNI attachment, Prometheus scraping config, and custom platform labels for cost allocation.
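The filtering and precedence rules listed above could be illustrated on the docs page with a small model like the one below. This is a hypothetical sketch, not KServe's controller code. The helper names `filter_by_prefix` and `propagate`, the exact matching rule (a key counts as approved when it has the form `<prefix>/<name>` and `<prefix>` is allowlisted), and the `example.com/cost-center` key are assumptions. The issue does not say how `spec.labels` combines with per-component fields such as `spec.prefill.labels`, so the sketch takes a single spec-level map per component.

```python
from typing import Dict, Tuple

# Allowlists as listed in this issue; only top-level .metadata keys are checked against them.
APPROVED_ANNOTATION_PREFIXES: Tuple[str, ...] = (
    "k8s.v1.cni.cncf.io",
    "kueue.x-k8s.io",
    "prometheus.io",
)
APPROVED_LABEL_PREFIXES: Tuple[str, ...] = ("kueue.x-k8s.io",)


def filter_by_prefix(entries: Dict[str, str], prefixes: Tuple[str, ...]) -> Dict[str, str]:
    """Keep keys of the form '<prefix>/<name>' whose <prefix> is allowlisted (assumed rule)."""
    return {
        key: value
        for key, value in entries.items()
        if any(key.startswith(prefix + "/") for prefix in prefixes)
    }


def propagate(
    top_level: Dict[str, str],
    spec_level: Dict[str, str],
    prefixes: Tuple[str, ...],
) -> Dict[str, str]:
    """Filtered top-level metadata, overlaid by unfiltered spec-level entries (hypothetical helper)."""
    merged = filter_by_prefix(top_level, prefixes)
    merged.update(spec_level)  # applied last, so spec-level values win on conflicting keys
    return merged


# Kueue queue label from .metadata plus a hypothetical cost-allocation label from spec.labels.
pod_labels = propagate(
    {"kueue.x-k8s.io/queue-name": "gpu-queue", "app": "chat"},  # "app" is not allowlisted
    {"example.com/cost-center": "ml-platform"},
    APPROVED_LABEL_PREFIXES,
)

# Prometheus annotations in .metadata, with the port overridden in spec.annotations.
pod_annotations = propagate(
    {
        "prometheus.io/scrape": "true",
        "prometheus.io/port": "8000",
        "kubectl.kubernetes.io/last-applied-configuration": "{}",  # not allowlisted
    },
    {"prometheus.io/port": "9090"},
    APPROVED_ANNOTATION_PREFIXES,
)

# Multi-node (LeaderWorkerSet): the same maps are copied onto both leader and worker templates.
leader_template = {"metadata": {"labels": dict(pod_labels), "annotations": dict(pod_annotations)}}
worker_template = {"metadata": {"labels": dict(pod_labels), "annotations": dict(pod_annotations)}}

print(pod_labels)
print(pod_annotations)
```

Applying the spec-level map last is what gives it precedence in this model. Under these rules, the `app` label and the `kubectl.kubernetes.io/last-applied-configuration` annotation stay on the LLMInferenceService and never reach the pods, while the spec-level port `9090` replaces the top-level `8000`.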
Suggested location: `docs/model-serving/generative-inference/llmisvc/llmisvc-label-propagation.md`, under the existing LLMInferenceService sidebar category.
Additional context