Add namespace level vGPU resource quotas #207

Closed
@Code2Life

Feature Description

TensorFusion has a built-in lightweight GPU scheduler and requires no Kubernetes device-plugin. However, this means Kubernetes built-in LimitRanges and ResourceQuotas won't take effect on vGPU resources.

To limit namespace-level total vGPU capacity and enforce min/max constraints on tflops/vram/gpu-replicas per workload, we need a similar mechanism to what LimitRanges and ResourceQuotas provide for standard Kubernetes resources.

Motivation

  • Multi-tenancy support: Enable fair resource sharing across teams and namespaces
  • Cost control: Prevent runaway GPU usage and associated costs
  • Resource planning: Ensure predictable resource allocation and prevent resource starvation
  • Compliance: Meet organizational policies for resource governance
  • Performance isolation: Prevent single workloads from monopolizing cluster resources

Design Proposal

The quota mechanism should be simpler than the Kubernetes-native resources and support only namespace-level limits. The quota CR's name must match the namespace it limits.

GPUResourceQuota CRD

apiVersion: tensor-fusion.ai/v1
kind: GPUResourceQuota
metadata:
  name: name-of-namespace
spec:
  # Total namespace limits (similar to ResourceQuotas)
  total:
    requests.tflops: "1000"
    requests.vram: "20Gi"
    limits.tflops: "1000"
    limits.vram: "20Gi"
    workers: "100"
    alertThresholdPercent: 95

  # Per-workload limits (similar to LimitRanges)
  single:
    max:
      tflops: "200"
      vram: "4Gi"
      workers: "10"
    min:
      tflops: "10m"
      vram: "256Mi"
      workers: "1"
    default:
      tflops: "100m"
      vram: "1Gi"
    defaultRequest:
      tflops: "50m"
      vram: "512Mi"
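
To make the intended semantics of the `single` section concrete, here is a minimal Python sketch of the per-workload step. It assumes `defaultRequest` fills in missing requests and `default` fills in missing limits (mirroring LimitRange behavior), and that values use Kubernetes-style quantity suffixes. The names `parse_quantity` and `apply_single_limits` are hypothetical and not part of TensorFusion; the `workers` check is omitted for brevity.

```python
from fractions import Fraction

# Multipliers for the quantity suffixes used in the example spec (assumption:
# only these suffixes need to be supported in this sketch).
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "m": Fraction(1, 1000)}


def parse_quantity(text: str) -> Fraction:
    """Convert a quantity string such as "10m" or "4Gi" to an exact number."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


def apply_single_limits(single: dict, requests: dict, limits: dict):
    """Fill in defaults, then check every value against single.min / single.max.

    Returns (requests, limits, violations). An empty violations list means
    the workload passes the per-workload constraints.
    """
    requests = {**single.get("defaultRequest", {}), **requests}
    limits = {**single.get("default", {}), **limits}
    violations = []
    for kind, values in (("requests", requests), ("limits", limits)):
        for name, value in values.items():
            amount = parse_quantity(value)
            low = single.get("min", {}).get(name)
            high = single.get("max", {}).get(name)
            if low is not None and amount < parse_quantity(low):
                violations.append(f"{kind}.{name}={value} is below min {low}")
            if high is not None and amount > parse_quantity(high):
                violations.append(f"{kind}.{name}={value} is above max {high}")
    return requests, limits, violations


# Usage with the per-workload values from the example spec.
single = {
    "max": {"tflops": "200", "vram": "4Gi"},
    "min": {"tflops": "10m", "vram": "256Mi"},
    "default": {"tflops": "100m", "vram": "1Gi"},
    "defaultRequest": {"tflops": "50m", "vram": "512Mi"},
}
requests, limits, violations = apply_single_limits(
    single,
    requests={"tflops": "20"},                # vram request falls back to defaultRequest
    limits={"tflops": "40", "vram": "8Gi"},   # vram limit exceeds single.max
)
print(requests)
print(limits)
print(violations)
```

In this sketch the defaults are applied before validation, so a default that itself violates `min` or `max` would also be reported.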

GPUResourceQuotaStatus

status:
  used:
    requests.tflops: "450"
    requests.vram: "8Gi"
    limits.tflops: "600"
    limits.vram: "12Gi"
    workers: "25"
  availablePercent:
    requests.tflops: 55
    requests.vram: 60
    limits.tflops: 40
    limits.vram: 40
    workers: 75
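
One possible way to derive `availablePercent` is sketched below in Python. It assumes the value is the unallocated share of each `total` entry, rounded down to a whole percent, and that quantities have already been converted to plain numbers. The function name `available_percent` and the sample numbers are hypothetical.

```python
def available_percent(total: dict, used: dict) -> dict:
    """Percentage of each quota dimension that is still unallocated (rounded down)."""
    result = {}
    for name, cap in total.items():
        consumed = used.get(name, 0)
        if cap <= 0:
            # A zero quota leaves nothing available.
            result[name] = 0
        else:
            result[name] = max(0, (cap - consumed) * 100 // cap)
    return result


# Usage with illustrative numbers (not taken from a real cluster).
percent = available_percent(
    total={"requests.tflops": 200, "workers": 10},
    used={"requests.tflops": 50, "workers": 9},
)
print(percent)
```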

Acceptance Criteria

Core Functionality

  • GPUResourceQuota CRD: Define and implement GPUResourceQuota custom resource
  • Admission Controller: Enhance the current mutating webhook to enforce quotas on workload creation and updates
  • Resource Tracking: Track current usage across all workloads in the namespace. When availablePercent < (100 - alertThresholdPercent), emit an event to alert the cluster admin to check the resource usage of this namespace.
  • Quota Enforcement: Block workload creation when quotas would be exceeded
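
The enforcement and alerting criteria above could be combined in a single admission-time check. The following Python sketch is an illustration only, not the webhook's actual implementation: `check_quota` is a hypothetical name, quantities are assumed to be pre-converted to plain numbers, and the alert is assumed to be evaluated against usage after the new workload is admitted.

```python
def check_quota(total: dict, used: dict, delta: dict, alert_threshold_percent: int = 95):
    """Decide whether a workload's additional usage fits the namespace quota.

    Returns (allowed, exceeded, alerts):
      exceeded: dimensions where used + delta would go over the total
      alerts:   dimensions whose available percentage would drop below
                100 - alert_threshold_percent once the workload is admitted
    """
    exceeded, alerts = [], []
    for name, cap in total.items():
        after = used.get(name, 0) + delta.get(name, 0)
        if after > cap:
            exceeded.append(name)
        elif cap > 0 and (cap - after) * 100 / cap < 100 - alert_threshold_percent:
            alerts.append(name)
    if exceeded:
        # A rejected workload does not change usage, so no alert is raised.
        return False, exceeded, []
    return True, [], alerts


# Usage with illustrative numbers.
total = {"requests.tflops": 1000, "workers": 100}
used = {"requests.tflops": 900, "workers": 25}

# Fits, but leaves only 4% of requests.tflops available: admit and alert.
admitted = check_quota(total, used, {"requests.tflops": 60, "workers": 1})
# Would push requests.tflops to 1100 > 1000: reject.
rejected = check_quota(total, used, {"requests.tflops": 200, "workers": 1})
print(admitted)
print(rejected)
```

Returning the list of exceeded dimensions lets the webhook produce a rejection message that names the quota that was hit.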
