Add namespace-level vGPU resource quotas #207

## Feature Description
TensorFusion has a built-in lightweight GPU scheduler and does not require a Kubernetes device plugin. As a result, the built-in Kubernetes LimitRanges and ResourceQuotas do not take effect on vGPU resources.

To limit the total vGPU capacity of a namespace and enforce min/max constraints on tflops/vram/gpu-replicas per workload, we need a mechanism similar to what LimitRanges and ResourceQuotas provide for standard Kubernetes resources.

## Motivation
- **Multi-tenancy support**: Enable fair resource sharing across teams and namespaces
- **Cost control**: Prevent runaway GPU usage and associated costs
- **Resource planning**: Ensure predictable resource allocation and prevent resource starvation
- **Compliance**: Meet organizational policies for resource governance
- **Performance isolation**: Prevent single workloads from monopolizing cluster resources

## Design Proposal

The resource quotas should be simpler than the Kubernetes native resources and only support namespace-level limits. The quota CR's name must be the same as the namespace it limits.

### GPUResourceQuota CRD
```yaml
apiVersion: tensor-fusion.ai/v1
kind: GPUResourceQuota
metadata:
  name: name-of-namespace
spec:
  # Total namespace limits (similar to ResourceQuotas)
  total:
    requests.tflops: "1000"
    requests.vram: "20Gi"
    limits.tflops: "1000" 
    limits.vram: "20Gi"
    workers: "100"
    alertThresholdPercent: 95
    
  # Per-workload limits (similar to LimitRanges)
  single:
    max:
      tflops: "200"
      vram: "4Gi"
      workers: "10"
    min:
      tflops: "10m"
      vram: "256Mi"
      workers: "1"
    default:
      tflops: "100m"
      vram: "1Gi"
    defaultRequest:
      tflops: "50m"
      vram: "512Mi"
```
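The `single` section could be applied roughly as follows. This is a minimal sketch, not the TensorFusion implementation: the `Resources` and `SingleLimits` types, the helper names, and the units (milli-TFLOPS and bytes) are all hypothetical, and the rule "request >= min, limit <= max, request <= limit" is assumed by analogy with Kubernetes LimitRange. The `workers` bounds are left out for brevity.

```go
package main

import "fmt"

// Resources holds per-workload vGPU figures.
// Units are an assumption of this sketch: milli-TFLOPS and bytes.
type Resources struct {
	MilliTFlops int64
	VRAMBytes   int64
}

// SingleLimits mirrors spec.single from the YAML example (hypothetical type).
type SingleLimits struct {
	Min, Max       Resources
	Default        Resources // applied when a workload sets no limits
	DefaultRequest Resources // applied when a workload sets no requests
}

// exampleLimits encodes the values from the YAML example above.
func exampleLimits() SingleLimits {
	return SingleLimits{
		Min:            Resources{MilliTFlops: 10, VRAMBytes: 256 << 20},
		Max:            Resources{MilliTFlops: 200_000, VRAMBytes: 4 << 30},
		Default:        Resources{MilliTFlops: 100, VRAMBytes: 1 << 30},
		DefaultRequest: Resources{MilliTFlops: 50, VRAMBytes: 512 << 20},
	}
}

// applyDefault fills zero-valued fields of r from def.
func applyDefault(r, def Resources) Resources {
	if r.MilliTFlops == 0 {
		r.MilliTFlops = def.MilliTFlops
	}
	if r.VRAMBytes == 0 {
		r.VRAMBytes = def.VRAMBytes
	}
	return r
}

// validate checks request >= min, limit <= max, and request <= limit.
func (l SingleLimits) validate(request, limit Resources) error {
	switch {
	case request.MilliTFlops < l.Min.MilliTFlops || request.VRAMBytes < l.Min.VRAMBytes:
		return fmt.Errorf("request %+v is below min %+v", request, l.Min)
	case limit.MilliTFlops > l.Max.MilliTFlops || limit.VRAMBytes > l.Max.VRAMBytes:
		return fmt.Errorf("limit %+v is above max %+v", limit, l.Max)
	case request.MilliTFlops > limit.MilliTFlops || request.VRAMBytes > limit.VRAMBytes:
		return fmt.Errorf("request %+v exceeds limit %+v", request, limit)
	}
	return nil
}

func main() {
	l := exampleLimits()

	// A workload that specifies nothing picks up both sets of defaults.
	request := applyDefault(Resources{}, l.DefaultRequest)
	limit := applyDefault(Resources{}, l.Default)
	fmt.Println(request, limit, l.validate(request, limit))

	// A limit of 300 TFLOPS is above single.max.tflops and is rejected.
	tooBig := Resources{MilliTFlops: 300_000, VRAMBytes: 1 << 30}
	fmt.Println(l.validate(request, tooBig))
}
```

Treating a zero value as "unset" keeps the sketch short; a real API type would more likely use pointers or a quantity type so that an explicit zero can be told apart from an omitted field.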

### GPUResourceQuotaStatus
```yaml
status:
  used:
    requests.tflops: "450"
    requests.vram: "8Gi"
    limits.tflops: "600"
    limits.vram: "12Gi"
    workers: "25"
  # Remaining share of spec.total for each key
  availablePercent:
    requests.tflops: 55
    requests.vram: 60
    limits.tflops: 40
    limits.vram: 40
    workers: 75
```
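One plausible way to derive `availablePercent` is as the unused share of `spec.total` for each key. The sketch below assumes that definition, rounds down to a whole percent, and uses a hypothetical helper name; the issue itself does not fix the rounding rule. The inputs are the `used` figures from the status example and the totals from the CRD example, with VRAM expressed in Gi.

```go
package main

import "fmt"

// availablePercent returns the share of a quota that is still free, as a
// whole percentage rounded down. used and total must share the same unit.
// An unset or non-positive total is reported as 0 percent available.
func availablePercent(used, total int64) int64 {
	if total <= 0 || used >= total {
		return 0
	}
	if used < 0 {
		used = 0
	}
	return (total - used) * 100 / total
}

func main() {
	// Pairs of (used, total) taken from the two YAML examples.
	keys := []string{"requests.tflops", "requests.vram", "limits.tflops", "limits.vram", "workers"}
	used := []int64{450, 8, 600, 12, 25}
	total := []int64{1000, 20, 1000, 20, 100}

	for i, key := range keys {
		fmt.Printf("%s: %d\n", key, availablePercent(used[i], total[i]))
	}
}
```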

## Acceptance Criteria

### Core Functionality
- [ ] **GPUResourceQuota CRD**: Define and implement GPUResourceQuota custom resource
- [ ] **Admission Controller**: Extend the existing mutating webhook to enforce quotas on workload creation and updates
- [ ] **Resource Tracking**: Track current usage across all workloads in the namespace. When `availablePercent < (100 - alertThresholdPercent)` for any resource, emit an event that alerts the cluster admin to check the resource usage of this namespace.
- [ ] **Quota Enforcement**: Block workload creation when quotas would be exceeded
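
The enforcement and alerting criteria above can be sketched as two small checks. This is an illustration under assumptions, not the webhook's actual code: the `Usage` type and both function names are hypothetical, and amounts are plain integers in whatever unit is chosen per key.

```go
package main

import (
	"fmt"
	"sort"
)

// Usage maps quota keys such as "requests.tflops" to an amount.
// Hypothetical type; each key uses one consistent unit.
type Usage map[string]int64

// exceeded returns, in sorted order, the keys for which admitting a workload
// that needs delta would push usage past the namespace total. An empty result
// means the workload fits. Keys absent from total are not limited.
func exceeded(total, used, delta Usage) []string {
	var over []string
	for key, limit := range total {
		if used[key]+delta[key] > limit {
			over = append(over, key)
		}
	}
	sort.Strings(over)
	return over
}

// shouldAlert applies the rule from the Resource Tracking item:
// alert when availablePercent < (100 - alertThresholdPercent).
func shouldAlert(availablePercent, alertThresholdPercent int64) bool {
	return availablePercent < 100-alertThresholdPercent
}

func main() {
	total := Usage{"requests.tflops": 1000, "workers": 100}
	used := Usage{"requests.tflops": 450, "workers": 25}

	// 450 + 600 > 1000, so the workload is blocked on requests.tflops.
	fmt.Println(exceeded(total, used, Usage{"requests.tflops": 600, "workers": 10}))

	// 450 + 100 <= 1000 and 25 + 10 <= 100, so nothing is exceeded.
	fmt.Println(exceeded(total, used, Usage{"requests.tflops": 100, "workers": 10}))

	// With alertThresholdPercent = 95, alerts start below 5 percent available.
	fmt.Println(shouldAlert(4, 95), shouldAlert(5, 95))
}
```

Returning the offending keys rather than a bare boolean lets the webhook name the exhausted resource in its rejection message.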

