Skip to content

Conversation

@makirill
Copy link
Contributor

Auto-scaling for UserAssignments

Overview

Implements automatic scaling of UserAssignments based on allocation demand. When utilization exceeds a configurable threshold, the system automatically increases ResourceClaim capacity to provide more available UserAssignments.

Problem Statement

Workshops with high user demand can run out of available UserAssignments, requiring manual intervention to increase ResourceClaim counts. This creates delays and requires constant monitoring of workshop utilization.

Solution

Auto-scaling monitors UserAssignment allocation rates and automatically scales WorkshopProvisions when demand is high:

  • Trigger: When assigned_users / total_users >= threshold (default: 80%)
  • Action: Increase WorkshopProvision count by scaleFactor (default: 1.5x)
  • Scope: Each WorkshopProvision scales independently
  • Safety: Race condition prevention via autoScalingActive status flag

Key Features

  • Per-Provision Scaling: Each WorkshopProvision manages its own auto-scaling independently
  • Race Condition Safe: autoScalingActive flag prevents concurrent scaling operations
  • Configurable Thresholds: Customizable trigger threshold and scaling factor
  • Safety Limits: maxProvisions prevents runaway scaling
  • Backward Compatible: Disabled by default, no impact on existing workshops
  • Completion-Based: Waits for actual ResourceClaim completion, not arbitrary timers

Configuration

spec:
  autoScaling:
    enabled: true           # Enable auto-scaling
    threshold: 0.8         # Scale at 80% utilization
    scaleFactor: 1.5       # Increase capacity by 50%
    maxProvisions: 100     # Safety limit per provision

Race Condition Prevention

The implementation prevents race conditions that could occur during ResourceClaim provisioning:

  1. Problem: Multiple scaling triggers could create excess ResourceClaims
  2. Solution: autoScalingActive flag in WorkshopProvision status
  3. Workflow:
  • Set flag when scaling starts
  • Skip scaling if flag is active
  • Clear flag when all ResourceClaims are complete/failed

Testing

To test auto-scaling:

  1. Apply example configuration with autoScaling.enabled: true
  2. Create user assignments until utilization reaches threshold
  3. Verify WorkshopProvision count increases automatically
  4. Monitor autoScalingActive flag behavior

Deployment Requirements

The Workshop and WorkshopProvision CRDs must be updated with new schema before deploying this code.

@makirill makirill requested a review from jkupferer September 10, 2025 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants