Skip to content

Allow dynamic quota creation and removal#287

Open
QuanMPhm wants to merge 1 commit intonerc-project:mainfrom
QuanMPhm:ops_1391/final
Open

Allow dynamic quota creation and removal#287
QuanMPhm wants to merge 1 commit intonerc-project:mainfrom
QuanMPhm:ops_1391/final

Conversation

@QuanMPhm
Copy link
Contributor

@QuanMPhm QuanMPhm commented Jan 21, 2026

Closes nerc-project/operations#1391. This is how I would suggest to review this PR.

Two CLI commands have been added, add_quota_to_resource.py and remove_quota_from_resource.py. I would suggest understanding those two commands first. These commands allow us to dynamically add/remove quotas instead of having them hard-coded as they are currently done. These commands don't impact the quota objects in the clusters, nor the quota attributes in allocations. Their full impact is illustrated when used within the typical user workflow, or in tandem with validate_allocations.py. I would now suggest checking the changes to functional/openshift/test_allocations.py to see the full implications of this PR. The other functional test cases only contain minor changes.

Afterwards, tasks.py, validate_allocations.py, and the allocator base and subclasses should be reviewed. They are the main consumers of quota information. All other changes relatively minor.

Follow-up issues to write based on these comments:
openshift vm quotas
idempotency for register_default_quotas
storage billing
Clean-up hard-coded attributes

@QuanMPhm QuanMPhm marked this pull request as draft January 21, 2026 18:05
@QuanMPhm QuanMPhm force-pushed the ops_1391/final branch 6 times, most recently from b3c58d8 to 35273aa Compare January 29, 2026 17:08
@QuanMPhm
Copy link
Contributor Author

@knikolla I addressed all your suggestions on Slack except one:

To migrate the display name of an attribute
Before, since the attributes were stored in code, the migrations were also stored in code
Now, since the adding of new quota is a command, migrating the display name of an attribute should also be a command.

May I ask that I implement this feature in a subsequent PR, to prevent this PR from bloating even more? If not, I will implement this after I receive answers for my questions above.

@joachimweyl
Copy link

What is the impact of this omission?

@QuanMPhm
Copy link
Contributor Author

QuanMPhm commented Jan 29, 2026

@joachimweyl The impact will be that to change the display names of attributes (the names that users will see in the Coldfront UI, i.e OpenShift Limit on CPU Quota) will be a bit inconvenient. An admin will have to do some manual renaming in Coldfront. Still doable, but not in a way that's quick and programmatic. We ideally want a CLI command that makes renaming easier, but I didn't want this PR to take too long to review because of the February maintenance.

@joachimweyl
Copy link

joachimweyl commented Jan 30, 2026

Makes sense to me.

@QuanMPhm QuanMPhm marked this pull request as ready for review February 4, 2026 18:24
Copy link
Contributor

@naved001 naved001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @QuanMPhm! Some basic questions in there as I try to refresh my memory of coldfront. Will do another pass.

class Command(BaseCommand):
def add_arguments(self, parser):
parser.add_argument(
"--display_name",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have an underscore instead of a dash.

--display_name -> --display-name

help="The default quota value for the storage attribute. In GB",
)
parser.add_argument(
"--resource_name",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--resource_name -> --resource-name

type=str,
default="",
help="Name of quota as it appears on invoice. Required if --is-storage-type is set.",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come you didn't specify dest= for some of these arguments?

Copy link
Contributor Author

@QuanMPhm QuanMPhm Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally wouldn't include dest=, and didn't review closely enough what Copilot generated this code for me. I've removed the dest=. Apologies

def handle(self, *args, **options):
if options["resource_type"] == "storage" and not options["invoice_name"]:
logger.error(
"--invoice-name must be provided when storage type is `storage`."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"when resource type is storage."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea is any quota that is relevant for storage billing should have the resource type storage, such as:

QUOTA_REQUESTS_IBM_STORAGE = "OpenShift Request on IBM Storage Quota (GiB)"
QUOTA_REQUESTS_NESE_STORAGE = "OpenShift Request on NESE Storage Quota (GiB)"

Copy link
Contributor

@naved001 naved001 Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I am just pointing out that the error message says "storage type is storage" instead of "resource type is storage". You are checking options["resource_type"], there's no storage_type

"--invoice-name",
type=str,
default="",
help="Name of quota as it appears on invoice. Required if --is-storage-type is set.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where's --is-storage-type? Did you mean --resource-type is set to storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. My bad

else options["name"],
)

# Add common Openshift resources (cpu, memory, etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remind how were these resources created before this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the information for these quotas are spread in multiple places in the repo. The display names are in attributes.py, the multiplier and static quantities are in tasks.py, other info in other places. The allocation attributes for these quotas were loaded by register_cloud_attributes.py, which consumes the attributes defined in attributes.py.

A by-product of this PR is that now all that info is created and stored in one place.


def add_arguments(self, parser):
parser.add_argument(
"--resource_name",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--resource_name -> --resource-name

help="Name of the Resource to modify.",
)
parser.add_argument(
"--display_name",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--display_name -> --display-name

@QuanMPhm
Copy link
Contributor Author

QuanMPhm commented Feb 13, 2026

@knikolla @naved001 A one-time command migrate_resource_quota.py has been added.

EDIT: Sorry I've missed some comments. I get to work on them

@QuanMPhm QuanMPhm requested a review from knikolla February 13, 2026 16:49
@QuanMPhm
Copy link
Contributor Author

QuanMPhm commented Feb 25, 2026

@knikolla @naved001 This PR is waiting on review. This is my last question

def handle(self, *args, **options):
if options["resource_type"] == "storage" and not options["invoice_name"]:
logger.error(
"--invoice-name must be provided when reousrce type is `storage`."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type, reousrce -> resource

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol you don't have to apologize. Ironically, I also made at typo in my original comment: type->typo

)

# Define quotas for each resource type
openshift_quotas = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to worry about migrating openshift vm quotas? I know we don't have any current deployment so maybe not?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be done in a follow-up patch if not already here. Management commands can be deployed as a separate version debug container for their execution so they don't necessarily require a maintenance window to upgrade.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included openshift vm resources in the command. However, it will have the quotas defined in openshift_quotas, so some GPU quotas (i.e H100, A100) will be missing

Copy link
Collaborator

@knikolla knikolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! I think this will be ready with a few minor iterations mostly related to fetching rate data from invoicing and some polish here and there.

I'll do a deeper pass in the afternoon but I don't expect anything big to jump out. Again, good work!

@QuanMPhm QuanMPhm force-pushed the ops_1391/final branch 4 times, most recently from 12789b2 to 29ff0dc Compare February 27, 2026 15:25
@QuanMPhm
Copy link
Contributor Author

@knikolla @naved001 I have responded to all comments so far. In response to this comment, I decided to use register_default_quotas in the test cases to remove some redundant code.

Copy link
Collaborator

@knikolla knikolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some questions.

@knikolla
Copy link
Collaborator

knikolla commented Mar 3, 2026

Adding copilot to check if I missed something. If there's valid feedback I will tag you on the relevant comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves quota definitions from hard-coded mappings into per-resource, dynamically managed quota specs stored on the resource, and updates allocators/tasks/commands/tests to consume those specs. It adds management commands to register defaults and to add/remove quotas on a resource so quotas can evolve without code changes.

Changes:

  • Introduce QuotaSpec/QuotaSpecs models and load quota specs from a resource attribute (Available Quota Resources) in allocators.
  • Add management commands to register default quotas and to add/remove quota specs on a resource.
  • Refactor OpenShift/OpenStack allocators, tasks.activate_allocation, allocation validation, and storage billing to use resource-defined quota specs; update functional/unit tests accordingly.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/coldfront_plugin_cloud/tests/unit/test_usage_models.py Update imports for moved usage_models.
src/coldfront_plugin_cloud/tests/unit/test_register_default_quotas.py New unit tests for register_default_quotas behavior/idempotency.
src/coldfront_plugin_cloud/tests/unit/test_fetch_daily_billable_usage.py Update imports for moved usage_models.
src/coldfront_plugin_cloud/tests/unit/test_calculate_quota_unit_hours.py Ensure test resources have quota specs via add_quota_to_resource.
src/coldfront_plugin_cloud/tests/unit/openshift/test_rbac.py Switch to shared unit OpenShift base allocator.
src/coldfront_plugin_cloud/tests/unit/openshift/base.py Add mock allocator wrapper to avoid real allocator initialization dependencies.
src/coldfront_plugin_cloud/tests/functional/openstack/test_allocation.py Adjust tests to rely on registered default quotas and resource quota specs.
src/coldfront_plugin_cloud/tests/functional/openshift_vm/test_allocation.py Register defaults and customize VM GPU quotas via add/remove quota commands.
src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py Register defaults and expand tests around adding/removing quotas dynamically.
src/coldfront_plugin_cloud/tests/functional/esi/test_allocations.py Set up ESI quota specs dynamically in test setup.
src/coldfront_plugin_cloud/tests/base.py Update OpenShift resource creation signature (internal_name) and remove IBM storage flag.
src/coldfront_plugin_cloud/tasks.py Replace hard-coded quota math with QuotaSpec-driven computation.
src/coldfront_plugin_cloud/openstack.py Replace static quota mappings with resource-defined quota specs grouped by service.
src/coldfront_plugin_cloud/openshift_vm.py Remove VM-specific hard-coded quota mapping in favor of resource-defined specs.
src/coldfront_plugin_cloud/openshift.py Remove hard-coded quota mapping; format quotas via resource-defined specs.
src/coldfront_plugin_cloud/models/usage_models.py New module for pydantic usage/charges models (moved from old location).
src/coldfront_plugin_cloud/models/quota_models.py New pydantic models for quota specs and validation.
src/coldfront_plugin_cloud/management/commands/validate_allocations.py Validate allocations based on per-resource quota specs rather than hard-coded mappings.
src/coldfront_plugin_cloud/management/commands/remove_quota_from_resource.py New command to remove a quota spec from a resource.
src/coldfront_plugin_cloud/management/commands/register_default_quotas.py New command to populate default quota specs onto OpenShift/OpenStack resources.
src/coldfront_plugin_cloud/management/commands/fetch_daily_billable_usage.py Update imports for moved usage_models.
src/coldfront_plugin_cloud/management/commands/calculate_storage_gb_hours.py Drive storage billing from resource quota specs of type storage.
src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py New command to add a quota spec to a resource and create its allocation attribute type.
src/coldfront_plugin_cloud/management/commands/add_openshift_resource.py Remove IBM storage attribute wiring; add internal cluster name support.
src/coldfront_plugin_cloud/esi.py Remove ESI hard-coded quota mapping in favor of inherited dynamic behavior.
src/coldfront_plugin_cloud/base.py Load QuotaSpecs from resource attribute in allocator initialization.
src/coldfront_plugin_cloud/attributes.py Replace IBM storage availability attribute with Available Quota Resources; adjust registered allocation quota attributes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +215 to +217
obj_key = allocator.resource_quotaspecs.root[
attributes.QUOTA_OBJECT_GB
].quota_label
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QuotaSpec.quota_label for OpenStack quotas includes a service prefix (e.g. object.x-account-meta-quota-bytes), but OpenStackResourceAllocator.get_quota() returns quota keys without the service prefix (e.g. x-account-meta-quota-bytes). Using the full quota_label here means the Swift/object-quota assertions will never run (and the later indexing by obj_key would be wrong if it did). Consider splitting on the first . and using just the per-service key when interacting with get_quota() / Swift headers.

Suggested change
obj_key = allocator.resource_quotaspecs.root[
attributes.QUOTA_OBJECT_GB
].quota_label
full_obj_quota_label = allocator.resource_quotaspecs.root[
attributes.QUOTA_OBJECT_GB
].quota_label
obj_key = (
full_obj_quota_label.split(".", 1)[1]
if "." in full_obj_quota_label
else full_obj_quota_label
)

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuanMPhm please verify this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid bug

Comment on lines 147 to 158
@@ -157,10 +154,12 @@ def handle(self, *args, **options):

expected_value = allocation.get_attribute(attr)
current_value = quota.get(key, None)
if key == obj_key and expected_value <= 0:
if key == OPENSTACK_OBJ_KEY and expected_value <= 0:
expected_obj_value = 1
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the OpenStack validation loop, key = quotaspec.quota_label uses the full label (e.g. compute.instances / object.x-account-meta-quota-bytes), but OpenStackResourceAllocator.get_quota() returns a dict keyed by the service-local label (e.g. instances / x-account-meta-quota-bytes). This mismatch makes current_value lookup incorrect and also prevents the OPENSTACK_OBJ_KEY special-case from ever triggering. Consider splitting the quotaspec label (same logic as OpenStackResourceAllocator._extract_quota_label) and validating against the per-service key.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuanMPhm please verify this.

Copy link
Contributor Author

@QuanMPhm QuanMPhm Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid. Added assertion on log message to capture bug

for coldfront_attr in expected_coldfront_attrs:
resource_quotaspecs = allocator.resource_quotaspecs
for coldfront_attr, quota_spec in resource_quotaspecs.root.items():
if not allocation.get_attribute(coldfront_attr):
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not allocation.get_attribute(coldfront_attr): treats a legitimate quota value of 0 as “missing” and will overwrite it with the computed default. Since 0 is a valid quota for several specs, this should check explicitly for is None (or otherwise distinguish unset vs zero).

Suggested change
if not allocation.get_attribute(coldfront_attr):
if allocation.get_attribute(coldfront_attr) is None:

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuanMPhm this seems like a bug in the current implementation of the code (not PR) you may apply this while you're at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added small change to test case

Comment on lines +30 to +36
resource_quota_attr = resource_models.ResourceAttribute.objects.get(
resource=resource,
resource_attribute_type__name=attributes.RESOURCE_QUOTA_RESOURCES,
)
self.resource_quotaspecs = QuotaSpecs.model_validate(
json.loads(resource_quota_attr.value)
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceAllocator.__init__ unconditionally does ResourceAttribute.objects.get(...RESOURCE_QUOTA_RESOURCES...). If a resource doesn’t have this attribute yet (e.g. newly created resources before register_default_quotas / add_quota_to_resource runs), allocator construction will raise and break workflows like activate_allocation and validate_allocations. Consider handling the missing attribute case with a clearer error (e.g. raise CommandError with guidance) or defaulting to an empty quota spec set where appropriate.

Suggested change
resource_quota_attr = resource_models.ResourceAttribute.objects.get(
resource=resource,
resource_attribute_type__name=attributes.RESOURCE_QUOTA_RESOURCES,
)
self.resource_quotaspecs = QuotaSpecs.model_validate(
json.loads(resource_quota_attr.value)
)
try:
resource_quota_attr = resource_models.ResourceAttribute.objects.get(
resource=resource,
resource_attribute_type__name=attributes.RESOURCE_QUOTA_RESOURCES,
)
self.resource_quotaspecs = QuotaSpecs.model_validate(
json.loads(resource_quota_attr.value)
)
except resource_models.ResourceAttribute.DoesNotExist:
# If the resource does not yet have quota resources configured,
# default to an empty quota specification set.
self.resource_quotaspecs = QuotaSpecs.model_validate({})

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuanMPhm seems reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a clearer error message

Comment on lines +22 to +27
type=str,
help="Name of the Resource to modify.",
)
parser.add_argument(
"--display-name",
type=str,
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command-line args --resource-name and --display-name are not marked required=True. As written, running the command without them will pass None into Resource.objects.get(name=...) and fail with an unhelpful exception. Make these arguments required (and/or validate and raise a CommandError with a clear message).

Suggested change
type=str,
help="Name of the Resource to modify.",
)
parser.add_argument(
"--display-name",
type=str,
type=str,
required=True,
help="Name of the Resource to modify.",
)
parser.add_argument(
"--display-name",
type=str,
required=True,

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuanMPhm seems reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update Storage types in ColdFront

5 participants