Allow dynamic quota creation and removal#287
Allow dynamic quota creation and removal#287QuanMPhm wants to merge 1 commit intonerc-project:mainfrom
Conversation
src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py
Outdated
Show resolved
Hide resolved
src/coldfront_plugin_cloud/management/commands/calculate_storage_gb_hours.py
Outdated
Show resolved
Hide resolved
src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py
Outdated
Show resolved
Hide resolved
b3c58d8 to
35273aa
Compare
|
@knikolla I addressed all your suggestions on Slack except one:
May I ask that I implement this feature in a subsequent PR, to prevent this PR from bloating even more? If not, I will implement this after I receive answers for my questions above. |
|
What is the impact of this omission? |
|
@joachimweyl The impact will be that to change the display names of attributes (the names that users will see in the Coldfront UI, i.e |
|
Makes sense to me. |
src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py
Outdated
Show resolved
Hide resolved
src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py
Outdated
Show resolved
Hide resolved
src/coldfront_plugin_cloud/management/commands/add_openshift_resource.py
Show resolved
Hide resolved
| class Command(BaseCommand): | ||
| def add_arguments(self, parser): | ||
| parser.add_argument( | ||
| "--display_name", |
There was a problem hiding this comment.
you have an underscore instead of a dash.
--display_name -> --display-name
| help="The default quota value for the storage attribute. In GB", | ||
| ) | ||
| parser.add_argument( | ||
| "--resource_name", |
There was a problem hiding this comment.
--resource_name -> --resource-name
| type=str, | ||
| default="", | ||
| help="Name of quota as it appears on invoice. Required if --is-storage-type is set.", | ||
| ) |
There was a problem hiding this comment.
how come you didn't specify dest= for some of these arguments?
There was a problem hiding this comment.
I normally wouldn't include dest=, and didn't review closely enough what Copilot generated this code for me. I've removed the dest=. Apologies
| def handle(self, *args, **options): | ||
| if options["resource_type"] == "storage" and not options["invoice_name"]: | ||
| logger.error( | ||
| "--invoice-name must be provided when storage type is `storage`." |
There was a problem hiding this comment.
"when resource type is storage."
There was a problem hiding this comment.
My idea is any quota that is relevant for storage billing should have the resource type storage, such as:
QUOTA_REQUESTS_IBM_STORAGE = "OpenShift Request on IBM Storage Quota (GiB)"
QUOTA_REQUESTS_NESE_STORAGE = "OpenShift Request on NESE Storage Quota (GiB)"
There was a problem hiding this comment.
sure, I am just pointing out that the error message says "storage type is storage" instead of "resource type is storage". You are checking options["resource_type"], there's no storage_type
| "--invoice-name", | ||
| type=str, | ||
| default="", | ||
| help="Name of quota as it appears on invoice. Required if --is-storage-type is set.", |
There was a problem hiding this comment.
where's --is-storage-type? Did you mean --resource-type is set to storage?
| else options["name"], | ||
| ) | ||
|
|
||
| # Add common Openshift resources (cpu, memory, etc) |
There was a problem hiding this comment.
remind how were these resources created before this?
There was a problem hiding this comment.
Currently, the information for these quotas are spread in multiple places in the repo. The display names are in attributes.py, the multiplier and static quantities are in tasks.py, other info in other places. The allocation attributes for these quotas were loaded by register_cloud_attributes.py, which consumes the attributes defined in attributes.py.
A by-product of this PR is that now all that info is created and stored in one place.
35273aa to
01021dd
Compare
|
|
||
| def add_arguments(self, parser): | ||
| parser.add_argument( | ||
| "--resource_name", |
There was a problem hiding this comment.
--resource_name -> --resource-name
| help="Name of the Resource to modify.", | ||
| ) | ||
| parser.add_argument( | ||
| "--display_name", |
There was a problem hiding this comment.
--display_name -> --display-name
01021dd to
0cf04ea
Compare
src/coldfront_plugin_cloud/management/commands/add_openshift_resource.py
Outdated
Show resolved
Hide resolved
0cf04ea to
0c68d91
Compare
|
@knikolla @naved001 This PR is waiting on review. This is my last question |
src/coldfront_plugin_cloud/management/commands/register_default_quotas.py
Show resolved
Hide resolved
| def handle(self, *args, **options): | ||
| if options["resource_type"] == "storage" and not options["invoice_name"]: | ||
| logger.error( | ||
| "--invoice-name must be provided when reousrce type is `storage`." |
There was a problem hiding this comment.
type, reousrce -> resource
There was a problem hiding this comment.
lol you don't have to apologize. Ironically, I also made at typo in my original comment: type->typo
| ) | ||
|
|
||
| # Define quotas for each resource type | ||
| openshift_quotas = [ |
There was a problem hiding this comment.
do we need to worry about migrating openshift vm quotas? I know we don't have any current deployment so maybe not?
There was a problem hiding this comment.
It can be done in a follow-up patch if not already here. Management commands can be deployed as a separate version debug container for their execution so they don't necessarily require a maintenance window to upgrade.
There was a problem hiding this comment.
I've included openshift vm resources in the command. However, it will have the quotas defined in openshift_quotas, so some GPU quotas (i.e H100, A100) will be missing
knikolla
left a comment
There was a problem hiding this comment.
Good work! I think this will be ready with a few minor iterations mostly related to fetching rate data from invoicing and some polish here and there.
I'll do a deeper pass in the afternoon but I don't expect anything big to jump out. Again, good work!
src/coldfront_plugin_cloud/management/commands/calculate_storage_gb_hours.py
Outdated
Show resolved
Hide resolved
12789b2 to
29ff0dc
Compare
|
@knikolla @naved001 I have responded to all comments so far. In response to this comment, I decided to use |
knikolla
left a comment
There was a problem hiding this comment.
Looks good, some questions.
src/coldfront_plugin_cloud/management/commands/register_default_quotas.py
Show resolved
Hide resolved
src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py
Outdated
Show resolved
Hide resolved
src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py
Show resolved
Hide resolved
|
Adding copilot to check if I missed something. If there's valid feedback I will tag you on the relevant comments. |
There was a problem hiding this comment.
Pull request overview
This PR moves quota definitions from hard-coded mappings into per-resource, dynamically managed quota specs stored on the resource, and updates allocators/tasks/commands/tests to consume those specs. It adds management commands to register defaults and to add/remove quotas on a resource so quotas can evolve without code changes.
Changes:
- Introduce
QuotaSpec/QuotaSpecsmodels and load quota specs from a resource attribute (Available Quota Resources) in allocators. - Add management commands to register default quotas and to add/remove quota specs on a resource.
- Refactor OpenShift/OpenStack allocators,
tasks.activate_allocation, allocation validation, and storage billing to use resource-defined quota specs; update functional/unit tests accordingly.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coldfront_plugin_cloud/tests/unit/test_usage_models.py | Update imports for moved usage_models. |
| src/coldfront_plugin_cloud/tests/unit/test_register_default_quotas.py | New unit tests for register_default_quotas behavior/idempotency. |
| src/coldfront_plugin_cloud/tests/unit/test_fetch_daily_billable_usage.py | Update imports for moved usage_models. |
| src/coldfront_plugin_cloud/tests/unit/test_calculate_quota_unit_hours.py | Ensure test resources have quota specs via add_quota_to_resource. |
| src/coldfront_plugin_cloud/tests/unit/openshift/test_rbac.py | Switch to shared unit OpenShift base allocator. |
| src/coldfront_plugin_cloud/tests/unit/openshift/base.py | Add mock allocator wrapper to avoid real allocator initialization dependencies. |
| src/coldfront_plugin_cloud/tests/functional/openstack/test_allocation.py | Adjust tests to rely on registered default quotas and resource quota specs. |
| src/coldfront_plugin_cloud/tests/functional/openshift_vm/test_allocation.py | Register defaults and customize VM GPU quotas via add/remove quota commands. |
| src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py | Register defaults and expand tests around adding/removing quotas dynamically. |
| src/coldfront_plugin_cloud/tests/functional/esi/test_allocations.py | Set up ESI quota specs dynamically in test setup. |
| src/coldfront_plugin_cloud/tests/base.py | Update OpenShift resource creation signature (internal_name) and remove IBM storage flag. |
| src/coldfront_plugin_cloud/tasks.py | Replace hard-coded quota math with QuotaSpec-driven computation. |
| src/coldfront_plugin_cloud/openstack.py | Replace static quota mappings with resource-defined quota specs grouped by service. |
| src/coldfront_plugin_cloud/openshift_vm.py | Remove VM-specific hard-coded quota mapping in favor of resource-defined specs. |
| src/coldfront_plugin_cloud/openshift.py | Remove hard-coded quota mapping; format quotas via resource-defined specs. |
| src/coldfront_plugin_cloud/models/usage_models.py | New module for pydantic usage/charges models (moved from old location). |
| src/coldfront_plugin_cloud/models/quota_models.py | New pydantic models for quota specs and validation. |
| src/coldfront_plugin_cloud/management/commands/validate_allocations.py | Validate allocations based on per-resource quota specs rather than hard-coded mappings. |
| src/coldfront_plugin_cloud/management/commands/remove_quota_from_resource.py | New command to remove a quota spec from a resource. |
| src/coldfront_plugin_cloud/management/commands/register_default_quotas.py | New command to populate default quota specs onto OpenShift/OpenStack resources. |
| src/coldfront_plugin_cloud/management/commands/fetch_daily_billable_usage.py | Update imports for moved usage_models. |
| src/coldfront_plugin_cloud/management/commands/calculate_storage_gb_hours.py | Drive storage billing from resource quota specs of type storage. |
| src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py | New command to add a quota spec to a resource and create its allocation attribute type. |
| src/coldfront_plugin_cloud/management/commands/add_openshift_resource.py | Remove IBM storage attribute wiring; add internal cluster name support. |
| src/coldfront_plugin_cloud/esi.py | Remove ESI hard-coded quota mapping in favor of inherited dynamic behavior. |
| src/coldfront_plugin_cloud/base.py | Load QuotaSpecs from resource attribute in allocator initialization. |
| src/coldfront_plugin_cloud/attributes.py | Replace IBM storage availability attribute with Available Quota Resources; adjust registered allocation quota attributes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| obj_key = allocator.resource_quotaspecs.root[ | ||
| attributes.QUOTA_OBJECT_GB | ||
| ].quota_label |
There was a problem hiding this comment.
QuotaSpec.quota_label for OpenStack quotas includes a service prefix (e.g. object.x-account-meta-quota-bytes), but OpenStackResourceAllocator.get_quota() returns quota keys without the service prefix (e.g. x-account-meta-quota-bytes). Using the full quota_label here means the Swift/object-quota assertions will never run (and the later indexing by obj_key would be wrong if it did). Consider splitting on the first . and using just the per-service key when interacting with get_quota() / Swift headers.
| obj_key = allocator.resource_quotaspecs.root[ | |
| attributes.QUOTA_OBJECT_GB | |
| ].quota_label | |
| full_obj_quota_label = allocator.resource_quotaspecs.root[ | |
| attributes.QUOTA_OBJECT_GB | |
| ].quota_label | |
| obj_key = ( | |
| full_obj_quota_label.split(".", 1)[1] | |
| if "." in full_obj_quota_label | |
| else full_obj_quota_label | |
| ) |
There was a problem hiding this comment.
This is a valid bug
| @@ -157,10 +154,12 @@ def handle(self, *args, **options): | |||
|
|
|||
| expected_value = allocation.get_attribute(attr) | |||
| current_value = quota.get(key, None) | |||
| if key == obj_key and expected_value <= 0: | |||
| if key == OPENSTACK_OBJ_KEY and expected_value <= 0: | |||
| expected_obj_value = 1 | |||
There was a problem hiding this comment.
In the OpenStack validation loop, key = quotaspec.quota_label uses the full label (e.g. compute.instances / object.x-account-meta-quota-bytes), but OpenStackResourceAllocator.get_quota() returns a dict keyed by the service-local label (e.g. instances / x-account-meta-quota-bytes). This mismatch makes current_value lookup incorrect and also prevents the OPENSTACK_OBJ_KEY special-case from ever triggering. Consider splitting the quotaspec label (same logic as OpenStackResourceAllocator._extract_quota_label) and validating against the per-service key.
There was a problem hiding this comment.
Valid. Added assertion on log message to capture bug
| for coldfront_attr in expected_coldfront_attrs: | ||
| resource_quotaspecs = allocator.resource_quotaspecs | ||
| for coldfront_attr, quota_spec in resource_quotaspecs.root.items(): | ||
| if not allocation.get_attribute(coldfront_attr): |
There was a problem hiding this comment.
if not allocation.get_attribute(coldfront_attr): treats a legitimate quota value of 0 as “missing” and will overwrite it with the computed default. Since 0 is a valid quota for several specs, this should check explicitly for is None (or otherwise distinguish unset vs zero).
| if not allocation.get_attribute(coldfront_attr): | |
| if allocation.get_attribute(coldfront_attr) is None: |
There was a problem hiding this comment.
@QuanMPhm this seems like a bug in the current implementation of the code (not PR) you may apply this while you're at it.
There was a problem hiding this comment.
Added small change to test case
| resource_quota_attr = resource_models.ResourceAttribute.objects.get( | ||
| resource=resource, | ||
| resource_attribute_type__name=attributes.RESOURCE_QUOTA_RESOURCES, | ||
| ) | ||
| self.resource_quotaspecs = QuotaSpecs.model_validate( | ||
| json.loads(resource_quota_attr.value) | ||
| ) |
There was a problem hiding this comment.
ResourceAllocator.__init__ unconditionally does ResourceAttribute.objects.get(...RESOURCE_QUOTA_RESOURCES...). If a resource doesn’t have this attribute yet (e.g. newly created resources before register_default_quotas / add_quota_to_resource runs), allocator construction will raise and break workflows like activate_allocation and validate_allocations. Consider handling the missing attribute case with a clearer error (e.g. raise CommandError with guidance) or defaulting to an empty quota spec set where appropriate.
| resource_quota_attr = resource_models.ResourceAttribute.objects.get( | |
| resource=resource, | |
| resource_attribute_type__name=attributes.RESOURCE_QUOTA_RESOURCES, | |
| ) | |
| self.resource_quotaspecs = QuotaSpecs.model_validate( | |
| json.loads(resource_quota_attr.value) | |
| ) | |
| try: | |
| resource_quota_attr = resource_models.ResourceAttribute.objects.get( | |
| resource=resource, | |
| resource_attribute_type__name=attributes.RESOURCE_QUOTA_RESOURCES, | |
| ) | |
| self.resource_quotaspecs = QuotaSpecs.model_validate( | |
| json.loads(resource_quota_attr.value) | |
| ) | |
| except resource_models.ResourceAttribute.DoesNotExist: | |
| # If the resource does not yet have quota resources configured, | |
| # default to an empty quota specification set. | |
| self.resource_quotaspecs = QuotaSpecs.model_validate({}) |
There was a problem hiding this comment.
Added a clearer error message
src/coldfront_plugin_cloud/management/commands/remove_quota_from_resource.py
Show resolved
Hide resolved
| type=str, | ||
| help="Name of the Resource to modify.", | ||
| ) | ||
| parser.add_argument( | ||
| "--display-name", | ||
| type=str, |
There was a problem hiding this comment.
The command-line args --resource-name and --display-name are not marked required=True. As written, running the command without them will pass None into Resource.objects.get(name=...) and fail with an unhelpful exception. Make these arguments required (and/or validate and raise a CommandError with a clear message).
| type=str, | |
| help="Name of the Resource to modify.", | |
| ) | |
| parser.add_argument( | |
| "--display-name", | |
| type=str, | |
| type=str, | |
| required=True, | |
| help="Name of the Resource to modify.", | |
| ) | |
| parser.add_argument( | |
| "--display-name", | |
| type=str, | |
| required=True, |
Closes nerc-project/operations#1391. This is how I would suggest to review this PR.
Two CLI commands have been added,
add_quota_to_resource.pyandremove_quota_from_resource.py. I would suggest understanding those two commands first. These commands allow us to dynamically add/remove quotas instead of having them hard-coded as they are currently done. These commands don't impact the quota objects in the clusters, nor the quota attributes in allocations. Their full impact is illustrated when used within the typical user workflow, or in tandem withvalidate_allocations.py. I would now suggest checking the changes tofunctional/openshift/test_allocations.pyto see the full implications of this PR. The other functional test cases only contain minor changes.Afterwards,
tasks.py,validate_allocations.py, and the allocator base and subclasses should be reviewed. They are the main consumers of quota information. All other changes relatively minor.Follow-up issues to write based on these comments:
openshift vm quotas
idempotency for
register_default_quotasstorage billing
Clean-up hard-coded attributes