Allow dynamic quota creation and removal by QuanMPhm · Pull Request #287 · nerc-project/coldfront-plugin-cloud

QuanMPhm · 2026-01-21T16:13:06Z

Closes nerc-project/operations#1391. Here is how I would suggest reviewing this PR.

Two CLI commands have been added, add_quota_to_resource.py and remove_quota_from_resource.py. I would suggest understanding those two commands first. These commands allow us to dynamically add/remove quotas instead of hard-coding them as is currently done. These commands don't impact the quota objects in the clusters, nor the quota attributes in allocations. Their full impact is illustrated when used within the typical user workflow, or in tandem with validate_allocations.py. I would now suggest checking the changes to functional/openshift/test_allocations.py to see the full implications of this PR. The other functional test cases only contain minor changes.

Afterwards, tasks.py, validate_allocations.py, and the allocator base and subclasses should be reviewed. They are the main consumers of quota information. All other changes are relatively minor.

This is a draft for now since I have some questions, and the tests are failing. I just wanted people to know my general direction with this feature.

I will wait for people's feedback before continuing work on this PR, since I assume substantial feedback will be given.

QuanMPhm · 2026-01-21T16:13:50Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+            defaults={"value": json.dumps(new_quota_dict)},
+        )
+
+        # TODO (Quan): Dict update allows migration of existing quotas. This is fine?


@knikolla @jtriley This is a pre-existing feature, so I assume the answer is yes. Just to make sure.

I don't think I fully understand this comment. Can you elaborate?

We currently allow migrating the quota's cluster label (i.e "limits.cpu" for Openshift CPUs) by changing the hardcoded values in the QUOTA_KEY_MAPPING of the appropriate allocator. This migration feature is demonstrated in the functional test that I linked.

Below my TODO comment:

    if not created:
        available_quotas_dict = json.loads(available_quotas_attr.value)
        available_quotas_dict.update(new_quota_dict)
        QuotaSpecs.model_validate(available_quotas_dict)  # Validate uniqueness
        available_quotas_attr.value = json.dumps(available_quotas_dict)
        available_quotas_attr.save()

I wanted to show that this migration feature will still be available, because if you decide to add the same quota to the same resource, available_quotas_dict.update(new_quota_dict) means you can update/migrate everything about the quota, including its cluster label (with the exception of the display name, which you've mentioned and I responded here).
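A minimal sketch of the update-then-validate flow described in this thread, using plain dictionaries in place of the Django attribute and the pydantic model. The `merge_quota` helper, the dictionary layout, and the uniqueness rule on `quota_label` are assumptions made for illustration, not the PR's actual code.

```python
import json


def merge_quota(stored_value: str, new_quota: dict) -> str:
    """Merge new_quota into a JSON-encoded quota dict and return the new JSON.

    Hypothetical stand-in: keys are display names and each value is a dict
    with at least a "quota_label" entry.
    """
    quotas = json.loads(stored_value)
    # dict.update overwrites an existing display name, which is what makes
    # "migrating" a quota (e.g. changing its cluster label) possible.
    quotas.update(new_quota)

    # Assumed validation rule: cluster labels must be unique across quotas.
    labels = [spec["quota_label"] for spec in quotas.values()]
    if len(labels) != len(set(labels)):
        raise ValueError("quota_label values must be unique")
    return json.dumps(quotas)


stored = json.dumps({"OpenShift Limit on CPU Quota": {"quota_label": "cpu"}})
migrated = merge_quota(
    stored, {"OpenShift Limit on CPU Quota": {"quota_label": "limits.cpu"}}
)
```

Re-adding the same display name replaces its whole entry, so the cluster label changes while the display name itself stays fixed.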

Understood.

QuanMPhm · 2026-01-21T16:15:56Z

src/coldfront_plugin_cloud/management/commands/calculate_storage_gb_hours.py

-                    "OpenStack Storage",
-                    openstack_nese_storage_rate,
-                )
+                # TODO (Quan): An illustration of how billing could be simplified. Shuold I follow with this?


@knikolla I couldn't do the same refactoring for the Openshift allocations because different storages have their own rates. I could have refactored the code further to circumvent that issue, but I didn't want the PR to be too long.

We need to find a way to fetch the rate from nerc-rates (as we discussed in a different comment thread in this PR) so that we can generate invoice rows for OpenShift storage. With the current changes this won't work because IBM storage is hardcoded.

Can I leave the refactoring of calculate_storage_gb_hours to a follow-up PR? Given my suggestion, the refactoring will also have to be coordinated with updates to nerc-rates. Since this is a management command, a maintenance window won't be necessary?

QuanMPhm · 2026-01-21T16:19:52Z

src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py

+            },
+        )
+
+        # TODO (Quan): What happens when a quota is removed? Should the attribute be removed from Coldfront?


@knikolla @jtriley @joachimweyl This also has implications for billing storage. This test case is failing here since I would like people's consensus on desired behavior.

My hunch is no, but I want to wait for @knikolla's input.

For now just have the quota be removed from the Resource Attribute but untouched in the allocations.
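The interim behavior agreed on here could be sketched as follows, assuming the resource's quotas are kept as a JSON-encoded dict keyed by display name. The `remove_quota` helper and the sample data are hypothetical and only illustrate that the allocation side is left alone.

```python
import json


def remove_quota(resource_quotas_json: str, display_name: str) -> str:
    """Drop display_name from the resource's JSON-encoded quota dict."""
    quotas = json.loads(resource_quotas_json)
    quotas.pop(display_name, None)  # removing a missing quota is a no-op
    return json.dumps(quotas)


# Hypothetical data: one resource-level quota dict, one allocation's attributes.
resource_quotas = json.dumps(
    {"OpenShift Limit on CPU Quota": {"quota_label": "limits.cpu"}}
)
allocation_attributes = {"OpenShift Limit on CPU Quota": 4}

resource_quotas = remove_quota(resource_quotas, "OpenShift Limit on CPU Quota")
# allocation_attributes is deliberately not modified by the removal.
```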

QuanMPhm · 2026-01-29T17:28:11Z

@knikolla I addressed all your suggestions on Slack except one:

To migrate the display name of an attribute
Before, since the attributes were stored in code, the migrations were also stored in code
Now, since the adding of new quota is a command, migrating the display name of an attribute should also be a command.

May I ask that I implement this feature in a subsequent PR, to prevent this PR from bloating even more? If not, I will implement this after I receive answers for my questions above.

joachimweyl · 2026-01-29T17:31:50Z

What is the impact of this omission?

QuanMPhm · 2026-01-29T21:47:44Z

@joachimweyl The impact is that changing the display names of attributes (the names users see in the Coldfront UI, e.g. OpenShift Limit on CPU Quota) will be a bit inconvenient. An admin will have to do some manual renaming in Coldfront. That is still doable, but not in a quick, programmatic way. Ideally we want a CLI command that makes renaming easier, but I didn't want this PR to take too long to review because of the February maintenance.

joachimweyl · 2026-01-30T16:01:19Z

Makes sense to me.

knikolla · 2026-02-05T14:52:30Z

src/coldfront_plugin_cloud/openstack.py

    def _get_network_quota(self, quotas, project_id):
        network_quota = self.network.show_quota(project_id)["quota"]
-        for k in self.QUOTA_KEY_MAPPING["network"]["keys"].values():
+        for cf_k in self.SERVICE_QUOTA_MAPPING["network"]:


You could have used the resource_type field of the QuotaSpec here. This will result in an error if not all quotaspecs are defined for OpenStack resources.

My original use for the resource_type field was to identify quotas that are processed by the storage billing script. If you believe I should also have a field that identifies a resource's Openstack resource type, is it fine if I have two fields then? Like:

    resource_type: str  # Which Openstack service (i.e compute, object) does a quota belong to?
    is_for_storage_billing: bool  # Is the quota checked by the storage billing script?

@QuanMPhm Wouldn't resource_type == "storage" be the same thing?

Ah, I see what you mean. In OpenStack it would be volume, so there isn't a 1-1 mapping. I really don't want to introduce a new is_for_storage_billing parameter, so perhaps you could use the quota label in a specific way for OpenStack. For example volume.volumes or compute.vcpu, with the part before the . signifying the service and the part after it the type.
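A small sketch of how such a dotted label might be parsed. The `split_quota_label` helper is hypothetical, and the convention would only apply to OpenStack labels (an OpenShift label such as limits.cpu also contains a dot but means something else).

```python
def split_quota_label(label: str) -> tuple[str, str]:
    """Split "<service>.<type>" into (service, type)."""
    service, sep, quota_type = label.partition(".")
    if not sep or not service or not quota_type:
        raise ValueError(f"expected '<service>.<type>', got {label!r}")
    return service, quota_type
```

With this, `split_quota_label("volume.volumes")` yields the service `volume` and the type `volumes`, while a label without a dot is rejected.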

So that I know moving forward, what is your reasoning against a new is_for_storage_billing parameter when compared to the other option? It seems that in either case there's some "new information" that the developer has to be aware of when maintaining the code (a new quota field, or how the Openstack quota label is parsed), which makes them seem equally burdensome to me.

This is one of those situations where the burden becomes clear as the project evolves and new requirements are added.

If tomorrow we need to treat network quotas and some other type of quotas differently, it is easier to check resource_type == "network" than to add a new is_network_quota or is_gpu_quota. You'd need N attributes for N different resource types as opposed to one attribute with a flexible string.

The developer will have to be aware of this regardless.
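To illustrate the single-string approach, here is a sketch that selects quotas by `resource_type`. The `QuotaSpecSketch` dataclass, the `quotas_of_type` helper, the `"compute"` type value, and the `nese-storage` label are all made up for the example; the real QuotaSpec is a pydantic model whose exact fields are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaSpecSketch:
    display_name: str
    quota_label: str
    resource_type: str  # one flexible string instead of N boolean flags


def quotas_of_type(specs, resource_type):
    """Return the display names of all quotas with the given resource_type."""
    return [s.display_name for s in specs if s.resource_type == resource_type]


specs = [
    # Label and type values below are placeholders, not the deployed ones.
    QuotaSpecSketch("OpenShift Limit on CPU Quota", "limits.cpu", "compute"),
    QuotaSpecSketch(
        "OpenShift Request on NESE Storage Quota (GiB)", "nese-storage", "storage"
    ),
]

storage_quotas = quotas_of_type(specs, "storage")
```

Handling a new category later would mean passing a different string to `quotas_of_type` rather than adding another field to the spec.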

knikolla · 2026-02-05T14:57:54Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+            defaults={"value": json.dumps(new_quota_dict)},
+        )
+
+        # TODO (Quan): Dict update allows migration of existing quotas. This is fine?


I don't think I fully understand this comment. Can you elaborate?

naved001

thanks @QuanMPhm! Some basic questions in there as I try to refresh my memory of coldfront. Will do another pass.

naved001 · 2026-02-05T15:00:21Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+class Command(BaseCommand):
+    def add_arguments(self, parser):
+        parser.add_argument(
+            "--display_name",


you have an underscore instead of a dash.

--display_name -> --display-name

naved001 · 2026-02-05T15:03:51Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+            help="The default quota value for the storage attribute. In GB",
+        )
+        parser.add_argument(
+            "--resource_name",


--resource_name -> --resource-name

naved001 · 2026-02-05T15:09:50Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+            type=str,
+            default="",
+            help="Name of quota as it appears on invoice. Required if --is-storage-type is set.",
+        )


how come you didn't specify dest= for some of these arguments?

I normally wouldn't include dest=, and I didn't review closely enough the code that Copilot generated for me. I've removed the dest=. Apologies.
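For context on why dest= is not needed: argparse derives the destination name from the long option and turns dashes into underscores, so dashed flags still produce keys such as options["display_name"]. A standalone sketch follows, with a plain ArgumentParser standing in for the parser Django passes to add_arguments; the resource name `my-cluster` is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser(prog="add_quota_to_resource")
# No dest= given: argparse infers "display_name", "resource_name", "invoice_name".
parser.add_argument("--display-name", type=str, required=True)
parser.add_argument("--resource-name", type=str, required=True)
parser.add_argument("--invoice-name", type=str, default="")

options = vars(
    parser.parse_args(
        [
            "--display-name",
            "OpenShift Limit on CPU Quota",
            "--resource-name",
            "my-cluster",
        ]
    )
)
```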

naved001 · 2026-02-05T16:01:14Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+    def handle(self, *args, **options):
+        if options["resource_type"] == "storage" and not options["invoice_name"]:
+            logger.error(
+                "--invoice-name must be provided when storage type is  `storage`."


"when resource type is storage."

My idea is any quota that is relevant for storage billing should have the resource type storage, such as:

    QUOTA_REQUESTS_IBM_STORAGE = "OpenShift Request on IBM Storage Quota (GiB)"
    QUOTA_REQUESTS_NESE_STORAGE = "OpenShift Request on NESE Storage Quota (GiB)"

sure, I am just pointing out that the error message says "storage type is storage" instead of "resource type is storage". You are checking options["resource_type"], there's no storage_type

naved001 · 2026-02-05T16:15:31Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+            "--invoice-name",
+            type=str,
+            default="",
+            help="Name of quota as it appears on invoice. Required if --is-storage-type is set.",


where's --is-storage-type? Did you mean --resource-type is set to storage?

Ah yes. My bad

naved001 · 2026-02-05T16:38:47Z

src/coldfront_plugin_cloud/management/commands/add_openshift_resource.py

            else options["name"],
        )
+
+        # Add common Openshift resources (cpu, memory, etc)


remind me how these resources were created before this?

Currently, the information for these quotas is spread across multiple places in the repo. The display names are in attributes.py, the multiplier and static quantities are in tasks.py, and other info lives elsewhere. The allocation attributes for these quotas were loaded by register_cloud_attributes.py, which consumes the attributes defined in attributes.py.

A by-product of this PR is that now all that info is created and stored in one place.

naved001 · 2026-02-11T16:38:29Z

src/coldfront_plugin_cloud/management/commands/remove_quota_from_resource.py

+
+    def add_arguments(self, parser):
+        parser.add_argument(
+            "--resource_name",


--resource_name -> --resource-name

naved001 · 2026-02-11T16:38:47Z

src/coldfront_plugin_cloud/management/commands/remove_quota_from_resource.py

+            help="Name of the Resource to modify.",
+        )
+        parser.add_argument(
+            "--display_name",


--display_name -> --display-name

knikolla · 2026-02-12T16:59:45Z

src/coldfront_plugin_cloud/management/commands/add_openshift_resource.py

+        # Add common Openshift resources (cpu, memory, etc)
+        call_command(
+            "add_quota_to_resource",
+            display_name=attributes.QUOTA_LIMITS_CPU,


Is it still necessary to keep these values hardcoded in the attributes module?

I didn't remove the hardcoded values from attributes since a few files still need to reference the quota strings, namely the test cases, openstack.py, storage billing, and count_gpu_usage.py. I thought it didn't make sense to move the hardcoded strings to the test files or elsewhere, since that felt like moving the problem somewhere else and would lead to a lot of cleanup.

With how the test cases are, I can't see how the hardcoded strings can be entirely removed.

I would like to see a follow-up PR at some point that moves all the quota attributes from attributes.py to a module within the tests to send a strong signal that the hardcoded attributes should only ever be used for the test cases.

openstack.py and storage billing shouldn't need to use the hardcoded attributes. Why do they need to?

Given all your suggestions so far, I'll remove the hardcoded attributes from openstack.py. For storage billing, I just need to ask one last thing:

To bill for storage, the storage script needs two things:

Name of allocation attribute to check

Name of the storage's su charge on nerc-rates

First piece of info can be in resource_type as we discussed before. I would like your thumbs up on adding a second field for the nerc-rates key. Something like nerc_rates_key?

Name of allocation attribute to check = [ display name of quotas within a given allocation that as per the QuotaSpec of that Resource are of resource_type == storage ]

I don't like the idea of introducing a key that is specific to our instance of deployment of ColdFront by naming it nerc_rates_key. I would like all NERC specific business logic to be restricted to the management command files whenever possible. It should be possible to fetch a Storage rate from the NERC rates file by having the name of the attribute conform to a specific form. Right now we only have rates for NESE and therefore whenever we need to introduce a new rate we can have it conform to a specific convention.

@knikolla @naved001 @joachimweyl Would a substring search suffice for matching a Coldfront display name to a nerc-rates key?

I.e for a storage quota attribute like OpenShift Request on NESE Storage Quota (GiB), it can match to the nerc-rates key NESE Storage GB Rate if we only check for the substring NESE Storage in both strings.

It can be even simpler if we rename the quota attribute to OpenShift Request on NESE Storage (GiB) Quota and the nerc-rates key to NESE Storage (GiB) Rate. Searching on NESE Storage (GiB) would allow generalization on units.

I think that makes sense.
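The substring idea could look roughly like the sketch below. The `find_rate_key` helper and the `CPU SU Rate` key are hypothetical, and how the shared marker (here `NESE Storage`) would be derived from the attribute name is left open in this thread.

```python
def find_rate_key(display_name: str, rate_keys: list[str], marker: str) -> str:
    """Return the single rate key that shares `marker` with display_name."""
    if marker not in display_name:
        raise ValueError(f"{marker!r} not found in {display_name!r}")
    matches = [key for key in rate_keys if marker in key]
    if len(matches) != 1:
        # Refuse to guess when the marker is missing or ambiguous.
        raise ValueError(
            f"expected exactly one rate key containing {marker!r}, got {matches}"
        )
    return matches[0]


# "CPU SU Rate" is a made-up key used only to show that non-matches are ignored.
rate_keys = ["NESE Storage GB Rate", "CPU SU Rate"]
key = find_rate_key(
    "OpenShift Request on NESE Storage Quota (GiB)", rate_keys, "NESE Storage"
)
```

Failing loudly on zero or multiple matches keeps a naming mistake from silently billing at the wrong rate.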

QuanMPhm · 2026-02-13T16:48:59Z

@knikolla @naved001 A one-time command migrate_resource_quota.py has been added.

EDIT: Sorry, I've missed some comments. I'll get to work on them.

QuanMPhm · 2026-02-25T19:41:01Z

@knikolla @naved001 This PR is waiting on review. This is my last question

naved001 · 2026-02-26T14:33:21Z

src/coldfront_plugin_cloud/management/commands/register_default_quotas.py

+
+
+class Command(BaseCommand):
+    help = """One time command to migrate quotas to each Openshift and Openstack resource"""


would anything bad happen if I run this command twice? Just wondering since it's mentioned "One time" use only.

Is this command idempotent? If it isn't, it should be.

If this command was called register_default_quotas and its purpose was to register default quotas (which at this current moment is the same as migrating resource quotas) then you could remove the duplicated code from add_openshift_resource that does the same (and also from the tests). (This second point can be done in a follow-up patch)

I will make sure the command is idempotent, and will add a test case for it.
@knikolla @naved001 Do note that the quotas expected on Openstack and ESI resources are quite different. ESI only requires floating IPs and networks, at least, in the current state of the code. This means if we're adding an ESI resource, quotas may need to be manually added through add_quota_to_resource
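One way to think about idempotency here, as a sketch only: registering defaults should produce the same state whether it runs once or many times. The `register_default_quotas` function below works on plain dicts and is hypothetical; it chooses to keep entries that already exist, which is one possible policy and not necessarily what the command does.

```python
def register_default_quotas(existing: dict, defaults: dict) -> dict:
    """Return a copy of `existing` plus any defaults not already present."""
    merged = dict(existing)
    for display_name, spec in defaults.items():
        # setdefault keeps what is already registered, so reruns change nothing.
        merged.setdefault(display_name, spec)
    return merged


defaults = {"OpenShift Limit on CPU Quota": {"quota_label": "limits.cpu"}}
once = register_default_quotas({}, defaults)
twice = register_default_quotas(once, defaults)
```

A test for the real command could follow the same shape: run it twice and compare the resulting resource attributes.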

naved001 · 2026-02-26T14:33:57Z

src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py

+    def handle(self, *args, **options):
+        if options["resource_type"] == "storage" and not options["invoice_name"]:
+            logger.error(
+                "--invoice-name must be provided when reousrce type is `storage`."


type, reousrce -> resource

lol you don't have to apologize. Ironically, I also made a typo in my original comment: type->typo

naved001 · 2026-02-26T14:35:23Z

src/coldfront_plugin_cloud/management/commands/register_default_quotas.py

+            )
+
+        # Define quotas for each resource type
+        openshift_quotas = [


do we need to worry about migrating openshift vm quotas? I know we don't have any current deployment so maybe not?

It can be done in a follow-up patch if not already here. Management commands can be deployed as a separate version debug container for their execution so they don't necessarily require a maintenance window to upgrade.

knikolla

Good work! I think this will be ready with a few minor iterations mostly related to fetching rate data from invoicing and some polish here and there.

I'll do a deeper pass in the afternoon but I don't expect anything big to jump out. Again, good work!

knikolla · 2026-02-26T15:15:49Z

src/coldfront_plugin_cloud/models/quota_models.py

+class QuotaSpec(pydantic.BaseModel):
+    """
+    Fields:
+    - quota_label: human readable label for the quota (must be unique across the dict)


This description doesn't match what this is. This is the cluster side identifier of the quota.

knikolla · 2026-02-26T15:22:14Z

src/coldfront_plugin_cloud/management/commands/calculate_storage_gb_hours.py

-                    "OpenStack Storage",
-                    openstack_nese_storage_rate,
-                )
+                # TODO (Quan): An illustration of how billing could be simplified. Shuold I follow with this?


We need to find a way to fetch the rate from nerc-rates (as we discussed in a different comment thread in this PR) so that we can generate invoice rows for OpenShift storage. With the current changes this won't work because IBM storage is hardcoded.

knikolla · 2026-02-26T15:25:00Z

src/coldfront_plugin_cloud/openstack.py

-        quota_key: quota_name
-        for k in QUOTA_KEY_MAPPING.values()
-        for quota_key, quota_name in k["keys"].items()
+    SERVICE_QUOTA_MAPPING = {


I don't think this is necessary any longer? You're only using it in set_quota to fetch the list of quotas, and that can be obtained from the resource quota spec by grouping on the resource key.
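A sketch of the grouping suggested here, assuming "resource key" refers to the `resource_type` field of each quota spec. The `group_labels_by_type` helper and the sample labels are hypothetical; specs are modeled as plain dicts.

```python
from collections import defaultdict


def group_labels_by_type(specs: list[dict]) -> dict[str, list[str]]:
    """Group quota labels by resource_type, preserving input order."""
    grouped = defaultdict(list)
    for spec in specs:
        grouped[spec["resource_type"]].append(spec["quota_label"])
    return dict(grouped)


# Sample values are placeholders, not the real OpenStack quota keys.
specs = [
    {"resource_type": "network", "quota_label": "floatingip"},
    {"resource_type": "compute", "quota_label": "cores"},
    {"resource_type": "network", "quota_label": "network"},
]
by_type = group_labels_by_type(specs)
```

The result plays the role of the removed mapping: `by_type["network"]` lists the labels to read from or write to the network service.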

QuanMPhm requested review from Milstein, jtriley, knikolla and naved001 January 21, 2026 16:13

QuanMPhm marked this pull request as draft January 21, 2026 18:05

QuanMPhm force-pushed the ops_1391/final branch 6 times, most recently from b3c58d8 to 35273aa Compare January 29, 2026 17:08

QuanMPhm marked this pull request as ready for review February 4, 2026 18:24

knikolla reviewed Feb 5, 2026

knikolla requested changes Feb 5, 2026

naved001 reviewed Feb 5, 2026

QuanMPhm force-pushed the ops_1391/final branch from 35273aa to 01021dd Compare February 5, 2026 20:10

naved001 reviewed Feb 11, 2026

QuanMPhm force-pushed the ops_1391/final branch from 01021dd to 0cf04ea Compare February 12, 2026 16:57

knikolla reviewed Feb 12, 2026

QuanMPhm requested a review from knikolla February 13, 2026 16:49

QuanMPhm requested review from joachimweyl and naved001 February 13, 2026 16:49

QuanMPhm force-pushed the ops_1391/final branch from 0cf04ea to 0c68d91 Compare February 13, 2026 16:49

naved001 reviewed Feb 26, 2026

knikolla reviewed Feb 26, 2026

Allow dynamic quota creation and removal

0596382

QuanMPhm force-pushed the ops_1391/final branch from 0c68d91 to 0596382 Compare February 27, 2026 05:11
