NERC Allocation Revocation Workflow#28

Open
knikolla wants to merge 2 commits intoCCI-MOC:masterfrom
knikolla:nerc/coldfront/revocation

@knikolla

No description provided.

Contributor

@joachimweyl left a comment

a few small changes and a question or two added in comments.

@knikolla requested a review from joachimweyl on January 21, 2026 at 15:50

| **Phase** | **Duration** (Days) | **System Actions** | **User Impact** |
|--------------------------|---------------------|------------------------------------------------------------------|-------------------------------------------------------------------|
| **Renewal Grace Period** | 0-30 | Status changed to **Active (Needs Renewal)**. Notification sent. | No impact. |

Once an allocation's status changes to Active (Needs Renewal), the user must follow up with an administrator. At this stage, an administrator has to manually update the status to Active; only then does ColdFront allow the user to submit change requests.

Please ensure that when the admin updates the allocation status, the End Date is also extended by one year (or some other extension period?) as part of the same update.

Author

Yes, that is correct.
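
For illustration, the renewal step confirmed above could be sketched as follows. This is a minimal sketch, not ColdFront code: the `Allocation` class, the `renew` function, and the 365-day `EXTENSION` are hypothetical names and values chosen for the example, and the actual extension period is still an open question in this thread.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "one year" is modeled as 365 days; the real period is undecided.
EXTENSION = timedelta(days=365)


@dataclass
class Allocation:
    # Hypothetical stand-in for an allocation record, not the ColdFront model.
    status: str
    end_date: date


def renew(allocation: Allocation) -> Allocation:
    """Set the status back to Active and extend the end date in one step."""
    if allocation.status != "Active (Needs Renewal)":
        raise ValueError(f"cannot renew from status {allocation.status!r}")
    return Allocation(status="Active", end_date=allocation.end_date + EXTENSION)


renewed = renew(Allocation("Active (Needs Renewal)", date(2026, 1, 1)))
print(renewed.status, renewed.end_date)
```

Doing both changes in one function reflects the request that the status update and the End Date extension happen together rather than as two separate admin actions.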


For OpenStack, revocation is implemented by deleting all VMs and networking objects and switching the project status to disabled to prevent further access. Object storage and volumes are preserved. After the storage grace period, the remaining storage resources are deleted.

For OpenShift, revocation is implemented by deleting all Pods, Deployments, Jobs, CronJobs and other resources that pertain to compute or networking. Persistent Volumes, ConfigMaps and Secrets are preserved during the storage grace period. After the storage grace period, those resources are deleted too.

Including the namespace?

Author

The namespace would be the very last thing that gets deleted, after the storage grace period. We can also preserve the namespace, if you prefer that. Are there any reasons in particular for preserving it?

Contributor

@knikolla, how does invoicing use namespaces? Will deleting it mid-month cause any issues with the invoice script?

Author

I don't see how there will be any issues, but just to triple check:

@naved001 would deleting a namespace have any effect on collection of metrics up to its point of deletion?

Contributor

@knikolla I don't think it should matter. In my mind, it's the same as how a pod's metrics remain (up to the retention period) even after the pod object is deleted. It should be the same for all the pods in a namespace that gets deleted.

And we also ship metrics off to S3 every day anyway.

That being said, I will do a quick test and update this comment. Can't be too careful with billing stuff.

Contributor

Okay, I tested this and I can say it's safe to delete the namespace. Four hours after deleting the namespace, I can still query Prometheus and get the metrics for the pods in the deleted namespace.

Author

Thanks for checking!
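
To summarize the deletion order discussed in this thread, here is a rough sketch of which resources go away at each step, with the namespace deleted last. The phase names (`revocation`, `after_storage_grace`), the `DELETION_PLAN` table, and the `kinds_to_delete` helper are hypothetical; the resource lists come from the proposal text and are not exhaustive (the proposal also covers "other resources that pertain to compute or networking").

```python
# Hypothetical phase names; resource lists are taken from the proposal text.
DELETION_PLAN = {
    "openstack": {
        # The project is also switched to disabled at this point.
        "revocation": ["VMs", "networking objects"],
        "after_storage_grace": ["object storage", "volumes"],
    },
    "openshift": {
        "revocation": ["Pod", "Deployment", "Job", "CronJob"],
        # The namespace is the very last thing to be deleted.
        "after_storage_grace": ["PersistentVolume", "ConfigMap", "Secret", "Namespace"],
    },
}


def kinds_to_delete(platform: str, phase: str) -> list[str]:
    """Return the resource kinds removed for a platform at a given phase."""
    try:
        return list(DELETION_PLAN[platform][phase])
    except KeyError:
        raise ValueError(f"unknown platform/phase: {platform}/{phase}") from None


for phase in ("revocation", "after_storage_grace"):
    print(phase, kinds_to_delete("openshift", phase))
```

Keeping the plan as data rather than branching logic makes it easy to review against the proposal table and to change if the namespace ends up being preserved instead.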

@Milstein left a comment

Please review my comments.


#### 1. The Expiration / Revocation Lifecycle

Each allocation has an end date at which point the allocation automatically enters **"Active (Needs Renewal)"** status.

Contributor

The original version was, "At 30 days before End Date, the allocation changes to Active (Needs Renewal)". That would align with the data that the PI sets when they create the allocation. Is that what is still happening?

Contributor

That way they will be able to renew themselves until the status is revoked.

Contributor

@Milstein, discussed with Kristi, Quan, and Kim.
Proposal:
1. Starting 30 days before the end date, they see a button that says "Expires in: " (we will no longer show Active (Needs Renewal)).
2. At the end of those 30 days, the allocation goes to Expired: VMs and/or pods are turned off, and access for the PI and teams is turned off. Admin action is required to turn it back on. They will likely lose state.
3. At the end of 30 days in Expired status, we switch to Revoked and delete storage and other resources.

  1. Testing: 4 months of expiration running without actually taking effect.
  2. Manual method for approving revocation, at least at first.
  3. Will need active communication during testing, and a rollout plan warning folks about this change (folks have been ignoring the notifications).

This approach returns us to the normal ColdFront period. Kristi plans to rewrite parts of the proposal to capture this.
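
The three-step proposal above can be sketched as a function of the end date and the current date. This is only an illustration under stated assumptions: `lifecycle_state`, `WARNING_WINDOW`, and `EXPIRED_WINDOW` are hypothetical names, the exact boundary days (inclusive or exclusive) are not specified in the proposal, and the manual approval step for revocation is not modeled.

```python
from datetime import date, timedelta

# Both windows are 30 days in the proposal; the boundary handling is assumed.
WARNING_WINDOW = timedelta(days=30)
EXPIRED_WINDOW = timedelta(days=30)


def lifecycle_state(end_date: date, today: date) -> str:
    """Map a date to the proposed allocation state."""
    if today < end_date - WARNING_WINDOW:
        return "Active"
    if today < end_date:
        # Still active, but the "Expires in:" button is shown.
        days_left = (end_date - today).days
        return f"Active (Expires in: {days_left} days)"
    if today < end_date + EXPIRED_WINDOW:
        # VMs/pods off, access off; an admin can turn it back on.
        return "Expired"
    # Storage and remaining resources are deleted.
    return "Revoked"


end = date(2026, 6, 30)
for day in (date(2026, 5, 1), date(2026, 6, 10), date(2026, 6, 30), date(2026, 7, 30)):
    print(day, lifecycle_state(end, day))
```

During the proposed 4-month testing period, a function like this could be run in report-only mode to show which allocations would change state without acting on them.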
