From 4e39bb1c50aca90a8ffcb0161745058ac13febd9 Mon Sep 17 00:00:00 2001
From: Kristi Nikolla
Date: Wed, 21 Jan 2026 09:27:34 -0500
Subject: [PATCH 1/2] NERC Allocation Revocation Workflow

---
 policies/0009-nerc-allocation-revocation.md | 73 +++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 policies/0009-nerc-allocation-revocation.md

diff --git a/policies/0009-nerc-allocation-revocation.md b/policies/0009-nerc-allocation-revocation.md
new file mode 100644
index 0000000..30ea74a
--- /dev/null
+++ b/policies/0009-nerc-allocation-revocation.md
@@ -0,0 +1,73 @@
+### Summary
+
+This proposal defines a standardized workflow for handling resource allocations in ColdFront that transition into a **Revoked** status. It establishes clear timelines for access termination, data retention grace periods, and administrative overrides, ensuring automated lifecycle management while allowing manual oversight when necessary.
+
+### Motivation
+
+Currently, the transition from an active allocation to total resource deletion is a manual and granular process. We need to:
+
+- **Automate lifecycle management** to simplify administration.
+- **Provide clear communication** to users regarding data deletion deadlines.
+- **Support "Special Case" scenarios** where data must be preserved for legal or security reasons without charging the user or allowing access.
+
+### User Stories
+
+- **As an Admin,** I want the system to automatically revoke access when an allocation reaches its end date (expires).
+- **As a PI,** I want to receive multiple notifications before my allocation expires, before access is revoked, and before storage is deleted.
+- **As an Admin,** I want to "suspend" an allocation (Special Case: No Deletion) so that I can investigate an incident without the risk of automated scripts deleting the evidence.
+- **As an Admin,** I want to manually override revocation timings to accommodate valid user appeals or policy exceptions.
+
+### Proposal
+
+#### 1. The Expiration / Revocation Lifecycle
+
+Each allocation has an end date, at which point it automatically enters **"Active (Needs Renewal)"** status.
+
+The transition to **Revoked** occurs automatically 30 days after an allocation enters **"Active (Needs Renewal)"**. It can also be triggered manually by an admin.
+
+| **Phase**                | **Duration** (Days) | **System Actions**                                               | **User Impact**                                                    |
+|--------------------------|---------------------|------------------------------------------------------------------|--------------------------------------------------------------------|
+| **Renewal Grace Period** | 0-30                | Status changed to **Active (Needs Renewal)**. Notification sent. | No impact.                                                         |
+| **Revocation Trigger**   | 30                  | Status changed to **Revoked**. Notification sent.                | All cluster access disabled. Compute and networking stopped.       |
+| **Storage Grace Period** | 30-50               | Storage retained.                                                | Access remains blocked. Possibility of data recovery if requested. |
+| **Storage Grace Period** | 50                  | Storage deleted.                                                 |                                                                    |
+
+For OpenStack, revocation is implemented by deleting all VMs and networking objects and switching the project status to disable to prevent further access. Object storage and volumes are preserved. After the storage grace period the remaining storage resources are deleted.
+
+For OpenShift, revocation is implemented by deleting all pods, deployments, jobs, cronjobs and other resources that pertain to compute or networking. Persistent Volumes, ConfigMaps and Secrets are preserved during the storage grace period. After the storage grace period those resources are deleted too.
+
+#### 2. Notification Schedule
+
+To ensure data integrity, users will receive automated reminders during the 20-day storage grace period:
+
+- **Initial:** Upon entering Revoked status.
+- **Reminder 1:** 14 days after revocation.
+- **Reminder 2:** 7 days before deletion.
+- **Final Warning:** 2 days before deletion.
+
+#### 3. Special Case: Administrative Hold (No Deletion)
+
+For legal holds or security incidents, a new behavior is proposed. While the allocation appears "Revoked" to the user, the system handles it as follows:
+
+- **Status:** Switches to a new "Suspended" status.
+- **Access:** Only Admins retain access to compute/storage.
+  - Admins will need to manually decide whether to stop VMs/pods/networking on a case-by-case basis.
+- **Billing:** Billing is disabled.
+- **Persistence:** Storage and compute are retained pending admin action or until the suspension is lifted.
+
+#### 4. Manual Overrides
+
+Admins have the authority to:
+
+- Extend end dates of projects manually.
+- Revoke or reinstate projects ahead of time by switching their status.
+
+### Drawbacks
+
+- **Storage Costs:** Maintaining a 20-day grace period for all revoked allocations increases storage overhead.
+- **Complexity:** Managing "Special Case" holds requires has an impact on billing scripts.
+
+### Alternatives
+
+- **Keep suspended projects in Active state:** Revoking access and deleting storage simultaneously. (Rejected: Too high risk for data loss).
+- **Continue with Manual Lifecycle Management:** Continuing to rely on admins to manually move states and clean up allocations. Not scalable for high-volume environments.

From fa4c0cce4f07dd05863d443ad16cb56916b81ebe Mon Sep 17 00:00:00 2001
From: Kristi Nikolla
Date: Wed, 21 Jan 2026 10:50:12 -0500
Subject: [PATCH 2/2] Fixes comments by Kim

---
 policies/0009-nerc-allocation-revocation.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/policies/0009-nerc-allocation-revocation.md b/policies/0009-nerc-allocation-revocation.md
index 30ea74a..1a0c247 100644
--- a/policies/0009-nerc-allocation-revocation.md
+++ b/policies/0009-nerc-allocation-revocation.md
@@ -30,27 +30,26 @@ The transition to **Revoked** occurs automatically 30 days after an allocation e
 | **Renewal Grace Period** | 0-30                | Status changed to **Active (Needs Renewal)**. Notification sent. | No impact.                                                         |
 | **Revocation Trigger**   | 30                  | Status changed to **Revoked**. Notification sent.                | All cluster access disabled. Compute and networking stopped.       |
 | **Storage Grace Period** | 30-50               | Storage retained.                                                | Access remains blocked. Possibility of data recovery if requested. |
-| **Storage Grace Period** | 50                  | Storage deleted.                                                 |                                                                    |
+| **Storage Deletion**     | 50                  | Storage deleted.                                                 |                                                                    |
 
-For OpenStack, revocation is implemented by deleting all VMs and networking objects and switching the project status to disable to prevent further access. Object storage and volumes are preserved. After the storage grace period the remaining storage resources are deleted.
+For OpenStack, revocation is implemented by deleting all VMs and networking objects and switching the project status to disabled to prevent further access. Object storage and volumes are preserved. After the storage grace period, the remaining storage resources are deleted.
 
-For OpenShift, revocation is implemented by deleting all pods, deployments, jobs, cronjobs and other resources that pertain to compute or networking. Persistent Volumes, ConfigMaps and Secrets are preserved during the storage grace period. After the storage grace period those resources are deleted too.
+For OpenShift, revocation is implemented by deleting all Pods, Deployments, Jobs, CronJobs and other resources that pertain to compute or networking. Persistent Volumes, ConfigMaps and Secrets are preserved during the storage grace period. After the storage grace period, those resources are deleted too.
 
 #### 2. Notification Schedule
 
 To ensure data integrity, users will receive automated reminders during the 20-day storage grace period:
 
 - **Initial:** Upon entering Revoked status.
-- **Reminder 1:** 14 days after revocation.
-- **Reminder 2:** 7 days before deletion.
+- **Reminder:** 7 days before deletion.
 - **Final Warning:** 2 days before deletion.
 
 #### 3. Special Case: Administrative Hold (No Deletion)
 
-For legal holds or security incidents, a new behavior is proposed. While the allocation appears "Revoked" to the user, the system handles it as follows:
+For legal holds or security incidents, a new behavior is proposed.
 
 - **Status:** Switches to a new "Suspended" status.
-- **Access:** Only Admins retain access to compute/storage.
+- **Access:** Only Admins retain access to compute/storage; user access is removed.
   - Admins will need to manually decide whether to stop VMs/pods/networking on a case-by-case basis.
 - **Billing:** Billing is disabled.
 - **Persistence:** Storage and compute are retained pending admin action or until the suspension is lifted.
@@ -65,9 +64,9 @@ Admins have the authority to:
 ### Drawbacks
 
 - **Storage Costs:** Maintaining a 20-day grace period for all revoked allocations increases storage overhead.
-- **Complexity:** Managing "Special Case" holds requires has an impact on billing scripts.
+- **Complexity:** Selectively removing only specific resources from an allocation while preserving data is more complex than deleting the entire OpenStack project or OpenShift namespace.
 
 ### Alternatives
 
-- **Keep suspended projects in Active state:** Revoking access and deleting storage simultaneously. (Rejected: Too high risk for data loss).
+- **Keep suspended (administrative hold) projects in Active state:** Risk of confusion, and such projects are harder to distinguish in billing.
 - **Continue with Manual Lifecycle Management:** Continuing to rely on admins to manually move states and clean up allocations. Not scalable for high-volume environments.