diff --git a/policies/0009-nerc-allocation-revocation.md b/policies/0009-nerc-allocation-revocation.md
new file mode 100644
index 0000000..1a0c247
--- /dev/null
+++ b/policies/0009-nerc-allocation-revocation.md
@@ -0,0 +1,72 @@
+### Summary
+
+This proposal defines a standardized workflow for handling ColdFront resource allocations that transition into a **Revoked** status. It establishes clear timelines for access termination, data retention grace periods, and administrative overrides, automating lifecycle management while still allowing manual oversight where necessary.
+
+### Motivation
+
+Currently, taking an allocation from active status to full resource deletion is a manual, resource-by-resource process. We need to:
+
+- **Automate lifecycle management** to simplify administration.
+- **Provide clear communication** to users regarding data deletion deadlines.
+- **Support "Special Case" scenarios** where data must be preserved for legal or security reasons without charging the user or allowing access.
+
+### User Stories
+
+- **As an Admin,** I want the system to automatically revoke access after an allocation reaches its end date (expires).
+- **As a PI,** I want to receive multiple notifications before my allocation expires, before access is revoked, and before storage is deleted.
+- **As an Admin,** I want to "suspend" an allocation (Special Case: No Deletion) so that I can investigate an incident without the risk of automated scripts deleting the evidence.
+- **As an Admin,** I want to manually override revocation timings to accommodate valid user appeals or policy exceptions.
+
+### Proposal
+
+#### 1. The Expiration / Revocation Lifecycle
+
+Each allocation has an end date. When that date is reached, the allocation automatically enters **"Active (Needs Renewal)"** status.
+
+The transition to **Revoked** occurs automatically 30 days after an allocation enters **"Active (Needs Renewal)"** status, unless the allocation is renewed in the meantime. An admin can also trigger it manually.
+
+| **Phase**                | **Day (After End Date)** | **System Actions**                                               | **User Impact**                                              |
+|--------------------------|--------------------------|------------------------------------------------------------------|--------------------------------------------------------------|
+| **Renewal Grace Period** | 0-30                     | Status changed to **Active (Needs Renewal)**. Notification sent. | No impact.                                                   |
+| **Revocation Trigger**   | 30                       | Status changed to **Revoked**. Notification sent.                | All cluster access disabled. Compute and networking stopped. |
+| **Storage Grace Period** | 30-50                    | Storage retained.                                                | Access remains blocked. Data may be recovered on request.    |
+| **Storage Deletion**     | 50                       | Storage deleted.                                                 | Data permanently deleted; recovery no longer possible.       |
+
+For OpenStack, revocation is implemented by deleting all VMs and networking objects and setting the project status to disabled to prevent further access. Object storage and volumes are preserved. After the storage grace period, the remaining storage resources are deleted.
+
+For OpenShift, revocation is implemented by deleting all Pods, Deployments, Jobs, CronJobs, and other resources related to compute or networking. Persistent Volumes, ConfigMaps, and Secrets are preserved during the storage grace period. After the storage grace period, those resources are deleted as well.
+
+#### 2. Notification Schedule
+
+To prevent unexpected data loss, users will receive automated reminders during the 20-day storage grace period:
+
+- **Initial:** Upon entering Revoked status (day 30).
+- **Reminder:** 7 days before deletion (day 43).
+- **Final Warning:** 2 days before deletion (day 48).
+
+#### 3. Special Case: Administrative Hold (No Deletion)
+
+For legal holds or security incidents, a new administrative hold behavior is proposed:
+
+- **Status:** Changed to a new **Suspended** status.
+- **Access:** Only Admins retain access to compute and storage; user access is removed.
+  - Admins will need to decide, on a case-by-case basis, whether to manually stop VMs, pods, and networking.
+- **Billing:** Billing is disabled.
+- **Persistence:** Storage and compute are retained until an admin takes action or the suspension is lifted.
+
+#### 4. Manual Overrides
+
+Admins have the authority to:
+
+- Manually extend the end dates of projects.
+- Revoke or reinstate projects ahead of schedule by changing their status.
+
+### Drawbacks
+
+- **Storage Costs:** Maintaining a 20-day grace period for all revoked allocations increases storage overhead.
+- **Complexity:** Selectively removing specific resources from an allocation while preserving data is more complex than deleting the entire OpenStack project or OpenShift namespace.
+
+### Alternatives
+
+- **Keep suspended (administrative hold) projects in Active state:** This risks confusion and makes suspended projects harder to distinguish in billing.
+- **Continue with manual lifecycle management:** Admins would keep moving allocations between states and cleaning up their resources by hand. This does not scale to high-volume environments.