Add NFS TTL cleanup workflow with dual-cluster support #11
Tracking issue
Related to internal request from carlos@exa.ai
Link to Devin session: https://app.devin.ai/sessions/32b236a03baa490dac3abf979f08667d
Why are the changes needed?
This adds automated NFS storage cleanup to prevent unbounded growth of stale data on shared storage volumes across two different cluster environments (exa-cluster and cirrascale).
What changes were proposed in this pull request?
Added a new example workflow (`examples/nfs_ttl_cleanup.py`) that implements TTL-based directory cleanup for NFS mounts with the following features:

Core functionality:
- Uses file access time (`st_atime`) to determine staleness

Dual-cluster support:
- exa-cluster: Uses PVC mount (`nfs-pvc`)
- cirrascale: Uses direct NFS mount (`172.18.72.200:/export/metaphor`)

Launch plans:
Documentation:
How was this patch tested?
What was verified:
What needs verification:
- File access time (`st_atime`) is reliably updated on your NFS mounts

Recommended testing approach:
- Run with `dry_run=True` to validate what would be deleted

Check all the applicable boxes
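
One way to start on the `st_atime` verification item is to inspect the client-side mount options before trusting access times. The sketch below is illustrative only and is not part of this PR: the helper names `mount_options` and `atime_may_be_unreliable` and the mount point `/mnt/nfs` are hypothetical, and it only parses text in the `/proc/mounts` format. It does not prove that the NFS server actually updates access times.

```python
from typing import Optional, Set


def mount_options(mounts_text: str, mount_point: str) -> Optional[Set[str]]:
    """Return the option set for mount_point from /proc/mounts-style text.

    Returns None when the mount point is not listed.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field order: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # the last matching entry wins
    return found


def atime_may_be_unreliable(mounts_text: str, mount_point: str) -> bool:
    """True if the mount uses noatime, or if the mount point cannot be found.

    Assumption: an unknown mount is treated as unreliable, so the caller
    errs on the side of not deleting anything.
    """
    options = mount_options(mounts_text, mount_point)
    return options is None or "noatime" in options
```

On a Linux host this could be called as `atime_may_be_unreliable(Path("/proc/mounts").read_text(), "/mnt/nfs")`, with the real mount point substituted for the hypothetical one.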
Human Review Checklist
Please carefully review the following potentially dangerous aspects:
- Deletion safety: The workflow permanently deletes directories. Verify the logic in `should_delete_directory()` is correct.
- Hardcoded values: Confirm these match your infrastructure:
  - `172.18.72.200:/export/metaphor` (cirrascale)
  - `nfs-pvc` (exa-cluster)
  - `cluster: "exa-cluster"` and `cluster: "cirrascale"`
- File access time reliability: The workflow uses `st_atime`. This may not work correctly if NFS is mounted with `noatime` or similar options.
- Scope limitation: Only top-level directories in `base_path` are checked, not nested subdirectories.
- Error handling: Permission errors cause directories to be skipped (marked as "active"). Is this the desired behavior?
- Security: The cirrascale task runs as root (UID 0, GID 0). Verify this is acceptable.
- Schedule & TTL: Default is daily at midnight UTC with 28-day TTL. Confirm these values are appropriate.
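
For reviewers weighing the deletion-safety, scope, and error-handling items, here is a minimal sketch of the behavior described in this checklist. It is not the code in `examples/nfs_ttl_cleanup.py`: `is_stale` and `plan_cleanup` are hypothetical names, and the sketch assumes staleness is judged from the top-level directory's own `st_atime`, which may differ from what `should_delete_directory()` actually does. It only lists candidates, in the spirit of a dry run, and never deletes.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def is_stale(directory: Path, ttl_days: float, now: float) -> bool:
    """True if the directory's own st_atime is older than the TTL.

    Any OSError (including PermissionError) counts as "active", so a
    directory that cannot be inspected is never reported for deletion.
    """
    try:
        last_access = directory.stat().st_atime
    except OSError:
        return False
    return now - last_access > ttl_days * SECONDS_PER_DAY


def plan_cleanup(
    base_path: Path, ttl_days: float = 28, now: Optional[float] = None
) -> List[Path]:
    """Dry-run step: return stale top-level directories without deleting."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(base_path.iterdir()):
        # Only real top-level directories are considered; files and
        # symlinks are skipped, and nested subdirectories are not visited.
        if entry.is_dir() and not entry.is_symlink():
            if is_stale(entry, ttl_days, now):
                stale.append(entry)
    return stale
```

Passing `now` explicitly keeps the check reproducible in tests. Treating every inspection error as "active" fails safe, but it also means a directory with broken permissions would never be cleaned up, which is the trade-off the error-handling item asks reviewers to confirm.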