
Conversation

@devin-ai-integration

Tracking issue

Related to internal request from carlos@exa.ai

Link to Devin session: https://app.devin.ai/sessions/32b236a03baa490dac3abf979f08667d

Why are the changes needed?

This adds automated NFS storage cleanup to prevent unbounded growth of stale data on shared storage volumes across two different cluster environments (exa-cluster and cirrascale).

What changes were proposed in this pull request?

Added a new example workflow (examples/nfs_ttl_cleanup.py) that implements TTL-based directory cleanup for NFS mounts with the following features:

Core functionality:

  • Scans top-level directories in a configurable base path
  • Deletes directories where ALL files have not been accessed within the TTL period (default: 28 days/4 weeks)
  • Uses file access time (st_atime) to determine staleness (see the sketch below)
  • Returns statistics on deleted/skipped directories
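
For reference, a minimal sketch of the staleness check described above; the actual should_delete_directory() in the example may differ in details (e.g. how empty directories or symlinks are handled):

```python
import os
import time


def should_delete_directory(path: str, ttl_seconds: int) -> bool:
    """Return True only if no file under `path` has been accessed within ttl_seconds."""
    cutoff = time.time() - ttl_seconds
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                if os.stat(os.path.join(root, name)).st_atime > cutoff:
                    return False  # at least one file was accessed recently
            except FileNotFoundError:
                continue  # file disappeared mid-scan; ignore it
    return True  # every file is older than the TTL (an empty directory also returns True here)
```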

Dual-cluster support:

  • exa-cluster: Uses PVC mount (nfs-pvc)
  • cirrascale: Uses direct NFS mount (172.18.72.200:/export/metaphor)
  • Each cluster has its own task with appropriate pod templates and node selectors (illustrated below)
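
A rough illustration of how the per-cluster mounts and node selectors could be expressed with flytekit's PodTemplate. The PVC name, NFS server, and node-selector labels are the ones listed in this PR; the mount path /mnt/nfs and the container wiring are assumptions, and the actual pod templates in the example may differ:

```python
from flytekit import PodTemplate
from kubernetes.client import (
    V1Container,
    V1NFSVolumeSource,
    V1PersistentVolumeClaimVolumeSource,
    V1PodSpec,
    V1Volume,
    V1VolumeMount,
)

# exa-cluster: mount the existing PVC
exa_pod_template = PodTemplate(
    pod_spec=V1PodSpec(
        node_selector={"cluster": "exa-cluster"},
        containers=[
            V1Container(
                name="primary",
                volume_mounts=[V1VolumeMount(name="nfs", mount_path="/mnt/nfs")],
            )
        ],
        volumes=[
            V1Volume(
                name="nfs",
                persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(claim_name="nfs-pvc"),
            )
        ],
    )
)

# cirrascale: mount the NFS export directly
cirrascale_pod_template = PodTemplate(
    pod_spec=V1PodSpec(
        node_selector={"cluster": "cirrascale"},
        containers=[
            V1Container(
                name="primary",
                volume_mounts=[V1VolumeMount(name="nfs", mount_path="/mnt/nfs")],
            )
        ],
        volumes=[
            V1Volume(
                name="nfs",
                nfs=V1NFSVolumeSource(server="172.18.72.200", path="/export/metaphor"),
            )
        ],
    )
)
```

Each template would typically be attached to its cluster-specific task via @task(pod_template=...).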

Launch plans:

  • Two launch plans scheduled to run daily at midnight UTC (see the sketch below)
  • Configurable TTL (default 4 weeks), base path, and dry-run mode
  • Resource allocation: 4 CPU, 8Gi memory per task
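
Roughly, the scheduling, resources, and defaults might be wired up as follows (a sketch only: the workflow/task names, Dict return type, and input names are assumptions, not necessarily what the example uses):

```python
from typing import Dict

from flytekit import CronSchedule, LaunchPlan, Resources, task, workflow


@task(requests=Resources(cpu="4", mem="8Gi"), limits=Resources(cpu="4", mem="8Gi"))
def cleanup_nfs(base_path: str, ttl_days: int, dry_run: bool) -> Dict[str, int]:
    # Placeholder body: the real task scans base_path and deletes stale top-level directories.
    return {"deleted": 0, "skipped": 0}


@workflow
def nfs_ttl_cleanup(base_path: str = "/mnt/nfs", ttl_days: int = 28, dry_run: bool = False) -> Dict[str, int]:
    return cleanup_nfs(base_path=base_path, ttl_days=ttl_days, dry_run=dry_run)


# One launch plan per cluster; only the exa-cluster one is shown here.
exa_lp = LaunchPlan.get_or_create(
    workflow=nfs_ttl_cleanup,
    name="nfs_ttl_cleanup_exa_cluster",
    schedule=CronSchedule(schedule="0 0 * * *"),  # daily at midnight UTC
    default_inputs={"ttl_days": 28, "dry_run": False},
)
```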

Documentation:

  • Comprehensive README with usage examples, safety considerations, and configuration options

How was this patch tested?

⚠️ Limited testing performed: Python syntax validation only. Full runtime testing was not possible because the required dependencies were missing in the development environment.

What was verified:

  • Python syntax validation passes
  • Code structure follows existing flytekit examples (based on monorepo patterns)

What needs verification:

  1. Cluster-specific values (NFS server IP, PVC names, paths) match your production environment
  2. Deletion logic correctly identifies stale directories
  3. File access time (st_atime) is reliably updated on your NFS mounts
  4. Resource allocations are appropriate
  5. Running as root (UID 0) on cirrascale is acceptable for your security requirements

Recommended testing approach:

  1. First run with dry_run=True to validate what would be deleted (see the example below)
  2. Test on a non-production NFS mount
  3. Verify with a small TTL value (e.g., 1 day) before deploying with 28-day default
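
As a concrete starting point, a local dry run could look like this (module, workflow, and parameter names are hypothetical and should be adjusted to the actual example file):

```python
# Local smoke test before registering: dry run against a scratch directory with a short TTL.
from examples.nfs_ttl_cleanup import nfs_ttl_cleanup  # hypothetical import path

stats = nfs_ttl_cleanup(base_path="/tmp/nfs-test", ttl_days=1, dry_run=True)
print(stats)  # e.g. {"deleted": 0, "skipped": 3}
```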

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed. (No tests added; this is an example workflow.)
  • All commits are signed-off.

Human Review Checklist

Please carefully review the following potentially dangerous aspects:

  1. Deletion safety: The workflow permanently deletes directories. Verify the logic in should_delete_directory() is correct.

  2. Hardcoded values: Confirm these match your infrastructure:

    • NFS server: 172.18.72.200:/export/metaphor (cirrascale)
    • PVC name: nfs-pvc (exa-cluster)
    • Node selectors: cluster: "exa-cluster" and cluster: "cirrascale"
  3. File access time reliability: The workflow uses st_atime. This may not work correctly if NFS is mounted with noatime or similar options (a quick probe is sketched after this checklist).

  4. Scope limitation: Only top-level directories in base_path are checked, not nested subdirectories.

  5. Error handling: Permission errors cause directories to be skipped (marked as "active"). Is this the desired behavior?

  6. Security: The cirrascale task runs as root (UID 0, GID 0). Verify this is acceptable.

  7. Schedule & TTL: Default is daily at midnight UTC with 28-day TTL. Confirm these values are appropriate.
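
For point 3, a probe like the following can help confirm whether reads update st_atime on a given mount (a diagnostic sketch; the target path is a placeholder, and relatime may defer updates even when atime is enabled):

```python
import os
import time


def atime_updates_on_read(path: str) -> bool:
    """Write a probe file, read it back, and report whether its access time moved forward."""
    probe = os.path.join(path, ".atime-probe")
    with open(probe, "w") as f:
        f.write("probe")
    before = os.stat(probe).st_atime
    time.sleep(2)  # ensure a measurable gap
    with open(probe) as f:
        f.read()
    after = os.stat(probe).st_atime
    os.remove(probe)
    return after > before


# Example: atime_updates_on_read("/mnt/nfs") on the mounted NFS volume.
```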

- Add workflow to clean up old directories from NFS storage based on TTL
- Support for two clusters: exa-cluster (PVC mount) and cirrascale (direct NFS mount)
- Configurable TTL (default: 4 weeks / 28 days)
- Daily scheduled execution at midnight UTC
- Dry run mode for testing before actual deletion
- Comprehensive documentation in README

The workflow scans directories and deletes those where all files
haven't been accessed within the TTL period. Two launch plans are
configured for the different clusters with appropriate NFS mounting
and node selection.

Co-Authored-By: carlos@exa.ai <carlos@exa.ai>
@devin-ai-integration
Author

Original prompt from carlos
Received message in Slack channel #devin-land:

@Devin

• write a flyte workflow with two launchplans, one for exa-cluster and one for cirrascale
• it should correctly mount the corresponding NFS and select the correct cluster
• you should be able to define a TTL and default to 4 weeks
• it should delete all directories where all files haven't been accessed in that TTL period
• make it run every day

Thread URL: https://T0274T299K6.slack.com/archives/C090Z3VH487/p1760486286506579?thread_ts=1760486286.506579

@devin-ai-integration
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring
