Automated cleanup of Box folders for completed Jira cases in the AEA Data Editor workflow.
This script automates the process of cleaning up Box folders for cases that have completed their workflow:
- Scans the Box root folder for case folders (e.g., aearep-1234)
- Queries jira_purge_query.py to check if each case is ready for purging
- For ready cases:
  - Deletes data files (CSV, DTA, ZIP, etc.)
  - Keeps documents (PDF, DOCX, TXT, etc.)
  - Moves the folder to the 1Completed subfolder
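The folder scan described above can be sketched as a name filter. This is a minimal illustration, not the script's actual code: the regular expression and the helper names `case_number` and `select_case_folders` are assumptions based only on the aearep-1234 example.

```python
import re

# Assumption: case folders are named "aearep-" followed by digits, as in the
# example aearep-1234. The real script may accept other naming patterns.
CASE_FOLDER_RE = re.compile(r"^aearep-(\d+)$", re.IGNORECASE)


def case_number(folder_name):
    """Return the numeric case ID for a case folder name, or None."""
    match = CASE_FOLDER_RE.match(folder_name.strip())
    return int(match.group(1)) if match else None


def select_case_folders(folder_names, only_case=None):
    """Keep case folders, optionally restricted to a single case number."""
    selected = []
    for name in folder_names:
        number = case_number(name)
        if number is None:
            continue  # not a case folder, e.g. the 1Completed subfolder
        if only_case is not None and number != only_case:
            continue
        selected.append(name)
    return selected
```

Passing `only_case=1234` mirrors what the `--case 1234` option is documented to do.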
Install the Box SDK, and make sure jira_purge_query.py is in your PATH:
pip install 'boxsdk[jwt]'
export BOX_FOLDER_PRIVATE="your_root_folder_id"
export BOX_PRIVATE_KEY_ID="your_key_id"
export BOX_ENTERPRISE_ID="your_enterprise_id"
export BOX_CONFIG_PATH="/path/to/config/directory"
Alternative: Use base64-encoded config
export BOX_PRIVATE_JSON="base64_encoded_config_json"
export JIRA_USERNAME="your_email@example.com"
export JIRA_API_KEY="your_jira_api_token"
Get your Jira API token at: https://id.atlassian.com/manage-profile/security/api-tokens
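Before running the script, it can help to confirm that the variables above are actually set. The following is a small sketch. The variable names come from this README, but the helper `missing_variables` is hypothetical, and treating BOX_CONFIG_PATH and BOX_PRIVATE_JSON as interchangeable is an assumption based on the "Alternative" note.

```python
import os

# Variable names are taken from the export lines in this README.
REQUIRED_BOX = ["BOX_FOLDER_PRIVATE", "BOX_PRIVATE_KEY_ID", "BOX_ENTERPRISE_ID"]
# Assumption: either one of these two is enough to locate the Box config.
BOX_CONFIG_ALTERNATIVES = ["BOX_CONFIG_PATH", "BOX_PRIVATE_JSON"]
REQUIRED_JIRA = ["JIRA_USERNAME", "JIRA_API_KEY"]


def missing_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_BOX + REQUIRED_JIRA if not env.get(name)]
    if not any(env.get(name) for name in BOX_CONFIG_ALTERNATIVES):
        missing.append(" or ".join(BOX_CONFIG_ALTERNATIVES))
    return missing


if __name__ == "__main__":
    problems = missing_variables()
    if problems:
        print("Missing environment variables:")
        for name in problems:
            print(f"  {name}")
    else:
        print("All required environment variables are set.")
```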
git clone https://github.com/AEADataEditor/clean-up-box.git
cd clean-up-box
Always test first to see what would be done without making any changes:
python3 clean_box_folders.py --test
This will:
- Query Jira to check which cases are ready
- Show which files would be deleted
- Show which folders would be moved
- Make NO actual changes to Box
Process all ready cases:
python3 clean_box_folders.py
You'll be prompted to confirm if multiple folders will be processed.
Process a specific case:
python3 clean_box_folders.py --case 1234
This processes only aearep-1234.
Skip the confirmation prompt:
python3 clean_box_folders.py --yes
To see which cases are in Box and check their Jira status without making any changes:
python3 clean_box_folders.py --list
This will:
- Scan Box for all case folders
- Query Jira for each case's purge status
- Display the full output from jira_purge_query.py for each case
- Show a summary of how many cases are ready
Example output:
✗ [FAIL] AEAREP-6645: Neither this issue nor linked revisions passed through required statuses (Current MCstatus: Done; Current MCRecommendationV2: N/A)
✓ [OK] AEAREP-2124: Ready for purge
- Current status: Done; Current MCRecommendationV2: Accept
You can also check a specific case:
python3 clean_box_folders.py --list --case 6645
# Skip Jira checks (process all folders found) - TESTING ONLY
python3 clean_box_folders.py --skip-jira-check --test
Full command-line options available:
usage: clean_box_folders.py [-h] [--test] [--list] [--case NUMBER] [--yes]
                            [--skip-jira-check]

Clean up Box folders for completed Jira cases

options:
  -h, --help         show this help message and exit
  --test             Test mode: show what would be done without making changes
  --list             List all cases and their Jira status without making any
                     changes
  --case NUMBER      Process only this specific case number (e.g., 1234 for
                     aearep-1234)
  --yes, -y          Skip confirmation prompt
  --skip-jira-check  Skip Jira status checks (process all folders found) - for
                     testing only

Examples:
  # Test mode (dry run - no modifications)
  clean_box_folders.py --test
  # Process all ready cases
  clean_box_folders.py
  # Process specific case
  clean_box_folders.py --case 1234
  # Skip confirmation prompt
  clean_box_folders.py --yes

Environment Variables Required:
  Box Authentication:
    BOX_FOLDER_PRIVATE  - Root Box folder ID
    BOX_PRIVATE_KEY_ID  - JWT public key ID
    BOX_ENTERPRISE_ID   - Enterprise ID
    BOX_CONFIG_PATH     - Directory containing config JSON file
                          (or BOX_PRIVATE_JSON - Base64 encoded config)
  Jira Authentication:
    JIRA_USERNAME       - Your Jira email address
    JIRA_API_KEY        - API token
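For readers who want to see how the options above fit together, here is a sketch of an argparse parser that accepts the same flags. It is reconstructed from the help text only. The function name `build_parser` and the choice of `type=int` for `--case` are assumptions, not the script's confirmed implementation.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="clean_box_folders.py",
        description="Clean up Box folders for completed Jira cases",
    )
    parser.add_argument("--test", action="store_true",
                        help="Test mode: show what would be done without making changes")
    parser.add_argument("--list", action="store_true",
                        help="List all cases and their Jira status without making any changes")
    # Assumption: the case number is parsed as an integer.
    parser.add_argument("--case", metavar="NUMBER", type=int,
                        help="Process only this specific case number (e.g., 1234 for aearep-1234)")
    parser.add_argument("--yes", "-y", action="store_true",
                        help="Skip confirmation prompt")
    parser.add_argument("--skip-jira-check", action="store_true",
                        help="Skip Jira status checks (process all folders found) - for testing only")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```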
Data file types (deleted):
- Statistical data: .csv, .dta, .sas7bdat, .rds, .rdata, .mat, .sav
- Compressed: .zip, .gz, .tar, .7z, .rar
- Databases: .db, .sqlite, .sql
- Modern formats: .parquet, .feather, .hdf5
- Other: .json, .xml, .xlsx
Document file types (kept):
- Documents: .pdf, .docx, .doc, .txt, .md, .rtf
- LaTeX: .tex, .bib, .aux, .log
- Presentations: .pptx, .ppt
- Unknown file types (kept to be safe)
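The extension lists above translate naturally into a lookup by file suffix. The sketch below is illustrative only. The function name `classify` and the decision to look at the final suffix alone (so that archive.tar.gz is matched on .gz) are assumptions.

```python
from pathlib import PurePosixPath

# Extension sets copied from the lists in this README.
DATA_EXTENSIONS = {
    ".csv", ".dta", ".sas7bdat", ".rds", ".rdata", ".mat", ".sav",  # statistical
    ".zip", ".gz", ".tar", ".7z", ".rar",                           # compressed
    ".db", ".sqlite", ".sql",                                       # databases
    ".parquet", ".feather", ".hdf5",                                # modern formats
    ".json", ".xml", ".xlsx",                                       # other
}
DOCUMENT_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt", ".md", ".rtf",  # documents
    ".tex", ".bib", ".aux", ".log",                  # LaTeX
    ".pptx", ".ppt",                                 # presentations
}


def classify(filename):
    """Return 'data', 'document', or 'unknown' based on the last suffix."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in DATA_EXTENSIONS:
        return "data"
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    return "unknown"


def should_delete(filename):
    """Only files positively identified as data are deleted."""
    return classify(filename) == "data"
```

Because `should_delete` is true only for a positive match, unknown file types fall through to "keep", which matches the conservative behavior described above.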
For each case folder found:
- Jira Check: Queries jira_purge_query.py to verify the case has passed through required workflow statuses:
  - "Pending openICPSR"
  - "Assess openICPSR"
  - "Pending Publication"
- File Classification: Recursively scans the folder and all subfolders to:
  - Identify data files (to delete)
  - Identify documents (to keep)
  - Handle unknown file types conservatively (keep them)
- Data Deletion: Deletes all identified data files
- Folder Move: Moves the entire folder (with remaining contents) to 1Completed
Each run creates a detailed log file: box_cleanup_YYYYMMDD_HHMMSS.log
The log includes:
- All actions taken (or would be taken in test mode)
- File paths and sizes
- Any errors encountered
- Summary statistics
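A log file named this way, with the "timestamp - level - message" layout shown in this README's sample output, could be set up roughly as follows. This is a sketch with hypothetical helper names (`log_filename`, `configure_logging`) and an assumed logger name; the script's actual logging setup may differ.

```python
import logging
from datetime import datetime


def log_filename(now=None):
    """Build a name of the form box_cleanup_YYYYMMDD_HHMMSS.log."""
    now = now or datetime.now()
    return now.strftime("box_cleanup_%Y%m%d_%H%M%S.log")


def configure_logging(path):
    """Send INFO-level messages to both the log file and the console."""
    logger = logging.getLogger("box_cleanup")  # logger name is an assumption
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter(
        "%(asctime)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    for handler in (logging.FileHandler(path, encoding="utf-8"),
                    logging.StreamHandler()):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    path = log_filename()
    log = configure_logging(path)
    log.info("Box Cleanup Script Started")
    log.info("Log file: %s", path)
```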
Console output shows:
- Progress through each case
- Files being deleted
- Folders being moved
- Final summary
- Test mode: Complete dry-run capability
- Confirmation prompts: Required for batch operations (>1 folder)
- Conservative file handling: Unknown file types are preserved
- Error recovery: Continues processing remaining cases if one fails
- Detailed logging: Full audit trail of all operations
- Existing-folder detection: If a folder with the same name is already in 1Completed, the script skips it and continues
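Two of the safety features above, the batch confirmation and the continue-on-error behavior, can be sketched in a few lines. The function names are hypothetical and the logic is inferred from the descriptions in this list, not taken from the script itself.

```python
def needs_confirmation(folder_count, assume_yes=False):
    """Confirmation is required for batch runs (>1 folder) unless --yes is given."""
    return folder_count > 1 and not assume_yes


def process_all(folders, process_one):
    """Run process_one on each folder; record failures and keep going."""
    succeeded, failed = [], []
    for folder in folders:
        try:
            process_one(folder)
        except Exception as exc:  # one bad case should not stop the batch
            failed.append((folder, str(exc)))
        else:
            succeeded.append(folder)
    return succeeded, failed
```

Collecting failures in a list rather than raising immediately is what allows a final summary to report an error count alongside the successfully processed folders.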
# List all cases and check which are ready for purge
python3 clean_box_folders.py --list
# Check specific case
python3 clean_box_folders.py --list --case 1234
# See what cases exist and which are ready
python3 clean_box_folders.py --test
# Review the log file
cat box_cleanup_*.log
# Test a specific case first
python3 clean_box_folders.py --case 1234 --test
# If it looks good, run for real
python3 clean_box_folders.py --case 1234
# Test all ready cases
python3 clean_box_folders.py --test
# Review what will happen, then execute
python3 clean_box_folders.py
# (You'll be prompted to confirm)
# Skip confirmation - useful for scheduled jobs
python3 clean_box_folders.py --yes
2026-02-19 10:30:00 - INFO - Box Cleanup Script Started
2026-02-19 10:30:00 - INFO - Log file: box_cleanup_20260219_103000.log
2026-02-19 10:30:01 - INFO - Authenticating to Box...
2026-02-19 10:30:02 - INFO - ✓ Authenticated as: John Doe
2026-02-19 10:30:02 - INFO - Scanning root folder 123456789 for case folders...
2026-02-19 10:30:03 - INFO - Found 5 case folder(s)
============================================================
Processing: aearep-1234
============================================================
2026-02-19 10:30:04 - INFO - ✓ Ready for purge
2026-02-19 10:30:04 - INFO - Scanning folder contents...
2026-02-19 10:30:05 - INFO - Found 23 data file(s) to delete
2026-02-19 10:30:05 - INFO - Found 5 document(s) to keep
2026-02-19 10:30:05 - INFO - Deleting data files...
2026-02-19 10:30:08 - INFO - ✓ Deleted 23/23 data files (145.67 MB)
2026-02-19 10:30:08 - INFO - Moving folder to '1Completed'...
2026-02-19 10:30:09 - INFO - ✓ Moved folder 'aearep-1234' to '1Completed'
============================================================
SUMMARY
============================================================
Folders found: 5
Folders checked: 5
Folders ready to purge: 3
Folders moved: 3
Data files deleted: 67
Total bytes deleted: 423.45 MB
Errors: 0
Set the required Box environment variables. Check your .bashrc or .zshrc.
Ensure jira_purge_query.py exists and is in your PATH. Typically in $HOME/bin or $HOME/bin/editor-scripts.
pip install 'boxsdk[jwt]'
Check that:
- The base64 encoding is correct
- The JSON contains valid credentials
- Your service account has access to the folder
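The first two checks can be done locally before contacting Box. The sketch below decodes a base64 string and confirms that it parses as a JSON object. The helper name `decode_box_config` is hypothetical, and the sketch deliberately does not assume which keys the Box config must contain.

```python
import base64
import binascii
import json


def decode_box_config(encoded):
    """Decode a base64 string into a dict, raising ValueError with a hint."""
    # Remove whitespace and line wraps that some base64 tools insert.
    compact = "".join(encoded.split())
    try:
        raw = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    try:
        config = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"decoded bytes are not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError("decoded JSON is not an object")
    return config


if __name__ == "__main__":
    import os
    try:
        config = decode_box_config(os.environ.get("BOX_PRIVATE_JSON", ""))
        print("Config decoded; top-level keys:", sorted(config))
    except ValueError as exc:
        print("Problem with BOX_PRIVATE_JSON:", exc)
```

Whether the service account has access to the folder can only be confirmed against Box itself, so that third check is outside this sketch.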
The folder was already moved previously. The script will skip it and continue.
The script is based on the authentication patterns from download_box_private.py and integrates with the existing jira_purge_query.py workflow tool.
# Test the script structure without actually calling APIs
python3 -c "import clean_box_folders; print('Syntax OK')"
- Never commit credentials to version control
- Store Box config files in secure locations with restricted permissions
- Use environment variables or secure secret management
- Review log files before sharing (may contain folder/file names)
- Test mode logs may reveal folder structure
If files were accidentally deleted by the cleanup script, they can be recovered from Box trash within the retention period (typically 30-90 days). The recover_box_files.py script automates this process.
The recovery script requires a case number and looks up the Box Folder ID from Jira's "Restricted data Box ID" custom field, along with the "Bitbucket short name" field which contains the actual Box folder name. Note that the Jira case number (e.g., 8040) may differ from the Box folder name (e.g., "7712").
List deleted files for a case:
python3.12 recover_box_files.py --case 8040 --list
Test mode (dry run):
python3.12 recover_box_files.py --case 8040 --test
Restore files to 1Completed folder:
python3.12 recover_box_files.py --case 8040
Look back 14 days instead of default 7:
python3.12 recover_box_files.py --case 8040 --days 14
Skip confirmation prompt:
python3.12 recover_box_files.py --case 8040 --yes
Important: The cleanup script deletes the data files inside a case folder and then moves the folder to '1Completed'. The recovery script finds those deleted files and restores them back to the case folder.
- Jira Lookup: Queries Jira issue (e.g., "aearep-8040") to retrieve:
  - Box Folder ID from "Restricted data Box ID" custom field
  - Box folder name from "Bitbucket short name" custom field (may be different, e.g., "7712")
- Trash Search: Gets all trashed items from Box and filters by:
  - Deleted in the last N days (default: 7)
  - Deleted by user "aeadata"
  - Files that belonged to the specified folder (checks folder ID and path)
- Display: Shows all matching deleted files with details
- Restore: Restores the files back to their case folder (which should be in '1Completed')
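The trash search step above amounts to three filters applied to each trashed item. The following sketch works on plain dictionaries rather than Box SDK objects. The keys `trashed_at`, `deleted_by`, `parent_ids`, and `parent_names` are hypothetical stand-ins for whatever the Box API returns, and the function name is an assumption.

```python
from datetime import datetime, timedelta, timezone


def filter_trashed(items, folder_id, folder_name, days=7,
                   deleted_by="aeadata", now=None):
    """Keep items trashed recently, by the given user, from the given folder."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    matches = []
    for item in items:
        if item["trashed_at"] < cutoff:
            continue  # deleted too long ago
        if item["deleted_by"] != deleted_by:
            continue  # deleted by someone other than the service account
        # Match on the folder ID or, failing that, the folder name in the path.
        in_folder = (folder_id in item["parent_ids"]
                     or folder_name in item["parent_names"])
        if in_folder:
            matches.append(item)
    return matches
```

Checking both the folder ID and the path name reflects the note that the Jira case number and the Box folder name can differ.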
Same as the cleanup script, plus:
- jira Python package: pip install jira
- JIRA_USERNAME and JIRA_API_KEY environment variables
- Optional: JIRA_SERVER (defaults to https://aeadataeditors.atlassian.net)
usage: recover_box_files.py [-h] --case NUMBER [--days N] [--list] [--test] [--yes]

options:
  --case NUMBER  Jira case number (e.g., 8040 for aearep-8040) [REQUIRED]
  --days N       Number of days to look back (default: 7)
  --list         List deleted items only, do not restore
  --test         Test mode: show what would be done without changes
  --yes, -y      Skip confirmation prompt
- File restoration: The script finds and restores individual files that were deleted from the case folder (the folder itself remains in '1Completed')
- Destination: Files are restored back to the case folder in '1Completed' (e.g., '1Completed/aearep-7712/')
- Service account context: The cleanup script runs as service account "aeadata", so all deletions appear to be by that user
- Trash retention: Items in Box trash are auto-deleted after the retention period (typically 30-90 days)
- Name conflicts: If a file with the same name already exists in the folder, restoration will be skipped with a warning
- download_box_private.py - Downloads content from Box folders (not required; present in each template repository)
- jira_purge_query.py - Checks if Jira cases are ready for purging (REQUIRED)
- recover_box_files.py - Recovers deleted files from Box trash (NEW)