An intelligent Docker registry cleanup tool that analyzes workload usage patterns and safely removes unused images while preserving actively used ones.
This project provides a comprehensive solution for cleaning up Docker registries by:
- Analyzing registry contents - Maps image layers, sizes, and tags with shared layer awareness
- Domino-integrated intelligent detection - Identifies which images are actively used by workloads or project defaults
- Safe deletion with backups - Optionally backs up Docker images to S3 before deletion
- Transaction safety - Only deletes MongoDB records for successfully deleted Docker images
- Unused reference detection - Identifies and removes MongoDB records referencing non-existent Docker images

```bash
# Install dependencies
pip install -r requirements.txt

# Configure (copy and edit config-example.yaml)
cp config-example.yaml config.yaml
```

All scripts are invoked through `python/main.py` with a standardized interface:
```bash
# 1. Analyze what would be deleted (dry-run is default)
python python/main.py delete_archived_tags --environment

# 2. Delete with confirmation
python python/main.py delete_archived_tags --environment --apply

# 3. Delete with S3 backup (recommended)
python python/main.py delete_archived_tags --environment --apply --backup --s3-bucket my-bucket

# 4. Comprehensive cleanup
python python/main.py delete_all_unused_environments --apply --backup --s3-bucket my-bucket
```

All Docker deletion scripts share a standardized interface with these common options:

| Option | Description | Default |
|---|---|---|
| `--apply` | Perform deletions (without this flag, commands run in dry-run mode) | `false` |
| `--force` | Skip confirmation prompts | `false` |
| `--backup` | Back up images to S3 before deletion | `false` |
| `--s3-bucket BUCKET` | S3 bucket for backups (required with `--backup`) | From config |
| `--region REGION` | AWS region for S3/ECR operations | `us-west-2` |
| `--generate-reports` | Force regeneration of analysis reports | `false` |
| `--enable-docker-deletion` | Override registry auto-detection | `false` |
| `--registry-statefulset NAME` | StatefulSet/Deployment name for registry | `docker-registry` |

- Dry-run mode - No changes are made to Docker or MongoDB unless `--apply` is specified.
- Confirmation prompts - Users must confirm deletions when using `--apply`. Use `--force` to skip the confirmation prompt.
- S3 backups - Use `--backup` to back up images before deletion.
- Transaction safety - MongoDB records are only deleted after successful Docker deletion.
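
A minimal sketch of how the common options and the dry-run/confirmation rules could fit together using Python's standard `argparse`. This is not the project's actual parser: `build_parser`, `resolve_mode`, and the `config_bucket` fallback are hypothetical names chosen for illustration, and the backup step itself is left out.

```python
import argparse
from typing import Callable, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the common options table."""
    parser = argparse.ArgumentParser(prog="cleanup-example")
    parser.add_argument("--apply", action="store_true",
                        help="perform deletions (default: dry-run)")
    parser.add_argument("--force", action="store_true",
                        help="skip confirmation prompts")
    parser.add_argument("--backup", action="store_true",
                        help="back up images to S3 before deletion")
    parser.add_argument("--s3-bucket", metavar="BUCKET", default=None,
                        help="S3 bucket for backups (falls back to config)")
    parser.add_argument("--region", default="us-west-2")
    parser.add_argument("--generate-reports", action="store_true")
    parser.add_argument("--enable-docker-deletion", action="store_true")
    parser.add_argument("--registry-statefulset", metavar="NAME",
                        default="docker-registry")
    return parser


def resolve_mode(args: argparse.Namespace,
                 config_bucket: Optional[str] = None,
                 confirm: Callable[[str], str] = input) -> str:
    """Return 'dry-run', 'apply', or 'aborted' according to the documented rules."""
    if args.backup and not (args.s3_bucket or config_bucket):
        raise SystemExit("--backup requires --s3-bucket or a bucket in config")
    if not args.apply:
        return "dry-run"      # nothing is changed without --apply
    if args.force:
        return "apply"        # --force skips the confirmation prompt
    answer = confirm("Proceed with deletion? [y/N] ")
    return "apply" if answer.strip().lower() in ("y", "yes") else "aborted"
```

Passing `confirm` as a parameter keeps the prompt testable; the real scripts may prompt differently.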
```bash
# Analyze workload usage
python python/main.py inspect_workload [--file OBJECTIDS]

# Analyze registry contents
python python/main.py image_data_analysis [--file OBJECTIDS]

# Generate usage reports
python python/main.py reports [--generate-reports]

# Extract MongoDB metadata
python python/main.py extract_metadata --target both
```

All deletion commands support the common options listed above.
```bash
# Analyze archived environments (dry-run)
python python/main.py delete_archived_tags --environment

# Analyze archived models (dry-run)
python python/main.py delete_archived_tags --model

# Delete both archived environments and models
python python/main.py delete_archived_tags --environment --model --apply

# Delete archived environments with S3 backup
python python/main.py delete_archived_tags --environment --apply --backup --s3-bucket my-bucket
```

```bash
# Find unused environments (dry-run)
python python/main.py delete_unused_environments

# Delete with S3 backup and confirmation
python python/main.py delete_unused_environments --apply --backup --s3-bucket my-bucket

# Force regenerate reports and delete
python python/main.py delete_unused_environments --generate-reports --apply --force
```

```bash
# Find private environments owned by deactivated Keycloak users
python python/main.py delete_unused_private_environments

# Delete with backup
python python/main.py delete_unused_private_environments --apply --backup --s3-bucket my-bucket
```

Run multiple cleanup operations in sequence:
```bash
# Analyze all unused environments (dry-run)
python python/main.py delete_all_unused_environments

# Delete all unused environments with backup
python python/main.py delete_all_unused_environments --apply --backup --s3-bucket my-bucket
```

This command runs:
- Delete unused environments (not used in workspaces, models, or project defaults)
- Delete deactivated user private environments

`delete_unused_references` cleans up MongoDB records that reference non-existent Docker images:

```bash
# Find unused references (dry-run)
python python/main.py delete_unused_references

# Delete unused references
python python/main.py delete_unused_references --apply
```

Note: This command only modifies MongoDB, not Docker images, so `--backup` is not applicable.
```bash
# Delete specific image
python python/main.py delete_image environment:abc-123 --apply

# Delete using analysis reports
python python/main.py delete_image --apply --backup --s3-bucket my-bucket

# Filter by ObjectIDs from file
python python/main.py delete_image --file environments --apply
```

Target specific models or compute environments by ObjectID:
```bash
# Create a file with ObjectIDs (one per line)
cat > environments <<EOF
# Applies to both environment and model
62798b9bee0eb12322fc97e8
# Explicitly environment-only
environment:6286a3c76d4fd0362f8ba3ec
# Explicitly model-only
model:627d94043035a63be6140e93
EOF
```
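
The file format above can be parsed in a few lines of Python. This sketch assumes that lines starting with `#` are comments (as the example file suggests) and that `environment:` and `model:` are the only valid prefixes; `parse_object_id_file` is a hypothetical helper, not the API of `python/object_id_utils.py`.

```python
import re
from typing import List, Optional, Tuple

OBJECT_ID = re.compile(r"[0-9a-fA-F]{24}")
KINDS = ("environment", "model")


def parse_object_id_line(line: str) -> Optional[Tuple[Optional[str], str]]:
    """Parse one line into (kind, object_id); None for blank or comment lines.

    kind is "environment", "model", or None (applies to both).
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    kind: Optional[str] = None
    if ":" in line:
        kind, _, line = line.partition(":")
        if kind not in KINDS:
            raise ValueError(f"unknown prefix {kind!r}")
    if not OBJECT_ID.fullmatch(line):
        raise ValueError(f"expected 24 hex characters, got {line!r}")
    return kind, line


def parse_object_id_file(text: str) -> List[Tuple[Optional[str], str]]:
    """Parse a whole ObjectID file, reporting the offending line number on error."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        try:
            parsed = parse_object_id_line(raw)
        except ValueError as exc:
            raise ValueError(f"line {lineno}: {exc}") from None
        if parsed is not None:
            entries.append(parsed)
    return entries
```

Failing on the first malformed line with its line number makes a 23-character ID easy to spot before any command runs.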
```bash
# Use with any analysis or deletion command
python python/main.py inspect_workload --file environments
python python/main.py image_data_analysis --file environments
python python/main.py delete_image --file environments --apply
```

All Docker deletion commands support `--backup`:
```bash
# Backup and delete
python python/main.py delete_archived_tags --environment --apply --backup --s3-bucket my-bucket

# Backup only (no deletion)
python python/main.py delete_archived_tags --environment --backup --s3-bucket my-bucket --force
```

Restoring from a backup:

```bash
# Restore specific tags from S3 backup
python python/backup_restore.py restore --tags tag1 tag2

# Restore with explicit S3 bucket override
python python/backup_restore.py restore --s3-bucket my-backup-bucket --tags tag1 tag2
```

Behavior:
- Images are backed up to S3 before deletion
- If backup fails, deletion is aborted to prevent data loss
- Images can be restored to any compatible registry
- Restoring Docker images does not restore their records in MongoDB
- However, once an image has been restored, its URL can be used as the base image for a new Domino Compute Environment
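
The backup-before-delete behavior, combined with the Docker-first deletion and success tracking listed below, could look roughly like this. The callables `backup`, `delete_image`, and `delete_records` are hypothetical stand-ins for the S3, registry, and MongoDB operations. The README does not say whether one failed backup aborts the whole run or only that image's deletion; this sketch assumes the whole run is aborted.

```python
from typing import Callable, Iterable, List


class BackupError(RuntimeError):
    """Raised when a backup fails; no image has been deleted at that point."""


def backup_then_delete(
    tags: Iterable[str],
    backup: Callable[[str], None],                # hypothetical: copy one image to S3
    delete_image: Callable[[str], None],          # hypothetical: delete one image from the registry
    delete_records: Callable[[List[str]], None],  # hypothetical: remove MongoDB records
    do_backup: bool = True,
) -> List[str]:
    """Back up every tag first, then delete Docker images, then MongoDB records."""
    tags = list(tags)
    if do_backup:
        for tag in tags:
            try:
                backup(tag)
            except Exception as exc:
                # Abort before anything is deleted to prevent data loss.
                raise BackupError(f"backup of {tag} failed; nothing deleted") from exc

    deleted: List[str] = []
    for tag in tags:
        try:
            delete_image(tag)                     # Docker first
        except Exception as exc:
            print(f"Docker deletion failed for {tag}; keeping its MongoDB record ({exc})")
            continue
        deleted.append(tag)

    if deleted:
        delete_records(deleted)                   # only for successful Docker deletions
    return deleted
```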
- Docker-first deletion - Always deletes Docker images before MongoDB records
- Success tracking - Tracks which Docker deletions succeeded
- Conditional cleanup - Only deletes MongoDB records for successfully deleted images
- Failure preservation - Preserves MongoDB records if Docker deletion fails
- Workload-aware - Only deletes images not used by running pods
- Shared layer analysis - Properly calculates freed space accounting for shared layers
- Reference counting - Only counts layers that would have zero references after deletion
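
Freed-space estimation with reference counting can be sketched as follows: a layer counts toward reclaimed space only when every image that references it is being deleted. The input shapes (tag to layer digests, digest to size in bytes) are assumptions for illustration, not the format of the generated reports.

```python
from collections import Counter
from typing import Dict, Iterable, List


def freed_bytes(
    image_layers: Dict[str, List[str]],  # tag -> layer digests (assumed shape)
    layer_sizes: Dict[str, int],         # digest -> size in bytes (assumed shape)
    to_delete: Iterable[str],
) -> int:
    """Bytes reclaimed if the `to_delete` tags are removed, counting shared layers once."""
    refs: Counter = Counter()
    for layers in image_layers.values():
        refs.update(set(layers))         # an image references each layer at most once
    for tag in set(to_delete):
        refs.subtract(set(image_layers[tag]))
    # Only layers whose reference count drops to zero are actually freed.
    return sum(layer_sizes[digest] for digest, count in refs.items() if count == 0)
```

For example, with a 100-byte base layer shared by three images, deleting two of them frees only their unique layers; the base layer is freed only when the third image is deleted too.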

All delete scripts ensure that registry deletion is disabled again after they finish:

- Automatic cleanup - Registry deletion (`REGISTRY_STORAGE_DELETE_ENABLED`) is always disabled after script completion
- Error handling - Cleanup still runs if errors occur during deletion
- Pod readiness checks - Scripts wait for registry pods to restart and become ready after configuration changes
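
The "always disabled, even on error" guarantee maps naturally onto a `try`/`finally` block. In this sketch `set_delete_enabled` and `wait_until_ready` are hypothetical hooks; in practice they would patch `REGISTRY_STORAGE_DELETE_ENABLED` on the registry StatefulSet/Deployment and poll pod readiness.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def registry_deletion_enabled(
    set_delete_enabled: Callable[[str], None],  # hypothetical: patch REGISTRY_STORAGE_DELETE_ENABLED
    wait_until_ready: Callable[[], None],       # hypothetical: wait for registry pods to be ready
) -> Iterator[None]:
    """Enable registry deletion for the duration of the block, then always disable it."""
    try:
        set_delete_enabled("true")
        wait_until_ready()
        yield
    finally:
        # Runs on success, on exceptions, and if enabling itself failed halfway.
        set_delete_enabled("false")
        wait_until_ready()
```

Enabling inside the `try` means that a failure while waiting for pods after enabling still triggers the disable step.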

Configuration is loaded in this order (later values override earlier ones):

1. `config.yaml` in the project root
2. Environment variables
3. Command-line arguments

Copy `config-example.yaml` to `config.yaml` and modify as needed.
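
The precedence rules (file, then environment, then command line) can be sketched as a layered dictionary merge. The key names and the `ENV_TO_KEY` mapping are hypothetical; the environment variable names come from this README. `config.yaml` is assumed to be already parsed into a dict, for example with PyYAML's `yaml.safe_load`.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical mapping from a few of the documented environment variables to config keys.
ENV_TO_KEY = {
    "REGISTRY_URL": "registry_url",
    "REPOSITORY": "repository",
    "S3_BUCKET": "s3_bucket",
    "S3_REGION": "s3_region",
}


def resolve_config(
    file_config: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
    cli_args: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge settings so that later sources override earlier ones."""
    merged = dict(file_config)                   # 1. config.yaml (already parsed)
    for env_name, key in ENV_TO_KEY.items():     # 2. environment variables
        if env_name in environ:
            merged[key] = environ[env_name]
    for key, value in (cli_args or {}).items():  # 3. command-line arguments
        if value is not None:                    # flags left unset do not override
            merged[key] = value
    return merged
```

Skipping `None` CLI values lets argparse defaults of `None` fall through to the environment or file settings.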
```bash
# Docker Registry
export REGISTRY_URL="registry.example.com"
export REPOSITORY="my-repo"
export REGISTRY_PASSWORD="your_password" # Optional for ECR

# Kubernetes
export PLATFORM_NAMESPACE="domino-platform"
export COMPUTE_NAMESPACE="domino-compute"

# MongoDB
export MONGODB_USERNAME="admin" # Optional
export MONGODB_PASSWORD="mongo_password" # Optional - uses K8s secrets if not set

# Keycloak (for deactivated user cleanup)
export KEYCLOAK_HOST="https://keycloak.example.com/auth/"
export KEYCLOAK_USERNAME="admin"
export KEYCLOAK_PASSWORD="keycloak_password"

# S3 Backup
export S3_BUCKET="my-backup-bucket"
export S3_REGION="us-west-2"

# Skopeo
export SKOPEO_USE_POD="false" # Set to "true" for K8s pod mode
```

```bash
python python/main.py --config
```

Core modules:

- `python/main.py` - Unified entrypoint for all operations
- `python/config_manager.py` - Centralized configuration and Skopeo client management
- `python/backup_restore.py` - S3 backup and restore functionality

All deletion scripts follow the same pattern and support the common options:

- `python/delete_archived_tags.py` - Delete archived environments and/or models
- `python/delete_unused_environments.py` - Delete environments not used anywhere
- `python/delete_unused_private_environments.py` - Delete private environments owned by deactivated users
- `python/delete_unused_references.py` - Delete MongoDB references to non-existent images
- `python/delete_image.py` - Intelligent deletion based on workload analysis

Analysis scripts:

- `python/inspect_workload.py` - Analyze Kubernetes workloads
- `python/image_data_analysis.py` - Analyze registry contents with shared layer detection
- `python/extract_metadata.py` - Extract MongoDB metadata
- `python/reports.py` - Generate tag usage reports

Utility modules:

- `python/mongo_cleanup.py` - MongoDB record cleanup
- `python/mongo_utils.py` - MongoDB connection utilities
- `python/object_id_utils.py` - ObjectID handling
- `python/logging_utils.py` - Logging configuration

Workload analysis:

- Scans running Kubernetes pods
- Extracts container images from pod specifications
- Tracks usage patterns by ObjectID
- Generates `reports/workload-report.json`
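
The pod scan can be sketched with the official `kubernetes` Python client, which is among the listed dependencies. The helper names are hypothetical, and the sketch stops at image references: how the real script attributes usage to ObjectIDs is not described here. The namespace default reuses the `COMPUTE_NAMESPACE` example value.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


def images_in_use(pods: Iterable[Any]) -> Dict[str, Set[str]]:
    """Map each container image reference to the names of the pods using it.

    `pods` are objects shaped like kubernetes.client.V1Pod.
    """
    usage: Dict[str, Set[str]] = defaultdict(set)
    for pod in pods:
        containers = list(pod.spec.containers or []) + list(pod.spec.init_containers or [])
        for container in containers:
            usage[container.image].add(pod.metadata.name)
    return dict(usage)


def list_running_pods(namespace: str = "domino-compute") -> List[Any]:
    """Fetch running pods (needs the `kubernetes` package and cluster access)."""
    from kubernetes import client, config

    config.load_kube_config()  # use config.load_incluster_config() when running in a pod
    api = client.CoreV1Api()
    return api.list_namespaced_pod(namespace, field_selector="status.phase=Running").items
```

Keeping the pure `images_in_use` function separate from the cluster call makes it testable without a cluster.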

Image analysis:

- Lists all image tags in Docker registry
- Inspects image layers and calculates sizes
- Detects shared layers across images
- Tracks reference counts for accurate space calculation
- Generates `reports/final-report.json`
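
Layer reference counts can be derived from image manifests. This sketch assumes single-platform Docker schema 2 (or OCI image) manifests, whose `layers` entries carry `digest` and `size`; fetching them (for example with `skopeo inspect --raw docker://registry.example.com/my-repo:TAG`) is out of scope, and `analyze_manifests` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Tuple


def analyze_manifests(
    manifests: Dict[str, dict],  # tag -> parsed manifest JSON
) -> Tuple[Dict[str, int], Counter, Dict[str, int]]:
    """Return (image_sizes, layer_refs, layer_sizes) for a set of tagged manifests."""
    image_sizes: Dict[str, int] = {}
    layer_sizes: Dict[str, int] = {}
    layer_refs: Counter = Counter()
    for tag, manifest in manifests.items():
        layers = manifest.get("layers", [])
        # Naive per-image size: shared layers are counted in every image that uses them.
        image_sizes[tag] = sum(layer["size"] for layer in layers)
        for layer in layers:
            layer_sizes[layer["digest"]] = layer["size"]
        layer_refs.update({layer["digest"] for layer in layers})
    return image_sizes, layer_refs, layer_sizes
```

Summing `image_sizes` overstates registry usage whenever `layer_refs` has counts above one, which is why freed space has to be computed from reference counts rather than per-image sizes.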

Safe deletion:

- Cross-references workload and image analysis
- Identifies unused images not referenced by running pods
- Queries MongoDB for additional usage (project defaults, scheduled jobs, etc.)
- Calculates freed space with shared layer awareness
- Optionally backs up to S3 before deletion
- Deletes Docker images first, then MongoDB records
- Ensures registry deletion is disabled after completion
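
The cross-referencing step amounts to a set difference: registry tags minus tags seen in running pods minus tags referenced in MongoDB (project defaults, scheduled jobs, and so on). The helpers below are hypothetical, and the tag parsing assumes plain `registry/repository:tag` references with the repository name from `REPOSITORY`.

```python
from typing import Iterable, Optional, Set


def tag_of(image_ref: str, repository: str) -> Optional[str]:
    """Return the tag if image_ref points into `repository`, else None."""
    name, sep, tag = image_ref.rpartition(":")
    if not sep or "/" in tag:  # no tag at all, or the colon belonged to a registry port
        return None
    return tag if name == repository or name.endswith("/" + repository) else None


def deletion_candidates(
    registry_tags: Iterable[str],
    pod_images: Iterable[str],
    mongo_tags: Iterable[str],
    repository: str,
) -> Set[str]:
    """Tags present in the registry that neither running pods nor MongoDB reference."""
    pod_tags = {t for t in (tag_of(ref, repository) for ref in pod_images) if t}
    return set(registry_tags) - pod_tags - set(mongo_tags)
```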

If your Docker registry URL doesn't contain enough information for auto-detection:

```bash
# Enable registry deletion with default "docker-registry" statefulset
python python/main.py delete_archived_tags --environment --apply --enable-docker-deletion

# Or specify custom statefulset/deployment name
python python/main.py delete_unused_environments --apply \
  --enable-docker-deletion \
  --registry-statefulset my-custom-registry
```

Programmatic usage:
```python
from python.config_manager import config_manager, SkopeoClient

# Skopeo client with registry deletion enabled on a custom StatefulSet/Deployment
skopeo_client = SkopeoClient(
    config_manager,
    use_pod=False,
    enable_docker_deletion=True,
    registry_statefulset="my-custom-registry",
)
```

This is useful when:
- Registry URL is an IP address or external DNS name
- Registry service has non-standard naming
- You want explicit control over which StatefulSet/Deployment is modified

Kubernetes API access:

```bash
kubectl get pods -n domino-compute
```

Registry authentication:

```bash
export REGISTRY_PASSWORD="your_password"
skopeo list-tags docker://registry.example.com/repository
```

MongoDB connection:

```bash
export MONGODB_PASSWORD="your_password"
python -c "from python.mongo_utils import get_mongo_client; print('Connected')"
```

ObjectID format:

```bash
# Valid: 62798b9bee0eb12322fc97e8 (24 hex chars)
# Valid: environment:62798b9bee0eb12322fc97e8
# Invalid: 62798b9bee0eb12322fc97e (23 chars)
```

```bash
export PYTHONPATH=python
python python/main.py inspect_workload --max-workers 1
```

Dependencies:

```bash
pip install -r requirements.txt
```

- `boto3` - AWS SDK for S3 operations
- `kubernetes` - Kubernetes API client
- `pymongo` - MongoDB client
- `python-keycloak` - Keycloak admin client
- `PyYAML` - Configuration parsing
- `requests` - HTTP client

Requirements:

- Python 3.8+
- kubectl access - For Kubernetes operations
- Registry access - For image inspection and deletion
- MongoDB access - For metadata and cleanup
- Keycloak access - For deactivated user detection (optional)

To contribute:

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request

This project is licensed under the MIT License - see the LICENSE file for details.