# Add comprehensive OpenShift cluster destroyer script

## Summary

This PR introduces a Bash script for safely destroying OpenShift clusters on AWS. It handles several cluster states: properly installed clusters, orphaned clusters without state files, and partially created clusters whose installation failed.

## Key Features

### Core Capabilities
- **Multi-method destruction**: Attempts `openshift-install` first, then falls back to manual AWS cleanup
- **Comprehensive resource cleanup**: Handles EC2, VPC, ELB, Route53, S3, and associated resources
- **Auto-detection**: Automatically discovers infrastructure IDs from cluster names
- **Orphaned cluster support**: Can destroy clusters even without metadata/state files
- **Reconciliation loop**: Repeats cleanup passes (up to `--max-attempts`) for resources that do not delete on the first try

### Safety Features
- **Dry-run mode**: Preview all resources before deletion with `--dry-run`
- **Confirmation prompts**: Requires explicit confirmation before destructive actions
- **Input validation**: Validates inputs strictly to prevent injection attacks
- **Detailed logging**: Local file logging plus optional CloudWatch integration
- **Resource verification**: Recounts resources after destruction to confirm that cleanup is complete

### Operational Features
- **List clusters**: Discover all OpenShift clusters in a region with `--list`
- **Flexible targeting**: Destroy by cluster name, infra-id, or metadata file
- **Parallel operations**: Counts resources with parallel API calls
- **Progress tracking**: Real-time status updates during destruction
- **S3 state management**: Automatic cleanup of cluster state files
- **Flexible logging**: Custom log paths with `--log-file`, disable with `--no-log`
- **Color control**: Disable colors with `--no-color` for CI/CD pipelines
- **No jq dependency**: Parses JSON with native Unix tools

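The jq-free JSON parsing listed above could look roughly like the following. This is a minimal sketch, not the script's actual code: the helper name `json_get_string` is hypothetical, and it assumes the field is a plain string (for example `infraID` or `clusterName` in `metadata.json`) with no escaped quotes in the value.

```bash
# Hypothetical helper: print the first string value stored under KEY in FILE.
# Assumption: the value is a plain JSON string without escaped quotes.
json_get_string() {
    local file="$1" key="$2"
    # Reject empty keys and keys that could alter the regular expression below.
    case "$key" in
        ''|*[!A-Za-z0-9_]*) return 2 ;;
    esac
    # grep isolates the "key": "value" pair; sed keeps only the value.
    grep -o "\"${key}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" "$file" \
        | head -n 1 \
        | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}
```

A call such as `json_get_string metadata.json infraID` prints the value, or nothing when the key is absent. A pattern-based approach like this trades full JSON correctness for having no extra dependency, which is reasonable only for small, predictable files.
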
## Architecture Overview

```mermaid
flowchart TD
    A[Start: User runs script] --> B[Setup logging<br/>+ CloudWatch if available]
    B --> C{--list?}

    %% List mode
    C -- yes --> L1[List clusters]
    L1 --> L2[Collect EC2/VPC tags]
    L2 --> L3[List S3 prefixes]
    L3 --> L4[Merge + deduplicate]
    L4 --> L5{--detailed?}
    L5 -- yes --> L6[Count resources in parallel]
    L5 -- no --> L7[Quick VPC status check]
    L6 --> L8[Print cluster list]
    L7 --> L8
    L8 --> Z[End]

    %% Destroy mode
    C -- no --> D[Parse args + validate inputs]
    D --> E{metadata-file?}
    E -- yes --> E1[Extract infraID, clusterName, region]
    E -- no --> F{infra-id provided?}
    E1 --> G
    F -- yes --> G[Use provided infra-id]
    F -- no --> H{cluster-name provided?}
    H -- yes --> H1[Detect infra-id via VPC tag or S3]
    H -- no --> X[Exit: missing identifier]
    H1 --> G

    G --> I[Count resources parallel]
    I --> J{resources == 0?}
    J -- yes --> J1[Cleanup S3 state] --> Z
    J -- no --> K[Show detailed resources]

    K --> Q{--force or --dry-run?}
    Q -- no --> Q1[Prompt confirm] --> Q2{confirmed?}
    Q2 -- no --> Z
    Q2 -- yes --> R
    Q -- yes --> R[Proceed]

    R --> S{openshift-install + metadata?}
    S -- yes --> S1[Run openshift-install destroy]
    S1 --> S2{success?}
    S2 -- yes --> S3[Clean Route53 records] --> T
    S2 -- no --> U
    S -- no --> U[Manual cleanup]

    subgraph Reconciliation Loop
        direction TB
        U --> M1[1. Terminate EC2 instances]
        M1 --> M2[2. Delete Classic ELBs + ALB/NLBs<br/>by name and by VPC]
        M2 --> M3[3. Delete NAT Gateways]
        M3 --> M4[4. Release Elastic IPs]
        M4 --> M5[5. Delete orphan ENIs]
        M5 --> M6[6. Delete VPC Endpoints]
        M6 --> M7[7. Delete Security Groups<br/>remove rules first]
        M7 --> M8[8. Delete Subnets]
        M8 --> M9[9. Delete Route Tables + associations]
        M9 --> M10[10. Detach & Delete Internet Gateway]
        M10 --> M11[11. Delete VPC]
        M11 --> M12[12. Cleanup Route53: api and *.apps]
        M12 --> V[Recount resources]
        V --> W{remaining > 0 and attempts < MAX_ATTEMPTS?}
        W -- yes --> U
        W -- no --> T[Proceed]
    end

    T --> Y[Cleanup S3 state<br/>resolve by cluster or infra-id]
    Y --> V2[Final verification count]
    V2 --> CW[Send summary to CloudWatch if enabled]
    CW --> Z
```

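The reconciliation loop in the diagram can be sketched as a small retry driver. This is an illustration under stated assumptions rather than the script's implementation: `reconcile` and its two callback arguments are hypothetical names, and the default of 5 attempts is assumed (the diagram only names `MAX_ATTEMPTS`).

```bash
# Assumed default for this sketch; the script sets it through --max-attempts.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"

# reconcile CLEANUP_FN COUNT_FN
# Runs CLEANUP_FN, recounts with COUNT_FN, and repeats while resources remain
# and the attempt budget is not exhausted. Returns 0 only when nothing is left.
reconcile() {
    local cleanup_fn="$1" count_fn="$2"
    local attempt=0 remaining
    remaining="$("$count_fn")"
    while [ "$remaining" -gt 0 ] && [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
        attempt=$((attempt + 1))
        echo "Attempt ${attempt}/${MAX_ATTEMPTS}: ${remaining} resources remaining"
        # A failed pass is not fatal; leftovers are retried on the next attempt.
        "$cleanup_fn" || true
        remaining="$("$count_fn")"
    done
    [ "$remaining" -eq 0 ]
}
```

In this shape, `COUNT_FN` prints an integer and `CLEANUP_FN` performs one full pass over the twelve steps. A non-zero return from `reconcile` means resources survived every attempt, which is the case the final verification count is meant to surface.
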
## Usage Examples

### List all clusters in a region
```bash
./scripts/destroy-openshift-cluster.sh --list
./scripts/destroy-openshift-cluster.sh --list --detailed  # With resource counts
```

### Destroy a cluster
```bash
# By cluster name (auto-detects infra-id)
./scripts/destroy-openshift-cluster.sh --cluster-name my-cluster

# By infrastructure ID
./scripts/destroy-openshift-cluster.sh --infra-id my-cluster-abc12

# Using metadata file
./scripts/destroy-openshift-cluster.sh --metadata-file /path/to/metadata.json
```

### Preview destruction (dry-run)
```bash
./scripts/destroy-openshift-cluster.sh --cluster-name my-cluster --dry-run
```

### Force deletion without prompts
```bash
./scripts/destroy-openshift-cluster.sh --cluster-name my-cluster --force
```

### Customize reconciliation attempts
```bash
./scripts/destroy-openshift-cluster.sh --cluster-name stubborn-cluster --max-attempts 10
```

### Logging options
```bash
# Custom log file location
./scripts/destroy-openshift-cluster.sh --cluster-name my-cluster --log-file /var/log/destroy.log

# Disable file logging (console only)
./scripts/destroy-openshift-cluster.sh --cluster-name my-cluster --no-log

# Disable colored output for CI/CD
./scripts/destroy-openshift-cluster.sh --cluster-name my-cluster --no-color
```

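For readers who want to see how flags like these are commonly handled, here is a hedged sketch of an argument parser covering the options used in the examples above. The variable names (`MODE`, `LOGGING`, `COLOR`, and so on), the default values, and the exact error messages are assumptions for illustration; the script's own parser may differ.

```bash
# Assumed defaults for illustration only.
MODE="destroy"; DETAILED=false; DRY_RUN=false; FORCE=false
CLUSTER_NAME=""; INFRA_ID=""; METADATA_FILE=""
MAX_ATTEMPTS=5; LOG_FILE=""; LOGGING=true; COLOR=true

parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --list)     MODE="list" ;;
            --detailed) DETAILED=true ;;
            --dry-run)  DRY_RUN=true ;;
            --force)    FORCE=true ;;
            --no-log)   LOGGING=false ;;
            --no-color) COLOR=false ;;
            --cluster-name|--infra-id|--metadata-file|--max-attempts|--log-file)
                # These options take a value in the next argument.
                [ "$#" -ge 2 ] || { echo "Missing value for $1" >&2; return 2; }
                case "$1" in
                    --cluster-name)  CLUSTER_NAME="$2" ;;
                    --infra-id)      INFRA_ID="$2" ;;
                    --metadata-file) METADATA_FILE="$2" ;;
                    --max-attempts)  MAX_ATTEMPTS="$2" ;;
                    --log-file)      LOG_FILE="$2" ;;
                esac
                shift ;;
            *) echo "Unknown option: $1" >&2; return 2 ;;
        esac
        shift
    done
    # Destroy mode needs at least one way to identify the cluster.
    if [ "$MODE" = "destroy" ] && [ -z "$CLUSTER_NAME$INFRA_ID$METADATA_FILE" ]; then
        echo "Missing identifier: pass --cluster-name, --infra-id, or --metadata-file" >&2
        return 2
    fi
}
```

Failing fast on unknown options and on a missing identifier matches the "Exit: missing identifier" branch in the flowchart, and keeps a mistyped flag from silently widening what gets deleted.
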
## Resource Deletion Order

The script deletes resources in an order that respects AWS dependencies:

1. **EC2 Instances** - Terminate all instances first
2. **Load Balancers** - Delete ELBs/ALBs/NLBs (releases public IPs)
3. **NAT Gateways** - Remove NAT gateways
4. **Elastic IPs** - Release allocated IPs
5. **Network Interfaces** - Clean orphaned ENIs
6. **VPC Endpoints** - Remove endpoints
7. **Security Groups** - Delete after removing dependencies
8. **Subnets** - Delete VPC subnets
9. **Route Tables** - Remove custom route tables
10. **Internet Gateway** - Detach and delete IGW
11. **VPC** - Finally delete the VPC itself
12. **Route53** - Clean DNS records
13. **S3 State** - Remove cluster state files

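One way to keep such an ordering explicit in code is to drive the cleanup from a single ordered list. The sketch below mirrors the numbered list above; the step identifiers and the `delete_<step>` handler convention are hypothetical, and no AWS calls are made here.

```bash
# Step names mirror the numbered list above; handlers named delete_<step>
# are hypothetical and would wrap the relevant AWS CLI calls.
DELETION_ORDER="ec2_instances load_balancers nat_gateways elastic_ips
network_interfaces vpc_endpoints security_groups subnets route_tables
internet_gateway vpc route53 s3_state"

# run_deletion_steps DRY_RUN
# Walks the steps in dependency order. With DRY_RUN=true it only prints the plan.
run_deletion_steps() {
    local dry_run="$1" step n=0
    for step in $DELETION_ORDER; do
        n=$((n + 1))
        if [ "$dry_run" = "true" ]; then
            echo "[dry-run] step ${n}: ${step}"
        elif command -v "delete_${step}" >/dev/null 2>&1; then
            # Keep going on failure so later passes can retry the leftovers.
            "delete_${step}" || echo "step ${n} (${step}) failed" >&2
        else
            echo "step ${n}: no handler for ${step}" >&2
        fi
    done
}
```

Calling `run_deletion_steps true` prints the thirteen steps without touching anything, which is the same idea as previewing with `--dry-run`.
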
## Error Handling

- **Timeout protection**: Commands time out after 30 seconds to prevent hanging
- **Graceful degradation**: Falls back to manual cleanup if `openshift-install` fails
- **Reconciliation loop**: Automatically retries failed deletions
- **Dependency resolution**: Removes security group rules before deletion to break circular dependencies
- **State verification**: Post-destruction check ensures complete cleanup

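Timeout protection of this kind is often implemented as a thin wrapper around the coreutils `timeout` command. The following is a minimal sketch assuming that approach; the function name `run_with_timeout` is hypothetical, and the fallback of running without a limit when `timeout` is not installed is an assumption about how an optional dependency might be handled.

```bash
# run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND under a time limit when the `timeout` binary is available.
run_with_timeout() {
    local seconds="$1"
    shift
    if command -v timeout >/dev/null 2>&1; then
        timeout "$seconds" "$@"
    else
        # Assumption: without `timeout`, run the command with no time limit.
        "$@"
    fi
}
```

With GNU coreutils, `timeout` exits with status 124 when the limit is hit, so a caller can tell a hung command apart from an ordinary failure. Note that `timeout` can only launch external programs, not shell functions, so the wrapped command would be something like `aws`, not another helper defined in the script.
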
## Requirements

- AWS CLI configured with appropriate credentials
- Standard Unix tools (`grep`, `sed`, `awk`; pre-installed on most systems)
- Optional: `openshift-install` binary for metadata-based destruction
- Optional: `timeout` command (coreutils) for operation timeouts

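A preflight check for these requirements might look like the sketch below. The helper name `missing_tools` is hypothetical; it relies only on the POSIX `command -v` builtin.

```bash
# missing_tools TOOL...
# Prints each tool that is not found on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    local tool rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

A caller could treat `missing_tools aws grep sed awk` as a hard failure and `missing_tools openshift-install timeout` as a warning, since the latter two are listed as optional.
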
## Security Considerations

- Input validation prevents injection attacks
- Restricted file permissions on log files (600)
- No sensitive data logged to CloudWatch
- AWS profile validation before operations
- Confirmation prompts prevent accidental deletions

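Two of these points, input validation and restrictive log permissions, can be illustrated with a short sketch. The allowlist rule below (lowercase letters, digits, and hyphens, at most 63 characters, no leading or trailing hyphen) is an assumption chosen for the example, and both function names are hypothetical; the script's actual pattern may differ.

```bash
# validate_identifier VALUE
# Allowlist check (assumed rule): lowercase letters, digits, and hyphens,
# 1 to 63 characters, with no leading or trailing hyphen.
validate_identifier() {
    local value="$1"
    local LC_ALL=C   # make the a-z range mean ASCII lowercase only
    [ -n "$value" ] && [ "${#value}" -le 63 ] || return 1
    case "$value" in
        *[!a-z0-9-]*) return 1 ;;   # any character outside the allowlist
        -*|*-)        return 1 ;;   # leading or trailing hyphen
    esac
    return 0
}

# create_log_file PATH
# Creates the log file (if needed) and restricts it to the owner (mode 600).
create_log_file() {
    local path="$1"
    ( umask 077 && : >> "$path" ) && chmod 600 "$path"
}
```

Rejecting anything outside a small allowlist is simpler to reason about than trying to escape dangerous characters, because values such as `my-cluster; rm -rf /` never reach a command line at all.
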
## Files Changed

- `scripts/destroy-openshift-cluster.sh` - New comprehensive destroyer script (2000+ lines)

## Testing Recommendations

1. Test with `--dry-run` first to verify resource detection
2. Test on a small test cluster before production use
3. Verify S3 state cleanup for your bucket naming convention
4. Test reconciliation with partially deleted clusters
5. Validate CloudWatch logging if using in CI/CD

## Related Documentation

- [OpenShift on AWS Documentation](https://docs.openshift.com/container-platform/latest/installing/installing_aws/installing-aws-default.html)
- [AWS Resource Tagging](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html)
- Script includes comprehensive inline documentation and help text