"Infrastructure that heals itself."
CloudSlash is an autonomous infrastructure optimization platform designed for high-scale, enterprise cloud environments. Unlike passive observability tools that merely report metrics, CloudSlash leverages advanced mathematical modeling, graph topology analysis, and Abstract Syntax Tree (AST) parsing to actively solve resource inefficiency problems at their source.
It functions as a forensic auditor and autonomous agent, correlating disparate data sources—CloudWatch metrics, network traffic logs, infrastructure-as-code (IaC) definitions, and version control history—to identify, attribute, and remediate waste with mathematical certainty.
Modern cloud environments suffer from "Resource Sprawl"—ghost assets that incur significant financial cost but deliver zero business value. Traditional tools (CloudHealth, Vantage, Trusted Advisor) provide visibility but lack actuation. They tell you that you are wasting money, but rarely tell you why, who caused it, or how to fix it safely.
CloudSlash bridges this gap by combining Linear Programming (for fleet optimization) with Code Provenance (for attribution). It delivers a closed-loop system that:
- Detects waste via graph topology analysis.
- Attributes ownership via git blame and Terraform AST parsing.
- Remediates safely via the "Lazarus Protocol" (Snapshot -> Stop -> Detach).
Impact: Organizations utilizing CloudSlash typically see a 15-25% reduction in EC2/EBS spend within the first execution cycle.
CloudSlash is engineered around four distinct intelligence engines:
graph LR
%% --- STYLING ---
classDef aws fill:#FF9900,color:white,stroke:#232F3E,stroke-width:2px;
classDef input fill:#0f172a,color:#38bdf8,stroke:#38bdf8,stroke-width:2px,stroke-dasharray: 5 5;
classDef logic fill:#0F172A,color:#fff,stroke:#3B82F6,stroke-width:2px;
classDef core fill:#1e293b,color:#fff,stroke:#94a3b8,stroke-width:2px;
classDef risk fill:#450a0a,color:#FCA5A5,stroke:#EF4444,stroke-width:2px;
classDef safety fill:#064e3b,color:#6EE7B7,stroke:#10B981,stroke-width:2px;
classDef output fill:#222,color:#fff,stroke:#fff,stroke-width:2px;
%% --- 1. DATA INGESTION ---
subgraph INPUT ["DATA SOURCES"]
direction TB
AWS[("AWS API<br/>(Live State)")]:::aws
TF[("Terraform State<br/>(Infrastructure as Code)")]:::input
Git[("Git History<br/>(Provenance)")]:::input
end
%% --- 2. THE BRAIN ---
subgraph CORE ["AUTONOMY ENGINE"]
Builder["Graph Builder<br/>(Topology Resolution)"]:::logic
subgraph MEMORY ["IN-MEMORY DAG"]
direction LR
Graph[("Dependency Graph")]:::core
%% Simulation Nodes
EC2((EC2)):::core --> ENI((ENI)):::core
ENI --> SG((SG)):::core
NAT((NAT)):::risk -.->|Dependency| Graph
Vol((EBS)):::risk -.->|Attached| EC2
end
Solver["MILP Solver<br/>(Optimization)"]:::logic
end
%% --- 3. HEURISTICS ---
subgraph CHECKS ["FORENSIC AUDIT"]
direction TB
Check1{"H-001<br/>Zombie Volume?"}:::logic
Check2{"H-002<br/>EIP Wasted?"}:::logic
Check3{"H-006<br/>NAT Hollow?"}:::logic
end
%% --- 4. SAFETY ---
subgraph SAFETY ["LAZARUS PROTOCOL"]
Verify{"Safety Check"}:::safety
Block["BLOCK<br/>(Production DB)"]:::risk
Safe["SAFE<br/>(Dev Resource)"]:::safety
end
%% --- 5. OUTPUT ---
subgraph OUT ["REMEDIATION"]
Script["safe_cleanup.sh<br/>(Soft Delete)"]:::output
Fix["fix_terraform.sh<br/>(State Sync)"]:::output
Dash["Dashboard v2.0<br/>(Executive Report)"]:::output
end
%% --- CONNECTIONS ---
AWS ==> Builder
TF ==> Builder
Git -.->|Blame Info| Builder
Builder ==> Graph
Graph --> Solver
Solver --> Check1
Solver --> Check2
Solver --> Check3
Check1 -->|True| Verify
Check2 -->|True| Verify
Check3 -->|True| Verify
Verify -->|Risk High| Block
Verify -->|Risk Low| Safe
Safe ==> Script
Safe ==> Fix
Safe ==> Dash
Instead of simple "right-sizing" rules, the Autonomy Engine treats infrastructure as a Mixed-Integer Linear Programming (MILP) problem.
- The Solver: Calculates the mathematically perfect mix of EC2 instances to satisfy CPU/RAM requirements while minimizing cost.
- Tetris Engine: Uses 2D Bin Packing (Best Fit Decreasing algorithm) to visualize how workloads are fragmented across Kubernetes clusters, identifying "slack" capacity that metrics miss.
- Bayesian Risk Model: Tracks Spot Instance interruptions per-AZ, assigning a verifiable "Reliability Score" to every potential instance type to balance cost vs. stability.
Links every runtime resource ID (e.g., i-012345) back to its Genetic Code.
- IaC Correlation: Parsing
main.tffiles (HCL) to find the exact resource definition block responsible for the asset. - Git Blame Integration: Queries the
.gitdirectory to identify the specific commit hash, date, and author who introduced the resource. - Technical Debt Attribution: Automatically flags resources created >1 year ago as "Fossilized Code," prompting review for deprecation.
Maps abstract IAM ARNs to tangible corporate identities.
- Identity Resolution: Correlates
arn:aws:iam::123:user/jdoe->jdoe@company.com->SlackID: U12345. - Direct Notification: Sends "Waste Reports" directly to the responsible engineer via Slack, bypassing generic "FinOps" channels to ensure action is taken by the context owner.
Standardizes the lifecycle of remediation to prevent catastrophic data loss.
- Lifecycle Management: Enforces a rigid state transition:
Active->Stopped->Snapshot->Detached->Archived. - Resurrection Capability: Automatically generates
terraform importblocks, effectively allowing you to "Undo" a deletion by restoring state management in seconds. - Interactive Confirmation: All destructive actions require explicit CLI confirmation and are blocked if specific safety heuristics (e.g., "Database Production") are triggered.
View Heuristics Catalog (Deep Dive)
| Detection | Logic | Remediation |
|---|---|---|
| Lambda Code Stagnation | 0 Invocations (90d) AND Last Modified > 90d. | Delete function or archive code to S3. |
| ECS Idle Cluster | EC2 instances running for >1h but Cluster has 0 Tasks/Services. | Scale ASG to 0 or delete Cluster. |
| ECS Crash Loop | Service Desired Count > 0 but Running Count == 0. | Check Task Definitions / ECR Image pulls. |
| Detection | Logic | Remediation |
|---|---|---|
| Zombie EBS | Volume state is available (unattached) for > 14 days. |
Snapshot (optional) then Delete. |
| Legacy EBS (gp2) | Volume is gp2. gp3 is 20% cheaper and decoupled. |
Modify Volume to gp3 (No downtime). |
| Fossil Snapshots | RDS/EBS Snapshot > 90 days old, not attached to AMI. | Delete old snapshots. |
| RDS Idle | 0 Connections (7d) AND CPU < 5%. | Stop instance or take final snapshot & delete. |
| Detection | Logic | Remediation |
|---|---|---|
| Hollow NAT Gateway | Traffic < 1GB (30d) OR Connected Subnets have 0 Running Instances. | Delete NAT Gateway. |
| Dangling EIP | EIP unattached but matches an A-Record in Route53. | URGENT: Update DNS first, then release EIP. |
| Orphaned ELB | Load Balancer has 0 registered/healthy targets. | Delete ELB. |
| Detection | Logic | Remediation |
|---|---|---|
| ECR Lifecycle Missing | Repo has images > 90d old but no expiration policy. | Add Lifecycle Policy to expire untagged images. |
| Log Retention Missing | CloudWatch Group set to "Never Expire" (>1GB size). | Set retention to 30d/90d. |
CloudSlash stands apart by focusing on root cause resolution rather than just reporting.
| Feature | CloudSlash | AWS Trusted Advisor | Cloud Custodian | Vantage / CloudHealth |
|---|---|---|---|---|
| Primary Goal | Automated Remediation | Basic Visibility | Policy Enforcement | Financial Reporting |
| Logic Engine | Graph Topology (DAG) | Simple Metrics | Stateless Rules | Aggregated Billing Data |
| Remediation | Interactive TUI & Scripts | None (Manual) | Lambda (Black Box) | None (Manual) |
| Safety | Soft-Delete / Snapshot first | N/A | Hard Delete | N/A |
| Attribution | Git Commit & Author | None | Tag-based | Tag-based |
| IaC Awareness | Terraform AST Parsing | None | None | None |
| Dependency Mapping | Full Graph Visualization | None | None | None |
| Cost | OSS / Self-Hosted | Enterprise Support plan | OSS | $$ SaaS Subscription |
CloudSlash is distributed as a single static binary. No external dependencies (Python, Node.js, JVM) are required.
curl -sL https://raw.githubusercontent.com/DrSkyle/CloudSlash/main/scripts/install.sh | bashirm https://raw.githubusercontent.com/DrSkyle/CloudSlash/main/scripts/install.ps1 | iexgit clone https://github.com/DrSkyle/CloudSlash.git
cd CloudSlash
make buildThe tool utilizes the standard AWS Credential Chain (~/.aws/credentials or ENV vars). Ensure your environment is configured for the target account:
export AWS_PROFILE=production
export AWS_REGION=us-east-1For persistent configuration, create a cloudslash.yaml in your root directory.
CloudSlash respects specific resource tags to allow for granular control over waste detection. Organizations can define retention policies or explicitly exclude resources from analysis using the cloudslash:ignore tag key.
To exclude a resource from the waste report, apply the tag cloudslash:ignore with one of the following value formats:
- Boolean: Set to
trueto permanently exclude the resource from all analysis. - Expiration Date: Set a date in
YYYY-MM-DDformat (e.g.,2027-01-01). The resource will be ignored until this date is reached. - Retention Period: Set a duration with a
d(days) orh(hours) suffix (e.g.,120d). The resource will be ignored if its age is less than the specified duration.
The logic described above applies to all supported resources, including Amazon Machine Images (AMIs). For example, to enforce a custom retention policy for a specific backup AMI, apply the tag cloudslash:ignore with value 180d. This overrides the default heuristic and ensures the image is only flagged as waste after 180 days have elapsed.
Resources matching the exclusion criteria are removed from the interactive TUI, the JSON output, and the Executive Dashboard. They will not be counted towards waste totals or financial deficiency metrics.
The primary entry point. Orchestrates the graph resolution and waste detection.
cloudslash scan [flags]Flags:
--headless: Disables the TUI. Recommended for CI/CD pipelines.--region <list>: Comma-separated target regions (e.g.,us-east-1,eu-central-1).--no-metrics: Skip CloudWatch API calls (faster, but less accurate).
Interactive TUI Controls:
h/l: Navigate hierarchy.t: Toggle Topology Visualization.SPACE: Mark resource for remediation.ENTER: View detailed resource inspection (Cost, Tags, Provenance).
The interactive cleanup interface. Reads the graph and prompts for safe deletion.
cloudslash cleanupNote: This command generates executable shell scripts (safe_cleanup.sh) in the cloudslash-out/ directory for manual review before execution.
CloudSlash generates a self-contained HTML dashboard for stakeholders, featuring financial projections and Sankey cost flow diagrams.
CloudSlash is open source and free. If you'd like to support development or need priority help:
- Support ($9.99/mo) - Priority Support & Gratitude.
- Super Support ($19.99/mo) - Prioritized Feature Requests & Sponsor Recognition.
CloudSlash is released under the AGPLv3 license.
- Internal Use: You are free to use CloudSlash internally to audit your own infrastructure.
- Commercial Use: If you modify CloudSlash and provide it as a service or product to external clients, you must open-source your changes.
Commercial Exemption: If your organization requires a commercial license for AGPL compliance (e.g., embedding CloudSlash in proprietary software without releasing source code), exemptions are available.
Contact: drskyle8000@gmail.com
To remove CloudSlash from your system:
# 1. Remove binary
sudo rm /usr/local/bin/cloudslash
# 2. Remove configuration
rm ~/.cloudslash.yaml
# 3. Remove reports (optional)
rm -rf cloudslash-out/CloudSlash v2.0 was developed with a philosophy of "Zero Waste." We acknowledge the open-source community for the robust libraries that make this engine possible.
Copyright © 2026 DrSkyle.





