An automated Azure infrastructure cost optimization system that leverages Azure Monitor metrics and Azure Automation to implement intelligent cost-saving recommendations.
This solution provides automated infrastructure cost optimization through real-time monitoring, analysis, and execution of cost-saving measures across Azure resources. The system analyzes VM utilization, storage access patterns, and database performance metrics to automatically implement appropriate optimizations while maintaining service reliability.
- CPU and memory utilization analysis over configurable time periods
- Automatic VM size recommendations based on performance data
- Support for both scale-up and scale-down operations
- Integration with Azure Advisor recommendations
- Automated blob storage tier management (Hot/Cool/Archive)
- Unused storage account identification and cleanup
- Storage access pattern analysis and optimization
- Cost impact assessment for tier changes
- Azure SQL Database performance monitoring
- Automatic scaling recommendations for elastic pools
- Reserved capacity optimization suggestions
- Performance tier adjustments based on DTU/vCore utilization
- Real-time cost anomaly detection
- Custom Azure Monitor workbooks and dashboards
- Automated reporting on optimization actions
- Integration with Azure Cost Management APIs
Azure Monitor (metrics collection) → Log Analytics (data analysis & queries) → Azure Automation (PowerShell runbooks) → Resource Optimization (VM/Storage/DB modifications)
- Azure Monitor: Collects performance and cost metrics
- Log Analytics Workspace: Stores and analyzes telemetry data
- Azure Automation Account: Executes optimization runbooks
- PowerShell Runbooks: Implements optimization logic
- Azure Key Vault: Secures credentials and configuration
- Logic Apps: Orchestrates workflows and approvals
- Azure subscription with appropriate permissions
- Contributor access to target resource groups
- Azure CLI installed and configured
- PowerShell 7.0 or later with Azure modules
- Azure Monitor
- Log Analytics Workspace
- Azure Automation Account
- Azure Key Vault
- Application Insights
- Azure Cost Management
The deployment requires the following Azure RBAC roles:
- Contributor on the target subscription or resource groups
- User Access Administrator for role assignments
- Key Vault Administrator for secrets management
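A minimal sketch of granting these roles with the Azure CLI. The function name `assign_deployment_roles` is hypothetical, and it assumes the assignee is a service principal (for example the deployment identity) identified by its object ID.

```bash
# Hypothetical helper: grant the three deployment roles listed above to one principal.
# Usage: assign_deployment_roles <principal-object-id> <scope>
#   <scope> is a subscription or resource group ID, e.g.
#   /subscriptions/<subscription-id>/resourceGroups/rg-cost-optimizer
assign_deployment_roles() {
  local principal_id="$1" scope="$2" role
  local roles=(
    "Contributor"
    "User Access Administrator"
    "Key Vault Administrator"
  )
  for role in "${roles[@]}"; do
    # Assumes a service principal; use --assignee-principal-type User for a user account.
    az role assignment create \
      --assignee-object-id "$principal_id" \
      --assignee-principal-type ServicePrincipal \
      --role "$role" \
      --scope "$scope" \
      --output none || return 1
  done
}
```

Scoping to a resource group rather than the whole subscription keeps the grant closer to least privilege, since User Access Administrator can itself hand out roles.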
Clone the repository and set up the environment:
git clone <repository-url>
cd Infrastructure-Cost-Optimizer
chmod +x scripts/*.sh
./scripts/setup-environment.sh
Create a .env file with your Azure configuration:
# Azure Configuration
AZURE_SUBSCRIPTION_ID="your-subscription-id"
AZURE_TENANT_ID="your-tenant-id"
AZURE_CLIENT_ID="your-client-id"
AZURE_CLIENT_SECRET="your-client-secret"
# Resource Configuration
RESOURCE_GROUP_NAME="rg-cost-optimizer"
LOCATION="eastus"
AUTOMATION_ACCOUNT_NAME="aa-cost-optimizer"
LOG_ANALYTICS_WORKSPACE_NAME="law-cost-optimizer"
KEY_VAULT_NAME="kv-cost-optimizer"
Deploy the core Azure infrastructure:
# Using Bicep (recommended)
./scripts/deploy-infrastructure.sh
# Or using Terraform
cd infrastructure/terraform
terraform init
terraform plan
terraform apply
Set up Azure Monitor components and custom metrics:
./scripts/configure-monitoring.sh
Install the PowerShell automation runbooks:
./scripts/deploy-runbooks.sh
Verify the installation and run initial tests:
./scripts/validate-deployment.sh
Edit the configuration files to customize optimization behavior:
VM Optimization Thresholds (runbooks/vm-optimization.ps1):
$CPUThresholdLow = 20 # CPU utilization below which to consider downsizing
$CPUThresholdHigh = 80 # CPU utilization above which to consider upsizing
$MemoryThresholdLow = 30 # Memory utilization threshold for optimization
$AnalysisPeriodDays = 7 # Number of days to analyze for metrics
Storage Optimization (runbooks/storage-optimization.ps1):
$CoolTierThresholdDays = 30 # Days without access before moving to Cool
$ArchiveTierThresholdDays = 90 # Days without access before moving to Archive
$MinimumSavingsThreshold = 50 # Minimum monthly savings required for action
Configure automation schedules in Azure Automation:
# Daily VM analysis
az automation schedule create \
--automation-account-name $AUTOMATION_ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP_NAME \
--name "Daily-VM-Analysis" \
--frequency "Day" \
--interval 1 \
--start-time "02:00"
# Weekly storage optimization
az automation schedule create \
--automation-account-name $AUTOMATION_ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP_NAME \
--name "Weekly-Storage-Optimization" \
--frequency "Week" \
--interval 1 \
--start-time "03:00"
Execute optimization runbooks manually:
# VM optimization for a specific resource group
Start-AzAutomationRunbook `
  -AutomationAccountName "aa-cost-optimizer" `
  -Name "VM-Optimization" `
  -ResourceGroupName "rg-cost-optimizer" `
  -Parameters @{
    "ResourceGroupName" = "rg-production"
    "DryRun" = $true
  }
# Storage tier optimization
Start-AzAutomationRunbook `
  -AutomationAccountName "aa-cost-optimizer" `
  -Name "Storage-Optimization" `
  -ResourceGroupName "rg-cost-optimizer" `
  -Parameters @{
    "StorageAccountName" = "mystorageaccount"
    "Force" = $false
  }
Access monitoring dashboards:
- Navigate to Azure Monitor in the Azure portal
- Open the "Cost Optimization" workbook
- Review optimization recommendations and savings
- Configure alert rules for cost anomalies
Query optimization data programmatically:
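# Resolve the workspace (customer) ID from the workspace name in .env
LOG_ANALYTICS_WORKSPACE_ID=$(az monitor log-analytics workspace show \
--resource-group $RESOURCE_GROUP_NAME \
--workspace-name $LOG_ANALYTICS_WORKSPACE_NAME \
--query customerId -o tsv)
# Get optimization recommendations
az monitor log-analytics query \
--workspace $LOG_ANALYTICS_WORKSPACE_ID \
--analytics-query "
CostOptimization_CL
| where TimeGenerated >= ago(7d)
| summarize
TotalSavings = sum(PotentialSavings_d),
RecommendationCount = count()
by OptimizationType_s
"
The system tracks and reports on: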
- Cost Savings: Monthly and annual savings achieved
- Resource Utilization: CPU, memory, and storage metrics
- Optimization Actions: Number and type of optimizations performed
- Service Health: Impact on application performance and availability
Pre-built Azure Monitor workbooks provide:
- Cost optimization overview
- Resource utilization trends
- Savings tracking and projections
- Optimization action history
- Performance impact analysis
Automated alerts for:
- High-impact optimization opportunities
- Cost anomalies and unexpected spending
- Optimization failures or errors
- Service performance degradation
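The cost-anomaly and performance-degradation alerts above can be approximated with a few shell helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the default levels (50/80/100% of budget, 20% day-over-baseline variance, 90% average CPU) are illustrative values you would tune. Only `create_cpu_alert` calls Azure, using `az monitor metrics alert create`.

```bash
# Hypothetical helpers for the cost and performance alerts listed above.
# Default levels (50/80/100% of budget, 20% variance, 90% CPU) are assumptions.

# Print each budget level (percent) that current spend has reached.
# Usage: budget_levels_reached <spend> <monthly-budget>
budget_levels_reached() {
  awk -v spend="$1" -v budget="$2" 'BEGIN {
    if (budget + 0 <= 0) exit 1
    pct = spend / budget * 100
    n = split("50 80 100", levels, " ")
    for (i = 1; i <= n; i++) if (pct >= levels[i] + 0) print levels[i]
  }'
}

# Succeed (exit 0) when today's cost deviates from the baseline by more than
# the allowed variance in percent.
# Usage: is_cost_anomaly <today> <baseline> [variance-percent]
is_cost_anomaly() {
  awk -v cur="$1" -v base="$2" -v limit="${3:-20}" 'BEGIN {
    if (base + 0 <= 0) exit 1
    diff = (cur - base) / base * 100
    if (diff < 0) diff = -diff
    if (diff > limit + 0) exit 0
    exit 1
  }'
}

# Create a metric alert that fires when average CPU on a VM stays high,
# to catch performance degradation after a downsize.
# Usage: create_cpu_alert <resource-group> <vm-resource-id> [cpu-percent]
create_cpu_alert() {
  local rg="$1" vm_id="$2" threshold="${3:-90}"
  local vm_name="${vm_id##*/}"   # last segment of the resource ID
  az monitor metrics alert create \
    --name "high-cpu-${vm_name}" \
    --resource-group "$rg" \
    --scopes "$vm_id" \
    --condition "avg Percentage CPU > ${threshold}" \
    --window-size 15m \
    --evaluation-frequency 5m \
    --description "Average CPU above ${threshold}% after cost optimization" \
    --output none
}
```

For example, `budget_levels_reached 850 1000` prints `50` and `80`, and `is_cost_anomaly 130 100` succeeds because a 30% deviation exceeds the 20% default.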
Authentication Errors:
# Verify Azure CLI login
az account show
az account set --subscription "your-subscription-id"
Runbook Execution Failures:
# Check automation account permissions
Get-AzRoleAssignment -Scope "/subscriptions/$subscriptionId"
# Review runbook execution logs
Get-AzAutomationJob -AutomationAccountName $automationAccount -ResourceGroupName $resourceGroup
Missing Metrics Data:
# Verify Log Analytics workspace connection
az monitor log-analytics workspace show \
--resource-group $RESOURCE_GROUP_NAME \
--workspace-name $LOG_ANALYTICS_WORKSPACE_NAME
Enable detailed logging:
# In PowerShell runbooks
$VerbosePreference = "Continue"
$DebugPreference = "Continue"
Access logs through:
- Azure Automation job history
- Log Analytics query interface
- Application Insights telemetry
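To pull failed runs out of the job history from a shell, a sketch along these lines should work. It assumes the Azure CLI `automation` extension is installed and that job objects expose a `status` field with the value `Failed`; `list_failed_jobs` is a hypothetical name.

```bash
# Hypothetical helper: show runbook jobs whose status is Failed.
# Requires the Azure CLI automation extension: az extension add --name automation
# Usage: list_failed_jobs <automation-account> <resource-group>
list_failed_jobs() {
  local account="$1" rg="$2"
  az automation job list \
    --automation-account-name "$account" \
    --resource-group "$rg" \
    --query "[?status=='Failed']" \
    --output table
}
```

For example: `list_failed_jobs aa-cost-optimizer rg-cost-optimizer`.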
- Service Principal authentication for automated operations
- Managed Identity for Azure resource access
- Key Vault integration for credential management
- RBAC-based permissions with least privilege principle
- All sensitive data encrypted at rest and in transit
- Audit logging for all optimization actions
- Compliance with Azure security best practices
- Regular security assessment and updates
- Initial release with VM, storage, and database optimization
- Azure Monitor integration and custom dashboards
- Automated runbook deployment and scheduling
- Comprehensive monitoring and alerting capabilities
- Safety First: Comprehensive dry-run capabilities and approval workflows
graph TB
User[User] --> Copilot[Azure Copilot]
Copilot --> Monitor[Azure Monitor]
Copilot --> Automation[Azure Automation]
Monitor --> Analytics[Log Analytics]
Monitor --> Alerts[Alerts & Actions]
Automation --> VMOpt[VM Optimization]
Automation --> StorageOpt[Storage Optimization]
Automation --> DBOpt[Database Optimization]
VMOpt --> VMs[Virtual Machines]
StorageOpt --> Storage[Storage Accounts]
DBOpt --> Databases[SQL/Cosmos DB]
Analytics --> Insights[Cost Insights]
Alerts --> Actions[Automated Actions]
Budget[Cost Management] --> Alerts
Logic[Logic Apps] --> Automation
| Component | Purpose | Technology |
|---|---|---|
| Azure Copilot | Natural language interface for cost optimization queries and commands | Azure AI |
| Azure Monitor | Comprehensive monitoring, logging, and alerting infrastructure | Log Analytics, Application Insights |
| Azure Automation | Automated runbook execution for resource optimization | PowerShell, Python |
| Logic Apps | Workflow orchestration and integration between services | Azure Logic Apps |
| Cost Management | Budget monitoring, cost analysis, and anomaly detection | Azure Cost Management APIs |
| Key Vault | Secure storage of secrets, keys, and configuration | Azure Key Vault |
- Azure CLI installed and configured
- PowerShell 7.0+ (for runbook development)
- Terraform or Azure Bicep (for Infrastructure as Code)
- Appropriate Azure subscription permissions:
  - Contributor access to target resource groups
  - Cost Management Reader
  - Automation Contributor
  - Log Analytics Contributor
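A quick preflight check for the command-line prerequisites. This is a hypothetical helper, not one of the repository's scripts; `pwsh` is the PowerShell 7 executable name, and Bicep is normally invoked through `az bicep`, so it is not listed as a separate tool.

```bash
# Hypothetical preflight check: report any missing command-line tools.
# Usage: check_prereqs az pwsh terraform
check_prereqs() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs az pwsh terraform || exit 1` stops a setup script early; `pwsh --version` then confirms the 7.x requirement.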
git clone <repository-url>
cd Infrastructure-Cost-Optimizer
# Make scripts executable
chmod +x scripts/*.sh
chmod +x tests/*.sh
# Run the setup script to create base environment
./scripts/setup-environment.sh
# Deploy core infrastructure
./scripts/deploy-infrastructure.sh
# Configure monitoring and alerting
./scripts/configure-monitoring.sh
# Deploy automation runbooks
./scripts/deploy-runbooks.sh
# Validate the complete deployment
./scripts/validate-deployment.sh
# Run integration tests
./tests/integration-tests.sh
# Test performance characteristics
./tests/performance-tests.sh
Choose between Bicep and Terraform for declarative infrastructure deployment:
cd infrastructure/bicep
# Deploy using Bicep
az deployment group create \
--resource-group "your-resource-group" \
--template-file main.bicep \
--parameters @parameters.json
cd infrastructure/terraform
# Initialize Terraform
terraform init
# Plan the deployment
terraform plan -var-file="terraform.tfvars"
# Apply the configuration
terraform apply -var-file="terraform.tfvars"
The system uses environment variables for configuration. These are generated automatically during setup or can be configured manually:
# Core Configuration
RESOURCE_GROUP="cost-optimization-rg"
LOCATION="eastus"
SUBSCRIPTION_ID="your-subscription-id"
RANDOM_SUFFIX="unique-suffix"
# Azure Services
WORKSPACE_NAME="costopt-workspace-${RANDOM_SUFFIX}"
AUTOMATION_ACCOUNT="costopt-automation-${RANDOM_SUFFIX}"
STORAGE_ACCOUNT="costoptstorage${RANDOM_SUFFIX}"
KEY_VAULT="costopt-kv-${RANDOM_SUFFIX}"
# Authentication
PRINCIPAL_ID="automation-account-principal-id"
WORKSPACE_ID="log-analytics-workspace-id"
The deployment requires the following Azure RBAC roles:
Subscription Level:
- Cost Management Reader
- Monitoring Reader
- Reader
Resource Group Level:
- Contributor
- Automation Contributor
- Log Analytics Contributor
- Storage Account Contributor
Configure optimization thresholds in the runbooks or through environment variables:
# VM Optimization Thresholds
$CPUThreshold = 20 # Average CPU utilization %
$MemoryThreshold = 30 # Average memory utilization %
$AnalysisPeriodDays = 7 # Days of data to analyze
# Storage Optimization Thresholds
$InactiveThresholdDays = 30 # Days without access for cool tier
$ArchiveThresholdDays = 90 # Days without access for archive tier
# Database Optimization Thresholds
$DTUThreshold = 20 # Average DTU utilization %
$RUThreshold = 25 # Average RU utilization %
Customize monitoring and alerting through Azure Monitor:
Cost Alert Thresholds:
- Monthly Budget: $1000 (configurable)
- Anomaly Detection: 10% variance
- Resource-specific alerts: Per-service thresholds
Performance Metrics:
- VM CPU/Memory utilization
- Storage access patterns
- Database performance counters
- Cost trending analysis
Once deployed, you can interact with your cost optimization system using natural language through Azure Copilot:
Examples:
- "Show me VMs with low utilization in the last 7 days"
- "What storage accounts can be moved to cool tier?"
- "Optimize database costs for production resources"
- "Create a cost optimization report for this month"
- "Alert me when any resource exceeds $100 per day"
- Enable Azure Copilot in your Azure subscription
- Configure integration with your Log Analytics workspace
- Set up permissions for Copilot to access automation runbooks
- Train Copilot on your specific cost optimization patterns
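# Configure Copilot integration
# (assumes a "copilot" Azure CLI extension is available in your environment; Copilot in Azure itself is accessed from the Azure portal)
az extension add --name copilot
az copilot configure --workspace-id $WORKSPACE_ID
The system creates several monitoring dashboards: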
- Cost Overview Dashboard
  - Monthly cost trends
  - Resource-wise cost breakdown
  - Optimization savings tracking
- Resource Utilization Dashboard
  - VM performance metrics
  - Storage access patterns
  - Database utilization trends
- Automation Dashboard
  - Runbook execution status
  - Optimization actions taken
  - Error tracking and resolution
Automated alerts are configured for:
Budget Alerts:
- 50% of monthly budget consumed
- 80% of monthly budget consumed
- 100% of monthly budget exceeded
Anomaly Detection:
- Unusual cost spikes (>20% variance)
- Resource utilization anomalies
- Failed optimization attempts
Performance Alerts:
- High-cost, low-utilization resources
- Storage tier optimization opportunities
- Database scaling opportunities
Purpose: Analyzes VM utilization and recommends/implements right-sizing
Features:
- 7-day utilization analysis
- Intelligent size recommendations
- Cost impact calculations
- Safety checks and approvals
- Detailed reporting
Triggers:
- Scheduled (weekly)
- Manual execution
- Cost threshold alerts
- Azure Copilot requests
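The right-sizing decision can be sketched as follows, reusing the threshold values from `runbooks/vm-optimization.ps1` (CPU low 20%, CPU high 80%, memory low 30%) over a 7-day window. The combination rule (downsize only when both CPU and memory are low) and the helper names are assumptions; the runbook's actual logic may differ. `Percentage CPU` is the VM platform metric name; memory utilization is passed in as an argument.

```bash
# Hypothetical right-sizing rule using the thresholds from runbooks/vm-optimization.ps1.
# Usage: classify_vm <avg-cpu-pct> <avg-mem-pct> [cpu-low] [cpu-high] [mem-low]
classify_vm() {
  awk -v cpu="$1" -v mem="$2" -v lo="${3:-20}" -v hi="${4:-80}" -v memlo="${5:-30}" 'BEGIN {
    cpu += 0; mem += 0; lo += 0; hi += 0; memlo += 0
    if (cpu > hi)                     print "upsize"
    else if (cpu < lo && mem < memlo) print "downsize"
    else                              print "keep"
  }'
}

# Average a platform metric over the last N days using hourly averages.
# Usage: avg_metric <resource-id> <metric-name> [days]
avg_metric() {
  local resource_id="$1" metric="$2" days="${3:-7}"
  az monitor metrics list \
    --resource "$resource_id" \
    --metric "$metric" \
    --interval PT1H \
    --aggregation Average \
    --offset "${days}d" \
    --query "value[0].timeseries[0].data[].average" \
    --output tsv |
    awk 'NF { sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n }'
}
```

A typical call would be `classify_vm "$(avg_metric "$VM_ID" "Percentage CPU")" 25`, where `VM_ID` is a hypothetical variable holding the VM's resource ID and 25 is the measured memory percentage.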
Purpose: Optimizes storage costs through intelligent tier management
Features:
- Blob access pattern analysis
- Automated tier transitions
- Lifecycle policy management
- Redundancy optimization
- Cost savings tracking
Optimization Logic:
Hot → Cool: No access for 30+ days
Cool → Archive: No access for 90+ days
Archive → Hot: Recent access detected
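One way to express the Hot → Cool and Cool → Archive rules is an Azure Storage lifecycle management policy keyed on last access time. This is an alternative to per-blob tier changes in a runbook, not necessarily how `storage-optimization.ps1` works, and the function names are hypothetical. Last-access-time tracking must be enabled on the account first. Archive-tier blobs are offline and must be rehydrated (for example by setting the tier back to Hot) before they can be read, so a lifecycle policy does not perform the Archive → Hot transition; `enableAutoTierToHotFromCool` covers only Cool → Hot on access.

```bash
# Hypothetical generator for a lifecycle policy implementing the tier rules above.
# Usage: lifecycle_policy_json [cool-days] [archive-days] > policy.json
lifecycle_policy_json() {
  local cool_days="${1:-30}" archive_days="${2:-90}"
  cat <<EOF
{
  "rules": [
    {
      "enabled": true,
      "name": "tier-by-last-access",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["blockBlob"] },
        "actions": {
          "baseBlob": {
            "enableAutoTierToHotFromCool": true,
            "tierToCool": { "daysAfterLastAccessTimeGreaterThan": ${cool_days} },
            "tierToArchive": { "daysAfterLastAccessTimeGreaterThan": ${archive_days} }
          }
        }
      }
    }
  ]
}
EOF
}

# Enable last-access tracking, then apply the policy to one storage account.
# Usage: apply_lifecycle_policy <account> <resource-group> [cool-days] [archive-days]
apply_lifecycle_policy() {
  local account="$1" rg="$2" policy_file
  policy_file="$(mktemp)" || return 1
  lifecycle_policy_json "${3:-30}" "${4:-90}" > "$policy_file"
  az storage account blob-service-properties update \
    --account-name "$account" --resource-group "$rg" \
    --enable-last-access-tracking true --output none &&
  az storage account management-policy create \
    --account-name "$account" --resource-group "$rg" \
    --policy "@${policy_file}" --output none
  local rc=$?
  rm -f "$policy_file"
  return "$rc"
}
```

Running the policy in dry-run mode is not possible, so review the generated JSON with `lifecycle_policy_json` before calling `apply_lifecycle_policy`.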
Purpose: Optimizes database costs and performance
Features:
- SQL Database DTU analysis
- Cosmos DB RU optimization
- Performance tier recommendations
- Query performance analysis
- Index optimization suggestions
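A sketch of the DTU/RU rule, assuming the `$DTUThreshold = 20` and `$RUThreshold = 25` defaults from the configuration section and that the input samples are utilization percentages, one per line. `dtu_consumption_percent` is the Azure SQL Database metric for DTU percentage and `NormalizedRUConsumption` the Cosmos DB equivalent; check which aggregation your metric supports before querying it. The helper name is hypothetical.

```bash
# Hypothetical rule for the database thresholds ($DTUThreshold = 20, $RUThreshold = 25).
# Reads utilization samples (percent, one per line) on stdin, for example from:
#   az monitor metrics list --resource <db-id> --metric dtu_consumption_percent \
#     --aggregation Average --interval PT1H --offset 7d \
#     --query "value[0].timeseries[0].data[].average" -o tsv
# Usage: db_recommendation <sql|cosmos> < samples.txt
db_recommendation() {
  local kind="$1" threshold
  case "$kind" in
    sql)    threshold=20 ;;
    cosmos) threshold=25 ;;
    *) echo "unknown kind: $kind" >&2; return 2 ;;
  esac
  awk -v t="$threshold" 'NF { sum += $1; n++ }
    END {
      if (n == 0) { print "no-data"; exit }
      avg = sum / n
      if (avg < t + 0) printf "scale-down (avg %.1f%% < %d%%)\n", avg, t
      else             printf "keep (avg %.1f%%)\n", avg
    }'
}
```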
Add your own optimization logic:
# Template for custom runbooks
param(
[bool]$DryRun = $true,
[string]$ResourceGroup,
[string]$SubscriptionId
)
# Your optimization logic here
# Use the provided helper functions for:
# - Azure authentication
# - Metric collection
# - Cost calculations
# - Reporting
Run comprehensive validation of your deployment:
./scripts/validate-deployment.sh
Validation includes:
- Azure login and permissions
- Resource group and core infrastructure
- Automation runbooks and modules
- Monitoring and alerting configuration
- Role assignments and security
- Log Analytics data collection
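A minimal sketch of the first two checks (Azure login and resource group existence), using only standard Azure CLI commands; `validate_basics` is a hypothetical name and not part of `validate-deployment.sh`.

```bash
# Hypothetical subset of the deployment checks: active login and resource group.
# Usage: validate_basics <resource-group>
validate_basics() {
  local rg="$1"
  if ! az account show --output none 2>/dev/null; then
    echo "FAIL: not logged in (run az login)" >&2
    return 1
  fi
  # az group exists prints "true" or "false"
  if [[ "$(az group exists --name "$rg")" != "true" ]]; then
    echo "FAIL: resource group $rg not found" >&2
    return 1
  fi
  echo "OK: login and resource group $rg"
}
```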
Test end-to-end workflows:
./tests/integration-tests.sh
Test Scenarios:
- Complete optimization workflows
- Cost monitoring and alerting
- Log Analytics query performance
- Automation schedules and webhooks
- Resource creation and optimization
Benchmark system performance:
./tests/performance-tests.sh
Performance Metrics:
- Azure CLI operation response times
- Log Analytics query performance
- Runbook startup and execution times
- Concurrent operation handling
- Large dataset query performance
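Response-time measurements like these can be taken with a small timing wrapper. This is a hypothetical helper, not the code in `performance-tests.sh`; it relies on GNU `date` supporting `%N` (nanoseconds), which the macOS `date` does not.

```bash
# Hypothetical timing helper: run a command N times and print the mean
# wall-clock duration in whole milliseconds. Requires GNU date (%N).
# Usage: mean_ms <runs> <command> [args...]
mean_ms() {
  local runs="$1"; shift
  local i start end total_ns=0
  (( runs > 0 )) || return 1
  for (( i = 0; i < runs; i++ )); do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1 || return 1   # a failing command aborts the measurement
    end=$(date +%s%N)
    total_ns=$(( total_ns + end - start ))
  done
  echo $(( total_ns / runs / 1000000 ))
}
```

For example, `mean_ms 5 az group list --output none` reports the average latency of a simple read call.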
- Managed Identity: Secure, password-less authentication
- Key Vault Integration: Secure secret storage and rotation
- RBAC Controls: Principle of least privilege access
- Audit Logging: Comprehensive activity tracking
- Compliance Monitoring: Built-in compliance checks
- Regular Security Reviews
  # Review automation account permissions
  az role assignment list --assignee $PRINCIPAL_ID
  # List Key Vaults as a starting point for an access audit
  az keyvault list --query "[].{name:name, location:location}"
- Network Security
  - Configure private endpoints for Azure services
  - Implement network security groups
  - Use Azure Firewall for advanced protection
- Data Protection
  - Enable encryption at rest and in transit
  - Implement data retention policies
  - Test backup and disaster recovery regularly
- VM Right-Sizing
  - Continuous utilization monitoring
  - Automatic size recommendations
  - Scheduled optimization windows
  - Cost-benefit analysis
- Storage Optimization
  - Intelligent tier management
  - Lifecycle policy automation
  - Redundancy optimization
  - Delete unused resources
- Database Optimization
  - Performance tier adjustments
  - RU scaling for Cosmos DB
  - DTU optimization for SQL
  - Query performance tuning
The system identifies opportunities for manual review:
- Zombie Resources: Unused resources consuming costs
- Oversized Resources: Resources with consistently low utilization
- Optimization Candidates: Resources suitable for reserved instances
- Storage Inefficiencies: Suboptimal storage configurations
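Unattached managed disks are a common kind of zombie resource. A minimal sketch for surfacing them for manual review, using the `diskState` property that `az disk list` returns; the function name is hypothetical and nothing is deleted.

```bash
# Hypothetical "zombie resource" finder: managed disks not attached to any VM.
# Usage: list_unattached_disks [resource-group]
list_unattached_disks() {
  local query="[?diskState=='Unattached'].{name:name, resourceGroup:resourceGroup, sizeGb:diskSizeGb, sku:sku.name}"
  local args=(disk list --query "$query" --output table)
  if [[ -n "${1:-}" ]]; then
    args+=(--resource-group "$1")   # restrict to one resource group
  fi
  az "${args[@]}"
}
```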
- Runbook Execution Failures
  # Check runbook status
  az automation job list --automation-account-name $AUTOMATION_ACCOUNT --resource-group $RESOURCE_GROUP
  # View job output
  az automation job output show --automation-account-name $AUTOMATION_ACCOUNT --resource-group $RESOURCE_GROUP --id <job-id>
- Module Import Issues
  # Check module status
  az automation module list --automation-account-name $AUTOMATION_ACCOUNT --resource-group $RESOURCE_GROUP
  # Reimport failed modules
  az automation module create --automation-account-name $AUTOMATION_ACCOUNT --resource-group $RESOURCE_GROUP --name "Az.Compute"
- Log Analytics Data Issues
  # Test workspace connectivity
  az monitor log-analytics query --workspace $WORKSPACE_ID --analytics-query "Heartbeat | take 1"
  # Check data collection rules
  az monitor data-collection rule list --resource-group $RESOURCE_GROUP
Enable debug mode for detailed logging:
# Set debug environment variable
export DEBUG_MODE=true
# Run scripts with verbose output
./scripts/deploy-infrastructure.sh --verbose
- Azure Automation Documentation
- Azure Monitor Documentation
- Azure Cost Management Documentation
- Azure Copilot Documentation
Add custom cost optimization metrics:
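// Custom KQL query: average CPU per VM, joined to the VM's Azure resource ID.
// CostPerUtilization (100 / AvgCPU) is a relative idleness score: higher means less CPU used.
// Assumes Perf Computer values match VM names (not FQDNs); names are lowercased on both sides.
Perf
| where TimeGenerated > ago(7d)
| where CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer = tolower(Computer)
| join kind=inner (
    AzureActivity
    | where OperationNameValue contains "Microsoft.Compute/virtualMachines"
    | extend Computer = tolower(tostring(split(ResourceId, "/")[-1]))
    | distinct ResourceId, Computer
) on Computer
| where AvgCPU > 0 // avoid division by zero
| extend CostPerUtilization = 100 / AvgCPU
| order by CostPerUtilization desc
Set up webhooks for external system integration: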
# Create webhook for external notifications
az automation webhook create \
--automation-account-name $AUTOMATION_ACCOUNT \
--resource-group $RESOURCE_GROUP \
--name "ExternalCostAlert" \
--runbook-name "Optimize-VMSize" \
--parameters "DryRun=false"
Extend to multiple subscriptions:
# Multi-subscription runbook template
$subscriptions = @("sub1-id", "sub2-id", "sub3-id")
foreach ($subscription in $subscriptions) {
Set-AzContext -SubscriptionId $subscription
# Run optimization logic for each subscription
}
The system generates several types of reports:
- Executive Summary Report
  - Monthly cost savings achieved
  - Optimization recommendations
  - ROI analysis
- Technical Report
  - Detailed resource analysis
  - Performance impact assessment
  - Security and compliance status
- Trend Analysis Report
  - Cost trending over time
  - Utilization patterns
  - Forecast predictions
Create custom Azure Workbooks for specific needs:
{
"version": "Notebook/1.0",
"items": [
{
"type": 3,
"content": {
"version": "KqlItem/1.0",
"query": "AzureActivity | where TimeGenerated > ago(30d) | summarize count() by bin(TimeGenerated, 1d)",
"visualization": "timechart"
}
}
]
}
- Monthly Reviews
  - Review optimization thresholds
  - Update runbook parameters
  - Analyze cost savings reports
- Quarterly Updates
  - Update PowerShell modules
  - Review and update automation schedules
  - Security assessment and updates
- Annual Planning
  - Budget planning and threshold updates
  - Architecture review and optimization
  - Capacity planning and scaling
# Update PowerShell modules in Automation Account
./scripts/update-modules.sh
# Update runbook logic
./scripts/deploy-runbooks.sh --update
# Refresh monitoring configuration
./scripts/configure-monitoring.sh --refresh