diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 000000000..e17947c1c --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,123 @@ +--- +name: ๐Ÿ› Bug Report +about: Report a bug or issue with Rover +title: '[BUG] ' +labels: ['bug', 'triage'] +assignees: [] + +--- + +## ๐Ÿ› Bug Description + +A clear and concise description of what the bug is. + +## ๐Ÿ”„ Steps to Reproduce + +1. Go to '...' +2. Run command '...' +3. Set configuration '...' +4. See error + +## โœ… Expected Behavior + +A clear and concise description of what you expected to happen. + +## โŒ Actual Behavior + +A clear and concise description of what actually happened. + +## ๐Ÿ–ผ๏ธ Screenshots/Logs + +If applicable, add screenshots or logs to help explain your problem. + +``` +Paste error logs here +``` + +## ๐ŸŒ Environment Information + +Please complete the following information: + +**Host System:** +- OS: [e.g., Ubuntu 20.04, Windows 11, macOS 12] +- Docker Version: [e.g., 20.10.17] +- Architecture: [e.g., x64, arm64] + +**Rover:** +- Rover Version: [e.g., 1.3.0] +- Container Image: [e.g., aztfmod/rover:latest] +- Installation Method: [e.g., Docker Hub, Local Build] + +**Azure:** +- Azure CLI Version: [e.g., 2.40.0] +- Subscription Type: [e.g., Free Trial, Pay-As-You-Go, Enterprise] +- Authentication Method: [e.g., Interactive, Service Principal, Managed Identity] + +**Terraform:** +- Terraform Version: [e.g., 1.3.0] +- Provider Versions: [e.g., azurerm 3.20.0] + +## ๐Ÿ”ง Configuration + +**Landing Zone:** +- Level: [e.g., level0, level1, level2] +- Environment: [e.g., dev, staging, production] +- Custom Configuration: [Yes/No] + +**Command Used:** +```bash +# Paste the exact rover command that caused the issue +rover -lz ./landingzone -a apply -env production +``` + +**Terraform Configuration:** +```hcl +# Include relevant terraform configuration if applicable +# Remove any sensitive information +``` + +## ๐Ÿ•ต๏ธ Diagnostic Information + +Please run the diagnostic script and paste the output: + +```bash +# Run this diagnostic script and paste output +curl -s https://raw.githubusercontent.com/aztfmod/rover/main/scripts/diagnostics.sh | bash +``` + +
+Diagnostic Output + +``` +Paste diagnostic output here +``` + +
+ +## ๐Ÿ” Additional Context + +Add any other context about the problem here. + +- Have you encountered this issue before? [Yes/No] +- Does this issue occur consistently? [Yes/No] +- Any recent changes to your environment? [Yes/No - describe] +- Are you using any custom configurations? [Yes/No - describe] + +## ๐Ÿ“‹ Checklist + +Before submitting this issue, please confirm: + +- [ ] I have searched existing issues to avoid duplicates +- [ ] I have included all required environment information +- [ ] I have included the exact error message and/or logs +- [ ] I have included steps to reproduce the issue +- [ ] I have removed any sensitive information (credentials, personal data) +- [ ] I have run the diagnostic script and included the output + +## ๐Ÿ’ก Possible Solution + +If you have any ideas on how to solve this issue, please describe them here. + +--- + +**Note**: This issue will be triaged by the maintainers. Please be patient and provide additional information if requested. \ No newline at end of file diff --git a/.github/ISSUE_TEMPLATE/documentation.md b/.github/ISSUE_TEMPLATE/documentation.md new file mode 100644 index 000000000..5f4bc417a --- /dev/null +++ b/.github/ISSUE_TEMPLATE/documentation.md @@ -0,0 +1,139 @@ +--- +name: ๐Ÿ“š Documentation Issue +about: Report issues with documentation or suggest improvements +title: '[DOCS] ' +labels: ['documentation', 'triage'] +assignees: [] + +--- + +## ๐Ÿ“š Documentation Issue + +**Type of documentation issue:** +- [ ] Missing documentation +- [ ] Outdated documentation +- [ ] Incorrect information +- [ ] Unclear/confusing content +- [ ] Documentation improvement suggestion +- [ ] New documentation request + +## ๐Ÿ“ Location + +**Where is the documentation issue located?** +- File/Page: [e.g., README.md, docs/USAGE.md, website page] +- Section: [e.g., Installation, Authentication, Examples] +- Line number: [if applicable] +- URL: [if web documentation] + +## ๐Ÿ› Current Issue + +**What is wrong with the current documentation?** +A clear description of the issue with the existing documentation. + +```markdown + +Current text that needs to be fixed +``` + +## โœ… Suggested Fix + +**What should it say instead?** +Provide your suggested improvement or correction. + +```markdown + +Suggested replacement text +``` + +## ๐Ÿ“‹ Missing Information + +**What information is missing?** (if applicable) +- Topic/concept that needs documentation +- Use case that needs examples +- Configuration that needs explanation +- Process that needs step-by-step guide + +## ๐ŸŽฏ Target Audience + +**Who would benefit from this documentation?** +- [ ] New users getting started +- [ ] Experienced users seeking reference +- [ ] Developers contributing to the project +- [ ] DevOps engineers implementing CI/CD +- [ ] Security teams reviewing compliance + +## ๐Ÿ“– Content Suggestions + +**Proposed content structure:** (for new documentation) + +1. Section 1: [Title] + - Subsection 1.1 + - Subsection 1.2 + +2. Section 2: [Title] + - Subsection 2.1 + - Subsection 2.2 + +**Key topics to cover:** +- Topic 1 +- Topic 2 +- Topic 3 + +## ๐Ÿ’ก Examples Needed + +**What examples would be helpful?** +- [ ] Code examples +- [ ] Command-line examples +- [ ] Configuration examples +- [ ] Workflow examples +- [ ] Troubleshooting scenarios + +```bash +# Example command that should be documented +rover example-command --with-options +``` + +## ๐Ÿ”— References + +**Related resources:** +- Links to related documentation +- External resources that could help +- Similar examples from other projects + +## ๐Ÿ“ฑ Format Preferences + +**Preferred documentation format:** +- [ ] Markdown file +- [ ] Code comments +- [ ] README section +- [ ] Separate guide +- [ ] Video tutorial +- [ ] Interactive examples + +## ๐ŸŒ Context + +**Additional context about this documentation need:** +- How did you discover this issue? +- What were you trying to accomplish? +- What documentation would have helped you? + +## ๐Ÿ“‹ Checklist + +Before submitting this documentation issue: + +- [ ] I have searched existing issues for similar documentation requests +- [ ] I have checked if this documentation already exists elsewhere +- [ ] I have provided specific suggestions for improvement +- [ ] I have considered the target audience for this documentation + +## ๐Ÿค Contribution + +**Are you willing to help with this documentation?** +- [ ] Yes, I can write the documentation +- [ ] Yes, I can review the documentation +- [ ] Yes, I can provide examples +- [ ] No, but I can provide feedback on drafts + +--- + +**Note**: Documentation issues help improve the project for everyone. Thank you for taking the time to suggest improvements! \ No newline at end of file diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 000000000..7fb2ebe91 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,133 @@ +--- +name: ๐Ÿš€ Feature Request +about: Suggest a new feature or enhancement for Rover +title: '[FEATURE] ' +labels: ['enhancement', 'triage'] +assignees: [] + +--- + +## ๐Ÿš€ Feature Request + +A clear and concise description of the feature you would like to see added. + +## ๐ŸŽฏ Problem Statement + +**Is your feature request related to a problem? Please describe.** +A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] + +## ๐Ÿ’ก Proposed Solution + +**Describe the solution you'd like** +A clear and concise description of what you want to happen. + +## ๐Ÿ”„ User Story + +**As a** [type of user] +**I want** [some goal] +**So that** [some reason/benefit] + +## ๐Ÿ“‹ Acceptance Criteria + +- [ ] Criterion 1 +- [ ] Criterion 2 +- [ ] Criterion 3 + +## ๐ŸŽจ Mockups/Examples + +If applicable, add mockups, screenshots, or code examples to illustrate your feature request. + +```bash +# Example command or usage +rover new-feature --option value +``` + +```yaml +# Example configuration +new_feature: + enabled: true + options: + - option1 + - option2 +``` + +## ๐Ÿ”ง Technical Considerations + +**Implementation Ideas:** +- Possible approach 1 +- Possible approach 2 +- Integration points + +**Impact Areas:** +- [ ] CLI interface +- [ ] Docker container +- [ ] Terraform workflows +- [ ] CI/CD integration +- [ ] Documentation +- [ ] Testing + +## ๐ŸŒŸ Alternatives Considered + +**Describe alternatives you've considered** +A clear and concise description of any alternative solutions or features you've considered. + +## ๐Ÿ“Š Business Value + +**Impact Level:** [Low/Medium/High] + +**Benefits:** +- Benefit 1 +- Benefit 2 +- Benefit 3 + +**Use Cases:** +1. Use case 1: Description +2. Use case 2: Description +3. Use case 3: Description + +## ๐Ÿ”— Related Issues + +- Link to related issues/discussions +- Reference existing feature requests +- Dependencies on other features + +## ๐ŸŒ Environment Context + +**Affected Components:** +- [ ] Rover Core +- [ ] Landing Zones +- [ ] CI/CD Agents +- [ ] Documentation +- [ ] Testing + +**Target Users:** +- [ ] Infrastructure Engineers +- [ ] DevOps Engineers +- [ ] Platform Teams +- [ ] Developers +- [ ] Security Teams + +## ๐Ÿ“ Additional Context + +Add any other context, research, or screenshots about the feature request here. + +**Research:** +- Links to relevant documentation +- Similar features in other tools +- Industry standards or best practices + +**Priority Justification:** +Explain why this feature should be prioritized. + +## ๐Ÿ“‹ Definition of Done + +- [ ] Feature implemented and tested +- [ ] Documentation updated +- [ ] Examples provided +- [ ] CI/CD tests pass +- [ ] Breaking changes documented +- [ ] Migration guide provided (if applicable) + +--- + +**Note**: Feature requests will be reviewed by the maintainers and prioritized based on community needs and project roadmap. \ No newline at end of file diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 000000000..669173a93 --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,187 @@ +# Pull Request + +## ๐Ÿ“ Description + +Provide a brief description of what this PR does. + +## ๐Ÿ”— Related Issues + +- Fixes #(issue number) +- Closes #(issue number) +- Related to #(issue number) + +## ๐ŸŽฏ Type of Change + +- [ ] ๐Ÿ› Bug fix (non-breaking change which fixes an issue) +- [ ] ๐Ÿš€ New feature (non-breaking change which adds functionality) +- [ ] ๐Ÿ’ฅ Breaking change (fix or feature that would cause existing functionality to not work as expected) +- [ ] ๐Ÿ“š Documentation update (improvements or additions to documentation) +- [ ] ๐Ÿ”ง Maintenance (dependency updates, code cleanup, etc.) +- [ ] ๐ŸŽจ Style (formatting, missing semi colons, etc; no production code change) +- [ ] โ™ป๏ธ Refactoring (no functional changes, no api changes) +- [ ] โšก Performance improvements +- [ ] โœ… Test additions or modifications + +## ๐Ÿงช Testing + +**How has this been tested?** + +- [ ] Unit tests added/updated +- [ ] Integration tests added/updated +- [ ] Manual testing completed +- [ ] Existing tests pass + +**Test Configuration:** +- Rover version: +- OS: [e.g., Ubuntu 20.04, Windows 11, macOS 12] +- Docker version: +- Azure subscription type: + +**Test scenarios covered:** +1. Scenario 1: Description +2. Scenario 2: Description +3. Scenario 3: Description + +## ๐Ÿ“‹ Checklist + +**Before submitting this PR, please make sure:** + +### Code Quality +- [ ] My code follows the project's style guidelines +- [ ] I have performed a self-review of my own code +- [ ] I have commented my code, particularly in hard-to-understand areas +- [ ] My changes generate no new warnings or errors +- [ ] I have removed any debugging code or console.log statements + +### Testing +- [ ] I have added tests that prove my fix is effective or that my feature works +- [ ] New and existing unit tests pass locally with my changes +- [ ] Any dependent changes have been merged and published + +### Documentation +- [ ] I have made corresponding changes to the documentation +- [ ] I have updated the README.md if needed +- [ ] I have added/updated inline code comments +- [ ] I have updated any relevant guides or tutorials + +### Security & Compliance +- [ ] My changes don't introduce security vulnerabilities +- [ ] I have not committed sensitive information (credentials, secrets, etc.) +- [ ] I have followed the security guidelines in SECURITY.md +- [ ] My changes comply with relevant compliance requirements + +### Breaking Changes +- [ ] This change requires a documentation update +- [ ] This change requires a migration guide +- [ ] I have updated the version number appropriately +- [ ] I have updated CHANGELOG.md + +## ๐Ÿ“ธ Screenshots (if applicable) + +**Before:** + + +**After:** + + +## ๐Ÿš€ Deployment Notes + +**Special deployment considerations:** +- Database migrations required: [ ] Yes [ ] No +- Configuration changes required: [ ] Yes [ ] No +- Dependencies updated: [ ] Yes [ ] No +- Breaking changes: [ ] Yes [ ] No + +**Post-deployment verification steps:** +1. Step 1 +2. Step 2 +3. Step 3 + +## ๐Ÿ”„ Migration Guide (if breaking changes) + +**For users upgrading from previous version:** + +1. Step-by-step migration instructions +2. Configuration changes required +3. Deprecated features and alternatives + +```bash +# Example migration commands +rover migrate --from v1.0 --to v2.0 +``` + +## ๐Ÿ“Š Performance Impact + +**Performance considerations:** +- [ ] This change improves performance +- [ ] This change has no performance impact +- [ ] This change may impact performance (explain below) + +**Performance testing results:** +- Metric 1: Before/After +- Metric 2: Before/After +- Metric 3: Before/After + +## ๐Ÿ” Code Review Focus Areas + +**Please pay special attention to:** +- [ ] Security implications +- [ ] Performance impact +- [ ] Error handling +- [ ] Edge cases +- [ ] API compatibility +- [ ] User experience + +**Specific questions for reviewers:** +1. Question 1 about implementation choice +2. Question 2 about architecture decision +3. Question 3 about test coverage + +## ๐ŸŒ Additional Context + +Add any other context about the pull request here. + +**Dependencies:** +- This PR depends on: #(issue/PR number) +- This PR blocks: #(issue/PR number) + +**Future work:** +- Follow-up tasks that should be created +- Related improvements planned + +**Rollback plan:** +- How to revert this change if needed +- Any data recovery considerations + +--- + +## ๐Ÿ‘ฅ Reviewer Guidelines + +**For maintainers reviewing this PR:** + +1. **Code Review Checklist:** + - [ ] Code follows project conventions + - [ ] Logic is clear and well-documented + - [ ] Error handling is appropriate + - [ ] Tests are comprehensive + - [ ] Performance impact is acceptable + +2. **Security Review:** + - [ ] No sensitive data exposed + - [ ] Input validation is proper + - [ ] Authentication/authorization correct + - [ ] No security vulnerabilities introduced + +3. **Documentation Review:** + - [ ] Documentation is accurate and complete + - [ ] Examples are working and relevant + - [ ] Breaking changes are clearly documented + +**Testing Notes for Reviewers:** +- Key scenarios to test manually +- Edge cases to verify +- Integration points to check + +--- + +Thank you for contributing to Rover! ๐Ÿš€ \ No newline at end of file diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 000000000..a8303fa32 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,310 @@ +# Contributing to Rover + +We welcome contributions to the Azure Terraform SRE Rover project! This guide will help you get started with contributing to the project. + +## ๐Ÿ“‹ Table of Contents + +- [Code of Conduct](#code-of-conduct) +- [Development Environment Setup](#development-environment-setup) +- [Contributing Workflow](#contributing-workflow) +- [Code Standards](#code-standards) +- [Testing Guidelines](#testing-guidelines) +- [Documentation Standards](#documentation-standards) +- [Pull Request Process](#pull-request-process) + +## Code of Conduct + +This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). +For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or +contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. + +## Development Environment Setup + +### Prerequisites + +- Docker Desktop +- Git +- Visual Studio Code (recommended) +- Azure CLI (for testing) + +### Local Development Setup + +1. **Clone the repository** + ```bash + git clone https://github.com/aztfmod/rover.git + cd rover + ``` + +2. **Build local development image** + ```bash + make dev + ``` + +3. **Set up Dev Container (recommended)** + - Open the project in VS Code + - Install the "Remote - Containers" extension + - Command Palette โ†’ "Remote-Containers: Reopen in Container" + +4. **Alternative: Run locally** + ```bash + # Build and run rover locally + make local + docker run -it --rm rover-local:latest + ``` + +### Project Structure + +``` +rover/ +โ”œโ”€โ”€ scripts/ # Main shell scripts and rover logic +โ”‚ โ”œโ”€โ”€ rover.sh # Main entry point +โ”‚ โ”œโ”€โ”€ functions.sh # Core utility functions +โ”‚ โ”œโ”€โ”€ lib/ # Library functions +โ”‚ โ””โ”€โ”€ ci_tasks/ # CI/CD task definitions +โ”œโ”€โ”€ docs/ # Documentation +โ”œโ”€โ”€ agents/ # CI/CD agent configurations +โ”œโ”€โ”€ .devcontainer/ # VS Code dev container config +โ”œโ”€โ”€ Dockerfile # Main container definition +โ””โ”€โ”€ Makefile # Build automation +``` + +## Contributing Workflow + +### 1. Issue Creation + +Before making changes: +- Check existing issues for duplicates +- Create a new issue describing the problem or feature +- Discuss the approach with maintainers if it's a significant change + +### 2. Branch and Development + +```bash +# Create feature branch +git checkout -b feature/your-feature-name + +# Make your changes +# Test your changes + +# Commit with clear messages +git commit -m "feat: add new landing zone validation" +``` + +### 3. Commit Message Format + +We follow conventional commit format: + +``` +type(scope): description + +[optional body] + +[optional footer] +``` + +**Types:** +- `feat`: New features +- `fix`: Bug fixes +- `docs`: Documentation changes +- `style`: Code style changes +- `refactor`: Code refactoring +- `test`: Test additions/changes +- `chore`: Maintenance tasks + +## Code Standards + +### Shell Script Guidelines + +1. **Header Documentation** + ```bash + #!/bin/bash + # + # Description: Brief description of script purpose + # Usage: script_name.sh [options] + # Author: Your Name + # Last Modified: Date + # + ``` + +2. **Function Documentation** + ```bash + # + # Function: function_name + # Description: What the function does + # Parameters: + # $1 - Parameter description + # $2 - Parameter description + # Returns: Description of return value + # Example: function_name "param1" "param2" + # + function_name() { + # Function implementation + } + ``` + +3. **Error Handling** + - Always check return codes for critical operations + - Use proper error messages with context + - Exit with appropriate error codes + +4. **Variable Naming** + - Use lowercase with underscores: `my_variable` + - Use uppercase for environment variables: `MY_ENV_VAR` + - Prefix local variables in functions: `local my_local_var` + +5. **Code Style** + - Use 4 spaces for indentation + - Keep lines under 120 characters + - Use meaningful variable and function names + - Add comments for complex logic + +### Dockerfile Guidelines + +- Use official base images when possible +- Minimize layers and image size +- Document installation steps with comments +- Use multi-stage builds where appropriate +- Set appropriate user permissions + +## Testing Guidelines + +### Running Tests + +```bash +# Run all tests +./scripts/test_runner.sh + +# Run specific test suite +shellspec spec/unit/ + +# Run with coverage +shellspec --kcov +``` + +### Writing Tests + +1. **Test File Structure** + ```bash + # spec/unit/my_feature_spec.sh + Describe "My Feature" + It "should perform expected behavior" + When call my_function "input" + The output should equal "expected_output" + The status should be success + End + End + ``` + +2. **Test Categories** + - Unit tests for individual functions + - Integration tests for script workflows + - End-to-end tests for complete scenarios + +3. **Test Best Practices** + - Test both success and failure scenarios + - Use descriptive test names + - Keep tests isolated and independent + - Mock external dependencies + +## Documentation Standards + +### Markdown Guidelines + +- Use clear, concise language +- Include code examples for complex concepts +- Add table of contents for longer documents +- Use consistent heading structure +- Include links to related documentation + +### Code Documentation + +- Document all public functions and scripts +- Include usage examples +- Explain complex algorithms or logic +- Update documentation when changing code + +## Pull Request Process + +### Before Submitting + +1. **Code Review Checklist** + - [ ] Code follows project standards + - [ ] Tests pass locally + - [ ] Documentation updated + - [ ] No sensitive information committed + - [ ] Commit messages follow convention + +2. **Testing Checklist** + - [ ] Unit tests added/updated + - [ ] Integration tests pass + - [ ] Manual testing completed + - [ ] No breaking changes (or properly documented) + +### PR Submission + +1. **Create Pull Request** + - Use descriptive title + - Reference related issues + - Include detailed description of changes + - Add screenshots for UI changes + +2. **PR Template** + ```markdown + ## Description + Brief description of changes + + ## Type of Change + - [ ] Bug fix + - [ ] New feature + - [ ] Documentation update + - [ ] Code refactoring + + ## Testing + - [ ] Tests added/updated + - [ ] Manual testing completed + + ## Related Issues + Fixes #123 + ``` + +### Review Process + +1. **Automated Checks** + - CI/CD pipeline must pass + - Code quality checks must pass + - Security scans must pass + +2. **Human Review** + - At least one maintainer approval required + - Address all review feedback + - Maintain clean commit history + +3. **Merge** + - Squash commits if multiple small commits + - Use merge commit for feature branches + - Delete feature branch after merge + +## Getting Help + +### Communication Channels + +- **Issues**: GitHub Issues for bugs and feature requests +- **Discussions**: GitHub Discussions for questions and ideas +- **Gitter**: [aztfmod/community](https://gitter.im/aztfmod/community) for real-time chat +- **Email**: tf-landingzones@microsoft.com for direct contact + +### Resources + +- [Azure Terraform Documentation](https://aka.ms/caf/terraform) +- [Terraform Best Practices](https://www.terraform.io/docs/cloud/guides/recommended-practices/index.html) +- [Azure Cloud Adoption Framework](https://docs.microsoft.com/azure/cloud-adoption-framework/) + +## License Agreement + +Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. + +When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately. Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA. + +--- + +Thank you for contributing to Rover! ๐Ÿš€ \ No newline at end of file diff --git a/Dockerfile b/Dockerfile index c5bbbfcf3..707f92c07 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,32 +1,63 @@ +# +# Azure Terraform SRE - Rover Container Image +# +# This Dockerfile builds the rover container image with all necessary tools +# for managing Azure infrastructure using Terraform. The image includes: +# - Azure CLI and authentication tools +# - Terraform and related tools (tflint, terrascan, etc.) +# - Kubernetes tools (kubectl, helm) +# - Development tools and utilities +# - Security scanning and compliance tools +# +# Build process uses multi-stage approach for optimization: +# - base: Core Ubuntu system with essential packages +# - tools: Development and infrastructure tools +# - security: Security scanning and compliance tools +# - final: Complete rover image with user setup +# +# Usage: +# docker build -t rover . +# docker run -it rover bash +# +# Build arguments can be used to customize tool versions: +# docker build --build-arg versionKubectl=1.25.0 -t rover . +# + ########################################################### -# base tools and dependencies +# Stage 1: Base system with essential packages ########################################################### FROM ubuntu:22.04 AS base +# Use bash as default shell for better scripting support SHELL ["/bin/bash", "-c"] +# +# Build Arguments +# These arguments define versions of tools to install and can be overridden +# during build time to customize the image for specific requirements +# -# Arguments set during docker-compose build -b --build from .env file - -ARG versionVault \ - versionKubectl \ - versionKubelogin \ - versionDockerCompose \ - versionPowershell \ - versionPacker \ - versionGolang \ - versionTerraformDocs \ - versionAnsible \ - versionTerrascan \ - versionTfupdate \ - extensionsAzureCli \ - SSH_PASSWD \ - TARGETARCH \ - TARGETOS +# Tool version arguments - can be overridden during build +ARG versionVault \ # HashiCorp Vault CLI version + versionKubectl \ # Kubernetes kubectl version + versionKubelogin \ # Azure Kubernetes login tool version + versionDockerCompose \ # Docker Compose version + versionPowershell \ # PowerShell Core version + versionPacker \ # HashiCorp Packer version + versionGolang \ # Go programming language version + versionTerraformDocs \ # Terraform documentation generator version + versionAnsible \ # Ansible automation tool version + versionTerrascan \ # Terrascan security scanner version + versionTfupdate \ # Terraform module updater version + extensionsAzureCli \ # Azure CLI extensions to install + SSH_PASSWD \ # SSH password for development access + TARGETARCH \ # Target architecture (amd64, arm64) + TARGETOS # Target OS (linux, windows, darwin) -ARG USERNAME=vscode -ARG USER_UID=1000 -ARG USER_GID=${USER_UID} +# User configuration arguments +ARG USERNAME=vscode # Default username for the container user +ARG USER_UID=1000 # User ID for the container user +ARG USER_GID=${USER_UID} # Group ID for the container user (defaults to USER_UID) ENV SSH_PASSWD=${SSH_PASSWD} \ USERNAME=${USERNAME} \ diff --git a/Makefile b/Makefile index 4b83fefe2..c8c9f3a17 100644 --- a/Makefile +++ b/Makefile @@ -1,23 +1,145 @@ -default: github - -github: - @bash "$(CURDIR)/scripts/build_image.sh" "github" - -# -# To build local images in a different platform architecture (from a macos m1 processor). (used to generate the azdo agent on macos) -# make local arch=Linux/amd64 -# -# To build local images -# make local -local: - echo ${arch} - @bash "$(CURDIR)/scripts/build_image.sh" "local" ${arch} ${agent} - -dev: - @bash "$(CURDIR)/scripts/build_image.sh" "dev" ${arch} ${agent} - -ci: - @bash "$(CURDIR)/scripts/build_image.sh" "ci" - -alpha: - @bash "$(CURDIR)/scripts/build_image.sh" "alpha" \ No newline at end of file +# +# Rover Build Automation Makefile +# +# This Makefile provides targets for building rover container images for different +# environments and platforms. All targets use the build_image.sh script with +# different parameters to generate appropriate container images. +# +# Usage: +# make [target] # Build for specific target +# make help # Show available targets and usage +# +# Available targets: +# github (default) # Build image for GitHub Actions +# local # Build local development image +# dev # Build development image +# ci # Build CI/CD optimized image +# alpha # Build alpha/preview image +# +# Variables: +# arch # Target architecture (e.g., Linux/amd64, Linux/arm64) +# agent # CI/CD agent type (github, azdo, gitlab, etc.) +# +# Examples: +# make # Build default GitHub image +# make local # Build local development image +# make local arch=Linux/amd64 # Build for specific architecture +# make dev agent=azdo # Build Azure DevOps development agent +# + +# Set default target to github if no target is specified +default: github + +# +# Target: github +# Description: Build rover image optimized for GitHub Actions workflows +# Output: Container image tagged for GitHub Actions usage +# Usage: make github +# +github: + @echo "Building rover image for GitHub Actions..." + @bash "$(CURDIR)/scripts/build_image.sh" "github" + +# +# Target: local +# Description: Build rover image for local development and testing +# Parameters: +# arch (optional) - Target architecture (e.g., Linux/amd64, Linux/arm64) +# agent (optional) - CI/CD agent type to include +# Output: Local container image for development use +# Usage: +# make local # Build for current architecture +# make local arch=Linux/amd64 # Build for specific architecture (e.g., M1 Mac building x64) +# make local agent=azdo # Build with Azure DevOps agent +# +local: + @echo "Building local rover image..." + @echo "Architecture: ${arch}" + @echo "Agent: ${agent}" + @bash "$(CURDIR)/scripts/build_image.sh" "local" ${arch} ${agent} + +# +# Target: dev +# Description: Build rover image for development with additional debugging tools +# Parameters: +# arch (optional) - Target architecture +# agent (optional) - CI/CD agent type to include +# Output: Development container image with enhanced tooling +# Usage: make dev +# +dev: + @echo "Building development rover image..." + @bash "$(CURDIR)/scripts/build_image.sh" "dev" ${arch} ${agent} + +# +# Target: ci +# Description: Build rover image optimized for CI/CD pipelines +# Output: Container image optimized for continuous integration workflows +# Usage: make ci +# +ci: + @echo "Building CI-optimized rover image..." + @bash "$(CURDIR)/scripts/build_image.sh" "ci" + +# +# Target: alpha +# Description: Build rover alpha/preview image with latest features +# Output: Preview container image for testing unreleased features +# Usage: make alpha +# +alpha: + @echo "Building rover alpha/preview image..." + @bash "$(CURDIR)/scripts/build_image.sh" "alpha" + +# +# Target: help +# Description: Display help information about available targets +# Usage: make help +# +help: + @echo "" + @echo "Rover Build System" + @echo "==================" + @echo "" + @echo "Available targets:" + @echo " github (default) Build image for GitHub Actions" + @echo " local Build local development image" + @echo " dev Build development image with debugging tools" + @echo " ci Build CI/CD optimized image" + @echo " alpha Build alpha/preview image" + @echo " help Show this help message" + @echo "" + @echo "Variables:" + @echo " arch= Target architecture (Linux/amd64, Linux/arm64)" + @echo " agent= CI/CD agent type (github, azdo, gitlab, tfc)" + @echo "" + @echo "Examples:" + @echo " make # Build default GitHub image" + @echo " make local # Build local development image" + @echo " make local arch=Linux/amd64 # Build for x64 (useful on M1 Macs)" + @echo " make dev agent=azdo # Build Azure DevOps development image" + @echo "" + @echo "For more information, see: docs/CONTRIBUTING.md" + @echo "" + +# +# Target: clean +# Description: Clean up build artifacts and temporary files +# Usage: make clean +# +clean: + @echo "Cleaning build artifacts..." + @docker system prune -f + @docker image prune -f + @echo "Clean complete." + +# +# Target: test +# Description: Run tests against built images +# Usage: make test +# +test: + @echo "Running rover image tests..." + @bash "$(CURDIR)/scripts/test_runner.sh" + +.PHONY: default github local dev ci alpha help clean test \ No newline at end of file diff --git a/README.md b/README.md index fdabf6a8b..7756ebb86 100644 --- a/README.md +++ b/README.md @@ -6,9 +6,13 @@ > :warning: This solution, offered by the Open-Source community, will no longer receive contributions from Microsoft. Customers are encouraged to transition to [Microsoft Azure Verified Modules](https://aka.ms/avm) for Microsoft support and updates. -Azure Terraform SRE provides you with guidance and best practices to adopt Azure. +## Overview -The CAF **rover** is helping you managing your enterprise Terraform deployments on Microsoft Azure and is composed of two parts: +Azure Terraform SRE provides you with guidance and best practices to adopt Azure infrastructure-as-code using Terraform. The Cloud Adoption Framework (CAF) **rover** is a comprehensive toolset for managing enterprise-scale Terraform deployments on Microsoft Azure. + +## What is Rover? + +The CAF **rover** helps you manage your enterprise Terraform deployments on Microsoft Azure and is composed of two main components: - **A docker container** - Allows consistent developer experience on PC, Mac, Linux, including the right tools, git hooks and DevOps tools. @@ -25,16 +29,59 @@ The CAF **rover** is helping you managing your enterprise Terraform deployments The rover is available from the Docker Hub in form of: - [Standalone edition](https://hub.docker.com/r/aztfmod/rover/tags?page=1&ordering=last_updated): to be used for landing zones engineering or pipelines. -- [Adding runner (agent) for the following platforms](https://hub.docker.com/r/aztfmod/rover-agent/tags?page=1&ordering=last_updated) +- [Runner (agent) editions](https://hub.docker.com/r/aztfmod/rover-agent/tags?page=1&ordering=last_updated) for CI/CD platforms: - Azure DevOps - GitHub Actions - - Gitlab + - GitLab - Terraform Cloud/Terraform Enterprise -### Getting starter with CAF Terraform landing zones +## Quick Start + +### Prerequisites + +- Docker installed on your system +- Azure CLI (if running outside container) +- Access to an Azure subscription + +### Running Rover + +```bash +# Pull the latest rover image +docker pull aztfmod/rover:latest + +# Run rover interactively +docker run -it --rm aztfmod/rover:latest + +# Login to Azure +rover login + +# Deploy a landing zone +rover -lz /tf/caf/landingzones/launchpad -a plan -launchpad +``` + +## Key Features + +- **๐Ÿš€ Enterprise-ready**: Production-tested Terraform patterns and best practices +- **๐Ÿณ Containerized**: Consistent development environment across all platforms +- **๐Ÿ”„ CI/CD Ready**: Native integration with popular DevOps platforms +- **๐Ÿ“Š State Management**: Automated Terraform state file handling on Azure Storage +- **๐Ÿ›ก๏ธ Security**: Built-in security scanning and policy enforcement +- **๐Ÿ“š Comprehensive**: Complete toolchain including Terraform, Azure CLI, and supporting tools + +## Documentation + +- ๐Ÿ“– **[Installation Guide](docs/INSTALLATION.md)** - Detailed setup instructions +- ๐Ÿš€ **[Usage Guide](docs/USAGE.md)** - Command reference and examples +- ๐Ÿ—๏ธ **[Architecture](docs/ARCHITECTURE.md)** - System design and components +- ๐Ÿค **[Contributing](CONTRIBUTING.md)** - Developer guidelines and setup +- ๐Ÿ”’ **[Security](docs/SECURITY.md)** - Security best practices +- ๐Ÿ› **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions +- โ“ **[FAQ](docs/FAQ.md)** - Frequently asked questions + +### Getting Started with CAF Terraform Landing Zones -If you are reading this, you are probably interested also in reading the doc as below: -:books: Read our [centralized documentation page](https://aka.ms/caf/terraform) +For comprehensive documentation on Cloud Adoption Framework patterns: +:books: **[Visit our centralized documentation](https://aka.ms/caf/terraform)** ## Community diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 000000000..1664fd6c0 --- /dev/null +++ b/docs/ARCHITECTURE.md @@ -0,0 +1,514 @@ +# Rover Architecture + +This document provides a comprehensive overview of the Azure Terraform SRE Rover architecture, components, and design principles. + +## ๐Ÿ“‹ Table of Contents + +- [Overview](#overview) +- [System Architecture](#system-architecture) +- [Core Components](#core-components) +- [Container Architecture](#container-architecture) +- [Terraform Workflow](#terraform-workflow) +- [State Management](#state-management) +- [CI/CD Integration](#cicd-integration) +- [Security Architecture](#security-architecture) +- [Extension Points](#extension-points) + +## Overview + +Rover is designed as a containerized Terraform wrapper that provides enterprise-grade capabilities for managing Azure infrastructure at scale. The architecture follows containerization best practices and implements the Cloud Adoption Framework (CAF) patterns. + +### Key Design Principles + +- **๐Ÿณ Container-First**: Everything runs in containers for consistency +- **๐Ÿ”„ Stateless**: No persistent state in the container itself +- **๐Ÿ›ก๏ธ Security-First**: Built-in security scanning and policy enforcement +- **๐Ÿ“ฆ Modular**: Extensible through agents and plugins +- **๐ŸŒ Multi-Platform**: Runs on any container platform +- **๐Ÿ”ง Toolchain Integration**: Comprehensive DevOps toolset included + +## System Architecture + +```mermaid +graph TB + subgraph "Developer Workstation" + A[VS Code] --> B[Dev Container] + C[Docker Desktop] --> B + B --> D[Rover Core] + end + + subgraph "CI/CD Platform" + E[GitHub Actions] --> F[Rover Agent] + G[Azure DevOps] --> H[Rover Agent] + I[GitLab] --> J[Rover Agent] + F --> K[Rover Core] + H --> K + J --> K + end + + subgraph "Rover Core" + K --> L[CLI Parser] + L --> M[Terraform Wrapper] + M --> N[State Manager] + N --> O[Azure Provider] + end + + subgraph "Azure Cloud" + O --> P[Storage Account] + O --> Q[Resource Groups] + O --> R[Landing Zones] + P --> S[Terraform State] + end + + subgraph "Security & Compliance" + T[TFLint] --> K + U[Terrascan] --> K + V[Policy Validation] --> K + end +``` + +## Core Components + +### 1. Rover Core (`scripts/rover.sh`) + +The main entry point and orchestration engine: + +```bash +# Main components sourced by rover.sh +scripts/ +โ”œโ”€โ”€ rover.sh # Main entry point and orchestrator +โ”œโ”€โ”€ functions.sh # Core utility functions +โ”œโ”€โ”€ parse_command.sh # CLI argument parsing +โ”œโ”€โ”€ tfstate.sh # Terraform state management +โ”œโ”€โ”€ remote.sh # Remote backend configuration +โ”œโ”€โ”€ ci.sh # Continuous integration workflows +โ”œโ”€โ”€ cd.sh # Continuous deployment workflows +โ””โ”€โ”€ lib/ # Core library functions + โ”œโ”€โ”€ bootstrap.sh # Environment initialization + โ”œโ”€โ”€ init.sh # Terraform initialization + โ”œโ”€โ”€ logger.sh # Logging and output management + โ””โ”€โ”€ parse_parameters.sh # Parameter validation +``` + +**Key Responsibilities:** +- Command-line interface parsing +- Environment initialization +- Terraform workflow orchestration +- Error handling and logging +- State management coordination + +### 2. Terraform Wrapper + +Provides enterprise features around standard Terraform: + +```bash +# Terraform workflow stages +terraform init # Backend configuration + provider download +terraform validate # Syntax and configuration validation +terraform plan # Execution plan generation +terraform apply # Infrastructure provisioning +terraform destroy # Resource cleanup +``` + +**Enhanced Features:** +- Automatic backend configuration +- State file encryption and versioning +- Plan artifact storage +- Policy validation integration +- Cost estimation hooks + +### 3. State Manager (`scripts/tfstate.sh`) + +Manages Terraform state files with enterprise features: + +**Capabilities:** +- Azure Storage Account integration +- State file encryption at rest +- Automatic state locking +- State versioning and backup +- Cross-environment state isolation + +**State Structure:** +``` +Storage Account: stterraformXXXXX +โ”œโ”€โ”€ Container: tfstate +โ”‚ โ”œโ”€โ”€ level0/ +โ”‚ โ”‚ โ”œโ”€โ”€ environment1/ +โ”‚ โ”‚ โ”‚ โ””โ”€โ”€ tfstate.terraform +โ”‚ โ”‚ โ””โ”€โ”€ environment2/ +โ”‚ โ””โ”€โ”€ level1/ +โ””โ”€โ”€ Container: tfstate-backup + โ””โ”€โ”€ versioned-backups/ +``` + +### 4. CI/CD Integration Layer + +Supports multiple CI/CD platforms through specialized agents: + +#### Agent Architecture +```dockerfile +# Base: Rover Core + CI/CD Platform Tools +FROM aztfmod/rover:latest as base + +# Platform-specific layer +FROM base as github-agent +RUN install-github-runner + +FROM base as azdo-agent +RUN install-azdo-agent + +FROM base as gitlab-agent +RUN install-gitlab-runner +``` + +**Supported Platforms:** +- **GitHub Actions**: Self-hosted runners with rover pre-installed +- **Azure DevOps**: Custom agent pools with enterprise tooling +- **GitLab CI**: Shared runners or self-hosted with rover +- **Terraform Cloud**: Custom agents for TFC/TFE integration + +## Container Architecture + +### Base Image Structure + +```dockerfile +# Multi-stage build process +FROM ubuntu:20.04 as base +# System dependencies and security updates + +FROM base as tools +# Install core tools: terraform, azure-cli, kubectl, helm + +FROM tools as security +# Install security tools: tflint, terrascan, checkov + +FROM security as rover +# Install rover scripts and configuration +# Set up user environment and entrypoint +``` + +### Installed Toolchain + +| Category | Tools | Purpose | +|----------|--------|---------| +| **Core** | Terraform, Azure CLI | Infrastructure provisioning | +| **Container** | Docker, kubectl, Helm | Container orchestration | +| **Security** | TFLint, Terrascan, Checkov | Security and compliance scanning | +| **Development** | Git, jq, curl, vim | Development utilities | +| **Observability** | Azure Monitor integration | Logging and monitoring | + +### Volume Mounts + +```bash +# Production deployment +docker run -it \ + -v ~/.azure:/home/vscode/.azure \ # Azure credentials + -v $(pwd):/tf/caf \ # Terraform configurations + -v ~/.terraform.cache:/home/vscode/.terraform.cache \ # Terraform cache + aztfmod/rover:latest +``` + +## Terraform Workflow + +### Standard Workflow + +```mermaid +sequenceDiagram + participant U as User + participant R as Rover + participant T as Terraform + participant A as Azure + + U->>R: rover -lz launchpad -a plan + R->>R: Parse CLI arguments + R->>R: Initialize environment + R->>T: terraform init + T->>A: Download providers + R->>T: terraform plan + T->>A: Read current state + T->>R: Return execution plan + R->>U: Display plan output +``` + +### Enhanced Enterprise Workflow + +```mermaid +sequenceDiagram + participant U as User + participant R as Rover + participant S as Security Scanner + participant T as Terraform + participant B as Backend Storage + participant A as Azure + + U->>R: rover -lz launchpad -a apply + R->>R: Validate parameters + R->>S: Run security scans + S->>R: Security validation results + R->>B: Configure remote backend + R->>T: terraform init + T->>B: Download state + R->>T: terraform plan + T->>A: Generate execution plan + R->>U: Display plan for approval + U->>R: Confirm apply + R->>T: terraform apply + T->>A: Provision resources + T->>B: Update state file + R->>U: Apply complete +``` + +## State Management + +### Backend Configuration + +Rover automatically configures Terraform backends: + +```hcl +# Auto-generated backend.tf +terraform { + backend "azurerm" { + resource_group_name = "rg-terraform-state" + storage_account_name = "stterraform${random_id}" + container_name = "tfstate" + key = "${environment}/${level}/${workspace}.tfstate" + + # Security features + use_azuread_auth = true + use_msi = true + } +} +``` + +### State Isolation Strategy + +``` +Subscription +โ”œโ”€โ”€ Management Groups +โ”œโ”€โ”€ Level 0 (Platform Foundation) +โ”‚ โ”œโ”€โ”€ Identity & Access Management +โ”‚ โ”œโ”€โ”€ Management & Monitoring +โ”‚ โ””โ”€โ”€ Connectivity +โ”œโ”€โ”€ Level 1 (Platform Shared Services) +โ”‚ โ”œโ”€โ”€ Shared Networking +โ”‚ โ”œโ”€โ”€ Shared Security +โ”‚ โ””โ”€โ”€ Shared Management +โ”œโ”€โ”€ Level 2 (Application Landing Zones) +โ”‚ โ”œโ”€โ”€ Corp Connected Apps +โ”‚ โ”œโ”€โ”€ Online Apps +โ”‚ โ””โ”€โ”€ Sandbox/Dev +โ””โ”€โ”€ Level 3 (Application Resources) + โ”œโ”€โ”€ Data Platform + โ”œโ”€โ”€ Analytics Platform + โ””โ”€โ”€ AI/ML Platform +``` + +### State Security + +- **Encryption**: AES-256 encryption at rest +- **Access Control**: Azure RBAC integration +- **Versioning**: Automatic state versioning +- **Backup**: Daily automated backups +- **Auditing**: All state changes logged + +## CI/CD Integration + +### Pipeline Architecture + +```yaml +# Example GitHub Actions integration +name: Infrastructure Deployment +on: + push: + paths: ['terraform/**'] + +jobs: + deploy: + runs-on: self-hosted + container: aztfmod/rover:latest + + steps: + - uses: actions/checkout@v3 + + - name: Terraform Plan + run: | + rover -lz ./terraform/landing-zones/launchpad \ + -a plan \ + -env ${{ github.ref_name }} \ + -level level0 + + - name: Terraform Apply + if: github.ref == 'refs/heads/main' + run: | + rover -lz ./terraform/landing-zones/launchpad \ + -a apply \ + -env production \ + -level level0 +``` + +### Environment Promotion + +```mermaid +graph LR + A[Feature Branch] --> B[Development] + B --> C[Testing] + C --> D[Staging] + D --> E[Production] + + A --> F[rover plan] + B --> G[rover apply] + C --> H[rover apply + tests] + D --> I[rover apply + validation] + E --> J[rover apply + monitoring] +``` + +## Security Architecture + +### Security Layers + +1. **Container Security** + - Non-root user execution + - Minimal attack surface + - Regular security updates + - Vulnerability scanning + +2. **Code Security** + - Static analysis with TFLint + - Policy validation with Terrascan + - Secret scanning with git-secrets + - Configuration validation + +3. **Access Security** + - Azure AD integration + - Service principal authentication + - Managed identity support + - RBAC enforcement + +4. **Network Security** + - Private endpoint support + - VNet integration + - NSG rule validation + - Traffic encryption + +### Security Scanning Pipeline + +```mermaid +graph TB + A[Code Commit] --> B[Security Scan] + B --> C{Scan Results} + C -->|Pass| D[Terraform Plan] + C -->|Fail| E[Block Deployment] + D --> F[Policy Check] + F --> G{Policy Compliance} + G -->|Compliant| H[Deploy] + G -->|Non-Compliant| E +``` + +## Extension Points + +### Custom Agents + +```dockerfile +# Example custom agent +FROM aztfmod/rover:latest + +# Add custom tools +RUN apt-get update && apt-get install -y \ + custom-security-tool \ + custom-monitoring-agent + +# Add custom scripts +COPY custom-scripts/ /usr/local/bin/ + +# Configure custom entrypoint +COPY entrypoint.sh /entrypoint.sh +ENTRYPOINT ["/entrypoint.sh"] +``` + +### Plugin Architecture + +```bash +# Plugin structure +plugins/ +โ”œโ”€โ”€ security/ +โ”‚ โ”œโ”€โ”€ custom-scanner.sh +โ”‚ โ””โ”€โ”€ policy-checker.sh +โ”œโ”€โ”€ monitoring/ +โ”‚ โ”œโ”€โ”€ metrics-collector.sh +โ”‚ โ””โ”€โ”€ alerting.sh +โ””โ”€โ”€ compliance/ + โ”œโ”€โ”€ sox-compliance.sh + โ””โ”€โ”€ iso-27001.sh +``` + +### Custom CI Tasks + +```yaml +# Custom task definition +name: custom-security-scan +description: Run custom security validation +script: | + #!/bin/bash + echo "Running custom security scan..." + custom-security-tool scan --directory $TF_ROOT + echo "Security scan completed" +``` + +## Performance Considerations + +### Optimization Strategies + +1. **Container Optimization** + - Multi-stage builds to reduce image size + - Layer caching for faster builds + - Pre-installed tools to reduce runtime + +2. **Terraform Optimization** + - Provider caching + - Parallel execution where possible + - State file compression + +3. **Network Optimization** + - Regional deployment strategies + - CDN for terraform providers + - Caching strategies + +### Scaling Patterns + +```mermaid +graph TB + subgraph "Horizontal Scaling" + A[Load Balancer] --> B[Agent Pool 1] + A --> C[Agent Pool 2] + A --> D[Agent Pool N] + end + + subgraph "Vertical Scaling" + E[High-Memory Agents] --> F[Large Deployments] + G[Standard Agents] --> H[Regular Deployments] + I[Minimal Agents] --> J[Small Deployments] + end +``` + +## Future Architecture + +### Planned Enhancements + +1. **Golang Migration** + - Improved performance + - Better error handling + - Cross-platform binaries + +2. **Plugin System** + - Dynamic plugin loading + - Community plugin marketplace + - Custom workflow engines + +3. **Cloud-Native Features** + - Kubernetes operators + - Service mesh integration + - Observability improvements + +--- + +Next: [Security Documentation](SECURITY.md) | [Development Guide](../CONTRIBUTING.md) \ No newline at end of file diff --git a/docs/FAQ.md b/docs/FAQ.md new file mode 100644 index 000000000..c3f65be66 --- /dev/null +++ b/docs/FAQ.md @@ -0,0 +1,398 @@ +# Frequently Asked Questions (FAQ) + +This document answers common questions about using the Azure Terraform SRE Rover. + +## ๐Ÿ“‹ Table of Contents + +- [General Questions](#general-questions) +- [Installation & Setup](#installation--setup) +- [Authentication & Permissions](#authentication--permissions) +- [Terraform Operations](#terraform-operations) +- [State Management](#state-management) +- [CI/CD Integration](#cicd-integration) +- [Troubleshooting](#troubleshooting) +- [Advanced Usage](#advanced-usage) + +## General Questions + +### What is Rover? + +Rover is a containerized toolset for managing enterprise Terraform deployments on Microsoft Azure. It provides: +- Consistent development environment across platforms +- Automated Terraform state management +- Built-in security scanning and compliance +- CI/CD integration for popular platforms +- Enterprise-grade features for Azure infrastructure + +### Why use Rover instead of Terraform directly? + +Rover provides several advantages over using Terraform directly: +- **Consistency**: Same environment across all developers and CI/CD pipelines +- **State Management**: Automated Azure Storage backend configuration +- **Security**: Built-in security scanning with TFLint, Terrascan, and others +- **Toolchain**: Pre-installed tools (Azure CLI, kubectl, Helm, etc.) +- **Best Practices**: Cloud Adoption Framework (CAF) patterns built-in +- **Enterprise Features**: RBAC, audit logging, policy enforcement + +### Is Rover only for Azure? + +While Rover is optimized for Azure deployments, it can be used with other cloud providers since it includes standard Terraform. However, the Azure-specific features (state management, authentication, etc.) are designed for Azure. + +### What's the difference between Rover and Terraform Cloud? + +Rover is an open-source, self-hosted solution focused on Azure, while Terraform Cloud is HashiCorp's managed service. Key differences: + +| Feature | Rover | Terraform Cloud | +|---------|-------|-----------------| +| Hosting | Self-hosted | Managed service | +| Cost | Free (open source) | Paid tiers | +| Azure Integration | Deep integration | Basic support | +| State Storage | Azure Storage | Terraform Cloud | +| Security Scanning | Built-in | Add-ons required | + +## Installation & Setup + +### How do I install Rover? + +The easiest way is using Docker: + +```bash +# Pull latest image +docker pull aztfmod/rover:latest + +# Run interactively +docker run -it --rm aztfmod/rover:latest bash +``` + +For detailed installation options, see [Installation Guide](INSTALLATION.md). + +### Can I run Rover on Windows? + +Yes! Rover supports Windows through: +- **Docker Desktop** with WSL2 backend +- **GitHub Codespaces** for cloud development +- **VS Code Dev Containers** for consistent environment + +### Do I need Docker to use Rover? + +Yes, Docker is required as Rover is designed as a containerized solution. This ensures consistency across different environments and platforms. + +### How do I update Rover? + +```bash +# Pull latest version +docker pull aztfmod/rover:latest + +# Or specific version +docker pull aztfmod/rover:1.3.0 +``` + +For production use, pin to specific versions rather than using `latest`. + +## Authentication & Permissions + +### How do I authenticate with Azure? + +Rover supports multiple authentication methods: + +**Interactive (Development):** +```bash +rover login +``` + +**Service Principal (CI/CD):** +```bash +export ARM_CLIENT_ID="your-client-id" +export ARM_CLIENT_SECRET="your-client-secret" +export ARM_SUBSCRIPTION_ID="your-subscription-id" +export ARM_TENANT_ID="your-tenant-id" +``` + +**Managed Identity (Azure-hosted):** +```bash +export ARM_USE_MSI=true +export ARM_SUBSCRIPTION_ID="your-subscription-id" +``` + +### What Azure permissions does Rover need? + +Minimum permissions depend on your deployment: +- **Contributor** role for most deployments +- **Storage Blob Data Contributor** for state management +- **Key Vault** permissions for secret access +- Custom roles for specific scenarios + +See [Security Guide](SECURITY.md) for detailed permission requirements. + +### Can I use multiple Azure subscriptions? + +Yes, you can target different subscriptions by: +- Setting `ARM_SUBSCRIPTION_ID` environment variable +- Using `az account set --subscription` command +- Configuring different service principals per subscription + +### How do I manage secrets securely? + +Rover integrates with Azure Key Vault: + +```bash +# Store secrets in Key Vault +az keyvault secret set --vault-name MyVault --name db-password --value "secret123" + +# Use in Terraform +data "azurerm_key_vault_secret" "db_password" { + name = "db-password" + key_vault_id = data.azurerm_key_vault.main.id +} +``` + +## Terraform Operations + +### How do I deploy a landing zone? + +```bash +# Basic deployment +rover -lz /path/to/landingzone -a apply -env production + +# Launchpad deployment (state management) +rover -lz /path/to/launchpad -a apply -env production --launchpad + +# With custom variables +rover -lz /path/to/landingzone -a apply -env production --var-folder ./configs/prod +``` + +### What's the difference between levels (level0, level1, etc.)? + +Levels represent the deployment hierarchy in the Cloud Adoption Framework: +- **Level 0**: Foundation (identity, networking, state management) +- **Level 1**: Shared services (security, monitoring, connectivity) +- **Level 2**: Application landing zones (workload-specific platforms) +- **Level 3**: Applications and workloads + +### Can I use custom Terraform configurations? + +Yes! Rover works with any Terraform configuration. The level system is optional and mainly provides organization for CAF patterns. + +### How do I pass variables to Terraform? + +Multiple ways: +```bash +# Environment variables +export TF_VAR_environment="production" +export TF_VAR_location="East US" + +# Variable files +rover -lz ./landingzone -a apply --var-folder ./configs/production + +# Direct variables +terraform apply -var="location=East US" +``` + +## State Management + +### Where is my Terraform state stored? + +Rover automatically configures Azure Storage backend: +- **Storage Account**: Created in specified resource group +- **Container**: `tfstate` +- **Key**: Based on environment/level/workspace pattern +- **Encryption**: AES-256 at rest, HTTPS in transit + +### How do I manage multiple environments? + +Use environment-specific workspaces and state files: + +```bash +# Development environment +rover -lz ./landingzone -a apply -env development + +# Production environment +rover -lz ./landingzone -a apply -env production +``` + +Each environment gets its own state file path. + +### Can I migrate existing Terraform state to Rover? + +Yes, you can migrate existing state: + +```bash +# Initialize with new backend +terraform init -migrate-state + +# Or manually copy state file to Azure Storage +az storage blob upload --account-name storage --container-name tfstate --file terraform.tfstate --name myapp.tfstate +``` + +### How do I backup state files? + +Rover automatically creates backups: +- **Versioning**: Enabled on Azure Storage +- **Cross-region replication**: Configure with GRS/RA-GRS +- **Automated backups**: Daily snapshots to backup container + +## CI/CD Integration + +### Which CI/CD platforms does Rover support? + +Rover provides pre-built agents for: +- **GitHub Actions** (github runner) +- **Azure DevOps** (Azure Pipelines agent) +- **GitLab CI** (GitLab runner) +- **Terraform Cloud** (custom agents) + +### How do I set up GitHub Actions with Rover? + +```yaml +name: Deploy Infrastructure +on: + push: + branches: [main] + +jobs: + deploy: + runs-on: ubuntu-latest + container: aztfmod/rover:latest + + steps: + - uses: actions/checkout@v3 + + - name: Azure Login + run: | + az login --service-principal \ + --username ${{ secrets.AZURE_CLIENT_ID }} \ + --password ${{ secrets.AZURE_CLIENT_SECRET }} \ + --tenant ${{ secrets.AZURE_TENANT_ID }} + + - name: Deploy + run: | + rover -lz ./landingzones/app -a apply -env production +``` + +### Can I use Rover in my existing pipelines? + +Yes! You can use Rover as a container in existing pipelines: + +```bash +# In any CI/CD system +docker run --rm \ + -v $(pwd):/tf/caf \ + -e ARM_CLIENT_ID="$ARM_CLIENT_ID" \ + -e ARM_CLIENT_SECRET="$ARM_CLIENT_SECRET" \ + -e ARM_SUBSCRIPTION_ID="$ARM_SUBSCRIPTION_ID" \ + -e ARM_TENANT_ID="$ARM_TENANT_ID" \ + aztfmod/rover:latest \ + rover -lz /tf/caf/landingzone -a apply -env production +``` + +### How do I run CI/CD tests? + +Use rover's built-in CI capabilities: + +```bash +# Run all CI tasks +rover ci -sc ./symphony.yml + +# Run specific task +rover ci -ct tflint -sc ./symphony.yml + +# Run with custom configuration +rover ci -sc ./configs/ci.yml -b ./terraform +``` + +## Troubleshooting + +### Rover command not found + +**Solution**: Ensure you're running inside the rover container: +```bash +docker run -it --rm aztfmod/rover:latest bash +rover --version +``` + +### Authentication failures + +**Check**: +1. Azure CLI login: `az account show` +2. Service principal credentials: Verify `ARM_*` environment variables +3. Subscription access: `az account list` +4. Permissions: Check RBAC assignments + +### Terraform state lock errors + +**Solutions**: +1. Wait for lock to expire (20 minutes) +2. Force unlock: `terraform force-unlock ` +3. Check for running processes: `ps aux | grep terraform` + +### Performance issues + +**Optimizations**: +1. Increase container resources: `--memory=4g --cpus=2` +2. Use provider cache: `export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"` +3. Enable parallelism: `export TF_CLI_ARGS_plan="-parallelism=10"` + +For more detailed troubleshooting, see [Troubleshooting Guide](TROUBLESHOOTING.md). + +## Advanced Usage + +### Can I customize the Rover image? + +Yes, you can extend the base image: + +```dockerfile +FROM aztfmod/rover:latest + +# Add custom tools +RUN apt-get update && apt-get install -y custom-tool + +# Add custom scripts +COPY scripts/ /usr/local/bin/ + +# Set custom entrypoint +COPY entrypoint.sh /entrypoint.sh +ENTRYPOINT ["/entrypoint.sh"] +``` + +### How do I integrate with corporate proxies? + +Configure Docker daemon with proxy settings: + +```json +{ + "proxies": { + "default": { + "httpProxy": "http://proxy.company.com:8080", + "httpsProxy": "http://proxy.company.com:8080", + "noProxy": "localhost,127.0.0.1,.company.com" + } + } +} +``` + +### Can I use Rover for multi-cloud deployments? + +While optimized for Azure, Rover can deploy to other clouds: +- Install additional provider CLIs in custom image +- Configure appropriate authentication for each cloud +- Use separate state backends per cloud provider + +### How do I contribute to Rover? + +1. Read [Contributing Guide](CONTRIBUTING.md) +2. Fork the repository +3. Create feature branch +4. Submit pull request +5. Follow code review process + +### Where can I get help? + +- **Documentation**: [GitHub repository docs](https://github.com/aztfmod/rover/docs) +- **Issues**: [GitHub Issues](https://github.com/aztfmod/rover/issues) +- **Discussions**: [GitHub Discussions](https://github.com/aztfmod/rover/discussions) +- **Chat**: [Gitter Community](https://gitter.im/aztfmod/community) +- **Email**: tf-landingzones@microsoft.com + +--- + +**Didn't find your question?** Open a [GitHub Discussion](https://github.com/aztfmod/rover/discussions) or create an [issue](https://github.com/aztfmod/rover/issues) for documentation improvements. \ No newline at end of file diff --git a/docs/INSTALLATION.md b/docs/INSTALLATION.md new file mode 100644 index 000000000..b41c5e7e7 --- /dev/null +++ b/docs/INSTALLATION.md @@ -0,0 +1,544 @@ +# Installation Guide + +This guide provides comprehensive instructions for installing and setting up the Azure Terraform SRE Rover. + +## ๐Ÿ“‹ Table of Contents + +- [Prerequisites](#prerequisites) +- [Installation Methods](#installation-methods) +- [Docker Installation](#docker-installation) +- [Development Setup](#development-setup) +- [CI/CD Agent Setup](#cicd-agent-setup) +- [Configuration](#configuration) +- [Verification](#verification) +- [Troubleshooting](#troubleshooting) + +## Prerequisites + +### System Requirements + +- **Operating System**: Linux, macOS, or Windows with WSL2 +- **RAM**: Minimum 4GB, recommended 8GB+ +- **Storage**: At least 10GB free space +- **Network**: Internet access for downloading images and Azure connectivity + +### Required Software + +1. **Docker Desktop** + - Version 20.10.0 or later + - Docker Compose v2.0+ (included with Docker Desktop) + +2. **Azure Access** + - Valid Azure subscription + - Appropriate permissions for resource creation + - Azure CLI (optional, for local authentication) + +### Optional Tools + +- **Visual Studio Code** with Remote-Containers extension +- **Git** for cloning repositories +- **Azure CLI** for local development and testing + +## Installation Methods + +### Method 1: Docker Hub (Recommended) + +The fastest way to get started with Rover: + +```bash +# Pull the latest stable version +docker pull aztfmod/rover:latest + +# Run rover interactively +docker run -it --rm aztfmod/rover:latest bash + +# Verify installation +rover --version +``` + +### Method 2: Specific Version + +For production use, pin to a specific version: + +```bash +# Check available versions +# Visit: https://hub.docker.com/r/aztfmod/rover/tags + +# Pull specific version +docker pull aztfmod/rover:1.3.0 + +# Run with version tag +docker run -it --rm aztfmod/rover:1.3.0 bash +``` + +### Method 3: Local Build + +For development or customization: + +```bash +# Clone the repository +git clone https://github.com/aztfmod/rover.git +cd rover + +# Build local image +make local + +# Run local build +docker run -it --rm rover-local:latest bash +``` + +## Docker Installation + +### Linux Installation + +```bash +# Update package index +sudo apt-get update + +# Install required packages +sudo apt-get install -y \ + apt-transport-https \ + ca-certificates \ + curl \ + gnupg \ + lsb-release + +# Add Docker's official GPG key +curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg + +# Set up stable repository +echo \ + "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \ + $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null + +# Install Docker Engine +sudo apt-get update +sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin + +# Add user to docker group +sudo usermod -aG docker $USER + +# Verify installation +docker --version +``` + +### macOS Installation + +```bash +# Install Docker Desktop from https://docs.docker.com/desktop/mac/install/ +# Or using Homebrew +brew install --cask docker + +# Start Docker Desktop and verify +docker --version +``` + +### Windows Installation + +1. **Install WSL2** + ```powershell + # Run as Administrator + wsl --install + ``` + +2. **Install Docker Desktop** + - Download from [Docker Desktop for Windows](https://docs.docker.com/desktop/windows/install/) + - Ensure WSL2 backend is enabled + +3. **Verify in WSL2** + ```bash + docker --version + ``` + +## Development Setup + +### VS Code Dev Container + +For the best development experience: + +1. **Install Prerequisites** + ```bash + # Install VS Code + code --install-extension ms-vscode-remote.remote-containers + ``` + +2. **Open Project** + ```bash + git clone https://github.com/aztfmod/rover.git + cd rover + code . + ``` + +3. **Start Dev Container** + - Command Palette (Ctrl+Shift+P) + - Select "Remote-Containers: Reopen in Container" + - Wait for container to build and start + +4. **Verify Setup** + ```bash + # Inside VS Code terminal + rover --version + terraform --version + az --version + ``` + +### Local Development Build + +```bash +# Clone repository +git clone https://github.com/aztfmod/rover.git +cd rover + +# Build development image +make dev + +# Run with volume mounts for development +docker run -it --rm \ + -v $(pwd):/tf/rover \ + -v ~/.azure:/home/vscode/.azure \ + rover-dev:latest bash +``` + +## CI/CD Agent Setup + +### GitHub Actions + +```yaml +# .github/workflows/terraform.yml +name: Terraform Deployment +on: + push: + branches: [main] + +jobs: + deploy: + runs-on: ubuntu-latest + container: + image: aztfmod/rover:latest + steps: + - uses: actions/checkout@v3 + - name: Azure Login + run: | + az login --service-principal \ + --username ${{ secrets.AZURE_CLIENT_ID }} \ + --password ${{ secrets.AZURE_CLIENT_SECRET }} \ + --tenant ${{ secrets.AZURE_TENANT_ID }} + - name: Deploy Landing Zone + run: | + rover -lz ./landingzones/launchpad -a apply -launchpad +``` + +### Azure DevOps + +```yaml +# azure-pipelines.yml +trigger: + branches: + include: + - main + +pool: + vmImage: 'ubuntu-latest' + +container: aztfmod/rover:latest + +steps: +- task: AzureCLI@2 + displayName: 'Deploy with Rover' + inputs: + azureSubscription: '$(serviceConnection)' + scriptType: 'bash' + scriptLocation: 'inlineScript' + inlineScript: | + rover -lz ./landingzones/launchpad -a apply -launchpad +``` + +### GitLab CI + +```yaml +# .gitlab-ci.yml +image: aztfmod/rover:latest + +stages: + - deploy + +deploy_terraform: + stage: deploy + script: + - az login --service-principal --username $AZURE_CLIENT_ID --password $AZURE_CLIENT_SECRET --tenant $AZURE_TENANT_ID + - rover -lz ./landingzones/launchpad -a apply -launchpad + only: + - main +``` + +## Configuration + +### Azure Authentication + +#### Service Principal (Recommended for CI/CD) + +```bash +# Create service principal +az ad sp create-for-rbac \ + --name "rover-sp" \ + --role contributor \ + --scopes /subscriptions/{subscription-id} + +# Set environment variables +export ARM_CLIENT_ID="your-client-id" +export ARM_CLIENT_SECRET="your-client-secret" +export ARM_SUBSCRIPTION_ID="your-subscription-id" +export ARM_TENANT_ID="your-tenant-id" +``` + +#### Interactive Login (Development) + +```bash +# Login to Azure +az login + +# Select subscription +az account set --subscription "your-subscription-id" + +# Verify +az account show +``` + +### Rover Configuration + +#### Environment Variables + +```bash +# Core settings +export TF_VAR_environment="dev" +export TF_VAR_level="level0" +export TF_VAR_workspace="tfstate" + +# Azure settings +export ARM_USE_AZUREAD="true" +export ARM_STORAGE_USE_AZUREAD="true" + +# Logging +export TF_LOG="INFO" +export debug_mode="true" +``` + +#### Configuration File + +Create `~/.rover/config`: + +```bash +# Default configuration +TF_VAR_environment=dev +TF_VAR_level=level0 +TF_VAR_workspace=tfstate +ARM_USE_AZUREAD=true +debug_mode=false +``` + +### Storage Configuration + +#### Terraform State Storage + +```bash +# Create storage account for Terraform state +az group create --name rg-tfstate --location eastus + +az storage account create \ + --name stterraformstate$(date +%s) \ + --resource-group rg-tfstate \ + --location eastus \ + --sku Standard_LRS \ + --encryption-services blob + +# Get storage account key +STORAGE_KEY=$(az storage account keys list \ + --resource-group rg-tfstate \ + --account-name $STORAGE_ACCOUNT \ + --query '[0].value' -o tsv) + +# Configure backend +export ARM_ACCESS_KEY=$STORAGE_KEY +``` + +## Verification + +### Basic Verification + +```bash +# Check rover version +rover --version + +# Verify Azure connectivity +az account show + +# Test Terraform +terraform version + +# Check available tools +which git kubectl helm jq +``` + +### Comprehensive Test + +```bash +# Run rover with dry-run +rover -lz /tf/rover/examples/basic -a plan + +# Test CI tools +rover ci --help + +# Verify state management +rover workspace list +``` + +### Smoke Test Script + +Create `verify-installation.sh`: + +```bash +#!/bin/bash +set -e + +echo "๐Ÿงช Rover Installation Verification" +echo "==================================" + +# Check rover +echo "โœ… Checking rover..." +rover --version + +# Check Azure CLI +echo "โœ… Checking Azure CLI..." +az version + +# Check Terraform +echo "โœ… Checking Terraform..." +terraform version + +# Check authentication +echo "โœ… Checking Azure authentication..." +az account show --query "name" -o tsv + +# Check tools +echo "โœ… Checking additional tools..." +git --version +kubectl version --client +helm version + +echo "" +echo "๐ŸŽ‰ All checks passed! Rover is ready to use." +``` + +Run verification: + +```bash +chmod +x verify-installation.sh +./verify-installation.sh +``` + +## Troubleshooting + +### Common Issues + +#### Docker Permission Denied + +```bash +# Add user to docker group +sudo usermod -aG docker $USER + +# Log out and back in, or run: +newgrp docker +``` + +#### Container Image Pull Failures + +```bash +# Check network connectivity +docker run --rm busybox ping -c 3 google.com + +# Try different registry +docker pull ghcr.io/aztfmod/rover:latest + +# Clear Docker cache +docker system prune -a +``` + +#### Azure Authentication Issues + +```bash +# Clear Azure CLI cache +az account clear + +# Re-authenticate +az login + +# Verify subscription access +az account list --query "[].{Name:name,SubscriptionId:id,State:state}" --output table +``` + +#### Storage Account Access + +```bash +# Check storage account permissions +az role assignment list \ + --assignee $(az account show --query user.name -o tsv) \ + --scope /subscriptions/{subscription-id}/resourceGroups/{rg-name} + +# Test storage connectivity +az storage container list --account-name {storage-account} +``` + +### Performance Issues + +#### Slow Container Startup + +```bash +# Use specific version instead of latest +docker pull aztfmod/rover:1.3.0 + +# Pre-pull images +docker pull aztfmod/rover:latest & +``` + +#### Network Timeouts + +```bash +# Increase timeout settings +export ARM_CLIENT_TIMEOUT=300 + +# Use different Azure region +export ARM_ENVIRONMENT="AzurePublic" +``` + +### Getting Help + +- **Documentation**: Check [docs/TROUBLESHOOTING.md](TROUBLESHOOTING.md) +- **Issues**: Create GitHub issue with installation details +- **Community**: Join [Gitter chat](https://gitter.im/aztfmod/community) +- **Support**: Email tf-landingzones@microsoft.com + +### Diagnostic Information + +When reporting issues, include: + +```bash +# System information +uname -a +docker --version +docker info + +# Rover information +rover --version +az --version +terraform version + +# Network test +curl -I https://registry-1.docker.io/v2/ + +# Azure connectivity +az account show +``` + +--- + +Next Steps: [Usage Guide](USAGE.md) | [Architecture Overview](ARCHITECTURE.md) \ No newline at end of file diff --git a/docs/SECURITY.md b/docs/SECURITY.md new file mode 100644 index 000000000..38f2da0de --- /dev/null +++ b/docs/SECURITY.md @@ -0,0 +1,787 @@ +# Security Guide + +This document outlines security best practices, configurations, and considerations when using the Azure Terraform SRE Rover. + +## ๐Ÿ“‹ Table of Contents + +- [Security Overview](#security-overview) +- [Authentication & Authorization](#authentication--authorization) +- [Container Security](#container-security) +- [Network Security](#network-security) +- [State File Security](#state-file-security) +- [Secret Management](#secret-management) +- [Security Scanning](#security-scanning) +- [Compliance & Governance](#compliance--governance) +- [Incident Response](#incident-response) +- [Security Checklist](#security-checklist) + +## Security Overview + +Rover implements defense-in-depth security principles across multiple layers: + +- **๐Ÿ›ก๏ธ Container Security**: Hardened container images with minimal attack surface +- **๐Ÿ” Identity & Access**: Azure AD integration with RBAC enforcement +- **๐Ÿ”’ Encryption**: End-to-end encryption for data in transit and at rest +- **๐Ÿ” Monitoring**: Comprehensive logging and security event monitoring +- **๐Ÿ“‹ Compliance**: Built-in compliance scanning and policy enforcement +- **๐Ÿšจ Detection**: Real-time threat detection and automated response + +## Authentication & Authorization + +### Azure Active Directory Integration + +#### Service Principal Authentication (Recommended for CI/CD) + +```bash +# Create dedicated service principal +az ad sp create-for-rbac \ + --name "rover-production-sp" \ + --role "Contributor" \ + --scopes "/subscriptions/{subscription-id}" \ + --sdk-auth + +# Set environment variables (in CI/CD) +export ARM_CLIENT_ID="service-principal-client-id" +export ARM_CLIENT_SECRET="service-principal-secret" +export ARM_SUBSCRIPTION_ID="target-subscription-id" +export ARM_TENANT_ID="azure-tenant-id" +``` + +**Security Best Practices:** +- Use separate service principals per environment +- Apply principle of least privilege +- Rotate secrets regularly (90 days maximum) +- Monitor service principal usage + +#### Managed Identity Authentication (Recommended for Azure-hosted) + +```bash +# Enable system-assigned managed identity +az vm identity assign --name myVM --resource-group myRG + +# Configure rover to use managed identity +export ARM_USE_MSI=true +export ARM_SUBSCRIPTION_ID="target-subscription-id" +``` + +**Benefits:** +- No credential management required +- Automatic credential rotation +- Azure-native authentication +- Reduced secret sprawl + +#### Interactive Authentication (Development Only) + +```bash +# Device code flow for development +az login --use-device-code + +# Verify authentication +az account show +``` + +**Security Considerations:** +- Only use for local development +- Never use in automated pipelines +- Ensure proper session management + +### Role-Based Access Control (RBAC) + +#### Minimum Required Permissions + +```json +{ + "Name": "Rover Terraform Operator", + "Description": "Minimum permissions for Rover operations", + "Actions": [ + "*/read", + "Microsoft.Resources/deployments/*", + "Microsoft.Resources/subscriptions/resourceGroups/*", + "Microsoft.Storage/storageAccounts/*", + "Microsoft.KeyVault/vaults/*", + "Microsoft.Network/*", + "Microsoft.Compute/*" + ], + "NotActions": [ + "Microsoft.Authorization/*/Delete", + "Microsoft.Authorization/*/Write", + "Microsoft.Authorization/elevateAccess/Action" + ], + "DataActions": [], + "NotDataActions": [], + "AssignableScopes": [ + "/subscriptions/{subscription-id}" + ] +} +``` + +#### Environment-Specific Roles + +```bash +# Development environment - broader permissions +az role assignment create \ + --assignee {service-principal-id} \ + --role "Contributor" \ + --scope "/subscriptions/{dev-subscription-id}" + +# Production environment - restricted permissions +az role assignment create \ + --assignee {service-principal-id} \ + --role "Rover Production Operator" \ + --scope "/subscriptions/{prod-subscription-id}" +``` + +## Container Security + +### Image Security + +#### Vulnerability Scanning + +```bash +# Scan rover image for vulnerabilities +docker run --rm \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v $(pwd):/tmp \ + aquasec/trivy image aztfmod/rover:latest + +# Scan with specific severity +trivy image --severity HIGH,CRITICAL aztfmod/rover:latest +``` + +#### Image Signing and Verification + +```bash +# Verify image signatures (when available) +docker trust inspect aztfmod/rover:latest + +# Use specific digest for reproducible builds +docker pull aztfmod/rover@sha256:abc123... +``` + +### Runtime Security + +#### Non-Root Execution + +```dockerfile +# Rover runs as non-root user by default +USER vscode +WORKDIR /home/vscode +``` + +#### Resource Limits + +```bash +# Run with resource constraints +docker run -it --rm \ + --memory=2g \ + --cpus=1.5 \ + --ulimit nofile=1024:1024 \ + aztfmod/rover:latest +``` + +#### Read-Only Root Filesystem + +```bash +# Run with read-only root filesystem +docker run -it --rm \ + --read-only \ + --tmpfs /tmp \ + --tmpfs /var/tmp \ + -v rover-home:/home/vscode \ + aztfmod/rover:latest +``` + +### Secrets Management in Containers + +```bash +# Use Docker secrets (Swarm mode) +echo "secret-value" | docker secret create azure-client-secret - + +# Mount secret in container +docker service create \ + --secret azure-client-secret \ + --env ARM_CLIENT_SECRET_FILE=/run/secrets/azure-client-secret \ + aztfmod/rover:latest + +# Use external secret management +docker run -it --rm \ + -v vault-secrets:/vault/secrets:ro \ + aztfmod/rover:latest +``` + +## Network Security + +### Network Isolation + +#### Private Container Networks + +```bash +# Create isolated network +docker network create \ + --driver bridge \ + --subnet=172.20.0.0/16 \ + --ip-range=172.20.240.0/20 \ + rover-network + +# Run rover in isolated network +docker run -it --rm \ + --network rover-network \ + aztfmod/rover:latest +``` + +#### Azure Private Endpoints + +```bash +# Configure private endpoint access +export ARM_USE_PRIVATE_LINK=true +export ARM_PRIVATE_ENDPOINT_SUBNET_ID="/subscriptions/.../subnets/private-endpoints" +``` + +### Firewall Configuration + +#### Outbound Rules + +```bash +# Required Azure endpoints +https://management.azure.com # Azure Resource Manager +https://login.microsoftonline.com # Azure AD authentication +https://graph.microsoft.com # Microsoft Graph +https://storage.azure.com # Azure Storage +https://vault.azure.net # Azure Key Vault + +# Terraform endpoints +https://registry.terraform.io # Terraform Registry +https://releases.hashicorp.com # HashiCorp releases +https://checkpoint-api.hashicorp.com # Terraform checkpoint + +# Container registry +https://registry-1.docker.io # Docker Hub +https://auth.docker.io # Docker authentication +``` + +#### Network Security Groups + +```hcl +# NSG rules for rover agents +resource "azurerm_network_security_rule" "rover_outbound" { + name = "RoverOutbound" + priority = 100 + direction = "Outbound" + access = "Allow" + protocol = "Tcp" + source_port_range = "*" + destination_port_ranges = ["443", "80"] + source_address_prefix = "*" + destination_address_prefixes = [ + "AzureCloud", + "Internet" + ] + resource_group_name = azurerm_resource_group.rover.name + network_security_group_name = azurerm_network_security_group.rover.name +} +``` + +## State File Security + +### Encryption at Rest + +```hcl +# Terraform backend with encryption +terraform { + backend "azurerm" { + resource_group_name = "rg-terraform-state" + storage_account_name = "stterraformstate" + container_name = "tfstate" + key = "prod.terraform.tfstate" + + # Security configurations + use_azuread_auth = true + use_msi = true + snapshot = true + } +} +``` + +#### Storage Account Security + +```bash +# Create secure storage account +az storage account create \ + --name stterraformstate \ + --resource-group rg-terraform-state \ + --location eastus \ + --sku Standard_LRS \ + --kind StorageV2 \ + --https-only true \ + --min-tls-version TLS1_2 \ + --allow-blob-public-access false \ + --allow-shared-key-access false +``` + +### Access Control + +#### Storage Account RBAC + +```bash +# Grant rover service principal storage access +az role assignment create \ + --assignee {rover-sp-id} \ + --role "Storage Blob Data Contributor" \ + --scope "/subscriptions/{sub-id}/resourceGroups/rg-terraform-state/providers/Microsoft.Storage/storageAccounts/stterraformstate" +``` + +#### Container-Level Permissions + +```bash +# Create container with specific permissions +az storage container create \ + --name tfstate \ + --account-name stterraformstate \ + --public-access off \ + --metadata environment=production team=platform +``` + +### State File Encryption + +#### Client-Side Encryption + +```bash +# Enable client-side encryption +export TF_STATE_ENCRYPTION_KEY="base64-encoded-key" + +# Use Azure Key Vault for key management +export ARM_CLIENT_CERTIFICATE_PATH="/path/to/certificate.pfx" +export ARM_CLIENT_CERTIFICATE_PASSWORD="certificate-password" +``` + +## Secret Management + +### Azure Key Vault Integration + +#### Store Terraform Variables + +```hcl +# Retrieve secrets from Key Vault +data "azurerm_key_vault" "main" { + name = "kv-terraform-secrets" + resource_group_name = "rg-security" +} + +data "azurerm_key_vault_secret" "database_password" { + name = "database-password" + key_vault_id = data.azurerm_key_vault.main.id +} + +# Use in resources +resource "azurerm_mssql_server" "main" { + administrator_login_password = data.azurerm_key_vault_secret.database_password.value +} +``` + +#### Runtime Secret Injection + +```bash +# Retrieve secrets at runtime +export DATABASE_PASSWORD=$(az keyvault secret show \ + --vault-name kv-terraform-secrets \ + --name database-password \ + --query value -o tsv) + +# Use with rover +rover -lz ./landingzone -a apply \ + -var "database_password=${DATABASE_PASSWORD}" +``` + +### Secret Scanning + +#### Pre-commit Hooks + +```yaml +# .pre-commit-config.yaml +repos: + - repo: https://github.com/Yelp/detect-secrets + rev: v1.4.0 + hooks: + - id: detect-secrets + args: ['--baseline', '.secrets.baseline'] + exclude: package.lock.json +``` + +#### CI/CD Secret Detection + +```bash +# GitLeaks scanning +docker run --rm -v $(pwd):/code \ + zricethezav/gitleaks:latest \ + detect --source /code --verbose +``` + +## Security Scanning + +### Static Analysis + +#### Terraform Security Scanning + +```bash +# TFLint configuration +cat > .tflint.hcl << EOF +plugin "azurerm" { + enabled = true + version = "0.20.0" + source = "github.com/terraform-linters/tflint-ruleset-azurerm" +} + +rule "terraform_unused_declarations" { + enabled = true +} + +rule "terraform_typed_variables" { + enabled = true +} + +rule "azurerm_*" { + enabled = true +} +EOF + +# Run TFLint +tflint --config .tflint.hcl +``` + +#### Policy as Code + +```bash +# Terrascan scanning +terrascan scan -t terraform \ + --policy-type azure \ + --severity high \ + --output json + +# Custom policy checks +checkov -f main.tf \ + --framework terraform \ + --check CKV_AZURE_* +``` + +### Dynamic Analysis + +#### Runtime Security Monitoring + +```bash +# Container runtime security +docker run -d \ + --name rover-monitor \ + --pid host \ + --security-opt apparmor:unconfined \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v /sys/kernel/debug:/sys/kernel/debug \ + falcosecurity/falco +``` + +### Vulnerability Management + +#### Continuous Scanning + +```yaml +# GitHub Actions security workflow +name: Security Scan +on: + schedule: + - cron: '0 2 * * *' # Daily at 2 AM + +jobs: + security-scan: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + + - name: Run Trivy vulnerability scanner + uses: aquasecurity/trivy-action@master + with: + image-ref: 'aztfmod/rover:latest' + format: 'sarif' + output: 'trivy-results.sarif' + + - name: Upload Trivy scan results + uses: github/codeql-action/upload-sarif@v2 + with: + sarif_file: 'trivy-results.sarif' +``` + +## Compliance & Governance + +### Policy Enforcement + +#### Azure Policy Integration + +```hcl +# Azure Policy for rover compliance +resource "azurerm_policy_definition" "rover_compliance" { + name = "rover-security-compliance" + policy_type = "Custom" + mode = "All" + display_name = "Rover Security Compliance" + + policy_rule = < /etc/docker/daemon.json << EOF +{ + "log-driver": "json-file", + "log-opts": { + "max-size": "10m", + "max-file": "3" + }, + "audit-logs": { + "level": "info", + "path": "/var/log/docker-audit.log" + } +} +EOF +``` + +## Incident Response + +### Security Incident Procedures + +#### Detection and Alerting + +```bash +# Set up security alerts +az monitor metrics alert create \ + --name "Rover-Unauthorized-Access" \ + --resource-group rg-monitoring \ + --scopes /subscriptions/{subscription-id} \ + --condition "count 'location' eq 'Global' FailedCount gt 5" \ + --description "Multiple failed authentication attempts detected" +``` + +#### Incident Response Playbook + +1. **Detection** + - Monitor Azure Activity Logs + - Review container audit logs + - Check security scan results + +2. **Containment** + ```bash + # Immediately revoke compromised credentials + az ad sp credential reset --name rover-production-sp + + # Stop running rover containers + docker stop $(docker ps -q --filter ancestor=aztfmod/rover) + + # Block suspicious IP addresses + az network nsg rule create \ + --resource-group rg-security \ + --nsg-name nsg-rover \ + --name BlockSuspiciousIP \ + --priority 100 \ + --source-address-prefixes {suspicious-ip} \ + --access Deny + ``` + +3. **Investigation** + - Analyze audit logs + - Review Terraform state changes + - Check for unauthorized resource modifications + +4. **Recovery** + - Restore from known good state + - Update security configurations + - Implement additional controls + +### Backup and Recovery + +#### Terraform State Backup + +```bash +# Automated state backup +backup_terraform_state() { + local state_file=$1 + local backup_location=$2 + local timestamp=$(date +%Y%m%d-%H%M%S) + + # Create backup + az storage blob copy start \ + --source-container tfstate \ + --source-blob "$state_file" \ + --destination-container tfstate-backup \ + --destination-blob "${state_file}-${timestamp}" \ + --account-name stterraformstate + + echo "State backup created: ${state_file}-${timestamp}" +} +``` + +#### Disaster Recovery + +```hcl +# Cross-region state replication +resource "azurerm_storage_account" "state_backup" { + name = "stterraformstatedr" + resource_group_name = azurerm_resource_group.dr.name + location = "West US 2" + account_tier = "Standard" + account_replication_type = "GRS" + + blob_properties { + versioning_enabled = true + + delete_retention_policy { + days = 30 + } + } +} +``` + +## Security Checklist + +### Pre-Deployment Security Review + +- [ ] **Authentication** + - [ ] Service principal permissions reviewed + - [ ] RBAC assignments follow least privilege + - [ ] Managed identity enabled where possible + +- [ ] **Container Security** + - [ ] Using latest rover image + - [ ] Vulnerability scan passed + - [ ] Running as non-root user + - [ ] Resource limits configured + +- [ ] **Network Security** + - [ ] Network isolation configured + - [ ] Firewall rules reviewed + - [ ] Private endpoints enabled + +- [ ] **State Management** + - [ ] Encryption at rest enabled + - [ ] Access controls configured + - [ ] Backup strategy implemented + +- [ ] **Secret Management** + - [ ] No hardcoded secrets + - [ ] Key Vault integration configured + - [ ] Secret rotation schedule defined + +### Runtime Security Monitoring + +- [ ] **Logging** + - [ ] Azure Activity Logs enabled + - [ ] Container audit logging configured + - [ ] Security alerts configured + +- [ ] **Monitoring** + - [ ] Failed authentication alerts + - [ ] Unusual resource access patterns + - [ ] Container runtime anomalies + +- [ ] **Compliance** + - [ ] Policy compliance checked + - [ ] Security scan results reviewed + - [ ] Audit reports generated + +### Post-Deployment Security Validation + +- [ ] **Access Validation** + - [ ] Authentication working correctly + - [ ] Permissions verified + - [ ] Unauthorized access blocked + +- [ ] **Configuration Review** + - [ ] Security settings applied + - [ ] Encryption verified + - [ ] Monitoring active + +- [ ] **Documentation** + - [ ] Security configuration documented + - [ ] Incident response procedures updated + - [ ] Security contacts verified + +--- + +For security issues or vulnerabilities, please email: security@aztfmod.com + +Next: [Troubleshooting Guide](TROUBLESHOOTING.md) | [Architecture Overview](ARCHITECTURE.md) \ No newline at end of file diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md new file mode 100644 index 000000000..32bb634d7 --- /dev/null +++ b/docs/TROUBLESHOOTING.md @@ -0,0 +1,715 @@ +# Troubleshooting Guide + +This guide helps you diagnose and resolve common issues when using the Azure Terraform SRE Rover. + +## ๐Ÿ“‹ Table of Contents + +- [Quick Diagnostics](#quick-diagnostics) +- [Authentication Issues](#authentication-issues) +- [Container Issues](#container-issues) +- [Terraform Issues](#terraform-issues) +- [Azure Connectivity Issues](#azure-connectivity-issues) +- [State Management Issues](#state-management-issues) +- [Performance Issues](#performance-issues) +- [CI/CD Issues](#ci-cd-issues) +- [Getting Help](#getting-help) + +## Quick Diagnostics + +### Environment Check Script + +Run this comprehensive diagnostic script to identify common issues: + +```bash +#!/bin/bash +# rover-diagnostics.sh + +echo "๐Ÿ” Rover Diagnostics Report" +echo "==========================" +echo "Timestamp: $(date)" +echo "Host: $(hostname)" +echo "User: $(whoami)" +echo "" + +# Check Docker +echo "๐Ÿ“ฆ Docker Status:" +if command -v docker &> /dev/null; then + echo "โœ… Docker installed: $(docker --version)" + if docker info &> /dev/null; then + echo "โœ… Docker daemon running" + echo " Images: $(docker images --format "table {{.Repository}}:{{.Tag}}" | grep rover | head -3)" + else + echo "โŒ Docker daemon not accessible" + echo " Try: sudo systemctl start docker" + fi +else + echo "โŒ Docker not installed" +fi +echo "" + +# Check Azure CLI +echo "โ˜๏ธ Azure CLI Status:" +if command -v az &> /dev/null; then + echo "โœ… Azure CLI installed: $(az --version | head -1)" + if az account show &> /dev/null; then + echo "โœ… Authenticated to Azure" + echo " Subscription: $(az account show --query name -o tsv)" + echo " Tenant: $(az account show --query tenantId -o tsv)" + else + echo "โŒ Not authenticated to Azure" + echo " Try: az login" + fi +else + echo "โŒ Azure CLI not available" +fi +echo "" + +# Check Rover +echo "๐Ÿš€ Rover Status:" +if docker images | grep -q rover; then + echo "โœ… Rover images available:" + docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}\t{{.CreatedAt}}" | grep rover + + # Test rover execution + if docker run --rm aztfmod/rover:latest rover --version &> /dev/null; then + echo "โœ… Rover executable working" + else + echo "โŒ Rover execution failed" + fi +else + echo "โŒ No rover images found" + echo " Try: docker pull aztfmod/rover:latest" +fi +echo "" + +# Check network connectivity +echo "๐ŸŒ Network Connectivity:" +endpoints=( + "management.azure.com" + "login.microsoftonline.com" + "registry.terraform.io" + "registry-1.docker.io" +) + +for endpoint in "${endpoints[@]}"; do + if curl -s --max-time 5 "https://$endpoint" &> /dev/null; then + echo "โœ… $endpoint reachable" + else + echo "โŒ $endpoint unreachable" + fi +done +echo "" + +# Check environment variables +echo "๐Ÿ”ง Environment Variables:" +env_vars=( + "ARM_SUBSCRIPTION_ID" + "ARM_TENANT_ID" + "ARM_CLIENT_ID" + "TF_VAR_environment" +) + +for var in "${env_vars[@]}"; do + if [ -n "${!var}" ]; then + echo "โœ… $var set (length: ${#!var})" + else + echo "โš ๏ธ $var not set" + fi +done + +echo "" +echo "๐Ÿ“‹ Summary:" +echo "Run this script regularly to catch issues early." +echo "For help: https://github.com/aztfmod/rover/blob/main/docs/TROUBLESHOOTING.md" +``` + +Save and run: `chmod +x rover-diagnostics.sh && ./rover-diagnostics.sh` + +## Authentication Issues + +### Issue: "Error: building AzureRM Client: obtain subscription() from Azure CLI" + +**Symptoms:** +``` +Error: building AzureRM Client: obtain subscription() from Azure CLI: parsing json result from the Azure CLI +``` + +**Solutions:** + +1. **Re-authenticate to Azure:** + ```bash + az logout + az login + az account set --subscription "your-subscription-id" + ``` + +2. **Check subscription access:** + ```bash + az account list --output table + az account show + ``` + +3. **For service principal authentication:** + ```bash + # Verify service principal credentials + az login --service-principal \ + --username $ARM_CLIENT_ID \ + --password $ARM_CLIENT_SECRET \ + --tenant $ARM_TENANT_ID + + # Test access + az account show + ``` + +### Issue: "Error: Insufficient privileges to complete the operation" + +**Cause:** Service principal lacks required permissions. + +**Solutions:** + +1. **Check current permissions:** + ```bash + az role assignment list --assignee $ARM_CLIENT_ID --output table + ``` + +2. **Grant required permissions:** + ```bash + # Minimum permissions for rover + az role assignment create \ + --assignee $ARM_CLIENT_ID \ + --role "Contributor" \ + --scope "/subscriptions/$ARM_SUBSCRIPTION_ID" + ``` + +3. **For specific resource groups:** + ```bash + az role assignment create \ + --assignee $ARM_CLIENT_ID \ + --role "Contributor" \ + --scope "/subscriptions/$ARM_SUBSCRIPTION_ID/resourceGroups/my-rg" + ``` + +### Issue: "Error: Token has expired" + +**Solutions:** + +1. **Refresh authentication:** + ```bash + az account get-access-token --query accessToken --output tsv + az login --use-device-code # If interactive login needed + ``` + +2. **For long-running processes:** + ```bash + # Use managed identity + export ARM_USE_MSI=true + export ARM_SUBSCRIPTION_ID="your-subscription-id" + ``` + +## Container Issues + +### Issue: "Docker: permission denied" + +**Symptoms:** +``` +docker: Got permission denied while trying to connect to the Docker daemon socket +``` + +**Solutions:** + +1. **Add user to docker group:** + ```bash + sudo usermod -aG docker $USER + newgrp docker # Or log out and back in + ``` + +2. **Start Docker daemon:** + ```bash + sudo systemctl start docker + sudo systemctl enable docker + ``` + +3. **For WSL2 users:** + ```bash + # Ensure Docker Desktop is running on Windows + # Check WSL integration in Docker Desktop settings + ``` + +### Issue: "Image pull failed" + +**Symptoms:** +``` +Error response from daemon: pull access denied for aztfmod/rover +``` + +**Solutions:** + +1. **Check network connectivity:** + ```bash + curl -I https://registry-1.docker.io/v2/ + ``` + +2. **Try alternative registry:** + ```bash + docker pull ghcr.io/aztfmod/rover:latest + ``` + +3. **Use corporate proxy:** + ```bash + # Configure Docker proxy + sudo mkdir -p /etc/systemd/system/docker.service.d + sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf << EOF + [Service] + Environment="HTTP_PROXY=http://proxy.company.com:8080" + Environment="HTTPS_PROXY=http://proxy.company.com:8080" + Environment="NO_PROXY=localhost,127.0.0.1" + EOF + + sudo systemctl daemon-reload + sudo systemctl restart docker + ``` + +### Issue: "Container runs but rover command not found" + +**Symptoms:** +``` +bash: rover: command not found +``` + +**Solutions:** + +1. **Check container entrypoint:** + ```bash + docker run --rm aztfmod/rover:latest which rover + docker run --rm aztfmod/rover:latest ls -la /usr/local/bin/ + ``` + +2. **Run with correct working directory:** + ```bash + docker run -it --rm -w /tf/rover aztfmod/rover:latest bash + ``` + +3. **Source rover environment:** + ```bash + # Inside container + source /usr/local/bin/rover + ``` + +## Terraform Issues + +### Issue: "Error: Failed to load plugin schemas" + +**Symptoms:** +``` +Error: Failed to load plugin schemas +Plugin reinitialization required. Please run "terraform init" +``` + +**Solutions:** + +1. **Run terraform init:** + ```bash + rover -lz /path/to/landingzone -a init + ``` + +2. **Clear terraform cache:** + ```bash + rm -rf .terraform/ + rm .terraform.lock.hcl + rover -lz /path/to/landingzone -a init + ``` + +3. **Update provider constraints:** + ```hcl + terraform { + required_providers { + azurerm = { + source = "hashicorp/azurerm" + version = "~> 3.0" + } + } + } + ``` + +### Issue: "Error: Backend configuration changed" + +**Symptoms:** +``` +Error: Backend configuration changed +A change in the backend configuration has been detected +``` + +**Solutions:** + +1. **Reconfigure backend:** + ```bash + rover -lz /path/to/landingzone -a init -reconfigure + ``` + +2. **Migrate state:** + ```bash + terraform init -migrate-state + ``` + +3. **Check backend configuration:** + ```bash + # Verify backend.tf settings + cat backend.tf + ``` + +### Issue: "Error: Error acquiring the state lock" + +**Symptoms:** +``` +Error: Error acquiring the state lock +Lock Info: + ID: abc123... + Operation: OperationTypePlan + Who: user@example.com +``` + +**Solutions:** + +1. **Force unlock (use with caution):** + ```bash + terraform force-unlock abc123 + ``` + +2. **Wait for lock to expire:** + ```bash + # Locks typically expire after 20 minutes + sleep 1200 + ``` + +3. **Check for running processes:** + ```bash + ps aux | grep terraform + docker ps | grep rover + ``` + +## Azure Connectivity Issues + +### Issue: "Error: Error retrieving available locations" + +**Symptoms:** +``` +Error: Error retrieving available locations for resource provider +``` + +**Solutions:** + +1. **Register resource providers:** + ```bash + az provider register --namespace Microsoft.Resources + az provider register --namespace Microsoft.Storage + az provider register --namespace Microsoft.Network + ``` + +2. **Check provider status:** + ```bash + az provider list --query "[?registrationState=='NotRegistered']" --output table + ``` + +3. **Verify subscription access:** + ```bash + az account list-locations --output table + ``` + +### Issue: "Error: insufficient quota" + +**Symptoms:** +``` +Error: creating Resource Group: resources.GroupsClient#CreateOrUpdate: +Failure responding to request: StatusCode=409 -- Original Error: +autorest/azure: Service returned an error. Status=409 Code="QuotaExceeded" +``` + +**Solutions:** + +1. **Check current usage:** + ```bash + az vm list-usage --location eastus --output table + ``` + +2. **Request quota increase:** + ```bash + az support tickets create \ + --ticket-name "Quota Increase Request" \ + --description "Need increased quota for rover deployments" \ + --severity minimal \ + --support-plan-type basic + ``` + +3. **Use different regions:** + ```bash + # Check quota in other regions + az vm list-usage --location westus2 --output table + ``` + +## State Management Issues + +### Issue: "Error: Failed to get existing workspaces" + +**Symptoms:** +``` +Error: Failed to get existing workspaces: storage: service returned error: +StatusCode=403, ErrorCode=Forbidden +``` + +**Solutions:** + +1. **Check storage account permissions:** + ```bash + az role assignment list \ + --scope "/subscriptions/$ARM_SUBSCRIPTION_ID/resourceGroups/rg-terraform-state/providers/Microsoft.Storage/storageAccounts/stterraformstate" \ + --assignee $ARM_CLIENT_ID + ``` + +2. **Grant required permissions:** + ```bash + az role assignment create \ + --assignee $ARM_CLIENT_ID \ + --role "Storage Blob Data Contributor" \ + --scope "/subscriptions/$ARM_SUBSCRIPTION_ID/resourceGroups/rg-terraform-state/providers/Microsoft.Storage/storageAccounts/stterraformstate" + ``` + +3. **Enable Azure AD authentication:** + ```bash + export ARM_USE_AZUREAD=true + export ARM_STORAGE_USE_AZUREAD=true + ``` + +### Issue: "Error: container does not exist" + +**Symptoms:** +``` +Error: container "tfstate" in storage account "stterraformstate" does not exist +``` + +**Solutions:** + +1. **Create storage container:** + ```bash + az storage container create \ + --name tfstate \ + --account-name stterraformstate \ + --auth-mode key + ``` + +2. **Verify container access:** + ```bash + az storage container show \ + --name tfstate \ + --account-name stterraformstate + ``` + +## Performance Issues + +### Issue: "Terraform operations are slow" + +**Symptoms:** +- Long initialization times +- Slow plan/apply operations +- Container performance issues + +**Solutions:** + +1. **Increase container resources:** + ```bash + docker run -it --rm \ + --memory=4g \ + --cpus=2.0 \ + -v $(pwd):/tf/caf \ + aztfmod/rover:latest + ``` + +2. **Use terraform provider caching:** + ```bash + export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache" + mkdir -p $TF_PLUGIN_CACHE_DIR + ``` + +3. **Optimize terraform configuration:** + ```hcl + # Parallel resource creation + terraform { + parallelism = 10 + } + ``` + +### Issue: "Large terraform state files" + +**Solutions:** + +1. **Split large configurations:** + ```bash + # Use separate state files for different layers + rover -lz level0 -a apply + rover -lz level1 -a apply + rover -lz level2 -a apply + ``` + +2. **Use terraform remote state:** + ```hcl + data "terraform_remote_state" "foundation" { + backend = "azurerm" + config = { + resource_group_name = "rg-terraform-state" + storage_account_name = "stterraformstate" + container_name = "tfstate" + key = "foundation.terraform.tfstate" + } + } + ``` + +## CI/CD Issues + +### Issue: "GitHub Actions runner out of disk space" + +**Symptoms:** +``` +Error: No space left on device +``` + +**Solutions:** + +1. **Clean up runner:** + ```bash + # In GitHub Actions workflow + - name: Free Disk Space + run: | + sudo rm -rf /opt/hostedtoolcache + sudo rm -rf /usr/share/dotnet + sudo rm -rf /opt/ghc + docker system prune -a -f + ``` + +2. **Use self-hosted runners:** + ```yaml + jobs: + deploy: + runs-on: self-hosted + container: aztfmod/rover:latest + ``` + +### Issue: "Azure DevOps agent timeout" + +**Solutions:** + +1. **Increase timeout:** + ```yaml + jobs: + - job: TerraformDeploy + timeoutInMinutes: 60 + steps: + - task: AzureCLI@2 + timeoutInMinutes: 45 + ``` + +2. **Split into multiple jobs:** + ```yaml + - job: TerraformPlan + - job: TerraformApply + dependsOn: TerraformPlan + ``` + +## Getting Help + +### Log Collection + +**Collect comprehensive logs for support:** + +```bash +#!/bin/bash +# collect-rover-logs.sh + +LOG_DIR="rover-logs-$(date +%Y%m%d-%H%M%S)" +mkdir -p "$LOG_DIR" + +echo "Collecting rover diagnostic information..." + +# System information +uname -a > "$LOG_DIR/system-info.txt" +docker --version >> "$LOG_DIR/system-info.txt" +docker info >> "$LOG_DIR/system-info.txt" 2>&1 + +# Rover information +docker run --rm aztfmod/rover:latest rover --version > "$LOG_DIR/rover-version.txt" 2>&1 + +# Azure information +az --version > "$LOG_DIR/azure-info.txt" 2>&1 +az account show >> "$LOG_DIR/azure-info.txt" 2>&1 + +# Container logs +docker logs $(docker ps -q --filter ancestor=aztfmod/rover) > "$LOG_DIR/container-logs.txt" 2>&1 + +# Terraform logs (if available) +if [ -f .terraform/terraform.tfstate ]; then + cp .terraform/terraform.tfstate "$LOG_DIR/" +fi + +# Environment variables (sanitized) +env | grep -E "^(ARM_|TF_|AZURE_)" | sed 's/=.*/=***/' > "$LOG_DIR/env-vars.txt" + +echo "Logs collected in: $LOG_DIR" +echo "Please provide this directory when seeking support." +``` + +### Community Resources + +- **๐Ÿ› Bug Reports**: [GitHub Issues](https://github.com/aztfmod/rover/issues) +- **๐Ÿ’ฌ Discussions**: [GitHub Discussions](https://github.com/aztfmod/rover/discussions) +- **๐Ÿ’ฌ Real-time Chat**: [Gitter Community](https://gitter.im/aztfmod/community) +- **๐Ÿ“ง Direct Support**: tf-landingzones@microsoft.com + +### Issue Reporting Template + +When reporting issues, please include: + +```markdown +## Issue Description +Brief description of the problem + +## Environment Information +- OS: [e.g., Ubuntu 20.04, Windows 11, macOS 12] +- Docker Version: [e.g., 20.10.17] +- Rover Version: [e.g., 1.3.0] +- Azure CLI Version: [e.g., 2.40.0] + +## Steps to Reproduce +1. Step one +2. Step two +3. Step three + +## Expected Behavior +What you expected to happen + +## Actual Behavior +What actually happened + +## Error Messages +``` +Copy and paste any error messages here +``` + +## Diagnostic Information +``` +Paste output from rover-diagnostics.sh here +``` + +## Additional Context +Any other relevant information +``` + +### Escalation Path + +1. **Community Support** (GitHub Issues/Discussions) +2. **Documentation** (Check troubleshooting guides) +3. **Direct Support** (tf-landingzones@microsoft.com) +4. **Microsoft Support** (For enterprise customers) + +--- + +**Remember**: Most issues can be resolved by checking authentication, permissions, and network connectivity. When in doubt, run the diagnostic script and check the logs! + +Next: [Performance Tuning](PERFORMANCE.md) | [FAQ](FAQ.md) \ No newline at end of file diff --git a/docs/USAGE.md b/docs/USAGE.md index ecafbbd89..124a1a525 100644 --- a/docs/USAGE.md +++ b/docs/USAGE.md @@ -1,38 +1,724 @@ -Rover CLI Commands & Flags +# Rover CLI Usage Guide + +This comprehensive guide covers all rover commands, options, and usage patterns for managing Azure Terraform deployments. + +## ๐Ÿ“‹ Table of Contents + +- [Quick Reference](#quick-reference) +- [Core Commands](#core-commands) +- [Command Line Options](#command-line-options) +- [Authentication Commands](#authentication-commands) +- [Landing Zone Management](#landing-zone-management) +- [Workspace Management](#workspace-management) +- [CI/CD Commands](#cicd-commands) +- [Configuration Management](#configuration-management) +- [Advanced Usage](#advanced-usage) +- [Examples](#examples) + +## Quick Reference + +### Basic Syntax + +```bash +rover [COMMAND] [OPTIONS] +``` + +### Most Common Commands + +```bash +# Login to Azure +rover login + +# Deploy a landing zone +rover -lz /path/to/landingzone -a apply -env production + +# Run Terraform plan +rover -lz /path/to/landingzone -a plan -env development + +# Destroy infrastructure +rover -lz /path/to/landingzone -a destroy -env development + +# Run CI/CD pipeline +rover ci -sc /path/to/symphony.yml +``` + +## Core Commands + +### Authentication Commands + +#### `login` +Start interactive Azure authentication process. + +```bash +# Interactive login +rover login + +# Login to specific tenant +rover login [tenant-id] + +# Login to specific subscription +rover login [tenant-id] [subscription-id] +``` + +**Options:** +- `tenant-id` (optional): Azure Active Directory tenant ID +- `subscription-id` (optional): Target Azure subscription ID + +**Examples:** +```bash +# Basic login +rover login + +# Login to specific tenant +rover login 12345678-1234-1234-1234-123456789012 + +# Login to specific tenant and subscription +rover login 12345678-1234-1234-1234-123456789012 87654321-4321-4321-4321-210987654321 +``` + +#### `logout` +Clear Azure authentication information. + +```bash +rover logout +``` + +### Landing Zone Commands + +#### Basic Terraform Operations + +```bash +# Terraform plan +rover -lz -a plan [options] + +# Terraform apply +rover -lz -a apply [options] + +# Terraform destroy +rover -lz -a destroy [options] + +# Terraform init +rover -lz -a init [options] + +# Terraform validate +rover -lz -a validate [options] +``` + +#### Landing Zone Discovery + +```bash +# List available landing zones +rover landingzone list + +# List landing zones with details +rover landingzone list --detailed +``` + +### Workspace Management + +#### `workspace list` +Display available Terraform workspaces. + +```bash +rover workspace list +``` + +#### `workspace create` +Create a new Terraform workspace. + +```bash +rover workspace create +``` + +#### `workspace delete` +Delete an existing Terraform workspace. + +```bash +rover workspace delete +``` + +### CI/CD Commands + +#### `ci` +Execute continuous integration workflow. + +```bash +# Run all CI tasks +rover ci -sc /path/to/symphony.yml + +# Run specific CI task +rover ci -ct -sc /path/to/symphony.yml + +# Run with base directory +rover ci -sc /path/to/symphony.yml -b /base/directory +``` + +#### `cd` +Execute continuous deployment workflow. + +```bash +rover cd -sc /path/to/symphony.yml [options] +``` + +## Command Line Options + +### Core Options + +| Option | Short | Long | Description | Example | +|--------|-------|------|-------------|---------| +| Landing Zone | `-lz` | `--landingzone` | Path to landing zone | `-lz ./landingzones/launchpad` | +| Action | `-a` | `--action` | Terraform action | `-a apply` | +| Environment | `-env` | `--environment` | Environment name | `-env production` | +| Level | `-l` | `--level` | Landing zone level | `-l level0` | +| Debug | `-d` | `--debug` | Enable debug logging | `-d` | + +### State Management Options + +| Option | Description | Example | +|--------|-------------|---------| +| `--tfstate` | State file name | `--tfstate myapp.tfstate` | +| `--launchpad` | Mark as launchpad deployment | `--launchpad` | +| `--var-folder` | Variables folder path | `--var-folder ./configs` | + +### Azure Cloud Options + +| Option | Short | Long | Description | Values | +|--------|-------|------|-------------|---------| +| Cloud | `-c` | `--cloud` | Azure cloud environment | `AzurePublic`, `AzureUSGovernment`, `AzureChinaCloud`, `AzureGermanCloud` | + +### Logging Options + +| Option | Description | Values | +|--------|-------------|---------| +| `--log-severity` | Log verbosity level | `FATAL`, `ERROR`, `WARN`, `INFO`, `DEBUG`, `VERBOSE` | + +### CI/CD Options + +| Option | Short | Long | Description | +|--------|-------|------|-------------| +| Symphony Config | `-sc` | `--symphony-config` | Path to symphony.yml | +| CI Task Name | `-ct` | `--ci-task-name` | Specific CI task to run | +| Base Directory | `-b` | `--base-dir` | Base directory for symphony.yml paths | + +## Authentication Commands + +### Interactive Authentication + +```bash +# Standard login flow +rover login + +# Login with specific tenant +rover login 12345678-1234-1234-1234-123456789012 + +# Login with device code (for CI/CD) +az login --use-device-code +``` + +### Service Principal Authentication + +```bash +# Set environment variables +export ARM_CLIENT_ID="your-client-id" +export ARM_CLIENT_SECRET="your-client-secret" +export ARM_SUBSCRIPTION_ID="your-subscription-id" +export ARM_TENANT_ID="your-tenant-id" + +# Rover will automatically use service principal +rover -lz ./landingzone -a apply +``` + +### Managed Identity Authentication + +```bash +# Enable managed identity +export ARM_USE_MSI=true +export ARM_SUBSCRIPTION_ID="your-subscription-id" + +# Use with rover +rover -lz ./landingzone -a apply +``` + +## Landing Zone Management + +### Landing Zone Structure + +``` +landingzones/ +โ”œโ”€โ”€ level0/ +โ”‚ โ”œโ”€โ”€ launchpad/ +โ”‚ โ”œโ”€โ”€ networking/ +โ”‚ โ””โ”€โ”€ identity/ +โ”œโ”€โ”€ level1/ +โ”‚ โ”œโ”€โ”€ shared-services/ +โ”‚ โ””โ”€โ”€ connectivity/ +โ”œโ”€โ”€ level2/ +โ”‚ โ”œโ”€โ”€ corp-apps/ +โ”‚ โ””โ”€โ”€ online-apps/ +โ””โ”€โ”€ level3/ + โ”œโ”€โ”€ data-platform/ + โ””โ”€โ”€ analytics/ +``` + +### Deployment Patterns + +#### 1. Foundation Deployment (Level 0) + +```bash +# Deploy launchpad (state management) +rover -lz ./landingzones/level0/launchpad \ + -a apply \ + -env production \ + -level level0 \ + --launchpad + +# Deploy platform networking +rover -lz ./landingzones/level0/networking \ + -a apply \ + -env production \ + -level level0 +``` + +#### 2. Shared Services (Level 1) + +```bash +# Deploy shared services +rover -lz ./landingzones/level1/shared-services \ + -a apply \ + -env production \ + -level level1 +``` + +#### 3. Application Landing Zones (Level 2) + +```bash +# Deploy corporate connected applications +rover -lz ./landingzones/level2/corp-apps \ + -a apply \ + -env production \ + -level level2 +``` + +#### 4. Application Resources (Level 3) + +```bash +# Deploy application-specific resources +rover -lz ./landingzones/level3/web-app \ + -a apply \ + -env production \ + -level level3 +``` + +### Variable Management + +#### Using Variable Folders + +```bash +# Structure +configs/ +โ”œโ”€โ”€ development/ +โ”‚ โ”œโ”€โ”€ level0/ +โ”‚ โ”‚ โ””โ”€โ”€ launchpad.tfvars +โ”‚ โ””โ”€โ”€ level1/ +โ”œโ”€โ”€ production/ +โ”‚ โ”œโ”€โ”€ level0/ +โ”‚ โ””โ”€โ”€ level1/ + +# Usage +rover -lz ./landingzones/level0/launchpad \ + -a apply \ + -env production \ + -level level0 \ + --var-folder ./configs/production/level0 +``` + +#### Environment Variables + +```bash +# Set Terraform variables +export TF_VAR_environment="production" +export TF_VAR_location="East US" +export TF_VAR_project_name="myproject" + +# Use with rover +rover -lz ./landingzone -a apply +``` + +## Workspace Management + +### Workspace Operations + +```bash +# List workspaces +rover workspace list + +# Create new workspace +rover workspace create feature-branch-123 + +# Select workspace (via environment variable) +export TF_VAR_workspace="feature-branch-123" +rover -lz ./landingzone -a plan + +# Delete workspace +rover workspace delete feature-branch-123 +``` + +### Workspace Patterns + +#### Environment-based Workspaces + +```bash +# Development workspace +export TF_VAR_workspace="development" +rover -lz ./landingzone -a apply -env development + +# Production workspace +export TF_VAR_workspace="production" +rover -lz ./landingzone -a apply -env production +``` + +#### Feature Branch Workspaces + +```bash +# Create workspace for feature +BRANCH_NAME=$(git branch --show-current) +rover workspace create "feature-${BRANCH_NAME}" + +# Deploy to feature workspace +export TF_VAR_workspace="feature-${BRANCH_NAME}" +rover -lz ./landingzone -a apply -env development +``` + +## CI/CD Commands + +### Symphony Configuration + +#### Basic Symphony.yml + +```yaml +# symphony.yml +version: "1.0" +environments: + development: + base_folder: "/tf/caf" + terraform_folder: "landingzones" + +tasks: + - name: "validate" + command: "terraform validate" + + - name: "tflint" + command: "tflint --config .tflint.hcl" + + - name: "terrascan" + command: "terrascan scan -t terraform" +``` + +#### Running CI Tasks + +```bash +# Run all CI tasks +rover ci -sc ./symphony.yml -b /tf/caf + +# Run specific task +rover ci -ct tflint -sc ./symphony.yml -b /tf/caf + +# Run with debug output +rover ci -sc ./symphony.yml -b /tf/caf -d +``` + +### Advanced CI/CD Patterns + +#### Multi-Environment Pipeline + +```bash +#!/bin/bash +# deploy-pipeline.sh + +environments=("development" "staging" "production") + +for env in "${environments[@]}"; do + echo "Deploying to $env..." + + # Run CI checks + rover ci -sc ./symphony.yml -b /tf/caf + + # Deploy if CI passes + if [ $? -eq 0 ]; then + rover -lz ./landingzones/level0/launchpad \ + -a apply \ + -env $env \ + -level level0 \ + --var-folder ./configs/$env + fi +done +``` + +## Configuration Management + +### Environment Configuration + +#### Directory Structure + +``` +configs/ +โ”œโ”€โ”€ global/ +โ”‚ โ”œโ”€โ”€ tags.tfvars +โ”‚ โ””โ”€โ”€ naming.tfvars +โ”œโ”€โ”€ development/ +โ”‚ โ”œโ”€โ”€ main.tfvars +โ”‚ โ””โ”€โ”€ networking.tfvars +โ”œโ”€โ”€ staging/ +โ”‚ โ”œโ”€โ”€ main.tfvars +โ”‚ โ””โ”€โ”€ networking.tfvars +โ””โ”€โ”€ production/ + โ”œโ”€โ”€ main.tfvars + โ””โ”€โ”€ networking.tfvars +``` + +#### Usage with Rover + +```bash +# Use environment-specific configuration +rover -lz ./landingzones/networking \ + -a apply \ + -env production \ + --var-folder ./configs/production + +# Combine global and environment configs +rover -lz ./landingzones/networking \ + -a apply \ + -env production \ + --var-folder ./configs/global \ + --var-folder ./configs/production +``` + +### State Configuration + +#### Backend Configuration + +```hcl +# backend.tf +terraform { + backend "azurerm" { + resource_group_name = "rg-terraform-state" + storage_account_name = "stterraformstate" + container_name = "tfstate" + key = "level0/launchpad.terraform.tfstate" + } +} +``` + +#### Custom State Names + +```bash +# Use custom state file name +rover -lz ./landingzone \ + -a apply \ + --tfstate "custom-app.tfstate" +``` + +## Advanced Usage + +### Debugging and Troubleshooting + +```bash +# Enable debug logging +rover -lz ./landingzone -a plan -d + +# Set log severity +rover -lz ./landingzone -a plan --log-severity DEBUG + +# Terraform debugging +export TF_LOG=DEBUG +rover -lz ./landingzone -a plan +``` + +### Performance Optimization + +```bash +# Increase parallelism +export TF_CLI_ARGS_plan="-parallelism=10" +export TF_CLI_ARGS_apply="-parallelism=10" + +# Use provider cache +export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache" +mkdir -p $TF_PLUGIN_CACHE_DIR +``` + +### Custom Hooks + +#### Pre/Post Deployment Hooks + +```bash +# pre-deploy.sh +#!/bin/bash +echo "Running pre-deployment checks..." +az account show +terraform version + +# Use with rover +./pre-deploy.sh && \ +rover -lz ./landingzone -a apply && \ +./post-deploy.sh +``` + +## Examples + +### Basic Examples + +#### 1. Simple Landing Zone Deployment + +```bash +# Clone landing zones +git clone https://github.com/Azure/caf-terraform-landingzones.git /tf/caf + +# Deploy launchpad +rover -lz /tf/caf/landingzones/caf_launchpad \ + -a apply \ + -env demo \ + --launchpad +``` + +#### 2. Multi-Level Deployment + +```bash +# Level 0: Foundation +rover -lz /tf/caf/landingzones/caf_foundations \ + -a apply \ + -env production \ + -level level0 + +# Level 1: Shared Services +rover -lz /tf/caf/landingzones/caf_shared_services \ + -a apply \ + -env production \ + -level level1 + +# Level 2: Application Landing Zone +rover -lz /tf/caf/landingzones/caf_web_application \ + -a apply \ + -env production \ + -level level2 +``` + +### CI/CD Integration Examples + +#### GitHub Actions + +```yaml +name: Deploy Infrastructure +on: + push: + branches: [main] + +jobs: + deploy: + runs-on: ubuntu-latest + container: aztfmod/rover:latest + + steps: + - uses: actions/checkout@v3 + + - name: Azure Login + run: | + az login --service-principal \ + --username ${{ secrets.AZURE_CLIENT_ID }} \ + --password ${{ secrets.AZURE_CLIENT_SECRET }} \ + --tenant ${{ secrets.AZURE_TENANT_ID }} + + - name: Run CI + run: | + rover ci -sc ./symphony.yml -b ./ + + - name: Deploy Landing Zone + run: | + rover -lz ./landingzones/launchpad \ + -a apply \ + -env production \ + --launchpad +``` + +#### Azure DevOps + +```yaml +trigger: + branches: + include: + - main + +pool: + vmImage: 'ubuntu-latest' + +container: aztfmod/rover:latest + +steps: +- task: AzureCLI@2 + displayName: 'Deploy with Rover' + inputs: + azureSubscription: 'Production Service Connection' + scriptType: 'bash' + scriptLocation: 'inlineScript' + inlineScript: | + rover ci -sc ./symphony.yml -b ./ + rover -lz ./landingzones/launchpad -a apply -env production --launchpad +``` + +### Complex Deployment Scenarios + +#### Blue-Green Deployment + +```bash +#!/bin/bash +# blue-green-deploy.sh + +ENVIRONMENT=$1 +COLOR=$2 # blue or green + +# Deploy to staging slot +rover -lz ./landingzones/web-app \ + -a apply \ + -env ${ENVIRONMENT} \ + --tfstate "webapp-${COLOR}.tfstate" \ + --var-folder ./configs/${ENVIRONMENT} + +# Run health checks +./health-check.sh ${COLOR} + +# Switch traffic if healthy +if [ $? -eq 0 ]; then + echo "Switching traffic to ${COLOR}" + # Update DNS or load balancer configuration +fi +``` + +#### Multi-Region Deployment + +```bash +#!/bin/bash +# multi-region-deploy.sh + +REGIONS=("eastus" "westus2" "northeurope") +ENVIRONMENT=$1 + +for region in "${REGIONS[@]}"; do + echo "Deploying to $region..." + + export TF_VAR_location=$region + export TF_VAR_region_suffix=${region//-/} + + rover -lz ./landingzones/regional-app \ + -a apply \ + -env $ENVIRONMENT \ + --tfstate "app-${region}.tfstate" \ + --var-folder ./configs/$ENVIRONMENT +done +``` --- -## SYNOPSIS - -```shell -Usage: rover - commands: - login Start the interactive login process to get access to your azure subscription. Performs an az login. - logout Clear out login information related to the azure subscription. Performs an az logout. - ci Invoke the continuous integration workflow. - cd Invoke the continuous deployment workflow. - landingzone Commands for managing landing zones. - list Lists out all landing zones ( rover landingzone list) - workspace Commands for managing workspaces. - list List workspaces - create Create a new workspace - delete Delete a workspace - - switches: - -sc | --symphony-config (ci workflow) Path to a symphony.yml file. - -ct | --ci-task-name (ci workflow) CI Tool to invoke. If omitted all tools are run, if provided only that tool is run. - -b | --base-dir (ci workflow) Base directory for paths in symphony.yml. - -d | --debug Show debug (verbose) logs - | --log-severity This is the desired log degree. It can be set to FATAL,ERROR, WARN, INFO, DEBUG or VERBOSE - -lz | --landingzone Path to a landing zone - -a | --action Terraform action to perform (plan, apply , destroy) - -c | --cloud Name of the Azure Cloud to use (AzurePublic, AzureUSGovernment, AzureChinaCloud, AzureGermanCloud) or specific AzureStack name. - -env | --environment Name of the Environment to deploy. - -l | -level Level to associate landing zone to (0,1,2,3,4) - -tfstate Name of the state file to use. - -launchpad Flag that indicates that the current deployment is a launchpad. - -var-folder Path to the folder containing configurations for the lz. - -``` - -See [Continuous Integration document](CONTINOUS_INTEGRATION.md) for examples on running CI. +For detailed CI/CD configuration examples, see [Continuous Integration Guide](CONTINOUS_INTEGRATION.md). + +For troubleshooting common issues, see [Troubleshooting Guide](TROUBLESHOOTING.md). + +Next: [Architecture Overview](ARCHITECTURE.md) | [Security Guide](SECURITY.md) diff --git a/scripts/functions.sh b/scripts/functions.sh index eb5c1641e..6b276ed11 100644 --- a/scripts/functions.sh +++ b/scripts/functions.sh @@ -1,65 +1,132 @@ +# +# Rover Core Functions Library +# +# Description: +# This file contains core utility functions used throughout the rover system. +# It provides error handling, retry logic, file operations, Azure integration, +# and various helper functions for Terraform workflows. +# +# Functions included: +# - Error handling and logging +# - Retry logic with backoff +# - File and directory operations +# - Azure authentication helpers +# - Terraform state management +# - Version management +# - CI/CD integration helpers +# +# Usage: +# This file is sourced by rover.sh and should not be executed directly. +# Functions are called throughout the rover workflow. +# +# Dependencies: +# - Azure CLI (az) +# - Terraform +# - jq (for JSON processing) +# - Various Linux utilities (grep, sed, awk, etc.) +# + +# Source Terraform Cloud integration scripts for script in ${script_path}/tfcloud/*.sh; do source "$script" done +# +# Function: error +# Description: Central error handling function that provides consistent error reporting, +# logging, and cleanup before exiting the script. +# Parameters: +# $1 - Line number where error occurred +# $2 - Error message (optional) +# $3 - Exit code (optional, defaults to 1) +# Returns: Exits script with specified code +# Example: error ${LINENO} "Failed to initialize Terraform" 2 +# error() { + # If logging to file is enabled, create JUnit report and show log file location if [ "$LOG_TO_FILE" == "true" ];then local logFile=$CURRENT_LOG_FILE create_junit_report echo >&2 -e "\e[41mError: see log file $logFile\e[0m" fi - local parent_lineno="$1" - local message="$2" - local code="${3:-1}" + # Parse parameters with defaults + local parent_lineno="$1" # Line number where error occurred + local message="$2" # Error message + local code="${3:-1}" # Exit code (default: 1) local line_message="" - local source="${3:-${BASH_SOURCE[1]}}" + local source="${3:-${BASH_SOURCE[1]}}" # Source file where error occurred + + # Build line number context if available if [ "$parent_lineno" != "" ]; then line_message="on or near line ${parent_lineno}" fi + # Construct error message with color coding (red background) if [[ -n "$message" ]]; then error_message="\e[41mError $source $line_message: ${message}; exiting with status ${code}\e[0m" else error_message="\e[41mError $source $line_message; exiting with status ${code}\e[0m" fi + + # Output error message to stderr echo >&2 -e ${error_message} echo "" + # If using remote backend (Terraform Cloud), cancel any running operations if [[ "${backend_type_hybrid}" == "remote" ]]; then tfcloud_runs_cancel ${error_message} fi + + # Perform cleanup before exit clean_up_variables + # Exit with specified code exit ${code} } - # -# Execute a command and re-execute it with a backoff retry logic. This is mainly to handle throttling situations in CI +# Function: execute_with_backoff +# Description: Execute a command with exponential backoff retry logic. +# Useful for handling throttling, network issues, or transient failures. +# Parameters: +# $@ - Command and arguments to execute +# Environment Variables: +# ATTEMPTS - Maximum retry attempts (default: 5) +# TIMEOUT - Timeout between retries in seconds (default: 20) +# Returns: Exit code of the last command execution +# Example: execute_with_backoff az account show # function execute_with_backoff { - local max_attempts=${ATTEMPTS-5} - local timeout=${TIMEOUT-20} - local attempt=0 - local exitCode=0 + # Configuration with defaults + local max_attempts=${ATTEMPTS-5} # Maximum number of retry attempts + local timeout=${TIMEOUT-20} # Initial timeout in seconds + local attempt=0 # Current attempt counter + local exitCode=0 # Exit code of command execution + # Retry loop with exponential backoff while [[ $attempt < $max_attempts ]]; do + # Temporarily disable exit on error for command execution set +e - "$@" - exitCode=$? - set -e + "$@" # Execute the command with all its arguments + exitCode=$? # Capture exit code + set -e # Re-enable exit on error + # If command succeeded, break out of retry loop if [[ $exitCode == 0 ]]; then break fi + # Log retry attempt with current timeout echo "Failure! Return code ${exitCode} - Retrying in $timeout.." 1>&2 - sleep $timeout + sleep $timeout # Wait before retry + + # Increment attempt counter and double timeout for exponential backoff attempt=$((attempt + 1)) timeout=$((timeout * 2)) done + # If all retries failed, log final failure message if [[ $exitCode != 0 ]]; then echo "Hit the max retry count ($@)" 1>&2 fi @@ -67,7 +134,17 @@ function execute_with_backoff { return $exitCode } +# +# Function: parameter_value +# Description: Validates and returns a parameter value, ensuring it's not a flag (starts with -) +# Parameters: +# $1 - Parameter name (for error messages) +# $2 - Parameter value to validate +# Returns: The validated parameter value +# Example: value=$(parameter_value "environment" "$2") +# function parameter_value { + # Check if the value starts with a dash (indicating it's likely a flag, not a value) if [[ ${2} = -* ]]; then error ${LINENO} "Value not set for paramater ${1}" 1 fi @@ -75,19 +152,34 @@ function parameter_value { echo ${2} } +# +# Function: process_actions +# Description: Main action dispatcher that handles different rover commands and operations. +# Routes commands to appropriate handlers based on the caf_command variable. +# Parameters: None (uses global variables) +# Global Variables Used: +# - caf_command: The main command being executed +# - tf_command: Terraform-specific command parameters +# Returns: Exits with appropriate code after command execution +# Example: Called automatically by rover.sh based on parsed command line +# function process_actions { echo "@calling process_actions" + # Route to appropriate handler based on the main command case "${caf_command}" in bootstrap) + # Initialize rover environment and dependencies bootstrap exit 0 ;; ignite) + # Quick start workflow for new users ignite ${tf_command} exit 0 ;; init) + # Initialize Terraform working directory init exit 0 ;; diff --git a/scripts/rover.sh b/scripts/rover.sh index 638945afd..17ff0dec6 100755 --- a/scripts/rover.sh +++ b/scripts/rover.sh @@ -1,84 +1,176 @@ #!/bin/bash -# Initialize the launchpad first with rover -# deploy a landingzone with -# rover -lz [landingzone_folder_name] -a [plan | apply | destroy] [parameters] - - +# +# Azure Terraform SRE - Rover +# Main entry point and orchestration script for Terraform deployments on Azure +# +# Description: +# Rover is a comprehensive toolset for managing enterprise Terraform deployments +# on Microsoft Azure. It provides state management, CI/CD integration, and +# enterprise-grade features for Azure infrastructure deployments. +# +# Usage: +# rover [COMMAND] [OPTIONS] +# +# Common commands: +# rover login # Authenticate to Azure +# rover -lz -a plan -env # Run Terraform plan +# rover -lz -a apply -env # Deploy infrastructure +# rover -lz -a destroy -env # Destroy infrastructure +# rover ci -sc # Run CI pipeline +# +# Examples: +# # Deploy a launchpad (foundation) +# rover -lz ./landingzones/launchpad -a apply -env production --launchpad +# +# # Deploy application landing zone +# rover -lz ./landingzones/app -a apply -env production -level level2 +# +# # Run continuous integration +# rover ci -sc ./configs/symphony.yml -b ./ +# +# Author: Azure CAF Team +# Repository: https://github.com/aztfmod/rover +# Documentation: https://github.com/aztfmod/rover/docs +# + +# Get the directory path of this script for sourcing other components export script_path=$(dirname "$BASH_SOURCE") -source ${script_path}/clone.sh +# Source core functionality modules +# Order matters - dependencies must be loaded first + +# Core utility functions and helpers source ${script_path}/functions.sh + +# Get rover version for telemetry and compatibility checks export TF_VAR_rover_version=$(get_rover_version) -source ${script_path}/banner.sh -source ${script_path}/lib/bootstrap.sh -source ${script_path}/lib/init.sh -source ${script_path}/lib/logger.sh -source ${script_path}/lib/parse_parameters.sh -source ${script_path}/parse_command.sh -source ${script_path}/remote.sh -source ${script_path}/tfstate.sh -source ${script_path}/walkthrough.sh - - -# symphony -source ${script_path}/ci.sh -source ${script_path}/cd.sh -source ${script_path}/symphony_yaml.sh -source ${script_path}/test_runner.sh - -export ROVER_RUNNER=${ROVER_RUNNER:=false} - -export TF_VAR_workspace=${TF_VAR_workspace:="tfstate"} -export TF_VAR_environment=${TF_VAR_environment:="sandpit"} -export TF_VAR_level=${TF_VAR_level:="level0"} -export TF_CACHE_FOLDER=${TF_DATA_DIR:=$(echo ~)} -export ARM_SNAPSHOT=${ARM_SNAPSHOT:="true"} -export ARM_USE_AZUREAD=${ARM_USE_AZUREAD:="true"} -export ARM_STORAGE_USE_AZUREAD=${ARM_STORAGE_USE_AZUREAD:="true"} -export ARM_USE_MSAL=${ARM_USE_MSAL:="false"} -export skip_permission_check=${skip_permission_check:=false} -export symphony_run_all_tasks=true -export debug_mode=${debug_mode:="false"} -export devops=${devops:="false"} -export log_folder_path=${log_folderpath:=~/.terraform.logs} -export TF_IN_AUTOMATION="true" #Overriden in logger if log-severity is passed in. -export TF_VAR_tf_cloud_organization=${TF_CLOUD_ORGANIZATION} -export TF_VAR_tf_cloud_hostname=${TF_CLOUD_HOSTNAME:="app.terraform.io"} + +# Load foundational components +source ${script_path}/clone.sh # Repository cloning functionality +source ${script_path}/banner.sh # CLI banner and branding +source ${script_path}/lib/bootstrap.sh # Environment initialization +source ${script_path}/lib/init.sh # Terraform initialization helpers +source ${script_path}/lib/logger.sh # Logging and output management +source ${script_path}/lib/parse_parameters.sh # Parameter validation and parsing +source ${script_path}/parse_command.sh # CLI command parsing +source ${script_path}/remote.sh # Remote backend configuration +source ${script_path}/tfstate.sh # Terraform state management +source ${script_path}/walkthrough.sh # Interactive setup workflows + +# CI/CD and automation components +source ${script_path}/ci.sh # Continuous integration workflows +# CI/CD and automation components +source ${script_path}/ci.sh # Continuous integration workflows +source ${script_path}/cd.sh # Continuous deployment workflows +source ${script_path}/symphony_yaml.sh # Symphony configuration parsing +source ${script_path}/test_runner.sh # Test execution framework + +# Load Terraform Cloud integration components +for script in ${script_path}/tfcloud/*.sh; do + source "$script" +done + +# +# Default Environment Configuration +# These variables can be overridden via environment variables or command line parameters +# + +# Rover runtime configuration +export ROVER_RUNNER=${ROVER_RUNNER:=false} # Flag to indicate if running in agent mode + +# Terraform workspace and environment defaults +export TF_VAR_workspace=${TF_VAR_workspace:="tfstate"} # Default Terraform workspace name +export TF_VAR_environment=${TF_VAR_environment:="sandpit"} # Default environment (dev/staging/prod) +export TF_VAR_level=${TF_VAR_level:="level0"} # Landing zone level (0-4) + +# File system and caching configuration +export TF_CACHE_FOLDER=${TF_DATA_DIR:=$(echo ~)} # Terraform cache directory +export log_folder_path=${log_folderpath:=~/.terraform.logs} # Log file location + +# Azure Resource Manager configuration +export ARM_SNAPSHOT=${ARM_SNAPSHOT:="true"} # Enable ARM snapshots for state protection +export ARM_USE_AZUREAD=${ARM_USE_AZUREAD:="true"} # Use Azure AD authentication +export ARM_STORAGE_USE_AZUREAD=${ARM_STORAGE_USE_AZUREAD:="true"} # Use Azure AD for storage +export ARM_USE_MSAL=${ARM_USE_MSAL:="false"} # Use Microsoft Authentication Library + +# Security and validation settings +export skip_permission_check=${skip_permission_check:=false} # Skip Azure permission validation + +# CI/CD pipeline configuration +export symphony_run_all_tasks=true # Run all symphony tasks by default +export debug_mode=${debug_mode:="false"} # Debug logging mode +export devops=${devops:="false"} # DevOps integration mode +export TF_IN_AUTOMATION="true" # Indicate automated execution (overridden by logger) + +# Terraform Cloud/Enterprise configuration +export TF_VAR_tf_cloud_organization=${TF_CLOUD_ORGANIZATION} # TFC organization name +export TF_VAR_tf_cloud_hostname=${TF_CLOUD_HOSTNAME:="app.terraform.io"} # TFC hostname export REMOTE_credential_path_json=${REMOTE_credential_path_json:="$(echo ~)/.terraform.d/credentials.tfrc.json"} -export gitops_terraform_backend_type=${TF_VAR_backend_type:="azurerm"} -export gitops_agent_pool_name=${GITOPS_AGENT_POOL_NAME} -export gitops_number_runners=0 # 0 - auto-scale , or set the number of minimum runners -export backend_type_hybrid=${BACKEND_type_hybrid:=true} -export gitops_agent_pool_execution_mode=${GITOPS_AGENT_POOL_EXECUTION_MODE:="local"} -export TF_VAR_tenant_id=${ARM_TENANT_ID:=} -export TF_VAR_user_type=${TF_VAR_user_type:=ServicePrincipal} # assume Service Principal +# GitOps and agent configuration +export gitops_terraform_backend_type=${TF_VAR_backend_type:="azurerm"} # Backend type for GitOps +export gitops_agent_pool_name=${GITOPS_AGENT_POOL_NAME} # Agent pool name for GitOps +export gitops_number_runners=0 # Number of runners (0 = auto-scale) +export backend_type_hybrid=${BACKEND_type_hybrid:=true} # Enable hybrid backend support +export gitops_agent_pool_execution_mode=${GITOPS_AGENT_POOL_EXECUTION_MODE:="local"} # Execution mode + +# Azure authentication context +export TF_VAR_tenant_id=${ARM_TENANT_ID:=} # Azure tenant ID +export TF_VAR_user_type=${TF_VAR_user_type:=ServicePrincipal} # Authentication type (assume SP) + +# Clear parameter collection variable unset PARAMS +# Store current working directory for context current_path=$(pwd) +# +# Initialize Rover Environment +# + +# Create Terraform plugin cache directory mkdir -p ${TF_PLUGIN_CACHE_DIR} + +# Initialize logging subsystem __log_init__ -set_log_severity ERROR # Default Log Severity. This can be overriden via -log-severity or -d (shortcut for -log-severity DEBUG) -# Parse command line parameters +# Set default log severity (can be overridden by --log-severity or -d flags) +set_log_severity ERROR + +# +# Main Execution Flow +# + +# Parse and validate command line parameters parse_parameters "$@" +# Checkout and validate rover modules checkout_module verify_rover_version +# Set error handling and trapping +# -E: inherit ERR trap in functions, command substitutions, and subshells +# -T: inherit DEBUG and RETURN traps +# -e: exit immediately on command failure set -ETe trap 'error ${LINENO}' ERR 1 2 3 6 +# Process the terraform command string from parsed parameters tf_command=$(echo $PARAMS | sed -e 's/^[ \t]*//') +# +# Setup Working Directory Based on Command Type +# if [ "${caf_command}" == "landingzone" ]; then + # For landing zone deployments, create environment-specific directory TF_DATA_DIR=$(setup_rover_job "${TF_CACHE_FOLDER}/${TF_VAR_environment}") elif [ "${caf_command}" == "launchpad" ]; then + # For launchpad deployments, use environment subdirectory TF_DATA_DIR+="/${TF_VAR_environment}" fi +# Verify Azure authentication is valid before proceeding verify_azure_session # Check command and parameters