edcloud

Single-instance AWS EC2 personal cloud lab for x86_64 Linux workloads.

Overview

A practical reliability/cost/security lab — not a production platform.

  • Scope: repeatable single-host lifecycle (provision, reprovision, verify), Tailscale-only access, snapshot/restore drill workflows.
  • Constraints: single operator, single instance, low monthly spend, no public inbound network exposure.
  • Tradeoffs: operational simplicity and clear guardrails over multi-node scale and deep automation complexity.
  • Current focus: continued thin-CLI extraction, restore-drill discipline, documentation clarity.

edc verify
edc restore-drill --attach-managed-instance
edc snapshot-cost --soft-cap-usd 2.0

Safe rebuild sequence:

edc tailscale reconcile --dry-run
edc snapshot -d pre-change
edc reprovision --confirm-instance-id <instance-id>
edc verify

Operator model

edcloud is designed to be operated from a lightweight terminal device (including small ARM/Linux nodes) using AWS + CLI tooling, not primarily from the AWS Console.

  • Primary path: edc + AWS CLI + Python + Tailscale from an operator device.
  • AWS Console path: inspection and break-glass/manual fallback.
  • Recurring lifecycle workflows (provision, verify, snapshot, reprovision, cleanup) should stay CLI-driven for repeatability and safety guardrails.

Repository workflow

  • main is protected and should be updated via pull request only.
  • Merge policy is squash-only (merge commits and rebase merges disabled).
  • Linear history is required on main.
  • Force-push and branch deletion are disabled on main.
  • For agent/operator changes: work on task branches (agent/<topic>-YYYYMMDD), clean history before push, then merge via PR.
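
The task-branch naming convention can be checked mechanically before a push. A minimal sketch, assuming topics consist of lowercase letters, digits, and hyphens (the README does not say which characters are allowed); `task_branch_name` and `is_task_branch` are hypothetical helper names, not part of edc:

```python
import re
from datetime import date

# Assumption: topics are lowercase letters, digits, and hyphens.
_BRANCH_RE = re.compile(r"^agent/[a-z0-9]+(?:-[a-z0-9]+)*-\d{8}$")


def task_branch_name(topic: str, day: date) -> str:
    """Build a branch name in the agent/<topic>-YYYYMMDD form."""
    return f"agent/{topic}-{day:%Y%m%d}"


def is_task_branch(name: str) -> bool:
    """Return True if name has the agent/<topic>-YYYYMMDD shape."""
    return _BRANCH_RE.fullmatch(name) is not None
```

The pattern only checks the shape of the name; it does not verify that the eight digits form a real calendar date.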

Public collaboration expectations

Core design:

  • Tailscale-only access (zero inbound rules)
  • Tag-based resource discovery (no state files)
  • Persistent home on state volume
  • Persistent Tailscale node identity on state volume
  • Portainer for container management
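
Tag-based discovery means every lookup is a filtered query against AWS rather than a read of a local state file. A minimal sketch of what such a filter could look like, using the `edcloud:managed=true` tag named in the Architecture section; the helper names are hypothetical and this is not necessarily how edc implements it:

```python
MANAGED_TAG_KEY = "edcloud:managed"
MANAGED_TAG_VALUE = "true"


def managed_filters() -> list[dict]:
    """Filters in the shape accepted by EC2 describe_* calls."""
    return [{"Name": f"tag:{MANAGED_TAG_KEY}", "Values": [MANAGED_TAG_VALUE]}]


def is_managed(resource: dict) -> bool:
    """Check an EC2-style resource dict (with a "Tags" list) for the managed tag."""
    tags = {t["Key"]: t["Value"] for t in resource.get("Tags", [])}
    return tags.get(MANAGED_TAG_KEY) == MANAGED_TAG_VALUE


# With boto3 installed, the filter would be passed roughly like this:
#   boto3.client("ec2").describe_instances(Filters=managed_filters())
```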

Quick start

# Prerequisites: AWS CLI, Python 3.10+, Tailscale account

git clone <repo> && cd edcloud
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'

# Store Tailscale key in SSM
aws ssm put-parameter --name /edcloud/tailscale_auth_key \
  --type SecureString --value '<TAILSCALE_AUTH_KEY>'

# Provision
edc provision

ARM/Linux operator note:

  • Prefer running from the repo-local virtualenv (.venv) to keep tooling reproducible.
  • Use either source .venv/bin/activate or invoke commands directly as .venv/bin/edc ....

Commands

edc tailscale reconcile --dry-run   # Detect edcloud naming conflicts before lifecycle actions
edc provision [--cleanup]  # Create instance (requires existing state volume by default)
edc up/down                          # Start/stop instance (up also fails fast on naming conflicts)
edc ssh [command]                    # SSH via Tailscale
edc status                   # Instance state, IPs, cost estimate
edc setup-ssm-tokens         # Store GitHub/Tailscale tokens in SSM
edc permissions show         # Show required IAM actions by command profile
edc permissions policy       # Emit operator least-privilege IAM policy JSON
edc permissions verify       # Preflight-check your current AWS principal permissions
edc sync-cline-auth          # Sync Cline OAuth secrets to headless remote host
edc sync-cline-auth --remote-diagnostics  # Print remote Cline path/version + config target
edc verify                   # Bootstrap validation
edc backup-policy status     # Show AWS DLM backup policy status
edc backup-policy apply      # Ensure DLM policy (daily 7 + weekly 4 + monthly 2)
edc backup-policy disable    # Disable managed DLM policy
edc snapshot [-d desc]       # Create snapshot (state volume only)
edc snapshot --list          # List snapshots
edc snapshot --prune [--keep N] [--apply]  # Optional manual cleanup for ad-hoc snapshots
edc snapshot-cost [--soft-cap-usd 2.0] [--fail-on-cap]  # Snapshot spend guardrail signal
edc restore-drill [--attach-managed-instance]  # Run non-destructive restore drill with temp volume
edc destroy --confirm-instance-id ID  # Terminate instance (snapshot + cleanup run by default)

Use --allow-tailscale-name-conflicts only for break-glass cases.

Auth-sync diagnostics note:

  • --remote-diagnostics is provided by edc sync-cline-auth.
  • It is not a flag on cline auth.

Safe rebuild golden path:

edc tailscale reconcile --dry-run
edc snapshot -d pre-change-rebuild
edc reprovision --confirm-instance-id <instance-id>
edc verify

See RUNBOOK.md section "Canonical safe rebuild workflow (golden path)" for details and expected outcomes.
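
The golden path is order-sensitive: each step should run only if the previous one succeeded. A minimal sketch of a wrapper that enforces this, assuming edc exits non-zero on failure (a common CLI convention that the README does not confirm); `rebuild_commands` and `run_sequence` are hypothetical names:

```python
import subprocess
from typing import Callable, Optional, Sequence


def rebuild_commands(instance_id: str,
                     description: str = "pre-change-rebuild") -> list[list[str]]:
    """Return the golden-path commands in the order they must run."""
    return [
        ["edc", "tailscale", "reconcile", "--dry-run"],
        ["edc", "snapshot", "-d", description],
        ["edc", "reprovision", "--confirm-instance-id", instance_id],
        ["edc", "verify"],
    ]


def run_sequence(commands: Sequence[Sequence[str]],
                 runner: Optional[Callable[[Sequence[str]], int]] = None) -> int:
    """Run commands in order; stop at the first non-zero exit code and return it."""
    if runner is None:
        # Default runner shells out; tests can inject a fake instead.
        runner = lambda cmd: subprocess.run(list(cmd)).returncode
    for cmd in commands:
        code = runner(cmd)
        if code != 0:
            return code
    return 0
```

Injecting the runner keeps the ordering logic testable without touching AWS or Tailscale.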

Destroy defaults: Auto-snapshot and cleanup (Tailscale devices + orphaned volumes) both run by default. Opt-out: --skip-snapshot, --skip-cleanup.

Backup defaults: AWS DLM is the native retention engine. Recommended baseline: edc backup-policy apply (daily keep 7 + weekly keep 4 + monthly keep 2, which gives dense recent recovery points plus points roughly 1 and 2 months back).
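
The baseline retention counts translate into a bounded number of snapshots and an approximate recovery window. A small worked sketch; the keep counts come from the README, while the interval lengths in days are approximations introduced here for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    interval_days: int  # approximate spacing between snapshots in this tier
    keep: int           # how many snapshots the tier retains


# Keep counts from the README; interval_days values are approximations.
BASELINE = [Tier("daily", 1, 7), Tier("weekly", 7, 4), Tier("monthly", 30, 2)]


def max_retained(tiers: list[Tier]) -> int:
    """Upper bound on snapshots held at steady state."""
    return sum(t.keep for t in tiers)


def oldest_age_days(tier: Tier) -> int:
    """Approximate maximum age of the oldest snapshot a tier still holds."""
    return tier.interval_days * tier.keep
```

Under these assumptions the policy holds at most 13 snapshots, with the oldest monthly point about 60 days back.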

Volume safety guardrails:

  • Managed volumes are role-tagged with edcloud:volume-role (root or state).
  • Cleanup protects state and unknown-role volumes by default.
  • To allow full deletion during cleanup, use --allow-delete-state-volume.
  • Provision now requires reusing an existing managed state volume by default.
  • To allow creating a fresh state volume (break-glass/new setup), use --allow-new-state-volume.
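
The cleanup guardrail amounts to a default-deny check on the role tag. A minimal sketch of that decision, assuming root-role volumes are the only ones deletable by default and that --allow-delete-state-volume lifts protection for both state and unknown-role volumes (the README says it allows "full deletion" but does not spell this out); `may_delete_volume` is a hypothetical name:

```python
ROLE_TAG = "edcloud:volume-role"


def may_delete_volume(tags: dict[str, str],
                      allow_delete_state_volume: bool = False) -> bool:
    """Decide whether cleanup may delete a volume, based only on its role tag."""
    role = tags.get(ROLE_TAG)
    if role == "root":
        return True  # root volumes are disposable
    # State volumes and volumes with a missing or unrecognized role stay
    # protected unless the operator explicitly opts in.
    return allow_delete_state_volume
```

Treating a missing or unknown role the same as `state` means a mis-tagged volume fails safe rather than being deleted.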

LazyVim compatibility:

  • Cloud-init installs Neovim v0.11.3 from upstream release tarball so LazyVim's >= 0.11.2 requirement is met on new builds.

Architecture

  • Compute: t3a.small, Ubuntu 24.04, Tailscale SSH only
  • Storage: 16GB root (disposable), 20GB state at /opt/edcloud/state (persistent)
  • Discovery: Tag edcloud:managed=true on all resources
  • Secrets: AWS SSM Parameter Store (/edcloud/* namespace, read by instance IAM role at boot)

Bootstrap secrets consumed automatically at cloud-init:

  • /edcloud/tailscale_auth_key — Tailscale join key (required)
  • /edcloud/github_token — GitHub CLI auth (optional)
  • /edcloud/rclone_config — rclone config with Dropbox OAuth token; mounts ~/Dropbox via FUSE on every build (optional)

Baseline: Docker, Portainer, Node.js, Python, and dev tooling are defined in cloud-init/user-data.yaml.

Bootstrap repo sync:

  • Dotfiles are always attempted first via cloud-init using configurable inputs:
    • --dotfiles-repo / EDCLOUD_DOTFILES_REPO (auto default)
    • --dotfiles-branch / EDCLOUD_DOTFILES_BRANCH (main default)
  • --dotfiles-repo auto resolution order:
    1. https://github.com/<gh-user>/dotfiles.git when gh auth is available
    2. existing ~/src/dotfiles origin URL (if present on persisted home)
  • Additional non-secret repos still sync from GitHub user namespace when gh auth is available:
    • https://github.com/<gh-user>/bin.git → ~/src/bin
    • https://github.com/<gh-user>/llm-config.git → ~/src/llm-config
    • https://github.com/<gh-user>/oldspeak.git → ~/src/oldspeak
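
The auto resolution order for the dotfiles repo can be sketched as a small function. This is an illustration under assumptions, not the project's code: the function name is hypothetical, an explicit non-auto value is assumed to be used as given, and returning None when neither source is available is a guess at the fallback:

```python
from typing import Optional


def resolve_dotfiles_repo(setting: str,
                          gh_user: Optional[str],
                          existing_origin: Optional[str]) -> Optional[str]:
    """Resolve the dotfiles repo URL following the documented order for "auto"."""
    if setting != "auto":
        return setting  # assumption: an explicit URL is used as given
    if gh_user:  # gh auth is available
        return f"https://github.com/{gh_user}/dotfiles.git"
    if existing_origin:  # origin URL of ~/src/dotfiles on the persisted home
        return existing_origin
    return None  # assumption: nothing to sync when both sources are missing
```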

For local MCP usage on edcloud, cloud-init also installs best-effort wrappers:

  • ~/.local/bin/oldspeak-mcp-stdio (recommended for Cline/Claude Code on-host)
  • ~/.local/bin/oldspeak-mcp-http [port] (localhost HTTP transport, optional)

For full technical detail, see:

  • RUNBOOK.md for durable host baseline, rebuild workflow, backup/restore operations, and operator procedures.
  • docs/ARCHITECTURE.md for code/module boundaries and architecture decisions.

Cost

4hr/day usage: ~$2.26 compute + ~$2.88 storage + $1.00 snapshots ≈ $6–7/month

Auto-shutdown after 30min idle.
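
The monthly estimate can be reproduced with simple arithmetic. In the sketch below the hourly and per-GB rates are assumptions chosen because they match the README's figures (they resemble on-demand t3a.small and gp3 pricing, but check current AWS pricing for your region); the volume sizes and the snapshot figure come from the README:

```python
# Assumed rates (not stated in the README; verify against current AWS pricing).
T3A_SMALL_USD_PER_HOUR = 0.0188   # assumed on-demand rate for t3a.small
EBS_USD_PER_GB_MONTH = 0.08       # assumed gp3 volume storage rate
DAYS_PER_MONTH = 30               # assumed month length

HOURS_PER_DAY = 4
ROOT_GB, STATE_GB = 16, 20        # volume sizes from the Architecture section
SNAPSHOT_USD = 1.00               # figure taken directly from the README

compute = HOURS_PER_DAY * DAYS_PER_MONTH * T3A_SMALL_USD_PER_HOUR
storage = (ROOT_GB + STATE_GB) * EBS_USD_PER_GB_MONTH
total = compute + storage + SNAPSHOT_USD

print(f"compute ${compute:.2f} + storage ${storage:.2f} "
      f"+ snapshots ${SNAPSHOT_USD:.2f} = ${total:.2f}/month")
```

With these assumed rates the total comes to about $6.14, which is consistent with the $6–7/month range. Storage is billed whether or not the instance is running, so only the compute term shrinks with fewer hours per day.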

Docs

  • Cold-start sequence (recommended): README.md → CHANGELOG.md ([Unreleased]) → SECURITY.md → RUNBOOK.md → docs/ARCHITECTURE.md.
  • RUNBOOK.md - Complete operator runbook
  • CHANGELOG.md - Project history + current mutable status ([Unreleased])
  • SECURITY.md - Threat model
  • AGENTS.md - AI agent constraints
  • docs/ARCHITECTURE.md - Code structure and architecture decisions
