From bd5a9b1e71aaca49ecdd03401ae5dc3b08b07a56 Mon Sep 17 00:00:00 2001 From: brfid Date: Tue, 3 Mar 2026 08:27:37 -0500 Subject: [PATCH] feat(bootstrap): Dropbox auto-mount via rclone + SSM - Add rclone config SSM parameter (/edcloud/rclone_config, SecureString) - cloud-init: write rclone-dropbox.service unit to user systemd dir, fetch rclone config from SSM at boot, enable FUSE mount (~/Dropbox) - config.py: add RCLONE_CONFIG_SSM_PARAMETER and GITHUB_TOKEN_SSM_PARAMETER - docs: document all three SSM bootstrap secrets in README, RUNBOOK, ARCHITECTURE - changelog: restructure [Unreleased] Recently Completed; add [2026-03-03] entry - fix: CommonMark spacing (MD022/MD032/MD060) across all repo MD files --- CHANGELOG.md | 36 ++++++++++++++++++++----- README.md | 9 ++++++- RUNBOOK.md | 34 +++++++++++++++++++++++ cloud-init/user-data.yaml | 57 +++++++++++++++++++++++++++++++++++++++ docs/ARCHITECTURE.md | 4 +-- edcloud/config.py | 2 ++ 6 files changed, 132 insertions(+), 10 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index fb90eae..fe2dc24 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,87 +9,109 @@ semantic version tags. ## [Unreleased] ### Current State + - Single-instance AWS EC2 personal cloud lab operated via `edc`, with Tailscale-only access and Portainer for container management. - Changelog-first operating model is active: `[Unreleased]` tracks mutable status, while dated entries capture completed milestones. - Operator baseline remains CLI-first, test-backed, and cost-aware, with safety guardrails around lifecycle, snapshot, and cleanup paths. ### Active Priorities + - Keep `CHANGELOG.md` current as the source of active status and completed milestones. - Continue thin-CLI extraction while preserving operator UX and lifecycle safety guardrails. - Keep snapshot/recovery guidance and restore-drill practice current in `RUNBOOK.md`. - Preserve cold-start-ready documentation consistency across README, RUNBOOK, SECURITY, and ARCHITECTURE docs. 
### In Progress + - None. ### Blocked + - None. ### Decisions Needed + - None. ### Recently Completed -- Adopted changelog-first operator memory model and renamed `SETUP.md` to `RUNBOOK.md`. -- Completed a reliability-focused iteration across snapshot, cleanup, and restore-drill workflows with accompanying test expansion. -- Tightened cloud-init baseline reliability and documentation alignment for reproducible rebuilds. -- Added centralized SSH trust helpers (`edcloud/ssh_trust.py`) and new `edc ssh-trust sync/show-path` commands. -- Switched `edc ssh` and `edc verify` to strict host-key checking with an edcloud-specific known_hosts boundary. -- Added cloud-init SSH host-key persistence on the state volume (`/opt/edcloud/state/ssh-host-keys`) to reduce reprovision host-key churn. -- Added an idempotent 4 GiB swap baseline in cloud-init (`/swapfile`, `vm.swappiness=10`) with runbook guidance for verification. + +- Wired Dropbox FUSE mount via rclone: rclone config stored as SecureString at `/edcloud/rclone_config` in SSM; cloud-init fetches it on every rebuild and enables `rclone-dropbox.service` (user systemd, `~/Dropbox` mount); `RCLONE_CONFIG_SSM_PARAMETER` added to `config.py`. + +## [2026-03-03] + +### Added + +- Dropbox FUSE mount via rclone wired into cloud-init bootstrap: `rclone_config` SSM parameter fetched at build time, `rclone-dropbox.service` enabled automatically, `~/Dropbox` mounted on every instance. ## [2026-02-21] ### Added + - Backup and operations tooling matured with dedicated modules for backup policy management, resource auditing, and AWS client/discovery support. - State-volume-focused snapshot operations gained retention support (`keep-last-N` prune workflow) and stronger operator-facing guidance. +- Centralized SSH trust helpers (`edcloud/ssh_trust.py`) and `edc ssh-trust sync/show-path` commands. +- Cloud-init SSH host-key persistence on the state volume (`/opt/edcloud/state/ssh-host-keys`) to reduce reprovision host-key churn. 
+- Idempotent 4 GiB swap baseline in cloud-init (`/swapfile`, `vm.swappiness=10`). ### Changed + +- `edc ssh` and `edc verify` switched to strict host-key checking with an edcloud-specific known_hosts boundary. - `destroy` lifecycle defaults were hardened to perform cleanup by default, with explicit skip flags for exceptional workflows. - Snapshot strategy was reoriented toward durable state-volume backups, with docs updated across README, runbook/architecture materials, and operator workflow references. - Documentation architecture was consolidated: changelog-memory workflow adopted and `SETUP.md` transitioned to `RUNBOOK.md`. - Restore-drill and DLM lifecycle planning guidance were validated and synchronized into operations docs. ### Fixed + - Cloud-init reliability defects were corrected (heredoc handling, file write behavior, package/bootstrap execution context, and user-data size constraints). - Volume lifecycle logic was tightened to prevent orphaned EBS volume outcomes during destructive workflows. ## [2026-02-18] ### Added + - `edc reprovision` lifecycle support, including resize orchestration and safer rebuild flow controls. - Broader regression coverage for cleanup, snapshot lifecycle behavior, and CLI safety confirmation paths. ### Changed + - Public API and lifecycle interaction paths were refined for clearer orchestration between CLI, EC2 operations, and snapshot handling. - Snapshot operations were hardened with improved wait/ordering behavior and validation around destructive transitions. ### Fixed + - Post-review hardening addressed confirmation guard edge cases and resize safety behavior before merge. ## [2026-02-16] ### Changed + - Configuration and module boundaries were centralized and standardized, reducing duplication and clarifying code ownership across CLI/AWS modules. - Documentation and script references were aligned with the refactored operator workflow. 
### Fixed + - Mypy/type-checking regressions were resolved across key lifecycle paths. - AWS exception handling was hardened in reliability-critical code paths (`aws_check`, cleanup, and CLI-facing operations). ## [2026-02-17] ### Changed + - Default infrastructure sizing was optimized for lower recurring spend (instance and volume defaults), while retaining the single-instance lab operating model. ## [2026-02-15] ### Added + - Initial project baseline: core `edc` CLI modules for EC2 lifecycle, snapshot, and Tailscale-assisted access, plus first-pass tests. - Security and publication-readiness scaffolding, including guardrail documentation and repository hygiene workflows. - Contributor/agent workflow guidance and operator templates for reproducible local/remote operation. ### Changed + - Operator workflow docs were iterated rapidly to codify lifecycle safety, persistent state handling, and day-0 bootstrap expectations. ### Security + - Repository hardening pass prepared the project for broader visibility, including secret-scanning baseline and remediation tracking updates. diff --git a/README.md b/README.md index dee7d5b..ae8d502 100644 --- a/README.md +++ b/README.md @@ -56,6 +56,7 @@ Console. `pre-commit run --all-files`, `pytest -q`). 
**Core design:** + - Tailscale-only access (zero inbound rules) - Tag-based resource discovery (no state files) - Persistent home on state volume @@ -150,7 +151,13 @@ LazyVim compatibility: **Compute:** t3a.small, Ubuntu 24.04, Tailscale SSH only **Storage:** 16GB root (disposable), 20GB state at `/opt/edcloud/state` (persistent) **Discovery:** Tag `edcloud:managed=true` on all resources -**Secrets:** AWS SSM Parameter Store +**Secrets:** AWS SSM Parameter Store (`/edcloud/*` namespace, read by instance IAM role at boot) +**Bootstrap secrets consumed automatically at cloud-init:** + +- `/edcloud/tailscale_auth_key` — Tailscale join key (required) +- `/edcloud/github_token` — GitHub CLI auth (optional) +- `/edcloud/rclone_config` — rclone config with Dropbox OAuth token; mounts `~/Dropbox` via FUSE on every build (optional) + **Baseline:** Docker, Portainer, Node.js, Python, and dev tooling are defined in `cloud-init/user-data.yaml`. For full technical detail, see: diff --git a/RUNBOOK.md b/RUNBOOK.md index c1089b4..3f0b7c9 100644 --- a/RUNBOOK.md +++ b/RUNBOOK.md @@ -177,6 +177,39 @@ Load key into current shell when needed: eval "$(edc load-tailscale-env-key)" ``` +## 2b. Optional SSM secrets (auto-consumed at cloud-init) + +The instance IAM role grants `ssm:GetParameter` on all `/edcloud/*` parameters. 
+The following are pulled automatically during every build; store them once and +they apply to every reprovision: + +| Parameter | Effect at boot | +| --- | --- | +| `/edcloud/tailscale_auth_key` | Joins Tailscale network (required) | +| `/edcloud/github_token` | Authenticates `gh` CLI (`gh auth login`) | +| `/edcloud/rclone_config` | Writes `~/.config/rclone/rclone.conf` and enables `rclone-dropbox.service` so `~/Dropbox` is FUSE-mounted | + +Store each as `SecureString`: + +```bash +# GitHub personal access token +aws ssm put-parameter \ + --name /edcloud/github_token \ + --type SecureString \ + --overwrite \ + --value '' + +# rclone config (run rclone config on a machine with browser access first) +aws ssm put-parameter \ + --name /edcloud/rclone_config \ + --type SecureString \ + --overwrite \ + --value "$(cat ~/.config/rclone/rclone.conf)" +``` + +Only `tailscale_auth_key` is required; the other two parameters are optional. If an +optional parameter is absent at boot, the corresponding step no-ops and bootstrap continues. + ## 3. 
Install edcloud CLI ```bash @@ -360,6 +393,7 @@ edc verify ``` Your state volume is completely independent of instance type, so resizing preserves: + - SSH keys and logins - Tailscale identity (same hostname/IP) - Docker images and containers diff --git a/cloud-init/user-data.yaml b/cloud-init/user-data.yaml index a4bd3cc..c881c9a 100644 --- a/cloud-init/user-data.yaml +++ b/cloud-init/user-data.yaml @@ -149,6 +149,30 @@ write_files: content: | EDCLOUD_MANAGED=true + - path: /home/ubuntu/.config/systemd/user/rclone-dropbox.service + owner: ubuntu:ubuntu + permissions: "0644" + content: | + [Unit] + Description=Dropbox via rclone FUSE mount + After=network-online.target + Wants=network-online.target + + [Service] + Type=notify + ExecStartPre=/bin/mkdir -p %h/Dropbox + ExecStart=/usr/bin/rclone mount dropbox: %h/Dropbox \ + --vfs-cache-mode writes \ + --vfs-cache-max-size 1G \ + --log-level INFO \ + --log-file /tmp/rclone-dropbox.log + ExecStop=/bin/fusermount3 -uz %h/Dropbox + Restart=on-failure + RestartSec=10 + + [Install] + WantedBy=default.target + runcmd: # --- State volume: early mount (home bind before other runcmd steps) --- - | @@ -493,6 +517,28 @@ runcmd: fi ' + # --- rclone config from SSM (Dropbox FUSE mount credentials) --- + - | + runuser -u ubuntu -- bash -lc ' + set -euo pipefail + if ! command -v aws &>/dev/null || ! 
command -v rclone &>/dev/null; then + exit 0 + fi + RCLONE_CONF=$(aws ssm get-parameter \ + --name /edcloud/rclone_config \ + --with-decryption \ + --query "Parameter.Value" \ + --output text 2>/dev/null || true) + if [ -n "$RCLONE_CONF" ]; then + mkdir -p "$HOME/.config/rclone" + printf "%s" "$RCLONE_CONF" > "$HOME/.config/rclone/rclone.conf" + chmod 600 "$HOME/.config/rclone/rclone.conf" + echo "✅ rclone config written from SSM" + else + echo "ℹ️ No rclone config found in SSM; skipping Dropbox mount setup" + fi + ' + # --- Pull non-secret personal repos (dotfiles/bin/llm-config) --- - | runuser -u ubuntu -- bash -lc ' @@ -565,6 +611,17 @@ runcmd: # --- Enable user lingering so user systemd services (e.g. rclone-dropbox) run without a login session --- - loginctl enable-linger ubuntu + # --- Enable rclone-dropbox.service for ubuntu user (if rclone config is present) --- + - | + if [ -f /home/ubuntu/.config/rclone/rclone.conf ]; then + mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants + ln -sfn /home/ubuntu/.config/systemd/user/rclone-dropbox.service \ + /home/ubuntu/.config/systemd/user/default.target.wants/rclone-dropbox.service + chown -h ubuntu:ubuntu \ + /home/ubuntu/.config/systemd/user/default.target.wants/rclone-dropbox.service + echo "✅ rclone-dropbox.service enabled for ubuntu" + fi + # --- Enable idle-shutdown timer --- - systemctl daemon-reload - systemctl enable --now edcloud-idle-shutdown.timer diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 239af72..b1fecc3 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -36,7 +36,7 @@ edcloud/ - **Tailscale-only access:** zero inbound SG rules; access is identity-based over tailnet. - **Durable state volume + disposable root:** host runtime is replaceable; durable data lives under `/opt/edcloud/state`. - **CLI-managed snapshot queue:** a single flat pool capped at 3 snapshots, enforced by the CLI. 
Every snapshot trigger runs `prune(3) → snapshot → prune(3)` so drift self-heals within one cycle. Triggers: `edc up` (on-start, fire-and-forget), `edc provision`/`edc reprovision`/`edc destroy` (blocking, pre-destructive-op). DLM (`backup-policy`) remains available but is not wired automatically. -- **SSM-backed runtime secrets:** secrets stay out of git and host bootstrap payloads. +- **SSM-backed runtime secrets:** secrets stay out of git and host bootstrap payloads. The instance IAM role grants `ssm:GetParameter` on `/edcloud/*`. Three parameters are consumed automatically by cloud-init: `tailscale_auth_key` (required), `github_token` (optional, authenticates `gh`), and `rclone_config` (optional, writes rclone config and enables the Dropbox FUSE mount). - **Cloud-init as baseline contract:** reproducible host/tooling baseline is codified in `cloud-init/user-data.yaml`. - **CLI-first operations model:** commands must remain safe/repeatable from lightweight ARM/Linux operator nodes. @@ -71,7 +71,7 @@ edcloud/ - AWS DLM policy management is implemented in `backup_policy.py`. - Root volume remains disposable; state volume is durable and role-tagged. -- Cloud-init runs `loginctl enable-linger ubuntu` so user systemd services start at boot without a login session. Personal services (e.g. `rclone-dropbox.service`) are stored in `~/.config/systemd/user/` on the state volume and therefore survive reprovision automatically. Templates for optional user services live in `templates/operator/systemd-user/`. +- Cloud-init runs `loginctl enable-linger ubuntu` so user systemd services start at boot without a login session. `rclone-dropbox.service` is written by cloud-init and enabled automatically when `/edcloud/rclone_config` is present in SSM, mounting `~/Dropbox` via rclone FUSE on every build. Additional user service templates live in `templates/operator/systemd-user/`. - Snapshot cap is 3 (`DEFAULT_SNAPSHOT_KEEP_LAST`). 
Each CLI trigger runs pre-prune + create + post-prune. Worst-case drift is +1, self-healing on next trigger. - `edc status` shows snapshot count. `edc snapshot --list` shows full inventory. `edc backup-policy apply` can optionally wire DLM on top. diff --git a/edcloud/config.py b/edcloud/config.py index bf97f00..b6f0751 100644 --- a/edcloud/config.py +++ b/edcloud/config.py @@ -40,6 +40,8 @@ # --------------------------------------------------------------------------- DEFAULT_TAILSCALE_HOSTNAME = "edcloud" DEFAULT_TAILSCALE_AUTH_KEY_SSM_PARAMETER = "/edcloud/tailscale_auth_key" +GITHUB_TOKEN_SSM_PARAMETER = "/edcloud/github_token" +RCLONE_CONFIG_SSM_PARAMETER = "/edcloud/rclone_config" DEFAULT_SSH_USER = "ubuntu" # ---------------------------------------------------------------------------