36 changes: 29 additions & 7 deletions CHANGELOG.md
@@ -9,87 +9,109 @@ semantic version tags.
## [Unreleased]

### Current State

- Single-instance AWS EC2 personal cloud lab operated via `edc`, with Tailscale-only access and Portainer for container management.
- Changelog-first operating model is active: `[Unreleased]` tracks mutable status, while dated entries capture completed milestones.
- Operator baseline remains CLI-first, test-backed, and cost-aware, with safety guardrails around lifecycle, snapshot, and cleanup paths.

### Active Priorities

- Keep `CHANGELOG.md` current as the source of active status and completed milestones.
- Continue thin-CLI extraction while preserving operator UX and lifecycle safety guardrails.
- Keep snapshot/recovery guidance and restore-drill practice current in `RUNBOOK.md`.
- Preserve cold-start-ready documentation consistency across README, RUNBOOK, SECURITY, and ARCHITECTURE docs.

### In Progress

- None.

### Blocked

- None.

### Decisions Needed

- None.

### Recently Completed
- Adopted changelog-first operator memory model and renamed `SETUP.md` to `RUNBOOK.md`.
- Completed a reliability-focused iteration across snapshot, cleanup, and restore-drill workflows with accompanying test expansion.
- Tightened cloud-init baseline reliability and documentation alignment for reproducible rebuilds.
- Added centralized SSH trust helpers (`edcloud/ssh_trust.py`) and new `edc ssh-trust sync/show-path` commands.
- Switched `edc ssh` and `edc verify` to strict host-key checking with an edcloud-specific known_hosts boundary.
- Added cloud-init SSH host-key persistence on the state volume (`/opt/edcloud/state/ssh-host-keys`) to reduce reprovision host-key churn.
- Added an idempotent 4 GiB swap baseline in cloud-init (`/swapfile`, `vm.swappiness=10`) with runbook guidance for verification.

- Wired Dropbox FUSE mount via rclone: rclone config stored as SecureString at `/edcloud/rclone_config` in SSM; cloud-init fetches it on every rebuild and enables `rclone-dropbox.service` (user systemd, `~/Dropbox` mount); `RCLONE_CONFIG_SSM_PARAMETER` added to `config.py`.

## [2026-03-03]

### Added

- Dropbox FUSE mount via rclone wired into cloud-init bootstrap: `rclone_config` SSM parameter fetched at build time, `rclone-dropbox.service` enabled automatically, `~/Dropbox` mounted on every instance.

## [2026-02-21]

### Added

- Backup and operations tooling matured with dedicated modules for backup policy management, resource auditing, and AWS client/discovery support.
- State-volume-focused snapshot operations gained retention support (`keep-last-N` prune workflow) and stronger operator-facing guidance.
- Centralized SSH trust helpers (`edcloud/ssh_trust.py`) and `edc ssh-trust sync/show-path` commands.
- Cloud-init SSH host-key persistence on the state volume (`/opt/edcloud/state/ssh-host-keys`) to reduce reprovision host-key churn.
- Idempotent 4 GiB swap baseline in cloud-init (`/swapfile`, `vm.swappiness=10`).

### Changed

- `edc ssh` and `edc verify` switched to strict host-key checking with an edcloud-specific known_hosts boundary.
- `destroy` lifecycle defaults were hardened to perform cleanup by default, with explicit skip flags for exceptional workflows.
- Snapshot strategy was reoriented toward durable state-volume backups, with docs updated across README, runbook/architecture materials, and operator workflow references.
- Documentation architecture was consolidated: changelog-memory workflow adopted and `SETUP.md` transitioned to `RUNBOOK.md`.
- Restore-drill and DLM lifecycle planning guidance were validated and synchronized into operations docs.

### Fixed

- Cloud-init reliability defects were corrected (heredoc handling, file write behavior, package/bootstrap execution context, and user-data size constraints).
- Volume lifecycle logic was tightened to prevent orphaned EBS volume outcomes during destructive workflows.

## [2026-02-18]

### Added

- `edc reprovision` lifecycle support, including resize orchestration and safer rebuild flow controls.
- Broader regression coverage for cleanup, snapshot lifecycle behavior, and CLI safety confirmation paths.

### Changed

- Public API and lifecycle interaction paths were refined for clearer orchestration between CLI, EC2 operations, and snapshot handling.
- Snapshot operations were hardened with improved wait/ordering behavior and validation around destructive transitions.

### Fixed

- Post-review hardening addressed confirmation guard edge cases and resize safety behavior before merge.

## [2026-02-17]

### Changed

- Default infrastructure sizing was optimized for lower recurring spend (instance and volume defaults), while retaining the single-instance lab operating model.

## [2026-02-16]

### Changed

- Configuration and module boundaries were centralized and standardized, reducing duplication and clarifying code ownership across CLI/AWS modules.
- Documentation and script references were aligned with the refactored operator workflow.

### Fixed

- Mypy/type-checking regressions were resolved across key lifecycle paths.
- AWS exception handling was hardened in reliability-critical code paths (`aws_check`, cleanup, and CLI-facing operations).

## [2026-02-15]

### Added

- Initial project baseline: core `edc` CLI modules for EC2 lifecycle, snapshot, and Tailscale-assisted access, plus first-pass tests.
- Security and publication-readiness scaffolding, including guardrail documentation and repository hygiene workflows.
- Contributor/agent workflow guidance and operator templates for reproducible local/remote operation.

### Changed

- Operator workflow docs were iterated rapidly to codify lifecycle safety, persistent state handling, and day-0 bootstrap expectations.

### Security

- Repository hardening pass prepared the project for broader visibility, including secret-scanning baseline and remediation tracking updates.
9 changes: 8 additions & 1 deletion README.md
@@ -56,6 +56,7 @@ Console.
`pre-commit run --all-files`, `pytest -q`).

**Core design:**

- Tailscale-only access (zero inbound rules)
- Tag-based resource discovery (no state files)
- Persistent home on state volume
@@ -150,7 +151,13 @@ LazyVim compatibility:
**Compute:** t3a.small, Ubuntu 24.04, Tailscale SSH only
**Storage:** 16GB root (disposable), 20GB state at `/opt/edcloud/state` (persistent)
**Discovery:** Tag `edcloud:managed=true` on all resources
-**Secrets:** AWS SSM Parameter Store
+**Secrets:** AWS SSM Parameter Store (`/edcloud/*` namespace, read by instance IAM role at boot)
+**Bootstrap secrets consumed automatically at cloud-init:**
+
+- `/edcloud/tailscale_auth_key`: Tailscale join key (required)
+- `/edcloud/github_token`: GitHub CLI auth (optional)
+- `/edcloud/rclone_config`: rclone config with Dropbox OAuth token; mounts `~/Dropbox` via FUSE on every build (optional)

**Baseline:** Docker, Portainer, Node.js, Python, and dev tooling are defined in `cloud-init/user-data.yaml`.
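
Tag-based discovery means managed resources can be found by filtering on the `edcloud:managed=true` tag instead of reading a state file. The following is a minimal Python sketch of that idea, not the project's actual code: it assumes the boto3-style `tag:<key>` filter shape, and the helper names `managed_filter` and `is_managed` are hypothetical.

```python
from typing import Any, Dict, List

# Tag key/value come from the README; the helper names are illustrative only.
MANAGED_TAG_KEY = "edcloud:managed"
MANAGED_TAG_VALUE = "true"


def managed_filter() -> List[Dict[str, Any]]:
    """Build an EC2-style `Filters` argument selecting managed resources."""
    return [{"Name": f"tag:{MANAGED_TAG_KEY}", "Values": [MANAGED_TAG_VALUE]}]


def is_managed(resource: Dict[str, Any]) -> bool:
    """Client-side check on a describe-style dict that carries a `Tags` list."""
    tags = resource.get("Tags") or []
    return any(
        tag.get("Key") == MANAGED_TAG_KEY and tag.get("Value") == MANAGED_TAG_VALUE
        for tag in tags
    )
```

With boto3, the filter would typically be passed as `ec2.describe_instances(Filters=managed_filter())`; `is_managed` is useful for double-checking results before a destructive operation.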

For full technical detail, see:
34 changes: 34 additions & 0 deletions RUNBOOK.md
@@ -177,6 +177,39 @@ Load key into current shell when needed:
eval "$(edc load-tailscale-env-key)"
```

## 2b. Optional SSM secrets (auto-consumed at cloud-init)

The instance IAM role grants `ssm:GetParameter` on all `/edcloud/*` parameters.
The following are pulled automatically during every build — store them once and
they apply to every reprovision:

| Parameter | Effect at boot |
| --- | --- |
| `/edcloud/tailscale_auth_key` | Joins Tailscale network (required) |
| `/edcloud/github_token` | Authenticates `gh` CLI via `gh auth login` (optional) |
| `/edcloud/rclone_config` | Writes `~/.config/rclone/rclone.conf` and enables `rclone-dropbox.service` so `~/Dropbox` is FUSE-mounted (optional) |

Store each as `SecureString`:

```bash
# GitHub personal access token
aws ssm put-parameter \
  --name /edcloud/github_token \
  --type SecureString \
  --overwrite \
  --value '<GITHUB_TOKEN>'

# rclone config (run rclone config on a machine with browser access first)
aws ssm put-parameter \
  --name /edcloud/rclone_config \
  --type SecureString \
  --overwrite \
  --value "$(cat ~/.config/rclone/rclone.conf)"
```
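
Before uploading `rclone.conf`, it can help to confirm the file fits in a single parameter. The sketch below assumes the standard Parameter Store tier, whose value limit is 4 KB, and treats that limit as 4096 UTF-8 bytes. The function names are hypothetical and not part of `edc`.

```python
from pathlib import Path

# Assumption: standard-tier SSM parameters hold at most 4 KB of value data.
SSM_STANDARD_MAX_BYTES = 4096


def fits_standard_tier(value: str, limit: int = SSM_STANDARD_MAX_BYTES) -> bool:
    """Return True when the UTF-8 encoded value is within the size limit."""
    return len(value.encode("utf-8")) <= limit


def rclone_conf_fits(path: Path) -> bool:
    """Read an rclone config file and check it against the standard-tier limit."""
    return fits_standard_tier(path.read_text(encoding="utf-8"))
```

For example, `rclone_conf_fits(Path.home() / ".config/rclone/rclone.conf")` can be run on the operator machine before `aws ssm put-parameter`. A config that is too large would need the advanced tier or trimming of unused remotes.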

Only `tailscale_auth_key` is required; `github_token` and `rclone_config` are optional.
If an optional parameter is absent at boot, the corresponding step no-ops and bootstrap continues.
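
The "absent parameter means no-op" behaviour can be sketched in Python as follows. This is an illustration rather than the project's implementation, which lives in cloud-init shell. It assumes a boto3-style SSM client is passed in; `fetch_optional_parameter` and `run_optional_step` are hypothetical names.

```python
from typing import Any, Callable, Optional


def fetch_optional_parameter(ssm: Any, name: str) -> Optional[str]:
    """Return the decrypted value, or None when the parameter does not exist."""
    try:
        response = ssm.get_parameter(Name=name, WithDecryption=True)
    except ssm.exceptions.ParameterNotFound:
        return None
    return response["Parameter"]["Value"]


def run_optional_step(ssm: Any, name: str, apply: Callable[[str], None]) -> bool:
    """Run `apply` with the secret if it exists; otherwise skip and report False."""
    value = fetch_optional_parameter(ssm, name)
    if not value:
        return False  # parameter absent or empty: the step is a no-op
    apply(value)
    return True
```

Unlike the shell step, which swallows every AWS CLI error with `|| true`, this sketch only treats `ParameterNotFound` as "absent", so permission or network failures would still surface.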

## 3. Install edcloud CLI

```bash
@@ -360,6 +393,7 @@ edc verify
```

Your state volume is completely independent of instance type, so resizing preserves:

- SSH keys and logins
- Tailscale identity (same hostname/IP)
- Docker images and containers
57 changes: 57 additions & 0 deletions cloud-init/user-data.yaml
@@ -149,6 +149,30 @@ write_files:
content: |
EDCLOUD_MANAGED=true

  - path: /home/ubuntu/.config/systemd/user/rclone-dropbox.service
    owner: ubuntu:ubuntu
    permissions: "0644"
    content: |
      [Unit]
      Description=Dropbox via rclone FUSE mount
      After=network-online.target
      Wants=network-online.target

      [Service]
      Type=notify
      ExecStartPre=/bin/mkdir -p %h/Dropbox
      ExecStart=/usr/bin/rclone mount dropbox: %h/Dropbox \
        --vfs-cache-mode writes \
        --vfs-cache-max-size 1G \
        --log-level INFO \
        --log-file /tmp/rclone-dropbox.log
      ExecStop=/bin/fusermount3 -uz %h/Dropbox
      Restart=on-failure
      RestartSec=10

      [Install]
      WantedBy=default.target

runcmd:
# --- State volume: early mount (home bind before other runcmd steps) ---
- |
@@ -493,6 +517,28 @@ runcmd:
fi
'

  # --- rclone config from SSM (Dropbox FUSE mount credentials) ---
  - |
    runuser -u ubuntu -- bash -lc '
      set -euo pipefail
      # Skip quietly when either tool is missing from the image.
      if ! command -v aws &>/dev/null || ! command -v rclone &>/dev/null; then
        exit 0
      fi
      # A missing parameter (or any other SSM error) yields an empty string.
      RCLONE_CONF=$(aws ssm get-parameter \
        --name /edcloud/rclone_config \
        --with-decryption \
        --query "Parameter.Value" \
        --output text 2>/dev/null || true)
      if [ -n "$RCLONE_CONF" ]; then
        # Restrictive umask so the config is never group- or world-readable, even briefly.
        umask 077
        mkdir -p "$HOME/.config/rclone"
        printf "%s" "$RCLONE_CONF" > "$HOME/.config/rclone/rclone.conf"
        chmod 600 "$HOME/.config/rclone/rclone.conf"
        echo "✅ rclone config written from SSM"
      else
        echo "ℹ️ No rclone config found in SSM; skipping Dropbox mount setup"
      fi
    '

# --- Pull non-secret personal repos (dotfiles/bin/llm-config) ---
- |
runuser -u ubuntu -- bash -lc '
@@ -565,6 +611,17 @@ runcmd:
# --- Enable user lingering so user systemd services (e.g. rclone-dropbox) run without a login session ---
- loginctl enable-linger ubuntu

  # --- Enable rclone-dropbox.service for ubuntu user (if rclone config is present) ---
  - |
    if [ -f /home/ubuntu/.config/rclone/rclone.conf ]; then
      # runcmd runs as root, so hand the new directory to ubuntu as well as the link.
      mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      chown ubuntu:ubuntu /home/ubuntu/.config/systemd/user/default.target.wants
      ln -sfn /home/ubuntu/.config/systemd/user/rclone-dropbox.service \
        /home/ubuntu/.config/systemd/user/default.target.wants/rclone-dropbox.service
      chown -h ubuntu:ubuntu \
        /home/ubuntu/.config/systemd/user/default.target.wants/rclone-dropbox.service
      echo "✅ rclone-dropbox.service enabled for ubuntu"
    fi

# --- Enable idle-shutdown timer ---
- systemctl daemon-reload
- systemctl enable --now edcloud-idle-shutdown.timer
4 changes: 2 additions & 2 deletions docs/ARCHITECTURE.md
@@ -36,7 +36,7 @@ edcloud/
- **Tailscale-only access:** zero inbound SG rules; access is identity-based over tailnet.
- **Durable state volume + disposable root:** host runtime is replaceable; durable data lives under `/opt/edcloud/state`.
- **CLI-managed snapshot queue:** a single flat pool capped at 3 snapshots, enforced by the CLI. Every snapshot trigger runs `prune(3) → snapshot → prune(3)` so drift self-heals within one cycle. Triggers: `edc up` (on-start, fire-and-forget), `edc provision`/`edc reprovision`/`edc destroy` (blocking, pre-destructive-op). DLM (`backup-policy`) remains available but is not wired automatically.
-- **SSM-backed runtime secrets:** secrets stay out of git and host bootstrap payloads.
+- **SSM-backed runtime secrets:** secrets stay out of git and host bootstrap payloads. The instance IAM role grants `ssm:GetParameter` on `/edcloud/*`. Three parameters are consumed automatically by cloud-init: `tailscale_auth_key` (required), `github_token` (optional, authenticates `gh`), and `rclone_config` (optional, writes rclone config and enables the Dropbox FUSE mount).
- **Cloud-init as baseline contract:** reproducible host/tooling baseline is codified in `cloud-init/user-data.yaml`.
- **CLI-first operations model:** commands must remain safe/repeatable from lightweight ARM/Linux operator nodes.
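
The `prune(3) → snapshot → prune(3)` cycle from the snapshot-queue bullet above can be sketched on an in-memory list. This is a simplified model under stated assumptions: snapshots are plain IDs ordered oldest to newest, and `prune` and `snapshot_cycle` are hypothetical helpers rather than the CLI's real functions.

```python
from typing import List

SNAPSHOT_CAP = 3  # the flat pool is capped at 3 snapshots


def prune(snapshots: List[str], keep_last: int = SNAPSHOT_CAP) -> List[str]:
    """Drop the oldest entries in place so at most `keep_last` remain.

    Returns the IDs that were dropped.
    """
    excess = max(0, len(snapshots) - keep_last)
    dropped = snapshots[:excess]
    del snapshots[:excess]
    return dropped


def snapshot_cycle(snapshots: List[str], new_id: str, keep_last: int = SNAPSHOT_CAP) -> None:
    """Run one trigger: pre-prune, create, post-prune."""
    prune(snapshots, keep_last)  # heal any drift left by an earlier failure
    snapshots.append(new_id)     # create the new snapshot
    prune(snapshots, keep_last)  # bring the pool back under the cap
```

If the post-prune step fails, the pool is left one over the cap, and the pre-prune of the next trigger removes the extra snapshot. That is the self-healing property the design relies on.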

@@ -71,7 +71,7 @@ edcloud/

- AWS DLM policy management is implemented in `backup_policy.py`.
- Root volume remains disposable; state volume is durable and role-tagged.
-- Cloud-init runs `loginctl enable-linger ubuntu` so user systemd services start at boot without a login session. Personal services (e.g. `rclone-dropbox.service`) are stored in `~/.config/systemd/user/` on the state volume and therefore survive reprovision automatically. Templates for optional user services live in `templates/operator/systemd-user/`.
+- Cloud-init runs `loginctl enable-linger ubuntu` so user systemd services start at boot without a login session. `rclone-dropbox.service` is written by cloud-init and enabled automatically when `/edcloud/rclone_config` is present in SSM, mounting `~/Dropbox` via rclone FUSE on every build. Additional user service templates live in `templates/operator/systemd-user/`.
- Snapshot cap is 3 (`DEFAULT_SNAPSHOT_KEEP_LAST`). Each CLI trigger runs pre-prune + create + post-prune. Worst-case drift is +1, self-healing on next trigger.
- `edc status` shows snapshot count. `edc snapshot --list` shows full inventory. `edc backup-policy apply` can optionally wire DLM on top.
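
The conditional enablement of `rclone-dropbox.service` amounts to creating a `default.target.wants` symlink only when an rclone config exists. The real step is a shell snippet in cloud-init; the Python below is a rough equivalent for illustration, with `enable_rclone_service` as a hypothetical name and the home directory passed in so it can be tried against a scratch directory.

```python
from pathlib import Path


def enable_rclone_service(home: Path) -> bool:
    """Link the user unit into default.target.wants if rclone.conf is present."""
    conf = home / ".config" / "rclone" / "rclone.conf"
    if not conf.is_file():
        return False  # no credentials: leave the service disabled

    unit = home / ".config" / "systemd" / "user" / "rclone-dropbox.service"
    wants_dir = unit.parent / "default.target.wants"
    wants_dir.mkdir(parents=True, exist_ok=True)

    link = wants_dir / unit.name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace an existing link, similar to `ln -sfn`
    link.symlink_to(unit)
    return True
```

Because the link is recreated on every call, the function is idempotent across rebuilds, which matches the "enabled on every build" behaviour described above. Ownership handling (`chown`) is left out of this sketch.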

2 changes: 2 additions & 0 deletions edcloud/config.py
@@ -40,6 +40,8 @@
# ---------------------------------------------------------------------------
DEFAULT_TAILSCALE_HOSTNAME = "edcloud"
DEFAULT_TAILSCALE_AUTH_KEY_SSM_PARAMETER = "/edcloud/tailscale_auth_key"
GITHUB_TOKEN_SSM_PARAMETER = "/edcloud/github_token"
RCLONE_CONFIG_SSM_PARAMETER = "/edcloud/rclone_config"
DEFAULT_SSH_USER = "ubuntu"

# ---------------------------------------------------------------------------