Merged
4 changes: 4 additions & 0 deletions .gitignore
@@ -2,6 +2,7 @@
*.iso
*.qcow2
*.img
seed.iso

# Logs
*.log
@@ -25,6 +26,9 @@ Thumbs.db
# Claude Code local settings
.claude/

# MCP server config (local)
.mcp.json

# Secrets - never commit these
secrets/
*.pem
8 changes: 0 additions & 8 deletions .mcp.json

This file was deleted.

13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -19,16 +19,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Golden image now includes full authentication (Claude OAuth, GitHub token, Codex, Gemini)
- Clones inherit all authentication - no setup required
- Health check on connect (`./agent.sh --health`)
- **Agent resilience phase** in bootstrap with crash recovery tools:
- Claude memory watchdog systemd service (warns at 8GB, kills at 13GB)
- `run-claude-limited` cgroups wrapper for hard memory limits
- `agent-session` tmux wrapper with session persistence
- Enhanced `vm-health-check` with memory trend prediction and OOM alerts
- Crash event logging to `~/.agent-session/crashes.log`
- `RESOURCES.md` - Comprehensive guide for parallel agent memory planning
- Default RAM increased to 16GB (from 8GB) for Claude CLI memory leak protection
- Default swap increased to 8GB (from 4GB)
- `--memory` and `--vcpus` flags for `setup_cloud.sh`
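
The watchdog thresholds listed above (warn at 8GB, kill at 13GB) can be illustrated with a minimal sketch. This is not the service shipped by the bootstrap script: the function names, the `claude` process name, and the reading of "GB" as GiB are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of watchdog threshold logic; not the repository's service.
# Assumes "8GB" / "13GB" mean GiB and that ps reports RSS in KiB (procps on Linux).

WARN_KB=$((8 * 1024 * 1024))    # 8 GiB expressed in KiB
KILL_KB=$((13 * 1024 * 1024))   # 13 GiB expressed in KiB

# classify_rss RSS_KB -> prints "ok", "warn", or "kill"
classify_rss() {
  local rss_kb=$1
  if (( rss_kb >= KILL_KB )); then
    echo "kill"
  elif (( rss_kb >= WARN_KB )); then
    echo "warn"
  else
    echo "ok"
  fi
}

# watch_once: one polling pass over processes whose command name is "claude"
# (the process name is an assumption). Defined here but not invoked.
watch_once() {
  local pid rss_kb
  while read -r pid rss_kb; do
    case "$(classify_rss "$rss_kb")" in
      kill) echo "killing pid $pid (rss ${rss_kb} KiB)"; kill -TERM "$pid" ;;
      warn) echo "warning: pid $pid at ${rss_kb} KiB" ;;
    esac
  done < <(ps -C claude -o pid=,rss=)
}
```

A real service would presumably call something like `watch_once` on a timer and append kill events to the crash log; both of those details are left out here.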

### Changed
- Updated README with one-command agent workflow
- Updated README with quick reference table
- Updated README with shell aliases
- Updated README with agent resilience commands
- Bootstrap now has 11 phases (added agent resilience phase)

### Fixed
- CI shellcheck warnings (SC2155, SC2088)
- Pinned GitHub Actions to stable versions (ludeeus/action-shellcheck@2.0.0, ibiqlik/action-yamllint@v3.1.1)
- Updated golden image dependencies (npm 11.8.0, corepack 0.34.6, nexus-agents latest)
- Removed `.mcp.json` from git tracking (local MCP config should not be shared)

## [1.1.0] - 2026-02-01

23 changes: 18 additions & 5 deletions README.md
@@ -100,9 +100,15 @@ sudo shutdown -h now
moltdown/
├── README.md # This file
├── CLAUDE.md # Development guidelines
├── RESOURCES.md # Memory planning for parallel agents
├── CHANGELOG.md # Release history
├── Makefile # Common operations
├── agent.sh # One-command agent VM creation
├── setup_cloud.sh # One-command setup (cloud images, RECOMMENDED)
├── setup.sh # One-command setup (ISO installer)
├── update-golden.sh # Update golden image CLIs and auth
├── sync-ai-auth.sh # Sync AI CLI auth to VMs
├── code-connect.sh # VS Code Remote SSH connection
├── generate_cloud_seed.sh # Create seed ISO for cloud images
├── generate_nocloud_iso.sh # Create seed ISO for ISO installer
├── virt_install_agent_vm.sh # Create VM with virt-install
@@ -116,8 +122,7 @@ moltdown/
│ ├── user-data # Autoinstall config (for ISO installer)
│ └── meta-data # Cloud-init metadata
├── guest/
-│ ├── bootstrap_agent_vm.sh # Run inside VM
-│ └── vm-health-check.sh # Health monitoring script
+│ └── bootstrap_agent_vm.sh # Run inside VM (includes health check)
├── docs/
│ └── CLOUD_IMAGES.md # Cloud image workflow docs
├── examples/
@@ -244,17 +249,25 @@ images stay in `/var/lib/libvirt/images/` and are excluded by `.gitignore`.

VMs are hardened for multi-day or multi-week agent sessions:

-- **Swap file**: 4GB for memory pressure
+- **Swap file**: 8GB for memory pressure (Claude CLI can leak to 13GB+)
- **Journal limits**: 100MB max, prevents disk fill
- **No auto-reboot**: Security updates don't restart
- **Cloud-init disabled**: Prevents reconfiguration
- **Memory watchdog**: Auto-kills runaway Claude CLI processes at 13GB threshold
- **cgroups limits**: Optional hard memory limits via `run-claude-limited`

Monitor health inside VM:
```bash
-vm-health-check # Quick status
-vm-health-check --watch # Live monitoring
+vm-health-check # Quick status with Claude memory tracking
+vm-health-check --watch # Live monitoring (30s refresh)
+vm-health-check --trend # Memory trend analysis with OOM prediction
+run-claude-limited # Run Claude with 12GB memory limit
+run-claude-limited 8G # Run with custom limit
+agent-session # Persistent tmux session with auto-reattach
```

See [RESOURCES.md](RESOURCES.md) for detailed memory planning and parallel agent deployment guidance.
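
As a rough illustration of how a cgroups-limited launcher such as `run-claude-limited` could work, the sketch below builds a `systemd-run` command that sets `MemoryMax` on a transient scope. It is an assumption about one plausible approach, not the script installed by the bootstrap: the function name `limited_cmd` is hypothetical, and `MemoryMax` only takes effect on a cgroup v2 host where the memory controller is delegated to the user session.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a memory-limited launcher; not the repository's script.

DEFAULT_LIMIT="12G"   # default taken from the README comment above

# limited_cmd [LIMIT] -> prints the command line that would be executed.
# systemd-run --scope runs the command in a transient scope unit;
# MemoryMax is the cgroup v2 hard memory limit property.
limited_cmd() {
  local limit="${1:-$DEFAULT_LIMIT}"
  printf '%s\n' "systemd-run --user --scope -p MemoryMax=${limit} claude"
}

# Dry run: show the command for the default and for a custom limit.
limited_cmd
limited_cmd 8G
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without `claude` or a user systemd instance; a real wrapper would execute the command directly.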

## Scripts Reference

### bootstrap_agent_vm.sh
65 changes: 0 additions & 65 deletions guest/vm-health-check.sh

This file was deleted.