feat: infrastructure — AWS EC2 deploy, Docker stack, Grafana, Tailscale #724
Z0mb13V1 wants to merge 1 commit into mindcraft-bots:develop
Conversation
- aws/: EC2 deploy/teardown scripts, S3 backup/restore, Ollama proxy setup, user-data bootstrap, env toggle for prod/dev switching
- docker-compose.yml: multi-container stack (bot + viaproxy + chromadb)
- docker-compose.aws.yml: EC2 production override with LiteLLM proxy + Tailscale
- Dockerfile: multi-stage build, non-root node user, secrets excluded from context
- Tasks.Dockerfile: isolated task runner container
- prometheus-aws.yml: Prometheus scrape config for EC2 metrics
- grafana-provisioning/: pre-built dashboards and alerting rules
- start.ps1: cross-platform startup helper
Pull request overview
This PR introduces an AWS EC2 deployment path and a production-oriented Docker Compose stack (Minecraft + Mindcraft agents + Discord bot + monitoring), along with supporting bootstrap/ops scripts and Grafana/Prometheus provisioning.
Changes:
- Add AWS infrastructure lifecycle scripts (setup/bootstrap/deploy/backup/restore/teardown) and an EC2-focused compose stack.
- Expand local Docker Compose into a multi-service stack (Minecraft server, Mindcraft agents, optional Discord bot/LiteLLM/ViaProxy, monitoring stubs).
- Add monitoring configuration (Prometheus scrape config + Grafana provisioning skeleton).
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 24 comments.
| File | Description |
|---|---|
| start.ps1 | Adds a Windows helper to launch compose profiles for bots. |
| services/viaproxy/README.md | Updates ViaProxy setup documentation/formatting. |
| prometheus-aws.yml | Adds Prometheus scrape configuration for AWS stack (node-exporter/cAdvisor). |
| grafana-provisioning/datasources.yml | Provisions Prometheus datasource in Grafana. |
| grafana-provisioning/dashboards.yml | Configures Grafana dashboard file provider. |
| grafana-provisioning/dashboard-json/.gitkeep | Keeps dashboards JSON directory in git. |
| grafana-provisioning/alerting/rules.yml | Adds alerting provisioning file (rule deletion stub). |
| grafana-provisioning/alerting/.gitkeep | Keeps alerting directory in git. |
| docker-compose.yml | Reworks local stack: Minecraft server + agents + optional Discord/LiteLLM/ViaProxy + GPU exporter stub. |
| docker-compose.aws.yml | Adds EC2/AWS compose stack with Minecraft, agents, Discord, ChromaDB, monitoring, backups, Tailscale. |
| aws/user-data.sh | EC2 first-boot bootstrap (Docker, AWS CLI, directories, placeholder cron tab). |
| aws/setup.sh | Provisions AWS infra (VPC/SG/S3/IAM/SSM/EC2) from the local machine. |
| aws/deploy.sh | Rsync-based deployment to EC2 + SSM secret materialization + compose up. |
| aws/ec2-go.sh | One-command deploy helper (local or remote) for pull/secrets/build/restart. |
| aws/ec2-deploy.sh | EC2-internal bootstrap/update script (clone/pull, SSM secrets, compose up). |
| aws/setup-ollama-proxy.sh | Creates a systemd socat proxy to reach Ollama over Tailscale. |
| aws/backup.sh | S3 backup script for world + bot memory (cron-friendly). |
| aws/restore.sh | S3 restore script for world + bot memory. |
| aws/teardown.sh | Destroys AWS resources created by setup (optionally including S3). |
| aws/env-toggle.sh | Utility to toggle between AWS and local environments (advisory for local). |
| aws/s3-policy.json | Template S3 bucket policy reference. |
| Tasks.Dockerfile | Updates benchmark/tasks image build (Node 22 + Java 21 + AWS CLI + non-root). |
| Dockerfile | Updates app image build steps (caching, tests during build, non-root runtime). |
| .dockerignore | Tightens build context to avoid baking secrets/runtime data into images. |
```dockerfile
COPY ./server_data.zip /mindcraft/
RUN unzip -q server_data.zip && rm server_data.zip

RUN npm ci --omit=dev
```
`npm ci --omit=dev` requires a lockfile (`package-lock.json`/`npm-shrinkwrap.json`). This repo doesn't include one (and `.gitignore` excludes `package-lock.json`), so this step will fail. Either commit a lockfile and keep using `npm ci`, or switch back to `npm install --omit=dev` for this image.
```diff
- RUN npm ci --omit=dev
+ RUN npm install --omit=dev
```
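As a middle ground, a build or entry script can pick the install command based on whether a lockfile actually exists. This is a minimal sketch of such a helper (`pick_install_cmd` is a hypothetical name, not part of this repo):

```shell
# Hypothetical helper: use `npm ci` only when a lockfile is present,
# since `npm ci` hard-fails without package-lock.json/npm-shrinkwrap.json.
pick_install_cmd() {
    # $1 = project directory to inspect
    if [ -f "$1/package-lock.json" ] || [ -f "$1/npm-shrinkwrap.json" ]; then
        echo "npm ci --omit=dev"        # reproducible install from the lockfile
    else
        echo "npm install --omit=dev"   # no lockfile: npm ci would fail
    fi
}
```

Committing a lockfile and always using `npm ci` is still the more reproducible option; this guard just avoids a hard build failure in the meantime.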
```powershell
Write-Host "Launching both bots..." -ForegroundColor Cyan
docker compose --profile both up -d
```
`docker compose --profile both up -d` will fail because this repo's compose profiles are `local`, `monitoring`, `cloud`, `discord`, `litellm`, and `viaproxy` (there is no `both`). Consider either adding a `both` profile in `docker-compose.yml` or updating this script to use existing profiles / no profile (and optionally gating `-d` on `$Detach`).
```diff
- docker compose --profile both up -d
+ if ($Detach) {
+     docker compose up -d
+ } else {
+     docker compose up
+ }
```
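Failures like this can also be caught at runtime by validating the requested profile against the set the compose file defines. A sketch, with the profile list hard-coded for illustration (a real script could obtain it from `docker compose config --profiles`):

```shell
# Sketch: refuse to launch an undefined compose profile.
profile_exists() {
    # $1 = requested profile; remaining args = available profiles
    req="$1"; shift
    for p in "$@"; do
        [ "$p" = "$req" ] && return 0
    done
    return 1
}

# Profiles this PR's docker-compose.yml actually defines:
AVAILABLE="local monitoring cloud discord litellm viaproxy"
```

With this guard, `profile_exists both $AVAILABLE` fails and the script can print the available profiles instead of letting `docker compose` error out mid-launch.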
```yaml
image: node:22-slim
container_name: discord-bot
working_dir: /app
command: sh -c "npm install --production 2>/dev/null; node discord-bot.js"
```
The Discord bot container runs `npm install` on every start and discards stderr (`2>/dev/null`), which can hide install failures and makes restarts slow and non-deterministic. Prefer baking dependencies into an image (or using `npm ci` with a lockfile), and avoid suppressing install errors.
```diff
- command: sh -c "npm install --production 2>/dev/null; node discord-bot.js"
+ command: sh -c "npm ci --omit=dev; node discord-bot.js"
```
```yaml
environment:
  EULA: "TRUE"
  TYPE: "PAPER"
  VERSION: "LATEST" # Latest MC version — ensure mineflayer supports it
```
Using floating tags (`itzg/minecraft-server` without a tag plus `VERSION: "LATEST"`) makes deployments non-reproducible and can break unexpectedly when upstream releases change. Pin the Docker image tag and/or Minecraft version to a known-good value, then update deliberately.
```diff
- VERSION: "LATEST" # Latest MC version — ensure mineflayer supports it
+ VERSION: "1.20.4" # Pinned MC version for reproducible deployments
```
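A small lint can catch floating tags before they reach CI or a deploy. This sketch assumes simple one-line `image: name[:tag]` compose entries (registries with a port in the hostname would need a smarter parser):

```shell
# Sketch: flag compose services whose image has no tag or uses :latest.
flag_floating_images() {
    # $1 = path to a docker-compose file
    awk '/^[[:space:]]*image:/ {
        img = $2
        if (img !~ /:/ || img ~ /:latest$/) print "floating tag: " img
    }' "$1"
}
```

Running it against a compose file would report `itzg/minecraft-server` (untagged) and anything pinned to `:latest`, while leaving explicitly versioned images alone.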
```yaml
discord-bot:
  image: app-mindcraft
  container_name: discord-bot
  working_dir: /app
  command: node discord-bot.js
  volumes:
```
`discord-bot` specifies `image: app-mindcraft`, but this compose file doesn't define or build an image with that name (and there's no `build:` here). Unless `app-mindcraft` is built/pushed separately, `docker compose up` will fail. Consider `build: .` (possibly with a different Dockerfile/target), or set an explicit `image:` on the mindcraft build and reuse it here.
```bash
read -r -p "Are you sure? Type 'yes' to confirm: " CONFIRM
[[ "$CONFIRM" == "yes" ]] || { echo "Aborted."; exit 0; }

SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no"
```
The restore path uses SSH with `-o StrictHostKeyChecking=no`, which turns off verification of the EC2 host's SSH key. This allows a man-in-the-middle attacker to impersonate the instance, have restore operations executed against their server, and potentially exfiltrate world data or secrets restored from S3. Configure SSH to verify the server host key (e.g., `StrictHostKeyChecking=accept-new` plus a pinned key) instead of disabling host key checking.
```diff
- SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no"
+ # Prefer a pinned EC2 host key if provided, otherwise safely accept new hosts
+ if [[ -n "${EC2_HOST_KEY:-}" ]]; then
+     KNOWN_HOSTS_FILE="$(mktemp)"
+     echo "${EC2_HOST_KEY}" > "${KNOWN_HOSTS_FILE}"
+     SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=yes -o UserKnownHostsFile=${KNOWN_HOSTS_FILE}"
+ else
+     SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=accept-new"
+ fi
```
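For the pinned-key variant, the host key has to be recorded once through a trusted channel. A minimal sketch of the end state, with hypothetical IP and file paths (the one-time `ssh-keyscan` should ideally be cross-checked against the fingerprint shown in the EC2 console or fetched via SSM, since running it over the same untrusted network defeats the purpose):

```shell
# Sketch: SSH options that verify a pinned host key instead of disabling checks.
# One-time, out of band:
#   ssh-keyscan -t ed25519 "$EC2_IP" > "$KNOWN_HOSTS"
EC2_IP="203.0.113.10"            # hypothetical instance IP
KEY_FILE="./mindcraft-ec2.pem"   # hypothetical key path
KNOWN_HOSTS="./ec2_known_hosts"
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=yes -o UserKnownHostsFile=${KNOWN_HOSTS}"
```

Every script in `aws/` can then share the same `SSH_OPTS` and known-hosts file, so the fix lands once rather than per script.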
```bash
    return
fi
info "Stopping Mindcraft containers on EC2..."
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=5"
```
In `stop_aws`, SSH is invoked with `-o StrictHostKeyChecking=no`, disabling host key verification for the connections used to stop containers on EC2. With host key checks turned off, a network attacker who can intercept traffic could impersonate the EC2 host and have arbitrary commands executed with your EC2 SSH key, leading to full compromise of the remote environment. Use SSH host key verification (for example `StrictHostKeyChecking=accept-new` with a pinned host key) instead of disabling it.
```diff
- SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=5"
+ SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5"
```
```bash
curl -s "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
unzip -q /tmp/awscliv2.zip -d /tmp
/tmp/aws/install
```
`aws/user-data.sh` downloads and executes the AWS CLI installer via `curl` without any checksum or signature verification (`curl ... awscli-exe-linux-x86_64.zip` → `unzip` → `/tmp/aws/install`). If an attacker can tamper with or man-in-the-middle that download, they can execute arbitrary code as root during EC2 bootstrap, fully compromising the instance before other controls are applied. Fetch the installer from a trusted location and verify an official checksum or signature (or use the distro package manager) before invoking the installer binary.
```dockerfile
RUN curl -fsSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip \
    && unzip -q /tmp/awscliv2.zip -d /tmp \
    && /tmp/aws/install \
    && rm -rf /tmp/awscliv2.zip /tmp/aws
```
The Docker build likewise downloads and runs the AWS CLI installer via `curl` without any checksum or signature verification. If the `awscli-exe-linux-x86_64.zip` payload is ever compromised or intercepted, the build process will execute attacker-controlled code as root inside the build environment, tainting the resulting image and any workloads that use it. Fetch the installer from a trusted source and verify an official checksum or signature (or rely on the base image's package manager) before running `/tmp/aws/install`.
```bash
[[ -n "${KEY_FILE:-}" ]] || error "KEY_FILE not set in config.env"
[[ -f "$KEY_FILE" ]] || error "SSH key not found: ${KEY_FILE}. Run aws/setup.sh first."

SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=10"
```
Using `ssh` with `-o StrictHostKeyChecking=no` disables SSH host key verification, so this deploy script will trust any server claiming the EC2 IP and can be transparently man-in-the-middled. An attacker who can intercept network traffic could impersonate the instance and have these commands executed against their host with your private key, gaining remote code execution and access to any secrets or AWS credentials on that host. Configure SSH to verify the server identity (for example by using `StrictHostKeyChecking=accept-new` or a pinned host key in `known_hosts`) instead of disabling host key checking.
```diff
- SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=10"
+ SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=accept-new -o UserKnownHostsFile=${SCRIPT_DIR}/known_hosts -o ConnectTimeout=10"
```
AWS deployment scripts, multi-container Docker stack, Grafana dashboards, and production hardening. See branch for full diff.