feat: infrastructure — AWS EC2 deploy, Docker stack, Grafana, Tailscale#724

Open
Z0mb13V1 wants to merge 1 commit into mindcraft-bots:develop from Z0mb13V1:feat/pr5-infrastructure

Z0mb13V1 commented Mar 4, 2026

AWS deployment scripts, multi-container Docker stack, Grafana dashboards, and production hardening. See branch for full diff.

- aws/: EC2 deploy/teardown scripts, S3 backup/restore, Ollama proxy setup,
  user-data bootstrap, env toggle for prod/dev switching
- docker-compose.yml: multi-container stack (bot + viaproxy + chromadb)
- docker-compose.aws.yml: EC2 production override with LiteLLM proxy + Tailscale
- Dockerfile: multi-stage build, non-root node user, secrets excluded from context
- Tasks.Dockerfile: isolated task runner container
- prometheus-aws.yml: Prometheus scrape config for EC2 metrics
- grafana-provisioning/: pre-built dashboards and alerting rules
- start.ps1: cross-platform startup helper
Copilot AI review requested due to automatic review settings March 4, 2026 01:05

Copilot AI left a comment

Pull request overview

This PR introduces an AWS EC2 deployment path and a production-oriented Docker Compose stack (Minecraft + Mindcraft agents + Discord bot + monitoring), along with supporting bootstrap/ops scripts and Grafana/Prometheus provisioning.

Changes:

  • Add AWS infrastructure lifecycle scripts (setup/bootstrap/deploy/backup/restore/teardown) and an EC2-focused compose stack.
  • Expand local Docker Compose into a multi-service stack (Minecraft server, Mindcraft agents, optional Discord bot/LiteLLM/ViaProxy, monitoring stubs).
  • Add monitoring configuration (Prometheus scrape config + Grafana provisioning skeleton).

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 24 comments.

| File | Description |
| --- | --- |
| start.ps1 | Adds a Windows helper to launch compose profiles for bots. |
| services/viaproxy/README.md | Updates ViaProxy setup documentation/formatting. |
| prometheus-aws.yml | Adds Prometheus scrape configuration for AWS stack (node-exporter/cAdvisor). |
| grafana-provisioning/datasources.yml | Provisions Prometheus datasource in Grafana. |
| grafana-provisioning/dashboards.yml | Configures Grafana dashboard file provider. |
| grafana-provisioning/dashboard-json/.gitkeep | Keeps dashboards JSON directory in git. |
| grafana-provisioning/alerting/rules.yml | Adds alerting provisioning file (rule deletion stub). |
| grafana-provisioning/alerting/.gitkeep | Keeps alerting directory in git. |
| docker-compose.yml | Reworks local stack: Minecraft server + agents + optional Discord/LiteLLM/ViaProxy + GPU exporter stub. |
| docker-compose.aws.yml | Adds EC2/AWS compose stack with Minecraft, agents, Discord, ChromaDB, monitoring, backups, Tailscale. |
| aws/user-data.sh | EC2 first-boot bootstrap (Docker, AWS CLI, directories, placeholder cron tab). |
| aws/setup.sh | Provisions AWS infra (VPC/SG/S3/IAM/SSM/EC2) from local machine. |
| aws/deploy.sh | Rsync-based deployment to EC2 + SSM secret materialization + compose up. |
| aws/ec2-go.sh | One-command deploy helper (local or remote) for pull/secrets/build/restart. |
| aws/ec2-deploy.sh | EC2-internal bootstrap/update script (clone/pull, SSM secrets, compose up). |
| aws/setup-ollama-proxy.sh | Creates a systemd socat proxy to reach Ollama over Tailscale. |
| aws/backup.sh | S3 backup script for world + bot memory (cron-friendly). |
| aws/restore.sh | S3 restore script for world + bot memory. |
| aws/teardown.sh | Destroys AWS resources created by setup (optionally S3). |
| aws/env-toggle.sh | Utility to toggle between AWS and local environments (advisory for local). |
| aws/s3-policy.json | Template S3 bucket policy reference. |
| Tasks.Dockerfile | Updates benchmark/tasks image build (Node 22 + Java 21 + AWS CLI + non-root). |
| Dockerfile | Updates app image build steps (caching, tests during build, non-root runtime). |
| .dockerignore | Tightens build context to avoid baking secrets/runtime data into images. |

COPY ./server_data.zip /mindcraft/
RUN unzip -q server_data.zip && rm server_data.zip

RUN npm ci --omit=dev

Copilot AI Mar 4, 2026

npm ci --omit=dev requires a lockfile (package-lock.json/npm-shrinkwrap.json). This repo doesn’t include a lockfile (and .gitignore excludes package-lock.json), so this step will fail. Either commit a lockfile and keep using npm ci, or switch back to npm install --omit=dev for this image.

Suggested change
RUN npm ci --omit=dev
RUN npm install --omit=dev

Copilot uses AI. Check for mistakes.
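
One way to keep the image build working whether or not a lockfile is committed is to pick the install command at build time. The sketch below is an illustration only: `choose_npm_cmd` is a hypothetical helper, not something in this PR, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print the npm command that
# matches the lockfile situation in the given directory.
choose_npm_cmd() {
  dir="${1:-.}"
  if [ -f "$dir/package-lock.json" ] || [ -f "$dir/npm-shrinkwrap.json" ]; then
    # A lockfile exists, so the reproducible install is available.
    echo "npm ci --omit=dev"
  else
    # No lockfile: npm ci would fail, so fall back to npm install.
    echo "npm install --omit=dev"
  fi
}
```

Committing a lockfile is still the more reproducible fix; a fallback like this only avoids a hard build failure in the meantime.
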
)

Write-Host "Launching both bots..." -ForegroundColor Cyan
docker compose --profile both up -d

Copilot AI Mar 4, 2026

docker compose --profile both up -d will fail because this repo's compose profiles are local, monitoring, cloud, discord, litellm, viaproxy (no both). Consider either adding a both profile in docker-compose.yml or updating this script to use existing profiles / no profile (and optionally gate -d on $Detach).

Suggested change
docker compose --profile both up -d
if ($Detach) {
    docker compose up -d
} else {
    docker compose up
}
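
`docker compose config --profiles` lists the profiles a compose file defines, so a launcher can check a profile name before passing it to `--profile`. A minimal POSIX shell sketch of that check follows (the script under review is PowerShell; `profile_exists` is a hypothetical name used only for illustration).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first argument appears among
# the remaining arguments (the known profile names).
profile_exists() {
  wanted="$1"
  shift
  for p in "$@"; do
    [ "$p" = "$wanted" ] && return 0
  done
  return 1
}

# Example wiring (requires Docker, so it is left commented out):
#   profiles="$(docker compose config --profiles)"
#   profile_exists both $profiles || echo "no such profile: both" >&2
```
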

image: node:22-slim
container_name: discord-bot
working_dir: /app
command: sh -c "npm install --production 2>/dev/null; node discord-bot.js"

Copilot AI Mar 4, 2026

The Discord bot container runs npm install on every start and discards stderr (2>/dev/null), which can hide install failures and makes restarts slow/non-deterministic. Prefer baking dependencies into an image (or using npm ci with a lockfile) and avoid suppressing install errors.

Suggested change
command: sh -c "npm install --production 2>/dev/null; node discord-bot.js"
command: sh -c "npm ci --omit=dev && node discord-bot.js"
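
The choice between `;` and `&&` matters for this command: with `;` the bot starts even when the install step fails. A small demonstration, using `false` as a stand-in for a failing install (an assumption for illustration; no npm is involved):

```shell
#!/bin/sh
# `false` stands in for an install step that fails.

# With ';' and stderr discarded, the failure is invisible and the
# second command still runs.
masked="$(sh -c 'false 2>/dev/null; echo started')"

# With '&&' the second command is skipped and the failure surfaces.
guarded="$(sh -c 'false && echo started' || echo 'install failed')"

echo "masked:  $masked"
echo "guarded: $guarded"
```
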

environment:
EULA: "TRUE"
TYPE: "PAPER"
VERSION: "LATEST" # Latest MC version — ensure mineflayer supports it

Copilot AI Mar 4, 2026

Using floating tags (itzg/minecraft-server without a tag + VERSION: "LATEST") makes deployments non-reproducible and can break unexpectedly when upstream releases change. Pin the Docker image tag and/or Minecraft version to a known-good value, then update deliberately.

Suggested change
VERSION: "LATEST" # Latest MC version — ensure mineflayer supports it
VERSION: "1.20.4" # Pinned MC version for reproducible deployments
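
A deploy script could refuse floating values before starting the stack. The sketch below assumes Minecraft versions look like dotted numbers such as `1.20.4`; `is_pinned_version` is a hypothetical helper and the check is deliberately loose.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for values made of digits and dots
# that contain at least one dot, such as 1.20.4.
is_pinned_version() {
  case "$1" in
    ''|*[!0-9.]*) return 1 ;;  # empty, or has other characters (e.g. LATEST)
    *.*) return 0 ;;           # digits and dots with at least one dot
    *) return 1 ;;             # digits only, no dot
  esac
}

# Example use:
#   is_pinned_version "$VERSION" || { echo "pin VERSION first" >&2; exit 1; }
```
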

Comment on lines +87 to +92
discord-bot:
image: app-mindcraft
container_name: discord-bot
working_dir: /app
command: node discord-bot.js
volumes:

Copilot AI Mar 4, 2026

discord-bot specifies image: app-mindcraft, but this compose file doesn't define/build an image with that name (and there’s no build: here). Unless app-mindcraft is built/pushed separately, docker compose up will fail. Consider build: . (possibly with a different Dockerfile/target) or set an explicit image: on the mindcraft build and reuse it here.
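
Until the compose file builds the image itself, a pre-flight check can at least fail early with a clear message. This sketch relies on `docker image inspect`, which exits non-zero when the image is not present locally; `image_available` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed if the named image exists in the
# local Docker image store.
image_available() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Example use before `docker compose up`:
#   image_available app-mindcraft || echo "build or pull app-mindcraft first" >&2
```
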

read -r -p "Are you sure? Type 'yes' to confirm: " CONFIRM
[[ "$CONFIRM" == "yes" ]] || { echo "Aborted."; exit 0; }

SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no"

Copilot AI Mar 4, 2026

The restore path uses SSH with -o StrictHostKeyChecking=no, which turns off verification of the EC2 host’s SSH key. This allows a man-in-the-middle attacker to impersonate the instance, have restore operations executed against their server, and potentially exfiltrate world data or secrets restored from S3. Configure SSH to verify the server host key (e.g., StrictHostKeyChecking=accept-new plus a pinned key) instead of disabling host key checking.

Suggested change
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no"
# Prefer a pinned EC2 host key if provided, otherwise safely accept new hosts
if [[ -n "${EC2_HOST_KEY:-}" ]]; then
  KNOWN_HOSTS_FILE="$(mktemp)"
  echo "${EC2_HOST_KEY}" > "${KNOWN_HOSTS_FILE}"
  SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=yes -o UserKnownHostsFile=${KNOWN_HOSTS_FILE}"
else
  SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=accept-new"
fi
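
A temporary known_hosts file created with `mktemp` is not removed automatically. The sketch below shows one way to create it and clean it up on exit, assuming `EC2_HOST_KEY` holds a full known_hosts line (host, key type, key); `write_known_hosts` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: write one known_hosts line to a private temp file
# and print the file's path.
write_known_hosts() {
  file="$(mktemp)" || return 1
  chmod 600 "$file"
  printf '%s\n' "$1" > "$file"
  echo "$file"
}

# Example wiring:
#   KNOWN_HOSTS_FILE="$(write_known_hosts "$EC2_HOST_KEY")"
#   trap 'rm -f "$KNOWN_HOSTS_FILE"' EXIT
```
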

return
fi
info "Stopping Mindcraft containers on EC2..."
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=5"

Copilot AI Mar 4, 2026

In stop_aws, SSH is invoked with -o StrictHostKeyChecking=no, disabling host key verification for connections used to stop containers on EC2. With host key checks turned off, a network attacker who can intercept traffic could impersonate the EC2 host and have arbitrary commands executed with your EC2 SSH key, leading to full compromise of the remote environment. Use SSH host key verification (for example StrictHostKeyChecking=accept-new with a pinned host key) instead of disabling it.

Suggested change
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=5"
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5"

Comment on lines +50 to +52
curl -s "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
unzip -q /tmp/awscliv2.zip -d /tmp
/tmp/aws/install

Copilot AI Mar 4, 2026

aws/user-data.sh downloads and executes the AWS CLI installer via curl without any checksum or signature verification (curl ... awscli-exe-linux-x86_64.zip → unzip → /tmp/aws/install). If an attacker can tamper with or man-in-the-middle that download, they can execute arbitrary code as root during EC2 bootstrap, fully compromising the instance before other controls are applied. Fetch the installer from a trusted location and verify an official checksum or signature (or use the distro package manager) before invoking the installer binary.
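
A checksum gate before running the installer could look like the sketch below. It assumes `sha256sum` is available (true on typical Linux AMIs) and that the expected digest is a value you recorded yourself from a trusted download; `verify_sha256` is a hypothetical helper. AWS also documents a PGP signature check for this installer, which is the stronger option.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file's SHA-256 digest equals
# the expected hex string.
verify_sha256() {
  actual="$(sha256sum "$1" | cut -d' ' -f1)"
  [ "$actual" = "$2" ]
}

# Example wiring (EXPECTED_SHA256 is a placeholder you must supply):
#   verify_sha256 /tmp/awscliv2.zip "$EXPECTED_SHA256" || exit 1
#   unzip -q /tmp/awscliv2.zip -d /tmp && /tmp/aws/install
```
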

Comment on lines +24 to +27
RUN curl -fsSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip \
&& unzip -q /tmp/awscliv2.zip -d /tmp \
&& /tmp/aws/install \
&& rm -rf /tmp/awscliv2.zip /tmp/aws

Copilot AI Mar 4, 2026

The Docker build downloads and runs the AWS CLI installer via curl without any checksum or signature verification. If the awscli-exe-linux-x86_64.zip payload is ever compromised or intercepted, the build process will execute attacker-controlled code as root inside the build environment, tainting the resulting image and any workloads that use it. Fetch the installer from a trusted source and verify an official checksum or signature (or rely on the base image’s package manager) before running /tmp/aws/install.

[[ -n "${KEY_FILE:-}" ]] || error "KEY_FILE not set in config.env"
[[ -f "$KEY_FILE" ]] || error "SSH key not found: ${KEY_FILE}. Run aws/setup.sh first."

SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=10"

Copilot AI Mar 4, 2026

Using ssh with -o StrictHostKeyChecking=no disables SSH host key verification, so this deploy script will trust any server claiming the EC2 IP and can be transparently man-in-the-middled. An attacker who can intercept network traffic could impersonate the instance and have these commands executed against their host with your private key, gaining remote code execution and access to any secrets or AWS credentials on that host. Configure SSH to verify the server identity (for example by using StrictHostKeyChecking=accept-new or a pinned host key in known_hosts) instead of disabling host key checking.

Suggested change
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=no -o ConnectTimeout=10"
SSH_OPTS="-i ${KEY_FILE} -o StrictHostKeyChecking=accept-new -o UserKnownHostsFile=${SCRIPT_DIR}/known_hosts -o ConnectTimeout=10"
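
The same SSH options are flagged in several scripts in this review (restore, stop, deploy), so a shared helper would keep them consistent. This is a sketch only: `build_ssh_opts` and its argument order are hypothetical, and it assumes the paths contain no spaces, because the result is expanded unquoted on the ssh command line.

```shell
#!/bin/sh
# Hypothetical shared helper: print SSH options that keep host key
# verification on. Arguments: key file, known_hosts file, timeout (seconds).
build_ssh_opts() {
  key_file="$1"
  known_hosts="$2"
  timeout="${3:-10}"
  printf '%s' "-i ${key_file} -o StrictHostKeyChecking=accept-new -o UserKnownHostsFile=${known_hosts} -o ConnectTimeout=${timeout}"
}

# Example wiring:
#   SSH_OPTS="$(build_ssh_opts "$KEY_FILE" "$SCRIPT_DIR/known_hosts" 10)"
```
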
