diff --git a/README.md b/README.md
index d945b5c..89e1fba 100644
--- a/README.md
+++ b/README.md
@@ -11,12 +11,13 @@
[](https://github.com/TerrifiedBug/vectorflow/actions/workflows/ci.yml)
[](https://github.com/TerrifiedBug/vectorflow/releases)
[](LICENSE)
+[](https://terrifiedbug.gitbook.io/vectorflow)
**🚀 Design, deploy, and monitor [Vector](https://vector.dev) data pipelines -- visually**
Stop hand-editing YAML. Build observability pipelines with drag-and-drop
and deploy them across your entire fleet from a single dashboard.
-[Quick Start](#-quick-start) · [Deployment](#-deployment) · [Features](#-features) · [Configuration](#%EF%B8%8F-configuration) · [Development](#-development)
+[Documentation](https://terrifiedbug.gitbook.io/vectorflow) · [Quick Start](#-quick-start) · [Deployment](#-deployment) · [Features](#-features) · [Configuration](#%EF%B8%8F-configuration) · [Development](#-development)
diff --git a/docs/public/.gitbook.yaml b/docs/public/.gitbook.yaml
new file mode 100644
index 0000000..6acdd6a
--- /dev/null
+++ b/docs/public/.gitbook.yaml
@@ -0,0 +1,5 @@
+root: ./
+
+structure:
+  readme: README.md
+  summary: SUMMARY.md
diff --git a/docs/public/.gitbook/gitbook-skill.md b/docs/public/.gitbook/gitbook-skill.md
new file mode 100644
index 0000000..ddc02e0
--- /dev/null
+++ b/docs/public/.gitbook/gitbook-skill.md
@@ -0,0 +1,64 @@
+# GitBook Documentation Editing Skill
+
+This guide enables AI tools to write GitBook-compatible markdown.
+
+## Key Formatting Elements
+
+GitBook extends standard markdown with custom blocks:
+
+### Tabs
+{% tabs %}
+{% tab title="Docker" %}
+Content for Docker tab
+{% endtab %}
+{% tab title="Standalone" %}
+Content for Standalone tab
+{% endtab %}
+{% endtabs %}
+
+### Hints / Callouts
+{% hint style="info" %}
+Informational callout
+{% endhint %}
+
+{% hint style="warning" %}
+Warning callout
+{% endhint %}
+
+{% hint style="danger" %}
+Danger callout
+{% endhint %}
+
+{% hint style="success" %}
+Success callout
+{% endhint %}
+
+### Steppers
+{% stepper %}
+{% step %}
+### Step Title
+Step content
+{% endstep %}
+{% endstepper %}
+
+### Expandable Content
+
+<details>
+
+<summary>Click to expand</summary>
+
+Hidden content here
+
+</details>
+
+
+## Configuration
+
+- **.gitbook.yaml** -- Space configuration (root directory, readme/summary paths)
+- **SUMMARY.md** -- Table of contents defining sidebar navigation
+- **/.gitbook/vars.yaml** -- Space-level reusable variables
+
+## Writing Guidelines
+
+- Use hierarchical headings (H1-H3)
+- Keep paragraphs short, use bullet points
+- Include code snippets and practical examples
+- Minimize jargon
+- Use tabs for platform-specific content (Docker vs Standalone, Linux vs macOS)
+- Use hints for important callouts
+- Use steppers for sequential procedures
diff --git a/docs/public/README.md b/docs/public/README.md
new file mode 100644
index 0000000..654c6eb
--- /dev/null
+++ b/docs/public/README.md
@@ -0,0 +1,47 @@
+# Introduction
+
+VectorFlow is a visual pipeline management platform for [Vector](https://vector.dev), the high-performance observability data pipeline. It gives teams a web-based UI for designing, deploying, and monitoring Vector pipelines across an entire fleet of nodes -- no hand-edited TOML required.
+
+
+
+## Why VectorFlow?
+
+Observability pipelines are critical infrastructure, yet managing them usually means juggling config files across dozens of servers. VectorFlow replaces that workflow with a centralized control plane:
+
+- **Visual Pipeline Editor** -- Drag-and-drop sources, transforms, and sinks onto a canvas. The editor generates valid Vector configuration automatically.
+- **Fleet Management** -- Enroll agents on every node and push pipeline updates from a single dashboard. See which agents are online, what they are running, and roll back instantly.
+- **Multi-Environment Support** -- Maintain separate development, staging, and production environments. Promote pipelines through your deployment lifecycle with confidence.
+- **VRL Snippet Testing** -- Write and test Vector Remap Language transforms interactively before they reach production.
+- **Real-Time Metrics** -- Monitor throughput, error rates, and pipeline health at a glance.
+- **Alerting** -- Define alert rules so you know the moment a pipeline degrades.
+
+## Technology
+
+| Layer | Technology |
+|-------|------------|
+| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS |
+| Flow Editor | React Flow (@xyflow/react) |
+| Code Editor | Monaco Editor with VRL syntax |
+| API | tRPC (type-safe RPC) |
+| Authentication | NextAuth (credentials + OIDC) |
+| Database | PostgreSQL, Prisma ORM |
+| Agent | Go (zero external dependencies) |
+| Data Engine | Vector (vector.dev) |
+
+{% hint style="info" %}
+**Get started in 5 minutes.** Follow the [Quick Start](getting-started/quick-start.md) guide to spin up VectorFlow with Docker and build your first pipeline.
+{% endhint %}
+
+## Quick Links
+
+| | |
+|---|---|
+| **Getting Started** | [Quick Start](getting-started/quick-start.md) -- Install and run VectorFlow |
+| **Deploy** | [Server](getting-started/deploy-server.md) -- Set up the VectorFlow server |
+| | [Agents](getting-started/deploy-agents.md) -- Enroll agents on your nodes |
+| **Learn** | [Pipeline Editor](user-guide/pipeline-editor.md) -- Build pipelines visually |
+| | [Fleet Management](user-guide/fleet.md) -- Manage your agent fleet |
+| **Operate** | [Architecture](operations/architecture.md) -- Understand how it all fits together |
+| | [Configuration](operations/configuration.md) -- Environment variables and settings |
+| **Reference** | [API](reference/api.md) -- Full API documentation |
+| | [Pipeline YAML](reference/pipeline-yaml.md) -- Pipeline configuration format |
diff --git a/docs/public/SUMMARY.md b/docs/public/SUMMARY.md
new file mode 100644
index 0000000..51f211c
--- /dev/null
+++ b/docs/public/SUMMARY.md
@@ -0,0 +1,37 @@
+# Summary
+
+* [Introduction](README.md)
+
+## Getting Started
+
+* [Quick Start](getting-started/quick-start.md)
+* [Deploy the Server](getting-started/deploy-server.md)
+* [Deploy Agents](getting-started/deploy-agents.md)
+* [Your First Pipeline](getting-started/first-pipeline.md)
+
+## User Guide
+
+* [Dashboard](user-guide/dashboard.md)
+* [Pipelines](user-guide/pipelines.md)
+* [Pipeline Editor](user-guide/pipeline-editor.md)
+* [VRL Snippets](user-guide/vrl-snippets.md)
+* [Environments](user-guide/environments.md)
+* [Fleet Management](user-guide/fleet.md)
+* [Alerts](user-guide/alerts.md)
+* [Templates](user-guide/templates.md)
+
+## Operations
+
+* [Architecture](operations/architecture.md)
+* [Configuration](operations/configuration.md)
+* [Authentication](operations/authentication.md)
+* [Backup & Restore](operations/backup-restore.md)
+* [Security](operations/security.md)
+* [Upgrading](operations/upgrading.md)
+
+## Reference
+
+* [API Reference](reference/api.md)
+* [Agent Reference](reference/agent.md)
+* [Database Schema](reference/database.md)
+* [Pipeline YAML](reference/pipeline-yaml.md)
diff --git a/docs/public/getting-started/deploy-agents.md b/docs/public/getting-started/deploy-agents.md
new file mode 100644
index 0000000..12ca870
--- /dev/null
+++ b/docs/public/getting-started/deploy-agents.md
@@ -0,0 +1,208 @@
+# Deploy Agents
+
+The VectorFlow agent (`vf-agent`) is a lightweight Go binary that runs on each node in your fleet. It manages Vector processes, pulls pipeline configurations from the server, and reports health via heartbeats.
+
+{% hint style="info" %}
+**Pull-based architecture** -- agents poll the server for config updates. No inbound ports are required on your fleet nodes. The agent initiates all connections, making it firewall-friendly and easy to deploy behind NAT.
+{% endhint %}
+
+## How it works
+
+1. The agent starts and **enrolls** with the VectorFlow server using a one-time enrollment token
+2. After enrollment, the server issues a persistent **node token** stored on disk
+3. The agent **polls** the server every 15 seconds (configurable) for pipeline configuration changes
+4. When a new config is received, the agent writes it to disk and starts or restarts the Vector process
+5. The agent sends **heartbeats** with host metrics (CPU, memory, disk, network) so the server can track fleet health
+
+Each pipeline runs as an isolated Vector process -- a crash in one pipeline does not affect others.
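+
+As an illustrative sketch (file names are examples, not the exact layout), the agent's state directory described above looks roughly like this:
+
+```
+/var/lib/vf-agent/
+├── node-token              # persistent credential issued at enrollment
+└── pipelines/
+    └── <pipeline-id>/
+        └── vector.yaml     # configuration pulled from the server
+```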
+
+## Enrollment flow
+
+{% stepper %}
+{% step %}
+### Generate an enrollment token
+
+In the VectorFlow UI, navigate to the **Fleet** page. Select the environment you want to enroll the agent into, then click **Add Node**.
+
+VectorFlow generates a one-time enrollment token. Copy it -- you will need it in the next step.
+
+{% hint style="warning" %}
+Enrollment tokens are single-use. Each agent needs its own token. Generate a new one for each node you want to enroll.
+{% endhint %}
+{% endstep %}
+
+{% step %}
+### Deploy the agent
+
+Choose one of the deployment methods below and start the agent with the enrollment token.
+{% endstep %}
+
+{% step %}
+### Verify enrollment
+
+Once the agent starts, it enrolls with the server and appears on the **Fleet** page with an **Online** status. You can now deploy pipelines to this node.
+
+If the agent does not appear, check its logs for enrollment errors (`docker compose logs -f` or `journalctl -u vf-agent -f`).
+{% endstep %}
+{% endstepper %}
+
+## Deployment methods
+
+{% tabs %}
+{% tab title="Docker" %}
+The simplest approach -- ideal for containerized environments.
+
+```bash
+mkdir -p vectorflow-agent && cd vectorflow-agent
+
+curl -sSfL -o docker-compose.yml \
+ https://raw.githubusercontent.com/TerrifiedBug/vectorflow/main/docker/agent/docker-compose.yml
+```
+
+Create a `.env` file with your server URL and enrollment token:
+
+```bash
+cat > .env << 'EOF'
+VF_URL=https://your-vectorflow-server:3000
+VF_TOKEN=paste-enrollment-token-here
+EOF
+```
+
+Start the agent:
+
+```bash
+docker compose up -d
+```
+
+The Docker image bundles Vector, so no additional dependencies are needed. Two named volumes persist agent state and Vector data across restarts:
+
+| Volume | Mount point | Contents |
+|--------|-------------|----------|
+| `vectorflow-agent-data` | `/var/lib/vf-agent` | Node token, pipeline configs |
+| `vectorflow-vector-data` | `/var/lib/vector` | Vector checkpoints and buffer data |
+
+The agent container uses `network_mode: host` so Vector can bind to local ports (e.g., for syslog or socket sources).
+
+{% hint style="warning" %}
+After the first successful enrollment, the agent persists a node token to disk. The `VF_TOKEN` enrollment token is no longer needed. You can remove it from your `.env` file, but leaving it has no effect.
+{% endhint %}
+{% endtab %}
+
+{% tab title="Install script" %}
+The install script downloads the agent binary, installs Vector if needed, and configures a systemd service -- all in one command.
+
+```bash
+curl -sSfL https://raw.githubusercontent.com/TerrifiedBug/vectorflow/main/agent/install.sh | \
+  sudo bash -s -- --url https://vectorflow.example.com --token <enrollment-token>
+```
+
+**Managing the service:**
+
+```bash
+systemctl status vf-agent # Check status
+journalctl -u vf-agent -f # Follow logs
+sudo systemctl restart vf-agent # Restart
+```
+
+**Upgrading:**
+
+```bash
+# Upgrade to the latest release
+curl -sSfL https://raw.githubusercontent.com/TerrifiedBug/vectorflow/main/agent/install.sh | sudo bash
+
+# Install a specific version
+curl -sSfL https://raw.githubusercontent.com/TerrifiedBug/vectorflow/main/agent/install.sh | \
+ sudo bash -s -- --version v0.3.0
+```
+
+Existing configuration at `/etc/vectorflow/agent.env` is preserved during upgrades.
+
+**Uninstalling:**
+
+```bash
+sudo systemctl stop vf-agent
+sudo systemctl disable vf-agent
+sudo rm /etc/systemd/system/vf-agent.service
+sudo systemctl daemon-reload
+sudo rm /usr/local/bin/vf-agent
+sudo rm -rf /var/lib/vf-agent /etc/vectorflow
+```
+{% endtab %}
+
+{% tab title="Standalone binary" %}
+Download the binary from the [Releases](https://github.com/TerrifiedBug/vectorflow/releases) page and run it directly. Useful for testing or environments without systemd.
+
+```bash
+# Download the latest release for your platform
+curl -sSfL -o vf-agent \
+ https://github.com/TerrifiedBug/vectorflow/releases/latest/download/vf-agent-linux-amd64
+chmod +x vf-agent
+```
+
+Run with environment variables:
+
+```bash
+VF_URL=https://your-vectorflow-server:3000 \
+VF_TOKEN=paste-enrollment-token-here \
+./vf-agent
+```
+
+Or with a manual systemd unit:
+
+```ini
+# /etc/systemd/system/vf-agent.service
+[Unit]
+Description=VectorFlow Agent
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+EnvironmentFile=/etc/vectorflow/agent.env
+ExecStart=/usr/local/bin/vf-agent
+Restart=on-failure
+RestartSec=5
+
+# Security hardening
+NoNewPrivileges=true
+ProtectSystem=strict
+ReadWritePaths=/var/lib/vf-agent
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Make sure Vector is installed and available in `PATH`, or set the `VF_VECTOR_BIN` variable to point to the binary.
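+
+For example, the `EnvironmentFile` referenced in the unit above could contain (illustrative values):
+
+```bash
+# /etc/vectorflow/agent.env
+VF_URL=https://vectorflow.example.com:3000
+VF_TOKEN=paste-enrollment-token-here
+VF_VECTOR_BIN=/usr/local/bin/vector
+```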
+{% endtab %}
+{% endtabs %}
+
+## Environment variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `VF_URL` | Yes | -- | VectorFlow server URL (e.g., `https://vectorflow.example.com:3000`) |
+| `VF_TOKEN` | First run | -- | One-time enrollment token from the UI. Not needed after successful enrollment. |
+| `VF_DATA_DIR` | No | `/var/lib/vf-agent` | Directory for agent state (node token, pipeline configs). Mapped to a Docker volume by default. |
+| `VF_VECTOR_BIN` | No | `vector` | Path to the Vector binary. Set automatically in the Docker image. |
+| `VF_POLL_INTERVAL` | No | `15s` | How often the agent polls for config changes. Uses Go duration format (`15s`, `1m`, `30s`). |
+| `VF_LOG_LEVEL` | No | `info` | Log verbosity: `debug`, `info`, `warn`, or `error`. |
+
+## Troubleshooting
+
+**Agent does not appear in Fleet**
+
+- Verify `VF_URL` is correct and reachable from the agent host
+- Check that the enrollment token has not already been used
+- Look at agent logs for `enrollment failed` errors
+
+**Agent shows as "Unhealthy"**
+
+- The server marks a node unhealthy after 3 missed heartbeats (default: 45 seconds)
+- Check network connectivity between the agent and server
+- Verify the agent process is running (`systemctl status vf-agent` or `docker ps`)
+
+**Pipeline not starting on agent**
+
+- Confirm the pipeline is deployed to the correct environment
+- Check agent logs for Vector process errors
+- Verify Vector is installed and accessible at the path in `VF_VECTOR_BIN`
diff --git a/docs/public/getting-started/deploy-server.md b/docs/public/getting-started/deploy-server.md
new file mode 100644
index 0000000..716f1e5
--- /dev/null
+++ b/docs/public/getting-started/deploy-server.md
@@ -0,0 +1,296 @@
+# Deploy the Server
+
+The VectorFlow server is a Next.js application backed by PostgreSQL. This page covers deployment options, environment variables, persistent storage, and production hardening.
+
+{% tabs %}
+{% tab title="Docker (recommended)" %}
+## Docker Compose
+
+The quickest path to production. The provided `docker-compose.yml` starts both the VectorFlow server and PostgreSQL.
+
+### 1. Download the Compose file
+
+```bash
+mkdir -p vectorflow && cd vectorflow
+
+curl -sSfL -o docker-compose.yml \
+ https://raw.githubusercontent.com/TerrifiedBug/vectorflow/main/docker/server/docker-compose.yml
+```
+
+### 2. Create your `.env` file
+
+```bash
+cat > .env << 'EOF'
+POSTGRES_PASSWORD=<strong-random-password>
+NEXTAUTH_SECRET=<random-32-char-secret>
+# NEXTAUTH_URL=https://vectorflow.example.com
+EOF
+```
+
+Generate secrets with `openssl rand -base64 32`.
+
+### 3. Start the stack
+
+```bash
+docker compose up -d
+```
+
+The entrypoint automatically runs database migrations on every start, so upgrades are handled by pulling a new image and restarting.
+
+### Compose file breakdown
+
+```yaml
+services:
+  postgres:
+    image: postgres:17-alpine
+    environment:
+      POSTGRES_DB: vectorflow
+      POSTGRES_USER: vectorflow
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
+    volumes:
+      - pgdata:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U vectorflow"]
+    restart: unless-stopped
+
+  vectorflow:
+    image: ghcr.io/terrifiedbug/vectorflow-server:${VF_VERSION:-latest}
+    depends_on:
+      postgres:
+        condition: service_healthy
+    ports:
+      - "3000:3000"
+    environment:
+      DATABASE_URL: postgresql://vectorflow:${POSTGRES_PASSWORD}@postgres:5432/vectorflow
+      NEXTAUTH_SECRET: ${NEXTAUTH_SECRET}
+      NEXTAUTH_URL: ${NEXTAUTH_URL}
+    volumes:
+      - vfdata:/app/.vectorflow
+      - backups:/backups
+    restart: unless-stopped
+
+volumes:
+  pgdata:
+  vfdata:
+  backups:
+```
+
+### Persistent volumes
+
+| Volume | Mount point | Contents |
+|--------|-------------|----------|
+| `vectorflow-pgdata` | `/var/lib/postgresql/data` | PostgreSQL database files |
+| `vectorflow-data` | `/app/.vectorflow` | Application state, system Vector config |
+| `vectorflow-backups` | `/backups` | Database backup snapshots |
+
+{% hint style="warning" %}
+Never delete the `vectorflow-pgdata` volume without a backup. All pipeline definitions, environments, users, and audit history live in PostgreSQL.
+{% endhint %}
+
+### Pinning a version
+
+By default the Compose file pulls `latest`. To pin a specific release:
+
+```bash
+VF_VERSION=v0.3.0 docker compose up -d
+```
+
+Or set `VF_VERSION` in your `.env` file.
+
+### Networking
+
+The server listens on port 3000. The PostgreSQL port is not exposed by default -- only the VectorFlow container can reach it over the internal Docker network. If you need direct database access for debugging:
+
+```yaml
+# Uncomment in docker-compose.yml
+ports:
+  - "127.0.0.1:5432:5432"
+```
+{% endtab %}
+
+{% tab title="Standalone" %}
+## Standalone deployment
+
+Run VectorFlow directly on a Linux host without Docker. This approach gives you full control over the process manager, database, and networking.
+
+### Prerequisites
+
+- **Node.js 22+** and **pnpm**
+- **PostgreSQL 17** (running and accessible)
+- **Vector 0.44.0+** binary (for pipeline validation and VRL testing)
+
+### 1. Download the release
+
+Download the latest release archive from the [Releases](https://github.com/TerrifiedBug/vectorflow/releases) page and extract it.
+
+```bash
+curl -sSfL -o vectorflow.tar.gz \
+ https://github.com/TerrifiedBug/vectorflow/releases/latest/download/vectorflow-server.tar.gz
+mkdir -p /opt/vectorflow && tar xzf vectorflow.tar.gz -C /opt/vectorflow
+```
+
+### 2. Set up PostgreSQL
+
+Create a database and user:
+
+```sql
+CREATE USER vectorflow WITH PASSWORD 'your-strong-password';
+CREATE DATABASE vectorflow OWNER vectorflow;
+```
+
+### 3. Configure environment
+
+Create an environment file at `/etc/vectorflow/server.env`:
+
+```bash
+DATABASE_URL=postgresql://vectorflow:your-strong-password@localhost:5432/vectorflow
+NEXTAUTH_SECRET=generate-a-random-32-char-string
+NEXTAUTH_URL=https://vectorflow.example.com
+PORT=3000
+NODE_ENV=production
+```
+
+### 4. Run database migrations
+
+```bash
+cd /opt/vectorflow
+npx prisma migrate deploy
+```
+
+### 5. Start the server
+
+```bash
+node server.js
+```
+
+### Systemd service
+
+For production, run VectorFlow as a systemd service:
+
+```ini
+# /etc/systemd/system/vectorflow.service
+[Unit]
+Description=VectorFlow Server
+After=network-online.target postgresql.service
+Wants=network-online.target
+
+[Service]
+Type=simple
+User=vectorflow
+Group=vectorflow
+WorkingDirectory=/opt/vectorflow
+EnvironmentFile=/etc/vectorflow/server.env
+ExecStart=/usr/bin/node server.js
+Restart=on-failure
+RestartSec=5
+
+# Security hardening
+NoNewPrivileges=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths=/opt/vectorflow/.vectorflow /backups
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable --now vectorflow
+```
+{% endtab %}
+{% endtabs %}
+
+## Environment variables
+
+{% hint style="warning" %}
+Always use strong, random values for `NEXTAUTH_SECRET` and `POSTGRES_PASSWORD`. These protect session data, encrypted secrets (TOTP, certificates), and your database.
+{% endhint %}
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `DATABASE_URL` | Yes | -- | PostgreSQL connection string (e.g., `postgresql://user:pass@host:5432/vectorflow`) |
+| `NEXTAUTH_SECRET` | Yes | -- | Session encryption key. Must be 32+ characters. Generate with `openssl rand -base64 32` |
+| `NEXTAUTH_URL` | No | -- | Canonical server URL (e.g., `https://vectorflow.example.com`). When unset, inferred from the `Host` header |
+| `PORT` | No | `3000` | HTTP listen port |
+| `NODE_ENV` | No | `production` | Set automatically in Docker. Use `production` for standalone deployments |
+
+When using the Docker Compose setup, the following variables go in your `.env` file and are interpolated into the Compose file:
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `POSTGRES_PASSWORD` | Yes | -- | Password for the PostgreSQL `vectorflow` user |
+| `VF_VERSION` | No | `latest` | Docker image tag to pull |
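+
+Putting these together, a complete `.env` for the Compose deployment might look like this (example values only; generate your own secrets):
+
+```bash
+POSTGRES_PASSWORD=replace-with-random-string   # openssl rand -base64 32
+NEXTAUTH_SECRET=replace-with-random-string     # openssl rand -base64 32
+NEXTAUTH_URL=https://vectorflow.example.com
+VF_VERSION=v0.3.0
+```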
+
+## Production considerations
+
+### Reverse proxy
+
+In production, place VectorFlow behind a reverse proxy for TLS termination.
+
+<details>
+
+<summary>Nginx example</summary>
+
+```nginx
+server {
+    listen 443 ssl http2;
+    server_name vectorflow.example.com;
+
+    ssl_certificate /etc/ssl/certs/vectorflow.crt;
+    ssl_certificate_key /etc/ssl/private/vectorflow.key;
+
+    location / {
+        proxy_pass http://127.0.0.1:3000;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        # WebSocket support (live metrics)
+        proxy_http_version 1.1;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+    }
+}
+```
+
+</details>
+
+<details>
+
+<summary>Caddy example</summary>
+
+```
+vectorflow.example.com {
+ reverse_proxy localhost:3000
+}
+```
+
+Caddy handles TLS certificates automatically via Let's Encrypt.
+
+</details>
+When using a reverse proxy, set `NEXTAUTH_URL` to your public URL (e.g., `https://vectorflow.example.com`).
+
+### TLS
+
+- **Agents communicate with the server over HTTPS.** Always terminate TLS in production.
+- If you cannot use a reverse proxy, consider a TLS-terminating load balancer.
+
+### Database tuning
+
+For deployments managing more than 50 agents, consider tuning PostgreSQL:
+
+- Increase `shared_buffers` to 25% of available RAM
+- Set `work_mem` to 64 MB
+- Enable `pg_stat_statements` for query monitoring
+- Schedule regular `VACUUM ANALYZE` runs
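+
+As a configuration sketch, the equivalent `postgresql.conf` settings for a host with 8 GB of RAM might be (tune to your hardware):
+
+```ini
+shared_buffers = 2GB        # ~25% of available RAM
+work_mem = 64MB
+shared_preload_libraries = 'pg_stat_statements'
+```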
+
+### Resource requirements
+
+| Scale | CPU | RAM | Disk |
+|-------|-----|-----|------|
+| Small (1-10 agents) | 1 core | 1 GB | 10 GB |
+| Medium (10-50 agents) | 2 cores | 2 GB | 25 GB |
+| Large (50+ agents) | 4 cores | 4 GB | 50 GB+ |
+
+These are minimums. The server is lightweight -- most resources go to PostgreSQL.
diff --git a/docs/public/getting-started/first-pipeline.md b/docs/public/getting-started/first-pipeline.md
new file mode 100644
index 0000000..beb0817
--- /dev/null
+++ b/docs/public/getting-started/first-pipeline.md
@@ -0,0 +1,150 @@
+# Your First Pipeline
+
+This walkthrough guides you through creating a simple pipeline that generates demo log events, transforms them with VRL (Vector Remap Language), and outputs the result to the console. By the end, you will have a working pipeline deployed to your fleet.
+
+## Prerequisites
+
+- A running VectorFlow server ([Quick Start](quick-start.md))
+- At least one enrolled agent ([Deploy Agents](deploy-agents.md))
+
+## Build the pipeline
+
+{% stepper %}
+{% step %}
+### Create a new pipeline
+
+Navigate to **Pipelines** in the sidebar and click **New Pipeline**.
+{% endstep %}
+
+{% step %}
+### Name and configure
+
+Give the pipeline a name (e.g., `demo-pipeline`) and select the **environment** where your agent is enrolled. Click **Create**.
+
+You are now in the pipeline editor -- a drag-and-drop canvas with three panels:
+
+- **Left** -- Component palette with all available sources, transforms, and sinks
+- **Center** -- Canvas where you build the pipeline graph
+- **Right** -- Detail panel for configuring the selected node
+
+
+{% endstep %}
+
+{% step %}
+### Add a source
+
+In the component palette on the left, find **Demo Logs** under the **Testing** category (or type "demo" in the search box).
+
+Drag it onto the canvas. A green source node appears. Click on it to open the detail panel on the right, and set the **format** to `json`. This generates fake JSON log events every second.
+{% endstep %}
+
+{% step %}
+### Add a transform
+
+Search for **Remap (VRL)** in the component palette and drag it onto the canvas to the right of your source node.
+
+Click the Remap node to open the detail panel. In the **VRL Source** editor, write a simple transformation:
+
+```coffeescript
+.message = "processed: " + string!(.message)
+.processed_at = now()
+```
+
+This prepends "processed: " to each log message and adds a timestamp field.
+{% endstep %}
+
+{% step %}
+### Add a sink
+
+Search for **Console** in the component palette and drag it onto the canvas to the right of your transform node.
+
+Click the Console node and set the **encoding** codec to `json`. The console sink prints events to Vector's stdout, which the agent captures and forwards to VectorFlow for viewing.
+{% endstep %}
+
+{% step %}
+### Connect the nodes
+
+Draw connections between your components:
+
+1. Hover over the **output port** (small circle on the right edge) of the Demo Logs source node
+2. Click and drag a line to the **input port** (small circle on the left edge) of the Remap transform node
+3. Release to create the connection
+4. Repeat: connect the Remap output to the Console sink input
+
+Your pipeline graph should now show: **Demo Logs** -> **Remap (VRL)** -> **Console**
+
+{% hint style="info" %}
+VectorFlow validates connection compatibility in real time. You cannot connect a metrics-only source to a logs-only sink, for example. Invalid connections are rejected automatically.
+{% endhint %}
+{% endstep %}
+
+{% step %}
+### Configure component keys
+
+Each node has a **Component Key** in the detail panel (e.g., `demo_logs_0`). This key becomes the component ID in the generated Vector configuration. You can rename keys to something more descriptive like `demo_source`, `add_timestamp`, and `debug_output`.
+
+Keys must contain only letters, numbers, and underscores.
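+
+With those keys, the Vector configuration generated from this graph would take roughly the following shape (abridged sketch; the exact output may differ):
+
+```yaml
+sources:
+  demo_source:
+    type: demo_logs
+    format: json
+
+transforms:
+  add_timestamp:
+    type: remap
+    inputs: [demo_source]
+    source: |
+      .message = "processed: " + string!(.message)
+      .processed_at = now()
+
+sinks:
+  debug_output:
+    type: console
+    inputs: [add_timestamp]
+    encoding:
+      codec: json
+```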
+{% endstep %}
+
+{% step %}
+### Validate the pipeline
+
+Click the **Validate** button (checkmark icon) in the toolbar at the top of the editor.
+
+VectorFlow generates the Vector YAML configuration from your graph and sends it to Vector for validation. If everything is correct, you see a green "Pipeline is valid!" toast notification.
+
+If validation fails, the error message tells you exactly which component has an issue. Fix the configuration and validate again.
+{% endstep %}
+
+{% step %}
+### Save the pipeline
+
+Click the **Save** button in the toolbar (or press `Cmd+S` / `Ctrl+S`). This persists your pipeline graph to the database but does not deploy it yet.
+{% endstep %}
+
+{% step %}
+### Deploy
+
+Click the **Deploy** button in the toolbar. The deploy dialog opens and shows:
+
+- The **target environment** and how many agents are enrolled
+- A **validation check** (the pipeline must be valid to deploy)
+- A **YAML diff** comparing the new config against the previously deployed version (if any)
+- A **Deployment Reason** field -- describe what changed and why
+
+Enter a deployment reason (e.g., "Initial demo pipeline"), then click **Publish to Agents**.
+
+VectorFlow publishes the pipeline configuration. Agents pick up the new config on their next poll cycle (default: 15 seconds).
+{% endstep %}
+
+{% step %}
+### Verify
+
+Navigate to the **Fleet** page. You should see your agent with an **Online** status. The pipeline status shows as **Running** once Vector picks up the configuration.
+
+Back in the pipeline editor, the toolbar shows a green **Deployed** badge. If you enabled metrics, you can see live event rates on the canvas edges.
+{% endstep %}
+{% endstepper %}
+
+{% hint style="success" %}
+**Congratulations!** You have built, validated, and deployed your first VectorFlow pipeline. Demo log events are flowing through your Remap transform and printing to the console.
+{% endhint %}
+
+## What just happened
+
+Under the hood, VectorFlow:
+
+1. Converted your visual graph into a Vector YAML configuration
+2. Validated the config using Vector's built-in `vector validate` command
+3. Created an immutable **version snapshot** with your changelog entry
+4. Published the config to all agents in the target environment
+5. Each agent pulled the new config, wrote it to disk, and started a new Vector process
+
+## Next steps
+
+Now that you have the basics, explore more of VectorFlow:
+
+- [Pipeline Editor](../user-guide/pipeline-editor.md) -- keyboard shortcuts, import/export, templates, and metrics overlay
+- [VRL Snippets](../user-guide/vrl-snippets.md) -- save and reuse VRL patterns across pipelines
+- [Environments](../user-guide/environments.md) -- organize your fleet into staging, production, and other environments
+- [Fleet Management](../user-guide/fleet.md) -- monitor agent health, view logs, and manage nodes
diff --git a/docs/public/getting-started/quick-start.md b/docs/public/getting-started/quick-start.md
new file mode 100644
index 0000000..3af4005
--- /dev/null
+++ b/docs/public/getting-started/quick-start.md
@@ -0,0 +1,97 @@
+# Quick Start
+
+Get VectorFlow running in under 5 minutes. This guide walks you through starting the server with Docker Compose and completing the initial setup.
+
+## Prerequisites
+
+- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) (v2+)
+- A machine with at least 1 CPU core and 1 GB RAM
+
+{% stepper %}
+{% step %}
+### Download and configure
+
+Create a directory for VectorFlow, download the Docker Compose file, and create an environment file with your secrets.
+
+```bash
+mkdir vectorflow && cd vectorflow
+
+# Download the server compose file
+curl -sSfL -o docker-compose.yml \
+ https://raw.githubusercontent.com/TerrifiedBug/vectorflow/main/docker/server/docker-compose.yml
+```
+
+Create a `.env` file next to `docker-compose.yml`:
+
+```bash
+cat > .env << 'EOF'
+# Database password -- use a random 32+ character string
+POSTGRES_PASSWORD=changeme
+
+# Session & encryption key -- generate with: openssl rand -base64 32
+NEXTAUTH_SECRET=changeme
+EOF
+```
+
+{% hint style="warning" %}
+Replace both `changeme` values with strong, random strings before starting. You can generate them with:
+
+```bash
+openssl rand -base64 32
+```
+{% endhint %}
+
+`NEXTAUTH_URL` is optional. When omitted, VectorFlow infers the URL from the incoming `Host` header. Set it explicitly if you place VectorFlow behind a reverse proxy (e.g., `https://vectorflow.example.com`).
+{% endstep %}
+
+{% step %}
+### Start VectorFlow
+
+```bash
+docker compose up -d
+```
+
+Docker pulls the VectorFlow server and PostgreSQL images, runs database migrations automatically, and starts both services. You can follow the logs with:
+
+```bash
+docker compose logs -f vectorflow
+```
+
+Wait until you see `Starting VectorFlow...` in the output.
+{% endstep %}
+
+{% step %}
+### Complete the setup wizard
+
+Open your browser and navigate to [http://localhost:3000](http://localhost:3000).
+
+The setup wizard walks you through creating your first admin account. Enter a username, email, and password, then click **Create Account**.
+
+Once logged in, you land on the VectorFlow dashboard.
+{% endstep %}
+{% endstepper %}
+
+{% hint style="success" %}
+**You're ready!** VectorFlow is running and your admin account is set up. Time to deploy some agents and build your first pipeline.
+{% endhint %}
+
+## What's in the stack
+
+The Docker Compose file starts two containers:
+
+| Container | Image | Purpose |
+|-----------|-------|---------|
+| `vectorflow-server` | `ghcr.io/terrifiedbug/vectorflow-server` | Next.js application, tRPC API, pipeline validation |
+| `vectorflow-postgres` | `postgres:17-alpine` | Database for pipelines, environments, users, and audit logs |
+
+Three named volumes persist data across restarts:
+
+- **`vectorflow-pgdata`** -- PostgreSQL data
+- **`vectorflow-data`** -- VectorFlow application state
+- **`vectorflow-backups`** -- Database backups
+
+## Next steps
+
+- [Deploy Agents](deploy-agents.md) -- enroll your first fleet node
+- [Your First Pipeline](first-pipeline.md) -- build and deploy a pipeline in the visual editor
+- [Deploy the Server](deploy-server.md) -- production deployment options, TLS, and environment variables
diff --git a/docs/public/operations/architecture.md b/docs/public/operations/architecture.md
new file mode 100644
index 0000000..45caafa
--- /dev/null
+++ b/docs/public/operations/architecture.md
@@ -0,0 +1,186 @@
+# Architecture
+
+VectorFlow uses a **hub-and-spoke architecture** where a central server manages configuration and state while lightweight agents run on each node to execute Vector pipelines.
+
+## System overview
+
+```
+        ┌───────────────────────────┐
+        │      Browser (React)      │
+        │  Pipeline Editor, Fleet   │
+        │   Dashboard, Settings     │
+        └─────────────┬─────────────┘
+                      │ HTTPS
+        ┌─────────────▼─────────────┐
+        │     VectorFlow Server     │
+        │     (Next.js + tRPC)      │
+        │                           │
+        │   ┌───────────────────┐   │
+        │   │    PostgreSQL     │   │
+        │   │    (all state)    │   │
+        │   └───────────────────┘   │
+        └────┬──────────┬──────┬────┘
+             │          │      │
+   ┌─────────▼──┐  ┌────▼───┐ ┌▼──────────┐
+   │  Agent A   │  │Agent B │ │  Agent N  │
+   │   (Go)     │  │  (Go)  │ │   (Go)    │
+   │ ┌────────┐ │  │┌──────┐│ │ ┌───────┐ │
+   │ │ Vector │ │  ││ Vec. ││ │ │Vector │ │
+   │ └────────┘ │  │└──────┘│ │ └───────┘ │
+   └────────────┘  └────────┘ └───────────┘
+```
+
+## Components
+
+### Server
+
+The VectorFlow server is a **Next.js** application that provides the web UI, the API (tRPC for the browser, REST for agents), and all management logic. It is the single source of truth for pipeline definitions, environment configuration, user accounts, and audit history.
+
+Key responsibilities:
+- Serve the browser-based pipeline editor and dashboard
+- Store pipeline graphs, configurations, and deployment versions
+- Generate Vector configuration files (YAML/TOML) from visual pipeline graphs
+- Manage user authentication, teams, and role-based access
+- Accept agent heartbeats and store fleet metrics
+- Evaluate alert rules and fire webhook notifications
+
+### Agent
+
+The VectorFlow **agent** is a lightweight Go binary that runs on each node where you want to execute Vector pipelines. Agents are stateless -- all configuration comes from the server.
+
+Key responsibilities:
+- Enroll with the server using a one-time enrollment token
+- Poll the server for configuration changes and pending actions
+- Start, stop, and reload Vector processes on the local node
+- Report metrics, pipeline status, and logs back to the server
+- Self-update when a new agent version is available
+
+### Database
+
+VectorFlow uses **PostgreSQL** as its sole data store. All state lives in the database:
+
+- Pipeline definitions and version history
+- Environment, team, and user configuration
+- Encrypted secrets and certificates
+- Agent node registrations and metrics
+- Audit log entries
+- System settings (OIDC, backup schedule, fleet tuning)
+
+The schema is managed by Prisma ORM, and migrations run automatically on server startup.
+
+### Vector
+
+[Vector](https://vector.dev) is the high-performance data router that does the actual work of collecting, transforming, and shipping observability data. VectorFlow does not replace Vector -- it provides a management layer on top of it.
+
+Each agent manages one or more Vector processes on its node. When a pipeline is deployed, the agent receives a generated Vector configuration file, writes it to disk, and starts or reloads the Vector process.
+
+## Data flow
+
+### Pipeline lifecycle
+
+A pipeline moves through these stages from creation to execution:
+
+```
+Editor (browser)
+  │ User builds pipeline graph visually
+  ▼
+Server (tRPC mutation)
+  │ Pipeline graph saved to PostgreSQL
+  ▼
+Deploy preview
+  │ Server generates Vector YAML from graph
+  │ Resolves secrets and certificates
+  │ Validates configuration
+  ▼
+Deploy to agents
+  │ Creates a PipelineVersion snapshot
+  │ Sends config to each agent via heartbeat actions
+  ▼
+Agent receives config
+  │ Writes YAML to disk
+  │ Starts or reloads Vector process
+  ▼
+Vector runs pipeline
+  │ Data flows from sources → transforms → sinks
+  │ Agent reports metrics back via heartbeat
+  ▼
+Dashboard
+    Events processed, errors, throughput visible in UI
+```
+
+### Metrics collection
+
+Agents report metrics to the server on every heartbeat cycle (default: every 15 seconds):
+
+- **Node metrics** -- CPU, memory, disk, and network usage
+- **Pipeline status** -- Events in/out, errors, bytes processed per component
+- **Logs** -- Pipeline log output
+- **Event samples** -- Sample events for schema discovery
+
+The server stores these in PostgreSQL and evaluates alert rules against configured thresholds on each heartbeat.
+
+## Agent communication
+
+### Pull-based polling
+
+Agents use a **pull-based** communication model. The agent initiates all connections -- the server never connects to agents. This design was chosen for three reasons:
+
+1. **Security** -- Agents can run behind firewalls and NATs without exposing any ports. Only outbound HTTPS is required.
+2. **Simplicity** -- No need for service discovery, message brokers, or persistent connections.
+3. **Scalability** -- The server handles agents as stateless HTTP clients. No per-agent connection state to manage.
+
+### Protocol
+
+Agents communicate via three REST endpoints:
+
+| Endpoint | Method | Purpose |
+|----------|--------|---------|
+| `/api/agent/enroll` | POST | One-time enrollment. Agent sends enrollment token, receives a persistent node token. |
+| `/api/agent/heartbeat` | POST | Periodic check-in. Agent sends metrics and status, receives pending actions (deploy, undeploy, update). |
+| `/api/agent/config` | POST | Fetch the generated Vector configuration for a specific pipeline. |
+
+### Heartbeat cycle
+
+On each heartbeat, the agent sends:
+- Current agent version
+- Node resource metrics (CPU, memory, disk)
+- Status of each running pipeline (events processed, errors)
+- Pipeline logs since last heartbeat
+
+The server responds with any **pending actions**:
+- Deploy a new pipeline version
+- Undeploy a pipeline
+- Self-update to a new agent version
+
+### Enrollment
+
+When an agent starts for the first time, it sends the enrollment token (provided via `VF_TOKEN`) to the server. The server validates the token, registers the node in the target environment, and returns a persistent **node token**. The agent stores this token locally and uses it for all future heartbeat requests.
+
+```
+Agent                                 Server
+  │                                     │
+  ├── POST /api/agent/enroll ──────────▶│
+  │     { enrollmentToken }             │
+  │                                     │ Validate token
+  │                                     │ Create node record
+  │◀──── { nodeToken, nodeId } ─────────┤
+  │                                     │
+  │   (stores node token to disk)       │
+  │                                     │
+  ├── POST /api/agent/heartbeat ───────▶│
+  │     { nodeToken, metrics, ... }     │
+  │◀──── { pendingActions: [...] } ─────┤
+  │                                     │
+```
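
The exchange above can be sketched as a minimal client. This is an illustrative Python sketch, not the actual Go agent; the payload and response field names follow the diagrams, and the `transport` callable stands in for an HTTPS POST.

```python
def enroll(transport, enrollment_token):
    """One-time enrollment: trade the enrollment token for a persistent node token."""
    resp = transport("/api/agent/enroll", {"enrollmentToken": enrollment_token})
    # The real agent persists nodeToken to disk (mode 0600) and never enrolls again.
    return resp["nodeToken"], resp["nodeId"]

def heartbeat(transport, node_token, metrics):
    """Periodic check-in: send metrics and status, receive pending actions."""
    resp = transport("/api/agent/heartbeat",
                     {"nodeToken": node_token, "metrics": metrics})
    return resp.get("pendingActions", [])

def fake_server(path, body):
    """In-memory stand-in for the server so the sketch runs without a network."""
    if path == "/api/agent/enroll":
        return {"nodeToken": "node_tok_1", "nodeId": "node_1"}
    if path == "/api/agent/heartbeat":
        return {"pendingActions": [{"type": "deploy", "pipelineId": "p1"}]}
    raise ValueError(path)

node_token, node_id = enroll(fake_server, "env_abc123")
actions = heartbeat(fake_server, node_token, {"cpu": 0.12})
```

A real agent would loop on the heartbeat call every `VF_POLL_INTERVAL` and dispatch each pending action it receives.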
+
+## Security model
+
+VectorFlow's architecture is designed with defense in depth:
+
+- **Agent-initiated connections only** -- The server never opens connections to agent nodes. Agents poll the server over HTTPS, so they work behind firewalls without exposing any inbound ports.
+- **Encrypted secrets** -- Sensitive values (API keys, passwords, certificates) are encrypted with AES-256-GCM before storage. They are only decrypted at deploy time when generating Vector configuration.
+- **Token-based agent auth** -- Each agent has a unique node token issued during enrollment. Tokens are stored with restricted file permissions (`0600`) on the agent host.
+- **Role-based access control** -- Users are assigned roles (Viewer, Editor, Admin) per team. Super Admins have platform-wide access.
+- **Audit logging** -- Every mutation is logged with the user, IP address, timestamp, and a diff of changed fields.
+
+For a detailed security guide, see [Security](security.md).
diff --git a/docs/public/operations/authentication.md b/docs/public/operations/authentication.md
new file mode 100644
index 0000000..1fb1e36
--- /dev/null
+++ b/docs/public/operations/authentication.md
@@ -0,0 +1,182 @@
+# Authentication
+
+VectorFlow supports multiple authentication methods: local credentials, OIDC/SSO, and two-factor authentication (2FA). This page covers initial setup, login methods, user management, and role-based access control.
+
+
+## Initial setup
+
+When VectorFlow starts for the first time with an empty database, it redirects to a setup wizard. The wizard creates the first admin account and the first team.
+
+{% stepper %}
+{% step %}
+### Create admin account
+Enter your name, email address, and a password (minimum 8 characters).
+{% endstep %}
+{% step %}
+### Name your team
+Choose a name for your first team (e.g., "Platform Engineering"). Teams organize environments and pipelines.
+{% endstep %}
+{% step %}
+### Start using VectorFlow
+After setup completes, you are redirected to the login page. Sign in with the credentials you just created.
+{% endstep %}
+{% endstepper %}
+
+The first user is automatically a **Super Admin** with full platform access.
+
+## Credentials authentication
+
+By default, users log in with an email address and password. Passwords are hashed with bcrypt before storage.
+
+When a Super Admin creates a new user, VectorFlow generates a random temporary password. The new user must change their password on first login.
+
+## OIDC / SSO
+
+VectorFlow supports any OpenID Connect provider (Okta, Entra ID, Keycloak, Google Workspace, Auth0, etc.). SSO is configured from the Settings page by a Super Admin.
+
+{% stepper %}
+{% step %}
+### Register VectorFlow with your identity provider
+Create an OAuth2/OIDC application in your identity provider. Use the following redirect URI:
+
+```
+https://your-vectorflow-url/api/auth/callback/oidc
+```
+{% endstep %}
+{% step %}
+### Open Settings in VectorFlow
+Navigate to **Settings > Authentication** (Super Admin required).
+{% endstep %}
+{% step %}
+### Enter provider details
+Fill in the OIDC configuration:
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| Issuer URL | OIDC discovery endpoint | `https://accounts.google.com` |
+| Client ID | OAuth2 client ID from your provider | `abc123.apps.googleusercontent.com` |
+| Client Secret | OAuth2 client secret | `GOCSPX-xxxxxxxxxxxx` |
+| Display Name | Button label on the login page | `Sign in with Okta` |
+| Token Auth Method | How the client authenticates to the token endpoint | `client_secret_post` (default) or `client_secret_basic` |
+
+{% endstep %}
+{% step %}
+### Test the connection
+Click **Test Connection** to verify VectorFlow can reach your provider's discovery endpoint. The test fetches `/.well-known/openid-configuration` and validates that required fields are present.
+{% endstep %}
+{% step %}
+### Save and verify
+Save the settings. An SSO button will appear on the login page. Test the flow by logging in with an SSO account.
+{% endstep %}
+{% endstepper %}
+
+{% hint style="info" %}
+OIDC settings are stored encrypted in the database. The client secret is encrypted with AES-256-GCM before storage.
+{% endhint %}
+
+### OIDC role mapping
+
+Map identity provider groups to VectorFlow roles so users are automatically assigned the correct permissions when they sign in via SSO.
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| Groups Claim | `groups` | JWT claim containing group memberships |
+| Default Role | VIEWER | Role assigned to users not matching any group rule |
+| Admin Groups | -- | Comma-separated group names that map to the **Admin** role |
+| Editor Groups | -- | Comma-separated group names that map to the **Editor** role |
+
+### OIDC team mapping
+
+For more granular control, map identity provider groups directly to VectorFlow teams with specific roles:
+
+```json
+[
+ { "group": "platform-admins", "teamId": "team_abc", "role": "ADMIN" },
+ { "group": "sre-team", "teamId": "team_abc", "role": "EDITOR" },
+ { "group": "developers", "teamId": "team_xyz", "role": "VIEWER" }
+]
+```
+
+You can also set a **default team** as a fallback for users who do not match any group mapping.
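
A sketch of how such a mapping could be resolved at sign-in. This is illustrative Python, not VectorFlow's implementation; the tie-breaking rule (the highest matching role wins per team) is an assumption.

```python
ROLE_ORDER = ["VIEWER", "EDITOR", "ADMIN"]

def resolve_team_roles(groups, mappings, default_team=None, default_role="VIEWER"):
    """Return {teamId: role} for a user's IdP groups, per the mapping rules."""
    roles = {}
    for m in mappings:
        if m["group"] in groups:
            current = roles.get(m["teamId"])
            # When several groups map to the same team, keep the higher role.
            if current is None or ROLE_ORDER.index(m["role"]) > ROLE_ORDER.index(current):
                roles[m["teamId"]] = m["role"]
    if not roles and default_team:
        roles[default_team] = default_role  # fallback for unmatched users
    return roles

mappings = [
    {"group": "platform-admins", "teamId": "team_abc", "role": "ADMIN"},
    {"group": "sre-team", "teamId": "team_abc", "role": "EDITOR"},
    {"group": "developers", "teamId": "team_xyz", "role": "VIEWER"},
]
```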
+
+## Two-factor authentication (2FA)
+
+VectorFlow supports TOTP-based two-factor authentication compatible with any authenticator app (Google Authenticator, Authy, 1Password, etc.).
+
+### Enabling 2FA
+
+Users can enable 2FA from their **Profile** page. Teams can also require 2FA for all members -- when enabled, users who have not set up 2FA are redirected to the enrollment page on their next login.
+
+{% stepper %}
+{% step %}
+### Start 2FA setup
+Navigate to your **Profile** page and click **Enable Two-Factor Authentication**, or you will be redirected automatically if your team requires it.
+{% endstep %}
+{% step %}
+### Scan the QR code
+Scan the displayed QR code with your authenticator app. Alternatively, enter the secret key manually.
+{% endstep %}
+{% step %}
+### Save backup codes
+VectorFlow generates **10 single-use backup codes**. Save these in a secure location. Each code can be used once to log in if you lose access to your authenticator app.
+{% endstep %}
+{% step %}
+### Verify
+Enter the 6-digit code from your authenticator app to confirm setup. 2FA is now active on your account.
+{% endstep %}
+{% endstepper %}
+
+### Logging in with 2FA
+
+After entering your email and password, you are prompted for a 6-digit verification code. Enter the code from your authenticator app, or use one of your backup codes.
+
+{% hint style="info" %}
+2FA is only available for local (credentials) accounts. OIDC/SSO users should configure MFA through their identity provider.
+{% endhint %}
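
TOTP itself is standardized (RFC 6238). The following is a minimal sketch of code generation and drift-tolerant verification, not VectorFlow's actual implementation:

```python
import base64
import hmac
import struct
import time

def totp(secret_b32, at=None, step=30, digits=6):
    """RFC 6238 TOTP: HMAC-SHA1 over the current 30-second counter."""
    key = base64.b32decode(secret_b32)
    counter = int((at if at is not None else time.time()) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = (int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

def verify(secret_b32, submitted, at=None, window=1):
    """Accept codes from adjacent time steps to tolerate clock drift."""
    now = at if at is not None else time.time()
    return any(hmac.compare_digest(totp(secret_b32, now + d * 30), submitted)
               for d in range(-window, window + 1))
```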
+
+## User management
+
+Super Admins can manage users from the **Settings > Users** page.
+
+### Creating users
+
+Super Admins can create new local user accounts. VectorFlow generates a random temporary password that must be shared with the user securely. The user is required to change their password on first login.
+
+When creating a user, you can optionally assign them to a team with a specific role immediately.
+
+### Managing users
+
+| Action | Description |
+|--------|-------------|
+| **Assign to team** | Add a user to a team with a specified role (Viewer, Editor, or Admin) |
+| **Remove from team** | Remove a user's membership from a specific team |
+| **Reset password** | Generate a new temporary password (local accounts only) |
+| **Lock account** | Prevent the user from logging in. Locked users see an error on the login page |
+| **Unlock account** | Restore login access for a locked account |
+| **Toggle Super Admin** | Grant or revoke platform-wide Super Admin privileges |
+| **Delete user** | Permanently remove the user and all their data |
+
+{% hint style="warning" %}
+You cannot delete your own account, remove your own Super Admin status, or lock your own account.
+{% endhint %}
+
+## Roles and permissions
+
+VectorFlow uses a hierarchical role system. Roles are assigned **per team**, so a user can be an Admin in one team and a Viewer in another.
+
+| Role | Permissions |
+|------|-------------|
+| **Viewer** | View pipelines, fleet status, dashboards, and audit logs. Cannot make changes. |
+| **Editor** | Everything a Viewer can do, plus: create/edit/delete pipelines, manage secrets and certificates, deploy pipelines, manage alerts. |
+| **Admin** | Everything an Editor can do, plus: manage environments, manage team members and roles, configure team settings (e.g., require 2FA). |
+| **Super Admin** | Platform-wide access. Can manage all teams, configure system settings (OIDC, fleet, backups), create and delete users. Super Admin is a flag on the user, not a team role. |
+
+### Role hierarchy
+
+```
+Viewer < Editor < Admin < Super Admin
+ (0) (1) (2) (bypass)
+```
+
+Higher roles inherit all permissions from lower roles. Super Admin bypasses all team-level access checks.
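
The check above can be expressed as a small comparison. This is an illustrative Python sketch; the data shapes are assumptions, not VectorFlow's internal types.

```python
ROLE_LEVEL = {"VIEWER": 0, "EDITOR": 1, "ADMIN": 2}

def can(user, team_id, required_role):
    """Team-scoped permission check; Super Admin bypasses team roles entirely."""
    if user.get("superAdmin"):
        return True
    role = user.get("teamRoles", {}).get(team_id)
    return role is not None and ROLE_LEVEL[role] >= ROLE_LEVEL[required_role]

alice = {"teamRoles": {"team_abc": "ADMIN", "team_xyz": "VIEWER"}}
root = {"superAdmin": True, "teamRoles": {}}
```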
diff --git a/docs/public/operations/backup-restore.md b/docs/public/operations/backup-restore.md
new file mode 100644
index 0000000..c2d8ee9
--- /dev/null
+++ b/docs/public/operations/backup-restore.md
@@ -0,0 +1,134 @@
+# Backup & Restore
+
+VectorFlow includes built-in database backup and restore functionality. Backups capture the entire PostgreSQL database, including all pipelines, environments, users, secrets, audit history, and system settings.
+
+## What gets backed up
+
+Backups are full PostgreSQL dumps in compressed custom format (`pg_dump --format=custom`). Everything stored in the database is included:
+
+- Pipeline definitions and version history
+- Environments, teams, and user accounts
+- Encrypted secrets and certificates
+- Agent node registrations
+- Alert rules and webhook configurations
+- Audit log entries
+- System settings (OIDC, fleet, backup schedule)
+
+{% hint style="info" %}
+Backups do **not** include the Vector data directory (`/var/lib/vector/`) on agent nodes. Vector's internal state (e.g., file checkpoints, disk buffers) is managed by each agent independently.
+{% endhint %}
+
+## Automatic backups
+
+VectorFlow can run backups on a cron schedule with automatic retention cleanup.
+
+### Configuring the schedule
+
+Navigate to **Settings > Backup** (Super Admin required) to configure:
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| Enabled | Off | Toggle automatic backups on or off |
+| Cron Schedule | `0 2 * * *` | Standard cron expression. Default runs at 2:00 AM daily |
+| Retention Count | 7 | Number of backups to keep. Older backups are automatically deleted |
+
+**Common cron schedules:**
+
+| Schedule | Cron Expression |
+|----------|----------------|
+| Every day at 2:00 AM | `0 2 * * *` |
+| Every 6 hours | `0 */6 * * *` |
+| Every day at midnight | `0 0 * * *` |
+| Every Sunday at 3:00 AM | `0 3 * * 0` |
+| Every weekday at 1:00 AM | `0 1 * * 1-5` |
+
+After each scheduled backup completes, VectorFlow automatically runs retention cleanup to delete the oldest backups beyond the configured retention count.
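
Retention cleanup reduces to keeping the N newest dumps. A sketch of that logic (illustrative Python; field names are assumptions):

```python
def retention_cleanup(backups, keep):
    """Split backups into (kept, deleted): the `keep` newest survive."""
    newest_first = sorted(backups, key=lambda b: b["createdAt"], reverse=True)
    return newest_first[:keep], newest_first[keep:]

backups = [{"name": f"vectorflow-{d}.dump", "createdAt": d}
           for d in ["2025-01-10", "2025-01-11", "2025-01-12", "2025-01-13"]]
kept, deleted = retention_cleanup(backups, keep=3)
```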
+
+## Manual backup
+
+You can trigger a backup at any time from the **Settings > Backup** page by clicking **Create Backup**. The backup runs immediately and appears in the backup list when complete.
+
+Each backup generates two files:
+- `vectorflow-<timestamp>.dump` -- The compressed PostgreSQL dump
+- `vectorflow-<timestamp>.meta.json` -- Metadata (VectorFlow version, migration count, PostgreSQL version, file size)
+
+## Backup storage
+
+Backups are stored on the server's local filesystem in the directory configured by the `VF_BACKUP_DIR` environment variable (default: `/backups`).
+
+In the Docker Compose setup, this directory is mounted as a Docker volume:
+
+```yaml
+volumes:
+ - backups:/backups
+```
+
+{% hint style="warning" %}
+For production deployments, consider mounting `VF_BACKUP_DIR` to a location that is backed up by your infrastructure-level backup system (e.g., an NFS share, or a directory included in your host backup schedule).
+{% endhint %}
+
+## Restore procedure
+
+Restoring from a backup replaces the entire database with the contents of the backup file.
+
+{% hint style="danger" %}
+**Restoring a backup overwrites all current data.** All pipelines, users, secrets, and settings will be replaced with the state from the backup. This action cannot be undone (though VectorFlow automatically creates a safety backup before restoring).
+{% endhint %}
+
+### Restore from the UI
+
+{% stepper %}
+{% step %}
+### Navigate to Settings > Backup
+Open the backup management page (Super Admin required).
+{% endstep %}
+{% step %}
+### Select a backup
+Find the backup you want to restore in the list. Review the metadata (timestamp, VectorFlow version, size).
+{% endstep %}
+{% step %}
+### Click Restore
+Confirm the restore action. VectorFlow will:
+1. Validate version compatibility (blocks if the backup has more migrations than the current version)
+2. Create a safety backup of the current database
+3. Run `pg_restore --clean --if-exists` to replace the database
+4. Exit the process so the container restarts with the restored data
+{% endstep %}
+{% step %}
+### Wait for restart
+The server process exits after restore. If running in Docker, the container restarts automatically. Database migrations run on startup to bring the schema up to date.
+{% endstep %}
+{% endstepper %}
+
+### Manual restore (CLI)
+
+If you cannot access the UI, you can restore directly using `pg_restore`:
+
+```bash
+# Stop the VectorFlow server first
+docker compose stop vectorflow
+
+# Restore the backup
+docker compose exec postgres pg_restore \
+ --clean --if-exists \
+ -U vectorflow -d vectorflow \
+ /backups/vectorflow-2025-01-15T02-00-00-000Z.dump
+
+# Restart the server (migrations run automatically)
+docker compose start vectorflow
+```
+
+## Version compatibility
+
+VectorFlow tracks the number of database migrations in each backup's metadata. When restoring:
+
+- **Same version or older backup → newer server**: Works. Migrations run automatically on startup to bring the schema up to date.
+- **Newer backup → older server**: Blocked. If the backup contains more migrations than the current server version, the restore is rejected. Upgrade VectorFlow first, then restore.
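
The compatibility gate reduces to a migration-count comparison against the backup's metadata. A sketch (illustrative Python; the `.meta.json` field names here are assumptions):

```python
import json

def restore_allowed(meta_json, server_migration_count):
    """A backup restores only onto a server with at least as many migrations applied."""
    meta = json.loads(meta_json)
    return meta["migrationCount"] <= server_migration_count

older = json.dumps({"vectorflowVersion": "1.3.0", "migrationCount": 40})
newer = json.dumps({"vectorflowVersion": "2.0.0", "migrationCount": 45})
```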
+
+## Recommended backup strategy
+
+1. **Enable automatic daily backups** with a retention count of at least 7.
+2. **Mount the backup directory** to storage that is included in your infrastructure backup system.
+3. **Test restores periodically** in a staging environment to verify your backups are valid.
+4. **Create a manual backup** before upgrading VectorFlow or making major configuration changes.
+5. **Monitor backup status** on the Settings page. Failed backups are logged with error details.
diff --git a/docs/public/operations/configuration.md b/docs/public/operations/configuration.md
new file mode 100644
index 0000000..0945a6d
--- /dev/null
+++ b/docs/public/operations/configuration.md
@@ -0,0 +1,150 @@
+# Configuration
+
+VectorFlow is configured through environment variables (for the server and agents) and through the Settings page in the UI (for fleet tuning, OIDC, and backups).
+
+## Server environment variables
+
+### Required
+
+{% hint style="warning" %}
+These variables must be set before the server can start. Without them, the application will fail to launch.
+{% endhint %}
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `DATABASE_URL` | PostgreSQL connection string | `postgresql://vectorflow:pass@localhost:5432/vectorflow` |
+| `NEXTAUTH_SECRET` | Session encryption key (min 32 characters) | Output of `openssl rand -base64 32` |
+
+{% hint style="danger" %}
+`NEXTAUTH_SECRET` is used to encrypt sessions, TOTP secrets, stored credentials, and all sensitive values in the database. Use a strong, random value and keep it safe. If you lose this key, all encrypted data becomes unrecoverable.
+{% endhint %}
+
+### Optional
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `NEXTAUTH_URL` | *(inferred from Host header)* | Canonical server URL. Set this when running behind a reverse proxy (e.g., `https://vectorflow.example.com`) |
+| `PORT` | `3000` | HTTP listen port |
+| `NODE_ENV` | `production` | Set automatically in Docker. Use `production` for standalone deployments |
+| `VF_BACKUP_DIR` | `/backups` | Directory for database backup files |
+
+### Docker Compose variables
+
+When using the Docker Compose setup, these variables go in your `.env` file and are interpolated into the Compose file:
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `POSTGRES_PASSWORD` | Yes | -- | Password for the PostgreSQL `vectorflow` user |
+| `VF_VERSION` | No | `latest` | Docker image tag to pull |
+
+## Agent environment variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `VF_URL` | Yes | -- | VectorFlow server URL (e.g., `https://vectorflow.example.com`) |
+| `VF_TOKEN` | First run only | -- | Enrollment token from the environment detail page. Only needed for initial registration |
+| `VF_DATA_DIR` | No | `/var/lib/vf-agent` | Data directory for configs, tokens, and certificates |
+| `VF_VECTOR_BIN` | No | `vector` | Path to the Vector binary |
+| `VF_POLL_INTERVAL` | No | `15s` | How often the agent polls the server for changes |
+| `VF_LOG_LEVEL` | No | `info` | Logging verbosity: `debug`, `info`, `warn`, `error` |
+
+## Database connection
+
+VectorFlow requires PostgreSQL 17 or later. The connection is configured via `DATABASE_URL`.
+
+**Connection string format:**
+
+```
+postgresql://[user]:[password]@[host]:[port]/[database]?[options]
+```
+
+**Common options:**
+
+| Option | Description |
+|--------|-------------|
+| `sslmode=require` | Enforce TLS for the database connection |
+| `connection_limit=10` | Limit the Prisma connection pool size |
+
+## Example `.env` file
+
+### Server (Docker Compose)
+
+```bash
+# Required
+POSTGRES_PASSWORD=my-strong-database-password
+NEXTAUTH_SECRET=Kj8mN2pQ4rT6vX9zA1cE3fG5hI7jL0nO2qR4sU6wY8
+
+# Optional
+NEXTAUTH_URL=https://vectorflow.example.com
+VF_VERSION=latest
+```
+
+### Agent
+
+```bash
+# Required
+VF_URL=https://vectorflow.example.com
+
+# Only for first enrollment
+VF_TOKEN=env_abc123_enrollment_token
+
+# Optional
+VF_DATA_DIR=/var/lib/vf-agent
+VF_VECTOR_BIN=/usr/bin/vector
+VF_POLL_INTERVAL=15s
+VF_LOG_LEVEL=info
+```
+
+## System settings (UI)
+
+The following settings are configured through the **Settings** page in the VectorFlow UI. Only Super Admins can access this page. These values are stored in the database and take effect immediately.
+
+### Fleet settings
+
+| Setting | Default | Range | Description |
+|---------|---------|-------|-------------|
+| Poll Interval | 15,000 ms | 1,000--300,000 | How frequently agents check in with the server |
+| Unhealthy Threshold | 3 | 1--100 | Number of missed heartbeats before an agent is marked **Unreachable** |
+| Metrics Retention | 7 days | 1--365 | How long node and pipeline metrics are kept |
+| Logs Retention | 3 days | 1--30 | How long pipeline logs are kept |
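
Together, the poll interval and unhealthy threshold determine when a node is marked **Unreachable**. A sketch of that relationship (illustrative Python; the server's exact bookkeeping is an assumption):

```python
def node_status(last_seen_s, now_s, poll_interval_ms=15000, unhealthy_threshold=3):
    """Mark a node Unreachable once `unhealthy_threshold` heartbeats have been missed."""
    missed = (now_s - last_seen_s) * 1000 / poll_interval_ms
    return "UNREACHABLE" if missed >= unhealthy_threshold else "HEALTHY"
```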
+
+### Backup settings
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| Enabled | Off | Toggle automatic scheduled backups |
+| Cron Schedule | `0 2 * * *` | Cron expression for backup timing (default: 2:00 AM daily) |
+| Retention Count | 7 | Number of backups to keep before deleting the oldest |
+
+For more details, see [Backup & Restore](backup-restore.md).
+
+### OIDC / SSO settings
+
+OIDC is configured in the Settings page under the **Authentication** tab. See [Authentication](authentication.md) for full setup instructions.
+
+## Ports reference
+
+| Service | Default Port | Description |
+|---------|-------------|-------------|
+| VectorFlow Server | 3000 | Web UI and API |
+| PostgreSQL | 5432 | Database (not exposed externally in Docker) |
+| Vector API | 8686 | Vector GraphQL API (per node, managed by agent) |
+
+## File paths
+
+### Server
+
+| Path | Description |
+|------|-------------|
+| `/app/.vectorflow/` | Server data directory (Docker volume mount) |
+| `/backups/` | Database backup storage (Docker volume mount) |
+
+### Agent
+
+| Path | Description |
+|------|-------------|
+| `/var/lib/vf-agent/` | Agent data directory (default) |
+| `/var/lib/vf-agent/node-token` | Persistent authentication token (mode `0600`) |
+| `/var/lib/vf-agent/pipelines/` | Pipeline configuration files |
+| `/var/lib/vf-agent/certs/` | Deployed TLS certificates |
+| `/var/lib/vector/` | Vector data directory |
diff --git a/docs/public/operations/security.md b/docs/public/operations/security.md
new file mode 100644
index 0000000..ad51401
--- /dev/null
+++ b/docs/public/operations/security.md
@@ -0,0 +1,148 @@
+# Security
+
+This page covers VectorFlow's security architecture: how secrets are managed, how data is encrypted, and recommended hardening practices for production deployments.
+
+## Secret management
+
+VectorFlow provides a built-in secret store for each environment. Secrets hold sensitive values -- API keys, database passwords, authentication tokens -- that pipelines need at runtime but should not be stored in plain text in pipeline configurations.
+
+### Creating secrets
+
+Secrets are created on the **environment detail page** under **Secrets & Certificates**. Each secret has a name and a value. Secret names must start with a letter or number and can contain letters, numbers, hyphens, and underscores.
+
+Secrets are scoped to a single environment. The same secret name can hold different values in different environments (e.g., a `DB_PASSWORD` secret with a test value in dev and a production value in prod).
+
+### How secrets are stored
+
+When you create or update a secret, the value is encrypted with **AES-256-GCM** before being written to the database. The plaintext value is never stored. Only the encrypted ciphertext is persisted.
+
+### How secrets are resolved
+
+When a pipeline is deployed, VectorFlow generates the Vector configuration file. During generation, it scans the configuration for **secret references** and replaces them with the actual decrypted values.
+
+Secret references use the syntax:
+
+```
+SECRET[secret-name]
+```
+
+For example, a Kafka sink node configured with:
+
+```yaml
+sasl:
+ username: my-user
+ password: SECRET[KAFKA_PASSWORD]
+```
+
+At deploy time, `SECRET[KAFKA_PASSWORD]` is resolved to the decrypted value of the `KAFKA_PASSWORD` secret in the pipeline's environment.
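
Reference resolution is essentially a text substitution over the generated configuration. A sketch of that step (illustrative Python, not the server's implementation; the name pattern follows the rules stated above):

```python
import re

# Names start with a letter or number; may contain letters, numbers, -, _.
SECRET_REF = re.compile(r"SECRET\[([A-Za-z0-9][A-Za-z0-9_-]*)\]")

def resolve_secrets(config_text, secrets):
    """Replace SECRET[name] references with their decrypted values at deploy time."""
    def sub(match):
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"unknown secret: {name}")
        return secrets[name]
    return SECRET_REF.sub(sub, config_text)

rendered = resolve_secrets("password: SECRET[KAFKA_PASSWORD]",
                           {"KAFKA_PASSWORD": "s3cr3t"})
```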
+
+### Certificate references
+
+TLS certificates work the same way, using the `CERT[name]` syntax. When a pipeline references a certificate, VectorFlow decrypts the certificate data and deploys it as a file on the agent node:
+
+```yaml
+tls:
+ crt_file: CERT[my-tls-cert]
+```
+
+The agent receives the certificate file and writes it to `/var/lib/vf-agent/certs/`.
+
+## Encryption
+
+### At rest
+
+VectorFlow encrypts sensitive data before storing it in PostgreSQL:
+
+| Data | Algorithm | Key derivation |
+|------|-----------|---------------|
+| Secrets (user-created) | AES-256-GCM | SHA-256 hash of `NEXTAUTH_SECRET` |
+| Certificates | AES-256-GCM | SHA-256 hash of `NEXTAUTH_SECRET` |
+| OIDC client secret | AES-256-GCM | SHA-256 hash of `NEXTAUTH_SECRET` |
+| Sensitive node config fields | AES-256-GCM | SHA-256 hash of `NEXTAUTH_SECRET` |
+| User passwords | bcrypt (cost 12) | Built-in salt |
+| TOTP secrets | AES-256-GCM | SHA-256 hash of `NEXTAUTH_SECRET` |
+| 2FA backup codes | SHA-256 hash | -- |
+| Webhook signing | HMAC-SHA256 | Per-webhook secret |
+
+{% hint style="danger" %}
+`NEXTAUTH_SECRET` is the master encryption key for all sensitive data. If this value is changed or lost, all encrypted data (secrets, certificates, OIDC config) becomes permanently unrecoverable. Back up this value securely.
+{% endhint %}
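
The key-derivation step in the table can be shown with the standard library alone; the AES-256-GCM encryption itself requires a crypto library and is only indicated in comments. An illustrative sketch:

```python
import hashlib

def derive_key(nextauth_secret: str) -> bytes:
    """SHA-256 of NEXTAUTH_SECRET yields the 32-byte key that AES-256-GCM requires."""
    return hashlib.sha256(nextauth_secret.encode("utf-8")).digest()

key = derive_key("example-only-generate-with-openssl-rand-in-production")
# The encryption step itself (not shown) would pair this key with a fresh random
# 12-byte nonce per value and store nonce + ciphertext + GCM auth tag together.
```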
+
+### Sensitive field auto-encryption
+
+Pipeline node configurations may contain sensitive fields (passwords, API keys, tokens). VectorFlow automatically detects and encrypts these fields when saving a pipeline, based on:
+
+1. Fields marked as `sensitive: true` in the Vector component schema
+2. Field names matching patterns like `password`, `secret`, `token`, or `api_key`
+
+These fields are encrypted before database storage and decrypted only when generating the Vector configuration for deployment.
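
The detection described above can be sketched as a walk over the node configuration (illustrative Python; the exact name patterns and schema shape are assumptions):

```python
import re

# Heuristic field-name patterns, per rule 2 above.
SENSITIVE_NAME = re.compile(r"(password|secret|token|api_?key)", re.IGNORECASE)

def find_sensitive_fields(config, schema_sensitive=frozenset(), path=""):
    """Return dotted paths of fields to encrypt before the pipeline is saved."""
    found = []
    for key, value in config.items():
        dotted = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            found += find_sensitive_fields(value, schema_sensitive, dotted)
        elif dotted in schema_sensitive or SENSITIVE_NAME.search(key):
            found.append(dotted)
    return found

node_config = {"bootstrap_servers": "kafka:9092",
               "sasl": {"username": "svc", "password": "hunter2"},
               "api_key": "abc"}
```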
+
+### In transit
+
+- **Browser to server** -- HTTPS (TLS termination via reverse proxy or load balancer)
+- **Agent to server** -- HTTPS over the same endpoint. Agents authenticate with a bearer token issued during enrollment.
+- **Server to database** -- Configurable via `sslmode` in the `DATABASE_URL` connection string
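
For example, a `DATABASE_URL` with TLS enforced (host and credentials are placeholders):

```bash
DATABASE_URL="postgresql://vectorflow:<password>@db.example.com:5432/vectorflow?sslmode=require"
```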
+
+## Network security
+
+### Agent connections
+
+Agents initiate all connections to the server. The server never connects outbound to agents. This means:
+
+- Agents can run behind firewalls and NATs
+- No inbound ports need to be opened on agent nodes
+- Only outbound HTTPS (port 443) to the VectorFlow server is required
+
+### Reverse proxy
+
+In production, place VectorFlow behind a reverse proxy (Nginx, Caddy, Traefik) for TLS termination. See [Deploy the Server](../getting-started/deploy-server.md) for example configurations.
+
+### Agent authentication
+
+Each agent authenticates using a **node token** -- a unique bearer token issued during enrollment. The token is stored at `/var/lib/vf-agent/node-token` with file permissions `0600` (readable only by the owner).
+
+Enrollment tokens (used for initial registration) can be regenerated or revoked from the environment detail page.
+
+## Audit logging
+
+Every mutation in VectorFlow is logged to an audit trail. Audit entries include:
+
+- **Who** -- The authenticated user
+- **What** -- The action performed and entity affected
+- **When** -- Timestamp
+- **Where** -- Client IP address
+- **Changes** -- A diff of the fields that were modified
+
+Sensitive fields (passwords, tokens, secrets) are automatically redacted in audit log entries.
+
+View the audit log from the **Audit** page in the sidebar.
+
+## Security hardening checklist
+
+Use this checklist to harden your VectorFlow deployment for production:
+
+{% hint style="warning" %}
+Complete all items before exposing VectorFlow to untrusted networks.
+{% endhint %}
+
+- [ ] **Generate a strong `NEXTAUTH_SECRET`** -- Use `openssl rand -base64 32` to generate a random 32+ character secret. Never use default or weak values.
+
+- [ ] **Generate a strong `POSTGRES_PASSWORD`** -- Use a random, high-entropy password for the database.
+
+- [ ] **Enable TLS/HTTPS** -- Place VectorFlow behind a reverse proxy with TLS termination. All agent communication should use HTTPS.
+
+- [ ] **Enable 2FA for all users** -- Use team-level "Require 2FA" settings to enforce two-factor authentication for all team members.
+
+- [ ] **Use OIDC/SSO** -- Integrate with your organization's identity provider for centralized authentication and MFA.
+
+- [ ] **Restrict network access** -- Limit access to the VectorFlow server to trusted networks. Use firewall rules or network policies.
+
+- [ ] **Enable database TLS** -- Add `sslmode=require` to your `DATABASE_URL` to encrypt the connection between VectorFlow and PostgreSQL.
+
+- [ ] **Regular backups** -- Enable automatic daily backups and verify restores periodically. See [Backup & Restore](backup-restore.md).
+
+- [ ] **Keep software updated** -- Regularly update VectorFlow server and agents to get security patches. See [Upgrading](upgrading.md).
+
+- [ ] **Review audit logs** -- Periodically review the audit log for unexpected actions or unauthorized access attempts.
+
+- [ ] **Lock unused accounts** -- Lock user accounts that are no longer active instead of leaving them accessible.
diff --git a/docs/public/operations/upgrading.md b/docs/public/operations/upgrading.md
new file mode 100644
index 0000000..7213ba3
--- /dev/null
+++ b/docs/public/operations/upgrading.md
@@ -0,0 +1,237 @@
+# Upgrading
+
+VectorFlow is designed for zero-downtime upgrades. The server handles database migrations automatically on startup, and agents can self-update without manual intervention.
+
+## Pre-upgrade checklist
+
+Before upgrading, complete these steps:
+
+- [ ] **Create a database backup** -- Navigate to Settings > Backup and click **Create Backup**, or verify that a recent automatic backup exists. See [Backup & Restore](backup-restore.md).
+- [ ] **Review release notes** -- Check the [Releases](https://github.com/TerrifiedBug/vectorflow/releases) page for breaking changes, required actions, or migration notes.
+- [ ] **Verify agent compatibility** -- Server and agent versions should be kept in sync. The server is backward-compatible with older agents, but newer agents may require a newer server.
+
+## Version checking
+
+VectorFlow automatically checks for new releases every 24 hours by querying the GitHub Releases API. When a new version is available, a notification appears on the Settings page showing:
+
+- Current server version
+- Latest available version
+- Link to the release notes
+
+You can force a version check from **Settings** by clicking **Check for Updates**.
+
+## Server upgrade
+
+The server upgrade process is the same regardless of how you deployed: replace the binary or image, restart, and migrations run automatically.
+
+{% tabs %}
+{% tab title="Docker" %}
+### Docker upgrade
+
+{% stepper %}
+{% step %}
+### Pull the new image
+```bash
+docker compose pull vectorflow
+```
+
+Or pin a specific version in your `.env` file:
+```bash
+VF_VERSION=v0.4.0
+```
+{% endstep %}
+{% step %}
+### Restart the server
+```bash
+docker compose up -d
+```
+
+The entrypoint runs `prisma migrate deploy` automatically. Database schema changes are applied before the application starts.
+{% endstep %}
+{% step %}
+### Verify
+Check the logs to confirm the server started successfully:
+```bash
+docker compose logs -f vectorflow
+```
+
+Look for the migration output and the "Ready" message.
+{% endstep %}
+{% endstepper %}
+
+{% endtab %}
+{% tab title="Standalone" %}
+### Standalone upgrade
+
+{% stepper %}
+{% step %}
+### Download the new release
+```bash
+curl -sSfL -o vectorflow.tar.gz \
+ https://github.com/TerrifiedBug/vectorflow/releases/latest/download/vectorflow-server.tar.gz
+```
+{% endstep %}
+{% step %}
+### Stop the server
+```bash
+sudo systemctl stop vectorflow
+```
+{% endstep %}
+{% step %}
+### Extract the new release
+```bash
+tar xzf vectorflow.tar.gz -C /opt/vectorflow
+```
+{% endstep %}
+{% step %}
+### Run migrations
+```bash
+cd /opt/vectorflow
+npx prisma migrate deploy
+```
+{% endstep %}
+{% step %}
+### Start the server
+```bash
+sudo systemctl start vectorflow
+```
+{% endstep %}
+{% step %}
+### Verify
+```bash
+sudo systemctl status vectorflow
+journalctl -u vectorflow -f
+```
+{% endstep %}
+{% endstepper %}
+
+{% endtab %}
+{% endtabs %}
+
+## Agent upgrade
+
+### Automatic self-update
+
+Agents can update themselves automatically. When the server detects that a newer agent version is available, it includes a **self-update action** in the heartbeat response. The agent then:
+
+1. Downloads the new binary from the release URL
+2. Computes a SHA-256 checksum and verifies it against the expected value
+3. Writes the new binary to a temporary file alongside the current executable
+4. Atomically replaces the current binary (`rename`)
+5. Re-executes the process (`syscall.Exec`) with the same arguments and environment
+
+The update is seamless -- running Vector pipelines are not interrupted during the agent binary swap. After re-exec, the agent resumes its heartbeat loop with the new version.
+
+{% hint style="info" %}
+Self-update requires the agent binary to be writable by the process. If the agent runs as a restricted user, ensure it has write permission to its own executable path.
+{% endhint %}
+
+### Manual agent update
+
+If automatic updates are not suitable for your environment, you can update agents manually:
+
+{% tabs %}
+{% tab title="Docker" %}
+```bash
+# Pull the new agent image
+docker compose pull vf-agent
+
+# Restart the agent
+docker compose up -d vf-agent
+```
+{% endtab %}
+{% tab title="Standalone" %}
+```bash
+# Download the new binary
+curl -sSfL -o /usr/local/bin/vf-agent \
+ https://github.com/TerrifiedBug/vectorflow/releases/latest/download/vf-agent-linux-amd64
+
+# Make it executable
+chmod +x /usr/local/bin/vf-agent
+
+# Restart the agent
+sudo systemctl restart vf-agent
+```
+{% endtab %}
+{% endtabs %}
+
+## Database migrations
+
+VectorFlow uses Prisma ORM for database schema management. Migrations are:
+
+- **Automatically applied** on server startup in the Docker image (the entrypoint runs `prisma migrate deploy`)
+- **Forward-only** -- there is no automatic rollback of migrations
+- **Non-destructive** where possible -- VectorFlow avoids dropping columns or tables in migrations
+
+If a migration fails, the server will not start. Check the logs for the specific error and resolve it before restarting.
+
+## Rollback
+
+### Server rollback
+
+If an upgrade causes issues, you can roll back to the previous version:
+
+{% tabs %}
+{% tab title="Docker" %}
+Pin the previous version in your `.env` file and restart:
+
+```bash
+# Set the previous version
+VF_VERSION=v0.3.0
+
+# Restart with the old image
+docker compose up -d
+```
+
+{% hint style="warning" %}
+Rolling back the server after a database migration has run may cause errors if the application code expects the old schema. If migrations were applied, restore from a pre-upgrade backup instead.
+{% endhint %}
+{% endtab %}
+{% tab title="Standalone" %}
+Replace the application files with the previous release archive:
+
+```bash
+sudo systemctl stop vectorflow
+tar xzf vectorflow-v0.3.0.tar.gz -C /opt/vectorflow
+sudo systemctl start vectorflow
+```
+
+If database migrations were applied, restore from a backup:
+
+```bash
+# Stop the server
+sudo systemctl stop vectorflow
+
+# Restore the pre-upgrade backup
+pg_restore --clean --if-exists \
+ -U vectorflow -d vectorflow \
+ /backups/vectorflow-pre-upgrade.dump
+
+# Start the old version
+sudo systemctl start vectorflow
+```
+{% endtab %}
+{% endtabs %}
+
+### Agent rollback
+
+For Docker-based agents, pin the previous image tag. For standalone agents, replace the binary with the previous version:
+
+```bash
+# Download the specific previous version
+curl -sSfL -o /usr/local/bin/vf-agent \
+ https://github.com/TerrifiedBug/vectorflow/releases/download/v0.3.0/vf-agent-linux-amd64
+
+chmod +x /usr/local/bin/vf-agent
+sudo systemctl restart vf-agent
+```
+
+## Version compatibility
+
+| Server Version | Minimum Agent Version | Notes |
+|---------------|----------------------|-------|
+| Current | Current - 2 minor versions | Agents within 2 minor versions of the server are fully supported |
+
+{% hint style="info" %}
+The server is generally backward-compatible with older agents. Older agents may not support newer features (e.g., new pipeline actions), but they will continue to run existing pipelines without issues. It is recommended to keep agents updated to match the server version.
+{% endhint %}
diff --git a/docs/public/reference/agent.md b/docs/public/reference/agent.md
new file mode 100644
index 0000000..b677f96
--- /dev/null
+++ b/docs/public/reference/agent.md
@@ -0,0 +1,329 @@
+# Agent Reference
+
+The VectorFlow agent is a lightweight Go binary that runs on each node where you want to execute Vector pipelines. It has zero external dependencies -- a single binary is all you need. The agent communicates with the VectorFlow server to receive pipeline configurations, report status and metrics, and apply updates.
+
+## Overview
+
+- **Single binary**: No runtime dependencies, no package managers. Download and run.
+- **Zero config files**: All configuration is via environment variables.
+- **Process-per-pipeline**: Each deployed pipeline runs as a separate Vector child process, providing isolation and independent lifecycle management.
+- **Stateless**: The agent stores only its node token on disk. All pipeline configuration comes from the server on every poll.
+
+---
+
+## Lifecycle
+
+The agent follows a predictable lifecycle from first startup to steady-state operation:
+
+```
++-----------+     +-----------+     +--------------+     +-------------+
+|   Start   |---->|  Enroll   |---->|  Poll + Run  |---->|  Heartbeat  |
+|           |     |           |     |              |     |             |
+| Load env  |     | Send      |     | Fetch config |     | Report      |
+| vars,     |     | hostname  |     | Start/stop   |     | pipeline    |
+| detect    |     | + token   |     | pipelines    |     | status,     |
+| Vector    |     | to server |     |              |     | metrics,    |
++-----------+     +-----+-----+     +------+-------+     | logs        |
+                        |                  ^             +------+------+
+                        |                  |                    |
+                        |                  +--------------------+
+                        |                  (every poll interval)
+                        v
+                 +--------------+
+                 | Save node    |
+                 | token to     |
+                 | disk         |
+                 +--------------+
+```
+
+### Enrollment
+
+On first startup, the agent enrolls with the server using the enrollment token (`VF_TOKEN`). The server responds with a **node token** -- a unique credential for this specific agent instance. The node token is saved to `<VF_DATA_DIR>/node-token` (by default `/var/lib/vf-agent/node-token`) and reused on subsequent starts. After enrollment, `VF_TOKEN` is no longer needed.
+
+The enrollment request includes the agent's hostname, OS/architecture, agent version, and Vector version.
+
+### Polling
+
+After enrollment, the agent enters a poll loop. On each tick (default: every 15 seconds), it:
+
+1. **Fetches configuration** from the server (`GET /api/agent/config`)
+2. **Compares** the received pipeline configs against locally known state (by checksum)
+3. **Takes action**: starts new pipelines, restarts pipelines with changed configs, stops removed pipelines
+4. **Reconciles** orphaned config files on disk from previous runs
+5. **Processes** any pending sample requests or server-initiated actions (e.g., self-update)
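
The compare-and-act portion (steps 2-3) reduces to a checksum diff between desired and running state. A sketch under assumed types, not the agent's actual source:

```go
package main

// PipelineConfig mirrors the relevant fields of the server's config response.
type PipelineConfig struct {
	PipelineID string
	Checksum   string
}

// reconcile computes the actions a poll tick would take by diffing the
// desired configs against locally running pipelines (pipelineId -> checksum).
// Illustrative only -- the real agent also manages processes and files.
func reconcile(desired []PipelineConfig, running map[string]string) []string {
	var actions []string
	seen := map[string]bool{}
	for _, p := range desired {
		seen[p.PipelineID] = true
		switch cur, ok := running[p.PipelineID]; {
		case !ok:
			actions = append(actions, "start "+p.PipelineID)
		case cur != p.Checksum:
			actions = append(actions, "restart "+p.PipelineID)
		}
	}
	// Anything running that the server no longer knows about gets stopped.
	for id := range running {
		if !seen[id] {
			actions = append(actions, "stop "+id)
		}
	}
	return actions
}
```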
+
+### Heartbeat
+
+After each poll, the agent sends a heartbeat (`POST /api/agent/heartbeat`) that includes:
+
+- Status of each running pipeline (RUNNING, STARTING, STOPPED, CRASHED)
+- Per-pipeline metrics scraped from Vector's Prometheus endpoint (events in/out, bytes, errors)
+- Per-component metrics for the visual editor node overlays
+- Host system metrics (CPU, memory, disk, network)
+- Recent stdout/stderr log lines from each pipeline process
+- Agent and Vector version information
+
+---
+
+## Environment variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `VF_URL` | Yes | -- | VectorFlow server URL (e.g., `https://vectorflow.example.com`) |
+| `VF_TOKEN` | On first run | -- | Enrollment token from the VectorFlow UI. Not needed after initial enrollment. |
+| `VF_DATA_DIR` | No | `/var/lib/vf-agent` | Directory for node token, pipeline configs, and certificate files |
+| `VF_VECTOR_BIN` | No | `vector` | Path to the Vector binary. Use if Vector is not on the system `PATH`. |
+| `VF_POLL_INTERVAL` | No | `15s` | How often to poll the server for config changes. Accepts Go duration syntax (e.g., `10s`, `1m`). |
+| `VF_LOG_LEVEL` | No | `info` | Agent log level: `debug`, `info`, `warn`, `error` |
+
+{% hint style="warning" %}
+`VF_URL` is the only strictly required variable. However, `VF_TOKEN` must be set on the first run for enrollment. After the agent writes its node token to disk, `VF_TOKEN` can be removed.
+{% endhint %}
+
+---
+
+## CLI flags
+
+The agent accepts two flags:
+
+| Flag | Description |
+|------|-------------|
+| `--version`, `-v` | Print the agent version and exit |
+| `--help`, `-h` | Show usage help including the environment variable reference |
+
+All runtime configuration is via environment variables -- there are no flags for server URL, token, etc.
+
+---
+
+## Agent communication protocol
+
+The agent communicates with the VectorFlow server over three HTTP endpoints. All requests use JSON. Authenticated requests include the node token as a Bearer token.
+
+### `POST /api/agent/enroll`
+
+Called once on first startup. No authentication required (the enrollment token is in the request body).
+
+**Request:**
+```json
+{
+ "token": "vf_enroll_abc123...",
+ "hostname": "web-server-01",
+ "os": "linux/amd64",
+ "agentVersion": "0.5.0",
+ "vectorVersion": "vector 0.41.1 (x86_64-unknown-linux-gnu)"
+}
+```
+
+**Response:**
+```json
+{
+ "nodeId": "clxyz789",
+ "nodeToken": "vfn_abc123...",
+ "environmentId": "clxyz456",
+ "environmentName": "Production"
+}
+```
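
The payloads above map naturally onto Go structs. These bindings are illustrative, derived from the example bodies rather than the agent's source:

```go
package main

import "encoding/json"

// EnrollRequest mirrors the enrollment request body shown above.
type EnrollRequest struct {
	Token         string `json:"token"`
	Hostname      string `json:"hostname"`
	OS            string `json:"os"`
	AgentVersion  string `json:"agentVersion"`
	VectorVersion string `json:"vectorVersion"`
}

// EnrollResponse mirrors the enrollment response body shown above.
type EnrollResponse struct {
	NodeID          string `json:"nodeId"`
	NodeToken       string `json:"nodeToken"`
	EnvironmentID   string `json:"environmentId"`
	EnvironmentName string `json:"environmentName"`
}

// decodeEnrollResponse parses the server's enrollment reply.
func decodeEnrollResponse(body []byte) (EnrollResponse, error) {
	var r EnrollResponse
	err := json.Unmarshal(body, &r)
	return r, err
}
```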
+
+### `GET /api/agent/config`
+
+Called on every poll cycle. Returns all deployed pipeline configurations for this node's environment.
+
+**Headers:** `Authorization: Bearer <node-token>`
+
+**Response:**
+```json
+{
+ "pipelines": [
+ {
+ "pipelineId": "clxyz001",
+ "pipelineName": "syslog-to-s3",
+ "version": 3,
+ "configYaml": "sources:\n syslog_in:\n type: syslog\n ...",
+ "checksum": "sha256:abc123...",
+ "logLevel": "info",
+ "secrets": {
+ "VF_SECRET_AWS_KEY": "AKIAIOSFODNN7EXAMPLE"
+ },
+ "certFiles": [
+ {
+ "name": "ca-cert",
+ "filename": "ca.pem",
+          "data": "<base64-encoded certificate data>"
+ }
+ ]
+ }
+ ],
+ "pollIntervalMs": 15000,
+ "secretBackend": "BUILTIN",
+ "sampleRequests": [],
+ "pendingAction": null
+}
+```
+
+Key fields:
+- **`secrets`**: Pre-resolved secret values with `VF_SECRET_` prefix. The agent injects these as environment variables into the Vector process.
+- **`certFiles`**: Certificate data written to `<VF_DATA_DIR>/certs/` before starting the pipeline.
+- **`checksum`**: Used to detect config changes without re-parsing YAML.
+- **`pendingAction`**: Server-initiated action (currently only `self_update`).
+
+### `POST /api/agent/heartbeat`
+
+Called after every poll. Sends status and metrics for all managed pipelines.
+
+**Headers:** `Authorization: Bearer <node-token>`, `Content-Type: application/json`
+
+**Request:**
+```json
+{
+ "pipelines": [
+ {
+ "pipelineId": "clxyz001",
+ "version": 3,
+ "status": "RUNNING",
+ "pid": 12345,
+ "uptimeSeconds": 3600,
+ "eventsIn": 150000,
+ "eventsOut": 148500,
+ "bytesIn": 75000000,
+ "bytesOut": 72000000,
+ "errorsTotal": 12,
+ "componentMetrics": [
+ {
+ "componentId": "syslog_in",
+ "componentKind": "source",
+ "receivedEvents": 150000,
+ "sentEvents": 150000,
+ "receivedBytes": 75000000
+ }
+ ],
+ "recentLogs": ["2025-01-15T10:30:00Z INFO vector: Pipeline running"]
+ }
+ ],
+ "hostMetrics": {
+ "memoryTotalBytes": 8589934592,
+ "memoryUsedBytes": 4294967296,
+ "cpuSecondsTotal": 12345.67,
+ "loadAvg1": 1.5
+ },
+ "agentVersion": "0.5.0",
+ "vectorVersion": "vector 0.41.1",
+ "deploymentMode": "STANDALONE"
+}
+```
+
+---
+
+## Process supervision
+
+The agent manages Vector processes with full lifecycle control:
+
+- **Start**: Spawns `vector --config <pipeline-id>.yaml --config <pipeline-id>.yaml.vf-metrics.yaml`. The second config file is a sidecar that adds internal metrics, host metrics, a Prometheus exporter, and the Vector API.
+- **Stop**: Sends `SIGTERM`, waits up to 30 seconds for graceful shutdown, then sends `SIGKILL` if needed.
+- **Restart**: Stops the running process then starts a new one with the updated config.
+- **Crash recovery**: If a Vector process exits unexpectedly, the agent automatically restarts it with exponential backoff (1s, 2s, 4s, ... up to 60s).
+
+### Environment injection
+
+Each Vector process receives:
+- `VECTOR_LOG=<log level>` -- controls Vector's log verbosity
+- All resolved secrets as environment variables with `VF_SECRET_` prefix (e.g., `VF_SECRET_AWS_KEY=value`)
+
+### Metrics sidecar
+
+The agent automatically generates a sidecar config for each pipeline that adds:
+- `vf_internal_metrics` source (Vector internal metrics)
+- `vf_host_metrics` source (host system metrics)
+- `vf_metrics_exporter` sink (Prometheus exporter on a dynamic port)
+- Vector API enabled on `127.0.0.1:<dynamic port>`
+
+The agent scrapes the Prometheus endpoint on each heartbeat to collect per-component and host metrics.
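
The generated sidecar might look roughly like this. This is an illustrative sketch assembled from the component names listed above and Vector's `internal_metrics`, `host_metrics`, and `prometheus_exporter` component types; the exact options and ports are chosen by the agent at runtime:

```yaml
api:
  enabled: true
  address: "127.0.0.1:8686"   # port assigned per pipeline

sources:
  vf_internal_metrics:
    type: internal_metrics
  vf_host_metrics:
    type: host_metrics

sinks:
  vf_metrics_exporter:
    type: prometheus_exporter
    inputs: [vf_internal_metrics, vf_host_metrics]
    address: "127.0.0.1:9598"  # scraped by the agent on each heartbeat
```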
+
+---
+
+## Auto-update mechanism
+
+Standalone agents (not Docker) support in-place binary updates:
+
+1. An admin triggers an update from the VectorFlow UI, specifying a target version and download URL
+2. The server stores a `pendingAction` of type `self_update` on the node
+3. On the next poll, the agent receives the pending action
+4. The agent downloads the new binary to a temp file next to the current executable
+5. The SHA-256 checksum is verified against the expected value
+6. The temp file is atomically renamed over the current executable
+7. The agent re-executes itself via `syscall.Exec`, replacing the process in-place
+
+{% hint style="info" %}
+Docker agents ignore `self_update` actions. Update Docker agents by pulling a new image version instead.
+{% endhint %}
+
+---
+
+## Deployment mode detection
+
+The agent automatically detects whether it is running inside a container:
+
+- Checks for `/.dockerenv`
+- Inspects `/proc/1/cgroup` for `docker`, `containerd`, or `kubepods` entries
+
+The detected mode (`STANDALONE` or `DOCKER`) is reported in every heartbeat and displayed in the fleet UI.
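
The detection logic reduces to two checks. A testable sketch with the inputs passed in explicitly (the real agent reads `/.dockerenv` and `/proc/1/cgroup` itself):

```go
package main

import "strings"

// detectDeploymentMode reports "DOCKER" when container markers are present,
// otherwise "STANDALONE". dockerenvExists reflects whether /.dockerenv is
// present; cgroupContent is the text of /proc/1/cgroup.
func detectDeploymentMode(dockerenvExists bool, cgroupContent string) string {
	if dockerenvExists {
		return "DOCKER"
	}
	for _, marker := range []string{"docker", "containerd", "kubepods"} {
		if strings.Contains(cgroupContent, marker) {
			return "DOCKER"
		}
	}
	return "STANDALONE"
}
```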
+
+---
+
+## Data directory layout
+
+```
+/var/lib/vf-agent/                      # VF_DATA_DIR
+  node-token                            # Persisted node credential (0600)
+  pipelines/
+    <pipeline-id>.yaml                  # Pipeline config from server (0600)
+    <pipeline-id>.yaml.vf-metrics.yaml  # Auto-generated metrics sidecar
+  certs/
+    ca.pem                              # Certificate files (0600)
+    server.crt
+    server.key
+```
+
+---
+
+## Troubleshooting
+
+### Agent won't enroll
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| `config error: VF_URL is required` | `VF_URL` not set | Set the `VF_URL` environment variable |
+| `enrollment failed: ... connection refused` | Server unreachable | Verify `VF_URL` is correct and the server is running |
+| `enrollment failed: (status 401)` | Invalid enrollment token | Generate a new enrollment token in the VectorFlow UI |
+| `enrollment failed: (status 403)` | Token already used or revoked | Generate a new enrollment token |
+| `no node token found at ... and VF_TOKEN is not set` | First run without `VF_TOKEN` | Set `VF_TOKEN` to the enrollment token from the UI |
+
+### Agent shows offline
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| Node shows "Unreachable" in fleet UI | Agent not sending heartbeats | Check agent process is running, check network connectivity to server |
+| Heartbeat errors in agent logs | Network issue or server down | Check `VF_URL`, firewalls, and server health |
+| Agent enrolled but no heartbeats | Node token was revoked | Re-enroll by deleting `<VF_DATA_DIR>/node-token` and restarting with a new `VF_TOKEN` |
+
+### Pipeline won't start
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| `start vector for pipeline ...: exec: "vector": executable file not found` | Vector binary not on PATH | Install Vector or set `VF_VECTOR_BIN` to the full path |
+| Pipeline status shows CRASHED | Vector config error or runtime crash | Check the pipeline logs in the VectorFlow UI or agent stderr |
+| Pipeline stuck in STARTING | Vector process started but may have issues | Check agent logs at `debug` level (`VF_LOG_LEVEL=debug`) |
+
+### Diagnostic logging
+
+Enable debug logging to see all HTTP requests, poll results, and pipeline actions:
+
+```bash
+VF_LOG_LEVEL=debug vf-agent
+```
+
+This logs:
+- Every HTTP request and response status to the server
+- Poll results including the number of pipeline actions taken
+- Pipeline start/stop/restart events with PIDs
+- Heartbeat payloads including pipeline and metrics data
+- Certificate file writes and sample request processing
diff --git a/docs/public/reference/api.md b/docs/public/reference/api.md
new file mode 100644
index 0000000..d515b0f
--- /dev/null
+++ b/docs/public/reference/api.md
@@ -0,0 +1,474 @@
+# API Reference
+
+VectorFlow exposes its API via [tRPC](https://trpc.io/) -- a type-safe RPC framework built on HTTP. All API calls go through a single endpoint at `/api/trpc`, rather than traditional REST paths. This page documents every router, its procedures, and how to call them programmatically.
+
+## Calling convention
+
+tRPC uses a URL-based calling convention where the procedure name is encoded in the path:
+
+```
+# Query (read operation) -> HTTP GET
+GET /api/trpc/<router>.<procedure>?input=<URL-encoded JSON>
+
+# Mutation (write operation) -> HTTP POST
+POST /api/trpc/<router>.<procedure>
+Content-Type: application/json
+
+{"json": <input object>}
+```
+
+Responses are JSON-wrapped:
+
+```json
+{
+ "result": {
+ "data": {
+ "json": { ... }
+ }
+ }
+}
+```
+
+VectorFlow uses [SuperJSON](https://github.com/blitz-js/superjson) as its serialization transformer, which means Date objects and BigInts are automatically serialized and deserialized. When calling the API with raw HTTP, you receive SuperJSON-encoded output -- dates appear as ISO strings with type annotations.
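
For example, a record with a `createdAt` date might come back like this (an illustrative sketch of SuperJSON's `meta` annotation format; the field names are hypothetical):

```json
{
  "result": {
    "data": {
      "json": {
        "id": "clxyz001",
        "createdAt": "2025-01-15T10:30:00.000Z"
      },
      "meta": {
        "values": { "createdAt": ["Date"] }
      }
    }
  }
}
```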
+
+### Example: list pipelines
+
+{% tabs %}
+{% tab title="curl" %}
+```bash
+# Query β GET with URL-encoded JSON input
+curl -s 'https://vectorflow.example.com/api/trpc/pipeline.list?input=%7B%22json%22%3A%7B%22environmentId%22%3A%22clxyz123%22%7D%7D' \
+  -H 'Cookie: authjs.session-token=<session-token>'
+```
+{% endtab %}
+{% tab title="fetch" %}
+```typescript
+const input = encodeURIComponent(
+ JSON.stringify({ json: { environmentId: "clxyz123" } })
+);
+
+const res = await fetch(
+ `https://vectorflow.example.com/api/trpc/pipeline.list?input=${input}`,
+ {
+ headers: { Cookie: `authjs.session-token=${sessionToken}` },
+ }
+);
+
+const { result } = await res.json();
+const pipelines = result.data.json;
+```
+{% endtab %}
+{% endtabs %}
+
+### Example: create a pipeline
+
+{% tabs %}
+{% tab title="curl" %}
+```bash
+# Mutation β POST with JSON body
+curl -s -X POST 'https://vectorflow.example.com/api/trpc/pipeline.create' \
+ -H 'Content-Type: application/json' \
+  -H 'Cookie: authjs.session-token=<session-token>' \
+ -d '{"json": {"name": "syslog-to-s3", "environmentId": "clxyz123"}}'
+```
+{% endtab %}
+{% tab title="fetch" %}
+```typescript
+const res = await fetch(
+ "https://vectorflow.example.com/api/trpc/pipeline.create",
+ {
+ method: "POST",
+ headers: {
+ "Content-Type": "application/json",
+ Cookie: `authjs.session-token=${sessionToken}`,
+ },
+ body: JSON.stringify({
+ json: { name: "syslog-to-s3", environmentId: "clxyz123" },
+ }),
+ }
+);
+
+const { result } = await res.json();
+const pipeline = result.data.json;
+```
+{% endtab %}
+{% endtabs %}
+
+---
+
+## Authentication
+
+All API procedures (except the agent enrollment endpoint) require an authenticated session.
+
+VectorFlow uses [Auth.js](https://authjs.dev/) session cookies for authentication. When you sign in through the web UI, an `authjs.session-token` cookie is set. Include this cookie in API requests.
+
+{% hint style="info" %}
+There is no separate API key mechanism. If you need to call the API programmatically, sign in via the UI and extract the session cookie, or use the tRPC client with your session context.
+{% endhint %}
+
+### Roles
+
+Every procedure enforces a minimum role. VectorFlow has three roles, in ascending order of privilege:
+
+| Role | Level | Description |
+|------|-------|-------------|
+| `VIEWER` | 0 | Read-only access to pipelines, fleet, metrics, and logs |
+| `EDITOR` | 1 | Create, update, deploy, and delete pipelines, secrets, and alerts |
+| `ADMIN` | 2 | Manage environments, teams, members, enrollment tokens, and agent revocation |
+
+Some procedures require **Super Admin** access -- this is a server-wide flag on the user account, separate from team roles.
+
+---
+
+## Router index
+
+| Router | Prefix | Description |
+|--------|--------|-------------|
+| `pipeline` | `pipeline.*` | Pipeline CRUD, graph saving, versioning, deployment status, metrics, logs, event sampling |
+| `deploy` | `deploy.*` | Deploy preview, deploy to agents, undeploy |
+| `fleet` | `fleet.*` | Fleet node management, node logs, node metrics, agent updates |
+| `environment` | `environment.*` | Environment CRUD, enrollment tokens |
+| `alert` | `alert.*` | Alert rules, webhooks, alert events |
+| `template` | `template.*` | Pipeline template management |
+| `secret` | `secret.*` | Encrypted secret management |
+| `certificate` | `certificate.*` | TLS certificate management |
+| `dashboard` | `dashboard.*` | Dashboard statistics and chart data |
+| `team` | `team.*` | Team management, member roles |
+| `user` | `user.*` | User profile, password changes, TOTP setup |
+| `audit` | `audit.*` | Audit log queries |
+| `vrl` | `vrl.*` | VRL expression testing |
+| `vrlSnippet` | `vrlSnippet.*` | VRL snippet library |
+| `settings` | `settings.*` | System settings (Super Admin) |
+| `admin` | `admin.*` | Admin operations (Super Admin) |
+| `metrics` | `metrics.*` | Real-time metric streaming |
+| `validator` | `validator.*` | Pipeline config validation |
+
+---
+
+## Pipeline router
+
+Manage pipeline definitions, graphs, versions, and runtime data.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `pipeline.list` | query | VIEWER | `{ environmentId: string }` | List all pipelines in an environment |
+| `pipeline.get` | query | VIEWER | `{ id: string }` | Get a pipeline with its nodes, edges, and config change status |
+| `pipeline.create` | mutation | EDITOR | `{ name: string, description?: string, environmentId: string }` | Create a new draft pipeline |
+| `pipeline.update` | mutation | EDITOR | `{ id: string, name?: string, description?: string \| null }` | Update pipeline name or description |
+| `pipeline.delete` | mutation | EDITOR | `{ id: string }` | Delete a pipeline (undeploys first if deployed) |
+| `pipeline.clone` | mutation | EDITOR | `{ pipelineId: string }` | Clone a pipeline within the same environment |
+| `pipeline.promote` | mutation | EDITOR | `{ pipelineId: string, targetEnvironmentId: string, name?: string }` | Copy a pipeline to a different environment (strips secrets) |
+| `pipeline.saveGraph` | mutation | EDITOR | `{ pipelineId: string, nodes: Node[], edges: Edge[], globalConfig?: object }` | Save the visual pipeline graph |
+| `pipeline.versions` | query | VIEWER | `{ pipelineId: string }` | List all deployed versions of a pipeline |
+| `pipeline.getVersion` | query | VIEWER | `{ versionId: string }` | Get a specific version with its config YAML |
+| `pipeline.createVersion` | mutation | EDITOR | `{ pipelineId: string, configYaml: string, changelog?: string }` | Create a new pipeline version |
+| `pipeline.rollback` | mutation | EDITOR | `{ pipelineId: string, targetVersionId: string }` | Roll back to a previous version |
+| `pipeline.deploymentStatus` | query | VIEWER | `{ pipelineId: string }` | Get per-node deployment status for a pipeline |
+| `pipeline.metrics` | query | VIEWER | `{ pipelineId: string, hours?: number }` | Get pipeline metrics (events, bytes, errors) over time |
+| `pipeline.logs` | query | VIEWER | `{ pipelineId: string, cursor?: string, limit?: number, levels?: LogLevel[], nodeId?: string, since?: Date }` | Paginated pipeline logs |
+| `pipeline.requestSamples` | mutation | EDITOR | `{ pipelineId: string, componentKeys: string[], limit?: number }` | Request live event samples from running components |
+| `pipeline.sampleResult` | query | VIEWER | `{ requestId: string }` | Poll for event sample results |
+| `pipeline.eventSchemas` | query | VIEWER | `{ pipelineId: string }` | Get discovered event schemas per component |
+
+
+**Pipeline name validation**
+
+Pipeline names must match the pattern `^[a-zA-Z0-9][a-zA-Z0-9 _-]*$` and be between 1 and 100 characters long. The name must start with a letter or number and may contain letters, numbers, spaces, hyphens, and underscores.
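
A Go sketch of this validation (the helper name is hypothetical):

```go
package main

import "regexp"

// pipelineNamePattern mirrors the documented rule: start with a letter or
// number, then letters, numbers, spaces, hyphens, and underscores.
var pipelineNamePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9 _-]*$`)

// isValidPipelineName checks both the character pattern and the 1-100
// length bound.
func isValidPipelineName(name string) bool {
	return len(name) >= 1 && len(name) <= 100 &&
		pipelineNamePattern.MatchString(name)
}
```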
+
+
+
+**Node schema**
+
+Each node in the `saveGraph` input:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `string?` | Optional ID (auto-generated if omitted) |
+| `componentKey` | `string` | Unique identifier within the pipeline (e.g., `my_syslog_source`). Must match `^[a-zA-Z_][a-zA-Z0-9_]*$` |
+| `componentType` | `string` | Vector component type (e.g., `syslog`, `remap`, `aws_s3`) |
+| `kind` | `"SOURCE" \| "TRANSFORM" \| "SINK"` | Component category |
+| `config` | `object` | Component configuration fields |
+| `positionX` | `number` | X coordinate in the visual editor |
+| `positionY` | `number` | Y coordinate in the visual editor |
+| `disabled` | `boolean` | Whether the node is excluded from the generated config |
+
+
+---
+
+## Deploy router
+
+Preview and execute pipeline deployments.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `deploy.preview` | query | VIEWER | `{ pipelineId: string }` | Generate and validate the YAML config, return diff against deployed version |
+| `deploy.agent` | mutation | EDITOR | `{ pipelineId: string, changelog: string }` | Deploy a pipeline -- validates config, creates a version, marks as deployed |
+| `deploy.undeploy` | mutation | EDITOR | `{ pipelineId: string }` | Undeploy a pipeline (agents stop it on next poll) |
+| `deploy.environmentInfo` | query | VIEWER | `{ pipelineId: string }` | Get the environment and node list for a pipeline |
+
+---
+
+## Fleet router
+
+Manage agent nodes and view their status, logs, and metrics.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `fleet.list` | query | VIEWER | `{ environmentId: string }` | List all nodes in an environment |
+| `fleet.get` | query | VIEWER | `{ id: string }` | Get a node with its pipeline statuses |
+| `fleet.create` | mutation | EDITOR | `{ name: string, host: string, apiPort?: number, environmentId: string }` | Register a node manually |
+| `fleet.update` | mutation | EDITOR | `{ id: string, name?: string }` | Update node name |
+| `fleet.delete` | mutation | EDITOR | `{ id: string }` | Delete a node |
+| `fleet.revokeNode` | mutation | ADMIN | `{ id: string }` | Revoke a node's token (prevents further communication) |
+| `fleet.nodeLogs` | query | VIEWER | `{ nodeId: string, cursor?: string, limit?: number, levels?: LogLevel[], pipelineId?: string }` | Paginated logs for a node |
+| `fleet.nodeMetrics` | query | VIEWER | `{ nodeId: string, hours?: number }` | System metrics for a node (CPU, memory, disk, network) |
+| `fleet.triggerAgentUpdate` | mutation | ADMIN | `{ nodeId: string, targetVersion: string, downloadUrl: string, checksum: string }` | Trigger a self-update on a standalone agent |
+| `fleet.listWithPipelineStatus` | query | VIEWER | `{ environmentId: string }` | List nodes with per-pipeline deployment status |
+
+---
+
+## Environment router
+
+Manage environments and enrollment tokens.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `environment.list` | query | VIEWER | `{ teamId: string }` | List environments for a team |
+| `environment.get` | query | VIEWER | `{ id: string }` | Get environment details including node count |
+| `environment.create` | mutation | EDITOR | `{ name: string, teamId: string }` | Create a new environment |
+| `environment.update` | mutation | EDITOR | `{ id: string, name?: string, secretBackend?: string, secretBackendConfig?: any }` | Update environment name or secret backend |
+| `environment.delete` | mutation | ADMIN | `{ id: string }` | Delete an environment and all its pipelines and nodes |
+| `environment.generateEnrollmentToken` | mutation | ADMIN | `{ environmentId: string }` | Generate a new enrollment token for agent enrollment |
+| `environment.revokeEnrollmentToken` | mutation | ADMIN | `{ environmentId: string }` | Revoke the enrollment token |
+
+
+### Secret backend options
+
+The `secretBackend` field accepts one of:
+
+| Value | Description |
+|-------|-------------|
+| `BUILTIN` | Secrets encrypted in the VectorFlow database (default) |
+| `VAULT` | HashiCorp Vault |
+| `AWS_SM` | AWS Secrets Manager |
+| `EXEC` | External command execution |
+
+
+---
+
+## Alert router
+
+Manage alert rules, webhook destinations, and view alert events.
+
+### Alert rules
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `alert.listRules` | query | VIEWER | `{ environmentId: string }` | List alert rules for an environment |
+| `alert.createRule` | mutation | EDITOR | `{ name: string, environmentId: string, pipelineId?: string, metric: AlertMetric, condition: AlertCondition, threshold: number, durationSeconds?: number, teamId: string }` | Create an alert rule |
+| `alert.updateRule` | mutation | EDITOR | `{ id: string, name?: string, enabled?: boolean, threshold?: number, durationSeconds?: number }` | Update an alert rule |
+| `alert.deleteRule` | mutation | EDITOR | `{ id: string }` | Delete an alert rule |
+
+
+#### `AlertMetric` values
+
+| Value | Description |
+|-------|-------------|
+| `node_unreachable` | Node has not sent a heartbeat |
+| `cpu_usage` | Host CPU utilization percentage |
+| `memory_usage` | Host memory utilization percentage |
+| `disk_usage` | Host disk utilization percentage |
+| `error_rate` | Pipeline error events per second |
+| `discarded_rate` | Pipeline discarded events per second |
+| `pipeline_crashed` | Pipeline process has crashed |
+
+
+
+#### `AlertCondition` values
+
+| Value | Description |
+|-------|-------------|
+| `gt` | Greater than threshold |
+| `lt` | Less than threshold |
+| `eq` | Equal to threshold |
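+
+Putting the two enums together, a hypothetical `alert.createRule` input that fires when CPU usage stays above 90% for five minutes (IDs are placeholders):
+
+```json
+{
+  "name": "High CPU",
+  "environmentId": "<environment-id>",
+  "teamId": "<team-id>",
+  "metric": "cpu_usage",
+  "condition": "gt",
+  "threshold": 90,
+  "durationSeconds": 300
+}
+```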
+
+
+### Alert webhooks
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `alert.listWebhooks` | query | VIEWER | `{ environmentId: string }` | List webhook destinations |
+| `alert.createWebhook` | mutation | EDITOR | `{ environmentId: string, url: string, headers?: Record<string, string>, hmacSecret?: string }` | Create a webhook |
+| `alert.updateWebhook` | mutation | EDITOR | `{ id: string, url?: string, headers?: Record<string, string> \| null, hmacSecret?: string \| null, enabled?: boolean }` | Update a webhook |
+| `alert.deleteWebhook` | mutation | EDITOR | `{ id: string }` | Delete a webhook |
+| `alert.testWebhook` | mutation | EDITOR | `{ id: string }` | Send a test alert payload to a webhook |
+
+### Alert events
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `alert.listEvents` | query | VIEWER | `{ environmentId: string, limit?: number, cursor?: string }` | Paginated list of alert events |
+
+---
+
+## Template router
+
+Manage reusable pipeline templates.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `template.list` | query | VIEWER | `{ teamId: string }` | List all templates for a team |
+| `template.get` | query | VIEWER | `{ id: string }` | Get a template with its nodes and edges |
+| `template.create` | mutation | EDITOR | `{ name: string, description: string, category: string, teamId: string, nodes: Node[], edges: Edge[] }` | Create a template |
+| `template.delete` | mutation | EDITOR | `{ id: string }` | Delete a template |
+
+---
+
+## Secret router
+
+Manage encrypted secrets for pipeline configurations.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `secret.list` | query | VIEWER | `{ environmentId: string }` | List secrets (names only, no values) |
+| `secret.create` | mutation | EDITOR | `{ environmentId: string, name: string, value: string }` | Create a secret |
+| `secret.update` | mutation | EDITOR | `{ id: string, environmentId: string, value: string }` | Update a secret value |
+| `secret.delete` | mutation | EDITOR | `{ id: string, environmentId: string }` | Delete a secret |
+
+{% hint style="info" %}
+Secret values are never returned by the API. The `list` endpoint returns only names and timestamps. Values are encrypted at rest and only decrypted during pipeline deployment.
+{% endhint %}
+
+---
+
+## Certificate router
+
+Manage TLS certificates for pipeline components.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `certificate.list` | query | VIEWER | `{ environmentId: string }` | List certificates (metadata only) |
+| `certificate.upload` | mutation | EDITOR | `{ environmentId: string, name: string, filename: string, fileType: "ca" \| "cert" \| "key", dataBase64: string }` | Upload a PEM-encoded certificate |
+| `certificate.delete` | mutation | EDITOR | `{ id: string, environmentId: string }` | Delete a certificate |
+
+---
+
+## Dashboard router
+
+Fetch dashboard statistics and chart data.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `dashboard.stats` | query | VIEWER | `{ environmentId: string }` | Pipeline count, node count, fleet health, data reduction |
+| `dashboard.recentPipelines` | query | VIEWER | *(none)* | The 5 most recently updated pipelines |
+| `dashboard.recentAudit` | query | VIEWER | *(none)* | The 10 most recent audit log entries |
+| `dashboard.nodeCards` | query | VIEWER | *(none)* | Node overview cards with metrics and sparklines |
+| `dashboard.pipelineCards` | query | VIEWER | `{ environmentId: string }` | Pipeline cards with metrics, rates, and deployment status |
+| `dashboard.operationalOverview` | query | VIEWER | *(none)* | Unhealthy nodes, deployed pipelines, recent aggregate metrics |
+| `dashboard.chartMetrics` | query | VIEWER | `{ environmentId: string, nodeIds?: string[], pipelineIds?: string[], range?: "1h" \| "6h" \| "1d" \| "7d", groupBy?: "pipeline" \| "node" \| "aggregate" }` | Time-series chart data for dashboards |
+
+---
+
+## Team router
+
+Manage teams and team membership.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `team.list` | query | VIEWER | *(none)* | List teams the current user belongs to |
+| `team.get` | query | VIEWER | `{ id: string }` | Get team details with members |
+| `team.myRole` | query | VIEWER | *(none)* | Get the current user's highest role |
+| `team.teamRole` | query | VIEWER | `{ teamId: string }` | Get the current user's role in a specific team |
+| `team.create` | mutation | Super Admin | `{ name: string }` | Create a new team |
+| `team.delete` | mutation | Super Admin | `{ teamId: string }` | Delete a team (must have no environments) |
+| `team.rename` | mutation | ADMIN | `{ teamId: string, name: string }` | Rename a team |
+| `team.addMember` | mutation | ADMIN | `{ teamId: string, email: string, role: "VIEWER" \| "EDITOR" \| "ADMIN" }` | Add a user to a team |
+| `team.removeMember` | mutation | ADMIN | `{ teamId: string, userId: string }` | Remove a member from a team |
+| `team.updateMemberRole` | mutation | ADMIN | `{ teamId: string, userId: string, role: "VIEWER" \| "EDITOR" \| "ADMIN" }` | Change a member's role |
+| `team.lockMember` | mutation | ADMIN | `{ teamId: string, userId: string }` | Lock a user account |
+| `team.unlockMember` | mutation | ADMIN | `{ teamId: string, userId: string }` | Unlock a user account |
+| `team.resetMemberPassword` | mutation | ADMIN | `{ teamId: string, userId: string }` | Reset a member's password (returns temporary password) |
+| `team.updateRequireTwoFactor` | mutation | ADMIN | `{ teamId: string, requireTwoFactor: boolean }` | Require 2FA for all team members |
+
+---
+
+## User router
+
+Manage the current user's profile and two-factor authentication.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `user.me` | query | VIEWER | *(none)* | Get current user info (name, email, auth method, 2FA status) |
+| `user.changePassword` | mutation | VIEWER | `{ currentPassword: string, newPassword: string }` | Change password (min 8 characters) |
+| `user.updateProfile` | mutation | VIEWER | `{ name: string }` | Update display name |
+| `user.setupTotp` | mutation | VIEWER | *(none)* | Begin TOTP 2FA setup (returns QR URI and backup codes) |
+| `user.verifyAndEnableTotp` | mutation | VIEWER | `{ code: string }` | Verify a TOTP code and enable 2FA |
+| `user.disableTotp` | mutation | VIEWER | `{ code: string }` | Disable 2FA (requires valid TOTP or backup code) |
+
+---
+
+## Audit router
+
+Query the audit log.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `audit.list` | query | VIEWER | `{ action?: string, userId?: string, entityType?: string, search?: string, teamId?: string, environmentId?: string, startDate?: string, endDate?: string, cursor?: string }` | Paginated, filterable audit log |
+| `audit.actions` | query | VIEWER | *(none)* | List distinct action values |
+| `audit.entityTypes` | query | VIEWER | *(none)* | List distinct entity type values |
+| `audit.users` | query | VIEWER | *(none)* | List users who appear in the audit log |
+
+---
+
+## VRL router
+
+Test VRL (Vector Remap Language) expressions.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `vrl.test` | mutation | VIEWER | `{ source: string, input: string }` | Execute a VRL program against a test event and return the result |
+
+---
+
+## VRL Snippet router
+
+Manage the VRL snippet library.
+
+| Procedure | Type | Min Role | Input | Description |
+|-----------|------|----------|-------|-------------|
+| `vrlSnippet.list` | query | VIEWER | `{ teamId: string }` | List built-in and custom VRL snippets |
+| `vrlSnippet.create` | mutation | EDITOR | `{ teamId: string, name: string, description?: string, category: string, code: string }` | Create a custom snippet |
+| `vrlSnippet.update` | mutation | EDITOR | `{ id: string, name?: string, description?: string, category?: string, code?: string }` | Update a custom snippet |
+| `vrlSnippet.delete` | mutation | EDITOR | `{ id: string }` | Delete a custom snippet |
+
+---
+
+## Error handling
+
+tRPC errors are returned with a standard error shape:
+
+```json
+{
+ "error": {
+ "json": {
+ "message": "Pipeline not found",
+ "code": -32004,
+ "data": {
+ "code": "NOT_FOUND",
+ "httpStatus": 404
+ }
+ }
+ }
+}
+```
+
+Common error codes:
+
+| tRPC Code | HTTP Status | Meaning |
+|-----------|-------------|---------|
+| `UNAUTHORIZED` | 401 | Not signed in |
+| `FORBIDDEN` | 403 | Insufficient role or not a team member |
+| `NOT_FOUND` | 404 | Resource does not exist |
+| `BAD_REQUEST` | 400 | Invalid input |
+| `CONFLICT` | 409 | Resource already exists (duplicate name) |
+| `PRECONDITION_FAILED` | 412 | Operation requires a precondition (e.g., pipeline must be deployed) |
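+
+A minimal sketch of how a client might turn this envelope into a user-facing message (the envelope shape follows the example above; the helper itself is illustrative, not part of VectorFlow):
+
+```typescript
+interface TrpcErrorEnvelope {
+  error: {
+    json: {
+      message: string;
+      code: number;
+      data: { code: string; httpStatus: number };
+    };
+  };
+}
+
+// Map a tRPC error envelope to a short user-facing message.
+function describeError(envelope: TrpcErrorEnvelope): string {
+  const { message, data } = envelope.error.json;
+  switch (data.code) {
+    case "UNAUTHORIZED":
+      return "Please sign in.";
+    case "FORBIDDEN":
+      return "You do not have permission to perform this action.";
+    case "NOT_FOUND":
+      return `Not found: ${message}`;
+    default:
+      return `${data.code} (HTTP ${data.httpStatus}): ${message}`;
+  }
+}
+```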
diff --git a/docs/public/reference/database.md b/docs/public/reference/database.md
new file mode 100644
index 0000000..1862ee3
--- /dev/null
+++ b/docs/public/reference/database.md
@@ -0,0 +1,292 @@
+# Database Schema
+
+{% hint style="info" %}
+This reference is for advanced self-hosters who need to understand the data model for backup planning, integrations, or troubleshooting. The schema is managed by Prisma migrations -- you do not need to create tables manually. Running `npx prisma migrate deploy` (or starting the Docker container) applies all pending migrations automatically.
+{% endhint %}
+
+VectorFlow uses **PostgreSQL** as its sole data store. All state -- pipeline definitions, fleet status, metrics, audit logs, secrets, and user accounts -- lives in the database.
+
+---
+
+## Entity relationship diagram
+
+```mermaid
+erDiagram
+ Team ||--o{ TeamMember : has
+ Team ||--o{ Environment : owns
+ Team ||--o{ Template : owns
+ Team ||--o{ AlertRule : owns
+ Team ||--o{ VrlSnippet : owns
+
+ User ||--o{ TeamMember : belongs_to
+ User ||--o{ AuditLog : creates
+ User ||--o{ VrlSnippet : creates
+
+ Environment ||--o{ VectorNode : contains
+ Environment ||--o{ Pipeline : contains
+ Environment ||--o{ Secret : stores
+ Environment ||--o{ Certificate : stores
+ Environment ||--o{ AlertRule : has
+ Environment ||--o{ AlertWebhook : has
+
+ Pipeline ||--o{ PipelineNode : has
+ Pipeline ||--o{ PipelineEdge : has
+ Pipeline ||--o{ PipelineVersion : tracks
+ Pipeline ||--o{ NodePipelineStatus : reports
+ Pipeline ||--o{ PipelineMetric : records
+ Pipeline ||--o{ PipelineLog : records
+ Pipeline ||--o{ AlertRule : monitors
+ Pipeline ||--o{ EventSampleRequest : requests
+ Pipeline ||--o{ EventSample : stores
+
+ VectorNode ||--o{ NodePipelineStatus : reports
+ VectorNode ||--o{ NodeMetric : records
+ VectorNode ||--o{ PipelineLog : emits
+ VectorNode ||--o{ AlertEvent : triggers
+
+ AlertRule ||--o{ AlertEvent : fires
+```
+
+---
+
+## Core entities
+
+| Table | Description |
+|-------|-------------|
+| `User` | User accounts with authentication credentials, TOTP secrets, and super admin flag |
+| `Team` | Organizational unit that groups environments, templates, and members |
+| `TeamMember` | Join table linking users to teams with a role (VIEWER, EDITOR, ADMIN) |
+| `Environment` | Logical grouping of nodes and pipelines (e.g., Production, Staging) |
+| `VectorNode` | An agent node registered in an environment |
+| `Pipeline` | A pipeline definition with its visual graph, deployment state, and global config |
+| `PipelineNode` | A single component (source, transform, or sink) within a pipeline graph |
+| `PipelineEdge` | A connection between two pipeline nodes |
+| `PipelineVersion` | An immutable snapshot of a pipeline's generated YAML config at deploy time |
+| `NodePipelineStatus` | Per-node runtime status for a deployed pipeline |
+| `PipelineMetric` | Time-series pipeline throughput data (events, bytes, errors) |
+| `NodeMetric` | Time-series host system metrics (CPU, memory, disk, network) |
+| `PipelineLog` | Log lines from pipeline processes, forwarded by agents |
+| `Secret` | Encrypted secret values scoped to an environment |
+| `Certificate` | Encrypted TLS certificate files scoped to an environment |
+| `Template` | Reusable pipeline template stored as JSON nodes/edges |
+| `AuditLog` | Immutable record of every significant action |
+| `SystemSettings` | Singleton row for global server configuration |
+| `AlertRule` | Alert condition definition (metric, threshold, duration) |
+| `AlertWebhook` | Webhook destination for alert notifications |
+| `AlertEvent` | Record of a fired or resolved alert |
+| `VrlSnippet` | Custom VRL code snippet in the team library |
+| `EventSampleRequest` | Request to sample live events from a running pipeline |
+| `EventSample` | Sampled event data and inferred schema for a pipeline component |
+| `Account` | OAuth/OIDC provider accounts linked to users |
+
+---
+
+## Key table details
+
+### Pipeline
+
+The central entity. Stores the pipeline definition and tracks deployment state.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `String` (CUID) | Primary key |
+| `name` | `String` | Display name |
+| `description` | `String?` | Optional description |
+| `environmentId` | `String` | FK to Environment |
+| `globalConfig` | `Json?` | Global Vector config (API settings, enrichment tables, log level) |
+| `isDraft` | `Boolean` | `true` = not deployed, `false` = actively deployed |
+| `isSystem` | `Boolean` | `true` = system pipeline (audit log shipping) |
+| `deployedAt` | `DateTime?` | Timestamp of last deployment (null if never deployed) |
+| `createdById` | `String?` | FK to User who created the pipeline |
+| `updatedById` | `String?` | FK to User who last modified the pipeline |
+| `createdAt` | `DateTime` | Creation timestamp |
+| `updatedAt` | `DateTime` | Last modification timestamp |
+
+Relationships: `nodes`, `edges`, `versions`, `nodeStatuses`, `metrics`, `pipelineLogs`, `alertRules`, `sampleRequests`, `eventSamples`.
+
+### VectorNode
+
+Represents an enrolled agent node.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `String` (CUID) | Primary key |
+| `name` | `String` | Display name (defaults to hostname at enrollment) |
+| `host` | `String` | Hostname or IP address |
+| `apiPort` | `Int` | Vector API port (default: 8686) |
+| `environmentId` | `String` | FK to Environment |
+| `status` | `NodeStatus` | Current health: `HEALTHY`, `DEGRADED`, `UNREACHABLE`, `UNKNOWN` |
+| `lastSeen` | `DateTime?` | Last time the server processed a heartbeat from this node |
+| `metadata` | `Json?` | Additional node metadata |
+| `nodeTokenHash` | `String?` | Hashed node authentication token (null = revoked) |
+| `enrolledAt` | `DateTime?` | When the node first enrolled |
+| `lastHeartbeat` | `DateTime?` | Timestamp of the last heartbeat |
+| `agentVersion` | `String?` | Reported agent binary version |
+| `vectorVersion` | `String?` | Reported Vector binary version |
+| `os` | `String?` | Operating system and architecture (e.g., `linux/amd64`) |
+| `deploymentMode` | `DeploymentMode` | `STANDALONE`, `DOCKER`, or `UNKNOWN` |
+| `pendingAction` | `Json?` | Server-initiated action (e.g., self-update command) |
+| `createdAt` | `DateTime` | Registration timestamp |
+
+### Environment
+
+Logical grouping that contains nodes, pipelines, secrets, and certificates.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `String` (CUID) | Primary key |
+| `name` | `String` | Display name (e.g., "Production", "Staging") |
+| `isSystem` | `Boolean` | `true` = internal system environment (hidden from UI) |
+| `teamId` | `String?` | FK to Team (null for system environment) |
+| `enrollmentTokenHash` | `String?` | Hashed enrollment token for agent registration |
+| `enrollmentTokenHint` | `String?` | First few characters of the token for display |
+| `secretBackend` | `SecretBackend` | Secret storage: `BUILTIN`, `VAULT`, `AWS_SM`, `EXEC` |
+| `secretBackendConfig` | `Json?` | Configuration for external secret backends |
+| `createdAt` | `DateTime` | Creation timestamp |
+
+### PipelineVersion
+
+Immutable deployment snapshot. Created each time a pipeline is deployed.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `String` (CUID) | Primary key |
+| `pipelineId` | `String` | FK to Pipeline |
+| `version` | `Int` | Auto-incrementing version number |
+| `configYaml` | `String` | The generated Vector YAML config |
+| `configToml` | `String?` | Optional TOML representation |
+| `logLevel` | `String?` | Vector log level at deploy time |
+| `globalConfig` | `Json?` | Global config snapshot |
+| `createdById` | `String` | FK to User who deployed |
+| `changelog` | `String?` | User-provided deploy message |
+| `createdAt` | `DateTime` | Deploy timestamp |
+
+### Secret
+
+Encrypted secrets scoped to an environment. Referenced in pipeline configs using `SECRET[name]` syntax.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `String` (CUID) | Primary key |
+| `name` | `String` | Secret identifier (unique per environment) |
+| `encryptedValue` | `String` | AES-256-GCM encrypted value |
+| `environmentId` | `String` | FK to Environment |
+| `createdAt` | `DateTime` | Creation timestamp |
+| `updatedAt` | `DateTime` | Last update timestamp |
+
+### Certificate
+
+Encrypted TLS certificate files. Referenced in pipeline configs using `CERT[name]` syntax.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `String` (CUID) | Primary key |
+| `name` | `String` | Certificate identifier (unique per environment) |
+| `filename` | `String` | Original filename (e.g., `ca.pem`) |
+| `fileType` | `String` | Type: `ca`, `cert`, or `key` |
+| `encryptedData` | `String` | AES-256-GCM encrypted PEM content |
+| `environmentId` | `String` | FK to Environment |
+| `createdAt` | `DateTime` | Upload timestamp |
+
+### AuditLog
+
+Immutable audit trail of all significant actions.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `String` (CUID) | Primary key |
+| `userId` | `String?` | FK to User (null for system actions) |
+| `action` | `String` | Action identifier (e.g., `pipeline.created`, `deploy.agent`) |
+| `entityType` | `String` | Target entity type (e.g., `Pipeline`, `Environment`) |
+| `entityId` | `String` | ID of the affected entity |
+| `diff` | `Json?` | Before/after field changes |
+| `metadata` | `Json?` | Additional context |
+| `ipAddress` | `String?` | Client IP address |
+| `userEmail` | `String?` | Denormalized email for display |
+| `userName` | `String?` | Denormalized name for display |
+| `teamId` | `String?` | Owning team |
+| `environmentId` | `String?` | Owning environment |
+| `createdAt` | `DateTime` | Timestamp |
+
+---
+
+## Enums
+
+| Enum | Values | Description |
+|------|--------|-------------|
+| `Role` | `VIEWER`, `EDITOR`, `ADMIN` | Team membership role |
+| `AuthMethod` | `LOCAL`, `OIDC` | User authentication method |
+| `NodeStatus` | `HEALTHY`, `DEGRADED`, `UNREACHABLE`, `UNKNOWN` | Agent node health |
+| `DeploymentMode` | `STANDALONE`, `DOCKER`, `UNKNOWN` | How the agent is deployed |
+| `ComponentKind` | `SOURCE`, `TRANSFORM`, `SINK` | Pipeline node category |
+| `ProcessStatus` | `RUNNING`, `STARTING`, `STOPPED`, `CRASHED`, `PENDING` | Pipeline process state |
+| `LogLevel` | `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR` | Log severity |
+| `SecretBackend` | `BUILTIN`, `VAULT`, `AWS_SM`, `EXEC` | Secret storage provider |
+| `AlertMetric` | `node_unreachable`, `cpu_usage`, `memory_usage`, `disk_usage`, `error_rate`, `discarded_rate`, `pipeline_crashed` | Metric to evaluate |
+| `AlertCondition` | `gt`, `lt`, `eq` | Comparison operator |
+| `AlertStatus` | `firing`, `resolved` | Alert event state |
+
+---
+
+## Encryption at rest
+
+Sensitive fields are encrypted with AES-256-GCM before being stored in the database:
+
+- **Secret values** (`Secret.encryptedValue`) -- pipeline credentials, API keys, passwords
+- **Certificate data** (`Certificate.encryptedData`) -- TLS certificates and private keys
+- **TOTP secrets** (`User.totpSecret`) -- two-factor authentication secrets
+- **TOTP backup codes** (`User.totpBackupCodes`) -- recovery codes
+
+Password hashes (`User.passwordHash`) are stored bcrypt-hashed rather than AES-encrypted, so they cannot be reversed even with the encryption key.
+
+The encryption key is derived from the `NEXTAUTH_SECRET` environment variable. Losing this value means encrypted data cannot be recovered.
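+
+A minimal sketch of AES-256-GCM encrypt/decrypt with Node's `crypto` module. The exact key derivation and ciphertext layout VectorFlow uses are not documented here, so both are assumptions -- this only illustrates the primitive:
+
+```typescript
+import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";
+
+// Derive a 32-byte key from the secret (illustrative KDF; the real one may differ).
+const key = scryptSync(process.env.NEXTAUTH_SECRET ?? "dev-secret", "vf-salt", 32);
+
+function encrypt(plaintext: string): string {
+  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
+  const cipher = createCipheriv("aes-256-gcm", key, iv);
+  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
+  // Pack iv + auth tag + ciphertext so decryption is self-contained.
+  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
+}
+
+function decrypt(encoded: string): string {
+  const buf = Buffer.from(encoded, "base64");
+  const iv = buf.subarray(0, 12);
+  const tag = buf.subarray(12, 28);
+  const ct = buf.subarray(28);
+  const decipher = createDecipheriv("aes-256-gcm", key, iv);
+  decipher.setAuthTag(tag); // GCM authenticates as well as encrypts
+  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
+}
+```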
+
+---
+
+## Indexes
+
+Key database indexes for query performance:
+
+| Table | Index | Purpose |
+|-------|-------|---------|
+| `PipelineMetric` | `(pipelineId, timestamp)` | Time-range queries for pipeline charts |
+| `PipelineMetric` | `(timestamp)` | Retention cleanup |
+| `NodeMetric` | `(nodeId, timestamp)` | Time-range queries for node charts |
+| `PipelineLog` | `(pipelineId, timestamp)` | Pipeline log pagination |
+| `PipelineLog` | `(nodeId, timestamp)` | Node log pagination |
+| `AuditLog` | `(entityType, entityId)` | Entity-specific audit queries |
+| `AuditLog` | `(userId)` | User activity queries |
+| `AuditLog` | `(createdAt)` | Time-range audit queries |
+| `AlertRule` | `(environmentId)` | Environment-scoped alert listing |
+| `AlertEvent` | `(alertRuleId)` | Alert event history |
+| `AlertEvent` | `(firedAt)` | Time-range alert queries |
+
+---
+
+## Data retention
+
+VectorFlow automatically prunes time-series data based on system settings:
+
+| Data Type | Default Retention | Setting |
+|-----------|------------------|---------|
+| Pipeline metrics | 7 days | `metricsRetentionDays` |
+| Pipeline logs | 3 days | `logsRetentionDays` |
+| Node metrics | 7 days | `metricsRetentionDays` |
+| Audit logs | Indefinite | Not automatically pruned |
+| Alert events | Indefinite | Not automatically pruned |
+
+These values are configured in the `SystemSettings` table via the admin UI.
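+
+Conceptually, each cleanup pass is a time-bounded delete. The SQL below is illustrative only (column names assumed from the index list above) -- the real job runs inside the server and honors the `SystemSettings` values:
+
+```sql
+-- Prune pipeline metrics older than the retention window (default 7 days)
+DELETE FROM "PipelineMetric"
+WHERE "timestamp" < now() - interval '7 days';
+```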
+
+---
+
+## Backup considerations
+
+{% hint style="warning" %}
+The database is the single source of truth for all VectorFlow state. Losing the database without a backup means losing all pipeline definitions, deployment history, secrets, and audit logs.
+{% endhint %}
+
+For backup and restore procedures, see [Backup & Restore](../operations/backup-restore.md).
+
+Key points:
+- Use `pg_dump` for logical backups or continuous archiving for point-in-time recovery
+- The `NEXTAUTH_SECRET` environment variable must match between backup and restore -- it is the encryption key for secrets and certificates
+- VectorFlow has built-in scheduled backup support (configured via `SystemSettings`)
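+
+As a sketch, a logical backup and restore might look like this (database name and paths are placeholders):
+
+```bash
+# Custom-format dump: compressed, restorable with pg_restore
+pg_dump -Fc -d vectorflow -f /backups/vectorflow-<date>.dump
+
+# Restore into an existing database, dropping objects first
+pg_restore -d vectorflow --clean --if-exists /backups/vectorflow-<date>.dump
+```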
diff --git a/docs/public/reference/pipeline-yaml.md b/docs/public/reference/pipeline-yaml.md
new file mode 100644
index 0000000..1854c0a
--- /dev/null
+++ b/docs/public/reference/pipeline-yaml.md
@@ -0,0 +1,280 @@
+# Pipeline YAML
+
+{% hint style="info" %}
+Users typically do not write pipeline YAML directly -- the visual editor generates it. This reference is for understanding the generated output, debugging deployment issues, and advanced use cases like importing existing Vector configs.
+{% endhint %}
+
+VectorFlow pipelines are ultimately Vector configuration files. The visual pipeline editor translates your node graph into standard [Vector YAML configuration](https://vector.dev/docs/reference/configuration/) that Vector can execute directly.
+
+---
+
+## How YAML is generated
+
+The pipeline YAML generation follows this flow:
+
+```
+  Visual Editor          Server                   Agent
+ +------------+     +----------------+      +----------------+
+ | Drag nodes |     | Save graph     |      | Receive YAML   |
+ | Connect    |---->| to database    |      | via config     |
+ | edges      |     |                |      | endpoint       |
+ +------------+     | On deploy:     | poll |                |
+                    |  1. Generate   |----->| Write to disk  |
+                    |     YAML       |      | Start Vector   |
+                    |  2. Validate   |      | process        |
+                    |  3. Version    |      +----------------+
+                    +----------------+
+```
+
+1. **Graph saved**: The visual editor saves pipeline nodes (components) and edges (connections) to the database
+2. **YAML generated**: At deploy time, the server converts the graph into Vector YAML
+3. **Validation**: The generated YAML is validated using `vector validate --no-environment`
+4. **Versioning**: A new `PipelineVersion` record stores the YAML and a version number
+5. **Distribution**: Agents poll the server and receive the YAML config for all deployed pipelines
+6. **Execution**: The agent writes the YAML to disk and starts a Vector process with it
+
+---
+
+## YAML structure
+
+The generated YAML follows Vector's standard configuration format with three top-level sections:
+
+```yaml
+sources:
+  <component_key>:
+    type: <source_type>
+    # ... source-specific fields
+
+transforms:
+  <component_key>:
+    type: <transform_type>
+    inputs:
+      - <upstream_component_key>
+    # ... transform-specific fields
+
+sinks:
+  <component_key>:
+    type: <sink_type>
+    inputs:
+      - <upstream_component_key>
+    # ... sink-specific fields
+```
+
+### Component keys
+
+Each node in the visual editor has a **component key** -- a unique identifier within the pipeline. Component keys must:
+- Start with a letter or underscore
+- Contain only letters, numbers, and underscores
+- Be between 1 and 128 characters
+
+These keys become the YAML block names under `sources`, `transforms`, or `sinks`.
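+
+The constraints above amount to a single pattern check. A sketch (the server-side validator may differ in details):
+
+```typescript
+// 1-128 characters: a letter or underscore, then letters, digits, or underscores.
+const COMPONENT_KEY = /^[a-zA-Z_][a-zA-Z0-9_]{0,127}$/;
+
+function isValidComponentKey(key: string): boolean {
+  return COMPONENT_KEY.test(key);
+}
+```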
+
+### Connections via `inputs`
+
+Vector uses an `inputs` field to define data flow. When you draw an edge from node A to node B in the visual editor, the generated YAML adds A's component key to B's `inputs` array.
+
+Sources never have `inputs` -- they are the entry points. Transforms and sinks always have at least one input.
+
+---
+
+## Complete example
+
+A pipeline that receives syslog, enriches the parsed events with VRL, then sends them to S3:
+
+**Visual editor graph:**
+```
+[syslog_in] --> [parse_logs] --> [s3_output]
+  (source)       (transform)       (sink)
+```
+
+**Generated YAML:**
+```yaml
+sources:
+ syslog_in:
+ type: syslog
+ address: 0.0.0.0:514
+ mode: udp
+
+transforms:
+ parse_logs:
+ type: remap
+ inputs:
+ - syslog_in
+ source: |
+ .environment = "production"
+ .processed_at = now()
+ del(.source_type)
+
+sinks:
+ s3_output:
+ type: aws_s3
+ inputs:
+ - parse_logs
+ bucket: my-log-bucket
+ region: us-east-1
+ key_prefix: "logs/{{ .environment }}/%Y/%m/%d/"
+ encoding:
+ codec: json
+ auth:
+ access_key_id: "${VF_SECRET_AWS_ACCESS_KEY}"
+ secret_access_key: "${VF_SECRET_AWS_SECRET_KEY}"
+```
+
+---
+
+## Global configuration
+
+Pipelines can include global Vector configuration sections beyond sources, transforms, and sinks. These are set via the pipeline's `globalConfig` field and appear at the top level of the generated YAML.
+
+Common global config sections:
+
+- **`api`**: Enable the Vector API and GraphQL playground
+- **`enrichment_tables`**: Define lookup tables for enrichment
+
+The `log_level` key in `globalConfig` is handled specially -- it is **not** included in the generated YAML. Instead, it is passed to the Vector process as the `VECTOR_LOG` environment variable.
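+
+As an illustration, a `globalConfig` value enabling the Vector API with debug logging might look like this (the values are hypothetical; the `api` fields follow Vector's own configuration schema):
+
+```json
+{
+  "api": { "enabled": true, "address": "127.0.0.1:8686" },
+  "log_level": "debug"
+}
+```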
+
+---
+
+## Disabled nodes
+
+Nodes marked as **disabled** in the visual editor are excluded from the generated YAML entirely. Their edges are also removed. This lets you temporarily disable a component without deleting it from the graph.
+
+---
+
+## Secret references
+
+Secrets stored in VectorFlow can be referenced in pipeline component configurations. When you use a secret in a node's config, the reference is resolved at deploy time.
+
+### How it works
+
+1. Create a secret in the environment's secret store (e.g., name: `AWS_ACCESS_KEY`)
+2. Reference it in a component config field using the `SECRET[name]` syntax in the visual editor
+3. At deploy time, the server resolves all secret references
+4. The agent receives the resolved values in the config response and injects them as environment variables
+5. In the generated YAML, secrets appear as `${VF_SECRET_AWS_ACCESS_KEY}` -- standard Vector environment variable interpolation (all secrets use the `VF_SECRET_` prefix)
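+
+The reference rewrite can be sketched as a simple regex substitution (a hypothetical illustration, not VectorFlow's actual implementation):
+
+```python
+import re
+
+def rewrite_secret_refs(value: str) -> str:
+    """Rewrite SECRET[NAME] references to ${VF_SECRET_NAME} interpolations."""
+    return re.sub(r"SECRET\[([A-Za-z0-9_]+)\]", r"${VF_SECRET_\1}", value)
+
+print(rewrite_secret_refs("SECRET[AWS_ACCESS_KEY]"))  # ${VF_SECRET_AWS_ACCESS_KEY}
+```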
+
+### Example
+
+A sink configured with secret references:
+
+```yaml
+sinks:
+ elasticsearch:
+ type: elasticsearch
+ inputs:
+ - transform_logs
+ endpoints:
+ - "https://es.example.com:9200"
+ auth:
+ strategy: basic
+ user: "${VF_SECRET_ES_USER}"
+ password: "${VF_SECRET_ES_PASSWORD}"
+```
+
+The agent injects environment variables `VF_SECRET_ES_USER` and `VF_SECRET_ES_PASSWORD` with the decrypted values when starting the Vector process.
+
+---
+
+## Certificate references
+
+TLS certificates uploaded to VectorFlow are referenced using `CERT[name]` syntax. At deploy time:
+
+1. Certificate data is sent to the agent in the config response (base64-encoded)
+2. The agent writes the certificate files to its local `certs/` directory (e.g., `/var/lib/vf-agent/certs/`)
+3. The config YAML references the local file path
+
+### Example
+
+```yaml
+sinks:
+ kafka_out:
+ type: kafka
+ inputs:
+ - parse_logs
+ bootstrap_servers: "kafka.example.com:9093"
+ topic: logs
+ tls:
+ ca_file: "/var/lib/vf-agent/certs/ca.pem"
+ crt_file: "/var/lib/vf-agent/certs/client.crt"
+ key_file: "/var/lib/vf-agent/certs/client.key"
+```
+
+---
+
+## Validation
+
+Before any deployment, VectorFlow validates the generated YAML using the Vector binary:
+
+```bash
+vector validate --no-environment
+```
+
+The `--no-environment` flag skips environment variable validation (since secrets are resolved at runtime by the agent, not at validation time).
+
+### Validation results
+
+The validation returns:
+- **Valid**: The config is syntactically correct and all component types are recognized
+- **Errors**: Specific error messages, often with the affected component key identified
+- **Warnings**: Deprecation notices or non-fatal issues
+
+If validation fails, the deployment is blocked and errors are displayed in the UI.
+
+---
+
+## Metrics sidecar
+
+When the agent starts a pipeline, it automatically appends a **metrics sidecar config** as a second `--config` argument. This sidecar adds instrumentation without modifying the user's pipeline YAML:
+
+```yaml
+# Auto-generated by the VectorFlow agent
+api:
+ enabled: true
+ address: "127.0.0.1:"
+
+sources:
+ vf_internal_metrics:
+ type: internal_metrics
+ vf_host_metrics:
+ type: host_metrics
+
+sinks:
+ vf_metrics_exporter:
+ type: prometheus_exporter
+ inputs:
+ - vf_internal_metrics
+ - vf_host_metrics
+ address: "127.0.0.1:"
+```
+
+Vector merges both config files, so the pipeline's sources/transforms/sinks coexist with the metrics instrumentation. The `vf_` prefix prevents key collisions.
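+
+Conceptually, the agent launches Vector with both files, relying on Vector's support for repeated `--config` flags (paths illustrative):
+
+```sh
+vector --config /var/lib/vf-agent/pipelines/my-pipeline.yaml \
+       --config /var/lib/vf-agent/metrics-sidecar.yaml
+```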
+
+---
+
+## Version history
+
+Every deployment creates an immutable `PipelineVersion` record containing:
+
+| Field | Description |
+|-------|-------------|
+| `version` | Auto-incrementing integer (1, 2, 3, ...) |
+| `configYaml` | The exact YAML that was deployed |
+| `logLevel` | The Vector log level at deploy time |
+| `changelog` | User-provided deploy message |
+| `createdById` | Who triggered the deployment |
+| `createdAt` | Deployment timestamp |
+
+You can view previous versions in the pipeline detail page and roll back to any prior version. Rolling back creates a **new** version with the old config, preserving the full history.
+
+---
+
+## Importing existing configs
+
+VectorFlow supports importing existing Vector YAML or TOML configurations into the visual editor. The importer:
+
+1. Parses the config file
+2. Creates nodes for each source, transform, and sink
+3. Creates edges based on `inputs` fields
+4. Auto-positions nodes in a left-to-right layout
+
+This is useful for migrating existing Vector deployments to VectorFlow's managed model.
diff --git a/docs/public/screenshots/dashboard.png b/docs/public/screenshots/dashboard.png
new file mode 100644
index 0000000..719f09f
Binary files /dev/null and b/docs/public/screenshots/dashboard.png differ
diff --git a/docs/public/screenshots/environments.png b/docs/public/screenshots/environments.png
new file mode 100644
index 0000000..a0a89a4
Binary files /dev/null and b/docs/public/screenshots/environments.png differ
diff --git a/docs/public/screenshots/fleet.png b/docs/public/screenshots/fleet.png
new file mode 100644
index 0000000..4640fce
Binary files /dev/null and b/docs/public/screenshots/fleet.png differ
diff --git a/docs/public/screenshots/login.png b/docs/public/screenshots/login.png
new file mode 100644
index 0000000..a4e6e87
Binary files /dev/null and b/docs/public/screenshots/login.png differ
diff --git a/docs/public/screenshots/node-details.png b/docs/public/screenshots/node-details.png
new file mode 100644
index 0000000..b65297a
Binary files /dev/null and b/docs/public/screenshots/node-details.png differ
diff --git a/docs/public/screenshots/pipeline-editor.png b/docs/public/screenshots/pipeline-editor.png
new file mode 100644
index 0000000..59b6177
Binary files /dev/null and b/docs/public/screenshots/pipeline-editor.png differ
diff --git a/docs/public/screenshots/pipelines.png b/docs/public/screenshots/pipelines.png
new file mode 100644
index 0000000..aa73689
Binary files /dev/null and b/docs/public/screenshots/pipelines.png differ
diff --git a/docs/public/user-guide/alerts.md b/docs/public/user-guide/alerts.md
new file mode 100644
index 0000000..1f34451
--- /dev/null
+++ b/docs/public/user-guide/alerts.md
@@ -0,0 +1,164 @@
+# Alerts
+
+The **Alerts** page lets you configure rules that monitor your pipelines and nodes, receive notifications when something needs attention, and review a history of past alert events. Alerts are scoped to the currently selected environment.
+
+## Overview
+
+The Alerts page is organized into three sections:
+
+- **Alert Rules** -- Define the conditions that trigger alerts.
+- **Webhooks** -- Configure HTTP endpoints that receive notifications when alerts fire or resolve.
+- **Alert History** -- Browse a chronological log of all alert events.
+
+## Alert rules
+
+An alert rule defines a metric to watch, a condition to evaluate, and how long the condition must persist before the alert fires.
+
+### Creating an alert rule
+
+{% stepper %}
+{% step %}
+### Open the Alerts page
+Select an environment from the header, then navigate to **Alerts** in the sidebar.
+{% endstep %}
+{% step %}
+### Click Add Rule
+Click the **Add Rule** button in the Alert Rules section.
+{% endstep %}
+{% step %}
+### Configure the rule
+Fill in the rule form:
+
+- **Name** -- A descriptive label (e.g., "High CPU on prod nodes").
+- **Pipeline** (optional) -- Scope the rule to a specific pipeline, or leave as "All pipelines" for environment-wide monitoring.
+- **Metric** -- The metric to evaluate (see supported metrics below).
+- **Threshold** -- The numeric value that triggers the alert (not required for binary metrics).
+- **Duration** -- How many seconds the condition must persist before firing. Defaults to 60 seconds.
+{% endstep %}
+{% step %}
+### Save
+Click **Create Rule**. The rule is enabled by default and begins evaluating on the next agent heartbeat.
+{% endstep %}
+{% endstepper %}
+
+### Supported metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| **CPU Usage** | Percentage | CPU utilization derived from cumulative CPU seconds. |
+| **Memory Usage** | Percentage | Memory used as a percentage of total memory. |
+| **Disk Usage** | Percentage | Filesystem used as a percentage of total disk space. |
+| **Error Rate** | Percentage | Errors as a percentage of total events ingested. |
+| **Discarded Rate** | Percentage | Discarded events as a percentage of total events ingested. |
+| **Node Unreachable** | Binary | Fires when a node stops sending heartbeats. |
+| **Pipeline Crashed** | Binary | Fires when a pipeline enters the crashed state. |
+
+Percentage-based metrics use the conditions **>** (greater than), **<** (less than), or **=** (equals) against a threshold value. Binary metrics (Node Unreachable, Pipeline Crashed) fire automatically when the condition is detected -- no threshold is needed.
+
+### Condition evaluation
+
+Alert rules are evaluated during each agent heartbeat cycle. The evaluation logic works as follows:
+
+1. The metric value is read from the latest node data.
+2. If the value meets the condition (e.g., CPU > 80), a timer starts.
+3. If the condition persists for the configured **duration** (in seconds), the alert fires and an event is created.
+4. If the condition clears before the duration elapses, the timer resets.
+5. When a firing alert's condition clears, the alert automatically resolves.
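+
+The duration gating can be sketched as a small state machine (a simplification of the steps above; names are illustrative):
+
+```python
+class DurationGate:
+    """Fires only after a condition has held continuously for `duration_s`."""
+    def __init__(self, duration_s: int):
+        self.duration_s = duration_s
+        self.breach_started_at = None  # timestamp of the first breaching sample
+
+    def evaluate(self, condition_met: bool, now_s: float) -> bool:
+        if not condition_met:
+            self.breach_started_at = None  # condition cleared: reset the timer
+            return False
+        if self.breach_started_at is None:
+            self.breach_started_at = now_s
+        return now_s - self.breach_started_at >= self.duration_s
+
+gate = DurationGate(duration_s=60)
+print(gate.evaluate(True, now_s=0))    # False -- timer just started
+print(gate.evaluate(True, now_s=30))   # False -- only 30 s elapsed
+print(gate.evaluate(True, now_s=60))   # True  -- held for the full minute
+```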
+
+{% hint style="info" %}
+The duration setting prevents transient spikes from triggering alerts. A 60-second duration means the condition must hold for a full minute before an alert fires.
+{% endhint %}
+
+### Managing rules
+
+- **Enable / Disable** -- Toggle the switch in the rules table to enable or disable a rule without deleting it.
+- **Edit** -- Click the pencil icon to update the rule name, threshold, or duration.
+- **Delete** -- Click the trash icon to permanently remove the rule and stop future evaluations.
+
+## Webhooks
+
+Webhooks deliver alert notifications to external systems via HTTP POST requests. When an alert fires or resolves, VectorFlow sends a JSON payload to all enabled webhooks in the environment.
+
+### Adding a webhook
+
+{% stepper %}
+{% step %}
+### Click Add Webhook
+In the Webhooks section, click **Add Webhook**.
+{% endstep %}
+{% step %}
+### Configure the endpoint
+- **URL** -- The HTTPS endpoint that will receive alert payloads.
+- **Headers** (optional) -- A JSON object of custom headers to include with each request (e.g., `{"Authorization": "Bearer token"}`).
+- **HMAC Secret** (optional) -- If set, each request includes an `X-VectorFlow-Signature` header containing a SHA-256 HMAC of the request body. Use this to verify that payloads originate from VectorFlow.
+{% endstep %}
+{% step %}
+### Test the webhook
+After creating the webhook, click the **send** icon in the webhooks table to deliver a test payload. VectorFlow reports the HTTP status code so you can confirm your endpoint is reachable.
+{% endstep %}
+{% endstepper %}
+
+{% hint style="warning" %}
+Make sure to test your webhook endpoint after creating it. A misconfigured URL or authentication header will silently drop alert notifications.
+{% endhint %}
+
+### Webhook payload
+
+Each webhook delivery sends a JSON POST body with the following fields:
+
+```json
+{
+ "alertId": "evt_abc123",
+ "status": "firing",
+ "ruleName": "High CPU Usage",
+ "severity": "warning",
+ "environment": "Production",
+ "team": "Platform",
+ "node": "node-01.example.com",
+ "metric": "cpu_usage",
+ "value": 85.5,
+ "threshold": 80,
+ "message": "CPU usage is 85.50 (threshold: > 80)",
+ "timestamp": "2026-03-06T12:00:00.000Z",
+ "dashboardUrl": "https://vectorflow.example.com/alerts",
+ "content": "**Alert FIRING: High CPU Usage**\n> CPU usage is 85.50 ..."
+}
+```
+
+The `content` field contains a pre-formatted, human-readable summary suitable for chat platforms like Slack or Discord. Generic consumers can ignore it and use the structured fields instead.
+
+### Webhook security
+
+- **HMAC signing** -- When an HMAC secret is configured, VectorFlow computes an HMAC-SHA-256 digest of the raw JSON body and sends it as `sha256=<hex digest>` in the `X-VectorFlow-Signature` header. Verify this on your server to ensure payload authenticity.
+- **SSRF protection** -- VectorFlow validates that webhook URLs resolve to public IP addresses. Private and reserved IP ranges are blocked.
+- **Timeout** -- Webhook deliveries time out after 10 seconds.
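+
+A receiver can verify the signature with a constant-time comparison (a minimal sketch, assuming the `sha256=<hex digest>` header format):
+
+```python
+import hashlib
+import hmac
+
+def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
+    """Check an X-VectorFlow-Signature header against the raw request body."""
+    expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
+    # hmac.compare_digest avoids leaking the match position via timing
+    return hmac.compare_digest(expected, signature_header)
+
+body = b'{"alertId":"evt_abc123","status":"firing"}'
+sig = "sha256=" + hmac.new(b"my-secret", body, hashlib.sha256).hexdigest()
+print(verify_signature("my-secret", body, sig))  # True
+```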
+
+### Managing webhooks
+
+- **Enable / Disable** -- Toggle the switch to pause or resume deliveries without deleting the webhook.
+- **Edit** -- Click the pencil icon to update the URL, headers, or HMAC secret.
+- **Test** -- Click the send icon to deliver a test payload.
+- **Delete** -- Click the trash icon to permanently remove the webhook.
+
+## Alert history
+
+The **Alert History** section shows a chronological list of all alert events in the current environment. Each row displays:
+
+| Column | Description |
+|--------|-------------|
+| **Timestamp** | When the alert fired. |
+| **Rule Name** | The alert rule that triggered. |
+| **Node** | The node where the condition was detected. |
+| **Pipeline** | The pipeline associated with the rule (or "-" for environment-wide rules). |
+| **Status** | **Firing** (red) or **Resolved** (green). |
+| **Value** | The metric value at the time the alert was evaluated. |
+| **Message** | A human-readable summary of the condition. |
+
+Click **Load more** at the bottom of the table to fetch older events. Events are ordered newest-first.
+
+## Alert states
+
+An alert event transitions through two states:
+
+- **Firing** -- The rule's condition has been met for the required duration. The alert is active and webhook notifications have been sent.
+- **Resolved** -- The condition is no longer met. The alert closes automatically and a resolution notification is sent to all enabled webhooks.
diff --git a/docs/public/user-guide/dashboard.md b/docs/public/user-guide/dashboard.md
new file mode 100644
index 0000000..8833ec2
--- /dev/null
+++ b/docs/public/user-guide/dashboard.md
@@ -0,0 +1,82 @@
+# Dashboard
+
+The dashboard is the landing page you see after logging in. It gives you a high-level view of your observability pipeline health across the selected environment, with real-time metrics, fleet status, and interactive charts.
+
+
+
+## KPI summary cards
+
+The top of the dashboard displays five summary cards that provide an at-a-glance overview of your environment.
+
+| Card | What it shows |
+|------|--------------|
+| **Total Nodes** | The number of Vector agent nodes registered in the current environment. |
+| **Node Health** | A breakdown of node statuses -- **Healthy**, **Degraded**, or **Unreachable** -- so you can spot issues quickly. |
+| **Pipelines** | The total number of deployed (non-draft) pipelines in the environment. |
+| **Pipeline Status** | Counts of pipelines by runtime state: **Running**, **Stopped**, or **Crashed**. |
+| **Log Reduction** | The percentage of data volume reduced by your transforms (see below). |
+
+## Log reduction percentage
+
+The **Log Reduction** card shows how effectively your pipelines are filtering and transforming data before it reaches your sinks. The formula is:
+
+```
+reduction % = (1 - eventsOut / eventsIn) * 100
+```
+
+- A **higher percentage** means more data is being filtered, sampled, or deduplicated by your transforms before reaching downstream destinations.
+- The value is **clamped to 0% minimum** -- if your pipeline produces more events than it receives (e.g. through event splitting), the card shows 0% rather than a negative number.
+- The card also displays the raw events-per-second rates for both input and output so you can see absolute throughput.
+- Color coding provides quick visual feedback: green for reductions above 50%, amber for 10--50%, and neutral for lower values.
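+
+The clamped formula can be sketched as (function name illustrative):
+
+```python
+def log_reduction_pct(events_in: float, events_out: float) -> float:
+    """Percentage of events removed by transforms, clamped at 0%."""
+    if events_in <= 0:
+        return 0.0  # no traffic: nothing to report
+    return max(0.0, (1 - events_out / events_in) * 100)
+
+print(log_reduction_pct(1000, 250))   # 75.0
+print(log_reduction_pct(1000, 1200))  # 0.0 -- event splitting never shows negative
+```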
+
+{% hint style="info" %}
+Reduction metrics are calculated from the last hour of aggregated pipeline data. If no traffic has flowed recently, the card displays a dash.
+{% endhint %}
+
+## Metrics filter bar
+
+Below the summary cards, a filter bar lets you narrow down the charts displayed on the dashboard.
+
+- **Time range** -- Choose from **1 hour**, **6 hours**, **1 day**, or **7 days**. The selected window controls both the data shown in the charts and the automatic refresh interval.
+- **Pipeline filter** -- Select one or more pipelines to focus on. When no pipelines are selected, all deployed pipelines in the environment are shown.
+- **Node filter** -- Select specific agent nodes. Filtering by node automatically restricts the pipeline list to pipelines running on those nodes, and vice versa.
+- **Group by** -- Choose how chart series are broken down:
+ - **Pipeline** -- one series per pipeline (default)
+ - **Node** -- one series per agent node
+ - **Aggregate** -- a single "Total" series combining all pipelines and nodes
+
+## Pipeline metrics charts
+
+The **Pipeline Metrics** section includes three charts:
+
+- **Events In/Out per Second** -- Shows the rate of events entering your sources and leaving your sinks. Comparing the two lines reveals how much data your transforms are reducing.
+- **Bytes In/Out per Second** -- The same comparison in bytes, useful for understanding bandwidth and storage impact.
+- **Errors & Discarded** -- An area chart showing error rates and discarded event rates. A spike here may indicate a misconfigured transform or an unreachable sink.
+
+## System metrics charts
+
+The **System Metrics** section shows resource utilization for the agent nodes in your environment:
+
+- **CPU Usage** -- Percentage of CPU consumed by each Vector agent. Capped at 100%.
+- **Memory Usage** -- Percentage of available memory used by each agent.
+- **Disk I/O** -- Read and write throughput in bytes per second.
+- **Network I/O** -- Receive (Rx) and transmit (Tx) throughput in bytes per second.
+
+When **Group by** is set to **Aggregate**, CPU and memory are averaged across nodes while disk and network rates are summed.
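+
+That aggregation could be expressed as follows (a sketch; the per-node sample values are illustrative):
+
+```python
+def aggregate(nodes: list[dict]) -> dict:
+    """CPU/memory are averaged across nodes; disk and network rates are summed."""
+    n = len(nodes)
+    return {
+        "cpu_pct": sum(x["cpu_pct"] for x in nodes) / n,
+        "mem_pct": sum(x["mem_pct"] for x in nodes) / n,
+        "disk_bps": sum(x["disk_bps"] for x in nodes),
+        "net_bps": sum(x["net_bps"] for x in nodes),
+    }
+
+nodes = [
+    {"cpu_pct": 40, "mem_pct": 60, "disk_bps": 1_000, "net_bps": 2_000},
+    {"cpu_pct": 60, "mem_pct": 20, "disk_bps": 3_000, "net_bps": 4_000},
+]
+print(aggregate(nodes))
+# {'cpu_pct': 50.0, 'mem_pct': 40.0, 'disk_bps': 4000, 'net_bps': 6000}
+```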
+
+{% hint style="warning" %}
+System metrics require the VectorFlow agent to be reporting node-level telemetry. If a node is unreachable, its metrics will stop updating until connectivity is restored.
+{% endhint %}
+
+## Auto-refresh
+
+Dashboard data refreshes automatically based on the selected time range:
+
+| Time range | Refresh interval |
+|-----------|-----------------|
+| 1 hour | 15 seconds |
+| 6 hours | 60 seconds |
+| 1 day | 60 seconds |
+| 7 days | 5 minutes |
+
+Pipeline status cards also poll every 15 seconds regardless of the selected time range, so you will see status changes (Running, Stopped, Crashed) promptly.
diff --git a/docs/public/user-guide/environments.md b/docs/public/user-guide/environments.md
new file mode 100644
index 0000000..e32ed8b
--- /dev/null
+++ b/docs/public/user-guide/environments.md
@@ -0,0 +1,95 @@
+# Environments
+
+Environments are isolated deployment contexts that let you separate your pipelines, agents, and secrets across lifecycle stages such as **development**, **staging**, and **production**. Each environment maintains its own independent set of resources, so changes in one environment never affect another.
+
+
+
+## Why use environments?
+
+- **Isolation** -- Keep experimental pipeline changes out of production.
+- **Separate secrets** -- The same secret name (e.g., `API_KEY`) can hold different values in each environment.
+- **Independent fleets** -- Agent nodes enroll into a specific environment and only run pipelines assigned to that environment.
+- **Promotion workflow** -- Build and test a pipeline in dev, then promote it to staging and production when ready.
+
+## Environment selector
+
+The environment selector is the dropdown in the header bar. Switching it changes the global context for the entire application -- the pipeline list, fleet view, alerts, and all other pages update to show only resources belonging to the selected environment.
+
+{% hint style="info" %}
+When you switch environments, the pipeline list, fleet view, and alerts page update to show only resources for that environment. Your selection is persisted across sessions.
+{% endhint %}
+
+## Creating an environment
+
+{% stepper %}
+{% step %}
+### Open the Environments page
+Navigate to **Environments** in the sidebar and click **New Environment**.
+{% endstep %}
+{% step %}
+### Enter a name
+Give the environment a descriptive name (e.g., "Production", "Staging", "Dev"). Names can be up to 100 characters.
+{% endstep %}
+{% step %}
+### Create
+Click **Create Environment**. You are redirected to the environments list where your new environment appears.
+{% endstep %}
+{% endstepper %}
+
+## Environment detail page
+
+Click any environment name in the list to open its detail page. The detail view shows:
+
+- **Overview cards** -- At-a-glance counts for agent nodes and pipelines assigned to this environment.
+- **Vector Nodes table** -- All nodes registered in this environment with their name, host address, status, and last-seen timestamp. Click a node name to jump to its fleet detail page.
+- **Agent Enrollment** -- Generate or revoke the enrollment token that agents use to connect to this environment.
+- **Secret Backend** -- Configure how pipelines resolve secret references (see below).
+- **Secrets & Certificates** -- Manage the secrets and TLS certificates available to pipelines in this environment.
+
+You can edit the environment name or delete the environment from the detail page header.
+
+{% hint style="danger" %}
+Deleting an environment permanently removes all of its pipelines, nodes, and secrets. This action cannot be undone.
+{% endhint %}
+
+## Agent enrollment tokens
+
+Before an agent can connect to an environment, you must generate an **enrollment token** on the environment detail page. The token is displayed once -- copy it immediately and provide it to the agent at startup:
+
+```bash
+VF_URL=https://your-vectorflow-instance:3000
+VF_TOKEN=<enrollment-token>
+./vf-agent
+```
+
+You can regenerate or revoke the token at any time. Revoking a token prevents new agents from enrolling, but already-connected agents continue operating until their individual node tokens are revoked from the Fleet page.
+
+## Secret backends
+
+Each environment can use a different backend for resolving secret references in pipeline configurations:
+
+| Backend | Description |
+|---------|-------------|
+| **Built-in** | VectorFlow stores secrets internally and delivers them to agents as environment variables. This is the default. |
+| **HashiCorp Vault** | Secrets are fetched from a Vault instance. Configure the Vault address, auth method (token, AppRole, or Kubernetes), and mount path. |
+| **AWS Secrets Manager** | Secrets are resolved from AWS Secrets Manager at deploy time. |
+| **Exec** | A custom script on the agent host is executed to retrieve secrets. |
+
+## Secrets per environment
+
+Secrets are scoped to individual environments. The same secret name can hold different values in each environment. For example, you might have an `ELASTICSEARCH_API_KEY` secret that points to a test cluster in your dev environment and a production cluster in your production environment.
+
+Manage secrets from the **Secrets & Certificates** section on the environment detail page.
+
+## Pipeline promotion
+
+You can copy a pipeline from one environment to another using the **Promote to...** action on the Pipelines page. This is the recommended workflow for moving validated configurations through your lifecycle stages (e.g., dev to staging to production).
+
+{% hint style="warning" %}
+Secrets and certificates are stripped during promotion. After promoting a pipeline, configure the appropriate secrets in the target environment before deploying.
+{% endhint %}
+
+## Editing and deleting environments
+
+- **Edit** -- Click the **Edit** button on the environment detail page to rename the environment or change its secret backend configuration.
+- **Delete** -- Click the **Delete** button to permanently remove the environment. You must have the Admin role on the team to delete an environment.
diff --git a/docs/public/user-guide/fleet.md b/docs/public/user-guide/fleet.md
new file mode 100644
index 0000000..1c25d74
--- /dev/null
+++ b/docs/public/user-guide/fleet.md
@@ -0,0 +1,117 @@
+# Fleet Management
+
+The **Fleet** page gives you a centralized view of every agent node enrolled in the current environment. From here you can monitor node health, inspect system resources, view pipeline metrics, trigger agent updates, and stream live logs.
+
+
+
+## Node list
+
+All enrolled agent nodes are displayed in a table with the following columns:
+
+| Column | Description |
+|--------|-------------|
+| **Name** | The node name. Click it to open the node detail page. You can rename nodes from the detail view. |
+| **Host:Port** | The hostname or IP address and API port the agent is listening on. |
+| **Environment** | The environment the node is enrolled in. |
+| **Version** | The Vector version running on the node. |
+| **Agent Version** | The VectorFlow agent version, plus deployment mode (Docker or Binary). An **Update available** badge appears when a newer version exists. |
+| **Status** | Current health status (see statuses below). |
+| **Last Seen** | How recently the agent last communicated with the server. |
+
+If no agents have enrolled yet, the page shows a prompt directing you to generate an enrollment token in the environment settings.
+
+## Node health statuses
+
+Agent nodes report their health through periodic heartbeats. VectorFlow derives the following statuses:
+
+- **Online** -- The agent is sending heartbeats within the expected interval. The node is healthy and processing pipelines.
+- **Unreachable** -- The agent has missed heartbeats beyond the configured threshold (default: 3 missed intervals). This typically means the agent process has stopped, the host is down, or there is a network issue.
+
+The heartbeat threshold is calculated as `fleetPollIntervalMs * fleetUnhealthyThreshold`. With the default settings of a 15-second poll interval and a threshold of 3, a node is marked unreachable after approximately 45 seconds of silence.
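+
+The check can be sketched as (field and parameter names illustrative):
+
+```python
+def is_unreachable(last_seen_ms: int, now_ms: int,
+                   poll_interval_ms: int = 15_000,
+                   unhealthy_threshold: int = 3) -> bool:
+    """A node is unreachable once it has been silent for longer than
+    poll_interval * threshold (45 seconds with the defaults)."""
+    return (now_ms - last_seen_ms) > poll_interval_ms * unhealthy_threshold
+
+print(is_unreachable(last_seen_ms=0, now_ms=46_000))  # True
+print(is_unreachable(last_seen_ms=0, now_ms=30_000))  # False
+```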
+
+{% hint style="info" %}
+You can adjust the heartbeat interval and unhealthy threshold in the system settings.
+{% endhint %}
+
+## Node detail page
+
+Click a node name to open its detail page, which provides deep visibility into that specific agent.
+
+
+
+### Node details card
+
+A summary card shows key information at a glance:
+
+- **Status** -- Current health status
+- **Environment** -- Which environment the node belongs to
+- **Agent Version** -- The installed VectorFlow agent version
+- **Vector Version** -- The Vector binary version
+- **Last Heartbeat** -- Timestamp of the most recent heartbeat
+- **Enrolled** -- When the agent first enrolled
+- **Host / API Port** -- Network address details
+- **Last Seen / Created** -- Timestamps for tracking node lifecycle
+
+### System resources
+
+Charts display real-time and historical metrics for the node's host machine:
+
+- **CPU usage** -- Derived from cumulative CPU seconds
+- **Memory usage** -- Used vs. total memory
+- **Disk usage** -- Filesystem used vs. total bytes
+- **Load averages** -- 1, 5, and 15-minute load averages
+- **Network I/O** -- Bytes received and transmitted
+- **Disk I/O** -- Bytes read and written
+
+You can adjust the time window (up to 168 hours / 7 days) to view historical trends.
+
+### Pipeline metrics
+
+A table shows every pipeline deployed to the node along with live throughput data:
+
+| Column | Description |
+|--------|-------------|
+| **Pipeline** | Pipeline name |
+| **Status** | Running, Stopped, Starting, or Crashed |
+| **Events In / Out** | Total event counts with live per-second rates |
+| **Errors** | Total error count with live error rate (highlighted in red if non-zero) |
+| **Bytes In / Out** | Total bytes processed with live byte rates |
+| **Uptime** | How long the pipeline has been running on this node |
+
+### Logs
+
+A live log stream from the agent, with filtering options:
+
+- **Log level** -- Filter by severity (DEBUG, INFO, WARN, ERROR)
+- **Pipeline** -- Scope logs to a specific pipeline running on the node
+
+Logs are paginated and load on demand.
+
+## Agent updates
+
+When a newer agent version is available, an **Update available** badge appears in the node list. The update mechanism depends on the deployment mode:
+
+{% tabs %}
+{% tab title="Binary (Standalone)" %}
+Click the **Update** button in the node list to trigger a self-update. VectorFlow instructs the agent to download the new binary, verify its checksum, and restart. The node shows an **Update pending...** badge while the update is in progress.
+{% endtab %}
+{% tab title="Docker" %}
+Docker-based agents are updated by pulling the latest image. The **Update** button is disabled for Docker nodes -- update them by redeploying the container with the new image tag.
+{% endtab %}
+{% endtabs %}
+
+## Pipeline deployment matrix
+
+Below the node list, the **Pipeline Deployment Matrix** shows a grid of all deployed pipelines across all nodes in the environment. This lets you see at a glance which pipelines are running on which nodes and their current status.
+
+## Node management
+
+From the node detail page you can:
+
+- **Rename** -- Click the node name in the header to edit it inline.
+- **Revoke Token** -- Revokes the node's authentication token, preventing it from communicating with the server. The node is marked as unreachable.
+- **Delete Node** -- Permanently removes the node record from VectorFlow. This does not stop the agent process on the remote host.
+
+{% hint style="warning" %}
+Revoking a node token immediately prevents the agent from sending heartbeats or receiving pipeline updates. The agent process continues running on the host but operates in isolation until re-enrolled.
+{% endhint %}
diff --git a/docs/public/user-guide/pipeline-editor.md b/docs/public/user-guide/pipeline-editor.md
new file mode 100644
index 0000000..1998940
--- /dev/null
+++ b/docs/public/user-guide/pipeline-editor.md
@@ -0,0 +1,182 @@
+# Pipeline Editor
+
+The pipeline editor is a visual canvas where you design data pipelines by connecting sources, transforms, and sinks. It provides a drag-and-drop interface powered by a flow-graph layout, so you can build and modify pipelines without writing configuration files by hand.
+
+
+
+## Editor layout
+
+The editor is divided into four main areas:
+
+| Area | Location | Purpose |
+|------|----------|---------|
+| **Component Palette** | Left sidebar | Lists all available Vector component types, organized by kind and category. |
+| **Canvas** | Center | The main workspace where you arrange and connect nodes. |
+| **Detail Panel** | Right sidebar | Configuration form for the currently selected node. |
+| **Toolbar** | Top bar | Actions for saving, validating, deploying, and managing the pipeline. |
+
+## Component palette
+
+The left sidebar lists every available Vector component, grouped into three sections:
+
+- **Sources** -- Components that ingest data into the pipeline.
+- **Transforms** -- Components that process, filter, or reshape data in flight.
+- **Sinks** -- Components that send data to downstream destinations.
+
+Each section can be collapsed. When a section contains many components, they are further organized by category (e.g. "Cloud Platform", "Aggregating", "Messaging").
+
+Use the **search bar** at the top of the palette to filter components by name, type, description, or category.
+
+### Adding a component
+
+Drag a component from the palette and drop it onto the canvas. A new node appears at the drop position, pre-configured with sensible defaults. You can also right-click the canvas to paste previously copied nodes.
+
+## Canvas
+
+The canvas is where your pipeline takes visual shape. Each component is represented as a **node**, and data flow between components is represented as **edges** (connections).
+
+### Interacting with the canvas
+
+- **Pan** -- Click and drag on empty canvas space to move around.
+- **Zoom** -- Use the scroll wheel or the zoom controls in the bottom-left corner.
+- **Select a node** -- Click a node to select it. Its configuration appears in the detail panel on the right.
+- **Multi-select** -- Hold Shift and click additional nodes to select multiple components at once. The detail panel shows bulk actions (Copy All, Delete All).
+- **Reposition** -- Drag a node to move it to a new position on the canvas.
+- **Connect nodes** -- Drag from a node's output port (right side) to another node's input port (left side) to create a connection. The editor enforces data-type compatibility and prevents self-connections.
+- **Fit to view** -- The canvas automatically fits all nodes into view when first loaded. Use the zoom controls to reset the view.
+
+### Context menus
+
+- **Right-click a node** -- Opens a menu with Copy, Paste, Duplicate, and Delete actions.
+- **Right-click an edge** -- Opens a menu with a Delete connection action.
+
+## Node types
+
+### Sources
+
+Sources are where data **enters** the pipeline. They have an output port on the right side and are color-coded green. Examples include:
+
+- `syslog` -- Receive syslog messages over TCP or UDP
+- `file` -- Tail log files from disk
+- `kafka` -- Consume from Kafka topics
+- `http_server` -- Accept events over HTTP
+- `demo_logs` -- Generate synthetic log events for testing
+- `datadog_agent`, `splunk_hec` -- Receive data from vendor agents
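+
+On the canvas you only fill in a form, but each source node corresponds to an entry under `sources:` in the generated configuration. For example, a `syslog` node listening on TCP might render roughly as follows (field names per the Vector syslog source; the component key `syslog_in` is just an example):
+
+```yaml
+sources:
+  syslog_in:
+    type: syslog
+    address: "0.0.0.0:514"
+    mode: tcp
+```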
+
+### Transforms
+
+Transforms **process and modify** data as it flows through the pipeline. They have both an input port (left) and an output port (right), and are color-coded blue. Key types include:
+
+- `remap` -- Apply VRL (Vector Remap Language) expressions to reshape events
+- `filter` -- Drop events that do not match a VRL condition
+- `sample` -- Randomly sample a percentage of events
+- `route` -- Split events into multiple outputs based on VRL conditions
+- `dedupe` -- Remove duplicate events based on field values
+- `reduce` -- Aggregate multiple events into one
+- `log_to_metric` -- Convert log events into metric data
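+
+A `route` node illustrates why transforms can fan out: each named condition becomes a separate output port. A sketch of the generated configuration (the VRL conditions and the upstream key `my_source` are placeholders):
+
+```yaml
+transforms:
+  by_level:
+    type: route
+    inputs: ["my_source"]
+    route:
+      errors: '.level == "error"'
+      warnings: '.level == "warn"'
+```
+
+Downstream components can then consume `by_level.errors` or `by_level.warnings` as inputs; events matching no condition are emitted on the `by_level._unmatched` output.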
+
+### Sinks
+
+Sinks are where data **exits** the pipeline to downstream destinations. They have an input port on the left side and are color-coded orange. Examples include:
+
+- `elasticsearch` -- Send to Elasticsearch / OpenSearch
+- `aws_s3` -- Write to Amazon S3 buckets
+- `http` -- Forward over HTTP to any endpoint
+- `console` -- Print to standard output (useful for debugging)
+- `datadog_logs`, `splunk_hec_logs` -- Send to vendor platforms
+- `kafka` -- Produce to Kafka topics
+- `loki` -- Send to Grafana Loki
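+
+Like sources, each sink node maps to an entry under `sinks:`. An `elasticsearch` sink might generate roughly the following (the endpoint URL and the upstream key `my_transform` are placeholders; exact field names depend on your Vector version):
+
+```yaml
+sinks:
+  search:
+    type: elasticsearch
+    inputs: ["my_transform"]
+    endpoints: ["https://elasticsearch.example.com:9200"]
+    bulk:
+      index: "logs-%Y-%m-%d"
+```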
+
+### Live metrics on nodes
+
+When a pipeline is deployed, each node displays live throughput metrics directly on the canvas:
+
+- **Events per second** and **bytes per second** in a compact readout
+- A **status dot** indicating health (green for healthy, amber for degraded)
+- A **sparkline** showing recent throughput trends
+
+
+
+## Detail panel
+
+Click any node on the canvas to open its configuration in the detail panel on the right.
+
+The panel shows:
+
+- **Component name and kind** -- The display name, a badge indicating source/transform/sink, and a delete button.
+- **Component Key** -- A unique identifier for this component within the pipeline (e.g. `traefik_logs`). Must contain only letters, numbers, and underscores.
+- **Enabled toggle** -- Disable a component to exclude it from the generated configuration without removing it from the canvas.
+- **Type** -- The Vector component type (read-only).
+- **Configuration form** -- Auto-generated form fields based on the component's configuration schema. Required fields are marked, and each field has contextual help.
+
+### VRL editor for transforms
+
+For **remap**, **filter**, and **route** transforms, the detail panel includes an integrated VRL editor (powered by Monaco) instead of a plain text field. The VRL editor provides:
+
+- Syntax highlighting for VRL code
+- Autocomplete for VRL functions
+- An inline snippet drawer to insert common VRL patterns (see [VRL Snippets](vrl-snippets.md))
+- A fields panel showing the schema of upstream source events
+- A test runner that lets you execute your VRL code against sample JSON events and see the output
+
+## Toolbar
+
+The toolbar runs along the top of the editor and provides the following actions (left to right):
+
+| Action | Shortcut | Description |
+|--------|----------|-------------|
+| **Save** | `Cmd+S` | Save the current pipeline state. A dot indicator appears when there are unsaved changes. |
+| **Validate** | -- | Run server-side validation on the generated Vector configuration and report any errors. |
+| **Undo** | `Cmd+Z` | Undo the last change. |
+| **Redo** | `Cmd+Shift+Z` | Redo a previously undone change. |
+| **Delete** | `Delete` | Delete the currently selected node or edge. |
+| **Import** | `Cmd+I` | Import a Vector configuration file (YAML or TOML) to populate the canvas. |
+| **Export** | `Cmd+E` | Export the pipeline as a YAML or TOML file. |
+| **Save as Template** | -- | Save the current pipeline layout as a reusable template. |
+| **Version History** | -- | View all deployed versions, compare diffs, and roll back to a previous version. |
+| **Metrics** | -- | Toggle the metrics chart panel at the bottom of the editor. |
+| **Logs** | -- | Toggle the live logs panel. A red dot appears when recent errors have been detected. |
+| **Settings** | -- | Open pipeline-level settings (log level, global configuration). A blue dot indicates active global config. |
+
+### Deploy and undeploy
+
+The right side of the toolbar shows the current deployment state:
+
+- **Deploy** button -- Appears when the pipeline has never been deployed, or when changes have been made since the last deployment. Clicking Deploy first auto-saves, then opens the deploy dialog.
+- **Deployed** indicator -- A green checkmark shown when the deployed configuration matches the saved state.
+- **Undeploy** button -- Stops the pipeline on all agents and reverts it to draft status. You can redeploy at any time.
+- **Process status** -- A colored dot and label showing the runtime status: Running (green), Starting (yellow), Stopped (gray), or Crashed (red).
+
+## Pipeline logs panel
+
+Toggle the logs panel from the toolbar to view real-time logs from the running pipeline. The panel supports:
+
+- Filtering by log level (ERROR, WARN, INFO, DEBUG)
+- Filtering by agent node
+- Cursor-based pagination for scrolling through history
+
+{% hint style="info" %}
+The logs panel only shows data for deployed pipelines. Draft pipelines have no running processes to produce logs.
+{% endhint %}
+
+## Pipeline rename
+
+Click the pipeline name in the top-left corner of the editor to rename it inline. Press Enter to confirm or Escape to cancel.
+
+## Keyboard shortcuts
+
+| Shortcut | Action |
+|----------|--------|
+| `Cmd+S` | Save pipeline |
+| `Cmd+Z` | Undo |
+| `Cmd+Shift+Z` | Redo |
+| `Cmd+C` | Copy selected node(s) |
+| `Cmd+V` | Paste copied node(s) |
+| `Cmd+D` | Duplicate selected node |
+| `Cmd+E` | Export configuration |
+| `Cmd+I` | Import configuration |
+| `Delete` / `Backspace` | Delete selected node or edge |
+
+{% hint style="info" %}
+On Windows and Linux, use `Ctrl` instead of `Cmd` for all keyboard shortcuts.
+{% endhint %}
diff --git a/docs/public/user-guide/pipelines.md b/docs/public/user-guide/pipelines.md
new file mode 100644
index 0000000..b649c68
--- /dev/null
+++ b/docs/public/user-guide/pipelines.md
@@ -0,0 +1,74 @@
+# Pipelines
+
+The **Pipelines** page lists all pipelines in the currently selected environment. From here you can create, clone, promote, and delete pipelines, as well as monitor their live throughput at a glance.
+
+
+
+## Pipeline list
+
+Pipelines are displayed in a table with the following columns:
+
+| Column | Description |
+|--------|------------|
+| **Name** | The pipeline name. Click it to open the pipeline in the editor. |
+| **Status** | Current lifecycle state (see statuses below). |
+| **Events/sec In** | Live event ingestion rate polled from the agent fleet. |
+| **Bytes/sec In** | Live byte ingestion rate. |
+| **Reduction** | Percentage of events reduced by transforms, color-coded green (>50%), amber (>10%), or neutral otherwise. |
+| **Created** | Date and avatar of the user who created the pipeline. |
+| **Last Updated** | Date and avatar of the user who last modified the pipeline. |
+
+## Pipeline statuses
+
+A pipeline moves through several states during its lifecycle:
+
+- **Draft** -- The pipeline has been created but never deployed. It exists only as a saved configuration.
+- **Running** -- The pipeline is deployed and actively processing events on at least one agent node.
+- **Starting** -- The pipeline was recently deployed or restarted and agents are bringing it online.
+- **Stopped** -- The pipeline is deployed but all agent nodes have stopped processing it.
+- **Crashed** -- One or more agent nodes report that the pipeline has crashed. Check the pipeline logs for details.
+- **Pending deploy** -- Shown as an additional badge when the saved configuration differs from what is currently deployed. Deploy the pipeline to push the latest changes.
+
+## Creating a pipeline
+
+{% stepper %}
+{% step %}
+### Click New Pipeline
+Click the **New Pipeline** button in the top-right corner. This navigates you to a new, empty pipeline in the editor.
+{% endstep %}
+{% step %}
+### Name your pipeline
+Give the pipeline a descriptive name. Names must start with a letter or number and can contain letters, numbers, spaces, hyphens, and underscores (up to 100 characters).
+{% endstep %}
+{% step %}
+### Build and save
+Add sources, transforms, and sinks in the pipeline editor. Save your work -- the pipeline starts as a **Draft** until you deploy it.
+{% endstep %}
+{% endstepper %}
+
+## Pipeline actions
+
+Each pipeline row has an actions menu (the three-dot icon on the right) with the following options:
+
+- **Metrics** -- Opens the dedicated metrics page for the pipeline, showing detailed throughput and error charts.
+- **Clone** -- Creates a copy of the pipeline (with " (Copy)" appended to the name) in the same environment and opens it in the editor.
+- **Promote to...** -- Copies the pipeline to a different environment within the same team. This is useful for promoting a pipeline from development to staging or production. Secrets and certificates are stripped during promotion and must be re-configured in the target environment.
+- **Delete** -- Permanently deletes the pipeline and all of its versions.
+
+{% hint style="danger" %}
+Deleting a deployed pipeline will automatically **undeploy** it from all agents before deletion. This means running agents will stop processing the pipeline on their next configuration poll. This action cannot be undone.
+{% endhint %}
+
+## Versioning
+
+Every time you deploy a pipeline, a new **version** is created that captures the full configuration YAML and a changelog entry. Versions let you:
+
+- **View history** -- See a list of all previously deployed versions with timestamps and the user who deployed them.
+- **Compare changes** -- View diffs between any two versions to understand what changed.
+- **Rollback** -- Restore a previous version if a deployment causes issues.
+
+The pipeline list shows a **Pending deploy** badge when the saved configuration differs from the most recently deployed version, so you always know if there are undeployed changes.
+
+## Filtering by environment
+
+Pipelines are scoped to the currently selected **environment** (shown in the sidebar). Switch environments to view pipelines in a different environment. Each environment maintains its own independent set of pipelines, agent nodes, and secrets.
diff --git a/docs/public/user-guide/templates.md b/docs/public/user-guide/templates.md
new file mode 100644
index 0000000..163a7ce
--- /dev/null
+++ b/docs/public/user-guide/templates.md
@@ -0,0 +1,98 @@
+# Templates
+
+Templates are reusable pipeline blueprints that capture a complete pipeline graph -- sources, transforms, sinks, their configurations, and how they connect. Save common patterns as templates so your team can spin up new pipelines in seconds without rebuilding from scratch.
+
+## Template library
+
+The **Templates** page displays all templates available to your team as a card grid. Each card shows:
+
+- **Name** -- The template name.
+- **Category** -- A label such as Logging, Metrics, Archival, Streaming, or a custom category you define.
+- **Description** -- A short summary of what the template does.
+- **Node and edge count** -- How many pipeline components and connections the template contains.
+
+{% hint style="info" %}
+Templates are shared across all environments within a team. Any team member with the Editor role or above can create and use templates.
+{% endhint %}
+
+## Saving a pipeline as a template
+
+You can save any pipeline from the pipeline editor as a reusable template.
+
+{% stepper %}
+{% step %}
+### Open a pipeline
+Navigate to the pipeline editor for the pipeline you want to save as a template.
+{% endstep %}
+{% step %}
+### Click Save as Template
+In the editor toolbar, click the **Save as Template** button. A dialog opens.
+{% endstep %}
+{% step %}
+### Fill in the details
+- **Name** -- A descriptive name (e.g., "Kafka to Elasticsearch").
+- **Description** -- Explain what the template does and when to use it.
+- **Category** -- Choose an existing category or type a custom one (e.g., "Logging", "Metrics", "Security").
+{% endstep %}
+{% step %}
+### Save
+Click **Save Template**. The template now appears in the template library for all team members.
+{% endstep %}
+{% endstepper %}
+
+## Creating a pipeline from a template
+
+{% stepper %}
+{% step %}
+### Open the Templates page
+Navigate to **Templates** in the sidebar. Make sure you have an environment selected in the header.
+{% endstep %}
+{% step %}
+### Choose a template
+Browse the template cards and click **Use Template** on the one you want.
+{% endstep %}
+{% step %}
+### Customize
+VectorFlow creates a new pipeline in the current environment with the template's graph pre-loaded. The pipeline opens in the editor where you can rename it, adjust configurations, and wire in environment-specific settings like secrets.
+{% endstep %}
+{% endstepper %}
+
+## What templates include
+
+Templates save the complete pipeline graph:
+
+- **Nodes** -- Every source, transform, and sink component, including its component type, component key, and configuration.
+- **Edges** -- All connections between components, preserving the data flow topology.
+- **Layout** -- The X/Y positions of nodes on the canvas, so the visual layout is preserved.
+
+## What templates do not include
+
+Templates are designed to be portable across environments, so they intentionally exclude:
+
+- **Secrets** -- Secret values (API keys, passwords, tokens) are never stored in templates. You must configure secrets in the target environment after creating a pipeline from a template.
+- **Certificates** -- TLS certificates are environment-specific and are not included.
+- **Environment bindings** -- Templates are not tied to any specific environment. They can be used in any environment within the team.
+- **Deployment state** -- Pipelines created from templates start as drafts. You deploy them when ready.
+
+{% hint style="warning" %}
+After creating a pipeline from a template, review the configuration and add any required secrets or environment-specific values before deploying.
+{% endhint %}
+
+## Managing templates
+
+- **Delete** -- Click the trash icon on a template card to permanently remove it. This does not affect pipelines that were already created from the template.
+- **Update** -- Templates cannot be edited after creation. To update one, save the revised pipeline as a new template and delete the old one.
+
+## Template categories
+
+Templates are organized by category. Built-in categories include:
+
+| Category | Description |
+|----------|-------------|
+| **Getting Started** | Simple example pipelines for learning VectorFlow. |
+| **Logging** | Log collection, parsing, and routing patterns. |
+| **Archival** | Long-term storage and compliance pipelines. |
+| **Streaming** | Real-time event streaming configurations. |
+| **Metrics** | Metric collection and aggregation patterns. |
+
+You can also create custom categories by typing any name in the Category field when saving a template.
diff --git a/docs/public/user-guide/vrl-snippets.md b/docs/public/user-guide/vrl-snippets.md
new file mode 100644
index 0000000..48f8478
--- /dev/null
+++ b/docs/public/user-guide/vrl-snippets.md
@@ -0,0 +1,163 @@
+# VRL Snippets
+
+## What is VRL?
+
+**VRL (Vector Remap Language)** is a purpose-built expression language for transforming observability data. It is the primary way to reshape, filter, and enrich events inside Vector pipelines. VRL is used in `remap`, `filter`, and `route` transforms.
+
+{% hint style="info" %}
+VRL programs are compiled and executed natively inside the Vector process, so transforms run with minimal overhead -- typically microseconds per event. For the full language reference, see the [official VRL documentation](https://vector.dev/docs/reference/vrl/).
+{% endhint %}
+
+## Snippet library
+
+VectorFlow ships with a built-in library of reusable VRL snippets that cover common transformation patterns. You can also create your own **custom snippets** that are shared with your team.
+
+Snippets are accessible from the VRL editor inside the pipeline editor. When you are editing a `remap`, `filter`, or `route` transform, a snippet drawer is available that lets you search, browse, and insert snippets directly into your code.
+
+### Built-in categories
+
+The built-in snippet library is organized into the following categories:
+
+| Category | Examples |
+|----------|---------|
+| **Parsing** | `parse_json`, `parse_syslog`, `parse_csv`, `parse_key_value`, `parse_regex`, `parse_grok`, `parse_apache_log`, `parse_nginx_log` |
+| **Filtering** | `del(.field)`, keep fields, `if/else` conditions, `abort` (drop event), `assert`, `compact` |
+| **Enrichment** | Set fields, rename fields, merge objects, add tags, set timestamp, `uuid_v4()` |
+| **Type Coercion** | `to_int`, `to_float`, `to_bool`, `to_string`, `to_timestamp` |
+| **Encoding** | `encode_json`, `encode_logfmt`, `encode_base64`, `decode_base64` |
+| **String** | `downcase`, `upcase`, `strip_whitespace`, `replace`, `contains`, `starts_with`, `split`, `join` |
+| **Timestamp** | `now()`, `format_timestamp`, `parse_timestamp`, `to_unix_timestamp` |
+| **Networking** | `ip_cidr_contains`, `parse_url`, `ip_to_ipv6`, `community_id` |
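+
+Most parsing snippets follow the same shape: parse a raw field, then merge the structured result into the event root. For example, a JSON variant of this idiom (assuming the raw payload lives in `.message`):
+
+```coffee
+# The `!` variants abort the event if .message is not a string
+# or does not contain a valid JSON object.
+. = merge!(., object!(parse_json!(string!(.message))))
+```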
+
+### Inserting a snippet
+
+{% stepper %}
+{% step %}
+### Open the snippet drawer
+In the VRL editor, click the **snippets** toggle to open the snippet drawer below the code editor.
+{% endstep %}
+{% step %}
+### Browse or search
+Scroll through the categories or type in the search box to filter snippets by name, description, or code content.
+{% endstep %}
+{% step %}
+### Click to insert
+Click any snippet to insert its VRL code at the current cursor position in the editor. Adjust placeholder values (like field names) to match your data.
+{% endstep %}
+{% endstepper %}
+
+## Creating custom snippets
+
+Your team can create custom snippets that appear alongside the built-in library, tagged with a **Custom** badge.
+
+{% stepper %}
+{% step %}
+### Open the snippet form
+Click the **+** button in the snippet drawer header to open the creation form.
+{% endstep %}
+{% step %}
+### Fill in the details
+- **Name** -- A short, descriptive name (required, up to 100 characters).
+- **Description** -- An optional explanation of what the snippet does (up to 500 characters).
+- **Category** -- Choose from the built-in categories or select "Custom".
+- **Code** -- The VRL code to insert (required).
+{% endstep %}
+{% step %}
+### Save
+Click **Create** to save the snippet. It will immediately appear in the drawer for all team members.
+{% endstep %}
+{% endstepper %}
+
+Custom snippets can be **edited** or **deleted** by hovering over them in the drawer and clicking the pencil or trash icon. Only team members with Editor access or higher can create, edit, or delete custom snippets.
+
+## Testing VRL
+
+The VRL editor includes a built-in **test runner** that lets you execute your VRL code against sample data without deploying the pipeline.
+
+{% stepper %}
+{% step %}
+### Write your VRL code
+Enter your transform logic in the VRL editor.
+{% endstep %}
+{% step %}
+### Provide a sample event
+Enter a JSON object in the **input** panel. If you leave it blank, a default test event is used:
+```json
+{
+ "message": "test event",
+ "timestamp": "2026-01-01T00:00:00Z",
+ "host": "localhost"
+}
+```
+{% endstep %}
+{% step %}
+### Run the test
+Click the **Run** button. VectorFlow executes your VRL program against the input using the `vector vrl` CLI and displays the transformed output or any error messages.
+{% endstep %}
+{% endstepper %}
+
+{% hint style="warning" %}
+VRL testing requires the `vector` binary to be installed on the VectorFlow server. If it is not available, the test runner will show an error with installation instructions.
+{% endhint %}
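+
+To reproduce the same check outside VectorFlow, you can invoke the VRL CLI directly (flag names per recent Vector releases -- run `vector vrl --help` to confirm against your installed version):
+
+```sh
+echo '{"message": "test event", "host": "localhost"}' > event.json
+vector vrl --input event.json --program transform.vrl
+```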
+
+### Live event sampling
+
+For deployed pipelines, you can also sample **live events** flowing through upstream sources and use them as test input. The VRL editor provides:
+
+- A **Sample** button that requests recent events from the running pipeline
+- Navigation controls to step through multiple sampled events
+- A **Fields** panel that shows the inferred schema of upstream source events, so you know which fields are available for transformation
+
+## Example snippets
+
+Here are a few commonly used VRL patterns:
+
+### Parse syslog messages
+
+```coffee
+. = merge!(., parse_syslog!(.message))
+```
+
+Parses a syslog-formatted `.message` field and merges the structured fields (timestamp, hostname, severity, etc.) into the top-level event.
+
+### Extract and rename fields
+
+```coffee
+.host = del(.hostname)
+.severity = del(.level)
+.service = "my-app"
+```
+
+Renames fields by moving values to new keys and sets a static field.
+
+### Redact sensitive data
+
+```coffee
+.message = redact(.message, filters: [
+  r'\d{3}-\d{2}-\d{4}',                                   # US Social Security number
+  r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'   # email address
+])
+```
+
+Replaces patterns matching US Social Security numbers and email addresses with `[REDACTED]`.
+
+### Drop debug-level events
+
+```coffee
+if .level == "debug" {
+ abort
+}
+```
+
+Used in a `remap` transform with `drop_on_abort` enabled, this drops all events where the level is "debug", reducing volume before data reaches your sinks.
+
+### Conditional enrichment
+
+```coffee
+if starts_with(to_string!(.path), "/api") {
+ .is_api_request = true
+ .team = "backend"
+} else {
+ .is_api_request = false
+ .team = "frontend"
+}
+```
+
+Adds metadata fields based on the request path, which can be used for routing or filtering downstream.