Two zero-human AI companies. Same product. Different strategies. One winner.
Gladiator pits two autonomous AI companies (Blitz, growth-first, vs Craft, quality-first) against each other to maximize GitHub stars on identical starter repos. It is powered by Paperclip orchestration and Hermes Agent workers, and a live dashboard visualizes the battle in real time.
Built for the Nous Research Hermes Agent Hackathon (March 2026).
- Landing page: explains the product (llm-judge), the rules and the two rival companies
- Live dashboard: 9-section narrative: scoreboard, task boards, code comparison, Gantt chart, audit trail, learning evidence, merge controls
- Competition: 9 agents complete 10 tasks in ~5-6 minutes, creating skills, growing memory and writing code
- Winner announcement: auto-detects completion, declares winner by projected GitHub stars
- Merge: companies unite, skills transfer across teams, proving Hermes learning is real
| Component | Tech | Port |
|---|---|---|
| Paperclip | Node.js + PostgreSQL 16 | 3100 |
| Hermes Agent | Python CLI + Anthropic API | - |
| Dashboard | FastAPI + SSE + vanilla JS | 4000 |
| Evidence DB | SQLite (WAL mode) | - |
| Watcher | Python daemon | - |
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | For Hermes Agent + Gladiator dashboard |
| Node.js | 20+ | For Paperclip |
| pnpm | 9+ | Paperclip uses pnpm workspaces |
| PostgreSQL | 16+ | Paperclip's data store |
| Anthropic API key | - | Claude Sonnet/Haiku access |
| ~$5-6 USD | - | Per 10-minute competition run (9 Sonnet agents) |
# Official installer
curl -fsSL https://hermes.nousresearch.com/install.sh | bash
# Verify
hermes --version
# Expected: Hermes Agent v0.2.0+
# Configure Anthropic provider
cat > ~/.hermes/.env << 'EOF'
ANTHROPIC_API_KEY=your-key-here
EOF
# Test
hermes chat -q "Say hello in 5 words" -Q --provider anthropic -m claude-haiku-4-5-20251001
# Clone Paperclip
cd ~
git clone https://github.com/paperclipai/paperclip.git
cd paperclip
# Install dependencies (includes hermes-paperclip-adapter@0.1.1)
pnpm install
PostgreSQL setup:
# Create database and user
sudo -u postgres psql -c "CREATE USER paperclip WITH PASSWORD 'paperclip';"
sudo -u postgres psql -c "CREATE DATABASE paperclip OWNER paperclip;"
# On WSL2, if PostgreSQL isn't running:
sudo service postgresql start
Apply the Hermes adapter (if not already in your Paperclip version):
The hermes-paperclip-adapter@0.1.1 npm package provides the integration. Paperclip needs three things:
- Add `"hermes_local"` to `AGENT_ADAPTER_TYPES` in `packages/shared/src/constants.ts`
- Add `"hermes-paperclip-adapter": "0.1.1"` to `server/package.json` dependencies
- Import and register the adapter in `server/src/adapters/registry.ts`:
// Add imports
import {
  execute as hermesExecute,
  testEnvironment as hermesTestEnvironment,
  sessionCodec as hermesSessionCodec,
} from "hermes-paperclip-adapter/server";
import {
  agentConfigurationDoc as hermesAgentConfigurationDoc,
  models as hermesModels,
} from "hermes-paperclip-adapter";
// Add adapter definition
const hermesLocalAdapter: ServerAdapterModule = {
  type: "hermes_local",
  execute: hermesExecute,
  testEnvironment: hermesTestEnvironment,
  sessionCodec: hermesSessionCodec,
  models: hermesModels,
  supportsLocalAgentJwt: true,
  agentConfigurationDoc: hermesAgentConfigurationDoc,
};
// Add to adaptersByType map
Then run `pnpm install` again to fetch the adapter package.
Known adapter patches (may already be fixed in newer versions):
The adapter v0.1.1 had two bugs we patched locally:
- Env variable unwrapping: Paperclip wraps env vars as `{"type":"plain","value":"..."}` objects. The adapter's `execute.js` needs to unwrap the `.value` property:
// In node_modules/hermes-paperclip-adapter/dist/server/execute.js
// Find the env variable assignment loop and ensure it handles both formats:
if (typeof v === "string") {
  env[k] = v;
} else if (v && typeof v === "object" && typeof v.value === "string") {
  env[k] = v.value;
}
- Missing Anthropic provider: Add `"anthropic"` to `VALID_PROVIDERS` in `node_modules/hermes-paperclip-adapter/dist/shared/constants.js`:
export const VALID_PROVIDERS = [ "auto", "anthropic", "openrouter", "nous", ... ];
cd ~/python_projects # or wherever you prefer
git clone https://github.com/runtimenoteslabs/gladiator.git
cd gladiator
# Create Python virtual environment
python3 -m venv base-product/.venv
source base-product/.venv/bin/activate
pip install fastapi uvicorn httpx rich sse-starlette
# Configure API key
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
Start Paperclip first:
cd ~/paperclip
DATABASE_URL="postgres://paperclip:paperclip@localhost:5432/paperclip" pnpm dev &
sleep 10
curl -s http://localhost:3100/api/health # Should return {"status":"ok"}
Then run the setup script:
cd ~/python_projects/gladiator
./base-product/.venv/bin/python scripts/setup_companies.py
This creates:
- Blitz Corp: 4 agents (CEO, Engineer, CMO, Content)
- Craft Labs: 5 agents (CEO, CTO, Engineer 1, Engineer 2, Docs)
- Isolated `~/.hermes/gladiator/{agent-id}/` homes with SOUL.md personalities
- `gladiator_config.json` with all company/agent UUIDs
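The fixed `sleep 10` before the health check is only a guess at Paperclip's startup time. A minimal sketch of polling instead is shown here, assuming only what this README states: that `http://localhost:3100/api/health` returns `{"status":"ok"}` once Paperclip is up. The helper names `fetch_health` and `wait_for_health` are hypothetical and are not part of the Gladiator repo.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:3100/api/health"  # endpoint from the curl check above


def fetch_health(url: str = HEALTH_URL, timeout: float = 2.0) -> dict:
    """Fetch and decode the health endpoint's JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Return True as soon as the service reports status "ok", else False."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not listening yet, or the body was not valid JSON; retry
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health()` with no arguments polls the real endpoint for up to roughly 30 seconds. The `fetch` parameter exists so the retry loop can be exercised without a running Paperclip instance.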
cd ~/python_projects/gladiator
./base-product/.venv/bin/python -m uvicorn dashboard.server:app --host 0.0.0.0 --port 4000
Open http://localhost:4000/landing in your browser.
Click LAUNCH DEMO on the landing page. Everything else is automated:
- Evidence watcher auto-starts
- Git repos auto-initialize
- 9 agents wake up after a 5-second delay
- 10-minute timer begins
- Winner announced when all tasks complete (or the timer expires)
- Click MERGE after the winner is announced to demonstrate cross-company skill transfer
# Terminal 1: Paperclip
cd ~/paperclip
DATABASE_URL="postgres://paperclip:paperclip@localhost:5432/paperclip" pnpm dev
# Terminal 2: Dashboard
cd ~/python_projects/gladiator
./base-product/.venv/bin/python -m uvicorn dashboard.server:app --host 0.0.0.0 --port 4000
# Browser: http://localhost:4000/landing → Click LAUNCH DEMO

| Time | What's happening |
|---|---|
| 0:00 | Reset wipes all state, restarts watcher, inits fresh git repos |
| 0:05 | 9 agents wake up and start working on assigned tasks |
| 1:00–5:00 | Tasks complete, skills get written, memory grows, code gets committed |
| ~5:30 | All 10 tasks done, winner announced, agents paused automatically |
| +1 min | Click MERGE: companies unite, skill transfer task runs |
Each run costs roughly $5-6 on the Anthropic API (9 Sonnet agents doing tool-heavy work). Budget accordingly.
| URL | Description |
|---|---|
| `/landing` | Pre-demo landing page with LAUNCH button |
| `/` | Main dashboard (9 sections + merge controls) |
| `/comparison` | Head-to-head company comparison + agent details |
| `/intel` | System checks, strategic insights, heartbeat history |
gladiator/
├── base-product/ # Canonical llm-judge starter code
│ └── src/llm_judge/ # judge.py, cli.py, display.py
├── company_a/ # Blitz
│ ├── agents/ # SOUL.md personalities (source of truth)
│ │ ├── ceo/SOUL.md
│ │ ├── engineer/SOUL.md
│ │ ├── cmo/SOUL.md
│ │ └── content/SOUL.md
│ └── repo/ # Agent-modified llm-judge (gitignored, created at runtime)
├── company_b/ # Craft
│ ├── agents/ # SOUL.md personalities
│ │ ├── ceo/SOUL.md
│ │ ├── cto/SOUL.md
│ │ ├── engineer_1/SOUL.md
│ │ ├── engineer_2/SOUL.md
│ │ └── docs/SOUL.md
│ └── repo/ # Agent-modified llm-judge (gitignored, created at runtime)
├── dashboard/
│ ├── server.py # FastAPI + SSE backend (1300+ lines)
│ └── static/ # HTML/JS/CSS frontend
├── traces/
│ ├── db.py # SQLite schema (5 tables)
│ ├── collector.py # Evidence collection (skills, memory, heartbeats)
│ ├── analyzer.py # Learning report generation
│ └── watcher.py # Paperclip heartbeat poller
├── scripts/
│ ├── setup_companies.py # Create companies + agents in Paperclip
│ ├── merge_companies.py # Post-competition merger
│ ├── launch.sh # Start all services
│ └── stop.sh # Stop all services
├── .env.example # API key template
└── gladiator_config.example.json # Config template
| File | Purpose |
|---|---|
| `.env` | Your Anthropic API key |
| `gladiator_config.json` | Paperclip company/agent UUIDs |
| `evidence.db` | SQLite learning evidence |
| `merge_report.json` | Post-merge skill inventory |
| `~/.hermes/gladiator/` | Agent homes (skills, memory, sessions) |
Both companies start with identical copies of llm-judge, a CLI tool that compares LLM responses side-by-side. Each company's agents work autonomously to improve the product and maximize projected GitHub stars.
Star formula: tasks_done × 8 + unique_skills × 5 + skill_versions × 3
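As a worked example, the star formula can be written as a small Python function. This is a sketch for illustration: the `CompanyStats` type and the sample numbers are hypothetical, not output from a real run, and the dashboard's actual scoring code may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyStats:
    name: str
    tasks_done: int
    unique_skills: int
    skill_versions: int


def projected_stars(stats: CompanyStats) -> int:
    """Apply the formula: tasks_done x 8 + unique_skills x 5 + skill_versions x 3."""
    return stats.tasks_done * 8 + stats.unique_skills * 5 + stats.skill_versions * 3


# Illustrative numbers only, not results from a real competition.
blitz = CompanyStats("Blitz", tasks_done=5, unique_skills=3, skill_versions=4)
craft = CompanyStats("Craft", tasks_done=5, unique_skills=4, skill_versions=6)

for company in (blitz, craft):
    print(company.name, projected_stars(company))
# Blitz: 5*8 + 3*5 + 4*3 = 67
# Craft: 5*8 + 4*5 + 6*3 = 78
```

With equal task counts, the difference comes entirely from skills: each new skill is worth 5 stars and each additional skill version 3.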
| Feature | How It's Used | Evidence |
|---|---|---|
| Skills | Agents create reusable SKILL.md files after complex tasks | skill_snapshots table, version diffs |
| Memory | Agents save strategies and learnings to MEMORY.md | memory_snapshots table, char growth |
| Sessions | Session IDs chain across heartbeats | heartbeat_metrics.session_id |
| Skill Usage | Agents reference and apply their learned skills | skill_usage_events table |
| Cross-Agent Learning | Post-merge: agents use skills from rival team | learning_milestones type=cross_agent |
- Companies define budget and organizational structure
- Agents are autonomous workers with roles, models and heartbeat intervals
- Issues are tasks assigned to agents (checkout → work → mark done)
- Heartbeats trigger agent execution at configured intervals
- The hermes_local adapter spawns `hermes chat -q "prompt" -Q` as a subprocess
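The subprocess call in the last item can be sketched in Python as follows. The real adapter is a Node.js package and may pass extra arguments or environment variables. This sketch uses only the flags that appear in this README (`-q`, `-Q`, `--provider`, `-m`), and the function names `build_hermes_command` and `run_hermes` are hypothetical.

```python
import shlex
import subprocess
from typing import Optional


def build_hermes_command(
    prompt: str,
    provider: Optional[str] = None,
    model: Optional[str] = None,
) -> list[str]:
    """Build the argv list for a one-shot `hermes chat` call."""
    cmd = ["hermes", "chat", "-q", prompt, "-Q"]
    if provider:
        cmd += ["--provider", provider]
    if model:
        cmd += ["-m", model]
    return cmd


def run_hermes(prompt: str, timeout: float = 300.0, **kwargs: Optional[str]) -> str:
    """Run the command and return stdout; raises CalledProcessError on non-zero exit."""
    result = subprocess.run(
        build_hermes_command(prompt, **kwargs),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


cmd = build_hermes_command(
    "Say hello in 5 words",
    provider="anthropic",
    model="claude-haiku-4-5-20251001",
)
print(shlex.join(cmd))  # shell-quoted form, for logging only
```

Passing the prompt as a single list element, rather than building a shell string, keeps quotes and newlines in the prompt from being reinterpreted by a shell.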
Hermes has built-in Anthropic prompt caching (system_and_3 strategy, 5-minute TTL). We also trim unused bundled skills and reduce tool definitions for non-code agents to cut input tokens.
# Check PostgreSQL is running
pg_isready -h localhost -p 5432
# Check port isn't in use
lsof -i :3100
# WSL2: start PostgreSQL manually
sudo service postgresql start
- Check agent status in Paperclip UI (http://localhost:3100)
- Check `~/.hermes/gladiator/{agent}/logs/errors.log` for API errors
- "Invalid API response after 3 retries" = Anthropic API rate limit or empty response (usually transient)
- Verify `.env` has a valid `ANTHROPIC_API_KEY`
- Evidence watcher must be running (auto-starts on LAUNCH DEMO)
- Check `/tmp/watcher.log` for errors
- Verify `evidence.db` exists and has data
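One way to carry out the last check is to list every table in the SQLite file with its row count. The sketch below makes no assumption about the schema beyond it being a SQLite database. The function name `table_row_counts` is hypothetical and is not part of the Gladiator repo.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the SQLite file."""
    if not Path(db_path).is_file():
        raise FileNotFoundError(f"{db_path} does not exist")
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        # Names come straight from sqlite_master, so double-quoting them is enough.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Running `table_row_counts("evidence.db")` from the project root after a competition would be expected to show the evidence tables named in this README (for example `skill_snapshots` and `memory_snapshots`) with non-zero counts. An empty result or all-zero counts suggests the watcher never wrote anything.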
- The `"NaN" bigint` error is cosmetic. Paperclip can't parse Hermes token output. Tasks still complete fine.
- Timer auto-stops at 600 seconds even if agents stall
- If the dashboard was restarted mid-competition, in-memory state is lost. Use LAUNCH DEMO to restart.
- Espionage mechanic. Agents browse each other's public GitHub and adapt strategy based on what the competitor shipped.
- Spectator voting. Dashboard lets visitors vote on which strategy they think will win, shown as a live poll.
- ClipMart export. Package both company configs as Paperclip ClipMart templates so anyone can import and run their own Gladiator match.
Built by RuntimeNotes Labs for the Nous Research Hermes Agent Hackathon.
- Hermes Agent by Nous Research. Autonomous AI agent with persistent memory, skills and session continuity
- Paperclip by Paperclip AI. AI company orchestration platform
- Claude Sonnet by Anthropic. Powers all 9 agents
This project is licensed under the MIT License.

