A professional-grade AI research workbench for testing, comparing, and benchmarking LLMs via OpenRouter.
Battle models head-to-head, run speed contests, orchestrate multi-agent debates, visualize token probabilities, and build your own ELO leaderboard — all from a sleek, mobile-first interface.
- Battle Mode — Same prompt sent to 2+ models simultaneously, side-by-side streaming, blind voting with ELO rating updates
- Debate Mode — Two models argue a topic across configurable rounds, each seeing and countering the other's arguments
- Speed Race — Real-time animated progress bars; TTFB, tokens/sec, and total-time comparison; podium finish (🥇🥈🥉)
- ELO Leaderboard — Persistent rankings with win/loss/draw stats, win rate tracking, full battle history
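The rating updates behind blind voting and the leaderboard can be sketched with the standard Elo formula. This is a minimal sketch, not the project's implementation: the helper names `expectedScore` and `updateElo`, the K-factor of 32, and the 1500 starting rating are all assumptions.

```typescript
// Sketch of a standard Elo update. Assumptions: K = 32, helper names are hypothetical.
type Outcome = "a" | "b" | "draw";

// Expected score of A against B on the usual base-10, 400-point logistic scale.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Returns the new [ratingA, ratingB] after one battle.
function updateElo(ratingA: number, ratingB: number, outcome: Outcome, k = 32): [number, number] {
  const expectedA = expectedScore(ratingA, ratingB);
  // Actual score for A: 1 for a win, 0.5 for a draw, 0 for a loss.
  const scoreA = outcome === "a" ? 1 : outcome === "draw" ? 0.5 : 0;
  const delta = k * (scoreA - expectedA);
  // B gains exactly what A loses, so the rating pool is conserved.
  return [ratingA + delta, ratingB - delta];
}

// Usage: two new models start at an assumed 1500; model A wins the blind vote.
const [newA, newB] = updateElo(1500, 1500, "a");
console.log(newA, newB); // 1516 1484
```

Because the expected score depends on the rating gap, an upset win against a much higher-rated model moves both ratings more than an expected win does.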
- Real-time SSE streaming with live Markdown rendering
- Full GFM support: code blocks with copy button, tables, lists, blockquotes
- Model selector with 300+ models, search, favorites
- Temperature, Top P, Max Tokens, Frequency/Presence Penalty sliders
- Run ID tracking, TTFB, tokens/sec metrics
- JSON export per run
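The TTFB and tokens/sec figures can be derived from three timestamps per run. This sketch uses hypothetical names (`RunTiming`, `computeMetrics`) and assumes throughput is measured from the first token to the end of the stream; the project may instead divide by total time.

```typescript
// Hypothetical timing record for one streamed run (all timestamps in milliseconds).
interface RunTiming {
  startedAt: number; // request sent
  firstTokenAt: number; // first content chunk received
  finishedAt: number; // stream closed
  completionTokens: number;
}

interface RunMetrics {
  ttfbMs: number;
  totalMs: number;
  tokensPerSec: number;
}

function computeMetrics(t: RunTiming): RunMetrics {
  const ttfbMs = t.firstTokenAt - t.startedAt;
  const totalMs = t.finishedAt - t.startedAt;
  // Assumption: throughput covers only the generation phase (after the first token).
  const genMs = t.finishedAt - t.firstTokenAt;
  const tokensPerSec = genMs > 0 ? t.completionTokens / (genMs / 1000) : 0;
  return { ttfbMs, totalMs, tokensPerSec };
}

// Usage: 100 tokens generated over 2 s after a 400 ms wait for the first token.
const m = computeMetrics({ startedAt: 0, firstTokenAt: 400, finishedAt: 2400, completionTokens: 100 });
console.log(m); // { ttfbMs: 400, totalMs: 2400, tokensPerSec: 50 }
```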
- Color-coded token probability visualization (green → red)
- Floating tooltip with probability bar, logprob value, and top-5 alternative tokens
- Summary statistics: average, min, max probability with visual bars
- Touch-friendly for mobile
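The green-to-red coloring starts from converting each logprob back to a probability with p = e^logprob (OpenAI-compatible APIs report natural-log probabilities). The linear hue mapping, the `hsl` values, and the helper names below are illustrative assumptions, not the `LogprobHeatmap` component's actual code.

```typescript
// Hypothetical token record with a natural-log probability.
interface TokenLogprob {
  token: string;
  logprob: number;
}

// p = e^logprob, clamped to [0, 1] to absorb floating-point noise.
function toProbability(logprob: number): number {
  return Math.min(1, Math.max(0, Math.exp(logprob)));
}

// Assumed mapping: hue 120 (green) at p = 1, linearly down to hue 0 (red) at p = 0.
function probabilityColor(p: number): string {
  const hue = Math.round(120 * p);
  return `hsl(${hue}, 70%, 45%)`;
}

interface Summary {
  avg: number;
  min: number;
  max: number;
}

// Average, minimum, and maximum probability across the response.
function summarize(tokens: TokenLogprob[]): Summary {
  if (tokens.length === 0) return { avg: 0, min: 0, max: 0 };
  const ps = tokens.map((t) => toProbability(t.logprob));
  const sum = ps.reduce((acc, p) => acc + p, 0);
  return { avg: sum / ps.length, min: Math.min(...ps), max: Math.max(...ps) };
}

const sample: TokenLogprob[] = [
  { token: "The", logprob: 0 },
  { token: " cat", logprob: Math.log(0.5) },
];
console.log(sample.map((t) => probabilityColor(toProbability(t.logprob))));
// [ 'hsl(120, 70%, 45%)', 'hsl(60, 70%, 45%)' ]
```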
- Multi-model × multi-temperature parameter grid
- Streamed execution with live results
- Comparative results view with metrics per configuration
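A multi-model × multi-temperature grid is a Cartesian product of the two lists. In this sketch the `GridConfig` shape, the `buildGrid` name, and the model IDs are placeholders; a real experiment would likely carry the other sampling parameters as well.

```typescript
// Hypothetical shape of one grid cell; the project's experiment schema may differ.
interface GridConfig {
  model: string;
  temperature: number;
}

// Cartesian product: every model is paired with every temperature.
function buildGrid(models: string[], temperatures: number[]): GridConfig[] {
  const grid: GridConfig[] = [];
  for (const model of models) {
    for (const temperature of temperatures) {
      grid.push({ model, temperature });
    }
  }
  return grid;
}

// Placeholder model IDs: 2 models × 3 temperatures.
const grid = buildGrid(["vendor-a/model-x", "vendor-b/model-y"], [0, 0.7, 1.2]);
console.log(grid.length); // 6
```

Each cell can then be streamed as its own run, so results fill in as individual configurations finish.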
- Create agents with name, role, model, system prompt, memory policy
- Three orchestration modes with real-time streaming
- Memory system: facts, summaries, pin/purge
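Pinning and purging can be modeled as a flag plus a filter. This in-memory `MemoryStore` is a hypothetical stand-in for the SQLite-backed memory routes; in particular, the rule that purge keeps pinned items is an assumption about the intended semantics.

```typescript
// Hypothetical memory entry; "fact" and "summary" mirror the two kinds listed above.
interface MemoryItem {
  id: number;
  agentId: string;
  kind: "fact" | "summary";
  content: string;
  pinned: boolean;
}

class MemoryStore {
  private items: MemoryItem[] = [];
  private nextId = 1;

  add(agentId: string, kind: MemoryItem["kind"], content: string): MemoryItem {
    const item: MemoryItem = { id: this.nextId++, agentId, kind, content, pinned: false };
    this.items.push(item);
    return item;
  }

  pin(id: number, pinned = true): void {
    const item = this.items.find((m) => m.id === id);
    if (item) item.pinned = pinned;
  }

  // Assumed semantics: remove the agent's unpinned items, keep pinned ones; returns the count removed.
  purge(agentId: string): number {
    const before = this.items.length;
    this.items = this.items.filter((m) => m.agentId !== agentId || m.pinned);
    return before - this.items.length;
  }

  forAgent(agentId: string): MemoryItem[] {
    return this.items.filter((m) => m.agentId === agentId);
  }
}

// Usage: pin a fact, then purge; only the unpinned summary is removed.
const store = new MemoryStore();
const fact = store.add("critic", "fact", "User prefers concise answers");
store.add("critic", "summary", "Round 1 recap");
store.pin(fact.id);
console.log(store.purge("critic")); // 1
```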
```bash
# Clone the repository
git clone https://github.com/simonpierreboucher02/agent-lab.git
cd agent-lab

# Install dependencies
npm install

# Configure environment
echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env
echo "PORT=3001" >> .env

# Start development servers
npm run dev
```

```bash
# Build for production
npm run build
```

```
ai-lab-playground/
├── .env # OpenRouter API key (server-side only)
├── package.json # Monorepo workspaces
├── packages/
│   ├── server/ # Backend — Node + TypeScript + Express
│   │   └── src/
│   │       ├── index.ts # Server entry (port 3001)
│   │       ├── db/ # SQLite with WAL mode
│   │       ├── services/ # OpenRouter HTTPS streaming
│   │       ├── routes/
│   │       │   ├── chat.ts # POST /v1/chat/stream (SSE)
│   │       │   ├── models.ts # GET /v1/models (cached proxy)
│   │       │   ├── runs.ts # GET/DELETE /v1/runs
│   │       │   ├── experiments.ts # CRUD + SSE run
│   │       │   ├── agents.ts # CRUD agents
│   │       │   ├── memory.ts # CRUD + purge
│   │       │   └── multiagent.ts # POST /v1/multiagent/run (SSE)
│   │       └── types/
│   └── web/ # Frontend — React + Vite + Tailwind
│       └── src/
│           ├── components/
│           │   ├── arena/ # BattleMode, DebateMode, SpeedMode, ELO
│           │   ├── ChatMessage.tsx # Markdown-rendered chat bubbles
│           │   ├── MarkdownRenderer.tsx
│           │   ├── LogprobHeatmap.tsx # Token probability visualization
│           │   ├── ModelSelector.tsx
│           │   ├── ParamsPanel.tsx
│           │   └── MetricsBar.tsx
│           ├── pages/ # Playground, Arena, Experiments, Agents, Models, History
│           ├── stores/ # Zustand (appStore + arenaStore)
│           ├── hooks/ # useStream (SSE client)
│           └── lib/ # API client
```
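Streaming in this stack relies on Server-Sent Events (the `useStream` hook is described as an SSE client). Below is a minimal incremental parser for the OpenAI-compatible stream format that OpenRouter emits (`data:` lines, `:` comment lines, a final `data: [DONE]`). The class name is hypothetical, and the project's own `/v1/chat/stream` events may use a different payload shape.

```typescript
// Minimal incremental SSE parser for OpenAI-compatible chat streams (hypothetical class name).
// Assumption: each event carries a single `data:` line; multi-line data fields are not handled.
type ChatChunk = { choices?: { delta?: { content?: string } }[] };

class SseDeltaParser {
  private buffer = "";
  done = false;

  // Feed one raw network chunk; returns the content deltas completed by it.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const deltas: string[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      // Take one complete line and tolerate CRLF line endings.
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      newline = this.buffer.indexOf("\n");

      // Skip blank event separators, ":" comment lines (keep-alives), and non-data fields.
      if (!line.startsWith("data:")) continue;
      // Per the SSE format, one leading space after the colon is not part of the value.
      const data = line.slice(5).replace(/^ /, "");
      if (data === "[DONE]") {
        this.done = true;
        continue;
      }
      const parsed = JSON.parse(data) as ChatChunk;
      const content = parsed.choices?.[0]?.delta?.content;
      if (content) deltas.push(content);
    }
    return deltas;
  }
}

// Usage: a JSON payload split across two network chunks is reassembled correctly.
const parser = new SseDeltaParser();
const pieces = [
  ...parser.push('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"choi'),
  ...parser.push('ces":[{"delta":{"content":"lo"}}]}\n\n: keep-alive\n\ndata: [DONE]\n\n'),
];
console.log(pieces.join(""), parser.done); // Hello true
```

Buffering until a newline arrives is what makes the parser safe against network chunks that cut a JSON payload in half.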
- OpenRouter API key is never exposed to the frontend
- `.env` is listed in `.gitignore`
- All API calls proxied through the backend
- CORS configured for development
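Keeping the key server-side means the backend, not the browser, attaches the `Authorization` header. This hypothetical `buildUpstreamRequest` helper shows the request shape for OpenRouter's chat completions endpoint; the caller is assumed to read the key from `OPENROUTER_API_KEY`, and forcing `stream: true` is an assumption about the streaming route.

```typescript
// Hypothetical helper: builds the request the backend sends to OpenRouter.
// The key is supplied by server code (assumed to read OPENROUTER_API_KEY) and never reaches the browser.
interface UpstreamMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatBody {
  model: string;
  messages: UpstreamMessage[];
  temperature?: number;
}

interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const OPENROUTER_CHAT_URL = "https://openrouter.ai/api/v1/chat/completions";

function buildUpstreamRequest(body: ChatBody, apiKey: string): UpstreamRequest {
  // Fail fast on the server instead of forwarding an unauthenticated request.
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is not set");
  return {
    url: OPENROUTER_CHAT_URL,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Assumption: the streaming route always requests a streamed completion.
    body: JSON.stringify({ ...body, stream: true }),
  };
}
```

The server could then call `fetch(upstream.url, { method: upstream.method, headers: upstream.headers, body: upstream.body })` and relay the streamed response to the client, so only the backend ever holds the key.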
| Breakpoint | Layout |
|---|---|
| Mobile (< 768px) | Bottom nav, drawer panels, touch-optimized |
| Tablet (768px+) | Side panels, expanded controls |
| Desktop (1024px+) | Full 3-column layout with persistent panels |
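The breakpoints in the table coincide with Tailwind's default `md` (768px) and `lg` (1024px) screens. A hypothetical `layoutFor` helper makes the thresholds explicit as a plain function; in the app itself, layout switching most likely lives in Tailwind responsive classes rather than in code like this.

```typescript
// Hypothetical helper mirroring the breakpoint table; widths in CSS pixels.
type Layout = "mobile" | "tablet" | "desktop";

function layoutFor(widthPx: number): Layout {
  if (widthPx >= 1024) return "desktop"; // full 3-column layout
  if (widthPx >= 768) return "tablet"; // side panels, expanded controls
  return "mobile"; // bottom nav, drawer panels
}

console.log([375, 800, 1440].map((w) => layoutFor(w))); // [ 'mobile', 'tablet', 'desktop' ]
```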
| | Author | Role |
|---|---|---|
| 🧑‍🔬 | Simon-Pierre Boucher | Creator & Lead Developer |
| 🤖 | Claude Opus 4.6 | Co-Author & AI Engineer |
This project is licensed under the MIT License.