🧪 AI Lab Playground

Multi-Model LLM Arena & Research Workbench

A professional-grade AI research workbench for testing, comparing, and benchmarking LLMs via OpenRouter.

Battle models head-to-head, run speed contests, orchestrate multi-agent debates, visualize token probabilities, and build your own ELO leaderboard — all from a sleek, mobile-first interface.

Features · Quick Start · Architecture · Arena · Authors

📊 Key Metrics

Metric	Value
API route groups	Chat, Models, Runs, Experiments, Agents, Memory, Multi-Agent
Streaming	Real-time streaming with TTFB & tok/s tracking
Arena modes	Battle, Debate, Speed Race, ELO Leaderboard
Orchestration modes	Round-Robin, Coordinator+Specialists, Critic/Refiner
Stored data	Runs, Experiments, Results, Agents, Memory, Models Cache
Pages	Playground, Arena, Experiments, Agents, Models, History
Build	Optimized production build
Design	Mobile-first responsive design

✨ Features

🏟️ Arena — Multi-Model Comparison

Battle Mode — Same prompt sent to 2+ models simultaneously, side-by-side streaming, blind voting with ELO rating updates
Debate Mode — Two models argue a topic across configurable rounds, each seeing and countering the other's arguments
Speed Race — Real-time animated progress bars, TTFB/tok/s/total time comparison, podium finish (🥇🥈🥉)
ELO Leaderboard — Persistent rankings with win/loss/draw stats, win rate tracking, full battle history
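
The rating update behind blind voting can be sketched with the standard Elo formula. This is an illustration only: the K-factor of 32, the 1500 starting rating, and the function names are assumptions, not values taken from this project.

```typescript
// Standard Elo update. K = 32 is an assumed constant, not this project's setting.
type Outcome = "a" | "b" | "draw";

// Probability that model A beats model B under the Elo model.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

function updateElo(
  ratingA: number,
  ratingB: number,
  outcome: Outcome,
  k: number = 32
): [number, number] {
  const scoreA = outcome === "a" ? 1 : outcome === "b" ? 0 : 0.5;
  const delta = k * (scoreA - expectedScore(ratingA, ratingB));
  // Zero-sum: whatever A gains, B loses.
  return [ratingA + delta, ratingB - delta];
}

const [newA, newB] = updateElo(1500, 1500, "a");
console.log(newA, newB); // 1516 1484
```

A draw between equally rated models leaves both ratings unchanged, which is why draws are tracked separately in the win/loss/draw stats.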

💬 Playground Chat

Real-time SSE streaming with live Markdown rendering
Full GFM support: code blocks with copy button, tables, lists, blockquotes
Model selector with 300+ models, search, favorites
Temperature, Top P, Max Tokens, Frequency/Presence Penalty sliders
Run ID tracking, TTFB, tokens/sec metrics
JSON export per run
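
As a rough sketch of how TTFB and tokens/sec might be derived from stream timestamps: the field names are hypothetical, and measuring throughput over the generation window (after the first token) rather than the whole request is an assumption.

```typescript
// Hypothetical timing record for one streamed run (all times in milliseconds).
interface StreamTimings {
  requestStartMs: number;   // when the request was sent
  firstTokenMs: number;     // when the first token arrived
  endMs: number;            // when the stream finished
  completionTokens: number; // number of generated tokens
}

function computeMetrics(t: StreamTimings) {
  const ttfbMs = t.firstTokenMs - t.requestStartMs;
  const generationMs = t.endMs - t.firstTokenMs;
  // Guard against division by zero when everything arrives in one chunk.
  const tokensPerSec =
    generationMs > 0 ? (t.completionTokens * 1000) / generationMs : 0;
  return { ttfbMs, totalMs: t.endMs - t.requestStartMs, tokensPerSec };
}

console.log(
  computeMetrics({ requestStartMs: 0, firstTokenMs: 250, endMs: 2250, completionTokens: 100 })
); // { ttfbMs: 250, totalMs: 2250, tokensPerSec: 50 }
```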

🔬 Logprob Heatmap

Color-coded token probability visualization (green → red)
Floating tooltip with probability bar, logprob value, and top-5 alternative tokens
Summary statistics: average, min, max probability with visual bars
Touch-friendly for mobile
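
One way to turn logprobs into heatmap colors and summary statistics is sketched below. The linear red-to-green hue mapping and the helper names are assumptions for illustration; the actual component may scale colors differently.

```typescript
// A logprob is a natural-log probability, so exp() recovers the probability.
function logprobToProbability(logprob: number): number {
  return Math.exp(logprob);
}

// Map probability 0..1 onto an HSL hue from 0 (red) to 120 (green).
function probabilityToColor(p: number): string {
  const clamped = Math.min(1, Math.max(0, p));
  const hue = Math.round(clamped * 120);
  return `hsl(${hue}, 70%, 45%)`;
}

// Average, minimum, and maximum token probability for a completion.
function summarize(logprobs: number[]) {
  if (logprobs.length === 0) return { avg: 0, min: 0, max: 0 };
  const probs = logprobs.map(logprobToProbability);
  const avg = probs.reduce((sum, p) => sum + p, 0) / probs.length;
  return { avg, min: Math.min(...probs), max: Math.max(...probs) };
}

console.log(probabilityToColor(logprobToProbability(0))); // hsl(120, 70%, 45%)
```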

🧪 Experiment Runner

Multi-model × multi-temperature parameter grid
Streamed execution with live results
Comparative results view with metrics per configuration
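
The parameter grid is a Cartesian product of models and temperatures. A minimal sketch, with hypothetical model IDs and type names:

```typescript
interface RunConfig {
  model: string;
  temperature: number;
}

// Every model is paired with every temperature, in a stable order.
function buildGrid(models: string[], temperatures: number[]): RunConfig[] {
  const grid: RunConfig[] = [];
  for (const model of models) {
    for (const temperature of temperatures) {
      grid.push({ model, temperature });
    }
  }
  return grid;
}

// "vendor/model-a" and "vendor/model-b" are placeholder IDs.
const grid = buildGrid(["vendor/model-a", "vendor/model-b"], [0, 0.7, 1]);
console.log(grid.length); // 6
```

Two models and three temperatures yield six configurations, so grid size grows multiplicatively with each added axis.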

🤖 Multi-Agent Sandbox

Create agents with name, role, model, system prompt, memory policy
Three orchestration modes with real-time streaming
Memory system: facts, summaries, pin/purge
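
Of the three orchestration modes, round-robin is the simplest to sketch: agents take turns in a fixed order, each seeing the transcript so far. The types and the synchronous `speak` callback below are assumptions; in the real app each turn would be an asynchronous, streamed model call.

```typescript
interface Agent {
  name: string;
  role: string;
}

// Stand-in for a model call: receives the agent and the transcript so far.
type Speak = (agent: Agent, transcript: string[]) => string;

function runRoundRobin(agents: Agent[], rounds: number, speak: Speak): string[] {
  const transcript: string[] = [];
  for (let round = 0; round < rounds; round++) {
    for (const agent of agents) {
      // Pass a copy so a callback cannot mutate the shared history.
      const reply = speak(agent, transcript.slice());
      transcript.push(`${agent.name}: ${reply}`);
    }
  }
  return transcript;
}

const demoAgents: Agent[] = [
  { name: "Planner", role: "coordinator" },
  { name: "Critic", role: "reviewer" },
];
const lines = runRoundRobin(demoAgents, 2, (agent, history) => `turn ${history.length + 1}`);
console.log(lines);
// [ 'Planner: turn 1', 'Critic: turn 2', 'Planner: turn 3', 'Critic: turn 4' ]
```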

📦 Model Browser

📜 Run History

🚀 Quick Start

Prerequisites

Installation

# Clone the repository
git clone https://github.com/simonpierreboucher02/agent-lab.git
cd agent-lab

# Install dependencies
npm install

# Configure environment
echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env
echo "PORT=3001" >> .env

# Start development servers
npm run dev

Service	URL
Frontend (Vite dev server)	`http://localhost:5173`
Backend (Express API)	`http://localhost:3001`
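
For illustration, the two `.env` values above could be validated at server startup along these lines. `loadConfig` and `ServerConfig` are hypothetical names, and falling back to port 3001 when `PORT` is unset is an assumption.

```typescript
interface ServerConfig {
  apiKey: string;
  port: number;
}

// Accepts any env-like object; on the server this would be process.env.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) {
    throw new Error("OPENROUTER_API_KEY is not set");
  }
  const port = Number(env.PORT ?? "3001");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return { apiKey, port };
}

const config = loadConfig({ OPENROUTER_API_KEY: "sk-or-v1-your-key-here", PORT: "3001" });
console.log(config.port); // 3001
```

Failing fast on a missing key gives a clearer error than a 401 from the upstream API on the first chat request.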

Production Build

npm run build

🏗️ Architecture

ai-lab-playground/
├── .env                              # OpenRouter API key (server-side only)
├── package.json                      # Monorepo workspaces
├── packages/
│   ├── server/                       # Backend — Node + TypeScript + Express
│   │   └── src/
│   │       ├── index.ts              # Server entry (port 3001)
│   │       ├── db/                   # SQLite with WAL mode
│   │       ├── services/             # OpenRouter HTTPS streaming
│   │       ├── routes/
│   │       │   ├── chat.ts           # POST /v1/chat/stream (SSE)
│   │       │   ├── models.ts         # GET /v1/models (cached proxy)
│   │       │   ├── runs.ts           # GET/DELETE /v1/runs
│   │       │   ├── experiments.ts    # CRUD + SSE run
│   │       │   ├── agents.ts         # CRUD agents
│   │       │   ├── memory.ts         # CRUD + purge
│   │       │   └── multiagent.ts     # POST /v1/multiagent/run (SSE)
│   │       └── types/
│   └── web/                          # Frontend — React + Vite + Tailwind
│       └── src/
│           ├── components/
│           │   ├── arena/            # BattleMode, DebateMode, SpeedMode, ELO
│           │   ├── ChatMessage.tsx    # Markdown-rendered chat bubbles
│           │   ├── MarkdownRenderer.tsx
│           │   ├── LogprobHeatmap.tsx # Token probability visualization
│           │   ├── ModelSelector.tsx
│           │   ├── ParamsPanel.tsx
│           │   └── MetricsBar.tsx
│           ├── pages/                # Playground, Arena, Experiments, Agents, Models, History
│           ├── stores/               # Zustand (appStore + arenaStore)
│           ├── hooks/                # useStream (SSE client)
│           └── lib/                  # API client
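
Because the chat endpoint is a POST, the browser's `EventSource` (GET only) cannot be used, so an SSE client such as the `useStream` hook has to parse the response body itself. The parser below is a simplified sketch under that assumption, not the hook's actual code: it handles only `\n` line endings and `data:` fields.

```typescript
// Incremental SSE parser: feed it decoded text chunks, get back complete data payloads.
class SseParser {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const payloads: string[] = [];
    let sep = this.buffer.indexOf("\n\n");
    // A blank line terminates each event; anything after it stays buffered.
    while (sep !== -1) {
      const rawEvent = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:")) // skips ":" comment lines
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data) payloads.push(data);
      sep = this.buffer.indexOf("\n\n");
    }
    return payloads;
  }
}

const parser = new SseParser();
console.log(parser.push('data: {"token":"Hel"}\n\ndata: {"tok')); // [ '{"token":"Hel"}' ]
console.log(parser.push('en":"lo"}\n\n'));                        // [ '{"token":"lo"}' ]
```

Buffering matters because network chunks do not align with event boundaries; the second event above arrives split across two reads.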

Tech Stack

Layer	Technology
Frontend	React 18
Build	Vite 5
Styling	TailwindCSS 3
State	Zustand 4
Data	TanStack Query 5
Markdown	react-markdown + remark-gfm
Charts	Recharts
Icons	Lucide React
Backend	Express 4
Runtime	Node.js
Database	SQLite (better-sqlite3)
LLM API	OpenRouter
Language	TypeScript 5

🔌 API Endpoints

Method	Endpoint	Description
`GET`	`/v1/models`	List OpenRouter models (1h cache)
`POST`	`/v1/chat/stream`	SSE streaming chat completion
`GET`	`/v1/runs`	List run history
`GET`	`/v1/runs/:id`	Get run detail
`POST`	`/v1/experiments`	Create experiment
`POST`	`/v1/experiments/:id/run`	Run experiment (SSE)
`GET/POST/DELETE`	`/v1/agents`	CRUD agents
`GET/POST/DELETE`	`/v1/memory`	CRUD memory items
`POST`	`/v1/multiagent/run`	Multi-agent orchestration (SSE)
`GET`	`/health`	Health check
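
The one-hour cache on `/v1/models` could work roughly like the single-value TTL cache below. This is a sketch under assumptions: the class name is hypothetical, the loader is synchronous for brevity (a real one would await the upstream request), and the clock is injectable so expiry can be tested.

```typescript
class TtlCache<T> {
  private value: T | undefined = undefined;
  private expiresAt = 0;
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while fresh; otherwise calls load() and stores the result.
  get(load: () => T): T {
    if (this.value !== undefined && this.now() < this.expiresAt) {
      return this.value;
    }
    const fresh = load();
    this.value = fresh;
    this.expiresAt = this.now() + this.ttlMs;
    return fresh;
  }
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const modelsCache = new TtlCache<string[]>(ONE_HOUR_MS);
// "vendor/model-a" is a placeholder for the upstream model list.
console.log(modelsCache.get(() => ["vendor/model-a"])); // [ 'vendor/model-a' ]
```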

🔒 Security

OpenRouter API key is never exposed to the frontend
.env is in .gitignore
All API calls proxied through the backend
CORS configured for development
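
The proxy pattern described above can be sketched as follows: the backend attaches the key and forwards only known fields, so the browser never sees the credential. The function and type names are hypothetical; the URL is OpenRouter's chat completions endpoint.

```typescript
interface ClientChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildUpstreamRequest(req: ClientChatRequest, apiKey: string): UpstreamRequest {
  // Copy only known fields so the client cannot smuggle extra options upstream.
  const payload = {
    model: req.model,
    messages: req.messages,
    temperature: req.temperature,
    stream: true,
  };
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`, // added server-side only
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

const upstream = buildUpstreamRequest(
  { model: "vendor/model-a", messages: [{ role: "user", content: "Hello" }] },
  "sk-or-v1-your-key-here"
);
console.log(upstream.url); // https://openrouter.ai/api/v1/chat/completions
```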

📱 Responsive Design

Breakpoint	Layout
Mobile (< 768px)	Bottom nav, drawer panels, touch-optimized
Tablet (768px+)	Side panels, expanded controls
Desktop (1024px+)	Full 3-column layout with persistent panels
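
The breakpoints in the table map to layouts as in this small sketch. The function name is hypothetical; in practice TailwindCSS utility prefixes (`md:` at 768px, `lg:` at 1024px) would likely do this work in CSS rather than in code.

```typescript
type Layout = "mobile" | "tablet" | "desktop";

// Thresholds follow the table: below 768px is mobile, 1024px and up is desktop.
function layoutForWidth(widthPx: number): Layout {
  if (widthPx >= 1024) return "desktop";
  if (widthPx >= 768) return "tablet";
  return "mobile";
}

console.log(layoutForWidth(390), layoutForWidth(768), layoutForWidth(1440));
// mobile tablet desktop
```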

👥 Authors

Author	Role
🧑‍🔬 Simon-Pierre Boucher	Creator & Lead Developer
🤖 Claude Opus 4.6	Co-Author & AI Engineer

📄 License

This project is licensed under the MIT License.

www.spboucher.ai · spbou4@protonmail.com

Built with precision and care, 2026
