42 changes: 36 additions & 6 deletions apps/docs/docs.json
@@ -185,6 +185,17 @@
],
"tab": "Developer Platform"
},
{
"icon": "book-open",
"anchors": [
{
"anchor": "API Reference",
"icon": "unplug",
"openapi": "https://api.supermemory.ai/v3/openapi"
}
],
"tab": "API Reference"
},
{
"icon": "plug",
"anchors": [
@@ -234,15 +245,35 @@
"tab": "SDKs"
},
{
"icon": "book-open",
"icon": "flask-conical",
"anchors": [
{
"anchor": "API Reference",
"icon": "unplug",
"openapi": "https://api.supermemory.ai/v3/openapi"
"anchor": "MemoryBench",
"icon": "flask-conical",
"pages": [
"memorybench/overview",
"memorybench/github",
{
"group": "Getting Started",
"pages": ["memorybench/installation", "memorybench/quickstart"]
},
{
"group": "Development",
"pages": [
"memorybench/architecture",
"memorybench/extend-provider",
"memorybench/extend-benchmark",
"memorybench/contributing"
]
},
{
"group": "Reference",
"pages": ["memorybench/cli", "memorybench/integrations"]
}
]
}
],
"tab": "API Reference"
"tab": "MemoryBench"
},
{
"icon": "chef-hat",
@@ -269,7 +300,6 @@
],
"tab": "Cookbook"
},

{
"icon": "list-ordered",
"anchors": [
99 changes: 99 additions & 0 deletions apps/docs/memorybench/architecture.mdx
@@ -0,0 +1,99 @@
---
title: "Architecture"
description: "Understanding MemoryBench's design and implementation"
sidebarTitle: "Architecture"
---

## System Overview

```mermaid
flowchart TB
B["Benchmarks<br/>(LoCoMo, LongMemEval..)"]
P["Providers<br/>(Supermemory, Mem0, Zep)"]
J["Judges<br/>(GPT-4o, Claude..)"]

B --> O[Orchestrator]
P --> O
J --> O

O --> Pipeline

subgraph Pipeline[" "]
direction LR
I[Ingest] --> IX[Indexing] --> S[Search] --> A[Answer] --> E[Evaluate]
end

style B fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style P fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style J fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style O fill:#0369A1,stroke:#0369A1,color:#fff
style I fill:#F1F5F9,stroke:#64748B,color:#334155
style IX fill:#F1F5F9,stroke:#64748B,color:#334155
style S fill:#F1F5F9,stroke:#64748B,color:#334155
style A fill:#F1F5F9,stroke:#64748B,color:#334155
style E fill:#F1F5F9,stroke:#64748B,color:#334155
```

## Core Components

| Component | Role |
|-----------|------|
| **Benchmarks** | Load test data and provide questions with ground truth answers |
| **Providers** | Memory services being evaluated (handle ingestion and search) |
| **Judges** | LLM-based evaluators that score answers against ground truth |

See [Integrations](/memorybench/integrations) for all supported benchmarks, providers, and models.

## Pipeline

```mermaid
flowchart LR
A[Ingest] --> B[Index] --> C[Search] --> D[Answer] --> E[Evaluate] --> F[Report]

style A fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style B fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style C fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style D fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style E fill:#E0F2FE,stroke:#0369A1,color:#0C4A6E
style F fill:#DCFCE7,stroke:#16A34A,color:#166534
```

| Phase | What Happens |
|-------|--------------|
| **Ingest** | Load benchmark sessions → Push to provider |
| **Index** | Wait for provider indexing |
| **Search** | Query provider → Retrieve context |
| **Answer** | Build prompt → Generate answer via LLM |
| **Evaluate** | Compare to ground truth → Score via judge |
| **Report** | Aggregate scores → Output accuracy + latency |

Each phase checkpoints independently, so failed runs resume from the last successful point.

## Advanced Checkpointing

Runs persist to `data/runs/{runId}/`:

```
data/runs/my-run/
├── checkpoint.json # Run state and progress
├── results/ # Search results per question
└── report.json # Final report
```

Re-running with the same run ID resumes from the checkpoint. Use `--force` to restart from scratch.
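The resume behavior described above can be sketched with a simplified checkpoint loader. This is a hypothetical illustration, not the actual orchestrator code: the `Checkpoint` shape and the `loadCheckpoint`/`phasesToRun` helpers are invented here, and the real schema in `src/orchestrator/` may differ.

```typescript
// Hypothetical sketch of phase-level checkpointing; MemoryBench's real
// checkpoint schema and helper names may differ.
import * as fs from "fs";
import * as path from "path";

type Phase = "ingest" | "index" | "search" | "answer" | "evaluate" | "report";
const PHASES: Phase[] = ["ingest", "index", "search", "answer", "evaluate", "report"];

interface Checkpoint {
  runId: string;
  completed: Phase[]; // phases that finished successfully
}

function loadCheckpoint(runsDir: string, runId: string, force: boolean): Checkpoint {
  const file = path.join(runsDir, runId, "checkpoint.json");
  if (!force && fs.existsSync(file)) {
    // Same run ID and no --force: resume from the saved state.
    return JSON.parse(fs.readFileSync(file, "utf8")) as Checkpoint;
  }
  // --force (or a fresh run ID): start from a clean slate.
  return { runId, completed: [] };
}

function phasesToRun(cp: Checkpoint): Phase[] {
  // Skip every phase already marked complete; run the rest in order.
  return PHASES.filter((p) => !cp.completed.includes(p));
}
```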

## File Structure

```
src/
├── cli/commands/ # run, compare, test, serve, status...
├── orchestrator/phases/ # ingest, search, answer, evaluate, report
├── benchmarks/
│ └── <name>/index.ts # e.g. locomo/, longmemeval/, convomem/
├── providers/
│ └── <name>/
│ ├── index.ts # Provider implementation
│ └── prompts.ts # Custom prompts (optional)
├── judges/ # openai.ts, anthropic.ts, google.ts
└── types/ # provider.ts, benchmark.ts, unified.ts
```
117 changes: 117 additions & 0 deletions apps/docs/memorybench/cli.mdx
@@ -0,0 +1,117 @@
---
title: "CLI Reference"
description: "Command-line interface for running MemoryBench evaluations"
sidebarTitle: "CLI"
---

## Commands

### run

Execute the full benchmark pipeline.

```bash
bun run src/index.ts run -p <provider> -b <benchmark> -j <judge> -r <run-id>
```

| Option | Description |
|--------|-------------|
| `-p, --provider` | Memory provider (`supermemory`, `mem0`, `zep`) |
| `-b, --benchmark` | Benchmark (`locomo`, `longmemeval`, `convomem`) |
| `-j, --judge` | Judge model (default: `gpt-4o`) |
| `-r, --run-id` | Run identifier (auto-generated if omitted) |
| `-m, --answering-model` | Model for answer generation (default: `gpt-4o`) |
| `-l, --limit` | Limit number of questions |
| `-s, --sample` | Sample N questions per category |
| `--sample-type` | Sampling strategy: `consecutive` (default), `random` |
| `--force` | Clear checkpoint and restart |

See [Integrations](/memorybench/integrations) for all available judge and answering models.

---

### compare

Run a benchmark across multiple providers in parallel.

```bash
bun run src/index.ts compare -p supermemory,mem0,zep -b locomo -j gpt-4o
```

---

### test

Evaluate a single question for debugging.

```bash
bun run src/index.ts test -r <run-id> -q <question-id>
```

---

### status

Check progress of a run.

```bash
bun run src/index.ts status -r <run-id>
```

---

### show-failures

Debug failed questions with full context.

```bash
bun run src/index.ts show-failures -r <run-id>
```

---

### list-questions

Browse benchmark questions.

```bash
bun run src/index.ts list-questions -b <benchmark>
```

---

### Random Sampling

Not a separate command: the `run` command's `-s` option samples N questions per category, and `--sample-type random` randomizes the selection.

```bash
bun run src/index.ts run -p supermemory -b longmemeval -s 3 --sample-type random
```

---

### serve

Start the web UI.

```bash
bun run src/index.ts serve
```

Opens at [http://localhost:3000](http://localhost:3000).

---

### help

Get help on providers, models, or benchmarks.

```bash
bun run src/index.ts help providers
bun run src/index.ts help models
bun run src/index.ts help benchmarks
```

## Checkpointing

Runs are saved to `data/runs/{runId}/` and automatically resume from the last successful phase. Use `--force` to restart.
89 changes: 89 additions & 0 deletions apps/docs/memorybench/contributing.mdx
@@ -0,0 +1,89 @@
---
title: "Contributing"
description: "Guidelines for contributing to MemoryBench"
sidebarTitle: "Contributing"
---

## Getting Started

1. Fork the repository
2. Clone your fork:
```bash
git clone https://github.com/YOUR_USERNAME/memorybench
cd memorybench
bun install
```
3. Create a branch:
```bash
git checkout -b feature/your-feature
```

## Development Workflow

### Running Tests

```bash
bun test
```

### Running the CLI

```bash
bun run src/index.ts <command>
```

### Running the Web UI

```bash
cd ui
bun run dev
```

## Code Structure

| Directory | Purpose |
|-----------|---------|
| `src/cli/` | CLI commands |
| `src/orchestrator/` | Pipeline execution |
| `src/benchmarks/` | Benchmark adapters |
| `src/providers/` | Provider integrations |
| `src/judges/` | LLM judge implementations |
| `src/types/` | TypeScript interfaces |
| `ui/` | Next.js web interface |

## Contribution Types

### Adding a Provider

See [Extending MemoryBench](/memorybench/extend-provider) for the full guide.

1. Create `src/providers/yourprovider/index.ts`
2. Implement the `Provider` interface
3. Register in `src/providers/index.ts`
4. Add config in `src/utils/config.ts`
5. Submit PR with tests
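As a rough sketch of step 2, a minimal provider could look like the following. The `Provider` shape shown here is hypothetical — check `src/types/provider.ts` for the actual contract — and the naive substring search stands in for a real memory backend.

```typescript
// Hypothetical Provider shape for illustration; the real interface lives
// in src/types/provider.ts and may differ.
interface Provider {
  name: string;
  ingest(sessionId: string, messages: string[]): Promise<void>;
  search(query: string, limit?: number): Promise<string[]>;
}

class InMemoryProvider implements Provider {
  name = "inmemory";
  private store = new Map<string, string[]>();

  async ingest(sessionId: string, messages: string[]): Promise<void> {
    // Store each session's messages verbatim.
    this.store.set(sessionId, messages);
  }

  async search(query: string, limit = 5): Promise<string[]> {
    // Naive case-insensitive substring match as a stand-in for retrieval.
    const hits: string[] = [];
    for (const messages of this.store.values()) {
      for (const m of messages) {
        if (m.toLowerCase().includes(query.toLowerCase())) hits.push(m);
      }
    }
    return hits.slice(0, limit);
  }
}
```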

### Adding a Benchmark

1. Create `src/benchmarks/yourbenchmark/index.ts`
2. Implement the `Benchmark` interface
3. Register in `src/benchmarks/index.ts`
4. Document question types
5. Submit PR with sample data
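A benchmark adapter for step 2 might be shaped roughly like this. The `Benchmark` and `Question` types here are hypothetical — see `src/types/benchmark.ts` for the real interface — and the tiny fixture merely shows the idea of pairing sessions with ground-truth questions.

```typescript
// Hypothetical Benchmark shape for illustration; the real interface lives
// in src/types/benchmark.ts and may differ.
interface Question {
  id: string;
  question: string;
  answer: string;   // ground truth the judge scores against
  category: string;
}

interface Benchmark {
  name: string;
  loadSessions(): Promise<Record<string, string[]>>;
  loadQuestions(): Promise<Question[]>;
}

class TinyBenchmark implements Benchmark {
  name = "tiny";

  async loadSessions(): Promise<Record<string, string[]>> {
    // Sessions are conversations the provider will ingest.
    return {
      "session-1": ["User: I adopted a dog named Rex.", "Assistant: Congrats!"],
    };
  }

  async loadQuestions(): Promise<Question[]> {
    // Each question references facts recoverable from the sessions.
    return [
      { id: "q1", question: "What is the dog's name?", answer: "Rex", category: "single-hop" },
    ];
  }
}
```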

### Bug Fixes

1. Create an issue describing the bug
2. Reference the issue in your PR
3. Include test cases that reproduce the bug

## Pull Request Guidelines

- Keep PRs focused on a single change
- Update documentation if needed
- Ensure all tests pass
- Follow existing code style

## Questions?

Open an issue on [GitHub](https://github.com/supermemoryai/memorybench/issues).