8 changes: 8 additions & 0 deletions .env.local.example
@@ -0,0 +1,8 @@
# URL of the NexusRAG API (server-side proxy target)
NEXUSRAG_API_URL=http://localhost:8000

# API key passed as Bearer token from the browser (public — no secrets here)
NEXT_PUBLIC_API_KEY=your-api-key-here

# Corpus ID to use in the /run page
NEXT_PUBLIC_DEFAULT_CORPUS_ID=c1
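As a minimal sketch of how the dashboard might consume these values in the browser (the helper names and URL path shape here are assumptions for illustration, not code from this PR):

```typescript
// Hypothetical helpers showing how a client component could attach the
// public key as a Bearer token when calling the NexusRAG API.
// The /corpora/<id> path is illustrative only.
function buildAuthHeaders(apiKey: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    Accept: "application/json",
  };
}

function corpusUrl(base: string, corpusId: string): string {
  // e.g. corpusUrl("http://localhost:8000", "c1")
  return `${base.replace(/\/+$/, "")}/corpora/${encodeURIComponent(corpusId)}`;
}
```

In the app these would be fed from `process.env.NEXT_PUBLIC_API_KEY` and `process.env.NEXT_PUBLIC_DEFAULT_CORPUS_ID`, which Next.js inlines into the client bundle (hence the "no secrets here" caveat above).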
30 changes: 26 additions & 4 deletions .gitignore
@@ -1,8 +1,30 @@
# Vercel
.vercel

# Next.js
.next/
out/
next-env.d.ts.bak
tsconfig.tsbuildinfo

# Node
node_modules/
package-lock.json

# Python
__pycache__/
.pytest_cache/
*.pyc
.mypy_cache/
.ruff_cache/
.venv/
*.egg-info/
dist/
build/

# OS
.DS_Store

# Local env
.env.local
.env*.local
134 changes: 38 additions & 96 deletions README.md
@@ -1,111 +1,53 @@
# EvalOps Workbench
# Agent Runbook Orchestrator — Showcase Dashboard

A local-first evaluation harness for prompts, tools, and agents with regression tracking and experiment history.
A Next.js 14 dashboard for the Agent Runbook Orchestrator showcase deploy. It
reuses the same Vercel-grade design system as the NexusRAG dashboard, adapted
for the showcase tier (no auth, no BFF; only the public `/api/stats` endpoint
is real).

## Problem
## Stack

LLM teams lack a lightweight way to compare prompt and tool changes before shipping.
- Next.js 14 App Router · TypeScript strict · Tailwind 3 · Geist Sans + Mono
- Radix UI primitives · cmdk (⌘K) · sonner · next-themes · framer-motion
- vitest + Testing Library

## Users
## Routes

Agent builders, prompt engineers, applied AI teams
| path | what it shows |
|---|---|
| `/` | Overview — pitch banner, live `/api/stats` Tier-B counters, system status, audience + stack |
| `/telemetry` | Polling Tier-B telemetry consumer — full metric grid, raw JSON, contract docs, 30s visibility-aware polling |
| `/capabilities` | MVP scope, problem statement, why-now, audience, stack — sourced from `project.json` |
| `/roadmap` | Three-phase timeline (showcase → MVP build → Tier-A graduation) |
| `/settings` | Theme + project metadata |
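The 30-second visibility-aware polling on `/telemetry` can be sketched roughly like this (a minimal illustration; the function name and structure are assumptions, not the actual component code):

```typescript
// Minimal sketch of visibility-aware polling: fetch immediately, then
// every `intervalMs`, but skip fetches while the tab is hidden.
function startPolling(
  fetchStats: () => void,
  intervalMs = 30_000,
  isHidden: () => boolean = () =>
    typeof document !== "undefined" && document.hidden,
): () => void {
  const tick = () => {
    if (!isHidden()) fetchStats();
  };
  tick(); // initial fetch on mount
  const id = setInterval(tick, intervalMs);
  return () => clearInterval(id); // cleanup, e.g. in a useEffect teardown
}
```

In a React component this would live in a `useEffect`, returning the cleanup function so the interval stops on unmount.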

## Core Capabilities

- Load datasets from JSON or CSV
- Run prompt or agent variants
- Score outputs with rubric functions
- Compare runs and export regressions

## Why This Matters

Evaluation is moving from optional best practice to baseline engineering hygiene.

## Architecture

- `core`: domain logic for evalops workbench.
- `cli`: operator-facing entrypoint for local workflows and smoke checks.
- `docs/`: product notes, roadmap, and architecture decisions.
- `tests/`: baseline regression coverage for the project contract.

## Local Usage
## Local development

```bash
uv run evalops-workbench summary
uv run evalops-workbench capabilities
uv run evalops-workbench roadmap
cd frontend
npm install
npm run dev # http://localhost:3000
```

## Initial Stack Direction
## Scripts

Python, Typer, DuckDB, OpenTelemetry
| command | what it does |
|---|---|
| `npm run dev` | Local dev server |
| `npm run build` | Production build |
| `npm run lint` | Next.js ESLint |
| `npm run type-check` | `tsc --noEmit` |
| `npm test` | Run the vitest suite |

## Delivery Standard

- Clear product thesis
- Setup that works locally
- Tests for the primary contract
- Documentation for roadmap and architecture
- Space for production integrations in the next iteration

## Showcase

This repository ships with a static Vercel-ready landing page for demos and previews.

```bash
vercel deploy -y
```
## Deployment

The deployed site presents EvalOps Workbench as a standalone product page.
The dashboard deploys as its own Vercel project with `frontend/` as the root
directory; the existing `agent-runbook-orchestrator` Vercel project continues
to serve the static landing page and the `/api/stats` Python serverless
function.

## Production telemetry
## Keyboard shortcuts

This deployment exposes public, aggregate metrics at `/api/stats`. The endpoint
is consumed by the Production Telemetry panel on https://eleventh.dev. The
schema is documented at
https://github.com/IgnazioDS/IgnazioDS/blob/main/TELEMETRY_SCHEMA.md.

This system is in **showcase mode** — the Vercel deploy is a public landing
page, not a system processing production workload. The endpoint exposes real
GitHub-derived metrics about the codebase rather than fabricated activity
counters. Tier-A workload metrics (`eval_runs_total`, `last_pass_rate`,
`regressions_caught_30d`, etc.) are added when the system is promoted from
showcase to production.

Sample response:

```bash
$ curl -i https://evalops-workbench.vercel.app/api/stats
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: public, max-age=30, stale-while-revalidate=60
Access-Control-Allow-Origin: *

{
  "system": "evalops",
  "mode": "showcase",
  "status": "operational",
  "last_deployed_at": "2026-04-27T18:41:57Z",
  "last_commit_at": "2026-04-01T16:54:50Z",
  "metrics": {
    "commits_30d": 1,
    "commits_total": 3,
    "primary_language": "Python",
    "repo_stars": 0,
    "lines_of_code": 1177
  },
  "schema_version": 1,
  "generated_at": "2026-04-27T18:42:18Z"
}
```

The endpoint never returns HTTP 5xx. If GitHub is unreachable, the response
status flips to `"degraded"` and metric values fall back to last known good
(or zero) values, while the JSON contract remains valid.
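That fallback contract can be sketched as follows (illustrative TypeScript; the real endpoint is the Python function in `api/stats.py`, and these type and function names are assumptions):

```typescript
type Stats = {
  status: "operational" | "degraded";
  metrics: Record<string, number | string>;
};

// Never surface a 5xx: when the fresh GitHub fetch fails, keep the JSON
// contract intact, flip status to "degraded", and fall back to the
// last-known-good metrics (or zeros when none exist).
function buildStatsResponse(fresh: Stats | null, lastGood: Stats | null): Stats {
  if (fresh) return fresh;
  return {
    status: "degraded",
    metrics: lastGood ? lastGood.metrics : { commits_30d: 0, commits_total: 0 },
  };
}
```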

To regenerate `lines_of_code` before deploying:

```bash
python3 scripts/compute_telemetry_static.py
git add api/_telemetry_static.json
```
| keys | action |
|---|---|
| ⌘K / Ctrl+K | Command palette |
| G then O / T / C / R | Overview / Telemetry / Capabilities / Roadmap |
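A two-key "G then …" chord is usually a short-lived prefix state; a minimal sketch (the route map, names, and timeout are assumptions, not the repo's code):

```typescript
// Sketch of a "G then <key>" navigation chord: pressing G arms a prefix
// for a short window; the next key, if mapped, triggers navigation.
function makeChordHandler(
  routes: Record<string, string>,
  navigate: (path: string) => void,
  timeoutMs = 1000,
  now: () => number = Date.now,
): (key: string) => void {
  let armedAt = -Infinity;
  return (key) => {
    const k = key.toLowerCase();
    const t = now();
    if (k === "g") {
      armedAt = t; // arm the prefix
      return;
    }
    if (t - armedAt <= timeoutMs && routes[k]) navigate(routes[k]);
    armedAt = -Infinity; // prefix is consumed either way
  };
}
```

In practice the handler would be attached to a window `keydown` listener and skip events originating from form fields.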
137 changes: 0 additions & 137 deletions index.html

This file was deleted.

5 changes: 5 additions & 0 deletions next-env.d.ts
@@ -0,0 +1,5 @@
/// <reference types="next" />
/// <reference types="next/image-types/global" />

// NOTE: This file should not be edited
// see https://nextjs.org/docs/basic-features/typescript for more information.
31 changes: 31 additions & 0 deletions next.config.mjs
@@ -0,0 +1,31 @@
// Hybrid Next.js + Python deploy. The dashboard and the public /api/stats
// Python serverless function (api/stats.py) live in the same Vercel project,
// so the dashboard reaches /api/stats via Vercel's standard routing — no
// rewrite required.

/** @type {import('next').NextConfig} */
const nextConfig = {
  reactStrictMode: true,
  poweredByHeader: false,
  experimental: {
    optimizePackageImports: ["lucide-react", "recharts"],
  },
  async headers() {
    return [
      {
        source: "/(.*)",
        headers: [
          { key: "X-Content-Type-Options", value: "nosniff" },
          { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
          { key: "X-Frame-Options", value: "DENY" },
          {
            key: "Permissions-Policy",
            value: "camera=(), microphone=(), geolocation=()",
          },
        ],
      },
    ];
  },
};

export default nextConfig;