Fast, local-first LLM evaluation dashboard with universal JSONL support
Professional dashboard with dynamic columns that adapt to your custom metrics
Most LLM evaluation dashboards are either cloud-only (vendor lock-in), Python-heavy (complex setup), or overkill (full observability platforms with databases).
GoEvals is different:
- Single binary - No Python, no Docker, no dependencies
- Local-first - Your data stays on your machine
- Smart refresh - Polls for new results without flickering (5s intervals)
- Fast - Starts in <100ms, handles thousands of evals
- Simple - Works with standard JSONL files
Built for Go developers creating AI applications who want a lightweight, hackable eval dashboard.
Compare different model configurations side-by-side with dynamic columns for all custom parameters
Clean, professional table layout showing all test results with clickable rows
Click any test to see full details: question, model response, expected answer, score breakdown, and RAG configuration
- Universal JSONL - Automatically detects ALL custom fields and scores from your data
- Dynamic columns - Table adapts to show any RAG parameters (chunk_size, temperature, embedding_model, etc.)
- Smart polling - Efficient updates without full page reload (5s intervals)
- Sortable columns - Click any header to sort by that metric
- Color-coded scores - Instant visual feedback (green >0.7, yellow 0.4-0.7, red <0.4)
- Professional UI - Modern modal-based design like Linear/Vercel/Stripe
- Dark mode - Built-in dark theme with localStorage persistence
- Multiple files - Load and compare results from multiple JSONL files
- Overview - Total tests, models tested, average scores
- Model comparison - Side-by-side metrics with min/max/avg, shows ALL custom parameters
- Test details - Table view with modal dialogs for full question, response, and scoring breakdowns
# Clone the repository
git clone https://github.com/rchojn/goevals
cd goevals
# See all available commands
make help
# Build binary
make build
# Run with empty dashboard (no data needed)
make run-empty
# Run with your data
make run # requires evals.jsonl in current directory
# Run tests
make test
# Format code and check quality
make check

Windows users: Use .\task.ps1 <command> instead of make (a PowerShell script with the same targets).
# Build binary
go build -o bin/goevals main.go
# Run with sample data
./bin/goevals evals.jsonl
# Run on custom port
PORT=8080 ./bin/goevals evals.jsonl
# Compare multiple test runs
./bin/goevals run1.jsonl run2.jsonl run3.jsonl
# Visit http://localhost:3000

GoEvals automatically detects all score fields in your JSONL and displays them in the dashboard.
The bare minimum (one JSON object per line):
{"timestamp":"2025-10-26T14:30:00Z","model":"gpt-4","scores":{"combined":0.85}}
{"timestamp":"2025-10-26T14:31:00Z","model":"claude-3","scores":{"combined":0.92}}

Required fields:
- timestamp - ISO 8601 timestamp for ordering and smart polling
- model - Model name (string)
- scores.combined - Overall score, 0.0-1.0 (float)
With all optional fields:
{
"timestamp": "2025-10-26T14:30:00Z",
"model": "gemma2:2b",
"test_id": "eval_001",
"question": "What is the capital of France?",
"response": "The capital of France is Paris.",
"expected": "Paris",
"response_time_ms": 1234,
"scores": {
"combined": 0.85,
"accuracy": 0.90,
"fluency": 0.88,
"completeness": 0.82
},
"metadata": {
"run_id": "morning_test_run",
"temperature": 0.7,
"max_tokens": 2048
}
}

Optional fields:
- test_id - Unique test identifier
- question - Input question/prompt
- response - Model's generated response
- expected - Expected/ground-truth answer
- response_time_ms - Generation time in milliseconds
- scores.* - Any custom score metrics (auto-detected!)
- metadata - Any additional context
Custom scores - Add any metrics to scores object, they'll appear as sortable columns:
{"timestamp":"2025-10-26T14:30:00Z","model":"gpt-4","scores":{"combined":0.85,"accuracy":0.90,"creativity":0.88,"safety":0.95}}

Custom fields - Add ANY top-level fields (RAG params, etc.) and they'll appear as columns too:
{"timestamp":"2025-10-26T14:30:00Z","model":"llama3.2:1b","scores":{"combined":0.85},"chunk_size":500,"temperature":0.7,"embedding_model":"nomic-embed-text","top_k":5}

GoEvals automatically detects and displays all custom fields - no configuration needed!
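One way to implement this kind of auto-detection is to decode each line into a generic map and collect the keys. The sketch below illustrates the idea only; it is an assumption about the approach, not GoEvals' actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// detectColumns decodes each JSONL line into a generic map and collects every
// top-level key plus every key nested under "scores" (prefixed "scores.").
// Illustrative sketch, not GoEvals' actual implementation.
func detectColumns(lines []string) []string {
	seen := map[string]bool{}
	for _, line := range lines {
		var rec map[string]any
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			continue // skip malformed lines
		}
		for k, v := range rec {
			if k == "scores" {
				if scores, ok := v.(map[string]any); ok {
					for sk := range scores {
						seen["scores."+sk] = true
					}
				}
				continue
			}
			seen[k] = true
		}
	}
	cols := make([]string, 0, len(seen))
	for k := range seen {
		cols = append(cols, k)
	}
	sort.Strings(cols)
	return cols
}

func main() {
	lines := []string{
		`{"timestamp":"2025-10-26T14:30:00Z","model":"llama3.2:1b","scores":{"combined":0.85},"chunk_size":500,"top_k":5}`,
	}
	fmt.Println(detectColumns(lines))
	// [chunk_size model scores.combined timestamp top_k]
}
```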
GoEvals uses efficient HTTP polling instead of WebSockets:
- Dashboard loads and remembers the latest timestamp
- Every 5 seconds, it fetches /api/evals/since?ts=<timestamp>
- Server returns only results added since that timestamp
- If new results are found, the dashboard refreshes and recalculates stats
- No flickering, no full reload, no WebSocket complexity
This is perfect for local development where you have:
- One developer, one browser tab
- Infrequent updates (tests complete in batches)
- Zero infrastructure complexity
┌─────────────┐ ┌─────────────┐ ┌──────────────┐
│ Tests │ │ GoEvals │ │ Browser │
│ (append │────────►│ Server │◄────────│ Dashboard │
│ to JSONL) │ write │ (reload) │ poll │ (refresh) │
└─────────────┘ └─────────────┘ └──────────────┘
No database, no queue, no complexity - just JSONL files and HTTP.
GoEvals uses sensible defaults but can be customized via environment variables:
# Custom port
PORT=9090 ./goevals evals.jsonl
# Auto-refresh interval is hardcoded to 5s (can be changed in code)

GoEvals works with eval outputs from:
- gai/eval (Go) ← Recommended
- OpenAI Evals
- Any custom evaluation framework that outputs JSONL
f, err := os.OpenFile("evals.jsonl", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
    log.Fatal(err)
}
defer f.Close()

if err := json.NewEncoder(f).Encode(map[string]any{
    "timestamp": time.Now().Format(time.RFC3339),
    "model":     "gpt-4",
    "test_id":   "test_001",
    "scores": map[string]float64{
        "combined": 0.85,
        "accuracy": 0.90,
    },
    "response_time_ms": 1234,
}); err != nil {
    log.Fatal(err)
}

See CHANGELOG.md for recent updates.
Future improvements:
- Date range filtering in UI
- Charts and graphs (Chart.js integration)
- Export to CSV/JSON
- Type-safe templates (a-h/templ)
- Test run comparison view
- WebSocket option for real-time updates
Current (v2.0):
- Pure Go stdlib (net/http, html/template, encoding/json) - zero external dependencies
- ~1000 lines of code
- Single file deployment
Philosophy:
- Local-first, no cloud required
- Simple > Complex
- Files > Databases
- HTTP polling > WebSockets (for this use case)
Star the repo if you find it useful!
Report bugs or request features in Issues.
Pull requests are welcome! See CONTRIBUTING.md for guidelines.
MIT License - Free forever, use anywhere.
See LICENSE for details.
Built by @rchojn - Go developer building AI/ML tools.
Inspired by evals.fun, Langfuse, and the philosophy that simple tools > complex platforms for local development.


