
GoEvals

Fast, local-first LLM evaluation dashboard with universal JSONL support


GoEvals Dashboard

Professional dashboard with dynamic columns that adapt to your custom metrics


Why GoEvals?

Most LLM evaluation dashboards are either cloud-only (vendor lock-in), Python-heavy (complex setup), or overkill (full observability platforms with databases).

GoEvals is different:

  • Single binary - No Python, no Docker, no dependencies
  • Local-first - Your data stays on your machine
  • Smart refresh - Polls for new results without flickering (5s intervals)
  • Fast - Starts in <100ms, handles thousands of evals
  • Simple - Works with standard JSONL files

Built for Go developers who are building AI applications and want a lightweight, hackable eval dashboard.


Screenshots

Main Dashboard - Model Comparison

Main Dashboard

Compare different model configurations side-by-side with dynamic columns for all custom parameters

Test Details - Table View

Test Details

Clean, professional table layout showing all test results with clickable rows

Test Details - Modal View

Test Modal

Click any test to see full details: question, model response, expected answer, score breakdown, and RAG configuration


Features

Core Features

  • Universal JSONL - Automatically detects ALL custom fields and scores from your data
  • Dynamic columns - Table adapts to show any RAG parameters (chunk_size, temperature, embedding_model, etc.)
  • Smart polling - Efficient updates without full page reload (5s intervals)
  • Sortable columns - Click any header to sort by that metric
  • Color-coded scores - Instant visual feedback (green >0.7, yellow 0.4-0.7, red <0.4)
  • Professional UI - Modern modal-based design like Linear/Vercel/Stripe
  • Dark mode - Built-in dark theme with localStorage persistence
  • Multiple files - Load and compare results from multiple JSONL files

Dashboard Views

  • Overview - Total tests, models tested, average scores
  • Model comparison - Side-by-side metrics with min/max/avg, shows ALL custom parameters
  • Test details - Table view with modal dialogs for full question, response, and scoring breakdowns

Quick Start

Using Make (Recommended)

# Clone the repository
git clone https://github.com/rchojn/goevals
cd goevals

# See all available commands
make help

# Build binary
make build

# Run with empty dashboard (no data needed)
make run-empty

# Run with your data
make run  # requires evals.jsonl in current directory

# Run tests
make test

# Format code and check quality
make check

Windows users: Use .\task.ps1 <command> instead of make (PowerShell script with same targets).

Manual Commands

# Build binary
go build -o bin/goevals main.go

# Run with sample data
./bin/goevals evals.jsonl

# Run on custom port
PORT=8080 ./bin/goevals evals.jsonl

# Compare multiple test runs
./bin/goevals run1.jsonl run2.jsonl run3.jsonl

# Visit http://localhost:3000 (the default; use your PORT value if you overrode it)

JSONL Format

GoEvals automatically detects all score fields in your JSONL and displays them in the dashboard.

Minimal Example

The bare minimum (one JSON object per line):

{"timestamp":"2025-10-26T14:30:00Z","model":"gpt-4","scores":{"combined":0.85}}
{"timestamp":"2025-10-26T14:31:00Z","model":"claude-3","scores":{"combined":0.92}}

Required fields:

  • timestamp - ISO8601 timestamp for ordering and smart polling
  • model - Model name (string)
  • scores.combined - Overall score 0.0-1.0 (float)
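
In Go terms, the minimal record maps onto a type like the following (a hypothetical sketch for illustration; GoEvals' internal model may differ):

package evals // illustrative package name

import "time"

// MinimalEval mirrors the three required JSONL fields.
type MinimalEval struct {
    Timestamp time.Time          `json:"timestamp"` // ISO8601, drives ordering and polling
    Model     string             `json:"model"`
    Scores    map[string]float64 `json:"scores"` // must include "combined" (0.0-1.0)
}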

Full Example

With all optional fields:

{
  "timestamp": "2025-10-26T14:30:00Z",
  "model": "gemma2:2b",
  "test_id": "eval_001",
  "question": "What is the capital of France?",
  "response": "The capital of France is Paris.",
  "expected": "Paris",
  "response_time_ms": 1234,
  "scores": {
    "combined": 0.85,
    "accuracy": 0.90,
    "fluency": 0.88,
    "completeness": 0.82
  },
  "metadata": {
    "run_id": "morning_test_run",
    "temperature": 0.7,
    "max_tokens": 2048
  }
}

Optional fields:

  • test_id - Unique test identifier
  • question - Input question/prompt
  • response - Model's generated response
  • expected - Expected/ground truth answer
  • response_time_ms - Generation time in milliseconds
  • scores.* - Any custom score metrics (auto-detected!)
  • metadata - Any additional context

Custom Scores & Fields

Custom scores - Add any metrics to the scores object; they'll appear as sortable columns:

{"timestamp":"2025-10-26T14:30:00Z","model":"gpt-4","scores":{"combined":0.85,"accuracy":0.90,"creativity":0.88,"safety":0.95}}

Custom fields - Add ANY top-level fields (RAG params, etc.); they'll appear as columns too:

{"timestamp":"2025-10-26T14:30:00Z","model":"llama3.2:1b","scores":{"combined":0.85},"chunk_size":500,"temperature":0.7,"embedding_model":"nomic-embed-text","top_k":5}

GoEvals automatically detects and displays all custom fields - no configuration needed!
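
Under the hood, this kind of auto-detection can be as simple as decoding each line into a generic map and taking the union of keys as the column set. A rough sketch of the idea (illustrative, not GoEvals' actual implementation):

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
)

func main() {
    f, err := os.Open("evals.jsonl")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    columns := map[string]bool{} // union of every top-level field seen
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        var row map[string]any
        if err := json.Unmarshal(sc.Bytes(), &row); err != nil {
            continue // tolerate malformed lines
        }
        for k := range row {
            columns[k] = true
        }
    }
    fmt.Printf("detected %d distinct columns\n", len(columns))
}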


How It Works

Smart Polling (No WebSockets Needed!)

GoEvals uses efficient HTTP polling instead of WebSockets:

  1. Dashboard loads and remembers the latest timestamp
  2. Every 5 seconds, fetches /api/evals/since?ts=<timestamp>
  3. Server returns only new results added since that timestamp
  4. If new results found, dashboard refreshes to recalculate stats
  5. No flickering, no full reload, no WebSocket complexity
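
A minimal sketch of what the endpoint in step 2 could look like server-side (the types, variable names, and in-memory storage here are illustrative assumptions, not GoEvals' actual internals):

package main

import (
    "encoding/json"
    "net/http"
    "time"
)

// Eval holds the fields the dashboard needs for incremental updates.
type Eval struct {
    Timestamp time.Time          `json:"timestamp"`
    Model     string             `json:"model"`
    Scores    map[string]float64 `json:"scores"`
}

var evals []Eval // populated by the JSONL loader elsewhere

// sinceHandler returns only the results newer than the client's last-seen timestamp.
func sinceHandler(w http.ResponseWriter, r *http.Request) {
    since, err := time.Parse(time.RFC3339, r.URL.Query().Get("ts"))
    if err != nil {
        http.Error(w, "invalid ts", http.StatusBadRequest)
        return
    }
    fresh := []Eval{} // empty slice encodes as [], not null
    for _, e := range evals {
        if e.Timestamp.After(since) {
            fresh = append(fresh, e)
        }
    }
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(fresh)
}

func main() {
    http.HandleFunc("/api/evals/since", sinceHandler)
    http.ListenAndServe(":3000", nil)
}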

This is perfect for local development where you have:

  • One developer, one browser tab
  • Infrequent updates (tests complete in batches)
  • Zero infrastructure complexity

Architecture

┌─────────────┐         ┌─────────────┐         ┌──────────────┐
│  Tests      │         │  GoEvals    │         │  Browser     │
│  (append    │────────►│  Server     │◄────────│  Dashboard   │
│   to JSONL) │  write  │  (reload)   │  poll   │  (refresh)   │
└─────────────┘         └─────────────┘         └──────────────┘

No database, no queue, no complexity - just JSONL files and HTTP.


Configuration

GoEvals uses sensible defaults but can be customized via environment variables:

# Custom port
PORT=9090 ./goevals evals.jsonl

# Auto-refresh interval is hardcoded to 5s (can be changed in code)
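
Honoring the PORT override usually amounts to the standard pattern below (a sketch of the common idiom, not necessarily GoEvals' exact code):

package main

import (
    "net/http"
    "os"
)

func main() {
    // Fall back to the default port from Quick Start when PORT is unset.
    port := os.Getenv("PORT")
    if port == "" {
        port = "3000"
    }
    http.ListenAndServe(":"+port, nil)
}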

Compatible With

GoEvals works with eval output from any tool or framework that can append JSON lines in the format above, regardless of language.

Example: Logging from Go

// Append one result per line; json.Encoder writes a trailing newline,
// so each Encode call emits exactly one JSONL record.
f, err := os.OpenFile("evals.jsonl", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
    log.Fatal(err)
}
defer f.Close()

json.NewEncoder(f).Encode(map[string]any{
    "timestamp": time.Now().Format(time.RFC3339),
    "model":     "gpt-4",
    "test_id":   "test_001",
    "scores": map[string]float64{
        "combined": 0.85,
        "accuracy": 0.90,
    },
    "response_time_ms": 1234,
})

Roadmap

See CHANGELOG.md for recent updates.

Future improvements:

  • Date range filtering in UI
  • Charts and graphs (Chart.js integration)
  • Export to CSV/JSON
  • Type-safe templates (a-h/templ)
  • Test run comparison view
  • WebSocket option for real-time updates

Tech Stack

Current (v2.0):

  • Pure Go stdlib (net/http, html/template, encoding/json)
  • Zero external dependencies
  • ~1000 lines of code
  • Single file deployment

Philosophy:

  • Local-first, no cloud required
  • Simple > Complex
  • Files > Databases
  • HTTP polling > WebSockets (for this use case)

Contributing

Star the repo if you find it useful!

Report bugs or request features in Issues.

Pull requests are welcome! See CONTRIBUTING.md for guidelines.

Check out the CHANGELOG.md for recent updates.


License

MIT License - Free forever, use anywhere.

See LICENSE for details.


Author

Built by @rchojn - Go developer building AI/ML tools.

Inspired by evals.fun, Langfuse, and the philosophy that simple tools > complex platforms for local development.


Built with Go stdlib and common sense

github.com/rchojn/goevals
