Stop guessing whether your agent works. Run reproducible, sovereign evaluations locally.
```bash
npm install quickbench
npm run demo
```

See a signed report instantly, with accuracy, latency, and fairness metrics.
| Metric | Description | Formula |
|---|---|---|
| Accuracy | Exact-match rate | correct / total |
| Latency | Response time | Mean and P95 (ms) |
| Fairness | Demographic parity | StdDev(accuracy per demographic group) |
| Cost | Token cost | Placeholder (future LLM support) |
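The formulas in the table can be sketched in plain TypeScript. This is a minimal illustration, not quickbench's implementation: the `CaseResult` shape and the function names are hypothetical. It also assumes P95 uses the nearest-rank method and fairness uses the population standard deviation. quickbench may make different choices.

```typescript
// Hypothetical per-case record; quickbench's internal types may differ.
interface CaseResult {
  correct: boolean;     // exact match between agent output and expected label
  latencyMs: number;    // response time for this case, in milliseconds
  demographic?: string; // optional group label taken from row metadata
}

// Accuracy: correct / total.
function accuracy(results: CaseResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.correct).length / results.length;
}

// Latency: arithmetic mean and 95th percentile (nearest-rank method).
function latencyStats(results: CaseResult[]): { mean: number; p95: number } {
  if (results.length === 0) return { mean: 0, p95: 0 };
  const sorted = results.map((r) => r.latencyMs).sort((a, b) => a - b);
  const mean = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  const rank = Math.ceil(0.95 * sorted.length); // 1-based nearest rank
  return { mean, p95: sorted[rank - 1] };
}

// Fairness: population standard deviation of per-demographic accuracy.
// Cases without a demographic label are ignored.
function demographicParity(results: CaseResult[]): number {
  const groups = new Map<string, CaseResult[]>();
  for (const r of results) {
    const group = r.demographic;
    if (group === undefined) continue;
    const bucket = groups.get(group) ?? [];
    bucket.push(r);
    groups.set(group, bucket);
  }
  const accs = Array.from(groups.values()).map(accuracy);
  if (accs.length === 0) return 0;
  const mean = accs.reduce((s, a) => s + a, 0) / accs.length;
  const variance = accs.reduce((s, a) => s + (a - mean) ** 2, 0) / accs.length;
  return Math.sqrt(variance);
}
```

A value of 0 for `demographicParity` means every group scored the same accuracy; larger values mean a wider spread between groups.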
```typescript
import { runEvaluation, loadDataset } from 'quickbench';

// Your agent: receives an input string and returns its answer.
const agent = (input: string) => 'your agent logic';

// Top-level await requires an ES module context.
const dataset = await loadDataset('./my-data.csv');
const result = await runEvaluation({ agent, dataset });
console.log(result.scores.accuracy); // e.g. 0.87
```

- Zero Cloud: No external API calls, no telemetry
- Local Signing: HMAC-SHA256 receipts, generated on your machine
- No PII: Tracks metadata only
- Deterministic: Fixed seeds for reproducible runs
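Fixed seeds make runs repeatable because every "random" choice (for example, the order in which cases are evaluated) comes from a seeded generator instead of `Math.random()`. The sketch below assumes a mulberry32 generator and a Fisher-Yates shuffle; both are illustrative stand-ins, and `mulberry32` and `seededShuffle` are hypothetical helpers, not quickbench's documented PRNG or seeding scheme.

```typescript
// mulberry32: a small 32-bit seeded PRNG. The same seed always yields
// the same sequence of floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0; // keep state as an unsigned 32-bit integer
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG.
// Returns a new array and leaves the input untouched.
function seededShuffle<T>(items: readonly T[], seed: number): T[] {
  const rng = mulberry32(seed);
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1)); // 0 <= j <= i
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}
```

Calling `seededShuffle(cases, 42)` twice returns the same order both times, so two runs with the same seed and dataset visit cases identically.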
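Example dataset (`my-data.csv`):

```csv
input,expected,metadata
"This is great!",positive,"{""region"":""en"",""demographic"":""A""}"
"Awful service.",negative,
```

Because the `metadata` column contains JSON (with commas and quotes), it must be wrapped in double quotes with inner quotes doubled, as standard CSV requires.

Example output:

```text
=== Quickbench Signed Report ===
scores:
  accuracy: 0.7
  latency:
    mean: 2ms
    p95: 5ms
  fairness:
    demographicParity: 0.02
signature: abc123...
```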
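A signed report can be checked with Node's built-in `crypto` module. This sketch assumes the signature is a hex-encoded HMAC-SHA256 over a canonical (sorted-key) JSON serialization of the report, computed with a secret key kept on your machine. quickbench's actual payload format, encoding, and key handling may differ, and `canonicalize`, `signReport`, and `verifyReport` are hypothetical helpers.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Serialize plain JSON data with object keys sorted, so the same report
// always produces the same bytes regardless of property order.
// Assumes JSON-compatible values; undefined object properties are skipped.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hex-encoded HMAC-SHA256 of the canonical payload.
function signReport(payload: unknown, key: string): string {
  return createHmac("sha256", key).update(canonicalize(payload)).digest("hex");
}

// Recompute the signature and compare in constant time.
function verifyReport(payload: unknown, signature: string, key: string): boolean {
  const expected = Buffer.from(signReport(payload, key), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Sorting keys before hashing means two reports with identical content verify against the same signature even if their properties were written in a different order; any change to a score, or a different key, makes verification fail.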
```bash
npm i capkit quickbench
# Secure your agent with capkit, evaluate it with quickbench
```

Part of the Agent Builder Suite
- capkit: Scoped capabilities for agents
- quickbench: Reproducible agent evaluation
- edge-run: Offline-first orchestration (coming soon)
- connector-starter: Generate adapters fast (coming soon)
Built for builders who ship. MIT licensed. Local-first by design.
MIT - Ships sovereign, stays sovereign.