iamGodofall/quickbench

Quickbench 🏃‍♂️

Stop guessing if your agent works. Run reproducible, sovereign evaluations locally.

🚀 Quick Start (2 minutes)

npm install quickbench
npm run demo

The demo prints a signed report with accuracy, latency, and fairness metrics.

📊 Metrics Explained

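| Metric   | Description        | Formula                          |
|----------|--------------------|----------------------------------|
| Accuracy | Exact match rate   | correct / total                  |
| Latency  | Response time      | Mean + P95 (ms)                  |
| Fairness | Demographic parity | StdDev(accuracy per demographic) |
| Cost     | Token cost         | Placeholder (future LLM)         |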
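
To make the formulas above concrete, here is a minimal sketch of how the three implemented metrics could be computed from per-example results. It is not Quickbench's internal code: the `EvalRecord` shape, the helper names, the nearest-rank P95 method, and the use of a population (rather than sample) standard deviation are all assumptions made for illustration.

```typescript
// Hypothetical per-example record; not Quickbench's actual type.
interface EvalRecord {
  correct: boolean;
  latencyMs: number;
  demographic: string;
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Nearest-rank P95 (assumed method): the smallest sample such that
// at least 95% of samples are less than or equal to it.
function p95(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const rank = Math.max(Math.ceil(0.95 * sorted.length), 1);
  return sorted[rank - 1];
}

// Population standard deviation (assumed; a sample stddev would divide by n - 1).
function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

function score(records: EvalRecord[]) {
  // Accuracy: correct / total.
  const accuracy = records.filter((r) => r.correct).length / records.length;
  const latencies = records.map((r) => r.latencyMs);

  // Group correctness flags by demographic, then compute accuracy per group.
  const groups = new Map<string, boolean[]>();
  for (const r of records) {
    const flags = groups.get(r.demographic) ?? [];
    flags.push(r.correct);
    groups.set(r.demographic, flags);
  }
  const perGroup = [...groups.values()].map(
    (flags) => flags.filter(Boolean).length / flags.length,
  );

  return {
    accuracy,
    latency: { mean: mean(latencies), p95: p95(latencies) },
    fairness: { demographicParity: stdDev(perGroup) },
  };
}

const sample: EvalRecord[] = [
  { correct: true, latencyMs: 2, demographic: "A" },
  { correct: true, latencyMs: 3, demographic: "A" },
  { correct: true, latencyMs: 1, demographic: "B" },
  { correct: false, latencyMs: 6, demographic: "B" },
];

const report = score(sample);
console.log(report);
// accuracy 0.75, latency mean 3 and p95 6, demographicParity 0.25
```

A fairness value of 0 means every demographic group has the same accuracy; larger values mean the groups diverge more. The sketch assumes a non-empty record list.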

📖 Full Usage

// All helpers come from the same package, so one import is enough.
import { runEvaluation, createMockAgent, loadDataset } from 'quickbench';

// Replace the returned string with your own agent logic.
const agent = (input: string) => 'your agent logic';

// Top-level await requires an ES module context.
const dataset = await loadDataset('./my-data.csv');

const result = await runEvaluation({ agent, dataset });
console.log(result.scores.accuracy); // e.g. 0.87

🔒 Security Model

  1. Zero Cloud: No APIs, no telemetry
  2. Local Signing: HMAC-SHA256 receipts
  3. No PII: Metadata-only tracking
  4. Deterministic: Fixed seeds, reproducible
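
As an illustration of the local signing idea in item 2, the sketch below signs a serialized report with HMAC-SHA256 using Node's built-in `crypto` module and then verifies it. The helper names `signReport` and `verifyReport`, the placeholder key, and the report shape are assumptions for this example; how Quickbench actually stores its key and serializes the report is not described here.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: returns a hex-encoded HMAC-SHA256 of the report text.
function signReport(reportJson: string, key: string): string {
  return createHmac("sha256", key).update(reportJson).digest("hex");
}

// Hypothetical helper: recomputes the HMAC and compares in constant time.
function verifyReport(reportJson: string, signature: string, key: string): boolean {
  const expected = Buffer.from(signReport(reportJson, key), "hex");
  const actual = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const key = "local-secret-key"; // placeholder; keep a real key out of source control
const body = JSON.stringify({ scores: { accuracy: 0.7 } });
const sig = signReport(body, key);

console.log(verifyReport(body, sig, key)); // true
console.log(verifyReport(body.replace("0.7", "0.9"), sig, key)); // false
```

Because HMAC is a symmetric scheme, anyone who can verify a receipt can also forge one, so a receipt of this kind shows that a report was not altered after signing by a holder of the key rather than proving who produced it.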

🗂️ Dataset Format (CSV)

input,expected,metadata
"This is great!",positive,{"region":"en","demographic":"A"}
"Awful service.",negative

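In the sample above the second row has no metadata, which suggests that column is optional. The metadata is also written as unquoted JSON that itself contains commas, so a generic CSV parser would split it into extra fields. The sketch below shows one way to read rows in exactly this shape by treating everything after the second column as JSON. It is an assumption about how such rows could be parsed, not a description of what `loadDataset` does, and `parseRow` and `Row` are hypothetical names.

```typescript
// Hypothetical row shape for illustration.
interface Row {
  input: string;
  expected: string;
  metadata: Record<string, unknown>;
}

function parseRow(line: string): Row {
  let input = "";
  let rest = "";

  if (line.startsWith('"')) {
    // Quoted first field: read up to the closing quote, treating "" as an escaped quote.
    let i = 1;
    while (i < line.length) {
      if (line[i] === '"') {
        if (line[i + 1] === '"') {
          input += '"';
          i += 2;
          continue;
        }
        break;
      }
      input += line[i];
      i += 1;
    }
    if (i >= line.length) throw new Error("unterminated quoted field");
    rest = line.slice(i + 2); // skip the closing quote and the comma after it
  } else {
    const firstComma = line.indexOf(",");
    if (firstComma === -1) throw new Error("expected at least two columns");
    input = line.slice(0, firstComma);
    rest = line.slice(firstComma + 1);
  }

  // Second column ends at the next comma; whatever follows is the metadata JSON.
  const nextComma = rest.indexOf(",");
  const expected = (nextComma === -1 ? rest : rest.slice(0, nextComma)).trim();
  const metaText = nextComma === -1 ? "" : rest.slice(nextComma + 1).trim();

  return {
    input,
    expected,
    metadata: metaText ? (JSON.parse(metaText) as Record<string, unknown>) : {},
  };
}

const rows = [
  '"This is great!",positive,{"region":"en","demographic":"A"}',
  '"Awful service.",negative',
].map(parseRow);

console.log(rows[0].metadata.demographic); // "A"
console.log(rows[1].metadata); // {}
```
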
🧪 Example Output

=== Quickbench Signed Report ===
scores:
  accuracy: 0.7
  latency: 
    mean: 2ms
    p95: 5ms
  fairness:
    demographicParity: 0.02
signature: abc123...

🛠️ Capkit Integration

npm i capkit quickbench
# Secure agent with capkit, eval with quickbench

Part of the Agent Builder Suite
→ capkit: Scoped capabilities for agents
→ quickbench: Reproducible agent evaluation
→ edge-run: Offline-first orchestration (coming soon)
→ connector-starter: Generate adapters fast (coming soon)

Built for builders who ship. MIT licensed. Local-first by design.

🤝 License

MIT - Ships sovereign, stays sovereign.

About

🏃 Sovereign Agent Evaluation Framework - Zero cloud dependencies. Local-only, cryptographically signed benchmarks for AI agents. npm run demo = instant evals.
