
AgentV Examples

This directory contains working examples demonstrating AgentV's evaluation capabilities.

Setup

Examples are self-contained packages with their own dependencies. Before running any example, install dependencies from the repository root:

```sh
# From repository root
bun run examples:install
```

This installs dependencies for all examples. Alternatively, install individually:

```sh
cd examples/features/execution-metrics
bun install
```

Directory Structure

Examples are organized into two categories:

```text
examples/
├── features/       # Feature demonstrations (evaluators, metrics, SDK)
└── showcase/       # Real-world use cases and end-to-end demos
```

Features

Focused demonstrations of specific AgentV capabilities. Each example includes its own README with details.

SDK


Showcase

Real-world evaluation scenarios. Each example includes its own README with setup instructions.


Writing Your Own Examples

Each example follows this structure:

```text
example-name/
├── evals/
│   ├── dataset.eval.yaml     # Primary eval file
│   ├── *.ts or *.py          # Code evaluators (optional)
│   └── *.md                  # LLM grader prompts (optional)
├── scripts/                  # Helper scripts (optional)
├── .agentv/
│   └── targets.yaml          # Target configuration (optional)
├── package.json              # Dependencies (if using @agentv/eval)
└── README.md                 # Example documentation
```

Using @agentv/eval SDK

For TypeScript code graders, add a package.json:

```json
{
  "name": "my-example",
  "private": true,
  "type": "module",
  "dependencies": {
    "@agentv/eval": "file:../../../packages/eval"
  }
}
```

Then write type-safe code graders:

```ts
#!/usr/bin/env bun
import { defineCodeGrader } from '@agentv/eval';

// Score 1.0 when the answer contains the expected substring; report the
// outcome in hits/misses so the eval report explains the score.
export default defineCodeGrader(({ answer, criteria }) => {
  const found = answer.includes('expected');
  return {
    score: found ? 1.0 : 0.0,
    hits: found ? ['Found expected content'] : [],
    misses: found ? [] : ['Expected content missing'],
  };
});
```
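The grading logic itself is ordinary TypeScript, so it can be exercised in isolation before wiring it into an eval. The sketch below mirrors the same substring check outside the SDK wrapper; the `gradeAnswer` helper and `GradeResult` type are illustrative names, not part of `@agentv/eval`:

```typescript
// Illustrative standalone version of the grader body above.
// GradeResult and gradeAnswer are sketch names, not SDK APIs.
interface GradeResult {
  score: number;
  hits: string[];
  misses: string[];
}

function gradeAnswer(answer: string): GradeResult {
  const found = answer.includes('expected');
  return {
    score: found ? 1.0 : 0.0,
    hits: found ? ['Found expected content'] : [],
    misses: found ? [] : ['Expected content missing'],
  };
}

console.log(gradeAnswer('this is the expected output').score); // 1
console.log(gradeAnswer('something else').score); // 0
```

Keeping the scoring logic in a plain function like this makes it easy to unit-test the grader with your runner of choice before running a full eval.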