Demonstrates how a TypeScript code_grader evaluator can use defineCodeGrader from @agentv/eval for a declarative, zero-boilerplate approach.
- evals/dataset.eval.yaml: Example test that uses a code_grader evaluator.
- scripts/verify-attachments.ts: Code grader script using defineCodeGrader.
- evals/example.txt, evals/python.instructions.md: Attachment fixtures.
From the repository root:
bun install # Links workspace dependencies
bun run build # Builds @agentv/core package

Test the SDK-based code grader directly with a mock payload:
cd examples/features/code-grader-sdk
cat << 'EOF' | bun run scripts/verify-attachments.ts
{
  "question": "Please echo this request",
  "criteria": "The CLI echoes the prompt and lists attachment names.",
  "expected_output": [{"role": "assistant", "content": "Attachments detected (2): example.txt, python.instructions.md."}],
  "answer": "Attachments detected (2): example.txt, python.instructions.md.",
  "guideline_files": ["evals/python.instructions.md"],
  "input_files": ["evals/example.txt"],
  "input": []
}
EOF

From the repository root:
cd examples/features
bun agentv eval code-grader-sdk/evals/dataset.eval.yaml --target local_cli

This requires a CLI target named local_cli configured in .agentv/targets.yaml.
The defineCodeGrader helper:
- Reads JSON from stdin automatically
- Converts snake_case to camelCase
- Validates input and output with Zod schemas
- Handles errors gracefully
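import { defineCodeGrader } from '@agentv/eval';

// Passes when the answer contains the criteria text verbatim.
export default defineCodeGrader(({ answer, criteria }) => {
  const passed = answer.includes(criteria);
  return {
    score: passed ? 1.0 : 0.0,
    // Report a hit only on success and a miss only on failure.
    hits: passed ? ['Answer contains the criteria text'] : [],
    misses: passed ? [] : ['Answer does not contain the criteria text'],
  };
});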
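
To make the snake_case to camelCase step listed above more concrete, here is a minimal, dependency-free sketch of how such a conversion could work. It is not the actual @agentv/eval implementation: the names toCamelCase, camelizeKeys, and parsePayload are hypothetical, and the assumption that nested objects are converted recursively is ours, not something this README states.

```typescript
// Hypothetical sketch only; the real helper in @agentv/eval may behave differently.

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Convert one snake_case key to camelCase, e.g. "expected_output" -> "expectedOutput".
function toCamelCase(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Recursively rename object keys; array elements are visited, primitives are returned unchanged.
function camelizeKeys(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => camelizeKeys(item));
  }
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[toCamelCase(key)] = camelizeKeys(inner);
    }
    return out;
  }
  return value;
}

// Parse a raw JSON payload (as it would arrive on stdin) and normalise its keys.
function parsePayload(raw: string): Json {
  return camelizeKeys(JSON.parse(raw) as Json);
}

const payload = parsePayload(
  '{"expected_output": [{"role": "assistant"}], "input_files": ["evals/example.txt"]}',
);
console.log(JSON.stringify(payload));
// prints {"expectedOutput":[{"role":"assistant"}],"inputFiles":["evals/example.txt"]}
```

Under this sketch, a payload field such as expected_output from the mock payload above would reach the grader callback as expectedOutput, while string values like file paths are left untouched.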