Minimal and lightweight evaluation framework for AI agents written in Rust. Define test cases in YAML and get structured pass/fail reports.
📖 Read the blog post for motivation and notes on agentic engineering.
- YAML-driven evaluation datasets
- CLI tool for running evals against HTTP agent endpoints
- Registry-based check system with pluggable built-in validators
- Structured reports with per-test scores and aggregate metrics
- Pluggable `Agent` trait: back it with an HTTP API, a local model, or a mock
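Because the agent is behind a trait, the harness never assumes a transport. As a minimal sketch of the idea (the trait shape, method name, and response struct below are illustrative, not smoleval's actual API; see the crate docs for the real signatures):

```rust
// Illustrative sketch only: smoleval's real Agent trait may differ.
// An agent turns a prompt into response text plus a record of tool calls.

#[derive(Debug)]
struct ToolCall {
    name: String,
}

#[derive(Debug)]
struct AgentResponse {
    text: String,
    tool_calls: Vec<ToolCall>,
}

trait Agent {
    fn run(&self, prompt: &str) -> AgentResponse;
}

// A mock agent: handy for exercising checks without a live model.
struct MockAgent;

impl Agent for MockAgent {
    fn run(&self, _prompt: &str) -> AgentResponse {
        AgentResponse {
            text: "It is 18°C and sunny in Paris.".to_string(),
            tool_calls: vec![ToolCall {
                name: "get_weather".to_string(),
            }],
        }
    }
}

fn main() {
    let agent = MockAgent;
    let resp = agent.run("What's the weather in Paris?");
    println!("{} (tools used: {})", resp.text, resp.tool_calls.len());
}
```

The same program could swap in an HTTP-backed implementation without changing the eval loop, which is the point of keeping the trait pluggable.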
```yaml
# eval.yaml
name: "Weather Agent Eval"
tests:
  - name: basicLookup
    prompt: "What's the weather in Paris?"
    checks:
      - kind: responseContainsAny
        values: ["Paris"]
        caseSensitive: true
      - kind: toolUsedAtLeast
        name: "get_weather"
        times: 1
```

```shell
smoleval-cli --dataset eval.yaml --endpoint http://localhost:8080/chat
```

See the smoleval crate README for the library API.
| Check | Description |
|---|---|
| `responseContainsAll` | Response contains all specified values |
| `responseContainsAny` | Response contains at least one of the specified values |
| `responseNotContains` | Response does not contain any of the specified values |
| `responseExactMatch` | Response exactly matches the expected value |
| `toolUsedAtLeast` | Tool was used at least N times (optional parameter matching) |
| `toolUsedAtMost` | Tool was used at most N times (optional parameter matching) |
| `toolUsedExactly` | Tool was used exactly N times (optional parameter matching) |
| `toolsUsedInOrder` | Tools were used in a specific order, with gaps allowed |
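Checks can be mixed within a single test. A sketch of combining a negative content check with an ordering check (the `tools` list key under `toolsUsedInOrder` is an assumption for illustration; consult the crate docs for the exact field name):

```yaml
checks:
  - kind: responseNotContains
    values: ["error"]
  - kind: toolsUsedInOrder
    # Assumed field name; the ordering check allows gaps between calls.
    tools: ["geocode", "get_weather"]
```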
| Crate | Description |
|---|---|
| `smoleval` | Core evaluation engine |
| `smoleval-cli` | CLI for running evals against HTTP endpoints |
| `smoleval-example` | Example with a mock agent and custom checks |
| `smoleval-cli-example` | Example HTTP agent servers |