Harness Test

Test your Claude Code harness the same way you test code — run a command, get a report, know what works.

About

Harness Test is a CLI testing framework for Claude Code harness components — CLAUDE.md, skills, hooks, subagents, and AGENTS.md.

It provides three testing layers with increasing depth and cost:

Layer	What it does	Token cost
Static validation	Syntax, structure, broken references, rule conflicts	Zero
Behavioral testing	Runs prompts against Agent SDK, evaluates assertions	Per-test
Statistical evaluation	Multi-run consistency and variance analysis	Per-test × number of runs
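To make the statistical layer concrete, here is a minimal sketch of how repeated runs of one test could be summarized. It is not taken from the Harness Test source: the function name summarize_runs and the returned keys are hypothetical, and the choice of pass rate plus population variance is an assumption about what "consistency and variance analysis" might report.

```python
from statistics import mean, pvariance


def summarize_runs(outcomes: list[bool]) -> dict[str, float]:
    """Summarize repeated runs of one test (True means the run passed).

    Hypothetical helper for illustration; not part of harness-test.
    """
    if not outcomes:
        raise ValueError("at least one run is required")
    scores = [1.0 if passed else 0.0 for passed in outcomes]
    return {
        "runs": float(len(scores)),
        "pass_rate": mean(scores),
        # For a 0/1 outcome the population variance equals p * (1 - p):
        # 0.0 means every run agreed, 0.25 is the maximum (a coin flip).
        "variance": pvariance(scores),
    }


summary = summarize_runs([True, True, False, True])
print(summary)
```

A test that passes three times out of four gets a pass rate of 0.75 and a variance of 0.1875, which flags it as flaky even though most runs succeed.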

Tech Stack

Category	Technology
Language	Python 3.12+
CLI	Typer
Data Models	Pydantic v2
Terminal Output	Rich (tables, Live streaming)
YAML Parsing	ruamel.yaml (round-trip fidelity)
Test Executor	Claude Agent SDK
Package Manager	uv
Linter / Formatter	Ruff
Type Checker	Pyright
Tests	pytest + pytest-asyncio

Architecture

┌─────────────────────────────────────────────────────┐
│                     CLI (Typer)                      │
│              init  ·  run  ·  report                 │
├──────────┬──────────┬──────────────┬────────────────┤
│ Scanner  │Validator │   Runner     │   Reporter     │
│          │          │              │                │
│ claude_md│ claude_md│  executor    │  formatter     │
│ skills   │ skills   │  (Agent SDK) │  (Rich tables) │
│ hooks    │ hooks    │  assertions  │                │
│ subagents│ subagents│  collector   │                │
│ agents_md│references│              │                │
│ memory   │ rules    │              │                │
├──────────┴──────────┴──────────────┴────────────────┤
│                 Pydantic Models                      │
│   component · config · test_spec · results · valid. │
└─────────────────────────────────────────────────────┘

Data Flow:

init:     scan_project() → detect_auth() → write_config()
static:   load_config() → scan_project() → validate_all() → Rich table
behavior: load_config() → load_all_specs() → run_tests() → save_results() → Rich Live
report:   load_results() → scan_project() → render_report() → Rich tables / JSON
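The static flow above can be sketched as a simple function composition. This is an illustration under stated assumptions, not the project's code: scan_project() and validate_all() reuse the names from the data flow, but their signatures and bodies here are hypothetical, the Component and Finding types are stand-ins (plain dataclasses instead of the Pydantic models the project uses), and config loading is left out.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Component:
    kind: str   # e.g. "claude_md" (hypothetical label)
    path: Path


@dataclass
class Finding:
    component: Component
    message: str


def scan_project(root: Path) -> list[Component]:
    """Stand-in scanner: only looks for CLAUDE.md and AGENTS.md at the root."""
    names = {"CLAUDE.md": "claude_md", "AGENTS.md": "agents_md"}
    return [
        Component(kind, root / name)
        for name, kind in names.items()
        if (root / name).is_file()
    ]


def validate_all(components: list[Component]) -> list[Finding]:
    """Stand-in validator: flags empty files and nothing else."""
    return [
        Finding(component, "file is empty")
        for component in components
        if not component.path.read_text(encoding="utf-8").strip()
    ]


def run_static(root: Path) -> list[Finding]:
    """scan_project() -> validate_all(), mirroring the static flow."""
    return validate_all(scan_project(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "CLAUDE.md").write_text("", encoding="utf-8")
        for finding in run_static(project):
            print(f"{finding.component.kind}: {finding.message}")
```

Keeping each stage a plain function that takes and returns typed objects is what lets the same scan_project() step feed both the static and report flows.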

Project Structure

src/harness_test/
  cli.py            # All CLI commands (init, run, report)
  config.py         # Auth detection + config management
  spec_loader.py    # YAML test spec loader
  main.py           # Typer app entry point
  models/           # Pydantic models (component, config, test_spec, results, validation)
  scanner/          # One scanner per harness type + scan_project() orchestrator
  validator/        # One validator per concern + validate_all() orchestrator
  runner/           # Agent SDK executor, assertions, result collector, run_tests()
  reporter/         # Rich formatter + render_report()
tests/              # Mirrors src/ structure
.harness-test/      # Runtime config (config.yaml, test results)

Getting Started

Prerequisites

Python 3.12+
uv package manager
Claude Code subscription or Anthropic API key (for behavioral tests)

Quick Start (recommended)

Run directly without installing — like npx:

uvx harness-test init
uvx harness-test run --layer static
uvx harness-test report

Global Install

uv tool install harness-test
harness-test init

Or with pip:

pip install harness-test
harness-test init

From Source

git clone https://github.com/hgflima/harness-test.git
cd harness-test
uv sync
uv run harness-test init

Usage

1. Initialize

Discover harness components, detect auth method, and create config:

harness-test init

2. Run Tests

# Static validation (zero tokens)
harness-test run --layer static

# Behavioral tests via Agent SDK
harness-test run

# Re-run only failed tests
harness-test run --failed

3. View Report

# Rich terminal output
harness-test report

# JSON for CI pipelines
harness-test report --json

Development

uv run pytest                     # Run test suite (218 tests)
uv run ruff check src/ tests/     # Lint
uv run ruff format src/ tests/    # Format

Key Design Decisions

Pydantic everywhere: all data crosses module boundaries as Pydantic models, never as raw dicts
Scanner/Validator symmetry: each harness component type has a matched scanner and validator pair
ruamel.yaml over PyYAML: round-trip fidelity for YAML specs
Executor isolation: each behavioral test runs in its own Agent SDK session
Exit codes: 0 on success, 1 when tests fail or error, 2 on runtime or configuration errors

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Run tests and linting (uv run pytest && uv run ruff check src/ tests/)
Commit your changes
Open a Pull Request

Acknowledgements

Claude Code by Anthropic
Claude Agent SDK for behavioral test execution
Typer for CLI ergonomics
Rich for terminal output
Pydantic for data validation

If this project helps you test your Claude Code harness, consider giving it a star.
