
Harness Test


Test your Claude Code harness the same way you test code — run a command, get a report, know what works.


About

Harness Test is a CLI testing framework for Claude Code harness components — CLAUDE.md, skills, hooks, subagents, and AGENTS.md.

It provides three testing layers with increasing depth and cost:

| Layer | What it does | Token cost |
| --- | --- | --- |
| Static validation | Syntax, structure, broken references, rule conflicts | Zero |
| Behavioral testing | Runs prompts against the Agent SDK, evaluates assertions | Per-test |
| Statistical evaluation | Multi-run consistency and variance analysis | Multi-run |
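
For the behavioral layer, a test spec could look like the fragment below. This is purely illustrative — the field names (`name`, `prompt`, `assertions`, `type`, `value`) are assumptions, not the project's documented schema:

```yaml
# Hypothetical behavioral test spec — field names are illustrative only.
name: respects-style-guide
prompt: "Add a helper function to utils.py"
assertions:
  - type: output_contains
    value: "def "
```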

Tech Stack

| Category | Technology |
| --- | --- |
| Language | Python 3.12+ |
| CLI | Typer |
| Data Models | Pydantic v2 |
| Terminal Output | Rich (tables, Live streaming) |
| YAML Parsing | ruamel.yaml (round-trip fidelity) |
| Test Executor | Claude Agent SDK |
| Package Manager | uv |
| Linter / Formatter | Ruff |
| Type Checker | Pyright |
| Tests | pytest + pytest-asyncio |

Architecture

┌─────────────────────────────────────────────────────┐
│                     CLI (Typer)                      │
│              init  ·  run  ·  report                 │
├──────────┬──────────┬──────────────┬────────────────┤
│ Scanner  │Validator │   Runner     │   Reporter     │
│          │          │              │                │
│ claude_md│ claude_md│  executor    │  formatter     │
│ skills   │ skills   │  (Agent SDK) │  (Rich tables) │
│ hooks    │ hooks    │  assertions  │                │
│ subagents│ subagents│  collector   │                │
│ agents_md│references│              │                │
│ memory   │ rules    │              │                │
├──────────┴──────────┴──────────────┴────────────────┤
│                 Pydantic Models                      │
│   component · config · test_spec · results · valid. │
└─────────────────────────────────────────────────────┘

Data Flow:

init:     scan_project() → detect_auth() → write_config()
static:   load_config() → scan_project() → validate_all() → Rich table
behavior: load_config() → load_all_specs() → run_tests() → save_results() → Rich Live
report:   load_results() → scan_project() → render_report() → Rich tables / JSON
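
The `static` flow above is plain function composition. The stubs below are toy stand-ins that only mirror the shape of the pipeline, not the project's real implementations:

```python
# Toy sketch of the `static` data flow:
# load_config() -> scan_project() -> validate_all() -> table.
# All three functions here are illustrative stubs.

def load_config():
    # Real version reads .harness-test/config.yaml
    return {"root": "."}

def scan_project(config):
    # Real version discovers CLAUDE.md, skills, hooks, subagents, AGENTS.md
    return [{"type": "claude_md", "path": "CLAUDE.md"}]

def validate_all(components):
    # Real version dispatches one validator per component type
    return [{"component": c["path"], "ok": True} for c in components]

def run_static():
    config = load_config()
    components = scan_project(config)
    return validate_all(components)

print(run_static())  # → [{'component': 'CLAUDE.md', 'ok': True}]
```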

Project Structure

src/harness_test/
  cli.py            # All CLI commands (init, run, report)
  config.py         # Auth detection + config management
  spec_loader.py    # YAML test spec loader
  main.py           # Typer app entry point
  models/           # Pydantic models (component, config, test_spec, results, validation)
  scanner/          # One scanner per harness type + scan_project() orchestrator
  validator/        # One validator per concern + validate_all() orchestrator
  runner/           # Agent SDK executor, assertions, result collector, run_tests()
  reporter/         # Rich formatter + render_report()
tests/              # Mirrors src/ structure
.harness-test/      # Runtime config (config.yaml, test results)

Getting Started

Prerequisites

  • Python 3.12+
  • uv package manager
  • Claude Code subscription (for behavioral tests) or Anthropic API key

Quick Start (recommended)

Run directly without installing — like npx:

uvx harness-test init
uvx harness-test run --layer static
uvx harness-test report

Global Install

uv tool install harness-test
harness-test init

Or with pip:

pip install harness-test
harness-test init

From Source

git clone https://github.com/hgflima/harness-test.git
cd harness-test
uv sync
uv run harness-test init

Usage

1. Initialize

Discover harness components, detect auth method, and create config:

harness-test init

2. Run Tests

# Static validation (zero tokens)
harness-test run --layer static

# Behavioral tests via Agent SDK
harness-test run

# Re-run only failed tests
harness-test run --failed

3. View Report

# Rich terminal output
harness-test report

# JSON for CI pipelines
harness-test report --json
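
The JSON output can be post-processed in a CI step. The field names below (`failed`, `tests`, `status`) are assumptions about the report shape, not the documented schema:

```python
import json

# Hypothetical report shape — real field names may differ.
raw = '{"passed": 12, "failed": 1, "tests": [{"name": "t1", "status": "failed"}]}'
report = json.loads(raw)

# Collect names of failing tests for a CI summary line.
failing = [t["name"] for t in report["tests"] if t["status"] == "failed"]
print(f"{report['failed']} failing: {failing}")  # → 1 failing: ['t1']
```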

Development

uv run pytest                     # Run test suite (218 tests)
uv run ruff check src/ tests/     # Lint
uv run ruff format src/ tests/    # Format

Key Design Decisions

  • Pydantic everywhere — All data crosses module boundaries as Pydantic models, never raw dicts
  • Scanner/Validator symmetry — Each harness component type has a matched scanner + validator pair
  • ruamel.yaml over PyYAML — Round-trip fidelity for YAML specs
  • Executor isolation — Each behavioral test runs in its own Agent SDK session
  • Exit codes — 0 success, 1 test failures/errors, 2 runtime/config errors
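
The exit-code convention makes the tool easy to gate a pipeline on. The mapping below is a sketch of that convention, not the project's actual code:

```python
# Sketch of the documented exit-code convention:
#   0 = all tests passed, 1 = test failures/errors, 2 = runtime/config error
def exit_code(results, config_error=False):
    if config_error:
        return 2
    return 1 if any(not r["passed"] for r in results) else 0

assert exit_code([{"passed": True}]) == 0
assert exit_code([{"passed": True}, {"passed": False}]) == 1
assert exit_code([], config_error=True) == 2
```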

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Run tests and linting (uv run pytest && uv run ruff check src/ tests/)
  4. Commit your changes
  5. Open a Pull Request

Acknowledgements


If this project helps you test your Claude Code harness, consider giving it a star.
