Claude Harness

AI Workflow Optimization Tool for Claude Code

A comprehensive harness that optimizes Claude Code sessions by addressing the four most common failures:

Early "done" - Agent declares victory too soon → Feature list as source of truth
Messy repo - Half-finished, no history → Git + progress log ritual
No real testing - Marks features done without verification → E2E browser tests
Chaotic setup - Re-learns how to run app every time → Single init.sh startup script

Features

Session Continuity: progress.md maintains context between sessions
Feature Management: Track features/tasks with status, subtasks, and E2E validation
Context Tracking: Monitor estimated token usage with session-based lifecycle
Compaction Indicator: Shows estimated compaction count when usage exceeds 100%
Auto-Save Handoff: Automatically saves handoff document on session exit
Discoveries Tracking: Capture findings, requirements, and institutional knowledge
Startup Ritual: init.sh (Bash) and init.ps1 (PowerShell) scripts
Git Safety Hooks: Block dangerous operations (commits to main, force pushes)
Auto-Hooks Setup: Creates .claude/settings.local.json with hooks during init
Subagent Delegation: Rule-based task delegation with token savings estimation
Orchestration Engine: Coordinate automatic subagent delegation with state machine
Optimization Suite: Exploration cache, file filtering, output compression, lazy loading
E2E Testing: Playwright integration with test generation
MCP Server: Playwright browser automation via Model Context Protocol
Stack Detection: Automatically detects your project's language, framework, database

Installation

# Clone the repository
git clone https://github.com/xanthar/claude-harness.git

# Install in development mode
cd claude-harness
pip install -e .

# Or install directly
pip install git+https://github.com/xanthar/claude-harness.git

Quick Start

Initialize a project

cd your-project
claude-harness init

The initializer will:

Detect your project stack (language, framework, database)
Ask configuration questions
Generate harness files in .claude-harness/
Create scripts/init.sh and scripts/init.ps1 startup scripts
Set up E2E testing structure
Update/create .claude/CLAUDE.md
Create .claude/settings.local.json with hooks (project-specific)

Start a session

./scripts/init.sh

This will:

Check git status (warn if on protected branch)
Activate virtual environment (Python)
Check if app is running, optionally start it
Verify database connection
Run quick test check
Show session progress and current feature

Refresh after upgrading

After upgrading claude-harness, refresh your project's scripts:

claude-harness refresh

This regenerates init.sh, hooks, and init.ps1 with the latest improvements while preserving your data (features.json, progress.md, config.json).

To also update CLAUDE.md with the latest harness integration section:

claude-harness refresh --update-claude-md

Manage features

# List features
claude-harness feature list

# Add a feature with subtasks
claude-harness feature add "User authentication" -s "Login form" -s "JWT handling" -s "Logout"

# Start working on a feature
claude-harness feature start F-001

# Mark subtask as done
claude-harness feature done F-001 0

# Mark tests as passing
claude-harness feature tests F-001

# Complete the feature
claude-harness feature complete F-001

Track progress

# Show current progress
claude-harness progress show

# Add completed item
claude-harness progress completed "Implemented login form"

# Add work in progress
claude-harness progress wip "Working on JWT handling"

# Add blocker
claude-harness progress blocker "Need API keys for OAuth"

# Start new session (archives previous)
claude-harness progress new-session

E2E Testing

# Install Playwright
claude-harness e2e install

# Generate test for a feature
claude-harness e2e generate F-001

# Run E2E tests
claude-harness e2e run
claude-harness e2e run --headed  # Visible browser

MCP Server (Playwright)

Claude Harness includes an MCP server for browser automation, allowing Claude Code to interact with web applications directly.

Setup for Claude Desktop:

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "playwright": {
      "command": "python",
      "args": ["-m", "claude_harness.mcp.playwright_server"]
    }
  }
}

Available Tools:

Tool	Description
`browser_launch`	Launch browser (chromium/firefox/webkit)
`browser_navigate`	Navigate to URL
`browser_click`	Click elements
`browser_fill`	Fill form inputs
`browser_type`	Type with keystroke simulation
`browser_screenshot`	Take screenshots
`browser_get_text`	Get element text
`browser_wait`	Wait for elements
`browser_evaluate`	Run JavaScript
`browser_select`	Select dropdown options
`browser_check`	Check checkboxes
`browser_press`	Press keyboard keys
`browser_close`	Close browser
`browser_content`	Get page HTML
`browser_query_all`	Query multiple elements

Run standalone:

python -m claude_harness.mcp.playwright_server

Context Tracking

Monitor estimated token usage with session-based lifecycle:

# Show context usage
claude-harness context show
claude-harness context show --full  # Detailed view with compaction info

# Show session info
claude-harness context session-info

# Mark session as closed (triggers reset on next start)
claude-harness context session-close

# Reset for new session
claude-harness context reset

# Set context budget
claude-harness context budget 200000

# Track per-task usage
claude-harness context start-task F-001
# ... do work ...
claude-harness context end-task F-001

# Output metadata for embedding
claude-harness context metadata

Session-Based Features:

Each session gets a unique session_id
Metrics automatically reset when a closed session is detected
Shows compaction indicator when usage exceeds 100% (e.g., 250% (~2 compactions))

The status command shows compact context usage:

[ * ] Context: 15.2% used | ~169,600 tokens remaining | 12 files read | 5 commands
[!!!] Context: 250% (~2 compactions) | 12 files read | 5 commands

Session Compression & Handoff

When context is filling up, compress your session for seamless continuation:

# Generate a session summary
claude-harness context summary

# Create a handoff document for the next session
claude-harness context handoff
claude-harness context handoff --save  # Save to file

# Full compression: handoff + archive progress + reset metrics
claude-harness context compress

The handoff document includes:

Project context and stack info
Current feature progress and subtasks
Completed work this session
Files modified
Pending features
Recommended next steps

Workflow for long sessions:

Work until context hits warning level (~70%)
Run claude-harness context compress
Start a new Claude Code session
Read the saved handoff document for context
Continue seamlessly

Hooks Setup

Claude Code hooks enable automatic tracking and safety enforcement. During claude-harness init, hooks are automatically configured in .claude/settings.local.json (project-specific, not committed).

Auto-created hooks include:

PreToolUse: Git safety checks (block commits to protected branches)
PostToolUse:
- Context tracking (file reads)
- Auto-progress tracking (file writes/edits added to progress.md)
- Activity logging
SessionEnd: Auto-save handoff, mark session closed, show summary

Note: The SessionEnd hook fires on all session endings including /exit. The older Stop hook only fires when Claude naturally stops, so we use SessionEnd to ensure handoffs are always saved.

See docs/HOOKS.md for detailed manual setup and customization.

Manual setup - add to .claude/settings.local.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{"type": "command", "command": ".claude-harness/hooks/check-git-safety.sh"}]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Read",
        "hooks": [{"type": "command", "command": ".claude-harness/hooks/track-read.sh"}]
      },
      {
        "matcher": "Write",
        "hooks": [{"type": "command", "command": ".claude-harness/hooks/track-write.sh"}]
      },
      {
        "matcher": "Edit",
        "hooks": [{"type": "command", "command": ".claude-harness/hooks/track-edit.sh"}]
      },
      {
        "matcher": "Bash",
        "hooks": [{"type": "command", "command": ".claude-harness/hooks/log-activity.sh"}]
      }
    ],
    "SessionEnd": [
      {
        "hooks": [{"type": "command", "command": ".claude-harness/hooks/session-stop.sh"}]
      }
    ]
  }
}

Discoveries Tracking

Capture findings, requirements, and institutional knowledge during sessions:

# Add a discovery
claude-harness discovery add "Auth requires JWT secret in env" --context "Found during testing" --tags security,config

# List all discoveries
claude-harness discovery list
claude-harness discovery list --tag security  # Filter by tag
claude-harness discovery list --feature F-001  # Filter by feature

# Search discoveries
claude-harness discovery search "JWT"

# Show discovery details
claude-harness discovery show D-001

# View statistics
claude-harness discovery stats

# Generate summary for handoff
claude-harness discovery summary

# List all tags
claude-harness discovery tags

Discoveries are persisted in .claude-harness/discoveries.json and included in handoff documents.

Project Structure After Init

your-project/
├── .claude/
│   ├── CLAUDE.md              # Enhanced with harness integration
│   └── settings.local.json    # Claude Code hooks (project-specific)
├── .claude-harness/
│   ├── config.json            # Project configuration
│   ├── features.json          # Feature/task tracking
│   ├── progress.md            # Session continuity log
│   ├── context_metrics.json   # Context usage tracking
│   ├── discoveries.json       # Captured findings and knowledge
│   ├── hooks/
│   │   ├── check-git-safety.sh
│   │   ├── track-read.sh
│   │   ├── track-write.sh
│   │   ├── track-edit.sh
│   │   ├── log-activity.sh
│   │   └── session-stop.sh
│   └── session-history/       # Archived sessions
├── scripts/
│   ├── init.sh                # Startup ritual (Bash)
│   └── init.ps1               # Startup ritual (PowerShell)
└── e2e/
    ├── conftest.py            # Playwright fixtures
    ├── pytest.ini
    └── tests/                 # E2E test files

Configuration

The .claude-harness/config.json file contains all project settings:

{
  "project_name": "my-project",
  "stack": {
    "language": "python",
    "framework": "flask",
    "database": "postgresql"
  },
  "startup": {
    "port": 8000,
    "health_endpoint": "/api/v1/health",
    "start_command": "python run.py"
  },
  "git": {
    "protected_branches": ["main", "master"],
    "require_merge_confirmation": true
  },
  "testing": {
    "framework": "pytest",
    "coverage_threshold": 80
  }
}

Feature Tracking Format

The .claude-harness/features.json file tracks all features:

{
  "current_phase": "Phase 1 - Core Features",
  "features": [
    {
      "id": "F-001",
      "name": "User authentication",
      "status": "in_progress",
      "priority": 1,
      "tests_passing": false,
      "e2e_validated": false,
      "subtasks": [
        {"name": "Login form", "done": true},
        {"name": "JWT handling", "done": false}
      ]
    }
  ],
  "completed": [],
  "blocked": []
}

Progress Tracking

The .claude-harness/progress.md file maintains session continuity:

# Session Progress Log

## Last Session: 2025-12-12 17:30 UTC

### Completed This Session
- [x] Implemented login form
- [x] Added form validation

### Current Work In Progress
- [ ] F-001: User authentication - JWT handling

### Blockers
- None

### Next Session Should
1. Run `./scripts/init.sh` to verify environment
2. Continue with JWT handling subtask
3. Write unit tests for auth module

### Files Modified This Session
- app/auth/login.py
- app/templates/login.html

CLAUDE.md Integration

The harness adds mandatory rituals to your CLAUDE.md:

Session Start Ritual

Run ./scripts/init.sh
Read .claude-harness/progress.md
Check .claude-harness/features.json
Pick ONE feature to work on
Update status to "in_progress"

Session End Ritual

Update progress.md with session summary
Update feature status/subtasks
Commit work if appropriate

CLI Reference

Core Commands

Command	Description
`claude-harness init`	Initialize harness in project
`claude-harness refresh [--update-claude-md]`	Refresh scripts without losing data
`claude-harness status`	Show current status
`claude-harness detect`	Preview stack detection
`claude-harness run`	Execute init.sh

Feature Management (`feature`)

Command	Description
`feature list`	List features
`feature add NAME`	Add new feature
`feature info ID`	Show feature details
`feature start ID`	Start working on feature
`feature complete ID`	Complete feature
`feature block ID`	Block feature with reason
`feature unblock ID`	Unblock feature
`feature subtask ID NAME`	Add subtask
`feature done ID INDEX/NAME`	Complete subtask
`feature note ID TEXT`	Add note to feature
`feature tests ID`	Mark tests as passing
`feature e2e ID`	Mark E2E as validated
`feature sync`	Infer subtask status from modified files
`feature phase NAME`	Set current phase

Progress Tracking (`progress`)

Command	Description
`progress show`	Show progress
`progress completed ITEM`	Add completed item
`progress wip ITEM`	Add WIP item
`progress blocker ITEM`	Add blocker
`progress file PATH`	Track modified file
`progress new-session`	Start new session
`progress history`	Show session history
`progress update`	Update progress fields

Context Tracking (`context`)

Command	Description
`context show`	Show context usage
`context reset`	Reset context metrics
`context budget N`	Set token budget
`context start-task ID`	Start tracking task
`context end-task ID`	End tracking task
`context summary`	Generate session summary
`context handoff`	Generate handoff document
`context compress`	Compress session
`context session-info`	Show session details
`context session-close`	Mark session as closed
`context metadata`	Output metadata for embedding

Discoveries (`discovery`)

Command	Description
`discovery add SUMMARY`	Add a discovery
`discovery list`	List all discoveries
`discovery show ID`	Show discovery details
`discovery search QUERY`	Search discoveries
`discovery delete ID`	Delete a discovery
`discovery tags`	List all unique tags
`discovery stats`	Show statistics
`discovery summary`	Generate summary

Delegation (`delegation`)

Command	Description
`delegation status`	Show delegation status
`delegation enable`	Enable delegation
`delegation disable`	Disable delegation
`delegation rules`	List delegation rules
`delegation add-rule`	Add custom rule
`delegation remove-rule NAME`	Remove rule
`delegation enable-rule NAME`	Enable specific rule
`delegation disable-rule NAME`	Disable specific rule
`delegation suggest ID`	Get suggestions for feature
`delegation auto --on/--off`	Configure auto-delegation

Orchestration (`orchestrate`)

Command	Description
`orchestrate status`	Show orchestration status
`orchestrate evaluate`	Evaluate feature for delegation
`orchestrate queue [ID]`	Generate delegation queue
`orchestrate start ID`	Start a delegation
`orchestrate complete ID`	Complete a delegation
`orchestrate reset`	Reset orchestration session

Optimization (`optimize`)

Command	Description
`optimize status`	Show optimization status
`optimize cache`	Show cache info
`optimize cache-clear`	Clear cache
`optimize prune`	Prune stale cache entries
`optimize categorize PATH`	Categorize a file
`optimize filter`	Show filter configuration
`optimize compress TEXT`	Compress output text
`optimize loading-plan`	Show lazy loading plan
`optimize summary`	Show optimization summary

E2E Testing (`e2e`)

Command	Description
`e2e install`	Install Playwright
`e2e run`	Run E2E tests
`e2e generate ID`	Generate E2E test

Slash Commands (`commands`)

Command	Description
`commands generate`	Generate slash commands
`commands list`	List available commands

Short Alias

The ch alias is also available:

ch init
ch status
ch feature list

Supported Stacks

Languages

Python (pip, poetry, pipenv)
JavaScript/TypeScript (npm, yarn, pnpm)
Go, Rust (basic detection)

Frameworks

Python: Flask, Django, FastAPI
JS/TS: Express, Next.js, React, Vue, NestJS

Databases

PostgreSQL, MySQL, SQLite, MongoDB, Redis

Testing

pytest, unittest, Jest, Vitest, Mocha, Playwright

Philosophy

The harness enforces these principles:

ONE feature at a time - Focus prevents half-finished work
Progress over perfection - Track what's done, what's blocked
Tests or it didn't happen - Features need tests and E2E validation
Clean repo always - Every session ends with a commit
Context is king - Progress.md ensures no context is lost

Comparison with Sequential Thinking MCP

Claude Harness and the Sequential Thinking MCP Server serve different purposes and can be used together:

Aspect	Claude Harness	Sequential Thinking MCP
Purpose	Project workflow management	Structured reasoning process
Focus	Session continuity & task tracking	Step-by-step thinking during tasks
Persistence	Saves to disk (features.json, progress.md)	In-memory only (session-scoped)
Scope	Across multiple sessions	Within a single reasoning task
What it tracks	Features, progress, context usage, git	Individual thought steps & revisions

Using Them Together

┌─────────────────────────────────────────────────────────┐
│                    Claude Session                        │
│                                                          │
│  ┌──────────────────┐     ┌──────────────────────────┐  │
│  │ Sequential       │     │ Claude Harness            │  │
│  │ Thinking MCP     │     │                          │  │
│  │                  │     │ • What feature am I on?  │  │
│  │ • How do I solve │     │ • What's done/remaining? │  │
│  │   this problem?  │     │ • How much context used? │  │
│  │ • Step 1...      │     │ • Session handoff        │  │
│  │ • Revise step 2  │     │                          │  │
│  │ • Branch idea... │     │                          │  │
│  └──────────────────┘     └──────────────────────────┘  │
│       ↑                            ↑                     │
│  MICRO: reasoning             MACRO: workflow            │
│  within a task                across sessions            │
└─────────────────────────────────────────────────────────┘

Harness tells Claude "Work on F003: Add authentication"
Sequential Thinking helps Claude reason through HOW to implement it
Harness tracks that F003 is complete and what files changed

Documentation

CHANGELOG.md - Version history and release notes
ROADMAP.md - Planned features and improvements
docs/HOOKS.md - Detailed hook configuration guide

Contributing

Fork the repository
Create a feature branch (feat/your-feature)
Make your changes
Write tests (aim for 100% coverage on new code)
Update CHANGELOG.md
Submit a pull request

See ROADMAP.md for planned features accepting contributions.

License

MIT License - see LICENSE file

Author

Created by Morten Elmstroem Hansen

Optimizing Claude Code workflows, one harness at a time.

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
.claude		.claude
claude_harness		claude_harness
docs		docs
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

Claude Harness

Features

Installation

Quick Start

Initialize a project

Start a session

Refresh after upgrading

Manage features

Track progress

E2E Testing

MCP Server (Playwright)

Context Tracking

Session Compression & Handoff

Hooks Setup

Discoveries Tracking

Project Structure After Init

Configuration

Feature Tracking Format

Progress Tracking

CLAUDE.md Integration

Session Start Ritual

Session End Ritual

CLI Reference

Core Commands

Feature Management (feature)

Progress Tracking (progress)

Context Tracking (context)

Discoveries (discovery)

Delegation (delegation)

Orchestration (orchestrate)

Optimization (optimize)

E2E Testing (e2e)

Slash Commands (commands)

Short Alias

Supported Stacks

Languages

Frameworks

Databases

Testing

Philosophy

Comparison with Sequential Thinking MCP

Using Them Together

Documentation

Contributing

License

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Feature Management (`feature`)

Progress Tracking (`progress`)

Context Tracking (`context`)

Discoveries (`discovery`)

Delegation (`delegation`)

Orchestration (`orchestrate`)

Optimization (`optimize`)

E2E Testing (`e2e`)

Slash Commands (`commands`)

Packages