diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..dcc81596 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,80 @@ +# CLAUDE.md + +Project-level context for AI coding assistants working on OpenHarness. + +## What is this project + +OpenHarness (`oh`) is a lightweight, open-source Python reimplementation of Claude Code's agent harness architecture. It provides 43 tools, 54 commands, and full plugin/skill compatibility in ~11.7K lines of Python. + +## Tech stack + +- **Language**: Python 3.11+, TypeScript (frontend TUI) +- **Build**: hatchling, uv for dependency management +- **Core deps**: anthropic SDK, pydantic, typer, httpx, mcp, rich, textual +- **Frontend**: React 18 + Ink 5 (terminal UI), communicates via JSON protocol over stdin/stdout +- **Tests**: pytest + pytest-asyncio (async-first), pexpect for E2E + +## Project layout + +``` +src/openharness/ + cli.py # Entry point (typer). Commands: oh, openharness + engine/ # Agent loop: run_query(), QueryEngine, messages, streaming + api/ # Anthropic API client with retry + streaming + tools/ # 43 tools, all extend BaseTool (Pydantic input validation) + permissions/ # 3 modes: DEFAULT, PLAN, FULL_AUTO + path/command rules + hooks/ # Lifecycle hooks: PRE/POST_TOOL_USE, hot-reload + config/ # Settings (multi-layer: CLI > env > file > defaults) + mcp/ # Model Context Protocol client + skills/ # On-demand markdown knowledge (compatible with anthropics/skills) + plugins/ # Plugin system (compatible with claude-code/plugins) + coordinator/ # Multi-agent: subagent spawning, team coordination + commands/ # 54 interactive slash commands + prompts/ # System prompt assembly, CLAUDE.md injection + memory/ # Persistent cross-session knowledge + tasks/ # Background agent/shell tasks + ui/ # Backend host, React launcher, JSON protocol, Textual fallback + services/ # Session storage, cron, LSP, compaction, OAuth +frontend/terminal/ # React + Ink TUI (TypeScript) +tests/ # Unit tests (mirrors src/ structure) + E2E in 
scripts/ +``` + +## Common commands + +```bash +# Install +uv sync --extra dev + +# Run +uv run oh # Interactive TUI +uv run oh -p "prompt" # Non-interactive (print mode) + +# Tests +uv run pytest # All unit/integration tests +uv run pytest tests/test_engine # Engine tests only +uv run pytest -x -q # Stop on first failure, quiet + +# Lint & type check +uv run ruff check src/ +uv run mypy src/openharness/ + +# Frontend +cd frontend/terminal && npm install && npm run dev +``` + +## Architecture notes + +- **Agent loop** (`engine/query.py`): User-driven request-response. Inner loop runs up to `max_turns=8` autonomous tool-call rounds per user message. Not a continuously running autonomous agent. +- **Tool execution pipeline**: Pre-hook -> permission check -> Pydantic validation -> execute -> post-hook. Errors return `ToolResultBlock(is_error=True)` rather than raising exceptions. +- **Streaming**: `run_query()` is an `AsyncIterator[StreamEvent]` — yields `AssistantTextDelta`, `ToolExecutionStarted/Completed`, `AssistantTurnComplete`. +- **API retry**: Exponential backoff with jitter, 3 retries, respects Retry-After header. Auth errors are never retried. +- **Concurrent tools**: Multiple tool calls in a single LLM turn execute via `asyncio.gather`. +- **Permission prompt**: In DEFAULT mode, mutating tools `await permission_prompt()` which blocks until user confirms in the TUI. 
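
The retry behaviour described in the API retry note above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `api/client.py`: `ApiError`, `backoff_delay`, `call_with_retry`, and the 25% jitter range are hypothetical names and values chosen for the example, while the retryable status codes are the ones listed in DESIGN.md.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

# Status codes DESIGN.md lists as retryable; 401/403 are deliberately absent.
RETRYABLE_STATUS = {429, 500, 502, 503, 529}


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status and optional Retry-After seconds."""

    def __init__(self, status: int, retry_after: Optional[float] = None) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def backoff_delay(attempt: int, base: float = 1.0, retry_after: Optional[float] = None) -> float:
    """Delay before retry `attempt` (0-based): base * 2**attempt plus up to 25% jitter.

    A server-supplied Retry-After value, when present, takes precedence.
    The 25% jitter range is an assumption made for this sketch.
    """
    if retry_after is not None:
        return retry_after
    delay = base * (2 ** attempt)
    return delay + random.uniform(0, delay * 0.25)


async def call_with_retry(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await `call()`, retrying retryable failures up to `max_retries` times."""
    attempt = 0
    while True:
        try:
            return await call()
        except ApiError as exc:
            # Auth errors and exhausted retries propagate immediately.
            if exc.status not in RETRYABLE_STATUS or attempt >= max_retries:
                raise
            await sleep(backoff_delay(attempt, base, exc.retry_after))
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real delays; a caller would normally leave it at `asyncio.sleep`.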
+ +## Code conventions + +- Async-first: all tool execution and API calls are async +- Pydantic v2 for all data models and tool input schemas +- Ruff for linting (line-length 100), mypy strict mode +- pytest with `asyncio_mode = "auto"` +- No docstrings required on obvious methods; keep code self-documenting diff --git a/DESIGN.md b/DESIGN.md new file mode 100644 index 00000000..e351a2d7 --- /dev/null +++ b/DESIGN.md @@ -0,0 +1,314 @@ +# OpenHarness Architecture Design Summary + +OpenHarness (`oh`) is an open-source AI agent harness framework that implements the core architecture of Claude Code in pure Python. In about 11,700 lines of code (2.3% of Claude Code), it implements 43 tools and 54 commands, covering 98% of the core functionality. + +--- + +## 1. Overall Architecture + +``` +┌──────────────────────────────────────────────────────┐ +│ CLI entry (Typer) │ +│ oh / openharness → cli.py │ +├──────────┬───────────────────────────────────────────┤ +│ Interactive │ React TUI (Ink 5) ←JSON Protocol→ Backend │ +│ Print mode │ oh -p "..." → headless Agent Loop │ +├──────────┴───────────────────────────────────────────┤ +│ RuntimeBundle (ui/runtime.py) │ +│ Assembles: API Client / ToolRegistry / Hooks / Commands │ +├──────────────────────────────────────────────────────┤ +│ QueryEngine (engine/) │ +│ Manages conversation history / cost tracking / query submission │ +├──────────────────────────────────────────────────────┤ +│ ┌────────┬────────┬──────────┬───────┬──────┐ │ +│ │ Tools │Perms │ Hooks │ MCP │ API │ │ +│ │ 43 tools │3 modes │ Lifecycle │ External services│ Streaming │ │ +│ └────────┴────────┴──────────┴───────┴──────┘ │ +└──────────────────────────────────────────────────────┘ +``` + +--- + +## 2. Agent Loop (Core Loop) + +The `run_query()` function in `engine/query.py` is the heart of the whole system: + +```python +for _ in range(max_turns): # at most 8 turns by default + # 1. Call the LLM with streaming + async for event in api_client.stream_message(request): + yield TextDelta / MessageComplete + + messages.append(assistant_message) + + # 2. If there are no tool calls, finish + if not message.tool_uses: + return + + # 3. Execute tools (a single call runs sequentially; multiple calls run concurrently via asyncio.gather) + for tool_call in tool_uses: + pre_hook → permission_check → validate → execute → post_hook + + # 4.
Send the tool results back and start the next turn + messages.append(tool_results) +``` + +**Key design points:** +- Events are yielded one at a time through `AsyncIterator[StreamEvent]`, which supports real-time streaming output +- Event types: `AssistantTextDelta` / `AssistantTurnComplete` / `ToolExecutionStarted` / `ToolExecutionCompleted` +- When a turn contains multiple tool calls, they run in parallel via `asyncio.gather` for efficiency + +--- + +## 3. Subsystem Design + +### 3.1 Tool System (tools/) + +43 tools, all built on the `BaseTool` abstract class: + +```python +class BaseTool(ABC): + name: str + description: str + input_model: type[BaseModel] # Pydantic input validation + + async def execute(self, arguments, context) -> ToolResult + def is_read_only(self, arguments) -> bool + def to_api_schema(self) -> dict # generates the JSON Schema given to the LLM +``` + +| Category | Tools | +|------|------| +| File I/O | Bash, Read, Write, Edit, Glob, Grep | +| Search | WebFetch, WebSearch, ToolSearch, LSP | +| Agent | Agent, SendMessage, TeamCreate/Delete | +| Tasks | TaskCreate/Get/List/Update/Stop/Output | +| MCP | MCPTool, ListMcpResources, ReadMcpResource | +| Workflow | EnterPlanMode, ExitPlanMode, Worktree | +| Scheduling | CronCreate/List/Delete, RemoteTrigger | +| Meta | Skill, Config, Brief, Sleep, AskUser, NotebookEdit | + +Tools are registered and looked up through `ToolRegistry`, which supports dynamic extension at runtime (MCP tools). + +### 3.2 Permission System (permissions/) + +Three permission modes plus fine-grained rules: + +| Mode | Behavior | Use case | +|------|------|------| +| DEFAULT | Writes and command execution require confirmation | Day-to-day development | +| PLAN | Blocks all write operations | Review before a large refactor | +| FULL_AUTO | Allows everything | Sandboxed environments | + +Evaluation order (`PermissionChecker.evaluate`): +1. Explicit deny list → 2. Explicit allow list → 3. Path glob rules → 4. Command denylist → 5.
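Mode-level check + +Returns `PermissionDecision(allowed, requires_confirmation, reason)`. + +### 3.3 Hook System (hooks/) + +Intercepts lifecycle events and supports hot reload: + +- **Events**: `PRE_TOOL_USE` / `POST_TOOL_USE` / `SESSION_START`, among others +- **Hook types**: Command (shell command) / HTTP (webhook) / Prompt (model evaluation) / Agent (deep evaluation) +- **Blocking**: a pre-hook can return `blocked=True` to stop a tool from executing +- **Hot reload**: `HookReloader` watches the configuration file for changes and reloads automatically + +### 3.4 API Client (api/) + +Wraps the Anthropic SDK and provides streaming calls and fault tolerance: + +- **Streaming**: `stream_message()` yields `ApiTextDeltaEvent` + `ApiMessageCompleteEvent` +- **Retry policy**: exponential backoff (1s → 2s → 4s), at most 3 retries, honors the Retry-After header +- **Retryable status codes**: 429 / 500 / 502 / 503 / 529 +- **Not retried**: authentication errors (401/403) are raised immediately +- **Usage tracking**: `UsageSnapshot` records input/output tokens; `CostTracker` accumulates the session total + +### 3.5 Message Model (engine/messages.py) + +Pydantic models aligned with the Anthropic API format: + +``` +ConversationMessage(role, content: list[ContentBlock]) + ├── TextBlock(text) + ├── ToolUseBlock(id, name, input) + └── ToolResultBlock(tool_use_id, content, is_error) +``` + +### 3.6 Configuration System (config/) + +Multi-layer configuration resolution, in priority order: CLI arguments > environment variables > `~/.openharness/settings.json` > defaults + +Key settings: +- `api_key` / `model` / `base_url`: model connection +- `permission.mode` / `path_rules` / `denied_commands`: permissions +- `hooks`: lifecycle hooks +- `mcp_servers`: MCP servers + +### 3.7 MCP Integration (mcp/) + +Model Context Protocol client that connects to external tool servers: + +- `McpClientManager` manages connections to multiple MCP servers +- MCP tools are registered in `ToolRegistry` automatically +- Supports resource reads (`ListMcpResources` / `ReadMcpResource`) + +### 3.8 Skill System (skills/) + +Knowledge documents loaded on demand (Markdown format): + +Built-in skills: `commit` / `debug` / `plan` / `review` / `simplify` / `test` + +Compatible with [anthropics/skills](https://github.com/anthropics/skills); users can extend it by placing `.md` files in `~/.openharness/skills/`. + +### 3.9 Plugin System (plugins/) + +Compatible with the Claude Code plugin format: +- Plugin types: Command / Hook / Agent / MCP Server +- Management commands: `oh plugin list/install/uninstall` +- Discovered and loaded at runtime + +### 3.10 Multi-Agent Coordination (coordinator/) + +Supports sub-agent spawning and team collaboration: +- The `Agent` tool can start a sub-agent (with its own context window) +- `SendMessage` sends a message to a sub-agent +- `TeamCreate/Delete` manages agent teams + +### 3.11 Prompt Construction (prompts/) + +Assembles the system prompt dynamically: +- Base system prompt +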
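environment information (OS, Shell, Git, CWD) +- CLAUDE.md project-level knowledge injection +- Skill documents appended on demand + +--- + +## 4. Frontend Architecture (frontend/terminal/) + +A terminal TUI built with React 18 + Ink 5 that communicates with the Python backend over a JSON protocol: + +``` +Python Backend (BackendHost) + ↕ JSON Protocol (stdin/stdout) +React Frontend (App.tsx) + ├── Composer # Input box + ├── TranscriptPane # Conversation transcript + ├── ToolCallDisplay # Tool call display + ├── StatusBar # Status bar + ├── CommandPicker # Command picker + ├── SelectModal # Permission confirmation dialog + └── WelcomeBanner # Welcome screen +``` + +Fallback: Textual TUI (pure Python, no Node.js required). + +--- + +## 5. End-to-End Data Flow + +``` +User input + ↓ +CLI (typer) parses arguments + ↓ +build_runtime() → RuntimeBundle + ├─ Load Settings + ├─ Create AnthropicApiClient + ├─ Register 43 tools + MCP tools + ├─ Load Plugins / Hooks / Skills + └─ Create QueryEngine + ↓ +QueryEngine.submit_message(prompt) + ↓ +run_query() loop: + │ + ├─ api_client.stream_message() + │ ├─ TextDelta → real-time streaming output + │ └─ MessageComplete → full response + │ + ├─ Check tool_uses + │ └─ None → end the loop and return the result + │ + ├─ Run the tool pipeline: + │ ├─ Pre-Hook (can block) + │ ├─ PermissionChecker.evaluate() + │ │ └─ requires_confirmation → confirmation dialog + │ ├─ Pydantic input validation + │ ├─ tool.execute() + │ └─ Post-Hook + │ + └─ tool_results → messages → next loop iteration + ↓ +Output (text / json / stream-json) +``` + +--- + +## 6. Key Design Patterns + +| Pattern | Application | +|------|------| +| **Agent Loop** | Core iterative tool-call loop | +| **Async Iterator** | Yields streaming events, enabling real-time UI updates | +| **Pydantic validation** | Strongly typed validation of all tool inputs | +| **Layered permissions** | Multi-layer safety checks: modes + rules + hooks | +| **Dependency injection** | API Client / Tools / Permissions / Hooks are all replaceable | +| **Hook interception** | Pre/Post lifecycle events can block or observe operations | +| **Lazy loading** | Skills / MCP Server / Plugins are initialized on demand | +| **Protocol isolation** | The Python backend and the React frontend communicate over a JSON protocol | +| **Exponential backoff** | Failed API calls are retried automatically, with jitter | + +--- + +## 7.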
Directory Structure + +``` +src/openharness/ +├── cli.py # CLI entry (typer) +├── engine/ # Agent Loop core +│ ├── query.py # run_query() main loop +│ ├── query_engine.py # QueryEngine session management +│ ├── messages.py # Message models +│ ├── stream_events.py # Streaming event definitions +│ └── cost_tracker.py # Usage tracking +├── api/ # API client +│ ├── client.py # Anthropic SDK wrapper + retry +│ ├── errors.py # Error types +│ └── usage.py # Token usage +├── tools/ # 43 tools +│ ├── base.py # BaseTool abstract class +│ └── *.py # Individual tool implementations +├── permissions/ # Permission system +├── hooks/ # Lifecycle hooks +├── config/ # Configuration management +├── mcp/ # MCP integration +├── skills/ # Skill system +├── plugins/ # Plugin system +├── coordinator/ # Multi-agent coordination +├── commands/ # 54 interactive commands +├── prompts/ # System prompt construction +├── memory/ # Persistent memory +├── tasks/ # Background tasks +├── ui/ # UI layer (Backend + Protocol) +└── services/ # Shared services (session, cron, LSP, compact) + +frontend/terminal/ # React TUI (TypeScript) +tests/ # 114 unit tests + 6 E2E suites +``` + +--- + +## 8. Tech Stack + +| Layer | Technology | +|----|------| +| CLI | Python 3.10+, Typer | +| Data validation | Pydantic 2.0+ | +| HTTP | httpx, websockets | +| AI SDK | anthropic >= 0.40.0 | +| MCP | mcp >= 1.0.0 | +| Terminal UI | Rich, Textual (fallback) | +| Frontend TUI | React 18, Ink 5, TypeScript | +| Testing | pytest, pytest-asyncio, pexpect | +| Code quality | ruff, mypy | diff --git a/README.md b/README.md index 058b0a6a..abc16a06 100644 --- a/README.md +++ b/README.md @@ -126,43 +126,297 @@ oh -p "Fix the bug" --output-format stream-json ## 🏗️ Harness Architecture -OpenHarness implements the core Agent Harness pattern with 10 subsystems: +### System Architecture ``` -openharness/ - engine/ # 🧠 Agent Loop — query → stream → tool-call → loop - tools/ # 🔧 43 Tools — file I/O, shell, search, web, MCP - skills/ # 📚 Knowledge — on-demand skill loading (.md files) - plugins/ # 🔌 Extensions — commands, hooks, agents, MCP servers - permissions/ # 🛡️ Safety — multi-level modes, path rules, command deny - hooks/ # ⚡ Lifecycle — PreToolUse/PostToolUse event hooks - commands/ # 💬 54 Commands — /help, /commit, /plan, /resume, ...
- mcp/ # 🌐 MCP — Model Context Protocol client - memory/ # 🧠 Memory — persistent cross-session knowledge - tasks/ # 📋 Tasks — background task management - coordinator/ # 🤝 Multi-Agent — subagent spawning, team coordination - prompts/ # 📝 Context — system prompt assembly, CLAUDE.md, skills - config/ # ⚙️ Settings — multi-layer config, migrations - ui/ # 🖥️ React TUI — backend protocol + frontend +┌──────────────────────────────────────────────────────────────┐ +│ CLI Entry (cli.py / Typer) │ +│ oh / openharness → Interactive or Print mode │ +├──────────┬───────────────────────────────────────────────────┤ +│ TUI Mode │ React/Ink Frontend ←─JSON Protocol─→ BackendHost│ +│ Print │ oh -p "..." → Headless agent loop → stdout │ +├──────────┴───────────────────────────────────────────────────┤ +│ RuntimeBundle (ui/runtime.py) │ +│ Assembles: ApiClient + ToolRegistry + Hooks + Commands │ +├──────────────────────────────────────────────────────────────┤ +│ QueryEngine (engine/) │ +│ Conversation history + Cost tracking + run_query() │ +├──────────────────────────────────────────────────────────────┤ +│ ┌─────────┬────────────┬────────┬───────┬───────┬────────┐ │ +│ │ Tools │ Permissions│ Hooks │ MCP │Skills │Plugins │ │ +│ │ 43 tools│ 3 modes │ Pre/ │ Proto │ .md │ claude │ │ +│ │ Pydantic│ Path rules │ Post │ Client│ files │ compat │ │ +│ └─────────┴────────────┴────────┴───────┴───────┴────────┘ │ +└──────────────────────────────────────────────────────────────┘ +``` + +### Project Structure & File Reference + +``` +OpenHarness/ +├── pyproject.toml # Build config (hatchling), deps, pytest/ruff/mypy settings +├── LICENSE # MIT License +├── README.md # This file +├── CLAUDE.md # AI assistant project context +├── DESIGN.md # Architecture design document (Chinese) +│ +├── src/openharness/ # ══════ Core Python Package ══════ +│ ├── __init__.py # Package marker +│ ├── __main__.py # `python -m openharness` entry +│ ├── cli.py # CLI entry point (Typer): oh [options], sub-commands 
+│ │ +│ ├── engine/ # ── Agent Loop (core) ── +│ │ ├── query.py # run_query(): async tool-call loop (max_turns=8) +│ │ ├── query_engine.py # QueryEngine: conversation history + cost tracking +│ │ ├── messages.py # ConversationMessage, TextBlock, ToolUseBlock, ToolResultBlock +│ │ ├── stream_events.py # StreamEvent types: TextDelta, TurnComplete, ToolStarted/Completed +│ │ └── cost_tracker.py # CostTracker: cumulative token usage per session +│ │ +│ ├── api/ # ── Anthropic API Client ── +│ │ ├── client.py # AnthropicApiClient: streaming + exponential backoff retry (3x) +│ │ ├── errors.py # AuthenticationFailure, RateLimitFailure, RequestFailure +│ │ ├── provider.py # ProviderInfo: detect API capabilities +│ │ └── usage.py # UsageSnapshot: input/output token counts +│ │ +│ ├── tools/ # ── 43 Tools (all extend BaseTool) ── +│ │ ├── base.py # BaseTool ABC, ToolResult, ToolExecutionContext, ToolRegistry +│ │ ├── bash_tool.py # Execute shell commands via subprocess +│ │ ├── file_read_tool.py # Read file contents with offset/limit +│ │ ├── file_write_tool.py # Create or overwrite files atomically +│ │ ├── file_edit_tool.py # Search-and-replace edits within files +│ │ ├── glob_tool.py # Find files matching glob patterns +│ │ ├── grep_tool.py # Regex search with ripgrep integration +│ │ ├── web_fetch_tool.py # Fetch and parse HTML/text from URLs +│ │ ├── web_search_tool.py # Web search via search engine API +│ │ ├── agent_tool.py # Spawn sub-agent with separate context +│ │ ├── send_message_tool.py # Send message to sub-agent or team member +│ │ ├── team_create_tool.py # Create multi-agent team +│ │ ├── team_delete_tool.py # Delete team +│ │ ├── skill_tool.py # Load and apply skill knowledge +│ │ ├── mcp_tool.py # Call MCP server tools dynamically +│ │ ├── mcp_auth_tool.py # MCP server authentication +│ │ ├── list_mcp_resources_tool.py # List MCP resources +│ │ ├── read_mcp_resource_tool.py # Read MCP resource content +│ │ ├── task_create_tool.py # Create background 
shell/agent task +│ │ ├── task_list_tool.py # List running tasks +│ │ ├── task_get_tool.py # Get task status and output +│ │ ├── task_update_tool.py # Update task parameters +│ │ ├── task_stop_tool.py # Gracefully stop a task +│ │ ├── task_output_tool.py # Stream task output +│ │ ├── cron_create_tool.py # Schedule agents on cron +│ │ ├── cron_list_tool.py # List cron schedules +│ │ ├── cron_delete_tool.py # Delete cron schedule +│ │ ├── enter_plan_mode_tool.py # Switch to read-only plan mode +│ │ ├── exit_plan_mode_tool.py # Exit plan mode +│ │ ├── enter_worktree_tool.py # Enter git worktree for isolation +│ │ ├── exit_worktree_tool.py # Exit git worktree +│ │ ├── config_tool.py # Get/set configuration values +│ │ ├── sleep_tool.py # Delay execution for polling scenarios +│ │ ├── ask_user_question_tool.py # Ask user for input (blocks until response) +│ │ ├── brief_tool.py # Summarize conversation context +│ │ ├── tool_search_tool.py # Search available tools by name/description +│ │ ├── lsp_tool.py # Language Server Protocol integration +│ │ ├── notebook_edit_tool.py # Edit Jupyter notebook cells +│ │ ├── remote_trigger_tool.py # Trigger remote agent execution +│ │ └── todo_write_tool.py # Update task/todo list +│ │ +│ ├── permissions/ # ── Permission System ── +│ │ ├── modes.py # PermissionMode enum: DEFAULT, PLAN, FULL_AUTO +│ │ └── checker.py # PermissionChecker: evaluate tool calls against rules +│ │ +│ ├── hooks/ # ── Lifecycle Hooks ── +│ │ ├── executor.py # HookExecutor: run Command/HTTP/Prompt/Agent hooks +│ │ ├── loader.py # HookRegistry: load hooks from settings +│ │ ├── events.py # HookEvent: PRE_TOOL_USE, POST_TOOL_USE, SESSION_START +│ │ ├── schemas.py # HookDefinition pydantic model +│ │ ├── types.py # Hook type definitions +│ │ └── hot_reload.py # HookReloader: watch settings for hot-reload +│ │ +│ ├── config/ # ── Configuration ── +│ │ ├── settings.py # Settings model (Pydantic): model, perms, hooks, MCP +│ │ └── paths.py # Config/session/task/memory 
directory helpers +│ │ +│ ├── mcp/ # ── Model Context Protocol ── +│ │ ├── client.py # McpClientManager: connect to MCP servers via stdio +│ │ ├── config.py # Load MCP server configs from settings +│ │ └── types.py # McpStdioServerConfig, McpToolInfo, McpResourceInfo +│ │ +│ ├── skills/ # ── Skills (on-demand knowledge) ── +│ │ ├── loader.py # Load skills from bundled/ and ~/.openharness/skills/ +│ │ ├── registry.py # SkillRegistry: store loaded skill definitions +│ │ ├── types.py # SkillDefinition model (name, description, content) +│ │ └── bundled/content/ # Built-in skills: commit, debug, plan, review, simplify, test +│ │ +│ ├── plugins/ # ── Plugin System (claude-code compatible) ── +│ │ ├── loader.py # Load plugins from ~/.openharness/plugins/ +│ │ ├── installer.py # Plugin install/uninstall helpers +│ │ ├── schemas.py # PluginManifest model +│ │ └── types.py # LoadedPlugin dataclass +│ │ +│ ├── coordinator/ # ── Multi-Agent Coordination ── +│ │ ├── coordinator_mode.py # TeamRegistry: in-memory team/agent membership +│ │ └── agent_definitions.py # AgentDefinition: built-in roles (default, worker) +│ │ +│ ├── commands/ # ── 54 Interactive Slash Commands ── +│ │ └── registry.py # CommandRegistry: /help, /commit, /plan, /resume, /perms... 
+│ │ +│ ├── prompts/ # ── System Prompt Assembly ── +│ │ ├── system_prompt.py # Build system prompt from base + environment +│ │ ├── environment.py # EnvironmentInfo: OS, Python, git, cwd detection +│ │ ├── context.py # PromptContext: optional CLAUDE.md injection +│ │ └── claudemd.py # Load/parse CLAUDE.md from project root +│ │ +│ ├── memory/ # ── Persistent Cross-Session Memory ── +│ │ ├── manager.py # Memory file CRUD operations +│ │ ├── memdir.py # MemoryDirectory: persistent storage +│ │ ├── scan.py # Scan markdown memory files +│ │ ├── search.py # Heuristic memory search (token matching) +│ │ ├── paths.py # Memory directory path helpers +│ │ └── types.py # MemoryHeader, MemoryBlock dataclasses +│ │ +│ ├── tasks/ # ── Background Tasks ── +│ │ ├── manager.py # BackgroundTaskManager: spawn async tasks +│ │ ├── types.py # TaskRecord, TaskStatus, TaskType +│ │ ├── local_shell_task.py # ShellTask: background shell commands +│ │ ├── local_agent_task.py # AgentTask: background sub-agent processes +│ │ └── stop_task.py # Graceful task termination +│ │ +│ ├── ui/ # ── UI Layer ── +│ │ ├── app.py # run_repl(): interactive mode entry point +│ │ ├── runtime.py # RuntimeBundle: assemble all components for a session +│ │ ├── backend_host.py # JSON-lines backend server for React TUI (stdin/stdout) +│ │ ├── react_launcher.py # Launch React terminal UI subprocess +│ │ ├── textual_app.py # Fallback Textual TUI (pure Python, no Node.js) +│ │ ├── protocol.py # FrontendRequest/Response protocol models +│ │ ├── permission_dialog.py # Interactive permission confirmation dialog +│ │ ├── input.py # Input handling and line reading +│ │ └── output.py # Output formatting and streaming +│ │ +│ ├── bridge/ # ── External Session Management ── +│ │ ├── manager.py # BridgeSessionManager: track spawned sessions +│ │ ├── session_runner.py # SessionHandle: subprocess lifecycle +│ │ ├── types.py # Bridge communication types +│ │ └── work_secret.py # Secure session secret handling +│ │ +│ ├── 
services/ # ── Shared Services ── +│ │ ├── session_storage.py # Persist session history to ~/.openharness/sessions/ +│ │ ├── token_estimation.py # Rough token count heuristic +│ │ ├── cron.py # Local cron job registry for scheduled agents +│ │ ├── compact/ # Message compaction (context window management) +│ │ ├── lsp/ # Language Server Protocol integration +│ │ └── oauth/ # OAuth flow helpers for MCP auth +│ │ +│ ├── state/ # ── Application State ── +│ │ ├── app_state.py # AppState: model, mode, theme, cwd, auth, vim, voice +│ │ └── store.py # AppStateStore: observable state with listener pattern +│ │ +│ ├── keybindings/ # ── Keyboard Shortcuts ── +│ │ ├── loader.py # Load from ~/.claude/keybindings.json +│ │ ├── parser.py # Parse keybinding JSON format +│ │ ├── resolver.py # Resolve key combos to actions +│ │ └── default_bindings.py # Default keybinding presets +│ │ +│ ├── output_styles/ # ── Output Customization ── +│ │ └── loader.py # Load custom output styles +│ │ +│ ├── vim/ # ── Vim Mode ── +│ │ └── transitions.py # toggle_vim_mode() state helper +│ │ +│ ├── voice/ # ── Voice Input ── +│ │ ├── voice_mode.py # VoiceDiagnostics, toggle_voice_mode() +│ │ ├── keyterms.py # Voice command keyword mapping +│ │ └── stream_stt.py # Speech-to-text streaming integration +│ │ +│ └── types/ # ── Shared Type Definitions ── +│ └── __init__.py +│ +├── frontend/terminal/ # ══════ React + Ink TUI (TypeScript) ══════ +│ ├── package.json # Dependencies: React 18, Ink 5, TypeScript 5 +│ ├── tsconfig.json # TypeScript configuration +│ └── src/ +│ ├── index.tsx # Entry point, renders +│ ├── App.tsx # Main component: routing, modes, keyboard +│ ├── types.ts # TypeScript interfaces (Config, Transcript, Task...) 
+│ ├── hooks/ +│ │ └── useBackendSession.ts # JSON-lines backend communication hook +│ └── components/ +│ ├── Composer.tsx # Multi-line prompt input with history +│ ├── CommandPicker.tsx # Slash command autocomplete picker +│ ├── ConversationView.tsx # Render transcript (messages + tool calls) +│ ├── TranscriptPane.tsx # Scrollable transcript display +│ ├── ToolCallDisplay.tsx # Pretty-print tool invocations and results +│ ├── StatusBar.tsx # Top bar: model, cwd, auth status +│ ├── Footer.tsx # Bottom bar: keybindings +│ ├── SelectModal.tsx # Multi-choice selection modal +│ ├── PromptInput.tsx # Single-line input prompt +│ ├── SidePanel.tsx # Side panel: tasks, memory, sessions +│ ├── Spinner.tsx # Loading spinner animation +│ ├── ModalHost.tsx # Portal for modal dialogs +│ └── WelcomeBanner.tsx # Welcome/splash screen +│ +├── tests/ # ══════ Test Suite ══════ +│ ├── conftest.py # Shared pytest fixtures +│ ├── fixtures/ +│ │ └── fake_mcp_server.py # Mock MCP server for testing +│ ├── test_engine/ # Agent loop, message formatting, cost tracking +│ ├── test_api/ # API client, retry, error translation +│ ├── test_tools/ # Individual tool tests (bash, file, web, mcp...) 
+│ ├── test_permissions/ # Permission checker, mode evaluation +│ ├── test_hooks/ # Hook executor, loader, hot-reload +│ ├── test_commands/ # Slash command registry and execution +│ ├── test_config/ # Settings loading, path resolution +│ ├── test_mcp/ # MCP client connection +│ ├── test_skills/ # Skill loader, bundled skills +│ ├── test_plugins/ # Plugin loader, manifest validation +│ ├── test_memory/ # Memory search, file management +│ ├── test_tasks/ # Background task manager +│ ├── test_coordinator/ # Team registry +│ ├── test_prompts/ # System prompt building +│ ├── test_services/ # Session storage, cron, token estimation +│ ├── test_ui/ # Backend protocol, Textual app +│ └── test_bridge/ # Bridge session management +│ +└── scripts/ # ══════ E2E Test Scripts ══════ + ├── e2e_smoke.py # Full smoke test: real API calls, multiple scenarios + ├── test_harness_features.py # Feature tests: retry, skills, parallel, permissions + ├── test_cli_flags.py # CLI argument parsing tests + ├── test_real_skills_plugins.py # Real skill/plugin loading tests + ├── react_tui_e2e.py # React TUI end-to-end tests + ├── test_react_tui_redesign.py # React TUI redesign validation + ├── test_tui_interactions.py # Terminal UI interaction tests + ├── test_headless_rendering.py # Headless mode rendering tests + └── local_system_scenarios.py # Local filesystem scenario tests ``` ### The Agent Loop -The heart of the harness. 
One loop, endlessly composable: +The heart of the harness — a **user-driven request-response loop** with an inner autonomous tool-call cycle: -```python -while True: - response = await api.stream(messages, tools) - - if response.stop_reason != "tool_use": - break # Model is done - - for tool_call in response.tool_uses: - # Permission check → Hook → Execute → Hook → Result - result = await harness.execute_tool(tool_call) - - messages.append(tool_results) - # Loop continues — model sees results, decides next action +``` +┌─────────────────── Outer Loop (User-Driven) ───────────────────┐ +│ │ +│ User types prompt │ +│ └→ QueryEngine.submit_message(prompt) │ +│ └→ run_query(context, messages) │ +│ │ +│ ┌──────────── Inner Loop (LLM-Driven, max 8 turns) ────────┐ │ +│ │ │ │ +│ │ 1. api_client.stream_message() │ │ +│ │ ├→ yield TextDelta (real-time streaming) │ │ +│ │ └→ yield MessageComplete │ │ +│ │ │ │ +│ │ 2. If no tool_uses → break (return to user) │ │ +│ │ │ │ +│ │ 3. Execute tools: │ │ +│ │ Pre-Hook → Permission Check → Pydantic Validate │ │ +│ │ → tool.execute() → Post-Hook │ │ +│ │ (single: sequential / multiple: asyncio.gather) │ │ +│ │ │ │ +│ │ 4. Append ToolResultBlocks → next turn │ │ +│ └────────────────────────────────────────────────────────────┘ │ +│ │ +│ ← Wait for next user input │ +└─────────────────────────────────────────────────────────────────┘ ``` The model decides **what** to do. The harness handles **how** — safely, efficiently, with full observability.
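
The inner loop in the diagram above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the code in `engine/query.py`: `ToolUse`, `AssistantMessage`, `ToolResult`, `execute_tool`, and `run_inner_loop` are hypothetical stand-ins, the model is any async callable, and the hook, permission, and Pydantic validation steps are omitted to keep the example short.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class ToolUse:
    id: str
    name: str
    input: dict


@dataclass
class AssistantMessage:
    text: str
    tool_uses: list[ToolUse] = field(default_factory=list)


@dataclass
class ToolResult:
    tool_use_id: str
    content: str
    is_error: bool = False


ModelFn = Callable[[list], Awaitable[AssistantMessage]]
ToolFn = Callable[[dict], Awaitable[str]]


async def execute_tool(call: ToolUse, tools: dict[str, ToolFn]) -> ToolResult:
    """Run one tool; failures (including unknown tools) become error results, not exceptions."""
    try:
        return ToolResult(call.id, await tools[call.name](call.input))
    except Exception as exc:
        return ToolResult(call.id, f"{type(exc).__name__}: {exc}", is_error=True)


async def run_inner_loop(
    model: ModelFn,
    tools: dict[str, ToolFn],
    messages: list,
    max_turns: int = 8,
) -> list:
    """Alternate model turns and tool execution until no tool calls remain or max_turns is hit."""
    for _ in range(max_turns):
        reply = await model(messages)
        messages.append(reply)
        if not reply.tool_uses:
            break  # model is done; control returns to the user
        # All tool calls from one turn run concurrently.
        results = await asyncio.gather(*(execute_tool(c, tools) for c in reply.tool_uses))
        messages.append(list(results))
    return messages
```

Because `execute_tool` converts failures into `ToolResult(is_error=True)`, one failing call does not cancel the others inside `asyncio.gather`, and the model sees the error text on its next turn.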