CLI Agent Ergonomics

A specification for building CLI tools that work reliably under AI agent orchestration.

Purpose

Define a minimal, implementable contract that makes a CLI tool predictably usable by an AI agent — without requiring the agent to parse free-text output, guess retry safety, or handle tool-specific edge cases.

The spec is structured as:

- 65 challenges: documented failure modes observed when agents call real CLI tools
- 133 requirements: the contracts a framework or tool must satisfy to eliminate those failures
- JSON schemas: machine-readable type definitions an agent or codegen tool can consume directly

Motivation

AI agents call CLI tools constantly — to deploy infrastructure, query APIs, manage files, run pipelines. When those tools misbehave under automation, the agent has no reliable way to recover:

- `exit 1` on every failure forces the agent to parse error text to understand what went wrong
- No retryability signal means the agent either retries blindly (risking duplicate side effects) or gives up unnecessarily
- Interactive prompts block execution indefinitely in non-TTY environments
- Mixed stdout/stderr output breaks JSON parsing
- Unbounded output exhausts the agent's context window
- Inconsistent behavior across tool versions makes pre-planned retry strategies unreliable

These are not edge cases — they are the default behavior of most CLI tools today. The cost falls entirely on the agent: wasted tokens, stalled pipelines, data corruption from blind retries, and cascading failures that are hard to diagnose.

This specification eliminates those costs by defining what a CLI tool must guarantee so that an agent can call it safely, interpret the result unambiguously, and plan its next action without inspecting free-text output.
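To make the agent's side of this concrete, here is a minimal sketch of how a caller might invoke a compliant tool: no TTY, a hard timeout, stdout and stderr captured separately, and the next action chosen from the exit code rather than from error text. Everything specific here is an assumption for illustration: `FAKE_TOOL` is a stand-in child process rather than a real CLI, `call_tool` is a hypothetical helper, and the choice of exit code 5 as "retryable" is invented. The envelope keys follow the `{ ok, data, error, warnings, meta }` shape described under Key contracts.

```python
import json
import subprocess
import sys

# Stand-in for a compliant CLI (illustrative only): a child Python process that
# logs to stderr, prints one JSON envelope on stdout, and exits with code 5.
FAKE_TOOL = [
    sys.executable,
    "-c",
    (
        "import json, sys;"
        "sys.stderr.write('connecting...\\n');"
        "print(json.dumps({'ok': False, 'data': None,"
        " 'error': {'message': 'upstream timed out'}, 'warnings': [], 'meta': {}}));"
        "sys.exit(5)"
    ),
]

# Assumption: which codes are retryable would come from the spec's exit-code
# table; the value 5 is made up for this sketch.
HYPOTHETICAL_RETRYABLE_CODES = {5}


def call_tool(argv: list[str], timeout_s: float = 30.0) -> tuple[int, dict]:
    """Run a command non-interactively and return (exit code, parsed envelope)."""
    proc = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # a stray prompt reads EOF instead of blocking
        stdout=subprocess.PIPE,    # structured output only
        stderr=subprocess.PIPE,    # log lines stay out of the JSON stream
        text=True,
        timeout=timeout_s,         # raises subprocess.TimeoutExpired if exceeded
    )
    return proc.returncode, json.loads(proc.stdout)


code, envelope = call_tool(FAKE_TOOL)
should_retry = code in HYPOTHETICAL_RETRYABLE_CODES
print(code, envelope["ok"], should_retry)
```

Because the log line goes to stderr, `json.loads` sees only the envelope, and the retry decision never touches the error message.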

What's in this repo

Path	Contents
`challenges/`	65 failure modes grouped into 7 parts, each with severity, frequency, and agent impact
`requirements/`	133 requirements across 3 tiers that address the challenges
`schemas/`	JSON Schema definitions for exit codes, response envelopes, and the tool manifest
`comparison-matrix.md`	How 12 existing frameworks (argparse, click, cobra, clap, …) cover the 65 challenges
`research/alternatives-landscape.md`	Competitive landscape: MCP, OpenAPI, function calling, shell wrappers, and competing proposals evaluated against the spec
`skills/`	Agent skills for evaluating CLIs and implementing the spec

The 65 challenges

Grouped into 7 parts by category:

Part	Challenges	Focus
1	32	Ecosystem, runtime, agent-specific patterns
2	9	Execution and reliability
3	4	Security
4	8	Output and parsing
5	5	Environment and state
6	6	Errors and discoverability
7	1	Observability

Each challenge documents: severity, frequency, detectability, token spend, time cost, and context cost — from the agent's perspective.

The 133 requirements

Three tiers, implemented in order:

Tier	Count	Meaning
F — Framework-Automatic	66	Enforced by the framework without command author action
C — Command Contract	26	Declared by the command author at registration
O — Opt-In	41	Explicitly enabled by the application

Start with the P0 Framework requirements — they establish the exit code table, response envelope, and validation phase boundary that everything else depends on.

Key contracts

Exit codes: 14 named codes (0–13) covering every standard condition. Each carries machine-readable guarantees: whether the operation is retryable and how far side effects progressed. Commands declare every code they may emit at registration time. See `exit-code.json`.
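As a rough illustration of what per-code guarantees and registration-time declaration could look like, the sketch below models a small exit-code table in Python. The code names, the numeric assignments, the `SideEffects` values, and the `register` / `check_emitted` helpers are all assumptions invented for this example; the authoritative definitions are in exit-code.json.

```python
from dataclasses import dataclass
from enum import Enum


class SideEffects(Enum):
    """How far side effects progressed before exit (hypothetical values)."""
    NONE = "none"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass(frozen=True)
class ExitCodeInfo:
    code: int
    name: str
    retryable: bool
    side_effects: SideEffects


# Illustrative entries only; the real table has 14 codes (0-13).
EXIT_CODES = {
    info.code: info
    for info in (
        ExitCodeInfo(0, "SUCCESS", retryable=False, side_effects=SideEffects.COMPLETE),
        ExitCodeInfo(2, "INVALID_ARGS", retryable=False, side_effects=SideEffects.NONE),
        ExitCodeInfo(5, "TIMEOUT", retryable=True, side_effects=SideEffects.PARTIAL),
    )
}

# Hypothetical registry: command name -> exit codes declared at registration.
DECLARED: dict[str, frozenset[int]] = {}


def register(command: str, exit_codes: set[int]) -> None:
    """Record the codes a command may emit; reject codes missing from the table."""
    unknown = exit_codes - set(EXIT_CODES)
    if unknown:
        raise ValueError(f"unknown exit codes: {sorted(unknown)}")
    DECLARED[command] = frozenset(exit_codes)


def check_emitted(command: str, code: int) -> ExitCodeInfo:
    """Raise if a command emits a code it never declared; otherwise return its metadata."""
    if code not in DECLARED[command]:
        raise RuntimeError(f"{command} emitted undeclared exit code {code}")
    return EXIT_CODES[code]


register("deploy", {0, 5})
info = check_emitted("deploy", 5)
print(info.name, info.retryable, info.side_effects.value)
```

With a table like this, an agent can decide whether to retry from the numeric code alone, and a framework can catch a command that exits with a code it never declared.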

Response envelope: every command output is wrapped in `{ ok, data, error, warnings, meta }`. The same keys are always present regardless of success, failure, or result count. See `response-envelope.json`.
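A minimal sketch of the "same keys, always" rule, assuming that unused fields are filled with `null`, an empty list, or an empty object. Those defaults, the shape of the `error` object, and the `make_envelope` helper are assumptions for illustration; response-envelope.json defines the actual types.

```python
import json
from typing import Any, Optional


def make_envelope(
    data: Any = None,
    error: Optional[dict] = None,
    warnings: Optional[list] = None,
    meta: Optional[dict] = None,
) -> dict:
    """Build an envelope in which all five keys are present on every path."""
    return {
        "ok": error is None,
        "data": data,
        "error": error,
        "warnings": warnings if warnings is not None else [],
        "meta": meta if meta is not None else {},
    }


success = make_envelope(data=[{"id": 1}])
empty = make_envelope(data=[])  # zero results: same keys, empty list
failure = make_envelope(error={"code": "NOT_FOUND", "message": "no such item"})

for env in (success, empty, failure):
    print(json.dumps(env, sort_keys=True))
```

A consumer can then read `env["ok"]` and `env["data"]` without first checking which keys exist, whether the call succeeded, failed, or returned nothing.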

Tool manifest: a single `tool manifest` command returns the complete command tree as JSON: every subcommand, flag, type, description, exit code, and example. Agents can construct valid calls without iterating `--help` across every subcommand. See `manifest-response.json`.
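The sketch below shows how an agent might turn a manifest into a validated argument list without calling `--help`. The manifest fragment, its key names (`commands`, `flags`, `type`, `required`), the `deploy` command, and the `build_argv` helper are all hypothetical; the real structure is whatever manifest-response.json specifies.

```python
import json

# Hypothetical manifest fragment, invented for this example.
MANIFEST_JSON = """
{
  "commands": {
    "deploy": {
      "description": "Deploy a service",
      "flags": {
        "--env": {"type": "string", "required": true},
        "--replicas": {"type": "integer", "required": false}
      }
    }
  }
}
"""

# Mapping from (assumed) manifest type names to Python types.
PY_TYPES = {"string": str, "integer": int}


def build_argv(manifest: dict, command: str, args: dict) -> list[str]:
    """Validate args against the manifest and return argv (without the tool name)."""
    flags = manifest["commands"][command]["flags"]  # KeyError if command is unknown
    unknown = set(args) - set(flags)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    missing = [f for f, spec in flags.items() if spec["required"] and f not in args]
    if missing:
        raise ValueError(f"missing required flags: {missing}")
    argv = [command]
    for flag, value in args.items():
        type_name = flags[flag]["type"]
        if not isinstance(value, PY_TYPES[type_name]):
            raise TypeError(f"{flag} expects {type_name}")
        argv += [flag, str(value)]
    return argv


manifest = json.loads(MANIFEST_JSON)
print(build_argv(manifest, "deploy", {"--env": "staging", "--replicas": 3}))
```

Validation happens before any process is spawned, so a malformed call costs the agent no tool invocation at all.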

For implementers

If you are implementing this specification in a CLI framework or tool, read IMPLEMENTING.md. It covers:

- Requirement tier order (F → C → O)
- How to read a requirement file
- Generating language-specific types from the schemas (Python, TypeScript, Rust, Go, Java)
- Key invariants that code generators do not enforce
- Suggested implementation order

Agent skills

Three installable skills are available for any Agent Skills-compatible agent (Claude Code, Cursor, Gemini CLI, Copilot, and others):

Skill	Install	Purpose
`cli-agent-evaluate`	`npx skills install romamo/cli-agent-ergonomics/skills/cli-agent-evaluate`	Evaluate a CLI against a single challenge — scores 0–3, provides applicable workaround
`cli-agent-implement`	`npx skills install romamo/cli-agent-ergonomics/skills/cli-agent-implement`	Guide implementing the spec in a CLI framework, tier by tier
`cli-agent-onboard`	`npx skills install romamo/cli-agent-ergonomics/skills/cli-agent-onboard`	Profile a CLI tool before evaluation — detects runtime, binary, flags, and timeout method

Run `cli-agent-onboard` once per CLI, then use `cli-agent-evaluate` for targeted challenge evaluation or `cli-agent-implement` when building a framework.

For AI agents

- Implementing the spec: use the `cli-agent-implement` skill or read IMPLEMENTING.md directly
- Editing the spec: read AGENTS.md

For spec editors

AGENTS.md defines all conventions for adding or updating challenges, requirements, and schemas: file naming, required sections, styling rules, and cross-reference format.

To validate cross-links after any edit, use the /validate-links skill (Claude Code) or run the scripts in .claude/skills/validate-links/SKILL.md directly.
