Agentic CLI is a command-line toolkit for researching topics, planning workflows, simulating shell commands (with optional execution), and retrieving documentation from MCP-compatible servers. The current milestone delivers the project scaffold, a lightweight runner loop, artifact helpers, a fully functional web research workflow with explainable reports, deterministic planning with Mermaid diagram generation, and a simulation-first runner that produces auditable run logs.
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env  # adjust values as needed
agent --help
```

Run a research session to validate the workflow and generated artifacts:

```bash
agent research "hello world in rust"
```

The command writes a timestamped directory under `artifacts/` containing `report.md` and
`sources.json` with numbered references. Add tokens like `openai/openai-cookbook:README.md` to the
topic to pull GitHub files into the report alongside the web sources.
The Typer-based CLI exposes four top-level commands:
| Command | Description |
| --- | --- |
| `agent research "topic"` | Perform web research, summarise findings, and emit markdown + JSON artifacts. |
| `agent plan "goal"` | Generate a structured plan, Markdown brief, and Mermaid diagram artifacts. |
| `agent run "goal"` | Generate a plan, propose shell commands, and simulate or execute them with a run log. |
| `agent docs "library@version"` | Retrieve documentation snapshots via Context7 MCP (requires configuration). |
Run `agent --help` or `agent <command> --help` for the latest options. For a complete, real
example of the planning flow see `examples/nextauth_plan.md`.
Prefer calling the workflows directly? Import the high-level SDK dataclasses/functions from
`agentic_cli.sdk` to bypass Typer option parsing and work with typed results:

```python
from agentic_cli.sdk import (
    DocsRequest,
    PlanRequest,
    ResearchRequest,
    RunRequest,
    create_plan,
    execute_plan,
    fetch_docs,
    perform_research,
)

artifacts = perform_research(ResearchRequest(topic="vector databases"))
if artifacts:
    print(artifacts.report_path)

plan = create_plan(PlanRequest(goal="set up Next.js + NextAuth"))
print(plan.plan_markdown, len(plan.steps))

run = execute_plan(RunRequest(goal="ship release", allow_run=False))
print(run.run_log, len(run.records))

docs = fetch_docs(DocsRequest(package="next-auth@5.0.0", limit=5))
print(docs.docs_markdown)
```

Every SDK call returns dataclasses containing resolved artifact paths plus structured metadata (e.g. plan steps, execution records, Context7 responses), keeping programmatic integrations simple.
The acceptance test scenario, `agent plan "set up Next.js + NextAuth"`, produces a timestamped
artifact directory containing Markdown, JSON, and Mermaid files. A captured output is available in
`examples/nextauth_plan.md` alongside the Mermaid definition so you can
render the SVG locally once `@mermaid-js/mermaid-cli` is installed.
- Search the web via a lightweight DuckDuckGo HTML query (`tools/web_search.py`).
- Fetch each result, extract readable text with Readability + BeautifulSoup, and chunk the content.
- Summarise the chunks deterministically and register each source with the citation registry.
- Detect `owner/repo:path` tokens in the topic, fetch the referenced files via the GitHub reader, summarise them, and add their links as additional citations.
- Emit `report.md` containing a “Findings” section with inline references plus a numbered “Sources” list, and save the structured source metadata to `sources.json`.
- Store all artifacts in `artifacts/<timestamp>_research/` so runs are easy to audit.
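The chunking step can be sketched as below. `chunk_text` is an illustrative helper, not necessarily the exact signature in `tools/web_search.py`; it splits on paragraph boundaries so the deterministic summariser sees stable, readable input:

```python
def chunk_text(text: str, max_chars: int = 1200) -> list[str]:
    """Split extracted article text into roughly equal-sized chunks."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Flush the current chunk when adding this paragraph would exceed the budget.
        if size + len(paragraph) > max_chars and current:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```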
The implementation is intentionally mockable: helper functions accept injectable HTTP clients, the summariser is deterministic, and the citation registry deduplicates URLs to keep references tidy.
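The deduplication behaviour can be illustrated with a minimal registry sketch (the names below are illustrative, not the exact `tools/citations.py` API):

```python
from dataclasses import dataclass, field


@dataclass
class CitationRegistry:
    """Assigns stable 1-based reference numbers and deduplicates URLs."""

    _numbers: dict[str, int] = field(default_factory=dict)
    _titles: dict[str, str] = field(default_factory=dict)

    def register(self, url: str, title: str) -> int:
        # Re-registering a known URL returns its existing number,
        # so repeated sources never inflate the reference list.
        if url not in self._numbers:
            self._numbers[url] = len(self._numbers) + 1
            self._titles[url] = title
        return self._numbers[url]

    def sources(self) -> list[tuple[int, str, str]]:
        """Return (number, title, url) triples in citation order."""
        return [(n, self._titles[u], u) for u, n in self._numbers.items()]
```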
This repository uses Black, Ruff, MyPy, and pytest. Convenience commands are provided in the
`Makefile`:

```bash
make install  # install project and dev dependencies
make lint     # run Ruff and Black (check mode)
make type     # run mypy static type checks
make test     # run pytest suite
make format   # format code with Ruff (import order) and Black
```

To run the CLI manually without installing the package globally:

```bash
python -m agentic_cli.cli.app --help
```

Build custom tools and deterministic workflows with the extension SDK and scaffolding command. See `docs/extending.md` for a guided tour of the registry, code generation, and testing story.
```text
src/agentic_cli/
    __init__.py          # package metadata
    config.py            # pydantic-powered settings helper
    artifacts/
        manager.py       # artifact directory helpers
    cli/
        app.py           # Typer application wiring subcommands
        research_cmd.py  # CLI wrapper delegating to the SDK research workflow
        plan_cmd.py      # CLI wrapper delegating to the SDK planner
        run_cmd.py       # CLI wrapper delegating to the SDK executor
        docs_cmd.py      # CLI wrapper delegating to the SDK documentation fetcher
    sdk/
        __init__.py      # consolidated SDK exports
        research.py      # research workflow logic + dataclasses
        plan.py          # planning workflow logic + dataclasses
        run.py           # execution workflow logic + dataclasses
        docs.py          # documentation workflow logic + dataclasses
    runner/
        agent.py         # agent + tool dataclasses
        policy.py        # heuristic policy helper
        runner.py        # orchestration loop with simulation + denylist
    tools/
        __init__.py
        citations.py     # citation registry helpers
        github_reader.py # GitHub file fetch + summarisation helpers
        mermaid_gen.py   # Mermaid diagram helpers
        shell.py         # simulation-first shell execution helpers
        web_search.py    # search, fetch, chunk, summarise utilities
```
Tests live under tests/ and cover the CLI scaffold, configuration parsing, runner behaviour, the
research tooling (chunking, citation formatting, summarisation limits, GitHub ingestion), the
planning workflow, Context7 integration, and the shell simulator.
- Parse the goal to identify primary focus items (technologies, packages, or deliverables).
- Assemble a deterministic sequence of steps covering discovery, preparation, implementation, and validation, enriching the plan with detailed guidance.
- Render a Markdown briefing and JSON payload under a timestamped directory in `artifacts/`.
- Generate a Mermaid flowchart (`plan.mmd`) and render an SVG diagram when `mmdc` from `@mermaid-js/mermaid-cli` is installed.
This deterministic approach keeps plans auditable and makes it easy to version the diagrams alongside the textual steps.
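Turning the ordered steps into a flowchart definition can be as simple as the sketch below (an illustrative shape, not necessarily the real `tools/mermaid_gen.py` API):

```python
def steps_to_mermaid(steps: list[str]) -> str:
    """Render plan steps as a top-down Mermaid flowchart definition."""
    lines = ["flowchart TD"]
    for i, step in enumerate(steps):
        # Quote labels so punctuation in step titles stays valid Mermaid syntax.
        lines.append(f'    S{i}["{step}"]')
    # Chain each step to the next to express the deterministic ordering.
    for i in range(len(steps) - 1):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)
```

The resulting text, saved as `plan.mmd`, renders to SVG with `mmdc -i plan.mmd -o plan.svg` once `@mermaid-js/mermaid-cli` is installed.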
- Generate a deterministic plan using the same focus extraction as the planning command.
- Convert each step into a safe shell command suggestion (currently an `echo` preview) using the shell tool helpers.
- Execute each command in simulation mode by default, capturing previews, stdout, and stderr in memory. Pass `--allow-run` to run the commands for real; dangerous patterns such as `rm -rf`, `sudo`, `chmod 777`, and `curl | sh` remain blocked by the denylist.
- Persist a Markdown `run_log.md` and structured `run_trace.json` under `artifacts/<timestamp>_run/` so every execution is auditable.
This flow gives you an end-to-end rehearsal of the plan before choosing to execute commands in your environment.
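The denylist check can be sketched as a regex scan over each proposed command. The patterns below are illustrative; the real list lives in the shell tool helpers:

```python
import re

# Illustrative denylist covering the dangerous patterns mentioned above.
DENYLIST = [
    r"\brm\s+-rf\b",
    r"\bsudo\b",
    r"\bchmod\s+777\b",
    r"curl[^|]*\|\s*(ba)?sh",
]


def is_blocked(command: str) -> bool:
    """Return True when a proposed command matches a dangerous pattern."""
    return any(re.search(pattern, command) for pattern in DENYLIST)
```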
The optional `agent docs` command integrates with the Context7 MCP service to capture structured
documentation for a package. To enable it:

- Install Node.js 18+ and the Context7 CLI if you plan to run a local MCP server.
- Set `CONTEXT7_URL` in your environment (e.g., `https://context7.your-domain/v1`).
- Optionally set `CONTEXT7_API_KEY` when the service requires authentication.
- Leave `CONTEXT7_MODE` at `http` for HTTP-based access (the default in `.env.example`).

Running `agent docs next-auth@5.0.0` writes a timestamped directory under `artifacts/` containing:

- `docs.json`: the structured payload from Context7 (library metadata + document entries).
- `docs.md`: a human-readable summary with titles, summaries, and source hints.
These artifacts can be referenced by subsequent agent research or agent plan runs to augment
their context with authoritative documentation.
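Rendering `docs.md` from the structured payload could look like the sketch below. The entry shape (`title`/`summary`/`source` keys) is an assumption for illustration; the real Context7 response may differ:

```python
def render_docs_markdown(library: str, entries: list[dict]) -> str:
    """Turn a structured docs payload into a human-readable Markdown summary.

    The payload shape here is hypothetical: each entry is assumed to carry
    optional title/summary/source keys.
    """
    lines = [f"# Documentation: {library}", ""]
    for entry in entries:
        lines.append(f"## {entry.get('title', 'Untitled')}")
        if summary := entry.get("summary"):
            lines.append(summary)
        if source := entry.get("source"):
            lines.append(f"_Source: {source}_")
        lines.append("")
    return "\n".join(lines)
```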
The runner module introduces:
- Agent state (`agent.py`): dataclasses for messages, tool definitions, and tool call results, along with a lightweight `Agent` container for instructions and memory.
- Policy heuristics (`policy.py`): a deterministic decision helper that selects tools when the last message is from the user and falls back to final responses otherwise.
- Execution loop (`runner.py`): coordinates policy decisions, invokes tools, enforces simulation mode for shell commands, blocks denylisted patterns (e.g., `rm -rf`), logs progress via `structlog`, and returns structured trace data for downstream formatting.
- Tool plugins (`runner/plugins.py`): discover additional `ToolDefinition` implementations from configuration or package entry points. See `docs/third_party_tools.md` for details on distributing custom tools.
GitHub Actions runs `make lint`, `make type`, and `make test` on every push and pull request. The
workflow lives at `.github/workflows/ci.yml` and ensures Ruff, Black,
MyPy, and pytest stay green across environments.
Future iterations will focus on swapping deterministic stubs for real LLM calls (once API keys are configured), adding caching for repeated research runs, and enriching the runner policy with more representative traces.
🧪 Before contributing, READ THE BRANCHING GUIDE! We've had enough interdimensional Git disasters.
- `CONTRIBUTING.md`: complete development workflow and branching strategy
- `docs/branching-workflow.md`: Mermaid diagrams showing proper Git workflow
TL;DR: Always create feature branches from latest main, keep them short-lived, and test before pushing. No exceptions.
This project is licensed under the terms of the MIT License. See LICENSE.