Shad enables AI to utilize virtually unlimited context.
Load any directory of markdown, code, or docs — then accomplish complex tasks that would be impossible with a single context window. Shad recursively decomposes tasks, retrieves targeted context for each subtask, generates outputs with type consistency, verifies them, and assembles coherent results.
```bash
# Build a full app using your team's patterns and docs
shad run "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/TeamDocs \
  --strategy software \
  --write-files --output-dir ./TaskApp
```

AI systems break down when:
- Context grows beyond the model's window
- Tasks require reasoning over many documents
- Output quality depends on following specific patterns
- Generated code needs consistent types across files
- You need reproducible, verifiable results
Current solutions (RAG, long-context models) help but don't scale. You can't fit a 100MB documentation collection into any context window.
Long-context reasoning is an inference problem, not a prompting problem.
Shad treats your collection as an explorable environment, not a fixed input:
- Decompose — Break complex tasks into subtasks using domain-specific strategy skeletons
- Retrieve — For each subtask, generate custom retrieval code that searches your collection(s)
- Generate — Produce output with contracts-first type consistency
- Verify — Check syntax, types, and tests with configurable strictness
- Assemble — Synthesize subtask results into coherent output (file manifests for code)
This allows Shad to effectively utilize gigabytes of context — not by loading it all at once, but by intelligently retrieving what's needed for each subtask.
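The five phases above can be pictured as a recursive loop. The sketch below is purely illustrative, not Shad's actual implementation; every helper here is a hypothetical stand-in stub:

```python
# Minimal, hypothetical sketch of the decompose -> retrieve -> generate ->
# verify -> assemble loop. All helpers are stand-in stubs, not Shad's real API.
def decompose(task):
    # A real strategy skeleton would emit domain-specific subtasks.
    return [f"{task}: part {i}" for i in (1, 2)]

def retrieve(task, collection):
    # Real retrieval runs LLM-generated code against the collection.
    return [doc for doc in collection if task.split(":")[0] in doc]

def generate(task, context):
    return f"output({task}, ctx={len(context)})"

def verify(output):
    return output.startswith("output(")  # stand-in for syntax/type/test checks

def solve(task, collection, depth=0, max_depth=1):
    if depth == max_depth:
        out = generate(task, retrieve(task, collection))
        assert verify(out)
        return out
    parts = [solve(t, collection, depth + 1, max_depth) for t in decompose(task)]
    return " + ".join(parts)  # assemble subtask results

print(solve("auth", ["auth docs", "ui docs"]))
```

The point of the shape: context is fetched per leaf, so total context consulted across the run can far exceed any single window.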
- Python 3.11+
- At least one of:
- Claude CLI — uses your Claude subscription (default)
- Gemini CLI — uses your Google subscription
- Ollama — free, local open-source models
- A collection (any directory of markdown files, code, or docs)
- (Optional) Docker for Redis (enables cross-run caching)
- (Optional) qmd for hybrid semantic search
```bash
# One-liner install
curl -fsSL https://raw.githubusercontent.com/jonesj38/shad/main/install.sh | bash

# Or clone and run manually
git clone https://github.com/jonesj38/shad.git
cd shad
./install.sh
```

The installer will:
- Clone the repo to `~/.shad`
- Create a Python virtual environment
- Install dependencies
- Install qmd for semantic search (if bun/npm available)
- Add `shad` to your PATH
After installation, restart your terminal or run:
```bash
source ~/.zshrc  # or ~/.bashrc
```

Then start the services:

```bash
shad server start   # Start Redis + API server
shad server status  # Check status
shad server logs -f # Follow logs
```

Validate that the collection is registered and searchable:

```bash
shad collection --collection ~/MyCollection
shad search "oauth refresh token" --collection ~/MyCollection
shad context "How should this app handle auth?" --collection ~/MyCollection
```
```bash
# Preflight the task and get a recommended run command
shad plan "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/Project \
  --collection ~/Patterns \
  --collection ~/Docs

# Execute the real run
shad run "Build a REST API for user management" \
  --collection ~/TeamDocs \
  --strategy software \
  --profile deep \
  --verify strict \
  --write-files --output-dir ./api

# Check environment health
shad doctor
shad doctor --fix  # Install qmd + register collection + embed

# Stop services when done
shad server stop
```

For large app-building tasks, the best flow is:
```bash
# 1. Build or sync a collection
shad ingest github https://github.com/your-org/your-repo --collection ~/MyCollection --preset docs
shad sources add folder ~/TeamDocs --collection ~/MyCollection --schedule daily

# 2. Index it with qmd
qmd collection add ~/MyCollection --name mycollection
QMD_OPENAI=1 qmd embed

# 3. Validate retrieval before the expensive run
shad collection --collection ~/MyCollection
shad search "authentication patterns" --collection ~/MyCollection
shad context "What are the main architecture constraints?" --collection ~/MyCollection

# 4. Plan the run
shad plan "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/MyCollection

# 5. Execute the run
shad run "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/MyCollection \
  --strategy software \
  --profile deep \
  --verify strict \
  --write-files \
  --output-dir ./TaskApp
```

Use `shad plan` when you want Shad to recommend the right strategy, profile, verification level, and output mode before spending tokens on a full recursive run.
Shad supports three model backends. No API keys need to be configured in Shad — each CLI handles its own authentication.
```bash
# Use model tier aliases
shad run "Complex task" -O opus -W sonnet -L haiku

# Use haiku for everything (faster, cheaper)
shad run "Simple task" -O haiku -W haiku -L haiku
```

```bash
# Use Gemini for everything
shad run "Task" --gemini

# Specify Gemini models per tier
shad run "Task" --gemini -O gemini-3-pro-preview -W gemini-3-flash-preview
```

Requires the Gemini CLI installed and authenticated (`gemini auth login`).
```bash
# Use local models (free, runs on your hardware)
shad run "Task" -O qwen3-coder -W llama3 -L llama3

# Mix Ollama with Claude
shad run "Task" -O opus -W llama3 -L qwen3:latest
```

Requires Ollama installed with models pulled (`ollama pull llama3`). Any model name not matching Claude or Gemini patterns routes to Ollama automatically.
| Tier | Flag | Purpose | Claude Default | Gemini Default |
|---|---|---|---|---|
| Orchestrator | `-O` | Planning and synthesis | sonnet | gemini-3-pro-preview |
| Worker | `-W` | Mid-depth execution | sonnet | gemini-3-pro-preview |
| Leaf | `-L` | Fast parallel execution | haiku | gemini-3-flash-preview |
Instead of simple keyword search, Shad uses Code Mode — the LLM writes Python scripts to retrieve exactly what it needs:
```python
# For task: "How should I implement OAuth?"
# LLM generates:
results = obsidian.search("OAuth implementation", limit=10)
patterns = obsidian.read_note("Patterns/Authentication/OAuth.md")

relevant = []
for r in results:
    if "refresh token" in r["content"].lower():
        relevant.append(r["content"][:2000])

__result__ = {
    "context": f"## OAuth Patterns\n{patterns[:3000]}\n\n## Examples\n{'---'.join(relevant)}",
    "citations": [...],
    "confidence": 0.72
}
```

This enables:
- Multi-step retrieval — search → read specific files → filter → aggregate
- Query-specific logic — different retrieval strategies per subtask
- Context efficiency — return only what's needed, not entire documents
- Confidence scoring — recovery when retrieval quality is low
Use --no-code-mode to disable Code Mode and use direct search instead.
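The confidence score in the `__result__` payload can drive recovery. Here is a hypothetical sketch of the retry idea (illustrative only; `run_script` and the toy `fake_run` are assumptions, not Shad's real interfaces):

```python
# Hypothetical sketch: retry retrieval with broader queries when the
# confidence reported by a generated retrieval script is low.
def retrieve_with_recovery(run_script, queries, threshold=0.6):
    best = {"context": "", "confidence": 0.0}
    for query in queries:  # progressively broader reformulations
        result = run_script(query)  # executes an LLM-generated retrieval script
        if result["confidence"] >= threshold:
            return result
        if result["confidence"] > best["confidence"]:
            best = result
    return best  # fall back to the best attempt seen

# Toy stand-in: pretend confidence grows with query breadth.
def fake_run(query):
    return {"context": query, "confidence": 0.3 + 0.2 * query.count(" ")}

print(retrieve_with_recovery(fake_run, ["OAuth", "OAuth refresh token flow"]))
```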
Complex tasks are broken into manageable subtasks using strategy skeletons:
```
"Build a mobile app with auth" (software strategy)
    ↓
├── Types & Contracts (hard dependency for all below)
├── "Set up project structure"
├── "Implement navigation"
├── "Build authentication flow"
│   ├── "Create login screen"
│   ├── "Implement OAuth integration"
│   └── "Add session management"
├── "Create main features"
│   ├── "Task list view"
│   ├── "Task detail screen"
│   └── "Create/edit task form"
├── "Add offline sync"
└── Verification (syntax, types, tests)
```
Strategies: software, research, analysis, planning. Auto-selected by default, or override with --strategy.
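Auto-selection could work roughly like keyword heuristics with an LLM fallback. A purely illustrative sketch (keyword lists and function names are assumptions, not Shad's actual logic):

```python
# Hypothetical sketch of heuristic strategy selection with an LLM fallback.
KEYWORDS = {
    "software": ("build", "implement", "api", "app"),
    "research": ("survey", "compare", "literature"),
    "analysis": ("analyze", "evaluate", "metrics"),
    "planning": ("plan", "roadmap", "schedule"),
}

def select_strategy(goal, llm_fallback=lambda g: "analysis"):
    text = goal.lower()
    scores = {s: sum(k in text for k in kws) for s, kws in KEYWORDS.items()}
    best, hits = max(scores.items(), key=lambda kv: kv[1])
    return best if hits > 0 else llm_fallback(goal)  # ask an LLM when unclear

print(select_strategy("Build a task management app"))  # "software"
```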
For code generation, Shad uses two-pass import resolution:
- Generate an export index (which symbols live where)
- Generate implementations using the export index as ground truth
- Validate all imports resolve correctly
Output is a structured file manifest — writing to disk requires explicit --write-files.
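The two-pass idea can be sketched as follows. The manifest and index shapes here are hypothetical stand-ins, not Shad's actual manifest format:

```python
# Hypothetical sketch of two-pass import resolution for a generated manifest.
# Pass 1 produced an export index; pass 2's files are validated against it.
export_index = {
    "models/user.ts": ["User", "UserRole"],
    "auth/session.ts": ["createSession"],
}

manifest = {
    "auth/login.ts": {"imports": [("models/user.ts", "User"),
                                  ("auth/session.ts", "createSession")]},
}

def unresolved_imports(manifest, export_index):
    """Return (file, module, symbol) triples whose import doesn't resolve."""
    bad = []
    for path, spec in manifest.items():
        for module, symbol in spec["imports"]:
            if symbol not in export_index.get(module, []):
                bad.append((path, module, symbol))
    return bad

print(unresolved_imports(manifest, export_index))  # -> []
```

Because every import is checked against the index before anything touches disk, a hallucinated symbol fails validation instead of producing a broken file tree.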
For best retrieval quality, install qmd for hybrid BM25 + vector search with LLM reranking.
```bash
# Install (recommended fork with OpenAI embeddings)
bun install -g https://github.com/jonesj38/qmd#feat/openai-embeddings

# Register your collection with qmd
qmd collection add ~/MyVault --name myvault

# Generate embeddings
QMD_OPENAI=1 qmd embed
```

| Search Mode | Command | Use Case |
|---|---|---|
| `hybrid` | `qmd query` | Best quality (default): BM25 + vector + RRF + reranking |
| `bm25` | `qmd search` | Fast keyword matching |
| `vector` | `qmd vsearch` | Pure semantic similarity |
Without qmd, Shad falls back to filesystem search (basic keyword matching). Use shad doctor --fix to install qmd and set up your collection automatically.
```bash
# Preflight a task and get a recommended run command
shad plan "Build a task app" --collection ~/collection

# Execute a task with collection context
shad run "Your task" [options]

# Quick context retrieval (faster than run, richer than search)
shad context "query" -c ~/collection

# Search your collection
shad search "query" [--mode hybrid|bm25|vector]

# Check run status
shad status <run_id>

# Cancel a remote async run
shad cancel <run_id> [--api http://localhost:8000]

# View execution tree
shad trace tree <run_id>

# Inspect specific node
shad trace node <run_id> <node_id>

# Resume partial run
shad resume <run_id> [--profile deep] [--auto-profile] [--replay stale]

# Export files from completed run
shad export <run_id> --output ./out

# List available models
shad models [--refresh] [--ollama]

# Inspect collection/index status
shad collection [--collection ~/collection]
```

```
--collection, -c       Collection path(s) for context (repeatable)
--retriever, -r        Backend: auto|qmd|filesystem (default: auto)
--strategy, -s         Force strategy: software|research|analysis|planning
--profile              Budget preset: fast|balanced|deep
--auto-profile         Auto-select profile based on machine specs
--dry-run              Show budgets/models and exit (no execution)
--max-depth, -d        Maximum recursion depth (default: 3)
--max-nodes            Maximum DAG nodes (default: 50)
--max-time, -t         Maximum wall time in seconds (default: 1200)
--verify               Verification level: off|basic|build|strict
--write-files          Write output files to disk
--output-dir           Output directory (requires --write-files)
--no-code-mode         Disable Code Mode (use direct search)
--qmd-hybrid/--no-qmd-hybrid  Toggle hybrid search with reranking (default: on)
--quiet, -q            Suppress verbose output
-O                     Orchestrator model (opus, sonnet, haiku, or any model ID)
-W                     Worker model
-L                     Leaf model
--gemini               Use Gemini CLI instead of Claude CLI
```
```bash
shad plan "Build a task app" --collection ~/Collection
shad plan "Analyze this architecture" --collection ~/Collection --json
```

`shad plan` performs a low-cost preflight:
- resolves collections and retriever
- selects a recommended strategy
- suggests a machine-appropriate profile
- checks whether the goal retrieves useful context
- prints a recommended `shad run ...` command
```bash
shad server start       # Start Redis + API server
shad server stop        # Stop all services
shad server status      # Check service status
shad server logs [-f]   # View/follow logs
```

```bash
shad doctor        # Check environment health (Python, qmd, Redis, collection)
shad doctor --fix  # Auto-fix: install qmd, register collection, generate embeddings
shad init          # Initialize project permissions for Claude Code
shad collection    # Check collection + retriever status
```

Automatically sync content from external sources on a schedule.
```bash
# Add sources
shad sources add github https://github.com/org/repo --schedule weekly --collection ~/Collection
shad sources add url https://docs.example.com/api --schedule daily --collection ~/Collection
shad sources add feed https://blog.example.com/rss --schedule hourly --collection ~/Collection
shad sources add folder ~/LocalDocs --schedule daily --collection ~/Collection

# Manage
shad sources list          # List all sources
shad sources status        # Detailed status (schedule, last/next sync)
shad sources sync          # Sync due sources
shad sources sync --force  # Force sync all
shad sources remove <id>   # Remove a source
```

Schedules: `manual`, `hourly`, `daily`, `weekly`, `monthly`
```bash
# Ingest a GitHub repo into your collection
shad ingest github <url> --collection ~/Collection --preset docs

# Presets: mirror (all files), docs (documentation only), deep (with code)
```

```bash
# Cold-start (good default)
shad run "task" --collection ~/V -O sonnet -W sonnet -L haiku

# Fast + cheap
shad run "task" --collection ~/V --profile fast -O haiku -W haiku -L haiku

# Auto profile (adapts to your machine)
shad run "task" --collection ~/V --auto-profile

# Deep reasoning (large tasks)
shad run "task" --collection ~/V --profile deep -O opus -W sonnet -L haiku

# Preview before running
shad run "task" --collection ~/V --auto-profile --dry-run

# Or preflight the real command
shad plan "task" --collection ~/V --auto-profile
```

Low-end laptop / small VM:
```
DEFAULT_MAX_DEPTH=2
DEFAULT_MAX_NODES=30
DEFAULT_MAX_WALL_TIME=600
DEFAULT_MAX_TOKENS=800000
```

Mid-range dev machine (recommended):

```
DEFAULT_MAX_DEPTH=3
DEFAULT_MAX_NODES=50
DEFAULT_MAX_WALL_TIME=1200
DEFAULT_MAX_TOKENS=2000000
```

High-end workstation:

```
DEFAULT_MAX_DEPTH=4
DEFAULT_MAX_NODES=80
DEFAULT_MAX_WALL_TIME=1800
DEFAULT_MAX_TOKENS=3000000
```

```
User
 │
 ▼
Shad CLI / API
 │
 ├── RLM Engine
 │    │
 │    ├── Strategy Selection (heuristic + LLM)
 │    │
 │    ├── Decomposition (skeleton + LLM refinement)
 │    │
 │    ├── Code Mode (LLM generates retrieval scripts)
 │    │    │
 │    │    ▼
 │    ├── CodeExecutor ──> RetrievalLayer ──> Your Collection(s)
 │    │                         │
 │    │                    ┌────┴────┐
 │    │                    │         │
 │    │                   qmd    Filesystem
 │    │               (semantic) (fallback)
 │    │
 │    ├── Verification (syntax, types, tests)
 │    │
 │    └── Synthesis (combine subtask results)
 │
 ├── Redis (cache + budget ledger)
 └── History (run artifacts)
```
| Component | Purpose |
|---|---|
| RLM Engine | Recursive decomposition and execution |
| Strategy Skeletons | Domain-specific decomposition templates (software, research, analysis, planning) |
| Code Mode | LLM-generated retrieval scripts |
| CodeExecutor | Sandboxed Python execution (configurable profiles) |
| RetrievalLayer | Collection search abstraction (qmd or filesystem fallback) |
| qmd | Hybrid BM25 + vector search with LLM reranking |
| Verification Layer | Syntax, type, import, test checking (progressive strictness) |
| Redis Cache | Cache subtask results with hash validation |
| LLM Provider | Multi-backend: Claude CLI, Gemini CLI, Ollama |
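The cache-with-hash-validation idea from the table can be pictured as keying each subtask result on a content hash, so changed inputs invalidate the entry. A sketch under assumed semantics (the key format and fields are hypothetical, not Shad's internal scheme):

```python
# Hypothetical sketch: cache subtask results keyed by a hash of the subtask
# definition plus the retrieved context, so stale context causes a cache miss.
import hashlib
import json

def cache_key(subtask: str, context: str, model: str) -> str:
    payload = json.dumps({"task": subtask, "ctx": context, "model": model},
                         sort_keys=True)
    return "shad:result:" + hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("Build login screen", "OAuth notes v1", "sonnet")
k2 = cache_key("Build login screen", "OAuth notes v2", "sonnet")
print(k1 != k2)  # changed context -> different key -> cache miss
```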
Shad works with minimal configuration. Set optional environment variables in ~/.shad/.env or your shell profile.
# Default collection (so you don't need --collection every time)
SHAD_COLLECTION_PATH=/path/to/your/collection
# Redis for cross-run caching (defaults to localhost:6379)
REDIS_URL=redis://localhost:6379/0
# Budget defaults
DEFAULT_MAX_DEPTH=3
DEFAULT_MAX_NODES=50
DEFAULT_MAX_WALL_TIME=1200
DEFAULT_MAX_TOKENS=2000000| Directory | Purpose |
|---|---|
~/.shad/history/ |
Run artifacts and history |
~/.shad/skills/ |
Skill definitions |
~/.shad/CORE/ |
Core system files |
~/.shad/repo/ |
Installed Shad source |
~/.shad/venv/ |
Python virtual environment |
| | One Collection | Many Collections |
|---|---|---|
| Pros | Single source of truth, cross-topic connections, simpler management | Faster indexing, focused retrieval, easier sharing/permissions |
| Cons | Slower as it grows, noise in retrieval, harder to share subsets | Context fragmentation, can't find cross-collection connections, more overhead |
Use one collection for personal/work knowledge — memory, tasks, notes, projects all interconnected.
Use separate collections for codebases, client deliverables needing isolation, or read-only reference material.
Multi-collection queries search in priority order:
```bash
shad run "Build auth system" --collection ~/Project --collection ~/Patterns --collection ~/Docs
```

- Use consistent frontmatter for better filtering
- Include code examples with context, not just snippets
- Link related notes for better discovery
- Keep notes focused (one concept per note)
- Authoritative sources and worked examples improve output quality
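Priority-ordered multi-collection retrieval can be pictured like this (an illustrative sketch with made-up data, not Shad's implementation):

```python
# Hypothetical sketch: search collections in the order they were passed,
# keeping earlier (higher-priority) hits when filling a result budget.
def search_collections(query, collections, limit=3):
    hits = []
    for name, docs in collections:  # list order == priority order
        for doc in docs:
            if query in doc and len(hits) < limit:
                hits.append((name, doc))
    return hits

collections = [
    ("Project", ["auth flow spec", "ui notes"]),
    ("Patterns", ["auth retry pattern", "auth token pattern"]),
]
print(search_collections("auth", collections, limit=2))
```

With a limit of 2, the Project hit fills the first slot before any Patterns hit is considered, which is the priority behavior the flag order implies.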
All core phases complete:
- Foundation — CLI, API, RLM engine, Redis caching
- qmd migration — hybrid search, multi-collection, no Collection dependency
- Task-aware decomposition — strategy skeletons, soft dependencies
- File output mode — two-pass imports, contracts-first
- Verification layer — progressive strictness, repair loops
- Iterative refinement — HITL checkpoints, delta resume
- Collection curation — ingestion, gap detection
- Sources scheduler — automated sync from GitHub, URLs, feeds, folders
- Multi-provider — Claude CLI, Gemini CLI, Ollama support
- Context command — fast retrieval + synthesis without DAG overhead
- Doctor command — environment health checks with auto-fix
- Performance profiles — fast/balanced/deep presets, auto-profile
See SPEC.md for the technical specification, QMD_PIVOT.md for the qmd migration rationale.
Solve a problem once. Encode it as knowledge. Never solve it again.
Shad compounds your knowledge. Every document you add makes it more capable. The collection is the how — patterns, examples, documentation. Shad is the engine — decomposition, retrieval, generation, verification, assembly.
Together: complex tasks that learn from your accumulated knowledge.
Contributions welcome. See SPEC.md for architecture details before submitting PRs.
MIT