Shad enables AI to utilize virtually unlimited context.
Load any directory of markdown, code, or docs — then accomplish complex tasks that would be impossible with a single context window. Shad recursively decomposes tasks, retrieves targeted context for each subtask, generates outputs with type consistency, verifies them, and assembles coherent results.
```bash
# Build a full app using your team's patterns and docs
shad run "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/TeamDocs \
  --strategy software \
  --write-files --output-dir ./TaskApp
```

AI systems break down when:
- Context grows beyond the model's window
- Tasks require reasoning over many documents
- Output quality depends on following specific patterns
- Generated code needs consistent types across files
- You need reproducible, verifiable results
Current solutions (RAG, long-context models) help but don't scale. You can't fit a 100MB documentation collection into any context window.
Long-context reasoning is an inference problem, not a prompting problem.
Shad treats your collection as an explorable environment, not a fixed input:
- Decompose — Break complex tasks into subtasks using domain-specific strategy skeletons
- Retrieve — For each subtask, generate custom retrieval code that searches your collection(s)
- Generate — Produce output with contracts-first type consistency
- Verify — Check syntax, types, and tests with configurable strictness
- Assemble — Synthesize subtask results into coherent output (file manifests for code)
This allows Shad to effectively utilize gigabytes of context — not by loading it all at once, but by intelligently retrieving what's needed for each subtask.
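The five phases above can be pictured as a recursive loop. The sketch below is purely illustrative, not Shad's actual implementation; every helper here is a hypothetical stand-in stub:

```python
# Minimal, hypothetical sketch of the decompose -> retrieve -> generate ->
# verify -> assemble loop. All helpers are stand-in stubs, not Shad's real API.
def decompose(task):
    # A real strategy skeleton would emit domain-specific subtasks.
    return [f"{task}: part {i}" for i in (1, 2)]

def retrieve(task, collection):
    # Real retrieval runs LLM-generated code against the collection.
    return [doc for doc in collection if task.split(":")[0] in doc]

def generate(task, context):
    return f"output({task}, ctx={len(context)})"

def verify(output):
    return output.startswith("output(")  # stand-in for syntax/type/test checks

def solve(task, collection, depth=0, max_depth=1):
    if depth == max_depth:
        out = generate(task, retrieve(task, collection))
        assert verify(out)
        return out
    parts = [solve(t, collection, depth + 1, max_depth) for t in decompose(task)]
    return " + ".join(parts)  # assemble subtask results

print(solve("auth", ["auth docs", "ui docs"]))
```

The point of the shape: context is fetched per leaf, so total context consulted across the run can far exceed any single window.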
- Python 3.11+
- At least one of:
- Claude CLI — uses your Claude subscription (default)
- Gemini CLI — uses your Google subscription
- Ollama — free, local open-source models
- A collection (any directory of markdown files, code, or docs)
- (Optional) Docker for Redis (enables cross-run caching)
- (Optional) qmd for hybrid semantic search
```bash
# One-liner install
curl -fsSL https://raw.githubusercontent.com/jonesj38/shad/main/install.sh | bash

# Or clone and run manually
git clone https://github.com/jonesj38/shad.git
cd shad
./install.sh
```

The installer will:
- Clone the repo to `~/.shad`
- Create a Python virtual environment
- Install dependencies
- Install qmd for semantic search (if bun/npm available)
- Add `shad` to your PATH
After installation, restart your terminal or run:
```bash
source ~/.zshrc  # or ~/.bashrc
```

Then start the services:

```bash
shad server start   # Start Redis + API server
shad server status  # Check status
shad server logs -f # Follow logs
```

Validate that the collection is registered and searchable:

```bash
shad collection --collection ~/MyCollection
shad search "oauth refresh token" --collection ~/MyCollection
shad context "How should this app handle auth?" --collection ~/MyCollection
```
```bash
# Preflight the task and get a recommended run command
shad plan "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/Project \
  --collection ~/Patterns \
  --collection ~/Docs

# Execute the real run
shad run "Build a REST API for user management" \
  --collection ~/TeamDocs \
  --strategy software \
  --profile deep \
  --verify strict \
  --write-files --output-dir ./api

# Check environment health
shad doctor
shad doctor --fix  # Install qmd + register collection + embed

# Stop services when done
shad server stop
```

For large app-building tasks, the best flow is:
```bash
# 1. Build or sync a collection
shad ingest github https://github.com/your-org/your-repo --collection ~/MyCollection --preset docs
shad sources add folder ~/TeamDocs --collection ~/MyCollection --schedule daily

# 2. Index it with qmd
qmd collection add ~/MyCollection --name mycollection
QMD_OPENAI=1 qmd embed

# 3. Validate retrieval before the expensive run
shad collection --collection ~/MyCollection
shad search "authentication patterns" --collection ~/MyCollection
shad context "What are the main architecture constraints?" --collection ~/MyCollection

# 4. Plan the run
shad plan "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/MyCollection

# 5. Execute the run
shad run "Build a task management app with auth, offline sync, and push notifications" \
  --collection ~/MyCollection \
  --strategy software \
  --profile deep \
  --verify strict \
  --write-files \
  --output-dir ./TaskApp
```

Use `shad plan` when you want Shad to recommend the right strategy, profile, verification level, and output mode before spending tokens on a full recursive run.
Shad supports three model backends. No API keys need to be configured in Shad — each CLI handles its own authentication.
```bash
# Use model tier aliases
shad run "Complex task" -O opus -W sonnet -L haiku

# Use haiku for everything (faster, cheaper)
shad run "Simple task" -O haiku -W haiku -L haiku
```

```bash
# Use Gemini for everything
shad run "Task" --gemini

# Specify Gemini models per tier
shad run "Task" --gemini -O gemini-3-pro-preview -W gemini-3-flash-preview
```

Requires the Gemini CLI installed and authenticated (`gemini auth login`).
```bash
# Use local models (free, runs on your hardware)
shad run "Task" -O qwen3-coder -W llama3 -L llama3

# Mix Ollama with Claude
shad run "Task" -O opus -W llama3 -L qwen3:latest
```

Requires Ollama installed with models pulled (`ollama pull llama3`). Any model name not matching Claude or Gemini patterns routes to Ollama automatically.
| Tier | Flag | Purpose | Claude Default | Gemini Default |
|---|---|---|---|---|
| Orchestrator | `-O` | Planning and synthesis | sonnet | gemini-3-pro-preview |
| Worker | `-W` | Mid-depth execution | sonnet | gemini-3-pro-preview |
| Leaf | `-L` | Fast parallel execution | haiku | gemini-3-flash-preview |
Instead of simple keyword search, Shad uses Code Mode — the LLM writes Python scripts to retrieve exactly what it needs:
```python
# For task: "How should I implement OAuth?"
# LLM generates:
results = obsidian.search("OAuth implementation", limit=10)
patterns = obsidian.read_note("Patterns/Authentication/OAuth.md")

relevant = []
for r in results:
    if "refresh token" in r["content"].lower():
        relevant.append(r["content"][:2000])

__result__ = {
    "context": f"## OAuth Patterns\n{patterns[:3000]}\n\n## Examples\n{'---'.join(relevant)}",
    "citations": [...],
    "confidence": 0.72
}
```

This enables:
- Multi-step retrieval — search → read specific files → filter → aggregate
- Query-specific logic — different retrieval strategies per subtask
- Context efficiency — return only what's needed, not entire documents
- Confidence scoring — recovery when retrieval quality is low
Use --no-code-mode to disable Code Mode and use direct search instead.
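The confidence score in the `__result__` payload can drive recovery. Here is a hypothetical sketch of the retry idea (illustrative only; `run_script` and the toy `fake_run` are assumptions, not Shad's real interfaces):

```python
# Hypothetical sketch: retry retrieval with broader queries when the
# confidence reported by a generated retrieval script is low.
def retrieve_with_recovery(run_script, queries, threshold=0.6):
    best = {"context": "", "confidence": 0.0}
    for query in queries:  # progressively broader reformulations
        result = run_script(query)  # executes an LLM-generated retrieval script
        if result["confidence"] >= threshold:
            return result
        if result["confidence"] > best["confidence"]:
            best = result
    return best  # fall back to the best attempt seen

# Toy stand-in: pretend confidence grows with query breadth.
def fake_run(query):
    return {"context": query, "confidence": 0.3 + 0.2 * query.count(" ")}

print(retrieve_with_recovery(fake_run, ["OAuth", "OAuth refresh token flow"]))
```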
Complex tasks are broken into manageable subtasks using strategy skeletons:
```
"Build a mobile app with auth" (software strategy)
    ↓
├── Types & Contracts (hard dependency for all below)
├── "Set up project structure"
├── "Implement navigation"
├── "Build authentication flow"
│   ├── "Create login screen"
│   ├── "Implement OAuth integration"
│   └── "Add session management"
├── "Create main features"
│   ├── "Task list view"
│   ├── "Task detail screen"
│   └── "Create/edit task form"
├── "Add offline sync"
└── Verification (syntax, types, tests)
```
Strategies: software, research, analysis, planning. Auto-selected by default, or override with --strategy.
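Auto-selection could work roughly like keyword heuristics with an LLM fallback. A purely illustrative sketch (keyword lists and function names are assumptions, not Shad's actual logic):

```python
# Hypothetical sketch of heuristic strategy selection with an LLM fallback.
KEYWORDS = {
    "software": ("build", "implement", "api", "app"),
    "research": ("survey", "compare", "literature"),
    "analysis": ("analyze", "evaluate", "metrics"),
    "planning": ("plan", "roadmap", "schedule"),
}

def select_strategy(goal, llm_fallback=lambda g: "analysis"):
    text = goal.lower()
    scores = {s: sum(k in text for k in kws) for s, kws in KEYWORDS.items()}
    best, hits = max(scores.items(), key=lambda kv: kv[1])
    return best if hits > 0 else llm_fallback(goal)  # ask an LLM when unclear

print(select_strategy("Build a task management app"))  # "software"
```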
For code generation, Shad uses two-pass import resolution:
- Generate an export index (which symbols live where)
- Generate implementations using the export index as ground truth
- Validate all imports resolve correctly
Output is a structured file manifest — writing to disk requires explicit --write-files.
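The two-pass idea can be sketched as follows. The manifest and index shapes here are hypothetical stand-ins, not Shad's actual manifest format:

```python
# Hypothetical sketch of two-pass import resolution for a generated manifest.
# Pass 1 produced an export index; pass 2's files are validated against it.
export_index = {
    "models/user.ts": ["User", "UserRole"],
    "auth/session.ts": ["createSession"],
}

manifest = {
    "auth/login.ts": {"imports": [("models/user.ts", "User"),
                                  ("auth/session.ts", "createSession")]},
}

def unresolved_imports(manifest, export_index):
    """Return (file, module, symbol) triples whose import doesn't resolve."""
    bad = []
    for path, spec in manifest.items():
        for module, symbol in spec["imports"]:
            if symbol not in export_index.get(module, []):
                bad.append((path, module, symbol))
    return bad

print(unresolved_imports(manifest, export_index))  # -> []
```

Because every import is checked against the index before anything touches disk, a hallucinated symbol fails validation instead of producing a broken file tree.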
For best retrieval quality, install qmd for hybrid BM25 + vector search with LLM reranking.
```bash
# Install (recommended fork with OpenAI embeddings)
bun install -g https://github.com/jonesj38/qmd#feat/openai-embeddings

# Register your collection with qmd
qmd collection add ~/MyVault --name myvault

# Generate embeddings
QMD_OPENAI=1 qmd embed
```

| Search Mode | Command | Use Case |
|---|---|---|
| `hybrid` | `qmd query` | Best quality (default): BM25 + vector + RRF + reranking |
| `bm25` | `qmd search` | Fast keyword matching |
| `vector` | `qmd vsearch` | Pure semantic similarity |
Without qmd, Shad falls back to filesystem search (basic keyword matching). Use shad doctor --fix to install qmd and set up your collection automatically.
```bash
# Preflight a task and get a recommended run command
shad plan "Build a task app" --collection ~/collection

# Execute a task with collection context
shad run "Your task" [options]

# Quick context retrieval (faster than run, richer than search)
shad context "query" -c ~/collection

# Search your collection
shad search "query" [--mode hybrid|bm25|vector]

# Check run status
shad status <run_id>

# Cancel a remote async run
shad cancel <run_id> [--api http://localhost:8000]

# View execution tree
shad trace tree <run_id>

# Inspect specific node
shad trace node <run_id> <node_id>

# Resume partial run
shad resume <run_id> [--profile deep] [--auto-profile] [--replay stale]

# Export files from completed run
shad export <run_id> --output ./out

# List available models
shad models [--refresh] [--ollama]

# Inspect collection/index status
shad collection [--collection ~/collection]
```

```
--collection, -c       Collection path(s) for context (repeatable)
--retriever, -r        Backend: auto|qmd|filesystem (default: auto)
--strategy, -s         Force strategy: software|research|analysis|planning
--profile              Budget preset: fast|balanced|deep
--auto-profile         Auto-select profile based on machine specs
--dry-run              Show budgets/models and exit (no execution)
--max-depth, -d        Maximum recursion depth (default: 3)
--max-nodes            Maximum DAG nodes (default: 50)
--max-time, -t         Maximum wall time in seconds (default: 1200)
--verify               Verification level: off|basic|build|strict
--write-files          Write output files to disk
--output-dir           Output directory (requires --write-files)
--no-code-mode         Disable Code Mode (use direct search)
--qmd-hybrid/--no-qmd-hybrid  Toggle hybrid search with reranking (default: on)
--quiet, -q            Suppress verbose output
-O                     Orchestrator model (opus, sonnet, haiku, or any model ID)
-W                     Worker model
-L                     Leaf model
--gemini               Use Gemini CLI instead of Claude CLI
```
```bash
shad plan "Build a task app" --collection ~/Collection
shad plan "Analyze this architecture" --collection ~/Collection --json
```

`shad plan` performs a low-cost preflight:
- resolves collections and retriever
- selects a recommended strategy
- suggests a machine-appropriate profile
- checks whether the goal retrieves useful context
- prints a recommended `shad run ...` command
```bash
shad server start       # Start Redis + API server
shad server stop        # Stop all services
shad server status      # Check service status
shad server logs [-f]   # View/follow logs
```

```bash
shad doctor        # Check environment health (Python, qmd, Redis, collection)
shad doctor --fix  # Auto-fix: install qmd, register collection, generate embeddings
shad init          # Initialize project permissions for Claude Code
shad collection    # Check collection + retriever status
```

Automatically sync content from external sources on a schedule.
```bash
# Add sources
shad sources add github https://github.com/org/repo --schedule weekly --collection ~/Collection
shad sources add url https://docs.example.com/api --schedule daily --collection ~/Collection
shad sources add feed https://blog.example.com/rss --schedule hourly --collection ~/Collection
shad sources add folder ~/LocalDocs --schedule daily --collection ~/Collection

# Manage
shad sources list          # List all sources
shad sources status        # Detailed status (schedule, last/next sync)
shad sources sync          # Sync due sources
shad sources sync --force  # Force sync all
shad sources remove <id>   # Remove a source
```

Schedules: `manual`, `hourly`, `daily`, `weekly`, `monthly`
```bash
# Ingest a GitHub repo into your collection
shad ingest github <url> --collection ~/Collection --preset docs

# Presets: mirror (all files), docs (documentation only), deep (with code)
```

```bash
# Cold-start (good default)
shad run "task" --collection ~/V -O sonnet -W sonnet -L haiku

# Fast + cheap
shad run "task" --collection ~/V --profile fast -O haiku -W haiku -L haiku

# Auto profile (adapts to your machine)
shad run "task" --collection ~/V --auto-profile

# Deep reasoning (large tasks)
shad run "task" --collection ~/V --profile deep -O opus -W sonnet -L haiku

# Preview before running
shad run "task" --collection ~/V --auto-profile --dry-run

# Or preflight the real command
shad plan "task" --collection ~/V --auto-profile
```

Low-end laptop / small VM:
```
DEFAULT_MAX_DEPTH=2
DEFAULT_MAX_NODES=30
DEFAULT_MAX_WALL_TIME=600
DEFAULT_MAX_TOKENS=800000
```

Mid-range dev machine (recommended):

```
DEFAULT_MAX_DEPTH=3
DEFAULT_MAX_NODES=50
DEFAULT_MAX_WALL_TIME=1200
DEFAULT_MAX_TOKENS=2000000
```

High-end workstation:

```
DEFAULT_MAX_DEPTH=4
DEFAULT_MAX_NODES=80
DEFAULT_MAX_WALL_TIME=1800
DEFAULT_MAX_TOKENS=3000000
```

```
User
 │
 ▼
Shad CLI / API
 │
 ├── RLM Engine
 │    │
 │    ├── Strategy Selection (heuristic + LLM)
 │    │
 │    ├── Decomposition (skeleton + LLM refinement)
 │    │
 │    ├── Code Mode (LLM generates retrieval scripts)
 │    │    │
 │    │    ▼
 │    ├── CodeExecutor ──> RetrievalLayer ──> Your Collection(s)
 │    │                         │
 │    │                    ┌────┴────┐
 │    │                    │         │
 │    │                   qmd    Filesystem
 │    │               (semantic) (fallback)
 │    │
 │    ├── Verification (syntax, types, tests)
 │    │
 │    └── Synthesis (combine subtask results)
 │
 ├── Redis (cache + budget ledger)
 └── History (run artifacts)
```
| Component | Purpose |
|---|---|
| RLM Engine | Recursive decomposition and execution |
| Strategy Skeletons | Domain-specific decomposition templates (software, research, analysis, planning) |
| Code Mode | LLM-generated retrieval scripts |
| CodeExecutor | Sandboxed Python execution (configurable profiles) |
| RetrievalLayer | Collection search abstraction (qmd or filesystem fallback) |
| qmd | Hybrid BM25 + vector search with LLM reranking |
| Verification Layer | Syntax, type, import, test checking (progressive strictness) |
| Redis Cache | Cache subtask results with hash validation |
| LLM Provider | Multi-backend: Claude CLI, Gemini CLI, Ollama |
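The cache-with-hash-validation idea from the table can be pictured as keying each subtask result on a content hash, so changed inputs invalidate the entry. A sketch under assumed semantics (the key format and fields are hypothetical, not Shad's internal scheme):

```python
# Hypothetical sketch: cache subtask results keyed by a hash of the subtask
# definition plus the retrieved context, so stale context causes a cache miss.
import hashlib
import json

def cache_key(subtask: str, context: str, model: str) -> str:
    payload = json.dumps({"task": subtask, "ctx": context, "model": model},
                         sort_keys=True)
    return "shad:result:" + hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("Build login screen", "OAuth notes v1", "sonnet")
k2 = cache_key("Build login screen", "OAuth notes v2", "sonnet")
print(k1 != k2)  # changed context -> different key -> cache miss
```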
Shad works with minimal configuration. Set optional environment variables in ~/.shad/.env or your shell profile.
# Default collection (so you don't need --collection every time)
SHAD_COLLECTION_PATH=/path/to/your/collection
# Redis for cross-run caching (defaults to localhost:6379)
REDIS_URL=redis://localhost:6379/0
# Budget defaults
DEFAULT_MAX_DEPTH=3
DEFAULT_MAX_NODES=50
DEFAULT_MAX_WALL_TIME=1200
DEFAULT_MAX_TOKENS=2000000| Directory | Purpose |
|---|---|
~/.shad/history/ |
Run artifacts and history |
~/.shad/skills/ |
Skill definitions |
~/.shad/CORE/ |
Core system files |
~/.shad/repo/ |
Installed Shad source |
~/.shad/venv/ |
Python virtual environment |
| | One Collection | Many Collections |
|---|---|---|
| Pros | Single source of truth, cross-topic connections, simpler management | Faster indexing, focused retrieval, easier sharing/permissions |
| Cons | Slower as it grows, noise in retrieval, harder to share subsets | Context fragmentation, can't find cross-collection connections, more overhead |
Use one collection for personal/work knowledge — memory, tasks, notes, projects all interconnected.
Use separate collections for codebases, client deliverables needing isolation, or read-only reference material.
Multi-collection queries search in priority order:
```bash
shad run "Build auth system" --collection ~/Project --collection ~/Patterns --collection ~/Docs
```

- Use consistent frontmatter for better filtering
- Include code examples with context, not just snippets
- Link related notes for better discovery
- Keep notes focused (one concept per note)
- Authoritative sources and worked examples improve output quality
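Priority-ordered multi-collection retrieval can be pictured like this (an illustrative sketch with made-up data, not Shad's implementation):

```python
# Hypothetical sketch: search collections in the order they were passed,
# keeping earlier (higher-priority) hits when filling a result budget.
def search_collections(query, collections, limit=3):
    hits = []
    for name, docs in collections:  # list order == priority order
        for doc in docs:
            if query in doc and len(hits) < limit:
                hits.append((name, doc))
    return hits

collections = [
    ("Project", ["auth flow spec", "ui notes"]),
    ("Patterns", ["auth retry pattern", "auth token pattern"]),
]
print(search_collections("auth", collections, limit=2))
```

With a limit of 2, the Project hit fills the first slot before any Patterns hit is considered, which is the priority behavior the flag order implies.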
All core phases complete:
- Foundation — CLI, API, RLM engine, Redis caching
- qmd migration — hybrid search, multi-collection, no Collection dependency
- Task-aware decomposition — strategy skeletons, soft dependencies
- File output mode — two-pass imports, contracts-first
- Verification layer — progressive strictness, repair loops
- Iterative refinement — HITL checkpoints, delta resume
- Collection curation — ingestion, gap detection
- Sources scheduler — automated sync from GitHub, URLs, feeds, folders
- Multi-provider — Claude CLI, Gemini CLI, Ollama support
- Context command — fast retrieval + synthesis without DAG overhead
- Doctor command — environment health checks with auto-fix
- Performance profiles — fast/balanced/deep presets, auto-profile
See SPEC.md for the technical specification, QMD_PIVOT.md for the qmd migration rationale.
Solve a problem once. Encode it as knowledge. Never solve it again.
Shad compounds your knowledge. Every document you add makes it more capable. The collection is the how — patterns, examples, documentation. Shad is the engine — decomposition, retrieval, generation, verification, assembly.
Together: complex tasks that learn from your accumulated knowledge.
Contributions welcome. See SPEC.md for architecture details before submitting PRs.
MIT