Composable AI task pipelines — Zod for LLM workflows.
Define steps with typed schemas. Compose with loops, branches, maps. Analyze cost before running. Get full execution traces. Same pipeline runs on Claude CLI (free) or API (production).
```bash
npm install claudeflow
```
```typescript
import { step, pipeline, z, ClaudeCliRuntime } from "claudeflow";

const summarize = step("summarize")
  .input(z.object({ url: z.string() }))
  .output(z.object({ title: z.string(), summary: z.string() }))
  .prompt("Summarize this URL: {url}")
  .retry({ maxAttempts: 2 });

const classify = step("classify")
  .output(z.object({ category: z.string(), confidence: z.number() }))
  .prompt("Classify: {summarize.summary}");

const result = await pipeline("digest")
  .step(summarize)
  .step(classify)
  .run({ url: "https://example.com" }, {
    runtime: new ClaudeCliRuntime(),
    verbose: true,
  });
```

Output:
```
[claudeflow] digest
[claudeflow] Runtime: ClaudeCliRuntime
  [1/2] summarize ✓ 3200ms 847 tokens
  [2/2] classify ✓ 1400ms 210 tokens
[claudeflow] ✓ completed in 4602ms
[claudeflow] Tokens: 412 in / 645 out
[claudeflow] Cost: $0.0118
```
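The `.output(...)` schemas above are what make step boundaries safe: the parsed model response must match the declared shape before the next step runs. A standalone sketch of that boundary check in plain TypeScript (claudeflow does this with Zod; the function below is illustrative, not library code):

```typescript
// Illustrative boundary check for the "summarize" step's output shape
// (claudeflow does this with Zod; this is a hand-rolled equivalent).
type SummarizeOutput = { title: string; summary: string };

function checkSummarizeOutput(raw: unknown): SummarizeOutput {
  const obj = raw as { title?: unknown; summary?: unknown } | null;
  if (typeof obj?.title !== "string" || typeof obj?.summary !== "string") {
    throw new Error("step 'summarize' produced an invalid output shape");
  }
  return { title: obj.title, summary: obj.summary };
}
```

A malformed model response fails loudly at the step boundary instead of silently corrupting the downstream prompt.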
Ask Claude to handle a complex multi-step task in a single chat and things fall apart: no persistent state, no control flow, no validation, no observability. The conversation disappears when you're done.
ClaudeFlow replaces throwaway chats with reusable pipeline files:
- Composable — steps, loops, branches, maps
- Typed — Zod schemas validate every step boundary
- Analyzable — predict tokens, cost, and time before running
- Observable — full trace with timing, tokens, cost per step
- Testable — MockRuntime for zero-token development, validate() for static checks
- Portable — same pipeline runs on CLI (free) or API (production)
- Shareable — YAML files checked into git, reviewed in PRs
5 primitives:
| Primitive | What it does |
|---|---|
| `step()` | One LLM call with typed input/output |
| `pipeline()` | Ordered composition of steps |
| `loop()` | Repeat until condition met |
| `branch()` | Route based on predicate |
| `map()` | Run step over each item in array |
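The three control-flow primitives are thin combinators over steps. A conceptual sketch over plain async functions (hypothetical signatures for illustration; claudeflow's real `loop`/`branch`/`map` take step objects and their exact signatures may differ):

```typescript
// Conceptual sketch of the control-flow primitives over plain async steps.
type Step<I, O> = (input: I) => Promise<O>;

// loop(): repeat a step until a predicate on its output is satisfied.
function loop<T>(body: Step<T, T>, done: (v: T) => boolean, maxIters = 10): Step<T, T> {
  return async (input) => {
    let value = input;
    for (let i = 0; i < maxIters && !done(value); i++) {
      value = await body(value);
    }
    return value;
  };
}

// branch(): route the input to one of two steps based on a predicate.
function branch<I, O>(pred: (v: I) => boolean, onTrue: Step<I, O>, onFalse: Step<I, O>): Step<I, O> {
  return async (input) => (pred(input) ? onTrue(input) : onFalse(input));
}

// map(): run a step over each item of an array, collecting the results.
function map<I, O>(item: Step<I, O>): Step<I[], O[]> {
  return async (inputs) => Promise.all(inputs.map((x) => item(x)));
}
```

Because each combinator returns another `Step`, they nest freely: a `map` over a `branch` inside a `loop` is just function composition.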
Define pipelines as shareable config files:
```yaml
name: investigate-bug
steps:
  - id: parse-error
    prompt: "Parse this error: {error_log}"
    output: { module: string, error_type: string }
  - id: diagnose
    prompt: "Find root cause in {parse-error.module}"
    output: { root_cause: string, fix: string }
    retry: { maxAttempts: 3 }
```

```typescript
import { loadYaml, ClaudeCliRuntime } from "claudeflow";

const result = await loadYaml("investigate-bug.yaml").run(
  { error_log: "..." },
  { runtime: new ClaudeCliRuntime() }
);
```

Like a compiler — predict cost and time without burning tokens:
```typescript
import { analyze, formatAnalysis } from "claudeflow";

console.log(formatAnalysis(analyze(myPipeline)));
```

```
Pipeline: digest
Steps: 2 (2 LLM, 0 deterministic)
Token estimate:
  Input:  ~620 (496-930)
  Output: ~90 (45-180)
Cost estimate:
  claude-sonnet-4-6: $0.0033/run
  claude-haiku-4-5:  $0.0009/run
Warnings:
  - Step "classify" has no retry config
```
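The per-model cost lines are plain arithmetic: estimated tokens times per-million-token prices. A sketch with assumed prices (the $3/$15 per million tokens below are an assumption for illustration, not claudeflow's actual price table):

```typescript
// Cost estimate = input tokens * input price + output tokens * output price.
// Prices here are assumptions for illustration, in USD per million tokens.
const PRICE_PER_MTOK: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 },
};

function estimateCostUsd(model: string, inTokens: number, outTokens: number): number {
  const p = PRICE_PER_MTOK[model];
  if (!p) throw new Error(`no price data for ${model}`);
  return (inTokens * p.input + outTokens * p.output) / 1_000_000;
}

// ~620 input and ~90 output tokens, as in the analysis above:
estimateCostUsd("claude-sonnet-4-6", 620, 90); // → 0.00321
```

Since the analyzer works from token *ranges*, the real estimate also carries best- and worst-case bounds, but the per-run figure is this same multiply-and-sum.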
```typescript
import { MockRuntime, validate, benchmark } from "claudeflow";

// Static validation — no execution
const errors = validate(myPipeline);

// Mock runtime — deterministic, instant
const mock = new MockRuntime({
  summarize: { title: "Test", summary: "AI pipelines" },
  classify: { category: "tech", confidence: 0.95 },
});
const result = await myPipeline.run(input, { runtime: mock });

// Benchmark — run N times, get statistics
const stats = await benchmark(myPipeline, input, { runtime: mock, runs: 50 });
// → { successRate: 1.0, duration: { p50Ms: 2, p95Ms: 3 }, ... }
```

ClaudeFlow is built using ClaudeFlow. The self-audit pipeline (`pipelines/self-audit.yaml`) found 3 improvements in its own codebase, which were then applied (PR #14). Execution traces are stored in `traces/` as proof.
`traces/self-audit-2026-04-12.json`:

```
Status:   completed
Duration: 120.9s
Tokens:   1,681 in / 4,253 out
Cost:     $0.48
Steps:    3/3 completed
```
```
src/core/     → Step, Pipeline, Context, Schema
src/control/  → Loop, Branch, Map + shared resolve helper
src/runtime/  → ClaudeCliRuntime, ClaudeApiRuntime, MockRuntime
src/analyzer/ → Token/cost/time prediction
src/loader/   → YAML parser, prompt interpolation
src/testing/  → validate(), benchmark()
```
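The loader's prompt interpolation (`{url}`, `{parse-error.module}`) is template substitution over pipeline inputs and prior step outputs. A standalone sketch of the dotted `{step.field}` form (illustrative only; the real loader also resolves bare input keys like `{error_log}`):

```typescript
// Sketch of {step.field} placeholder resolution (not the loader's real code).
// ctx maps step ids to their validated outputs.
function interpolate(template: string, ctx: Record<string, Record<string, string>>): string {
  return template.replace(/\{([\w-]+)\.([\w-]+)\}/g, (match: string, stepId: string, field: string) => {
    const value = ctx[stepId]?.[field];
    if (value === undefined) throw new Error(`unresolved placeholder ${match}`);
    return value;
  });
}

// interpolate("Find root cause in {parse-error.module}",
//             { "parse-error": { module: "auth" } })
// → "Find root cause in auth"
```

Failing hard on an unresolved placeholder (rather than sending `{parse-error.module}` to the model verbatim) is what lets `validate()` catch wiring mistakes before any tokens are spent.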
- Cookbook — 17 practical recipes: code review, bug fixing, overnight optimization, multi-agent crews, paper review, checkpointing, and more
- Pipeline Ideas — 20+ real-world pipeline templates for every use case
- Benchmark — ClaudeFlow vs raw Claude CLI comparison with real data
- CLAUDE.md — Instructions for AI agents working on this codebase
- AGENTS.md — Universal agent instructions (works with Cursor, Copilot, etc.)
MIT