From 0bafed133b1de2636f7a7e15c6a47fa90cd14e5b Mon Sep 17 00:00:00 2001 From: Lisa Date: Thu, 9 Apr 2026 15:46:44 +0200 Subject: [PATCH] docs: update README, CLAUDE.md, and add CHANGELOG for v1.4 and v1.5 --- CHANGELOG.md | 149 ++++++ CLAUDE.md | 43 +- README.md | 1329 +++++++++++++++++++++++++++----------------------- 3 files changed, 897 insertions(+), 624 deletions(-) create mode 100644 CHANGELOG.md diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..d35a180 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,149 @@ +# Changelog + +All notable changes to SwarmWire are documented here. + +--- + +## [1.5.0] — 2026-04-09 + +### New Features + +**Execution** +- `reduceTrajectory` — AgentDiet-style trajectory pruning (drop empty/duplicate/superseded tool results, token budget trim). 39-60% input token reduction. +- `SpeculativeToolExecutor` — PASTE-inspired prefetch of likely tool calls in parallel while the LLM generates. +- `createReducedSkillSet` / `selectRelevantTools` — Progressive skill disclosure: compact one-liners first, full schemas on demand. ~48% prompt compression. + +**Memory** +- `AMem` — A-MEM Zettelkasten living memory graph. On every write, notes auto-link to related memories via cosine similarity. +- `TemporalMemory` — CMA temporal decay + spreading activation. Strength decays per-hour, reinforces on access, propagates relevance to temporal neighbors. +- `SelfEditingMemory` — Letta/MemGPT-style named memory blocks. Agents read and mutate versioned text blocks mid-execution. Full edit history + revert. +- `createFlatVectorStore` / `createPineconeStore` / `createQdrantStore` / `createRedisVectorStore` — External vector store adapters, all implementing `MemoryBackend`. +- `SleepTimeAgent` — LLM-driven background consolidation. Synthesizes insights from recent memory during idle periods. + +**Core** +- `ReputationBoard` — `MessageBoard` extended with per-agent reputation scoring. Upvotes, citations, correct answers drive scores. Findings weighted by sender reputation. +- Typed DI — `AgentContext` and `AgentDefinition`. Agents declare typed dependencies; `context.deps` is fully typed at callsite. + +**Testing & Evaluation** +- `evalTrajectory` / `compareTrajectories` — TRACE-style multi-dimension trajectory evaluation: step efficiency, tool precision, backtrack rate, plan adherence, outcome quality. + +**Workflow** +- `StateMachine` / `buildLinearStateMachine` — LangGraph-style directed graph with cycles, conditional edges, and `maxIterations` guard. + +**Patterns** +- `runLoop` — LoopAgent primitive. Runs an agent iteratively until convergence (`shouldStop` predicate, DONE signal, or `maxIterations`). Full iteration history. + +**Session** +- `BranchManager` — Fork a session at any message index to explore alternative continuations. Diff, merge, and tree visualization. + +**Observability** +- `exportToOTLP` / `createOTelExporter` / `withOTelExport` — Auto-push traces to any OTLP/HTTP endpoint (Jaeger, Tempo, Honeycomb, OTEL Collector) after execution. + +--- + +## [1.4.0] — 2026-04-08 + +### New Features + +**Execution** +- `TimeTravelStore` — Rewind to any step and fork execution from that point with optional step modifications. +- `RollbackManager` — Snapshot state before tool calls; undo individual or full-execution actions in reverse order. + +**Optimizer** +- `PromptOptimizer` — DSPy-style prompt optimization. Bootstraps few-shot examples from `DistillationCollector`, generates prompt variants via LLM, scores against training pairs. + +**Testing** +- `EvalHarness` — Named harnesses with run history, pass-rate tracking, and regression detection. Computes trend (`improving` / `stable` / `degrading`) from last 3 runs. + +**Tools** +- `createNodeSandbox` / `createDockerSandbox` / `createE2BSandbox` — Code execution sandbox with three backends. Returns a `Tool` for `agent.tools[]`. +- `createBrowserTool` — Playwright-backed browser automation tool (navigate, click, type, screenshot, extract). +- `createComputerUseTool` — Anthropic Computer Use API tool wrapper. + +**Patterns** +- `runHierarchy` — Formal authority levels with escalation. Low-confidence outputs escalate to higher-authority agents. + +**Session** +- `SessionManager` — Named persistent conversation sessions. `swarm.runInSession()` prepends prior context automatically. + +**Workflow** +- `EventFlow` — Event-driven workflow runtime. Steps subscribe to events, emit new ones; execution is queue-driven rather than DAG-fixed. + +**Memory** +- `EpisodicMemory` — Stores specific past interactions with temporal ordering and tag-based recall. +- `ProceduralMemory` — Stores "how to" procedures with success rate tracking. + +**A2A Protocol — v1.0** +- `kind: 'task'` on `A2ATask`, `ContextId` type alias for cross-task threading. +- `AgentCard.offline?`, `A2ATaskState` gains `'streaming'`, `A2AMessage` gains `messageId` and `contextId`. +- `tasks/sendSubscribe` JSON-RPC method for SSE push. +- `streamSubscribe()` client function. +- Default `protocolVersion` bumped to `'1.0'`. + +**Catalog** +- `AgentCatalog` — Runtime agent discovery by capability, tag, availability, or semantic description. Heartbeat-based liveness. + +**Voice** +- `VoicePipeline` — STT → LLM → TTS pipeline. Factory methods for Deepgram, ElevenLabs, OpenAI STT/TTS. + +--- + +## [1.3.0] — prior + +### New Features +- Hooks system (`HookRegistry`, priority-ordered hooks, swarm event bridging) +- Consensus protocols (`RaftNode`, `ByzantineNode`, `GossipNode`) +- Hive-Mind pattern (`runHiveMind`) +- Federation hub (`FederationHub`) +- `ReasoningBank` — trajectory-based pattern memory with EWC +- Vector quantization (`createQuantizer` — binary, scalar, product) +- `AttentionRouter` — multi-head attention-based agent routing +- `RLRouter` / `RLRouterPPO` — reinforcement learning routers + +--- + +## [1.2.0] — prior + +### New Features +- A/B testing engine +- Judge agent for quality evaluation +- Weight table for dynamic routing +- Distillation collector for training pairs (LLMRouter) + +--- + +## [1.1.0] — prior + +### New Features +- Self-learning memory with Elastic Weight Consolidation (EWC) +- Vector memory with HNSW-like approximate nearest neighbor search +- 3-tier intelligent model routing +- Token optimizer (pattern caching, prompt compression, batch optimization) +- Knowledge graph with PageRank-based importance +- Background worker system (memory optimizer, pattern learner, metrics, health check) +- Threat detection system (SQL/command/XSS injection, path traversal, secrets, PII) +- ADR framework for spec-driven development +- 10 new agent templates (17 total) + +--- + +## [1.0.0] — initial release + +- Budget-first multi-agent orchestration +- Orchestrator-worker, pipeline, map-reduce, debate, blackboard, fan-out patterns +- Anthropic, OpenAI, Gemini, Ollama, generic OpenAI-compatible providers +- Circuit breaker, failover, rate limiter +- Routing stack: SemanticCache, LatencyRouter, CascadeRouter, SpeculativeCascade, QueryDecomposer +- MCP tool loading +- A2A v0.3 protocol +- Record/Replay testing, evals framework +- Guardrails (PII, injection, hallucination, length, content filter) +- Output contracts (schema + semantic validation) +- Approval gates +- YAML workflow compiler +- Dry-run cost projection +- Differential execution (skip unchanged steps) +- SSE streaming transport +- OpenTelemetry export +- Plugin system +- 7 agent templates diff --git a/CLAUDE.md b/CLAUDE.md index 7f67194..9cac89d 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -24,29 +24,36 @@ npm run clean # Remove dist/ | Module | Path | Purpose | |--------|------|---------| | **Types** | `src/types/` | All TypeScript interfaces — agent, budget, plan, task, execution, provider, tool, memory, pattern | -| **Core** | `src/core/` | `Swarm` class, `createAgent()`, MCP tool loader, `MessageBoard` (inter-agent messaging), `stub-board` (no-op board for patterns), guardrails (input/output/tool guards with built-ins: PII, injection, hallucination, max-length, content-filter), output contracts (schema + semantic validation) | +| **Core** | `src/core/` | `Swarm` class, `createAgent()`, MCP tool loader, `MessageBoard`, `ReputationBoard` (reputation-weighted board), `stub-board`, guardrails, output contracts | | **Budget** | `src/budget/` | `BudgetLedger` (hard enforcement), cost optimizer | -| **Planner** | `src/planner/` | Task scorer, DAG builder, model router, adaptive router, cascade router, semantic cache, speculative cascade, query decomposer, latency router, **3-tier intelligent routing** | -| **Executor** | `src/executor/` | Parallel DAG runner, checkpoint/resume, dry-run cost projection, differential execution | -| **Patterns** | `src/patterns/` | Orchestrator-worker, pipeline, map-reduce, debate, blackboard | -| **Providers** | `src/providers/` | Anthropic, OpenAI, Gemini, Ollama adapters, generic OpenAI-compatible (LiteLLM/vLLM), circuit breaker, failover, rate limiter, model cascade on quality | +| **Planner** | `src/planner/` | Task scorer, DAG builder, model router, adaptive router, cascade router, semantic cache, speculative cascade, query decomposer, latency router, attention router, RL router, 3-tier routing | +| **Executor** | `src/executor/` | Parallel DAG runner, checkpoint/resume, dry-run, differential execution, time-travel debugging, rollback manager, trajectory reducer (AgentDiet), speculative tool executor (PASTE) | +| **Patterns** | `src/patterns/` | Orchestrator-worker, pipeline, map-reduce, debate, blackboard, fan-out, hive-mind, hierarchy, loop-agent | +| **Providers** | `src/providers/` | Anthropic, OpenAI, Gemini, Ollama, generic OpenAI-compatible (LiteLLM/vLLM), circuit breaker, failover, rate limiter, model cascade on quality | | **Conflict** | `src/conflict/` | Contradiction detector (Jaccard/structural), resolver (vote/evidence/escalate) | | **Context** | `src/context/` | Token-budget-aware context packer | -| **A2A** | `src/a2a/` | Agent2Agent protocol — server, client, agent cards | +| **A2A** | `src/a2a/` | Agent2Agent protocol v1.0 — server, client, agent cards, contextId, streaming state, `tasks/sendSubscribe` | | **Pool** | `src/pool/` | Worker pool with lifecycle, concurrency, warm pooling | -| **Trace** | `src/trace/` | Human-readable execution reports, DAG visualization | -| **Workflow** | `src/workflow/` | YAML workflow parser + compiler to executable Plans | -| **Templates** | `src/templates/` | 17 pre-built agent templates (researcher, code-reviewer, synthesizer, data-analyst, qa-tester, writer, planner, security-auditor, devops-engineer, database-engineer, api-designer, performance-engineer, documentation-specialist, architecture-advisor, debugger, refactoring-specialist, integration-specialist, test-automation-engineer) | +| **Trace** | `src/trace/` | Human-readable execution reports, DAG visualization, OTel export (`toOTelSpans`, `toOTLPJson`), OTel auto-exporter (OTLP push) | +| **Workflow** | `src/workflow/` | YAML workflow parser + compiler, event-driven workflows (`EventFlow`), graph state machine (`StateMachine`) | +| **Templates** | `src/templates/` | 17 pre-built agent templates | | **Adapters** | `src/adapters/` | Claude Agent SDK wrapper | -| **Orchestrator** | `src/orchestrator/` | Evolving orchestrator (bandit-based adaptive sequencing), **A/B testing engine**, **Judge agent for quality evaluation**, **Weight table for dynamic routing**, **Distillation collector for training pairs** | +| **Orchestrator** | `src/orchestrator/` | Evolving orchestrator, A/B testing, judge agent, weight table, distillation collector | | **Persistence** | `src/persistence/` | Save/load state to disk or memory backend | -| **Memory** | `src/memory/` | ANCS memory backend, **self-learning memory with EWC**, **vector memory with HNSW-like search** | -| **Testing** | `src/testing/` | `RecordingProvider` (wraps real provider, saves fixtures), `ReplayProvider` (loads fixtures, zero-cost deterministic replay), evals framework | -| **Optimizer** | `src/optimizer/` | Token optimizer with pattern caching, compression, and batch optimization | -| **Workers** | `src/workers/` | Background worker system for continuous optimization (memory optimizer, pattern learner, metrics collector, cache cleanup, health check) | -| **Security** | `src/security/` | Threat detection system (SQL/command/XSS injection, path traversal, hardcoded secrets, prompt injection, PII detection) | -| **Spec** | `src/spec/` | Architecture Decision Records (ADRs) framework for spec-driven development | -| **Graph** | `src/graph/` | Knowledge graph with PageRank-based importance, graph-enhanced ranked retrieval | +| **Memory** | `src/memory/` | ANCS, self-learning (EWC), vector (HNSW-like), A-MEM (Zettelkasten graph), temporal (CMA decay), self-editing blocks (Letta), episodic, procedural, external vector store adapters (Pinecone/Qdrant/Redis/flat) | +| **Session** | `src/session/` | Named persistent sessions, `SessionManager`, conversation branching (`BranchManager`) | +| **Testing** | `src/testing/` | `RecordingProvider`, `ReplayProvider`, evals framework, `EvalHarness` (run history + regression), trajectory evaluation (TRACE) | +| **Optimizer** | `src/optimizer/` | Token optimizer, prompt optimizer (DSPy-style) | +| **Workers** | `src/workers/` | Background workers (memory optimizer, pattern learner, metrics, health check), sleep-time compute agent | +| **Security** | `src/security/` | Threat detection (SQL/command/XSS injection, path traversal, secrets, prompt injection, PII) | +| **Tools** | `src/tools/` | Code execution sandbox (Node vm / Docker / E2B), browser tool (Playwright), computer use (Anthropic), skill reducer (progressive disclosure) | +| **Voice** | `src/voice/` | Voice agent pipeline (STT → LLM → TTS), Deepgram/ElevenLabs/OpenAI providers | +| **Catalog** | `src/catalog/` | Agent discovery catalog with semantic search | +| **Hooks** | `src/hooks/` | Hook registry, priority-ordered hooks, swarm event bridging | +| **Consensus** | `src/consensus/` | Raft, Byzantine fault-tolerant, Gossip consensus | +| **Federation** | `src/federation/` | Multi-swarm federation hub | +| **Spec** | `src/spec/` | Architecture Decision Records (ADRs) | +| **Graph** | `src/graph/` | Knowledge graph with PageRank, graph-enhanced retrieval | ### Data Flow @@ -130,7 +137,7 @@ Task → Scorer → Planner (DAG) → Executor - All tests in `tests/unit/` — pure unit tests, no external services - Tests use mock providers that return canned responses -- 29 test files, 265 tests +- 71 test files, 621 tests - Run with `npm test` ## Peer Dependencies diff --git a/README.md b/README.md index ebf8721..e896bcb 100644 --- a/README.md +++ b/README.md @@ -18,10 +18,6 @@ npm install swarmwire ## Quick Start -```bash -npm install swarmwire -``` - ```typescript import { Swarm, createProvider } from 'swarmwire' @@ -32,7 +28,6 @@ const swarm = new Swarm({ budget: { maxCostCents: 100 }, }) -// Create agents const researcher = swarm.agent({ name: 'researcher', role: 'Research topics thoroughly', @@ -45,7 +40,6 @@ const writer = swarm.agent({ model: { provider: 'anthropic', model: 'claude-sonnet-4-6-20260320' }, }) -// Run const result = await swarm.run('Research TypeScript ORMs and recommend one', { pattern: 'pipeline', stages: [ @@ -66,170 +60,14 @@ console.log(`Cost: ${result.cost.totalCostCents}c | Tokens: ${result.cost.totalT | Problem | SwarmWire's Answer | |---------|-------------------| | **Token bleeding** — AutoGen/CrewAI loops burn money silently | Budget is a hard constraint. Structurally impossible to exceed. | -| **Ceiling trap** — easy frameworks (CrewAI) can't scale, powerful ones (LangGraph) are complex from day one | Progressive disclosure: one-liner to full DAG control. | +| **Ceiling trap** — easy frameworks can't scale, powerful ones are complex from day one | Progressive disclosure: one-liner to full DAG control. | | **Framework lock-in** — Mastra/LangGraph own your app | Library. You call it. No lifecycle hooks, no app structure. | | **No TypeScript** — most frameworks are Python-first | TypeScript-native. Not a port. | -| **Stateless agents** — every run starts from zero | Pluggable memory backends. ANCS for persistent cognitive memory. | +| **Stateless agents** — every run starts from zero | Pluggable memory backends with six memory architectures. | | **No cost visibility** — no framework tracks cost as a first-class metric | Per-agent, per-provider, per-step cost breakdown on every execution. | --- -## Intelligent Capabilities - -### Self-Learning Memory -Pattern-based learning with Elastic Weight Consolidation (EWC) to prevent catastrophic forgetting. - -```typescript -import { createSelfLearningMemory } from 'swarmwire' - -const memory = createSelfLearningMemory({ - backend: ancsMemory({ url: 'http://localhost:3000' }), - learningRate: 0.1, - ewcStrength: 0.9, -}) -``` - -### Vector Memory -HNSW-like approximate nearest neighbor search for semantic retrieval. - -```typescript -import { createVectorMemory, mockEmbeddingFunction } from 'swarmwire' - -const vectorMem = createVectorMemory({ - embedFn: mockEmbeddingFunction, // Or connect to actual embedding model - efSearch: 10, -}) -``` - -### 3-Tier Model Routing -Automatically routes tasks to appropriate model complexity (cheap/standard/premium). - -```typescript -import { routeTaskToModel, defaultModelRoutingConfig } from 'swarmwire' - -const decision = routeTaskToModel(task, availableModels, defaultModelRoutingConfig) -// decision.tier, decision.estimatedCostCents, decision.estimatedLatencyMs -``` - -### Token Optimization -Pattern caching, prompt compression, optimal batching (30-50% token savings). - -```typescript -import { createTokenOptimizer } from 'swarmwire' - -const optimizer = createTokenOptimizer({ memoryBackend }) -const { context, tokensSaved } = await optimizer.getCompactContext(query) -const { optimized, tokensSaved } = await optimizer.optimizePrompt(prompt) -``` - -### Knowledge Graph -PageRank-based importance calculation, graph-enhanced ranked retrieval. - -```typescript -import { createKnowledgeGraph } from 'swarmwire' - -const graph = createKnowledgeGraph() -graph.addNode('task-1', 'API design', 'task') -graph.addNode('agent-1', 'API designer', 'agent') -graph.addEdge('task-1', 'agent-1', 'executed_by', 1.0) -graph.calculatePageRank() -``` - ---- - -## Enterprise-Grade Security - -### Threat Detection -SQL injection, command injection, XSS, path traversal, hardcoded secrets. - -```typescript -import { createThreatDetector, defaultThreatConfig } from 'swarmwire' - -const detector = createThreatDetector({ - checkSqlInjection: true, - checkCommandInjection: true, - autoSanitize: true, -}) - -const result = detector.scan(userInput) -// result.level: 'safe' | 'warning' | 'threat' -// result.detectedPatterns[] -``` - -### PII Detection -Email, SSN, phone, credit card, IP address detection. - -```typescript -const piiFindings = detector.detectPII(input) -// [{ type: 'email', value: 'user@example.com' }, ...] -``` - ---- - -## Spec-Driven Development - -### ADR Framework -Architecture Decision Records with Markdown serialization. - -```typescript -import { createADRFramework, COMMON_ADRS } from 'swarmwire' - -const adr = createADRFramework() -adr.create(COMMON_ADRS.modelRouting()) -adr.get('ADR-002') -adr.checkCompliance('ADR-002', codeString) -``` - ---- - -## Background Workers - -Continuous optimization (memory consolidation, pattern learning, metrics, health checks). - -```typescript -import { createWorkerSystem, createMemoryOptimizationWorker } from 'swarmwire' - -const workers = createWorkerSystem({ memoryBackend }) -workers.registerWorker(...createMemoryOptimizationWorker()) -workers.startAll() -``` - ---- - -## Agent Templates (17 Specialized Agents) - -Ready-to-use agents: - -```typescript -import { Swarm, templates } from 'swarmwire' - -const swarm = new Swarm({ providers }) - -// 7 original templates -const researcher = swarm.agent(templates.researcher()) -const reviewer = swarm.agent(templates.codeReviewer()) -const synth = swarm.agent(templates.synthesizer()) -const analyst = swarm.agent(templates.dataAnalyst()) -const tester = swarm.agent(templates.qaTester()) -const writer = swarm.agent(templates.writer()) -const planner = swarm.agent(templates.planner()) - -// 10 new specialized agents -const security = swarm.agent(templates.securityAuditor()) -const devops = swarm.agent(templates.devopsEngineer()) -const dbEngineer = swarm.agent(templates.databaseEngineer()) -const apiDesigner= swarm.agent(templates.apiDesigner()) -const perfEngine = swarm.agent(templates.performanceEngineer()) -const docs = swarm.agent(templates.documentationSpecialist()) -const architect = swarm.agent(templates.architectureAdvisor()) -const debugger = swarm.agent(templates.debugger()) -const refactor = swarm.agent(templates.refactoringSpecialist()) -const integration= swarm.agent(templates.integrationSpecialist()) -const testAuto = swarm.agent(templates.testAutomationEngineer()) -``` - ---- - ## Orchestration Patterns ### Orchestrator-Worker (default) @@ -272,7 +110,6 @@ Agents argue positions, a judge resolves. import { runDebate } from 'swarmwire' await runDebate(task, { - pattern: 'debate', proponents: [optimist, pessimist], judge: judgeAgent, rounds: 3, @@ -286,7 +123,6 @@ Shared state space with iterative refinement. import { runBlackboard } from 'swarmwire' await runBlackboard(task, { - pattern: 'blackboard', agents: [dataAgent, modelAgent, vizAgent], rounds: 5, convergence: (state) => state.merged.qualityScore > 0.9, @@ -294,20 +130,107 @@ await runBlackboard(task, { ``` ### Fan-Out -Same input, N agents, all parallel. Promise.allSettled for agents. +Same input, N agents, all parallel. ```typescript import { runFanOut } from 'swarmwire' const result = await runFanOut(task, { agents: [reviewer1, reviewer2, reviewer3], input: codeToReview, - optional: true, // individual failures don't kill the batch + optional: true, +}, providers, budget) +``` + +### Hive-Mind +Specialized agents on different knowledge domains, coordinated by a master orchestrator. +```typescript +import { runHiveMind } from 'swarmwire' + +await runHiveMind(task, { + domains: [ + { name: 'frontend', agents: [uiAgent, cssAgent] }, + { name: 'backend', agents: [apiAgent, dbAgent] }, + ], + orchestrator: plannerAgent, }, providers, budget) -// result.output = [output1, output2, output3] +``` + +### Hierarchy +Formal authority levels with override semantics. Low-confidence outputs escalate to higher authority. +```typescript +import { runHierarchy } from 'swarmwire' + +await runHierarchy(task, { + levels: [ + { name: 'workers', authority: 3, agents: [worker1, worker2] }, + { name: 'manager', authority: 2, agents: [managerAgent] }, + { name: 'executive', authority: 1, agents: [executiveAgent] }, + ], + escalationThreshold: 0.6, + maxEscalations: 2, +}, providers, budget) +``` + +### Loop Agent +Iterative self-improvement — run an agent repeatedly until convergence. +```typescript +import { runLoop } from 'swarmwire' + +const result = await runLoop(draft, { + agent: refinerAgent, + provider, + model, + maxIterations: 5, + shouldStop: (output, iter) => qualityScore(output) > 0.9, + refine: (output) => `Improve this further:\n${output}`, +}) +// result.converged, result.iterations, result.history +``` + +### Graph State Machine +LangGraph-style directed graph with cycles, branching, and conditional routing. +```typescript +import { StateMachine, END } from 'swarmwire' + +const machine = new StateMachine({ + nodes: [fetchNode, parseNode, validateNode], + edges: [ + { from: 'fetch', to: 'parse' }, + { from: 'parse', to: (state) => state.valid ? 'validate' : 'fetch' }, + { from: 'validate', to: END }, + ], + entryNode: 'fetch', + maxIterations: 20, +}) + +const result = await machine.run(initialState) +// result.finalState, result.iterations, result.terminated +``` + +### Event-Driven Workflows +Runtime-emergent topology — steps subscribe to events rather than a fixed DAG. +```typescript +import { EventFlow } from 'swarmwire' + +const flow = new EventFlow({ + steps: [ + { + name: 'processor', + handles: ['data.ready'], + handler: async (event, ctx) => { + const result = await process(event.payload) + ctx.emit('data.processed', result) + return null + }, + }, + ], +}) + +await flow.run([{ type: 'data.ready', payload: rawData, timestamp: Date.now() }]) ``` ### Evolving Orchestration -Adaptive agent sequencing that learns from execution traces. +Adaptive agent sequencing that learns from execution traces via bandit algorithm. ```typescript import { EvolvingOrchestrator } from 'swarmwire' @@ -328,222 +251,482 @@ Every operation has a budget. Hard limits, not advisory. ```typescript const result = await swarm.run(task, { budget: { - maxTokens: 100_000, // Hard token cap - maxCostCents: 150, // Hard cost cap - maxLatencyMs: 30_000, // Wall-clock deadline - maxAgents: 5, // Concurrency cap - warningAt: 0.8, // Fire event at 80% + maxTokens: 100_000, + maxCostCents: 150, + maxLatencyMs: 30_000, + maxAgents: 5, + warningAt: 0.8, }, }) -// Detailed cost breakdown result.cost.perAgent // Map result.cost.perProvider // Map result.cost.budgetUsed // 0-1 fraction consumed +result.cost.savings // { promptCachingCents, tierRoutingCents, earlyStopCents } ``` If budget is exhausted mid-execution: running steps complete, no new steps start, best-effort partial result returned. --- -## Agent Templates +## Memory -Ready-to-use agents with sensible defaults: +Six memory architectures, all implementing the same `MemoryBackend` interface. +### ANCS (Cognitive Vault) +Persistent cognitive memory with truth tracking and importance decay. ```typescript -import { Swarm, templates } from 'swarmwire' +import { Swarm, ancsMemory } from 'swarmwire' -const swarm = new Swarm({ providers }) +const swarm = new Swarm({ + providers, + memory: ancsMemory({ url: 'http://localhost:3100', tenantId: 'my-project' }), +}) +``` -const researcher = swarm.agent(templates.researcher()) -const reviewer = swarm.agent(templates.codeReviewer()) -const synth = swarm.agent(templates.synthesizer()) -const analyst = swarm.agent(templates.dataAnalyst()) -const tester = swarm.agent(templates.qaTester()) -const writer = swarm.agent(templates.writer()) -const planner = swarm.agent(templates.planner()) +### Self-Learning Memory (EWC) +Pattern-based learning with Elastic Weight Consolidation to prevent catastrophic forgetting. +```typescript +import { createSelfLearningMemory } from 'swarmwire' -// Override anything -const cheapResearcher = swarm.agent(templates.researcher({ - modelTier: 'cheap', - maxCostCents: 5, -})) +const memory = createSelfLearningMemory({ + backend: ancsMemory({ url: 'http://localhost:3000' }), + learningRate: 0.1, + ewcStrength: 0.9, +}) ``` ---- +### Vector Memory (HNSW-like) +Semantic retrieval with approximate nearest neighbor search. +```typescript +import { createVectorMemory } from 'swarmwire' -## YAML Workflows +const vectorMem = createVectorMemory({ embedFn: myEmbedFn, efSearch: 10 }) +``` -CI/CD-style workflow definitions: +### A-MEM (Zettelkasten Graph) +Living memory graph — on every write, notes link to related memories automatically. Inspired by the A-MEM paper. +```typescript +import { AMem } from 'swarmwire' -```yaml -name: research-and-summarize -version: 1.0.0 +const mem = new AMem({ linkThreshold: 0.4, maxLinks: 10 }) +await mem.store('session1', 'TypeScript generics allow type-safe abstractions', {}) +await mem.store('session2', 'Generic constraints enforce type relationships', {}) -inputs: - topic: string - depth: string +const results = await mem.query('type-safe generics') +const graph = mem.getGraph() // full adjacency list +const linked = mem.getLinked(id) // notes linked to a given note +``` -steps: - - id: research - type: llm - agent: researcher - prompt: "Research: {{ inputs.topic }}" +### Temporal Memory (CMA) +Memories decay over time and reinforce on access. Spreading activation propagates relevance to temporal neighbors. +```typescript +import { TemporalMemory } from 'swarmwire' + +const mem = new TemporalMemory({ + decayRatePerHour: 0.02, + accessReinforcement: 0.1, + evictionThreshold: 0.05, + temporalWindowSize: 3, + activationDepth: 2, +}) - - id: summarize - type: llm - agent: writer - prompt: "Summarize findings about {{ inputs.topic }}" - dependencies: [research] +await mem.store('k1', 'important finding about auth', {}) +const results = await mem.query('authentication') +mem.consolidate() // evict weak memories, update strengths +mem.stats() // { noteCount, avgStrength, oldestMs } ``` +### Self-Editing Memory Blocks (Letta-style) +Named, versioned memory blocks that agents can read and mutate mid-execution. Inject via prompt. ```typescript -import { parseWorkflow, compileWorkflow } from 'swarmwire' +import { SelfEditingMemory } from 'swarmwire' -const workflow = parseWorkflow(yamlString) -const plan = compileWorkflow(workflow, { - agents: new Map([['researcher', researcher], ['writer', writer]]), - inputs: { topic: 'TypeScript ORMs', depth: 'thorough' }, +const mem = new SelfEditingMemory({ + blocks: [ + { name: 'persona', content: 'You are a helpful assistant.', maxChars: 500 }, + { name: 'goal', content: '', maxChars: 1000 }, + ], }) -const result = await swarm.execute(plan) +// In agent execute(): +mem.write('goal', 'Help user debug their TypeScript project') +mem.append('goal', '\nFocus on type errors first.') +mem.patch('goal', 'TypeScript', 'TS') +mem.revert('goal', 1) // go back to version 1 + +const ctx = mem.toContextString() // inject into agent prompt +mem.getHistory('goal') // full edit log ``` ---- +### Episodic Memory +Stores specific past interactions with temporal ordering and tag-based recall. +```typescript +import { EpisodicMemory } from 'swarmwire' + +const mem = new EpisodicMemory({ maxEntries: 1000 }) +await mem.record({ + sessionId: 'sess-1', + timestamp: Date.now(), + description: 'User asked about deployment', + input: userMessage, + output: agentResponse, + success: true, + durationMs: 1200, + costCents: 3.5, + tags: ['deployment', 'infra'], +}) -## Plan → Inspect → Execute +const relevant = await mem.recall('kubernetes deployment', { limit: 5, tags: ['infra'] }) +``` -Don't run blind. Preview the plan, modify it, then execute. +### Procedural Memory +Stores "how to" execution procedures with success tracking. +```typescript +import { ProceduralMemory } from 'swarmwire' + +const mem = new ProceduralMemory() +await mem.learn({ + name: 'debug-typescript-error', + goal: 'Fix TypeScript compilation errors', + steps: [ + { order: 1, action: 'Run tsc --noEmit', output: 'error list' }, + { order: 2, action: 'Read each error and identify the root cause' }, + { order: 3, action: 'Apply fixes starting with the most impactful' }, + ], + tags: ['typescript', 'debugging'], +}) +const procs = await mem.recallFor('fix typescript errors') +await mem.recordOutcome(procs[0].id, true) // track success rate +``` + +### External Vector Stores +Production vector database adapters — Pinecone, Qdrant, Redis, or in-process flat store. ```typescript -const plan = await swarm.plan('Analyze our auth architecture') +import { + createFlatVectorStore, // no deps, always available + createPineconeStore, // peer dep: @pinecone-database/pinecone + createQdrantStore, // peer dep: @qdrant/js-client-rest + createRedisVectorStore, // peer dep: redis +} from 'swarmwire' -console.log(plan.estimatedCost) // Preview cost -console.log(visualizePlan(plan)) // ASCII DAG +const store = createPineconeStore({ + apiKey: process.env.PINECONE_API_KEY!, + indexName: 'my-index', + dimension: 1536, + embedFn: openAIEmbed, +}) -plan.steps[1].agent = alternateAgent // Swap an agent -plan.steps[2].optional = true // Make a step optional +// All adapters implement MemoryBackend +await store.store('doc-1', 'content here', { tags: ['research'] }) +const results = await store.query('semantic search query', { maxItems: 5 }) +``` -const result = await swarm.execute(plan) -console.log(explainExecution(result)) // Full human-readable report -console.log(summarizeExecution(result)) // One-line summary +### Sleep-Time Compute Agent +LLM-driven background consolidation. During idle periods, synthesizes insights from recent memory. +```typescript +import { SleepTimeAgent } from 'swarmwire' + +const agent = new SleepTimeAgent({ + memory, + provider, + model: { model: 'claude-haiku-4-5-20251001' }, + reviewWindow: 20, + evictWeak: true, +}) + +// Manual pass +const result = await agent.consolidate('recent agent activity') +// result.insightsExtracted, result.itemsForgotten, result.insights + +// Periodic background +agent.start(60_000) // consolidate every minute +agent.stop() ``` --- -## Provider Infrastructure +## Typed Dependency Injection + +Agents declare what services they need; `context.deps` is fully typed. -### Multi-Provider with Failover ```typescript -import { createProvider, withCircuitBreaker, withFailover, withRateLimit } from 'swarmwire' +import { createAgent } from 'swarmwire' -const anthropic = withRateLimit( - withCircuitBreaker(createProvider('anthropic', { apiKey: '...' })), - { requestsPerMinute: 50 }, -) +interface Deps { + db: Database + featureFlags: { newUi: boolean } +} -const openai = withRateLimit( - withCircuitBreaker(createProvider('openai', { apiKey: '...' })), - { requestsPerMinute: 60 }, -) +const agent = createAgent({ + name: 'data-agent', + role: 'Query and summarize data', + deps: { db: myDatabase, featureFlags: flags }, + execute: async (input, ctx) => { + const rows = await ctx.deps.db.query(input) // fully typed + return summarize(rows) + }, +}) +``` -// Automatic failover when primary circuit trips -const provider = withFailover([anthropic, openai]) +--- + +## Session Management + +### Named Sessions +Persistent conversation threads across multiple `swarm.run()` calls. +```typescript +import { SessionManager } from 'swarmwire' + +const sessions = new SessionManager({ maxMessages: 20 }) +const session = sessions.create('support-ticket-42') + +const swarm = new Swarm({ providers, sessions }) + +// Context from prior turns is prepended automatically +const result = await swarm.runInSession(session.id, 'What was my last question?') +sessions.getContext(session.id) // formatted history string ``` -### Cost Optimization +### Conversation Branching +Fork a session at any message to explore alternative continuations. ```typescript -import { analyzeCosts } from 'swarmwire' +import { BranchManager } from 'swarmwire' + +const branches = new BranchManager() + +// Fork after message index 3 +const branchA = branches.fork(session, 3, 'alternative-approach') +const branchB = branches.fork(session, 3, 'simpler-approach') + +branches.appendMessage(branchA.id, { role: 'user', content: 'Try approach A', timestamp: Date.now() }) + +// Compare divergence +const { onlyInA, onlyInB } = branches.diff(branchA.id, branchB.id) -const recommendations = analyzeCosts(result) -// [ -// { type: 'tier_downgrade', description: 'Agent "researcher" used 500 tokens but cost 15c...', estimatedSavingsCents: 9 }, -// { type: 'caching', description: 'Only 5% cache hit rate...', estimatedSavingsCents: 12 }, -// ] +// Merge winning branch back +branches.merge(branchA.id, session) + +// Full tree visualization +const tree = branches.buildTree(session, allSessions) ``` --- -## Protocol Support +## Execution Control -### MCP — Agent-to-Tool +### Time-Travel Debugging +Rewind to any step and fork execution from there with modifications. ```typescript -import { loadMcpTools } from 'swarmwire' +import { TimeTravelStore } from 'swarmwire' -const tools = await loadMcpTools('npx @some/mcp-server') -const agent = swarm.agent({ name: 'tooled', role: '...', tools }) +const timeTravel = new TimeTravelStore() + +// Inject into executor — records a checkpoint after each step +const result = await swarm.execute(plan, { timelineStore: timeTravel }) + +// Review history +const timeline = timeTravel.getTimeline(plan.id) + +// Fork from step 2 with a different agent +const forked = await timeTravel.fork(plan.id, { + fromStepId: 'step-2', + modifications: [{ id: 'step-3', agent: alternateAgent }], +}, plan, executorConfig) ``` -### A2A — Agent-to-Agent +### Agent Rollback +Snapshot state before tool calls; undo per execution. ```typescript -import { startA2AServer, importA2AAgent } from 'swarmwire' +import { RollbackManager } from 'swarmwire' -// Expose your agents -startA2AServer({ port: 8080, agents: [researcher, analyst] }) +const rollback = new RollbackManager() + +// Inject into executor +const result = await swarm.execute(plan, { rollbackManager: rollback }) -// Consume external agents -const externalAgent = await importA2AAgent({ url: 'https://partner.api' }) -swarm.register(externalAgent) +// Roll back a specific snapshot +rollback.rollback(snapshotId) + +// Roll back everything in an execution (reverse order) +rollback.undoExecution(executionId) ``` ---- +### Trajectory Reduction (AgentDiet) +Prune expired, redundant, and superseded tool results from agent trajectories before passing to LLM. Achieves 39-60% input token reduction. +```typescript +import { reduceTrajectory, classifyMessage } from 'swarmwire' + +const { messages, stats } = reduceTrajectory(trajectory, { + minContentLength: 10, + deduplicateSameToolResults: true, + pruneSuperseded: true, + maxTokenBudget: 8000, + maxMessages: 50, +}) +// stats.reductionFraction — fraction of messages pruned +``` + +### Speculative Tool Execution (PASTE) +Prefetch likely tool calls in parallel while the LLM is still generating, reducing end-to-end latency. +```typescript +import { SpeculativeToolExecutor, createKeywordPredictor } from 'swarmwire' + +const exec = new SpeculativeToolExecutor({ tools, maxSpeculative: 3, minConfidence: 0.6 }) -## Memory Backends +const predictor = createKeywordPredictor([ + { toolName: 'search_web', keywords: ['search', 'find', 'lookup'], defaultInput: { q: '' } }, +]) -### Without Memory (default) -Every execution is stateless. Results returned and forgotten. +// While LLM is generating, prefetch +exec.prefetch(predictor(partialContext)) -### With CognitiveVault -Persist agent messages across sessions and processes. Agents in different SwarmWire executions — or even MCP tools like Claude Code — see each other's work. +// When LLM finishes, result is already cached +const result = await exec.execute('search_web', { q: userQuery }) +// result.cacheHit — true if prefetch succeeded +``` +### Skill Reducer (Progressive Disclosure) +Compress tool definitions in prompts — inject compact one-liners first, expand to full schemas only when needed. ~48% prompt compression. ```typescript -import { Swarm } from 'swarmwire' -import { CognitiveVaultBoard } from 'swarmwire/adapters' +import { createReducedSkillSet, selectRelevantTools } from 'swarmwire' -const board = new CognitiveVaultBoard({ - apiUrl: 'https://cognitive-vault.com', - apiKey: process.env.CV_API_KEY!, - vaultId: 'vault-123', -}) -await board.hydrate() // catch up on prior messages +const skillSet = createReducedSkillSet(tools, { maxSummaryLength: 60 }) -const swarm = new Swarm({ providers, board }) -// All agent messages now persist to CognitiveVault +// Initial prompt injection — compact +const promptLine = skillSet.toPromptString() +// - search_web: Search the internet for current information… +// - execute_code: Run Python or JavaScript code in a sandbox… + +// Progressive reveal when agent picks a tool +const fullDefs = skillSet.expand(['search_web']) + +// Auto-select relevant tools for a task +const relevant = selectRelevantTools('search for documents', skillSet) ``` -Falls back to local file (`.swarmwire/board.jsonl`) when CV is unreachable. See [CognitiveVault integration guide](./docs/cognitive-vault-integration.md). +--- + +## Evaluation & Testing -### With ANCS -Persistent cognitive memory with truth tracking, entity graphs, and importance decay. ANCS can run alongside CognitiveVault as its knowledge intelligence backend. +### Outcome Evals Harness +Named harnesses with run history, pass-rate tracking, and regression detection. ```typescript -import { Swarm, ancsMemory } from 'swarmwire' +import { EvalHarness } from 'swarmwire' -const swarm = new Swarm({ - providers, - memory: ancsMemory({ - url: 'http://localhost:3100', - tenantId: 'my-project', - }), +const harness = new EvalHarness({ + name: 'research-quality', + suite: myEvalSuite, + greenThreshold: 0.8, + storage: memoryBackend, +}) + +const record = await harness.run(async () => { + const output = await agent.execute(input, ctx) + return { input, output } +}) + +const report = harness.report() +// report.trend: 'improving' | 'stable' | 'degrading' +// report.regressions: string[] — eval names that got worse +``` + +### Trajectory Evaluation (TRACE) +Assess multi-step agent trajectories across five dimensions: step efficiency, tool precision, backtrack rate, plan adherence, outcome quality. +```typescript +import { evalTrajectory, compareTrajectories } from 'swarmwire' + +const result = await evalTrajectory(trajectory, { + expectedSteps: ['fetch', 'parse', 'summarize'], + expectedToolCalls: { 'step-1': ['search_web'] }, + outcomeScorer: (output) => qualityScore(output), + maxSteps: 5, + weights: { outcomeQuality: 2, stepEfficiency: 1 }, }) +// result.score, result.breakdown.stepEfficiency, .toolPrecision, etc. + +// Compare two agent strategies +const comparison = await compareTrajectories(trajectoryA, trajectoryB) +// comparison.better: 'a' | 'b' | 'tie' +``` + +### Record/Replay Testing +Deterministic, zero-cost testing. Record real LLM interactions once, replay in CI forever. +```typescript +import { RecordingProvider, ReplayProvider } from 'swarmwire' + +const recording = new RecordingProvider(realProvider, './fixtures/research.json') +await swarm.run('Research TypeScript ORMs', { /* uses recording */ }) +await recording.save() + +const replay = new ReplayProvider('./fixtures/research.json') +const result = await swarm.run('Research TypeScript ORMs', { /* uses replay */ }) +``` + +### Evals Framework +```typescript +import { runEvalSuite, nonEmpty, containsKeywords, noHallucination } from 'swarmwire' + +const result = await runEvalSuite({ + name: 'research-quality', + evals: [nonEmpty(), containsKeywords(['TypeScript', 'ORM']), noHallucination()], + threshold: 0.8, +}, input, output) ``` --- ## Observability -### Events +### OpenTelemetry Auto-Export +Push traces to any OTLP endpoint automatically after each execution. ```typescript -swarm.on('step:start', (e) => console.log(`Starting ${e.agentName}`)) -swarm.on('step:complete', (e) => console.log(`Done: ${e.durationMs}ms, ${e.costCents}c`)) -swarm.on('budget:warning', (e) => console.log(`Budget at ${(e.usage * 100).toFixed(0)}%`)) -swarm.on('conflict:detected', (e) => console.log(`Conflict: ${e.conflict}`)) +import { createOTelExporter, withOTelExport, exportToOTLP } from 'swarmwire' + +// One-shot export +await exportToOTLP(result, { + endpoint: 'http://localhost:4318/v1/traces', // OTEL Collector, Jaeger, Tempo, Honeycomb + serviceName: 'my-app', + headers: { 'x-honeycomb-team': process.env.HONEYCOMB_KEY }, +}) + +// Fire-and-forget wrapper +const exporter = createOTelExporter({ endpoint: '...', serviceName: 'my-app' }) +swarm.on('execution:complete', (result) => exporter.exportSync(result)) + +// Transparent wrapper around executePlan +const result = await withOTelExport(() => swarm.execute(plan), { + endpoint: 'http://localhost:4318/v1/traces', + serviceName: 'my-app', +}) ``` -### Streaming +### Reputation Board +MessageBoard extended with per-agent reputation scoring. Findings from higher-reputation agents are weighted more heavily. ```typescript -for await (const event of swarm.stream(task)) { - console.log(event.type, event) -} +import { ReputationBoard } from 'swarmwire' + +const board = new ReputationBoard({ defaultScore: 0.5, decayFactor: 0.95 }) +const swarm = new Swarm({ providers, board }) + +// After execution, provide feedback signals +board.upvote(messageId, 'agent-b') +board.cite(sourceMessageId) +board.markAnswerCorrect('agent-a') + +// Aggregate findings weighted by reputation +const weighted = board.weightedFindings('orchestrator') +const summary = board.aggregateFindings('orchestrator') +board.leaderboard() // agents sorted by score +board.decay() // apply decay to prevent score inflation +``` + +### Events +```typescript +swarm.on('step:start', (e) => console.log(`Starting ${e.agentName}`)) +swarm.on('step:complete', (e) => console.log(`Done: ${e.durationMs}ms`)) +swarm.on('budget:warning', (e) => console.log(`Budget at ${(e.usage * 100).toFixed(0)}%`)) ``` ### Execution Reports @@ -557,411 +740,334 @@ console.log(explainExecution(result)) // Full report: steps, cost breakdown, trace, conflicts ``` -### SSE Streaming (Web) - -Pipe agent execution to HTTP clients via Server-Sent Events. Works with Express, Fastify, Next.js, or native http. See [docs/sse-streaming.md](./docs/sse-streaming.md) for full recipes. - +### SSE Streaming ```typescript import { sseHeaders, pipeToSSE } from 'swarmwire/transport' app.get('/api/run', async (req, res) => { sseHeaders(res) - const result = await pipeToSSE(swarm.stream('Analyze codebase'), res) + await pipeToSSE(swarm.stream('Analyze codebase'), res) res.end() }) ``` -```javascript -// Client -const source = new EventSource('/api/run') -source.addEventListener('step:complete', (e) => { - const { agentName, costCents } = JSON.parse(e.data) - console.log(`${agentName} done: ${costCents}c`) -}) -source.addEventListener('result', (e) => { - console.log('Output:', JSON.parse(e.data).output) - source.close() -}) -``` - --- -## MessageBoard (Inter-Agent Communication) - -Agents can communicate ad-hoc during execution through a shared MessageBoard, -accessible via `ctx.board` inside any agent's `execute()` function. This sits -alongside the structured DAG data flow and enables direct messages, broadcasts, -topic-based channels, and priority signals. +## Tools +### Code Execution Sandbox ```typescript -async execute(input: string, ctx: AgentContext) { - // Read findings from other agents - const findings = ctx.board.findings() +import { createNodeSandbox, createDockerSandbox, createCodeExecutionTool } from 'swarmwire' - // Broadcast a discovery - ctx.board.post('*', 'Found a critical issue in auth module', { - type: 'finding', - priority: 'urgent', - data: { file: 'auth.ts', line: 42 }, - }) +// Node.js vm module (no external deps) +const sandbox = createNodeSandbox({ timeoutMs: 5000, allowedModules: ['path'] }) - // Ask a specific agent - ctx.board.post('security-expert', 'Is this a real vulnerability?', { - type: 'question', - }) +// Docker (requires Docker CLI) +const dockerSandbox = createDockerSandbox({ image: 'node:20-alpine', timeoutMs: 10_000 }) - // Read inbox - const messages = ctx.board.inbox() -} +// Returns a Tool for agent.tools[] +const tool = createCodeExecutionTool(sandbox) + +const result = await sandbox.execute('console.log("hello")', 'javascript') +// result.stdout, result.stderr, result.exitCode, result.durationMs ``` -Message types: `finding`, `warning`, `question`, `answer`, `coordination`, `status`, `custom`. +### Browser / Computer Use +```typescript +import { createBrowserTool, createComputerUseTool } from 'swarmwire' -**Persistence options:** The default `MessageBoard` is in-memory only. Use `FileBoard` for local persistence or `CognitiveVaultBoard` for cross-machine, cross-session durability. See [Adapters](./docs/adapters.md). -Priorities: `normal`, `high`, `urgent`. +// Playwright (peer dep) +const browserTool = createBrowserTool({ headless: true, timeoutMs: 30_000 }) -The full `MessageBoard` class is also available standalone: +// Anthropic Computer Use API +const computerTool = createComputerUseTool({ + screenshotProvider: async () => base64Screenshot, +}) +const agent = swarm.agent({ name: 'browser-agent', role: '...', tools: [browserTool] }) +``` + +### Voice Agent Pipeline ```typescript -import { MessageBoard } from 'swarmwire' +import { VoicePipeline } from 'swarmwire' + +const pipeline = new VoicePipeline({ + stt: VoicePipeline.createDeepgramSTT(process.env.DEEPGRAM_KEY!), + tts: VoicePipeline.createElevenLabsTTS(process.env.ELEVENLABS_KEY!, 'voice-id'), + agent: myAgent, + provider, + model, +}) -const board = new MessageBoard(10_000) // max messages -board.post('agent-a', '*', 'Hello everyone', { type: 'status' }) -board.stats() // { totalMessages, channels, byType, byAgent, byPriority } +const turn = await pipeline.processTurn(audioBuffer) +// turn.input (transcribed), turn.output (text), turn.audioOutput (Buffer) ``` --- -## Routing Stack - -SwarmWire includes a 5-layer cost-optimization routing stack that can cut LLM -API spend by 40-85% with minimal quality loss. Each layer works independently -or combined. See [docs/routing.md](./docs/routing.md) for full details. - -| Layer | Component | What it does | -|-------|-----------|-------------| -| 1 | **SemanticCache** | Embeds queries as vectors, returns cached responses on near-duplicate hits (zero cost). | -| 2 | **LatencyRouter** | Picks the fastest model meeting quality/cost constraints via EMA + P95 latency tracking. | -| 3 | **CascadeRouter** | Tries cheapest model first, escalates if quality is below threshold. Bandit learning over time. | -| 4 | **SpeculativeCascade** | Runs N models in parallel, accepts cheapest that passes quality. Trades cost for latency. | -| 5 | **QueryDecomposer** | Breaks multi-part queries into subtasks, routes each to the cheapest model at its complexity tier. | +## Agent Infrastructure +### Agent Discovery Catalog +Registry where agents are discoverable by capability, tag, or semantic description at runtime. ```typescript -import { - SemanticCache, - LatencyRouter, - CascadeRouter, - speculativeCascade, - decomposeQuery, - executeDecomposed, - buildModelLadder, -} from 'swarmwire' -``` - ---- - -## Guardrails +import { AgentCatalog } from 'swarmwire' + +const catalog = new AgentCatalog({ embedFn: myEmbed }) +catalog.register(researchAgent, ['research', 'web'], { version: '1.2' }) +catalog.heartbeat(agentId) // mark available + +// Discover by capability, tag, or semantic description +const agents = catalog.discover({ + capabilities: ['analysis'], + tags: ['finance'], + semantic: 'agent that can summarize quarterly reports', + available: true, +}) -Input, output, and tool-level safety checks with fail-fast tripwires. Inspired by -OpenAI Agents SDK. Guardrails run in `parallel` (default, lower latency) or -`blocking` (sequential, safer) mode. A `block`-severity failure throws -`GuardrailTripped` and cancels execution immediately; `warn` severity logs and -continues. Sanitization guardrails can modify the value in flight. +catalog.resolve('researcher') // by name or id +await catalog.flush() // persist to storage backend +``` +### Prompt Optimizer (DSPy-style) +Uses training pairs from `DistillationCollector` to improve agent prompts automatically. ```typescript -import { - piiGuardrail, - injectionGuardrail, - hallucinationGuardrail, - maxLengthGuardrail, - contentFilter, -} from 'swarmwire' +import { PromptOptimizer } from 'swarmwire' + +const optimizer = new PromptOptimizer({ + collector: distillationCollector, + provider, + model, + numCandidates: 4, + numFewShot: 3, + maxIterations: 3, +}) -const agent = swarm.agent({ - name: 'safe-agent', - role: 'Process user input safely', - guardrails: { - input: [piiGuardrail(), injectionGuardrail()], - output: [hallucinationGuardrail(), maxLengthGuardrail(10_000)], - toolInput: [contentFilter(['DROP TABLE', 'rm -rf'], 'block')], - }, +const result = await optimizer.optimize('agent-id', basePrompt, (prompt, response) => { + return responseQualityScore(response) }) +// result.optimizedPrompt, result.fewShotExamples, result.scoreImprovement ``` -Built-in guardrails: +### A2A / ACP Protocol (v1.0) +```typescript +import { startA2AServer, importA2AAgent } from 'swarmwire' -| Guardrail | Phase | What it checks | -|-----------|-------|---------------| -| `piiGuardrail()` | input | Emails, SSNs, credit cards, phone numbers | -| `injectionGuardrail()` | input | "Ignore previous instructions" and similar injection patterns | -| `hallucinationGuardrail()` | output | Hedging markers ("as of my knowledge cutoff", etc.) | -| `maxLengthGuardrail(n)` | output | Truncates output exceeding `n` chars (warn + sanitize) | -| `contentFilter(strings[], severity)` | any | Blocks or warns on forbidden substrings | +// Expose your agents +startA2AServer({ port: 8080, agents: [researcher, analyst] }) -Custom guardrails implement the `Guardrail` interface with a `check(value, context)` method. +// Consume external agents (contextId for cross-task threading) +const externalAgent = await importA2AAgent({ + url: 'https://partner.api', + contextId: 'ctx-session-123', +}) ---- +// Subscribe to long-running task events +import { streamSubscribe } from 'swarmwire' +await streamSubscribe(baseUrl, task, (event) => console.log(event)) +``` -## Evals Framework +Protocol version: `1.0`. Supports `streaming` task state, `AgentCard.offline`, `contextId` for cross-task threading, and `tasks/sendSubscribe` for SSE push. -Automated quality metrics for agent outputs. Run evals against Record/Replay -fixtures in CI/CD -- no LLM calls needed. +--- +## Provider Infrastructure + +### Multi-Provider with Failover ```typescript -import { - runEvalSuite, - nonEmpty, - lengthCheck, - containsKeywords, - schemaMatch, - similarityToExpected, - noRegression, - noHallucination, -} from 'swarmwire' +import { createProvider, withCircuitBreaker, withFailover, withRateLimit } from 'swarmwire' -const suite = { - name: 'research-quality', - evals: [nonEmpty(), lengthCheck(100, 5000), containsKeywords(['TypeScript', 'ORM']), noHallucination()], - threshold: 0.8, // average score must be >= 0.8 - perEvalThreshold: 0.5, // no individual eval below 0.5 -} +const anthropic = withRateLimit( + withCircuitBreaker(createProvider('anthropic', { apiKey: '...' })), + { requestsPerMinute: 50 }, +) +const openai = withCircuitBreaker(createProvider('openai', { apiKey: '...' })) -const result = await runEvalSuite(suite, input, output, { expected: groundTruth }) -// result.passed, result.averageScore, result.failedEvals +const provider = withFailover([anthropic, openai]) ``` -Built-in metrics: `nonEmpty`, `lengthCheck`, `containsKeywords`, `schemaMatch`, -`similarityToExpected` (Jaccard), `noRegression` (compare to prior run), -`noHallucination`. All return 0-1 scores. `runEvalBatch` runs a suite against -multiple test cases and reports an overall pass/fail. +Supported: `anthropic`, `openai`, `gemini`, `ollama`, any OpenAI-compatible endpoint (LiteLLM, vLLM, Azure). ---- +### Routing Stack +5-layer cost optimization, active in every execution. + +| Layer | Component | What it does | +|-------|-----------|-------------| +| 1 | **SemanticCache** | Near-duplicate query caching (zero cost) | +| 2 | **LatencyRouter** | Fastest model meeting quality/cost via EMA + P95 tracking | +| 3 | **CascadeRouter** | Cheapest model first, escalate on low quality | +| 4 | **SpeculativeCascade** | N models in parallel, accept cheapest that passes | +| 5 | **QueryDecomposer** | Split queries, route each to cheapest sufficient model | -## Record/Replay Testing +--- -Deterministic, zero-cost testing for multi-agent systems. Record real LLM -interactions once, then replay them in CI forever -- instant, free, and -reproducible. Fuzzy matching handles volatile fields (UUIDs, timestamps). +## Plan → Inspect → Execute ```typescript -import { RecordingProvider, ReplayProvider } from 'swarmwire' +const plan = await swarm.plan('Analyze our auth architecture') -// 1. Record: wrap a real provider, run your workflow, save fixtures -const recording = new RecordingProvider(realProvider, './fixtures/research.json') -await swarm.run('Research TypeScript ORMs', { /* uses recording as provider */ }) -await recording.save() // writes fixture file to disk +console.log(plan.estimatedCost) // preview cost +console.log(visualizePlan(plan)) // ASCII DAG -// 2. Replay: load fixtures, run the same workflow with zero LLM calls -const replay = new ReplayProvider('./fixtures/research.json') -const result = await swarm.run('Research TypeScript ORMs', { /* uses replay */ }) +plan.steps[1].agent = alternateAgent // swap an agent +plan.steps[2].optional = true // make a step optional -// 3. Assert: combine with evals -const evalResult = await runEvalSuite(suite, input, result.output) -expect(evalResult.passed).toBe(true) +const result = await swarm.execute(plan) ``` -`ReplayProvider` options: `strict` (throw on unmatched requests, default `true`), -`fallback` (a real provider for partial replay), `simulatedLatencyMs`. - ---- - -## New Providers - -### Gemini -Uses Google's OpenAI-compatible endpoint. Models: `gemini-2.0-flash`, `gemini-2.5-pro`, `gemini-2.5-flash`. +### Dry-Run Cost Projection ```typescript -const gemini = createProvider('gemini', { apiKey: process.env.GOOGLE_API_KEY }) -``` +import { dryRun } from 'swarmwire' -### Ollama (local) -Local execution via Ollama's OpenAI-compatible API. Cost is always $0. -Default models: `llama3.3`, `qwen3`, `deepseek-r1`. -```typescript -const ollama = createProvider('ollama') // defaults to localhost:11434 +const projection = dryRun(plan, providers) +// projection.estimatedCost.likelyCents, .willExceedBudget, .stepBreakdown ``` -### Generic OpenAI-compatible / LiteLLM -Any unknown provider name falls through to the OpenAI adapter. Works with LiteLLM, -vLLM, Azure OpenAI, or any endpoint that speaks the OpenAI chat completions API. +### Differential Execution +Only re-run steps whose inputs changed. ```typescript -const litellm = createProvider('litellm', { - baseUrl: 'http://localhost:4000/v1', - apiKey: process.env.LITELLM_KEY, -}) +import { diffPlans, applyPreviousResults } from 'swarmwire' + +const diff = diffPlans(newPlan, previousResult) +applyPreviousResults(newPlan, previousResult, diff) +const result = await swarm.execute(newPlan) // skips reusable steps ``` --- -## Approval Gates +## Guardrails -Pause execution before a step and wait for human (or programmatic) approval. -If no `onApproval` callback is provided, gates auto-approve. +Input, output, and tool-level safety checks. ```typescript -const plan = await swarm.plan('Deploy to production') - -// Add a gate to the deploy step -plan.steps[2].gate = { - type: 'approval', - message: 'Approve deployment to prod?', - timeoutMs: 60_000, -} - -const result = await swarm.execute(plan, { - onApproval: async (gate) => { - console.log(`[GATE] ${gate.agentName}: ${gate.message}`) - return userSaidYes ? 'approved' : 'rejected' +const agent = swarm.agent({ + name: 'safe-agent', + role: 'Process user input safely', + guardrails: { + input: [piiGuardrail(), injectionGuardrail()], + output: [hallucinationGuardrail(), maxLengthGuardrail(10_000)], + toolInput: [contentFilter(['DROP TABLE', 'rm -rf'], 'block')], }, }) ``` --- -## Dry-Run Cost Projection +## MessageBoard -Simulate plan execution without calling any LLMs. Returns cost/duration estimates -with min/max/likely ranges, per-step breakdowns, parallelism analysis, and a -`willExceedBudget` flag. +Inter-agent communication via `ctx.board` inside any agent's `execute()`. ```typescript -import { dryRun } from 'swarmwire' - -const plan = await swarm.plan('Analyze codebase') -const projection = dryRun(plan, providers) - -console.log(projection.estimatedCost) -// { minCents: 12.5, maxCents: 50.0, likelyCents: 25.0 } -console.log(projection.willExceedBudget) // true/false -console.log(projection.stepBreakdown) // per-step cost + duration -console.log(projection.sequentialDepth) // critical path length +async execute(input: string, ctx: AgentContext) { + ctx.board.post('*', 'Found critical issue in auth', { + type: 'finding', priority: 'urgent', data: { file: 'auth.ts' }, + }) + const answers = ctx.board.inbox() + const findings = ctx.board.findings() +} ``` +Use `ReputationBoard` for reputation-weighted finding aggregation, `FileBoard` for local persistence, or `CognitiveVaultBoard` for cross-machine durability. + --- -## Output Contracts +## YAML Workflows -Schema + semantic validation of agent outputs. Catches syntactically valid but -semantically garbage results. Supports Zod schemas, custom validation functions, -and configurable failure actions (`retry`, `skip`, `fallback`, `escalate`). +```yaml +name: research-and-summarize +version: 1.0.0 +steps: + - id: research + type: llm + agent: researcher + prompt: "Research: {{ inputs.topic }}" + - id: summarize + type: llm + agent: writer + prompt: "Summarize findings" + dependencies: [research] +``` ```typescript -import { withContract, OutputContract } from 'swarmwire' - -const contract: OutputContract<{ summary: string; score: number }> = { - schema: z.object({ summary: z.string().min(10), score: z.number().min(0).max(1) }), - validate: async (output) => ({ - valid: output.score > 0.3, - reason: output.score <= 0.3 ? 'Score too low — likely garbage output' : undefined, - }), - onFailure: 'retry', - maxRetries: 2, -} +import { parseWorkflow, compileWorkflow } from 'swarmwire' -const guardedExecute = withContract(agent.execute, contract) +const plan = compileWorkflow(parseWorkflow(yaml), { agents, inputs }) +const result = await swarm.execute(plan) ``` -Throws `ContractViolationError` when retries are exhausted and `onFailure` is -`'retry'` or `'escalate'`. - --- -## Model Cascade on Quality - -Per-agent model fallback that escalates to a smarter (more expensive) model when -output quality is too low. Different from circuit breaker (which operates at the -provider level on errors). +## Enterprise-Grade Security ```typescript -import { chatWithCascade } from 'swarmwire' +import { createThreatDetector } from 'swarmwire' -const result = await chatWithCascade(request, { - primary: { provider: 'anthropic', model: 'claude-haiku-4-20250414' }, - fallbacks: [ - { provider: 'anthropic', model: 'claude-sonnet-4-6-20260320', condition: 'quality' }, - { provider: 'openai', model: 'gpt-4o', condition: 'both' }, - ], - qualityThreshold: 0.6, - qualityEstimator: myQualityFn, -}, providerMap) +const detector = createThreatDetector({ + checkSqlInjection: true, + checkCommandInjection: true, + autoSanitize: true, +}) -// result.escalated, result.modelUsed, result.modelsAttempted +const result = detector.scan(userInput) +// result.level: 'safe' | 'warning' | 'threat' +// result.detectedPatterns[] ``` ---- +Detects: SQL injection, command injection, XSS, path traversal, hardcoded secrets, prompt injection, PII (email, SSN, phone, credit card, IP). -## Differential Execution +--- -Only re-run steps whose inputs changed. Compares a new plan against a previous -`ExecutionResult`, identifies changed/reusable/cascade steps, and carries forward -completed outputs. +## Agent Templates (17) ```typescript -import { diffPlans, applyPreviousResults } from 'swarmwire' +import { Swarm, templates } from 'swarmwire' -const diff = diffPlans(newPlan, previousResult) -// diff.changedSteps, diff.reusableSteps, diff.cascadeSteps, diff.savingsFraction +const swarm = new Swarm({ providers }) -applyPreviousResults(newPlan, previousResult, diff) -const result = await swarm.execute(newPlan) // skips reusable steps +// Core +const researcher = swarm.agent(templates.researcher()) +const reviewer = swarm.agent(templates.codeReviewer()) +const synth = swarm.agent(templates.synthesizer()) +const analyst = swarm.agent(templates.dataAnalyst()) +const tester = swarm.agent(templates.qaTester()) +const writer = swarm.agent(templates.writer()) +const planner = swarm.agent(templates.planner()) + +// Specialized +const security = swarm.agent(templates.securityAuditor()) +const devops = swarm.agent(templates.devopsEngineer()) +const dbEngineer = swarm.agent(templates.databaseEngineer()) +const apiDesigner = swarm.agent(templates.apiDesigner()) +const perfEngine = swarm.agent(templates.performanceEngineer()) +const docs = swarm.agent(templates.documentationSpecialist()) +const architect = swarm.agent(templates.architectureAdvisor()) +const debugger_ = swarm.agent(templates.debugger()) +const refactor = swarm.agent(templates.refactoringSpecialist()) +const integration = swarm.agent(templates.integrationSpecialist()) +const testAuto = swarm.agent(templates.testAutomationEngineer()) ``` --- -## Structured Output - -Force the LLM to respond with valid JSON matching a schema. Available via -`ctx.llm()` inside any agent's `execute()` function. Works across providers: -maps to `response_format` on OpenAI/Gemini and tool-use forcing on Anthropic. +## Approval Gates ```typescript -const agent = swarm.agent({ - name: 'classifier', - role: 'Classify support tickets', - async execute(input: string, ctx: AgentContext) { - return ctx.llm<{ category: string; priority: number }>(input, { - responseFormat: { - type: 'json_schema', - schema: { - type: 'object', - properties: { - category: { type: 'string', enum: ['bug', 'feature', 'question'] }, - priority: { type: 'number', minimum: 1, maximum: 5 }, - }, - required: ['category', 'priority'], - }, - }, - }) +const result = await swarm.execute(plan, { + onApproval: async (gate) => { + console.log(`[GATE] ${gate.agentName}: ${gate.message}`) + return userSaidYes ? 'approved' : 'rejected' }, }) ``` --- -## Plugin System - -Extend SwarmWire with third-party providers, agents, guardrails, evals, tools, and middleware. See [docs/plugins.md](./docs/plugins.md) for full guide. +## MCP — Agent-to-Tool ```typescript -import { Swarm, definePlugin, piiGuardrail, noHallucination } from 'swarmwire' - -const securityPlugin = definePlugin({ - name: '@myco/security', - version: '1.0.0', - guardrails: { - input: [piiGuardrail()], - output: [contentFilter(['internal-only'], 'block')], - }, - evals: [noHallucination()], - middleware: { - async beforeExecute(agentName, input) { - console.log(`[audit] ${agentName} starting`) - return input - }, - }, -}) +import { loadMcpTools } from 'swarmwire' -const swarm = new Swarm({ providers }) -await swarm.use(securityPlugin) +const tools = await loadMcpTools('npx @some/mcp-server') +const agent = swarm.agent({ name: 'tooled', role: '...', tools }) ``` --- @@ -972,34 +1078,45 @@ await swarm.use(securityPlugin) User Code | v - Swarm ────────────────────────────────────────────────────────── - | | | | | | - Planner Router Executor Budget Patterns Guardrails - (DAG) (cascade (parallel Engine (orch-wkr (input - | semantic runner (hard pipeline output - Scorer cache dry-run limits) map-reduce tool) - | latency diff-exec | debate - Query specul.) checkpoint | blackboard - Decomp. model- approval | evolving) - cascade gates) | - | | | | | | - v v v v v v - Providers MessageBoard MCP Tools Memory Testing - (Anthropic (inter-agent (any server) (ANCS (Record/Replay - OpenAI communication) or Evals - Gemini custom) Contracts) - Ollama - LiteLLM/generic - +circuit breaker - +rate limiter - +failover) + Swarm ───────────────────────────────────────────────────────────────────── + | | | | | | + Planner Routing Executor Budget Patterns Guardrails + (DAG + (semantic (parallel Engine orch-wkr input/output + scorer cache dry-run (hard pipeline tool safety + decomp. latency diff-exec limits) map-reduce threat detect + attn. cascade checkpoint debate PII guard) + router specul.) time-travel blackboard + rl-router rollback fan-out + 3-tier) traj-reduce hive-mind | + spec-tools hierarchy Security + | | | | loop-agent (injection + Memory Session Execution Patterns state-mach XSS + (ANCS (named (hooks event- evolving) path-trav + self-lrn sessions consensus driven) secrets) + vector branching federation) | + a-mem branch-mgr) Testing + temporal (record/replay + self-edit evals harness + episodic trajectory-eval + procedural prompt-optimizer + vector-stores judge-agent) + sleep-time) | | + Providers Observability + (Anthropic (OTel auto-export + OpenAI OTLP push + Gemini SSE streaming + Ollama execution reports + LiteLLM reputation board) + +circuit + +failover + +rate limit) ``` --- ## Project Stats -86 modules | 34 test files | 299 tests | 7 agent templates | 8 docs +110 modules · 71 test files · 621 tests · 17 agent templates --- @@ -1007,7 +1124,7 @@ User Code | Guide | What it covers | |-------|---------------| -| [Routing Stack](./docs/routing.md) | 5-layer cost optimization, cascade routing, semantic cache, OTEL export | +| [Routing Stack](./docs/routing.md) | 5-layer cost optimization, cascade routing, semantic cache | | [Eval Workflow](./docs/eval-workflow.md) | Record → Replay → Eval → CI pipeline | | [SSE Streaming](./docs/sse-streaming.md) | Express, Fastify, Next.js, React recipes | | [Conflict Resolution](./docs/conflict-resolution.md) | Detection algorithms, resolution strategies |