149 changes: 149 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,149 @@
# Changelog

All notable changes to SwarmWire are documented here.

---

## [1.5.0] — 2026-04-09

### New Features

**Execution**
- `reduceTrajectory` — AgentDiet-style trajectory pruning (drop empty/duplicate/superseded tool results, token budget trim). 39-60% input token reduction.
- `SpeculativeToolExecutor` — PASTE-inspired prefetch of likely tool calls in parallel while the LLM generates.
- `createReducedSkillSet` / `selectRelevantTools` — Progressive skill disclosure: compact one-liners first, full schemas on demand. ~48% prompt compression.
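
The pruning rules listed for `reduceTrajectory` (drop empty, duplicate, and superseded tool results, then trim to a token budget) can be sketched roughly as below. This is a minimal illustration, not SwarmWire's implementation: `ToolResult`, `estimateTokens`, and `pruneTrajectory` are hypothetical names, and the 4-characters-per-token estimate is an assumption.

```typescript
// Hypothetical shape for illustration; not SwarmWire's actual types.
interface ToolResult {
  tool: string;   // tool name
  args: string;   // serialized arguments
  output: string; // raw tool output
}

// Crude estimate (about 4 characters per token); an assumption, not a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function pruneTrajectory(steps: ToolResult[], tokenBudget: number): ToolResult[] {
  // 1. Drop results with empty output.
  const nonEmpty = steps.filter((s) => s.output.trim() !== "");

  // 2. Keep only the latest result per (tool, args) pair. An earlier identical
  //    result is a duplicate; an earlier different one is superseded.
  const lastIndex = new Map<string, number>();
  nonEmpty.forEach((s, i) => lastIndex.set(`${s.tool}:${s.args}`, i));
  const kept = nonEmpty.filter((s, i) => lastIndex.get(`${s.tool}:${s.args}`) === i);

  // 3. Trim the oldest results until the estimate fits the budget.
  let total = kept.reduce((sum, s) => sum + estimateTokens(s.output), 0);
  while (kept.length > 0 && total > tokenBudget) {
    const dropped = kept.shift()!;
    total -= estimateTokens(dropped.output);
  }
  return kept;
}

const pruned = pruneTrajectory(
  [
    { tool: "read", args: "a.txt", output: "old contents" },
    { tool: "grep", args: "foo", output: "" },
    { tool: "read", args: "a.txt", output: "new contents" },
    { tool: "ls", args: ".", output: "a.txt b.txt" },
  ],
  100,
);
console.log(pruned.map((s) => `${s.tool}(${s.args})`));
```

With a budget of 100 the sample keeps the second `read` and the `ls` result; a tighter budget drops the oldest surviving entries first.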

**Memory**
- `AMem` — A-MEM Zettelkasten living memory graph. On every write, notes auto-link to related memories via cosine similarity.
- `TemporalMemory` — CMA temporal decay + spreading activation. Strength decays per-hour, reinforces on access, propagates relevance to temporal neighbors.
- `SelfEditingMemory` — Letta/MemGPT-style named memory blocks. Agents read and mutate versioned text blocks mid-execution. Full edit history + revert.
- `createFlatVectorStore` / `createPineconeStore` / `createQdrantStore` / `createRedisVectorStore` — External vector store adapters, all implementing `MemoryBackend`.
- `SleepTimeAgent` — LLM-driven background consolidation. Synthesizes insights from recent memory during idle periods.
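
Two of the memory mechanisms above lend themselves to a short sketch: linking a new note to similar ones by cosine similarity on write, and decaying strength over time. The code below is an illustration under stated assumptions, not the `AMem` or `TemporalMemory` API: `LinkedNotes`, `cosine`, and `decayedStrength` are hypothetical, the 0.8 link threshold is arbitrary, and exponential half-life decay is assumed since the changelog only says strength decays per hour.

```typescript
// Hypothetical note shape; not SwarmWire's actual types.
interface Note {
  id: string;
  embedding: number[];
  links: Set<string>;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class LinkedNotes {
  private notes = new Map<string, Note>();
  private threshold: number;

  constructor(threshold = 0.8) {
    this.threshold = threshold; // assumed similarity cutoff for auto-linking
  }

  // On every write, link the new note to each sufficiently similar existing note.
  write(id: string, embedding: number[]): Note {
    const note: Note = { id, embedding, links: new Set<string>() };
    this.notes.forEach((other) => {
      if (cosine(embedding, other.embedding) >= this.threshold) {
        note.links.add(other.id);
        other.links.add(id);
      }
    });
    this.notes.set(id, note);
    return note;
  }
}

// Assumed decay model: strength halves every `halfLifeHours` without access.
function decayedStrength(strength: number, hoursSinceAccess: number, halfLifeHours = 24): number {
  return strength * Math.pow(0.5, hoursSinceAccess / halfLifeHours);
}

const store = new LinkedNotes();
const first = store.write("a", [1, 0]);
const second = store.write("b", [0.9, 0.1]);
const third = store.write("c", [0, 1]);
console.log([...first.links], [...second.links], [...third.links]);
```

In the sample, notes `a` and `b` link to each other while `c` stays unlinked, because its similarity to both falls below the threshold.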

**Core**
- `ReputationBoard` — `MessageBoard` extended with per-agent reputation scoring. Upvotes, citations, correct answers drive scores. Findings weighted by sender reputation.
- Typed DI — `AgentContext<TDeps>` and `AgentDefinition<TInput, TOutput, TDeps>`. Agents declare typed dependencies; `context.deps` is fully typed at callsite.
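
To make the typed-dependency idea concrete, here is a minimal sketch of how a generic `TDeps` parameter lets `context.deps` be checked at the callsite. `SketchContext`, `SketchAgent`, `PricingDeps`, and `quoteAgent` are local stand-ins invented for illustration; the real `AgentContext` and `AgentDefinition` types may carry more fields and an async `run`.

```typescript
// Local stand-ins for illustration; SwarmWire's real definitions may differ.
interface SketchContext<TDeps> {
  deps: TDeps;
}

interface SketchAgent<TInput, TOutput, TDeps> {
  name: string;
  run(input: TInput, context: SketchContext<TDeps>): TOutput;
}

// The dependencies this particular agent declares.
interface PricingDeps {
  priceOf(sku: string): number;
  currency: string;
}

const quoteAgent: SketchAgent<string[], string, PricingDeps> = {
  name: "quote",
  run(skus, context) {
    // context.deps is PricingDeps here, so priceOf and currency are type-checked.
    const total = skus.reduce((sum, sku) => sum + context.deps.priceOf(sku), 0);
    return `${total} ${context.deps.currency}`;
  },
};

const quote = quoteAgent.run(["a", "b"], {
  deps: { priceOf: (sku) => (sku === "a" ? 10 : 5), currency: "USD" },
});
console.log(quote);
```

Passing a `deps` object that lacks `priceOf` or `currency` would fail at compile time rather than at run time, which is the main benefit of threading the type parameter through.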

**Testing & Evaluation**
- `evalTrajectory` / `compareTrajectories` — TRACE-style multi-dimension trajectory evaluation: step efficiency, tool precision, backtrack rate, plan adherence, outcome quality.

**Workflow**
- `StateMachine` / `buildLinearStateMachine` — LangGraph-style directed graph with cycles, conditional edges, and `maxIterations` guard.
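
A directed graph with cycles, conditional edges, and an iteration guard can be sketched as follows. This is a simplified illustration, not the `StateMachine` API: `Graph`, `runGraph`, and the synchronous node functions are assumptions made for brevity.

```typescript
type NodeFn<S> = (state: S) => S;
// A conditional edge returns the next node name, or null to finish.
type EdgeFn<S> = (state: S) => string | null;

interface Graph<S> {
  nodes: Record<string, NodeFn<S>>;
  edges: Record<string, EdgeFn<S>>;
}

interface RunResult<S> {
  state: S;
  iterations: number;
  halted: boolean; // true when the maxIterations guard stopped the run
}

function runGraph<S>(graph: Graph<S>, start: string, initial: S, maxIterations: number): RunResult<S> {
  let current: string | null = start;
  let state = initial;
  let iterations = 0;
  while (current !== null) {
    // Guard against cycles that never reach a terminal edge.
    if (iterations >= maxIterations) {
      return { state, iterations, halted: true };
    }
    const node = graph.nodes[current];
    if (!node) {
      throw new Error(`Unknown node: ${current}`);
    }
    state = node(state);
    iterations++;
    const edge = graph.edges[current];
    current = edge ? edge(state) : null;
  }
  return { state, iterations, halted: false };
}

// A one-node cycle: keep refining until the counter reaches 3.
const counter: Graph<{ n: number }> = {
  nodes: { refine: (s) => ({ n: s.n + 1 }) },
  edges: { refine: (s) => (s.n < 3 ? "refine" : null) },
};

const finished = runGraph(counter, "refine", { n: 0 }, 10);
const guarded = runGraph(counter, "refine", { n: 0 }, 2);
console.log(finished, guarded);
```

The first run ends through the conditional edge after three iterations; the second is cut off by the guard after two.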

**Patterns**
- `runLoop` — LoopAgent primitive. Runs an agent iteratively until convergence (`shouldStop` predicate, DONE signal, or `maxIterations`). Full iteration history.
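
The three stop conditions named for `runLoop` can be illustrated with a small loop. This sketch is not the library's API: `loopUntilDone`, `StopReason`, and the synchronous `step` callback are hypothetical, and treating any output containing the text `DONE` as the signal is an assumption.

```typescript
type StopReason = "done-signal" | "predicate" | "max-iterations";

interface LoopResult {
  history: string[]; // every iteration's output, in order
  reason: StopReason;
}

interface LoopOptions {
  maxIterations: number;
  shouldStop?: (output: string, history: string[]) => boolean;
}

function loopUntilDone(
  step: (previous: string | undefined, iteration: number) => string,
  options: LoopOptions,
): LoopResult {
  const history: string[] = [];
  for (let i = 0; i < options.maxIterations; i++) {
    const output = step(history[history.length - 1], i);
    history.push(output);
    // Assumed convention: the agent signals completion by emitting "DONE".
    if (output.includes("DONE")) {
      return { history, reason: "done-signal" };
    }
    if (options.shouldStop?.(output, history)) {
      return { history, reason: "predicate" };
    }
  }
  return { history, reason: "max-iterations" };
}

const draftStep = (_previous: string | undefined, i: number): string =>
  i === 2 ? "final answer DONE" : `draft ${i}`;

const bySignal = loopUntilDone(draftStep, { maxIterations: 5 });
const byLimit = loopUntilDone(draftStep, { maxIterations: 2 });
const byPredicate = loopUntilDone(draftStep, {
  maxIterations: 5,
  shouldStop: (_output, history) => history.length >= 1,
});
console.log(bySignal.reason, byLimit.reason, byPredicate.reason);
```

Returning the stop reason alongside the history makes it easy to tell a converged run from one that simply ran out of iterations.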

**Session**
- `BranchManager` — Fork a session at any message index to explore alternative continuations. Diff, merge, and tree visualization.

**Observability**
- `exportToOTLP` / `createOTelExporter` / `withOTelExport` — Auto-push traces to any OTLP/HTTP endpoint (Jaeger, Tempo, Honeycomb, OTEL Collector) after execution.

---

## [1.4.0] — 2026-04-08

### New Features

**Execution**
- `TimeTravelStore` — Rewind to any step and fork execution from that point with optional step modifications.
- `RollbackManager` — Snapshot state before tool calls; undo individual or full-execution actions in reverse order.
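
Snapshot-before-call with reverse-order undo can be sketched as a simple stack. `UndoStack` below is a hypothetical illustration, not `RollbackManager` itself, and it assumes the state is JSON-serializable so that a JSON round trip is a valid deep copy.

```typescript
class UndoStack<S> {
  private snapshots: { label: string; state: S }[] = [];

  // Record a deep copy of the state before a side-effecting tool call.
  // Assumes S is JSON-serializable.
  snapshot(label: string, state: S): void {
    this.snapshots.push({ label, state: JSON.parse(JSON.stringify(state)) as S });
  }

  // Undo the most recent action by returning the state captured before it.
  undoLast(): S | undefined {
    return this.snapshots.pop()?.state;
  }

  // Undo everything, newest first; returns the earliest captured state.
  undoAll(): { state: S | undefined; undone: string[] } {
    const undone: string[] = [];
    let state: S | undefined;
    let entry = this.snapshots.pop();
    while (entry !== undefined) {
      undone.push(entry.label);
      state = entry.state;
      entry = this.snapshots.pop();
    }
    return { state, undone };
  }
}

const workspace = { files: ["a.txt"] };
const undo = new UndoStack<{ files: string[] }>();

undo.snapshot("write b.txt", workspace);
workspace.files.push("b.txt");
undo.snapshot("write c.txt", workspace);
workspace.files.push("c.txt");

const rolledBack = undo.undoAll();
console.log(rolledBack.undone, rolledBack.state);
```

Because each snapshot is a copy, later mutations of `workspace` do not leak into earlier snapshots, and `undoAll` walks them newest first.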

**Optimizer**
- `PromptOptimizer` — DSPy-style prompt optimization. Bootstraps few-shot examples from `DistillationCollector`, generates prompt variants via LLM, scores against training pairs.

**Testing**
- `EvalHarness` — Named harnesses with run history, pass-rate tracking, and regression detection. Computes trend (`improving` / `stable` / `degrading`) from last 3 runs.
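
One plausible way to derive a trend from the last three runs is to compare the newest pass rate with the oldest in that window. The changelog does not say how `EvalHarness` computes it, so the rule below, the `trendFromPassRates` name, and the 0.02 tolerance are all assumptions.

```typescript
type Trend = "improving" | "stable" | "degrading";

// Assumed rule: compare the newest pass rate with the oldest of the last three runs.
function trendFromPassRates(passRates: number[], tolerance = 0.02): Trend {
  const recent = passRates.slice(-3);
  if (recent.length < 2) {
    return "stable"; // not enough history to call a trend
  }
  const delta = recent[recent.length - 1]! - recent[0]!;
  if (delta > tolerance) {
    return "improving";
  }
  if (delta < -tolerance) {
    return "degrading";
  }
  return "stable";
}

console.log(
  trendFromPassRates([0.5, 0.7, 0.8, 0.9]),
  trendFromPassRates([0.9, 0.9, 0.9]),
  trendFromPassRates([0.9, 0.8, 0.6]),
);
```

A tolerance band keeps small run-to-run noise from flipping the label between `improving` and `degrading`.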

**Tools**
- `createNodeSandbox` / `createDockerSandbox` / `createE2BSandbox` — Code execution sandbox with three backends. Returns a `Tool` for `agent.tools[]`.
- `createBrowserTool` — Playwright-backed browser automation tool (navigate, click, type, screenshot, extract).
- `createComputerUseTool` — Anthropic Computer Use API tool wrapper.

**Patterns**
- `runHierarchy` — Formal authority levels with escalation. Low-confidence outputs escalate to higher-authority agents.

**Session**
- `SessionManager` — Named persistent conversation sessions. `swarm.runInSession()` prepends prior context automatically.

**Workflow**
- `EventFlow` — Event-driven workflow runtime. Steps subscribe to events, emit new ones; execution is queue-driven rather than DAG-fixed.
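
The difference between a queue-driven runtime and a fixed DAG is easiest to see in a tiny example: steps subscribe to event types, and whatever they emit is appended to a queue that drives the next step. `MiniEventFlow` and its event names are hypothetical, the handlers are synchronous for brevity, and the `maxEvents` cap is an added safety assumption rather than a documented `EventFlow` option.

```typescript
interface FlowEvent {
  type: string;
  payload: unknown;
}

// A step receives one event and may emit any number of new ones.
type Handler = (event: FlowEvent) => FlowEvent[];

class MiniEventFlow {
  private handlers = new Map<string, Handler[]>();

  on(type: string, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Process events in FIFO order; the order of execution emerges from what
  // the handlers emit rather than from a predeclared graph.
  run(initial: FlowEvent, maxEvents = 100): string[] {
    const queue: FlowEvent[] = [initial];
    const processed: string[] = [];
    while (queue.length > 0 && processed.length < maxEvents) {
      const event = queue.shift()!;
      processed.push(event.type);
      for (const handler of this.handlers.get(event.type) ?? []) {
        queue.push(...handler(event));
      }
    }
    return processed;
  }
}

const flow = new MiniEventFlow();
flow.on("doc.received", (e) => [{ type: "doc.parsed", payload: e.payload }]);
flow.on("doc.parsed", (e) => [{ type: "doc.indexed", payload: e.payload }]);

const order = flow.run({ type: "doc.received", payload: "report.pdf" });
console.log(order);
```

Adding a second subscriber to `doc.parsed` would fan the work out without touching the other steps, which is the flexibility the event-driven model trades for the predictability of a DAG.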

**Memory**
- `EpisodicMemory` — Stores specific past interactions with temporal ordering and tag-based recall.
- `ProceduralMemory` — Stores "how to" procedures with success rate tracking.

**A2A Protocol — v1.0**
- `kind: 'task'` on `A2ATask`, `ContextId` type alias for cross-task threading.
- `AgentCard.offline?`, `A2ATaskState` gains `'streaming'`, `A2AMessage` gains `messageId` and `contextId`.
- `tasks/sendSubscribe` JSON-RPC method for SSE push.
- `streamSubscribe()` client function.
- Default `protocolVersion` bumped to `'1.0'`.

**Catalog**
- `AgentCatalog` — Runtime agent discovery by capability, tag, availability, or semantic description. Heartbeat-based liveness.

**Voice**
- `VoicePipeline` — STT → LLM → TTS pipeline. Factory methods for Deepgram, ElevenLabs, OpenAI STT/TTS.

---

## [1.3.0] — prior

### New Features
- Hooks system (`HookRegistry`, priority-ordered hooks, swarm event bridging)
- Consensus protocols (`RaftNode`, `ByzantineNode`, `GossipNode`)
- Hive-Mind pattern (`runHiveMind`)
- Federation hub (`FederationHub`)
- `ReasoningBank` — trajectory-based pattern memory with EWC
- Vector quantization (`createQuantizer` — binary, scalar, product)
- `AttentionRouter` — multi-head attention-based agent routing
- `RLRouter` / `RLRouterPPO` — reinforcement learning routers

---

## [1.2.0] — prior

### New Features
- A/B testing engine
- Judge agent for quality evaluation
- Weight table for dynamic routing
- Distillation collector for training pairs (LLMRouter)

---

## [1.1.0] — prior

### New Features
- Self-learning memory with Elastic Weight Consolidation (EWC)
- Vector memory with HNSW-like approximate nearest neighbor search
- 3-tier intelligent model routing
- Token optimizer (pattern caching, prompt compression, batch optimization)
- Knowledge graph with PageRank-based importance
- Background worker system (memory optimizer, pattern learner, metrics, health check)
- Threat detection system (SQL/command/XSS injection, path traversal, secrets, PII)
- ADR framework for spec-driven development
- 10 new agent templates (17 total)

---

## [1.0.0] — initial release

- Budget-first multi-agent orchestration
- Orchestrator-worker, pipeline, map-reduce, debate, blackboard, fan-out patterns
- Anthropic, OpenAI, Gemini, Ollama, generic OpenAI-compatible providers
- Circuit breaker, failover, rate limiter
- Routing stack: SemanticCache, LatencyRouter, CascadeRouter, SpeculativeCascade, QueryDecomposer
- MCP tool loading
- A2A v0.3 protocol
- Record/Replay testing, evals framework
- Guardrails (PII, injection, hallucination, length, content filter)
- Output contracts (schema + semantic validation)
- Approval gates
- YAML workflow compiler
- Dry-run cost projection
- Differential execution (skip unchanged steps)
- SSE streaming transport
- OpenTelemetry export
- Plugin system
- 7 agent templates
43 changes: 25 additions & 18 deletions CLAUDE.md
@@ -24,29 +24,36 @@ npm run clean # Remove dist/
 | Module | Path | Purpose |
 |--------|------|---------|
 | **Types** | `src/types/` | All TypeScript interfaces — agent, budget, plan, task, execution, provider, tool, memory, pattern |
-| **Core** | `src/core/` | `Swarm` class, `createAgent()`, MCP tool loader, `MessageBoard` (inter-agent messaging), `stub-board` (no-op board for patterns), guardrails (input/output/tool guards with built-ins: PII, injection, hallucination, max-length, content-filter), output contracts (schema + semantic validation) |
+| **Core** | `src/core/` | `Swarm` class, `createAgent()`, MCP tool loader, `MessageBoard`, `ReputationBoard` (reputation-weighted board), `stub-board`, guardrails, output contracts |
 | **Budget** | `src/budget/` | `BudgetLedger` (hard enforcement), cost optimizer |
-| **Planner** | `src/planner/` | Task scorer, DAG builder, model router, adaptive router, cascade router, semantic cache, speculative cascade, query decomposer, latency router, **3-tier intelligent routing** |
-| **Executor** | `src/executor/` | Parallel DAG runner, checkpoint/resume, dry-run cost projection, differential execution |
-| **Patterns** | `src/patterns/` | Orchestrator-worker, pipeline, map-reduce, debate, blackboard |
-| **Providers** | `src/providers/` | Anthropic, OpenAI, Gemini, Ollama adapters, generic OpenAI-compatible (LiteLLM/vLLM), circuit breaker, failover, rate limiter, model cascade on quality |
+| **Planner** | `src/planner/` | Task scorer, DAG builder, model router, adaptive router, cascade router, semantic cache, speculative cascade, query decomposer, latency router, attention router, RL router, 3-tier routing |
+| **Executor** | `src/executor/` | Parallel DAG runner, checkpoint/resume, dry-run, differential execution, time-travel debugging, rollback manager, trajectory reducer (AgentDiet), speculative tool executor (PASTE) |
+| **Patterns** | `src/patterns/` | Orchestrator-worker, pipeline, map-reduce, debate, blackboard, fan-out, hive-mind, hierarchy, loop-agent |
+| **Providers** | `src/providers/` | Anthropic, OpenAI, Gemini, Ollama, generic OpenAI-compatible (LiteLLM/vLLM), circuit breaker, failover, rate limiter, model cascade on quality |
 | **Conflict** | `src/conflict/` | Contradiction detector (Jaccard/structural), resolver (vote/evidence/escalate) |
 | **Context** | `src/context/` | Token-budget-aware context packer |
-| **A2A** | `src/a2a/` | Agent2Agent protocol — server, client, agent cards |
+| **A2A** | `src/a2a/` | Agent2Agent protocol v1.0 — server, client, agent cards, contextId, streaming state, `tasks/sendSubscribe` |
 | **Pool** | `src/pool/` | Worker pool with lifecycle, concurrency, warm pooling |
-| **Trace** | `src/trace/` | Human-readable execution reports, DAG visualization |
-| **Workflow** | `src/workflow/` | YAML workflow parser + compiler to executable Plans |
-| **Templates** | `src/templates/` | 17 pre-built agent templates (researcher, code-reviewer, synthesizer, data-analyst, qa-tester, writer, planner, security-auditor, devops-engineer, database-engineer, api-designer, performance-engineer, documentation-specialist, architecture-advisor, debugger, refactoring-specialist, integration-specialist, test-automation-engineer) |
+| **Trace** | `src/trace/` | Human-readable execution reports, DAG visualization, OTel export (`toOTelSpans`, `toOTLPJson`), OTel auto-exporter (OTLP push) |
+| **Workflow** | `src/workflow/` | YAML workflow parser + compiler, event-driven workflows (`EventFlow`), graph state machine (`StateMachine`) |
+| **Templates** | `src/templates/` | 17 pre-built agent templates |
 | **Adapters** | `src/adapters/` | Claude Agent SDK wrapper |
-| **Orchestrator** | `src/orchestrator/` | Evolving orchestrator (bandit-based adaptive sequencing), **A/B testing engine**, **Judge agent for quality evaluation**, **Weight table for dynamic routing**, **Distillation collector for training pairs** |
+| **Orchestrator** | `src/orchestrator/` | Evolving orchestrator, A/B testing, judge agent, weight table, distillation collector |
 | **Persistence** | `src/persistence/` | Save/load state to disk or memory backend |
-| **Memory** | `src/memory/` | ANCS memory backend, **self-learning memory with EWC**, **vector memory with HNSW-like search** |
-| **Testing** | `src/testing/` | `RecordingProvider` (wraps real provider, saves fixtures), `ReplayProvider` (loads fixtures, zero-cost deterministic replay), evals framework |
-| **Optimizer** | `src/optimizer/` | Token optimizer with pattern caching, compression, and batch optimization |
-| **Workers** | `src/workers/` | Background worker system for continuous optimization (memory optimizer, pattern learner, metrics collector, cache cleanup, health check) |
-| **Security** | `src/security/` | Threat detection system (SQL/command/XSS injection, path traversal, hardcoded secrets, prompt injection, PII detection) |
-| **Spec** | `src/spec/` | Architecture Decision Records (ADRs) framework for spec-driven development |
-| **Graph** | `src/graph/` | Knowledge graph with PageRank-based importance, graph-enhanced ranked retrieval |
+| **Memory** | `src/memory/` | ANCS, self-learning (EWC), vector (HNSW-like), A-MEM (Zettelkasten graph), temporal (CMA decay), self-editing blocks (Letta), episodic, procedural, external vector store adapters (Pinecone/Qdrant/Redis/flat) |
+| **Session** | `src/session/` | Named persistent sessions, `SessionManager`, conversation branching (`BranchManager`) |
+| **Testing** | `src/testing/` | `RecordingProvider`, `ReplayProvider`, evals framework, `EvalHarness` (run history + regression), trajectory evaluation (TRACE) |
+| **Optimizer** | `src/optimizer/` | Token optimizer, prompt optimizer (DSPy-style) |
+| **Workers** | `src/workers/` | Background workers (memory optimizer, pattern learner, metrics, health check), sleep-time compute agent |
+| **Security** | `src/security/` | Threat detection (SQL/command/XSS injection, path traversal, secrets, prompt injection, PII) |
+| **Tools** | `src/tools/` | Code execution sandbox (Node vm / Docker / E2B), browser tool (Playwright), computer use (Anthropic), skill reducer (progressive disclosure) |
+| **Voice** | `src/voice/` | Voice agent pipeline (STT → LLM → TTS), Deepgram/ElevenLabs/OpenAI providers |
+| **Catalog** | `src/catalog/` | Agent discovery catalog with semantic search |
+| **Hooks** | `src/hooks/` | Hook registry, priority-ordered hooks, swarm event bridging |
+| **Consensus** | `src/consensus/` | Raft, Byzantine fault-tolerant, Gossip consensus |
+| **Federation** | `src/federation/` | Multi-swarm federation hub |
+| **Spec** | `src/spec/` | Architecture Decision Records (ADRs) |
+| **Graph** | `src/graph/` | Knowledge graph with PageRank, graph-enhanced retrieval |

### Data Flow

@@ -130,7 +137,7 @@ Task → Scorer → Planner (DAG) → Executor

 - All tests in `tests/unit/` — pure unit tests, no external services
 - Tests use mock providers that return canned responses
-- 29 test files, 265 tests
+- 71 test files, 621 tests
 - Run with `npm test`

## Peer Dependencies