Releases: sethdford/shipwright
Shipwright v3.3.0
Highlights
Fully Autonomous Pipeline
- Context exhaustion auto-recovery — detects Claude context limits, auto-restarts with progress briefing. Zero human intervention.
- Config-driven model routing — `_smart_model()` replaces all hardcoded model names. Reads from env → daemon-config → user-config → defaults.
- Adaptive effort levels — per-stage effort from `daemon-config.json` with intelligent defaults.
- Exponential backoff — replaces hardcoded sleep values in health checks and deployment polling.
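The exponential backoff item can be pictured with a short sketch. This is an illustration in Python, not Shipwright's shell implementation; the function names `backoff_delays` and `poll_until_healthy` and the default base, factor, and cap values are assumptions made for the example.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... with each value capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def poll_until_healthy(check: Callable[[], bool], max_attempts: int = 6,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `max_attempts` times, waiting longer after each failure."""
    delays = backoff_delays()
    for attempt in range(max_attempts):
        if check():
            return True
        # No point sleeping after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(next(delays))
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a fixed `sleep 5` style loop would instead retry at the same rate regardless of how long the target has been unhealthy.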
Dark Factory Phases 1-8
- Test-as-holdout validation, spec-driven development, causal graphs, auto-recovery
- Process reward models, constitutional AI, formal specs, mutation testing
- Cross-session RL with Thompson sampling bandits and policy learning
- Spec-driven pipeline stages (spec_generation + spec_verification)
AutoResearch RL System
- Reward aggregation from 80+ pipeline signals
- Thompson Sampling for model/template selection
- Policy learner with per-context strategy optimization
- 26/26 E2E tests prove wiring
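As a rough illustration of Thompson Sampling for template or model selection, the sketch below keeps a Beta posterior per option and picks the option with the highest sampled success rate. It is a generic Python sketch, not Shipwright's code; the class names, the uniform Beta(1, 1) prior, and the template names in the usage note are assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class BetaArm:
    successes: int = 0
    failures: int = 0

    def sample(self, rng: random.Random) -> float:
        # Beta(successes + 1, failures + 1) is the posterior under a uniform prior.
        return rng.betavariate(self.successes + 1, self.failures + 1)


class ThompsonSelector:
    def __init__(self, options: Iterable[str], seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        self.arms: Dict[str, BetaArm] = {name: BetaArm() for name in options}

    def choose(self) -> str:
        """Draw one sample per option and return the option with the largest draw."""
        return max(self.arms, key=lambda name: self.arms[name].sample(self.rng))

    def record(self, name: str, success: bool) -> None:
        """Update the chosen option's posterior with the observed outcome."""
        arm = self.arms[name]
        if success:
            arm.successes += 1
        else:
            arm.failures += 1
```

A caller would alternate `choose()` and `record()`, for example `selector = ThompsonSelector(["fast", "standard"])` with hypothetical template names. Options with little history still get picked occasionally because their posteriors are wide.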
Pipeline Fixes
- Fixed 58 pipeline tests (were failing, now all pass)
- Fixed `set -e` propagation bug in scope enforcement
- Fixed mock claude flag handling
SOTA Research
- 5 research docs (77KB), 65+ sources
- 20-item prioritized backlog with 12-week roadmap
Upgrade
`shipwright upgrade --apply`
Or reinstall:
`curl -fsSL https://raw.githubusercontent.com/sethdford/shipwright/main/install.sh | bash`
Full Changelog
https://github.com/sethdford/shipwright/blob/main/CHANGELOG.md
What's Changed
- feat: AI-powered skill injection for pipeline stages by @sethdford in #194
- refactor: split 3 large lib modules into focused sub-modules by @sethdford in #219
- refactor: create lib/bootstrap.sh, convert 5 scripts by @sethdford in #220
- refactor: establish modular backend structure for dashboard by @sethdford in #221
- refactor: decompose sw-pipeline.sh into 4 focused modules by @sethdford in #222
- refactor: decompose sw-recruit.sh into 3 focused modules by @sethdford in #223
- fix: stabilize test suites and fix pipeline intelligence skip by @sethdford in #235
- Add Claude Code GitHub Workflow by @sethdford in #243
- feat: pipeline quality revolution — 7 components closing 6 quality gaps by @sethdford in #248
Full Changelog: v3.2.0...v3.3.0
Shipwright v3.2.0
Full Changelog: v3.1.0...v3.2.0
Shipwright v3.1.0
Full Changelog: v3.0.0...v3.1.0
v3.0.0 — Full Architecture Overhaul
Shipwright 3.0.0
A ground-up architecture overhaul making Shipwright database-first, event-driven, and self-learning.
Highlights
- Centralized Configuration — All 200+ magic numbers extracted into `config/defaults.json` with 4-layer precedence (env var > daemon-config > policy > defaults)
- SQLite as Source of Truth — Daemon state, heartbeats, costs, pipeline runs, and memory all read/write to SQLite first with file fallback
- Unified Event System — 3 separate event stores consolidated into a single SQLite events table with consumer offset tracking and durable checkpoints
- Thompson Sampling — Template selection uses Beta distribution sampling over historical success rates per complexity tier
- UCB1 Model Routing — Balances exploration/exploitation for model selection across pipeline stages
- Semantic Memory — Keyword-relevance search over stored memories for context injection into agent prompts
- Reasoning Traces — Multi-step autonomous reasoning stored and queryable
- Adaptive Thresholds — Quality and anomaly thresholds computed from historical distributions instead of hardcoded values
- Real-time Event Streaming — New /ws/events WebSocket endpoint for live event monitoring
- Dead Code Cleanup — Removed duplicate helpers and color definitions from 90+ scripts
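The UCB1 item above can be sketched as follows. This is the textbook UCB1 rule written in Python for illustration, not Shipwright's implementation; the class name `UCB1Router`, the reward scale (0 to 1), and the model names in the usage note are assumptions.

```python
import math
from typing import Dict, Iterable


class UCB1Router:
    def __init__(self, models: Iterable[str]) -> None:
        self.counts: Dict[str, int] = {m: 0 for m in models}
        self.totals: Dict[str, float] = {m: 0.0 for m in models}

    def choose(self) -> str:
        """Return an untried model if any, else the model with the highest UCB1 score."""
        for model, n in self.counts.items():
            if n == 0:
                return model
        total = sum(self.counts.values())

        def score(model: str) -> float:
            mean = self.totals[model] / self.counts[model]
            # The bonus shrinks as a model is tried more often (exploitation)
            # and grows slowly with total trials (exploration).
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[model])
            return mean + bonus

        return max(self.counts, key=score)

    def record(self, model: str, reward: float) -> None:
        """Record a reward in [0, 1] for the given model."""
        self.counts[model] += 1
        self.totals[model] += reward
```

With hypothetical names, `UCB1Router(["small-model", "large-model"])` would try each model once, then favor whichever has the better average reward while still revisiting the other from time to time.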
New Files
| File | Purpose |
|---|---|
| config/defaults.json | Central defaults for all tunables |
| config/event-schema.json | Known event types and field validation |
| scripts/lib/config.sh | Config reader with _config_get |
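To make the 4-layer precedence concrete (env var > daemon-config > policy > defaults), here is a minimal Python sketch of a layered lookup. It is not the `_config_get` shell function itself; the `SW_` environment prefix, the dotted-key convention, and the key name used in the usage note are assumptions.

```python
import os
from collections.abc import Mapping
from typing import Any, Optional

_MISSING = object()


def _dig(layer: Mapping, dotted_key: str) -> Any:
    """Walk a nested mapping along 'a.b.c'; return _MISSING if any part is absent."""
    node: Any = layer
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return _MISSING
        node = node[part]
    return node


def config_get(key: str, *, daemon_config: Mapping, policy: Mapping,
               defaults: Mapping, env: Optional[Mapping] = None,
               env_prefix: str = "SW_") -> Any:
    """Resolve `key` using env var > daemon-config > policy > defaults."""
    env = os.environ if env is None else env
    # Hypothetical naming rule: "health.poll_interval" -> "SW_HEALTH_POLL_INTERVAL".
    env_name = env_prefix + key.upper().replace(".", "_")
    if env_name in env:
        return env[env_name]  # note: environment values are always strings
    for layer in (daemon_config, policy, defaults):
        value = _dig(layer, key)
        if value is not _MISSING:
            return value
    raise KeyError(key)
```

For example, with a hypothetical key `health.poll_interval` set to 5 in defaults and 10 in policy, the lookup returns 10 unless the daemon config or the environment overrides it.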
Schema Changes
SQLite schema v6 with new tables: daemon_queue, event_consumers, durable_checkpoints, memory_patterns, memory_decisions, memory_embeddings, pipeline_outcomes, model_outcomes, reasoning_traces
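Consumer offset tracking over a single events table might look roughly like the sketch below. The table name `event_consumers` comes from the schema list above, but every column name, the `events` table layout, and the helper functions are assumptions made for illustration; the real schema may differ.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS events (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            type    TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS event_consumers (
            consumer      TEXT PRIMARY KEY,
            last_event_id INTEGER NOT NULL DEFAULT 0
        );
    """)
    return db


def emit(db: sqlite3.Connection, event_type: str, payload: Dict[str, Any]) -> int:
    """Append one event and return its id."""
    cur = db.execute("INSERT INTO events (type, payload) VALUES (?, ?)",
                     (event_type, json.dumps(payload)))
    db.commit()
    return cur.lastrowid


def poll(db: sqlite3.Connection, consumer: str,
         limit: int = 100) -> List[Tuple[int, str, Dict[str, Any]]]:
    """Return events this consumer has not seen yet and advance its offset."""
    row = db.execute("SELECT last_event_id FROM event_consumers WHERE consumer = ?",
                     (consumer,)).fetchone()
    offset = row[0] if row else 0
    rows = db.execute("SELECT id, type, payload FROM events "
                      "WHERE id > ? ORDER BY id LIMIT ?", (offset, limit)).fetchall()
    if rows:
        db.execute("INSERT OR REPLACE INTO event_consumers (consumer, last_event_id) "
                   "VALUES (?, ?)", (consumer, rows[-1][0]))
        db.commit()
    return [(event_id, etype, json.loads(body)) for event_id, etype, body in rows]
```

Because each consumer stores its own offset, a dashboard and a metrics job can read the same table independently, and a restarted consumer resumes from where it left off.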
Shipwright v2.4.0
Full Changelog: v2.3.1...v2.4.0
v2.3.1 — Autonomous Feedback Loops, Testing Foundation, Chaos Resilience
What's New
Testing Foundation (211 new tests)
- Vitest unit tests — 113 tests across state store, API client, router, WebSocket, design tokens, and icons
- Server API tests — 46 endpoint tests for error handling, edge cases, and lifecycle operations
- Autonomous E2E — 20 tests for daemon coordination, strategic ingestion, retro-optimize, oversight gates
- Budget & chaos — 16 tests for budget limits, missing/corrupted files, large files, concurrent writes
- Memory & discovery — 16 tests for failure patterns, fix effectiveness, discovery TTL, cross-pipeline learning
Feedback Loops Wired (Tier 1)
- Production → Issues — Monitor stage always collects deploy logs, not just on error threshold
- Retro → Self-optimize — Retrospective metrics automatically feed into template weight adjustments
- Oversight → Merge — Oversight gate + approval gate mandatory before merge stage
Coordination Gaps Closed (Tier 2)
- Autonomous ↔ Daemon — Detects a running daemon and delegates via the `ready-to-build` label instead of starting duplicate pipelines
- Strategic → Autonomous — Strategic agent findings ingested, deduplicated, and fed into autonomous creation loop
- AI-driven triage — Intelligence engine classification with the `--ai` flag, falling back to keyword-based classification
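The triage fallback described above follows a common pattern: try the AI classifier, and use keyword matching if it is unavailable or fails. The Python sketch below shows that pattern only; the label names, keyword lists, and function names are hypothetical and not taken from Shipwright.

```python
from typing import Callable, Optional

# Hypothetical keyword table; the real keyword rules are not documented here.
KEYWORD_LABELS = {
    "bug": ("crash", "error", "regression"),
    "docs": ("readme", "typo", "documentation"),
}


def keyword_triage(title: str) -> str:
    """Return the first label whose keywords appear in the title."""
    text = title.lower()
    for label, words in KEYWORD_LABELS.items():
        if any(word in text for word in words):
            return label
    return "needs-triage"


def triage(title: str,
           ai_classifier: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Prefer the AI classifier when given; fall back to keywords on error or no answer."""
    if ai_classifier is not None:
        try:
            label = ai_classifier(title)
            if label:
                return label
        except Exception:
            # Any classifier failure (timeout, rate limit, bad output) triggers the fallback.
            pass
    return keyword_triage(title)
```

Keeping the keyword path as a plain function means triage still produces a label when the AI path is disabled or rate limited.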
Trust & Validation (Tier 3)
- Long-running autonomous E2E test validates 100-cycle drift scenarios
- Budget guard tests prove system stops at limits
- Chaos tests cover missing files, corrupted JSON, GitHub 500s, rate limits
Full Changelog: v2.3.0...v2.3.1
Shipwright v2.3.0
Full Changelog: v2.2.2...v2.3.0
Shipwright v2.2.2
Full Changelog: v2.2.1...v2.2.2
Shipwright v2.2.1
Full Changelog: v2.1.2...v2.2.1
Shipwright v2.2.0
Full Changelog: v2.1.2...v2.2.0