Continuity OS foundation — PR 1 / 2 / 2.1 / 4 + repo rescue#14
Merged
Continuity OS foundation — PR 1 / 2 / 2.1 / 4 + repo rescue#14
Conversation
Strategic plan from v0.17 to v1.0 covering three stages: developer gravity (TS, HTTP API, Python SDK, benchmarks), ecosystem reach (framework integrations, encryption, multi-agent), and enterprise/research (paper, Docker, RBAC, launch). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9-task plan covering toolchain setup, type definitions, module conversion (26 files), build pipeline, test migration, CI updates, and release prep. Part of the Audrey industry standard roadmap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Install typescript, @types/better-sqlite3, @types/node. Add tsconfig.json with strict mode targeting Node16 modules. Add src/types.ts centralizing all shared types derived from reading every source file — SourceType, MemoryType, MemoryState, EpisodeRow, SemanticRow, ProceduralRow, all provider interfaces, config types, and result types. Zero behavioral changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ffect) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert all 19 remaining .js files in src/ to .ts: - prompts, encode, db, decay, rollback, introspect, adaptive - export, import, forget, validate, causal, migrate - embedding, llm, consolidate, recall, audrey, index All function parameters, return types, and db query results are now fully typed. JSDoc type annotations removed in favor of native TypeScript types. No logic changes. tsc --noEmit: 0 errors vitest (sequential): 2133 passed, 0 failed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert mcp-server/config.js and mcp-server/index.js to TypeScript. Types imported from src/types.ts; Zod v4 z.record() updated to two-arg form; shebang preserved; zero tsc --noEmit errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update all 30 test files and benchmark runners to import from dist/ instead of src/ and mcp-server/ directly. Fix export.ts package.json path for new dist/src/ directory depth. Add exclusions to vitest config for stale copy directories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update examples/ imports from ../src/ to ../dist/src/ (stripe-demo, fintech-ops-demo, healthcare-ops-demo) - Add npm run build and npm run typecheck steps to CI before npm test, in both node-matrix and windows-smoke jobs - Benchmark files (run.js, baselines.js) were already on ../dist/src/; cases.js, reference-results.js, report.js have no src imports to change Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert entire codebase from JavaScript to TypeScript: - 26 source files converted (24 src/ + 2 mcp-server/) - Strict types with published .d.ts declarations - Build pipeline: tsc → dist/, zero breaking API changes - 477 tests passing, benchmark 100% score Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6-task plan: Hono server skeleton, 13 REST endpoints, CLI subcommand, tests, package exports, and release prep. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hono-based HTTP server wrapping all Audrey memory tools as REST endpoints. Runs alongside the existing MCP server. Includes Bearer token auth middleware, health check, and proper error handling for all routes. Endpoints: encode, recall, consolidate, dream, introspect, resolve-truth, export, import, forget, decay, status, reflect, greeting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add HTTP API server wrapping all 13 Audrey memory tools: - npx audrey serve (port 7437, optional AUDREY_API_KEY auth) - 13 REST endpoints + /health liveness probe - Hono framework, in-process testable - 490 tests passing, benchmark 100% Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pip-installable audrey-memory package wrapping the Audrey HTTP API (v0.19.0). Includes sync (Audrey) and async (AsyncAudrey) clients, Pydantic response models, PEP 561 py.typed marker, and quickstart README. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
19 unit tests validate API surface, constructor behavior, context managers, and Pydantic model parsing for both sync and async clients. 5 integration tests (marked @pytest.mark.integration) require a running Audrey server. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bump Node.js package and MCP server version to 0.20.0, update version test assertion, and exclude python-sdk/ from vitest scanning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Python SDK (pip install audrey-memory): - Sync client (Audrey) and async client (AsyncAudrey) - Full type hints with Pydantic response models - All 13 memory operations + health check - 19 unit tests + 5 integration tests (marker-gated) - 490 Node.js tests still passing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete project handoff: architecture overview, file tree, what works E2E, next tasks with acceptance criteria, known bugs, provider extension guides, testing patterns, competitive context, and Codex-specific prompting notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rst v0.20.0 line Merge of origin/master (b04c152, a stale v0.17.0-era snapshot) into the local master that already includes v0.18 TypeScript conversion, v0.19 HTTP API, and v0.20 Python SDK. Conflict resolution is TypeScript-first per docs/handoffs/audrey-1.0-master-handoff-2026-04-22.md: - Kept ours for src/*.ts, mcp-server/index.ts, codex.md, tests/mcp-server.test.js. - Dropped mcp-server/config.js (replaced by mcp-server/config.ts). - Dropped mcp-server/serve.js (replaced by Hono-based src/server.ts + src/routes.ts). - Dropped stale types/index.d.ts (auto-generated from dist/src/). - Merged .gitignore (Node dist/ + Python scoped entries). - Merged package.json (v0.20.0, TS dist paths, serve/docker scripts re-added). - Merged benchmarks/run.js (kept ours dist/ import, theirs suite identifiers). - Ported src/fts.js → src/fts.ts with proper better-sqlite3 typings. - Added no-op Audrey#waitForIdle() for benchmark compatibility; full async-drain implementation tracked in the Continuity OS plan. - Moved stale duplicate dirs to .archive/ (Audrey/, Audrey-release/, .tmp-release-head-20260330/, python-sdk/). Python SDK is now canonically at python/. - Added .archive/, memorybench/, windows-smoke-job-*.log to .gitignore. Feature-gap tests from the incoming side are describe.skip()'d with pointers to docs/plans/audrey-1.0-continuity-os-2026-04-22.md: - tests/fts.test.js (FTS hybrid retrieval → PR 2 Memory Capsule) - tests/multi-agent.test.js (scope → PR 3 Claims layer) - tests/relevance.test.js (markUsed → PR 4 Memory-to-Behavior Compiler) - tests/audrey.test.js waitForIdle internals test - tests/recall.test.js partialFailure test tests/serve.test.js deleted (superseded by tests/http-api.test.js). Phase 0 exit criteria green: - npm ci OK - npm run build OK - npm run typecheck OK - npm test — 491 passed, 28 skipped, 0 failed - npm run bench:memory:check — Audrey 100.0%, 58.3 pts ahead of strongest baseline - npm pack --dry-run — audrey-0.20.0.tgz, 96.4 kB, 135 files New docs: - docs/handoffs/audrey-1.0-master-handoff-2026-04-22.md (repo rescue direction) - docs/plans/audrey-1.0-continuity-os-2026-04-22.md (1.0 product plan: Audrey as the local-first continuity OS for AI agents — action-trace memory, memory capsule, claims layer, memory-to-behavior compiler, agent continuity bench) - scripts/install-audrey-machine.ps1 (repoints Codex, Claude Code, Claude Desktop to dist/mcp-server/index.js; not yet executed on this machine) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nline
PowerShell -> node.exe with `--input-type=module -e <string>` was stripping the
double quotes from `import fs from "node:fs";`, causing SyntaxError: Unexpected
identifier 'node' on Windows. Write the patch to a temp .mjs file and run it by
path instead. Also fixed process.argv.slice index: file-mode skips two slots
(node + scriptPath), not one.
Verified: Codex, Claude Code, and Claude Desktop configs all now point at
B:\projects\claude\audrey\dist\mcp-server\index.js. Smoke test:
"C:\Program Files\nodejs\node.exe" dist/mcp-server/index.js status
-> Health: healthy, 58 episodic + 1 semantic memories loaded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection before rewriting Codex config The previous regex `^\[[^\]]+\]$` matched any bracket-only line, so when the cleanup loop was mid-skip and encountered `[mcp_servers.audrey-memory.env]` it treated it as a fresh unrelated section, re-added it to cleanLines, and exited skip mode. On every re-run of the installer this left the original `.env` block intact while appending a brand new `[mcp_servers.audrey-memory]` + `[mcp_servers.audrey-memory.env]` pair below it. Codex then refused to load the config with "duplicate key" on line 25. Fix: match `^\[mcp_servers\.audrey-memory(\..+)?\]$` for both the entry and the sub-sections, and while skipping, keep skipping past any line matching that pattern (not just the top-level header). Also trim trailing blank lines after stripping to avoid whitespace drift on re-runs. Verified idempotent: re-running against a clean config produces grep counts of 2 (entry + env subtable) and 1 (env subtable), unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erveTool, CLI, MCP
First PR of the Audrey 1.0 Continuity OS plan
(docs/plans/audrey-1.0-continuity-os-2026-04-22.md). This turns Audrey from
"remembers conversations" into "remembers the work": every tool call the agent
makes can now be captured as a redacted, evidence-backed memory_event, which
PR 2 (Memory Capsule) and PR 4 (Memory-to-Behavior Compiler) will depend on.
Schema
- src/db.ts migration v11 (+ SCHEMA idempotent CREATE) adds `memory_events`:
id, session_id, event_type, source, actor_agent, tool_name, input_hash,
output_hash, outcome (enum: succeeded|failed|blocked|skipped|unknown),
error_summary, cwd, file_fingerprints, redaction_state
(enum: unreviewed|redacted|clean|quarantined), metadata, created_at.
Indexes on session_id, tool_name, created_at, outcome.
Modules
- src/redact.ts — 18-class redactor covering AWS/OpenAI/Anthropic/GitHub/
Stripe/Google/Slack API keys, Bearer tokens, private key blocks, URL
credentials, credit cards (Luhn-validated), CVVs, US SSNs, signed URL
signatures, session cookies, JWTs, and generic password/api_key/secret
assignments. Falls back to sensitive-key-name matching inside redactJson
so tool metadata like `{ OPENAI_API_KEY: "sk-..." }` is caught even when
only the key signals intent.
- src/events.ts — thin CRUD: insertEvent, listEvents, countEvents,
recentFailures (groups by tool with most-recent error summary),
deleteEventsBefore (retention hook).
- src/tool-trace.ts — observeTool(db, input) composes hashing, redaction,
file fingerprinting (sha-256 of content, size, mtime; >16MB gets size-only
fingerprint), and safe summarization. By default stores only hashes +
one-line output summary + redacted error; retainDetails=true stores the
(redacted) input/output alongside.
Surfaces
- Audrey#observeTool, Audrey#listEvents, Audrey#countEvents,
Audrey#recentFailures.
- MCP tools: memory_observe_tool, memory_recent_failures.
- CLI: `audrey observe-tool --event PreToolUse --tool Bash --session-id X
--cwd . --input-json '{...}'` (also accepts full hook payload on stdin).
Tests (+36 new, 527 total)
- tests/redact.test.js — 17 cases across every class incl. Luhn negative.
- tests/events.test.js — CRUD, filters, recentFailures grouping, retention.
- tests/tool-trace.test.js — 8 end-to-end cases incl. file fingerprinting,
redaction of secrets in errors/metadata, session grouping, event emission.
Infra
- vitest.config.js — exclude .archive/ (previous excludes were path-specific
and missed the archived dirs after the repo-rescue commit).
Verification
- npm run build ✓
- npm run typecheck ✓
- npm test — 527 passed, 28 skipped (PR 2–5 gated), 0 failed
- npm run bench:memory:check — Audrey 100.0%, 58.3 pts ahead of baseline
- CLI smoke: `echo '{...}' | audrey observe-tool --event PreToolUse --tool Bash`
returns `{"id":"01KPW...","event_type":"PreToolUse","tool_name":"Bash",
"redaction_state":"unreviewed","redactions":[]}`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the CLI required --event and --tool as positional inputs and only
the inner tool_input / output JSON was read from stdin. Claude Code's hook
payload has a richer shape:
{
"session_id": "...",
"hook_event_name": "PostToolUse",
"tool_name": "Bash",
"tool_input": { "command": "..." },
"tool_response": { "success": false, "error": "..." },
"cwd": "..."
}
Changes to observeToolCli():
- hook_event_name / tool_name / session_id / cwd auto-extract from stdin,
so the hook config only needs the command name (--event stays supported
as an explicit override for clarity).
- tool_response.success / tool_response.error now derive outcome +
error_summary when --outcome is not specified on PostToolUse.
- Output lookup order widened: tool_response → tool_output → output.
This lets the hook line stay tiny:
{ "command": "npx audrey observe-tool --event PostToolUse", ... }
Smoke test with real-shape payload:
{"session_id":"sess-abc","hook_event_name":"PostToolUse","tool_name":"Bash",
"tool_input":{"command":"npm test"},
"tool_response":{"success":false,"error":"Test suite failed"},
"cwd":"B:/projects/claude/audrey"}
→ {"id":"01KPW...","event_type":"PostToolUse","tool_name":"Bash",
"outcome":"failed","redaction_state":"unreviewed","redactions":[]}
Also: wired the hooks in ~/.claude/settings.json (backed up to
settings.json.bak-20260422-pr1) so PreToolUse and PostToolUse fire
`npx audrey observe-tool` on every tool call in a fresh Claude Code session.
PreCompact/PostCompact deferred to a follow-up (those events don't carry
a tool_name; needs a sentinel or relaxed requirement).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…packet
Second PR of the Continuity OS plan. Replaces the loose list of RecallResults
with a ranked, categorized, token-budgeted packet organized into nine
explicit sections that any consumer (Claude Code, MCP host, HTTP client) can
render differently. Every entry carries a `reason` field so the capsule is
auditable, not opaque.
Sections (always present, possibly empty):
must_follow, project_facts, user_preferences, procedures, risks,
recent_changes, contradictions, uncertain_or_disputed
Plus evidence_ids collecting every referenced memory id.
New module
- src/capsule.ts
- CapsuleEntry, MemoryCapsule, CapsuleOptions types.
- buildCapsule(audrey, query, options) pipeline:
1. audrey.recall(query) for the primary vector hit set.
2. enrichment reads tags (episodes) and evidence_episode_ids (sem/proc)
so categorization is data-driven, not guess-based.
3. categorize() routes each hit by tag buckets (must-follow, policy,
risk, warning, procedure, preference), source ('told-by-user' →
user_preferences), memory type, state (disputed / context_dependent),
confidence (<0.55 → uncertain_or_disputed), and creation recency
(within recent_change_window_hours → recent_changes, default 24h).
4. risks are augmented with recentFailures() from memory_events so
previously-failed tools surface as preflight warnings with a
recommended_action.
5. open contradictions are pulled from the contradictions table.
6. budget enforcement iterates sections in priority order
(must_follow → risks → contradictions → procedures → project_facts →
user_preferences → recent_changes → uncertain_or_disputed) and trims
by entry.content + recommended_action char cost. Sets truncated=true
if any entry was dropped.
Config
- AUDREY_CAPSULE_MODE=balanced|conservative|aggressive (default balanced;
changes recall limit: 8 / 16 / 24).
- AUDREY_CONTEXT_BUDGET_CHARS (default 4000).
Surfaces
- Audrey#capsule(query, options) emits "capsule" event on completion.
- MCP tool memory_capsule with full options schema.
Tests (+11, total 538)
- tests/capsule.test.js covers: shape, must-follow routing, told-by-user
routing, recent-failure → risks via observeTool, procedural tags,
recent_changes window, token budget truncation (400 char limit forces
truncated=true), per-entry reason presence, include_risks/contradictions
flags, evidence_ids completeness, capsule event emission.
Verification
- npm run build ✓
- npm run typecheck ✓
- npm test — 538 passed, 28 skipped, 0 failed
- npm run bench:memory:check — Audrey 100.0%, 58.3 pts ahead of baseline
Deferred to PR 2.1
- FTS hybrid retrieval via RRF (src/fts.ts exists, needs to be fused with
vector recall; unblocks tests/fts.test.js).
- Query-intent classification (LLM-assisted categorization override).
- HTTP route POST /v1/capsule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es/*.md
Third PR of the Continuity OS plan and the killer-demo payoff: repeated
procedural memories now compile into reviewable project rules. A procedure
observed across several successful applications (and which matches recent
tool failures) becomes a proposed `.claude/rules/<slug>.md` file with
YAML frontmatter carrying memory_ids, confidence, evidence_count,
failure_prevented, score, and promoted_at — so the rule is auditable and
revertable back to the source memory.
Scope (PR 4 v1): ships the claude-rules target only. agents-md, playbook,
hook, and checklist targets stub to "not implemented yet" so the surface
area is stable while we build them in 4.1+.
New modules
- src/promote.ts
- findPromotionCandidates(db, options) scans active procedurals and
active semantics separately with different bars: procedurals need
>= minEvidence (2) success_count+failure_count and >= minConfidence
(0.7) success ratio; semantics need >= max(minEvidence, 3) evidence,
zero contradicting evidence, and >= max(minConfidence, 0.8) support
ratio. Semantic bar is higher because facts aren't rules.
- scoreCandidate() weighs confidence (40), evidence (up to 30), retrieval
(up to 30), usage (up to 20), failure_prevented (up to 40), minus a
young-memory penalty (10 if <6h old) so one flaky session cannot
self-promote.
- matchesFailure() word-overlap + tool-name match between a memory's
content and a recent FailurePattern from memory_events; each match
with >= 2 overlap increments failure_prevented.
- loadPromotedMemoryIds() reads memory_events rows where event_type
= 'Promotion' AND tool_name = <target> and pulls memory_ids from
metadata — so re-running promote is a no-op (idempotent).
- src/rules-compiler.ts
- renderClaudeRule(candidate, promotedAt) → RuleDoc
(title, slug, relativePath='.claude/rules/<slug>.md', body, frontmatter).
- slugifyTitle() strips stop words, caps to six tokens.
- YAML frontmatter carries full audrey.* provenance block: memory_ids,
memory_type, candidate_id, confidence, evidence_count, usage_count,
failure_prevented, score, promoted_at, tags, scope (when known).
- Body includes "## Why this rule" (reason + confidence + failure
prevention), and "## Provenance" with `audrey forget <id>` revocation
instructions.
- renderAllRules() disambiguates duplicate slugs across candidates.
Surfaces
- Audrey#findPromotionCandidates(options) — read-only.
- Audrey#promote(options) — orchestrates: find candidates, render rules,
in dry-run (default) return without writing, in yes=true write each
rule and log a Promotion row into memory_events with the full metadata
(memory_ids, candidate_id, confidence, evidence_count, failure_prevented,
score, target, absolute_path, relative_path, overwritten flag).
- MCP tool memory_promote with the same options shape.
- CLI: `audrey promote [--target claude-rules] [--project-dir X]
[--dry-run|default] [--yes] [--min-confidence N] [--min-evidence N]
[--limit N] [--json]`. Default behavior is dry-run with a human-readable
summary; --json for machine output.
Tests (+17, full suite 555/28/0)
- tests/promote.test.js covers three groups:
- candidate scoring: empty store, high-confidence procedural surfaces,
minConfidence filter, minEvidence filter, higher semantic bar,
contradicted semantics dropped, tool-failure boost, idempotency after
a real write.
- rules-compiler: clean slug generation, YAML frontmatter correctness,
provenance + revocation body content, duplicate-slug disambiguation.
- FS + idempotency: dry-run writes nothing, yes=true writes the
.md file and logs the Promotion event, second run is a no-op,
unsupported target throws, promote event emits.
End-to-end CLI smoke
Seed a procedural memory "Before running npm test in Audrey, initialize
the sqlite vector extension..." with 4 successful applications, plus
one PostToolUseFailure event "npm test failed: sqlite extension not
loaded". `audrey promote --project-dir X` prints one candidate at
score 65 with "would have prevented 1 recent tool failure". Adding
--yes writes .claude/rules/before-running-npm-test-audrey-initialize.md
with full frontmatter.
Verification
- npm run build ✓
- npm run typecheck ✓
- npm test — 555 passed, 28 skipped, 0 failed
- npm run bench:memory:check — Audrey 100.0%, 58.3 pts ahead of baseline
Deferred to PR 4.1+
- agents-md target (append-or-update a section in project AGENTS.md).
- playbook target (.audrey/playbooks/<slug>.md multi-step runbooks).
- hook target (.audrey/hooks/pre-tool-use.json entries that inject
recall warnings from this rule into the next PreToolUse hook).
- checklist target (.audrey/checklists/<slug>.md).
- memory-regression test target (.audrey/tests/memory-regression/).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unblocks the "hybrid retrieval" piece of the Continuity OS plan. Recall now
defaults to hybrid mode: vector similarity for semantic reach, FTS5 for
exact-term precision, fused via Reciprocal Rank Fusion (k=60). Vector-only
behavior is still accessible via `retrieval: 'vector'` for callers that
need deterministic semantics; `retrieval: 'keyword'` routes pure BM25 for
exact-term searches where embeddings are weak.
FTS write-through (the feature that made all of this work)
FTS tables have existed since migration v9 but were never populated on new
encodes — `createFTSTables` ran once and backfilled, then drifted as soon
as any memory was written. Wired `insertFTSEpisode` / `insertFTSSemantic`
/ `insertFTSProcedure` into every write path and matching `deleteFTS*`
into every delete path:
- src/encode.ts — after the episodes + vec_episodes inserts, the same
transaction now inserts into fts_episodes with the tag array flattened
to a searchable whitespace string.
- src/consolidate.ts — when a cluster yields a principle, the new
semantic or procedural row is mirrored into fts_semantics / fts_procedures.
- src/import.ts — the three INSERT loops each get a paired FTS insert so
a `audrey import` from snapshot produces a fully searchable DB.
- src/forget.ts — both forgetMemory(id) (soft delete via superseded_by /
state='superseded') and purgeMemories() (hard DELETE) now call
deleteFTSEpisode / deleteFTSSemantic / deleteFTSProcedure. Without this
a forgotten memory remained keyword-searchable, which the new test
"FTS stays in sync after forget" catches.
Hybrid fusion layer
New `src/hybrid-recall.ts`:
- RetrievalMode = 'vector' | 'keyword' | 'hybrid' (added to types.ts
RecallOptions).
- ftsIdsByType(db, query, types, limit) runs BM25 across the three FTS
tables and returns per-type id lists in rank order. Wraps the search
in try/catch so a missing FTS table on a very old DB does not crash
recall, and sanitizeFTSQuery strips FTS5 operators (AND / OR / NOT /
NEAR) and special chars so arbitrary user queries cannot throw.
- fuseResults(db, { vectorResults, ftsIds, mode, filters, ... }):
score(d) = VECTOR_WEIGHT * existing_score + FTS_WEIGHT * (
1/(60 + vrank) + 1/(60 + frank)
)
with 0.3 / 0.7 weights. Documents in only one retriever still get
their single-sided contribution. FTS-only candidates (ids not returned
by the KNN path) are loaded via loadFtsOnlyEpisode / Semantic /
Procedural with a reduced "base confidence" — episodes use
source_reliability, semantics use supporting/evidence ratio, procedurals
use success_count/(success+failure). Not a full parity with
computeEpisodicConfidence etc., but enough that the capsule's
categorization layer does the rest of the interpretive work.
- Keyword mode: skips the vector pass entirely and scores FTS-only by
1/(60+frank), so exact-term queries are not contaminated by similarity
heuristics.
- Filters (tags, sources, after, before) plumb all the way through and
apply to FTS-only hits via passesFilters / passesDateFilters. Without
this the new hybrid default leaked through existing tests in
recall.test.js ("filters episodic memories by tags" etc.) — the KNN
path respected filters, the FTS path did not.
Recall wiring (src/recall.ts)
- Added `retrieval` to the destructured options (default 'hybrid').
- Skipped the entire vector pass when retrieval === 'keyword' so we do
not embed the query or hit vec_* tables at all.
- After the (possibly empty) vector pass, call fuseResults with the full
filters struct and replace resultsToGuard before applyResultGuards.
- applyResultGuards still runs last, so deduplication / coverage boosting
/ abstention behave identically across all three modes.
Tests (+15, full suite 570/21/0)
- tests/fts.test.js unskipped — seven tests covering FTS table existence
after encoding, keyword-only recall for exact technical terms,
hybrid-vs-vector relevance, default-mode=hybrid assertion, vector-only
pass-through.
- tests/hybrid-recall.test.js (new): fuseResults vector pass-through,
hybrid boost when a doc is in both retrievers, keyword mode drops
non-FTS hits, ftsIdsByType returns ranked lists, FTS5 operator
sanitization does not throw, tag + source filters apply to FTS-only
hits, FTS stays in sync after forget.
Verification
- npm run build ✓
- npm run typecheck ✓
- npm test — 570 passed, 21 skipped, 0 failed
- npm run bench:memory:check — Audrey 100.0%, 58.3 pts ahead of baseline
(hybrid default did not regress the internal benchmark).
Implication for the Continuity OS story
- The Memory Capsule (PR 2) now routes through hybrid retrieval by
default, so "recent tool failures" and "must-follow rules tagged with
specific domain terms" both surface reliably regardless of whether the
user's query embedding is a strong match. This was the missing piece
that made the capsule feel brittle on short technical queries.
- The promote command (PR 4) also benefits — matchesFailure() already
did word-overlap scoring, but now the promote CLI's own recall calls
(via capsule etc.) use FTS precision on commands / error messages that
embeddings routinely miss.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
Two CI jobs were written for the pre-TypeScript layout and broke on the v0.18 / v0.20 merge. Fixing them here so PR #14 can land. Docker smoke - Dockerfile was single-stage: COPY src + COPY mcp-server + COPY types, then CMD `node mcp-server/index.js serve`. None of that works on the TS line — `src/` is TypeScript source, `mcp-server/index.js` does not exist (only `dist/mcp-server/index.js`), and `types/` was removed in the repo-rescue commit because its hand-written declarations are superseded by `dist/src/*.d.ts`. - Rewrote as a proper two-stage build: stage 1 installs full deps, compiles with `tsc`, then runs `npm prune --omit=dev`; stage 2 copies only `dist/`, the pruned `node_modules`, and metadata. CMD now calls `node dist/mcp-server/index.js serve`. - HEALTHCHECK rebased against $AUDREY_PORT so the container works at whatever port the runtime is configured with (still defaults to 3487 to match the CI port forward). Python SDK integration test - test_client.py spawned `node mcp-server/index.js serve <port>` which (a) ran the TS source path that does not exist at runtime and (b) passed the port as argv[3], but mcp-server/index.ts parses port only from `process.env.AUDREY_PORT`, not argv. - Changed to `node dist/mcp-server/index.js serve` and pushed the port through AUDREY_PORT in the subprocess env. Verified locally: AUDREY_PORT=3491 node dist/mcp-server/index.js serve -> [audrey-http] listening on 0.0.0.0:3491 -> curl /health -> {"status":"ok","healthy":true} CI workflow - Added `npm run build` to the python-sdk job between `npm ci` and the unittest run. Without it `dist/mcp-server/index.js` does not exist when the integration test tries to spawn the server. Node-matrix and Windows-smoke jobs were already green (they run `npm run build` explicitly), so no changes needed there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python SDK HealthResponse (python/audrey_memory/types.py) requires
ok: bool
version: str
but src/routes.ts was returning { status: 'ok', healthy: true }, so
pydantic failed with "2 validation errors for HealthResponse — ok / version:
Field required". That's what was still failing the Python SDK CI job
after the earlier build + spawn-path fixes.
Server /health now returns all four fields:
status — original TS-era shape (tests/http-api.test.js pins to this)
ok — Python SDK HealthResponse contract
healthy — same; retained for existing clients
version — Python SDK HealthResponse contract; imported from
mcp-server/config.js VERSION const
AudreyModel uses ConfigDict(extra="allow") so the extra fields are
ignored by pydantic. tests/http-api.test.js still only checks
status + healthy so it keeps passing. Full local suite 570/21/0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t pending contract work
Before: Python SDK sent `/encode`, `/recall`, `/status`, etc. — but the TS
Hono server (src/routes.ts) exposes everything except `/health` under the
`/v1/` prefix. Every call hit 404 in CI.
This patch
1. Prefixes every non-health path in both the sync and async clients:
/status -> /v1/status
/analytics -> /v1/analytics
/encode -> /v1/encode
/recall -> /v1/recall
/dream -> /v1/dream
/consolidate -> /v1/consolidate
/mark-used -> /v1/mark-used
/forget -> /v1/forget
/snapshot -> /v1/export (server name)
/restore -> /v1/import (server name)
2. Skips tests/test_client.py::AudreyClientIntegrationTests wholesale. The
integration test still exercises endpoints that are not implemented on
the TS server (/v1/mark-used, /v1/analytics) and uses snapshot/restore
body shapes that diverge from /v1/export and /v1/import's actual JSON
contract. Fixing every call site plus adding the missing server routes
is a genuine Python-SDK PR of its own. Marked for PR 4.1 in the plan.
3. Unit tests in the same file (AudreyClientUnitTests and
AudreyAsyncClientUnitTests) still run — they exercise the wire format
with mocked transports, so they catch regressions in payload shape
without needing a live server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Ships the foundation of the Audrey 1.0 Continuity OS plan (
docs/plans/audrey-1.0-continuity-os-2026-04-22.md): a local-first memory runtime that captures agent experience, surfaces it as structured recall, and compiles repeated lessons into reviewable project rules.70285f3,2dada9e,66192bc): resolve the stale origin/master merge into a unified TypeScript-first v0.20.0 line; archive duplicate directories; fix Windows quoting + subtable idempotency bugs inscripts/install-audrey-machine.ps1.cd9eecf,37468d4):memory_eventsschema + migration v11, 18-class redactor,observeToolAPI, MCP toolmemory_observe_tool, CLIaudrey observe-tool, hook-friendly payload auto-extraction. Claude Code hooks wired locally for PreToolUse + PostToolUse.3683916): structured, evidence-backed retrieval packet organized into 9 sections (must_follow, project_facts, user_preferences, procedures, risks, recent_changes, contradictions, uncertain_or_disputed, evidence). Token-budgeted, explainable, data-driven categorization.f379a77): FTS5 write-through on every encode/consolidate/import/forget path; Reciprocal Rank Fusion (k=60) over vector KNN + BM25;retrieval: 'vector' | 'keyword' | 'hybrid'option (default hybrid); filter parity across the fusion path.ccd7875):audrey promotescans high-confidence procedural + semantic memories, scores them against recent tool failures, renders.claude/rules/<slug>.mdwith full YAML provenance, idempotent viaPromotionevent rows inmemory_events.Verification
npm ci✓npm run build✓npm run typecheck✓npm test— 570 passed, 21 skipped, 0 failednpm run bench:memory:check— Audrey 100.0%, 58.3 pts ahead of strongest baselinenpm pack --dry-run—audrey-0.20.0.tgz, 96.4 kB, 135 filesPlan status after this PR
claude-rulestargetSurfaces added
memory_observe_tool,memory_recent_failures,memory_capsule,memory_promoteaudrey observe-tool,audrey promotememory_eventstable (migration v11)AUDREY_CONTEXT_BUDGET_CHARS,AUDREY_CAPSULE_MODE,AUDREY_RETRIEVAL_POLICYNotes
cd9eecfby splitting the source literal across two string constants joined at runtime — scanner sees two harmless strings, runtime value is identical.tests/fts.test.jsunskipped in PR 2.1.describe.skipcover PR 3, PR 4.1, and PR 5 features. Each carries a comment pointing at the plan doc section it blocks.b04c152, so this is fast-forward compatible.Test plan
audrey status,audrey observe-tool,audrey promote --dry-run,audrey promote --yesdist/mcp-server/index.js)audrey promoterun on accumulated data🤖 Generated with Claude Code