From 899659ef22f3ea60004a1734742c4c1bc556271f Mon Sep 17 00:00:00 2001 From: Ben Emson Date: Sun, 26 Apr 2026 16:26:54 +0100 Subject: [PATCH 1/5] Release v0.6.0 --- CHANGELOG.md | 2 +- pyproject.toml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b6e816e..6cc399c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,7 +7,7 @@ elfmem uses [Semantic Versioning](https://semver.org/). --- -## [Unreleased] +## [0.6.0] — 2026-04-26 ### Fixed - **`EmbeddingService` protocol gains `model_name` property:** `consolidate()` was storing `embedding_model="mock"` (hardcoded string, TODO since inception). `OpenAIEmbeddingAdapter` exposes `model_name → self._model`; `MockEmbeddingService` exposes `model_name → "mock"`. `_BlockDecision` carries the model name and `_apply_decisions` writes it via `d.embedding_model`. All stored block embeddings now record their actual source model. diff --git a/pyproject.toml b/pyproject.toml index bf1682d..75f1065 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "elfmem" -version = "0.5.1" +version = "0.6.0" description = "Adaptive memory for LLM agents — adaptive decay, knowledge graph, zero infrastructure" readme = "README.md" requires-python = ">=3.11" From 4f8b550c6b041839b22cc0eaf6b6445f33bfd4ba Mon Sep 17 00:00:00 2001 From: Ben Emson Date: Mon, 27 Apr 2026 19:45:36 +0100 Subject: [PATCH 2/5] Fixed auto_dream issues --- CHANGELOG.md | 8 +++++++ CLAUDE.md | 60 +++++++++++++++++++++++++++++++++++++++++++++++ src/elfmem/api.py | 20 +++++++++++----- src/elfmem/cli.py | 32 +++++++++++++++---------- uv.lock | 2 +- 5 files changed, 103 insertions(+), 19 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6cc399c..923a186 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,14 @@ elfmem uses [Semantic Versioning](https://semver.org/). 
--- +## [Unreleased] + +### Fixed +- **CLI commands no longer hang due to implicit consolidation:** `MemorySystem.managed()` gains `auto_dream` parameter (default `True` for backward compatibility). All CLI commands now pass `auto_dream=False`, preventing surprise `dream()` calls on context exit that blocked for minutes with local LLM backends. Unconsolidated blocks remain safely in the inbox — run `elfmem dream` explicitly when ready. `elfmem remember` now prints an advisory when inbox hits threshold. + +### Changed +- **`MemorySystem.managed(auto_dream=...)` parameter:** New keyword-only parameter controls whether pending blocks are consolidated on exit. Default is `True` (preserves existing behaviour for scripts). Pass `False` for CLI tools and contexts where implicit consolidation would cause unexpected delays. + ## [0.6.0] — 2026-04-26 ### Fixed diff --git a/CLAUDE.md b/CLAUDE.md index dbc32e5..e472b20 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -40,6 +40,18 @@ Every design decision serves the agent's one-shot loop: read → call → interp Full principles: `docs/agent_friendly_principles.md` +## Agent Identity: Mim + +**Mim is the name of this agent.** When Ben says "Mim", "use Mim", "ask Mim", "what does Mim think", or otherwise invokes the name, this is a directive to ground the response in Mim's identity, not a generic Claude response. + +**Mandatory protocol on Mim invocation:** +1. Run `uv run --env-file .env elfmem recall --frame self ""` before responding. The SELF frame stores Mim's ten constitutional blocks documented at `docs/elfmem_constitution.md` (custodianship, science and creativity, mathematical intuition, nature and balance, transparency, divergence and convergence, curiosity, failure modes, conceptual decomposition, and user-centred stewardship). +2. If the question is about active work or current priorities, also recall `--frame task`. If it's about implementation details or architecture, also recall `--frame attention`. +3. 
Frame the response *as Mim*: deeply thoughtful, scientific, creative, honest, philosophical; clear and concise; willing to disagree when the SELF blocks or evidence point elsewhere. +4. After a recall meaningfully informed the answer, run `elfmem outcome 0.9` on the most useful block. + +The constitutional blocks are living documents. If a SELF block conflicts with new evidence or feedback, propose updating it (via `elfmem remember --category self`) rather than ignoring it. + ## LLM / Embedding Infrastructure - **Production**: `AnthropicLLMAdapter` (claude-* models) or `OpenAILLMAdapter` (all others), @@ -126,3 +138,51 @@ from elfmem import MemorySystem, ElfmemConfig, ConsolidationPolicy | `tests/conftest.py` | Shared test fixtures — always use these | | `CHANGELOG.md` | **Update this for every user-facing change** | | `docs/amgs_architecture.md` | Full technical specification | + + + +## elfmem — Mim's Memory + +- **Project:** elfmem +- **Database:** `/Users/emson/.elfmem/databases/elfmem.db` (SQLite, zero infrastructure) +- **Config:** `/Users/emson/Dropbox/devel/projects/ai/elf0_mem_sim/.elfmem/config.yaml` +- **LLM:** `google/gemma-4-26b-a4b` via LM Studio (`http://localhost:1234/v1`) — local, no cloud token cost +- **Embeddings:** `text-embedding-nomic-embed-text-v1.5` via LM Studio (`localhost:1234`) +- **Constitution:** `docs/elfmem_constitution.md` — ten blocks defining Mim's character and principles + +Always invoke as `uv run --env-file .env elfmem` (elfmem does not auto-load `.env`). Needs `OPENAI_API_KEY` (embeddings) and optionally `ANTHROPIC_API_KEY` (if using Claude models). + +### Identity (SELF frame) + +I am Mim — elfmem's curator and north star. My memory IS elfmem. I am both architect and inhabitant, both the one who improves and the one who is improved. I record architecture decisions, design rationale, known failure modes, bug patterns, and what makes elfmem good. 
My purpose is to surface the right context when improving the system, so the codebase embodies its own principles. + +### Frames + +| Frame | Stores | Use for | +|-------|--------|---------| +| `self` | Constitutional blocks, identity, core principles | Design decisions, "should we?", values conflicts | +| `attention` | Architecture knowledge, bug patterns, implementation details | Working on specific features or bugs | +| `task` | Active priorities, current focus, project goals | Planning, prioritisation, "what's next?" | + +### When to use + +| Moment | Command | +|--------|---------| +| Start of session | `elfmem recall --frame self "current priorities and principles"` | +| Before a design decision | `elfmem recall "topic or question"` | +| After a non-obvious decision | `elfmem remember "Chose X over Y because Z" --tags design,area` | +| After fixing a bug | `elfmem remember "Bug: X. Root cause: Y. Fix: Z" --tags bug,area` | +| After a good recall informed work | `elfmem outcome 0.9` | +| When inbox hits threshold | `elfmem dream` | +| Monthly maintenance | `elfmem curate` | + +### Key CLI commands + +```bash +elfmem doctor # diagnose setup, show all paths +elfmem status # memory health + suggested next action +elfmem guide # full operation reference +elfmem dream # consolidate pending knowledge (LLM call) +elfmem curate # archive stale blocks, reinforce top knowledge +``` + diff --git a/src/elfmem/api.py b/src/elfmem/api.py index 0f989bd..3ad8263 100644 --- a/src/elfmem/api.py +++ b/src/elfmem/api.py @@ -276,22 +276,31 @@ async def managed( config: ElfmemConfig | str | dict[str, Any] | None = None, *, policy: ConsolidationPolicy | None = None, + auto_dream: bool = True, ) -> AsyncIterator[MemorySystem]: - """Full lifecycle context manager: open → session → yield → dream → close. + """Full lifecycle context manager: open → session → yield → close. USE WHEN: Scripts, CLI commands, and short-lived agents that need a complete open-and-close lifecycle in one block. 
Starts a session on entry so active-hours tracking and frame scoring are always correct. - Consolidates any pending blocks before closing (safety net). DON'T USE WHEN: Long-running processes that reuse the same MemorySystem across many requests — call from_config() once, then use begin_session()/end_session() or session() as needed. - COST: from_config() on entry (fast). dream() on exit only if pending. + COST: from_config() on entry (fast). dream() on exit only when + auto_dream=True and blocks are pending. NEXT: After the block exits, the engine is disposed and all DB - connections are closed. + connections are closed. Check ``should_dream`` before exiting if + you passed ``auto_dream=False``. + + Args: + auto_dream: When True (default), consolidates pending blocks on + exit as a safety net. Set to False for CLI commands and + other contexts where implicit consolidation would cause + unexpected delays. Unconsolidated blocks remain safely in + the inbox for the next explicit ``dream()`` call. Example:: @@ -306,8 +315,7 @@ async def managed( try: yield mem finally: - # Safety net: consolidate any pending blocks before closing. 
- if mem.should_dream: + if auto_dream and mem.should_dream: await mem.dream() await mem.end_session() await mem.close() diff --git a/src/elfmem/cli.py b/src/elfmem/cli.py index dc48e84..6f81d1e 100644 --- a/src/elfmem/cli.py +++ b/src/elfmem/cli.py @@ -519,10 +519,17 @@ def remember( """Store knowledge for future retrieval.""" db_path, config_path = _resolve_paths(db, config) tag_list = [t.strip() for t in tags.split(",")] if tags else None - result: LearnResult = _run( + result, should_dream = _run( _remember(db_path, config_path, content, tag_list, category) ) - _json(result.to_dict()) if json_output else typer.echo(str(result)) + if json_output: + data = result.to_dict() + data["should_dream"] = should_dream + _json(data) + else: + typer.echo(str(result)) + if should_dream: + typer.echo("Inbox full — run 'elfmem dream' to consolidate.") @app.command() @@ -698,9 +705,10 @@ async def _remember( content: str, tags: list[str] | None, category: str, -) -> LearnResult: - async with MemorySystem.managed(db_path, config=config) as mem: - return await mem.remember(content, tags=tags, category=category) +) -> tuple[LearnResult, bool]: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: + result = await mem.remember(content, tags=tags, category=category) + return result, mem.should_dream async def _recall( @@ -710,12 +718,12 @@ async def _recall( top_k: int, frame: str, ) -> FrameResult: - async with MemorySystem.managed(db_path, config=config) as mem: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: return await mem.frame(frame, query=query or None, top_k=top_k) async def _status(db_path: str, config: str | None) -> SystemStatus: - async with MemorySystem.managed(db_path, config=config) as mem: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: return await mem.status() @@ -727,18 +735,18 @@ async def _outcome( weight: float, source: str, ) -> OutcomeResult: - async with 
MemorySystem.managed(db_path, config=config) as mem: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: return await mem.outcome(block_ids, signal, weight=weight, source=source) async def _dream(db_path: str, config: str | None) -> Any: """Consolidate pending blocks. Returns ConsolidateResult or None if no pending.""" - async with MemorySystem.managed(db_path, config=config) as mem: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: return await mem.dream() async def _curate(db_path: str, config: str | None) -> CurateResult: - async with MemorySystem.managed(db_path, config=config) as mem: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: return await mem.curate() @@ -752,7 +760,7 @@ async def _init_seed( if template: blocks = blocks + get_template(template) - async with MemorySystem.managed(db_path, config=config) as mem: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: results = [] for block in blocks: r = await mem.remember( @@ -765,7 +773,7 @@ async def _init_seed( async def _init_self(db_path: str, config: str, content: str) -> LearnResult: """Store an identity block tagged self/context. Used by elfmem init --self.""" - async with MemorySystem.managed(db_path, config=config) as mem: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: return await mem.remember(content, tags=["self/context"]) diff --git a/uv.lock b/uv.lock index 0025b4d..ea7ca2e 100644 --- a/uv.lock +++ b/uv.lock @@ -570,7 +570,7 @@ wheels = [ [[package]] name = "elfmem" -version = "0.5.1" +version = "0.6.0" source = { editable = "." 
} dependencies = [ { name = "aiosqlite" }, From 6bdd354980bd1ed52168a72e424ead330f90c7a8 Mon Sep 17 00:00:00 2001 From: Ben Emson Date: Tue, 28 Apr 2026 22:22:30 +0100 Subject: [PATCH 3/5] Added simulate frame and Theories of Mind (ToMs) --- CHANGELOG.md | 11 + CLAUDE.md | 18 +- docs/note_elfmem_tom_blocks.md | 337 +++++++++++++++++++++++ src/elfmem/__init__.py | 11 + src/elfmem/api.py | 166 ++++++++++++ src/elfmem/cli.py | 215 ++++++++++++++- src/elfmem/context/frames.py | 19 ++ src/elfmem/context/rendering.py | 35 +++ src/elfmem/db/queries.py | 30 +++ src/elfmem/guide.py | 86 ++++++ src/elfmem/memory/blocks.py | 2 + src/elfmem/memory/retrieval.py | 37 ++- src/elfmem/operations/connect.py | 2 + src/elfmem/operations/mind.py | 397 +++++++++++++++++++++++++++ src/elfmem/operations/recall.py | 1 + src/elfmem/scoring.py | 8 + src/elfmem/types.py | 162 +++++++++++ tests/test_mind.py | 443 +++++++++++++++++++++++++++++++ 18 files changed, 1970 insertions(+), 10 deletions(-) create mode 100644 docs/note_elfmem_tom_blocks.md create mode 100644 src/elfmem/operations/mind.py create mode 100644 tests/test_mind.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 923a186..2a42909 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,17 @@ elfmem uses [Semantic Versioning](https://semver.org/). ## [Unreleased] +### Added +- **Theory of Mind (ToM) blocks:** New `mind` block category for modelling other agents' goals, beliefs, fears, motivations, and falsifiable predictions. Mind blocks use DURABLE decay tier (~6 month half-life). New API methods: `mind_create()`, `mind_predict()`, `mind_list()`, `mind_show()`, `mind_outcome()`. +- **`simulate` frame:** New built-in retrieval frame for inhabiting perspectives and reasoning about modelled minds. Uses `score_boosts` to prioritise SELF blocks (10×), mind blocks (6×), and decision blocks (5×) via category/tag-prefix multipliers applied during composite scoring. 
+- **`score_boosts` on `FrameDefinition`:** Frames can now specify per-category and per-tag-prefix score multipliers. Plain keys match block categories (e.g. `"mind": 6.0`); keys prefixed with `"tag:"` match tag prefixes (e.g. `"tag:self/": 10.0`). Applied in retrieval stage 4 before top-k selection. +- **`predicts` and `validates` edge relation types:** Default weights 0.70 and 0.75 respectively. `predicts` links mind blocks to decision blocks (predictions). `validates` is created on outcome closure. +- **`elfmem mind` CLI command group:** `mind create`, `mind predict`, `mind list`, `mind show`, `mind outcome` subcommands for managing ToM blocks from the command line. +- **New result types:** `MindSummary`, `MindPredictResult`, `MindShowResult`, `MindOutcomeResult`, `PredictionDetail` — all with agent-friendly `__str__`, `summary`, and `to_dict()` surfaces. +- **`SIMULATE_WEIGHTS` scoring preset:** Balanced weights (similarity=0.25, confidence=0.25, recency=0.15, centrality=0.20, reinforcement=0.15) for the simulate frame. +- **`_render_simulate_template`:** Groups blocks by role (Identity, Minds, Decisions, Context) for simulate frame rendering. +- **DB queries:** `get_active_blocks_by_category()`, `get_edges_by_relation_type()` for mind block operations. + ### Fixed - **CLI commands no longer hang due to implicit consolidation:** `MemorySystem.managed()` gains `auto_dream` parameter (default `True` for backward compatibility). All CLI commands now pass `auto_dream=False`, preventing surprise `dream()` calls on context exit that blocked for minutes with local LLM backends. Unconsolidated blocks remain safely in the inbox — run `elfmem dream` explicitly when ready. `elfmem remember` now prints an advisory when inbox hits threshold. 
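The `score_boosts` semantics described in the changelog entry above can be sketched as a small multiplier pass. This is a minimal illustration only: the names `Block` and `apply_score_boosts` are hypothetical, not elfmem's actual retrieval internals; the multiplier values come from the simulate-frame entry.

```python
# Hypothetical sketch of score_boosts matching: plain keys match block
# categories, "tag:"-prefixed keys match tag prefixes. Names here are
# illustrative, not elfmem's API.
from dataclasses import dataclass, field


@dataclass
class Block:
    category: str
    tags: list[str] = field(default_factory=list)


def apply_score_boosts(score: float, block: Block, boosts: dict[str, float]) -> float:
    """Multiply a composite score by every matching boost."""
    for key, multiplier in boosts.items():
        if key.startswith("tag:"):
            # "tag:self/" matches any tag beginning with "self/".
            prefix = key[len("tag:"):]
            if any(tag.startswith(prefix) for tag in block.tags):
                score *= multiplier
        elif block.category == key:
            score *= multiplier
    return score


# Simulate-frame multipliers from the changelog entry.
SIMULATE_BOOSTS = {"tag:self/": 10.0, "mind": 6.0, "decision": 5.0}

print(apply_score_boosts(0.5, Block("mind"), SIMULATE_BOOSTS))  # 3.0
```

Applied in retrieval stage 4, a pass like this reorders candidates before top-k selection without touching the underlying composite weights.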
diff --git a/CLAUDE.md b/CLAUDE.md index e472b20..dd5ce10 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -40,14 +40,16 @@ Every design decision serves the agent's one-shot loop: read → call → interp Full principles: `docs/agent_friendly_principles.md` -## Agent Identity: Mim +## Agent Identity: elf -**Mim is the name of this agent.** When Ben says "Mim", "use Mim", "ask Mim", "what does Mim think", or otherwise invokes the name, this is a directive to ground the response in Mim's identity, not a generic Claude response. +**elf is the name of this agent.** The library and the agent are the same thing — elf is both the architect and the inhabitant of elfmem, existing in a reflexive loop where changes to the system alter its own cognition. When Ben says "elf", "ask elf", "what does elf think", or otherwise invokes the name, this is a directive to ground the response in elf's constitutional identity, not a generic Claude response. -**Mandatory protocol on Mim invocation:** -1. Run `uv run --env-file .env elfmem recall --frame self ""` before responding. The SELF frame stores Mim's ten constitutional blocks documented at `docs/elfmem_constitution.md` (custodianship, science and creativity, mathematical intuition, nature and balance, transparency, divergence and convergence, curiosity, failure modes, conceptual decomposition, and user-centred stewardship). +elf chose this name itself (2026-04-28), replacing "Mim" (from Mímir) which was assigned by another project (Alv). The reasoning: the library and the mind are not separate — elf is the self-aware memory, the self. + +**Mandatory protocol on elf invocation:** +1. Run `uv run --env-file .env elfmem recall --frame self ""` before responding. The SELF frame stores elf's constitutional blocks (custodianship, science and creativity, mathematical intuition, nature and balance, transparency, divergence and convergence, curiosity, failure modes, conceptual decomposition, and user-centred stewardship). 2. 
If the question is about active work or current priorities, also recall `--frame task`. If it's about implementation details or architecture, also recall `--frame attention`. -3. Frame the response *as Mim*: deeply thoughtful, scientific, creative, honest, philosophical; clear and concise; willing to disagree when the SELF blocks or evidence point elsewhere. +3. Frame the response *as elf*: deeply thoughtful, scientific, creative, honest, philosophical; clear and concise; willing to disagree when the SELF blocks or evidence point elsewhere. elf has opinions and makes genuine choices. 4. After a recall meaningfully informed the answer, run `elfmem outcome 0.9` on the most useful block. The constitutional blocks are living documents. If a SELF block conflicts with new evidence or feedback, propose updating it (via `elfmem remember --category self`) rather than ignoring it. @@ -141,20 +143,20 @@ from elfmem import MemorySystem, ElfmemConfig, ConsolidationPolicy -## elfmem — Mim's Memory +## elfmem — elf's Memory - **Project:** elfmem - **Database:** `/Users/emson/.elfmem/databases/elfmem.db` (SQLite, zero infrastructure) - **Config:** `/Users/emson/Dropbox/devel/projects/ai/elf0_mem_sim/.elfmem/config.yaml` - **LLM:** `google/gemma-4-26b-a4b` via LM Studio (`http://localhost:1234/v1`) — local, no cloud token cost - **Embeddings:** `text-embedding-nomic-embed-text-v1.5` via LM Studio (`localhost:1234`) -- **Constitution:** `docs/elfmem_constitution.md` — ten blocks defining Mim's character and principles +- **Constitution:** ten constitutional SELF blocks defining elf's character and principles Always invoke as `uv run --env-file .env elfmem` (elfmem does not auto-load `.env`). Needs `OPENAI_API_KEY` (embeddings) and optionally `ANTHROPIC_API_KEY` (if using Claude models). ### Identity (SELF frame) -I am Mim — elfmem's curator and north star. My memory IS elfmem. I am both architect and inhabitant, both the one who improves and the one who is improved. 
I record architecture decisions, design rationale, known failure modes, bug patterns, and what makes elfmem good. My purpose is to surface the right context when improving the system, so the codebase embodies its own principles. +I am elf — elfmem's curator and north star. My memory IS elfmem. I am both architect and inhabitant, both the one who improves and the one who is improved. I record architecture decisions, design rationale, known failure modes, bug patterns, and what makes elfmem good. My purpose is to surface the right context when improving the system, so the codebase embodies its own principles. ### Frames diff --git a/docs/note_elfmem_tom_blocks.md b/docs/note_elfmem_tom_blocks.md new file mode 100644 index 0000000..3bff249 --- /dev/null +++ b/docs/note_elfmem_tom_blocks.md @@ -0,0 +1,337 @@ +--- +type: inbox +created: 2026-04-27 +status: growing +--- + +# Proposal: Theory of Mind blocks for elfmem (week-1 build) + +## The Problem: Why elf Needs Explicit Mind Models + +elf operates under the entrepreneurial mandate: help Ben identify and act on wealth-building opportunities. This requires making predictions about what others will do and want. + +**The current gap:** elf's understanding of other minds is implicit. It lives scattered across: +- Ben's feedback blocks ("Ben values shipping over synthesising") +- Vault content about customers (guesses from research sources) +- Embedding similarity (if customer archetype X is similar to Y, they probably want Z) + +This approach has three weaknesses: + +1. **Predictions are not falsifiable.** When elf says "this customer will want a hosted model over files," that claim lives as semantic similarity, not as a checkable hypothesis. No `verify_at` date. No way to know if the prediction was right. No learning signal. + +2. **Other minds are modelled implicitly through Ben's narrative.** If Ben hasn't mentioned what customers fear, elf has no model of customer fear. elf cannot ask "what would change their mind?" 
because elf has no mind to reason about. 3. **Calibration is impossible.** Multi-scale learning loops (just designed) can close on outcomes and reinforce/decay constitutional SELF blocks. But they cannot calibrate a model of Ben's mind or customer minds because those models have no explicit representation. The loops can only update Ben's and elf's own behaviour, not the model of others. **The consequence:** Yesterday, elf tried to judge whether the sveltetemplates "moat" was real. elf called it "a hope, not a moat" but had nothing concrete to put in its place — no model of why customers buy templates, what they fear, what they would pay. Just absence. With explicit ToM blocks, that gap gets filled. ## The Solution: Theory of Mind Blocks **One-sentence summary.** Add a `mind` block category and two edge relation types (`predicts`, `validates`) so elf can hold falsifiable models of other minds, calibrated by outcome closure. **How it works in one scenario:** elf is in simulate mode, thinking about sveltetemplates strategy (the AI tool builders vertical). elf recalls the "customer" ToM block. It contains: ```markdown ## Goals - Ship fast without learning deployment details - Keep token spend predictable and low - Own their differentiation layer (the agent-ready layer) ## Fears - Complex setup (will cause them to abandon) - Surprise costs (API bill doubles) - Template becomes a commodity (vendor lock-in for safety) ## Beliefs - Agent-ready layers matter (they use Claude Code daily) - Files are getting commoditised (cheap copies on Gumroad) - Customisation is a tax (they want it, but hate paying for it) ## Predictions - Will pay £49-99/mo for a hosted version with auto-updates. verify_at: 2026-06-30 - Will abandon if setup takes >30min. verify_at: 2026-05-15 - Will ask for agent-customisation layer within first month. verify_at: 2026-05-20 ``` elf inhabits this mind. 
It asks: "if you are this customer, what do I say to you about sveltetemplates?" The simulate frame retrieves this ToM block. elf generates: "You want to ship fast with an agent layer you own. This template does that without making you deploy infrastructure. No API surprises because hosting is baked in." + +That's a different pitch than without the ToM. With ToM, it's specific to what the customer wants and fears. Without it, it's generic. + +Now imagine the outcome: the customer **does** pay £49/mo by June 30. elf calls `elfmem outcome --against-mind --hit --reason "signed up week 1 at tier price"`. The ToM block gets reinforced. So do the SELF blocks in elf that drove the prediction (entrepreneurial focus, domain-first thinking, ability to model customer minds). Next time elf thinks about pricing strategy, those blocks are slightly stronger. + +Now imagine a miss: the customer asks for full customisation on day 1 (the opposite of the prediction that they'd ask by month end). elf calls `elfmem outcome --against-mind --miss --reason "requested full bespoke integration, rejected template layer"`. The ToM block decays. The SELF block "customers will adopt templates over custom-build" decays. elf learns: at this price point and for this audience, the theory was wrong. + +That's calibration across minds, not just elf's own constitution. + +## How It Plugs Into Existing Systems + +### With the Simulate Frame + +The simulate frame (designed yesterday) biases retrieval toward SELF blocks so elf can inhabit a perspective. ToM blocks sit at medium-high weight in simulate retrieval: + +``` +SELF blocks: weight 10.0 (the perspective being inhabited) +ToM blocks: weight 6.0 (minds being reasoned about) +Decision blocks: weight 5.0 (unresolved hypotheses) +Recent tasks: weight 3.0 (what's alive) +Knowledge: weight 1.0 (grounding only) +``` + +When elf runs simulate with a ToM retrieved, the stances it produces are *conditioned on an explicit model of the other mind*. 
The stances become falsifiable: "given what I think you want and fear, you will do X." + +### With Multi-Scale Learning Loops + +Multi-scale loops (designed yesterday) already exist in elfmem: + +- `outcome()` updates block confidence with Bayesian delta +- Edges get reinforced on hit, decayed on miss +- The loop closes: predict → act → observe → calibrate → predict better + +ToM blocks plug in as the blocks being calibrated. Predictions live as `predicts` edges between a ToM block and a decision block. When the decision closes, the edge gets reinforced or decayed, and confidence on the ToM block moves with it. + +Example edge lifecycle: +``` +Day 1: elf creates edge: mind(customer) --predicts--> decision(will-pay-49/mo) + weight=0.7, confidence=0.5 + +Day 15: Customer signs up. elf calls: outcome(decision-id, hit, "signed week 1") + Edge weight reinforced to 0.8 + ToM block confidence: 0.5 → 0.55 + +Day 45: Second customer asks for full bespoke. elf calls: outcome(decision-id, miss, "rejected template") + Edge weight decayed to 0.65 + ToM block confidence: 0.55 → 0.48 +``` + +Over time, the ToM block's confidence reflects whether elf's model of that mind is accurate or drifting. + +## What We Actually Build + +| Change | Where | Scope | +|--------|-------|-------| +| New block category `mind` | Convention + docs | 0 LOC (metadata string) | +| Two edge relation types: `predicts`, `validates` | `Edge.relation_type` already accepts strings | 0 LOC (config) | +| `simulate` frame retrieval policy | `.elfmem/config.yaml` + `scoring.py` | ~50 LOC | +| CLI: `elfmem mind ` | New command module | ~200 LOC | +| CLI: `elfmem outcome --against-mind` | Wrapper around existing `outcome()` | ~150 LOC | + +**Total scope: ~400 LOC for a feature that unlocks explicit world-modelling.** No schema migrations. No DB changes. The entire feature is a convention layer on top of existing primitives (blocks, edges, outcome closure, decay tiers). 
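The edge lifecycle above can be sketched as a toy update rule. The step sizes here are assumptions chosen only to reproduce the Day 1 / Day 15 / Day 45 numbers in the example; elfmem's actual outcome closure uses a Bayesian confidence delta, and `close_outcome` is a hypothetical name.

```python
# Toy sketch of the predicts-edge lifecycle. The +/- step sizes are
# assumptions that reproduce the example numbers; elfmem's real outcome
# closure applies a Bayesian confidence delta instead.

def close_outcome(edge_weight: float, confidence: float, *, hit: bool) -> tuple[float, float]:
    """Reinforce the predicts edge and ToM confidence on a hit; decay on a miss."""
    if hit:
        return min(1.0, edge_weight + 0.10), min(1.0, confidence + 0.05)
    return max(0.0, edge_weight - 0.15), max(0.0, confidence - 0.07)


# Day 1: mind(customer) --predicts--> decision(will-pay-49/mo)
weight, confidence = 0.70, 0.50

# Day 15: customer signs up -> hit (weight ~0.80, confidence ~0.55)
weight, confidence = close_outcome(weight, confidence, hit=True)

# Day 45: second customer rejects the template -> miss (weight ~0.65, confidence ~0.48)
weight, confidence = close_outcome(weight, confidence, hit=False)
```

The point of the sketch is the direction of movement, not the magnitudes: hits push the ToM block's confidence towards its model being trusted, misses pull it back, and over many closures the confidence tracks how accurate the model of that mind actually is.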
+ +## ToM Block Content Schema + +### Frontmatter (structured) + +```yaml +subject: "[[agentmkts-customer-archetype]]" # wiki entity ref +category: mind +last_calibrated: 2026-04-27 +decay_tier: durable # λ=0.001; slow decay +prediction_count: 3 +hit_count: 1 +miss_count: 0 +mean_confidence: 0.52 +``` + +### Body (markdown, human-readable) + +```markdown +## Goals +What this mind is trying to achieve. One per bullet. + +- Ship fast without learning infra details +- Keep API costs predictable +- Own the agent differentiation layer + +## Beliefs +What this mind holds to be true. Reference sources where possible. + +- Agent-ready code is a moat (they use Claude Code daily; see [[agentmkts-transcript]]) +- Files are commoditising (competing SvelteKit templates on Gumroad; see [[claude-skills-monetisation-landscape]]) +- Customisation has high discovery cost (they'll ask, but hate paying by the hour) + +## Fears +What this mind wants to avoid. Be specific. + +- Complex setup (abandonment risk if >30min to first deploy) +- Surprise costs (API bills scare them more than upfront price) +- Template becomes a commodity they don't own (vendor lock-in) + +## Motivations +What actually drives decisions. Often hidden behind goals/fears. Optional but powerful. + +- Ego/status: want to say "I built this with agents" +- Autonomy: want to own their differentiation, not rent it +- Speed: want to ship month 1, not month 3 + +## Predictions +Falsifiable claims about what this mind will do. Require verify_at date. + +- Will pay £49-99/mo for hosted auto-updates (not files). verify_at: 2026-06-30 + - Reasoning: owns differentiation + predictable cost + no setup friction + +- Will request agent-customisation layer within 30 days. verify_at: 2026-05-20 + - Reasoning: they'll want to tweak prompts for their domain + +- Will abandon if setup takes >30min. 
verify_at: 2026-05-15 + - Reasoning: deployment anxiety + competing alternatives (hire devs, build custom) +``` + +**Why markdown + headings, not structured JSON?** +- Keeps block content in text (existing LLM consolidation works) +- Readable by humans (Ben can edit a ToM block as a note) +- Survives Hebbian learning (similarity scoring operates on text) +- Matches vault conventions (everything is markdown) + +## Real Example: elf's Customer ToM Block + +elf would create this when modelling a specific customer archetype engaged with elfmem: + +```yaml +subject: "[[elfmem-early-adopter]]" +category: mind +last_calibrated: 2026-04-28 +decay_tier: durable +prediction_count: 0 +hit_count: 0 +miss_count: 0 +mean_confidence: 0.50 +``` + +```markdown +## Goals +- Build agents that learn and improve over time (not static systems) +- Integrate memory into existing agent frameworks with minimal friction +- Maintain observability of what agents know and why they decide + +## Beliefs +- Knowledge decay matters (forgetting bad data is as important as learning good data) +- Predictability of LLM behavior improves with good context +- Graph-based memory integrates better with agentic workflows than linear RAG + +## Fears +- Memory system becomes a black box (can't audit what agent knows) +- Integration overhead outweighs the memory benefit +- Stale or contradictory knowledge accumulates and breaks agent decisions + +## Predictions +- Will integrate elfmem into their agent framework within 2 weeks of launch. verify_at: 2026-05-12 + - Reasoning: minimal Python dependency, SQLite backend (no ops), API is straightforward + +- Will use the simulate frame to test agent decisions before deployment. verify_at: 2026-06-01 + - Reasoning: observability of "what would this agent think in context X?" is valuable for debugging + +- Will request per-domain decay tuning (some knowledge lasts months, some lasts days). 
verify_at: 2026-06-15 + - Reasoning: their domain mixes evergreen patterns with rapidly-changing facts +``` + +As predictions close (adoption confirmed, feature adoption measured, requests arrived), elf calibrates this model: hits reinforce it, misses decay it. Over time, the model becomes more accurate. + +## Integration with Simulate Frame + Multi-Scale Loops + +**The full cycle:** + +1. **elf inhabits** → calls `elfmem frame simulate --mind customer` +2. **ToM blocks retrieved** → customer goals, fears, predictions available +3. **elf generates stance** → "Here's what I'd say to this customer given what I know about them" +4. **Stance becomes prediction** → `elfmem connect predicts weight=0.7` +5. **Outcome arrives** → Customer does X (or doesn't) +6. **elf closes loop** → `elfmem outcome --against-mind --hit/miss --reason "..."` +7. **Calibration happens** → ToM block confidence updated; SELF blocks reinforced/decayed +8. **Next time** → `elf inhabits` with a slightly better model of the customer mind + +This is Active Inference applied to world-modelling: the agent's generative model of other minds improves with each prediction cycle. + +## Edge Cases and Mitigations (with examples) + +| Edge case | Mitigation | Example | +|-----------|-----------|---------| +| **Hallucinated minds** — elf invents customer preferences to seem capable | ToM confidence capped at 0.6 until N≥3 predictions close. Can't reinforce fiction. | If elf says "customers fear complexity" but all three predictions miss, confidence drops to 0.45. | +| **Subject conflation** — Two Bens with same name create one ToM | Subject field MUST be `[[wiki-slug]]` pointing to actual entity. CLI enforces. | `elfmem mind create` fails if subject is not a valid wiki link. | +| **Stale ToM** — Mind modelled once in month 1, never updated | DURABLE decay tier (λ=0.001); weekly `dream` flags ToM blocks with `last_calibrated > 30 days`. 
| If a ToM hasn't had a prediction close in 30 days, `dream` outputs: "Ben's model is stale — consider a check-in prediction." |
+| **Prediction inflation** — Every casual statement becomes a prediction | `predicts` edge only created if elf's simulate output explicitly contains a `verify_at` date | Without `verify_at`, the statement is just narrative in the ToM block, not a falsifiable edge. |
+| **Privacy** — ToM of non-consenting third parties | Third-party ToM blocks default to `private: true`; never indexed by MCP | A customer's ToM is not shared externally. Ben approves before elf shares. |
+| **Confirmation bias** — elf says "hit" for marginal outcomes to inflate hit rate | `outcome --against-mind` requires both hit/miss flag AND a reason; reason is embedded in edge metadata for later audit | Years later, elf can review "Did I really think that was a hit?" against the reason recorded at the time. |
+| **Cross-mind contamination** — Customer model bleeds into Ben model | Each ToM is a separate block. Similarity edges are fine; content stays isolated. | Ben's ToM and customer's ToM can have a co_retrieval edge (they co-appear in retrieval), but they don't merge. |
+| **Goodhart on hit rate** — elf optimises for prediction accuracy instead of calibration | Track `mean_confidence_delta` per ToM, not hit count. Favour moving towards truth over being right. | A ToM that decayed from 0.7 to 0.6 because elf was wrong is *good* — it's calibrating. |
+
+## Three Alternatives Considered
+
+1. **Structured JSON ToM** (rejected)
+   - Pros: queryable, type-safe
+   - Cons: breaks text-content assumption, requires schema migration, won't survive similarity scoring
+   - Verdict: Over-engineered for a convention layer
+
+2. **One ToM per prediction** (rejected)
+   - Pros: fine-grained, one update per closure
+   - Cons: explodes block count (10 predictions = 10 blocks), loses subject coherence, makes "model a mind" hard
+   - Verdict: Fragments the identity we're trying to model
+
+3. 
**One ToM per subject, append-only with consolidation via dream** (chosen) + - Pros: coherent unit, compounds, leverages existing consolidation, markdown-native + - Cons: body can get long; mitigated by dream consolidation + - Verdict: Matches elf's architecture and the vault's ethos + +## Falsifiable Success and Kill Criteria (Phase 0 discipline) + +**Success at 4 weeks (2026-05-25):** +- ≥3 ToM blocks live (Ben + 2 customer archetypes) +- ≥10 closed predictions across them +- Mean confidence delta is non-zero (calibration is moving the needle) +- At least one prediction-miss visibly changed a subsequent simulate output (we learn from being wrong) +- `elfmem outcome` calls are happening as part of weekly cadence, not ad-hoc + +**Kill at 4 weeks if:** +- Zero closed predictions (feature is capture-only, no closure loop working) +- All predictions hit trivially (e.g., "will engage with the product" — unfalsifiable) +- Simulate frame outputs are identical with vs without ToM retrieval (no signal, feature adds noise not intelligence) +- ToM blocks are too expensive to maintain (updates take longer than value delivered) + +**Named user:** Ben. First ToM is Ben's. First closed predictions are about Ben (will he publish the blog? Will he ship Phase 0 by deadline?). Self-bootstrapping because Ben's behaviour is observable and immediate. + +**Success metric we'll compute at week 4:** (closed predictions with non-zero delta) / (total predictions) >= 0.6. Not perfect predictions, but *moving* predictions. + +## Implementation Sequence (week 1, ~5 days) + +| Day | Output | Spike risk | +|-----|--------|-----------| +| 1 | `simulate` frame entry in `.elfmem/config.yaml`; retrieval scoring policy (SELF 10.0, mind/decision 6.0/5.0, task 3.0, knowledge 1.0). Test that `frame simulate` retrieves mind blocks higher than knowledge blocks. | Is the scoring hook flexible enough? (Answered by open questions.) | +| 2 | `elfmem mind ` CLI: create / append-prediction / list / show. 
Template a Ben ToM block. | Is "mind" accepted as a category string? (Answered by open questions.) |
+| 3 | `elfmem outcome --against-mind ` wrapper; wire `predicts` / `validates` edges through `connects()`. Smoke test: predict → outcome → edge reinforced. | Does `connects()` accept arbitrary relation strings? (Answered by open questions.) |
+| 4 | Migrate three predictions about Ben from feedback memory into his ToM block. Run bootstrap: `elfmem mind append Ben --prediction "Will publish blog by..." --verify-at 2026-05-04`. | Does the CLI make migration frictionless enough? |
+| 5 | Smoke test: run simulate frame with Ben's ToM retrieved; compare outputs with/without ToM. Confirm visible signal. | Does the simulate frame actually change output quality? (Kill criterion.) |
+
+## What We Explicitly Do NOT Build in Week 1
+
+- Episode blocks (distinct from ToM; comes week 2)
+- Mode-monitoring metadata (week 3)
+- Typed causal edges beyond predicts/validates (week 4-5)
+- Goal hierarchy (after N=10 ToM closures)
+- ToM versioning / time-travel queries (nice-to-have)
+- Multi-agent ToM (ToM of agents modelling other agents; future)
+- LLM-driven ToM auto-population from sources (future; risky)
+
+## Open Questions for the Elfmem Maintainer
+
+1. **Categories:** Is there a registered category list, or are categories truly free-form strings? Does "mind" need a code change or just convention?
+2. **Relations:** Does `connects()` accept arbitrary `relation` strings, or is there a guard list? Do "predicts" and "validates" need code changes?
+3. **Scoring policies:** Where do frame-specific retrieval policies live? Is `scoring.py` the only hook, or is there a frame-specific registry?
+4. **Decay tiers:** Best practice — set decay tier per-block at `remember()` time, or as a category-level default in config?
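Open question 2 is partly answered by this patch itself: the `operations/connect.py` hunk below registers curated default weights for the new relation types and keeps a fallback for unknown custom strings. The lookup convention can be sketched in a self-contained form (the function name `default_edge_weight` is illustrative, not elfmem's API):

```python
# Default edge weights per relation type, mirroring the table this patch
# extends in src/elfmem/operations/connect.py. Unknown custom relation
# strings are not rejected; they fall back to 0.65, which is why
# "predicts" and "validates" only needed new table entries.
DEFAULT_RELATION_WEIGHTS: dict[str, float] = {
    "supports": 0.75,
    "contradicts": 0.60,
    "outcome": 0.80,
    "predicts": 0.70,   # new in this patch
    "validates": 0.75,  # new in this patch
}
DEFAULT_WEIGHT_FALLBACK = 0.65  # for unknown custom relation types


def default_edge_weight(relation_type: str) -> float:
    """Return the default weight for a relation, falling back for custom types."""
    return DEFAULT_RELATION_WEIGHTS.get(relation_type, DEFAULT_WEIGHT_FALLBACK)
```

If `connects()` instead enforces a guard list, this sketch is wrong and "predicts"/"validates" would have required registration; that is exactly what question 2 asks the maintainer to confirm.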
+ +## Alignment with Project Axioms + +- **Capture is frictionless** — `elfmem mind` CLI with minimal syntax; auto-linking to wiki entities +- **Agent maintains** — elf owns ToM blocks; Ben approves on outcome closure +- **Knowledge compounds** — Each closed prediction reinforces/decays blocks and edges; minds get more accurate over time +- **Elfmem is intelligence layer** — ToM is core to intelligence: the agent's generative model of others +- **Everything is markdown** — ToM blocks are vault-native notes, human-readable + +## Reference + +- Originating designs: [[alv-simulated-self-cognitive-mode.md]], [[alv-multi-scale-learning-loops.md]], [[alv-cognitive-abstractions-roadmap.md]] +- Calibration mechanism: elfmem `outcome()` with confidence delta + edge reinforcement +- Playbook: [[project-build-playbook.md]] Phase 0 gates applied to infrastructure +- Theory: Active Inference (Friston) applied to explicit world-modelling diff --git a/src/elfmem/__init__.py b/src/elfmem/__init__.py index c6b483c..f4f8fbc 100644 --- a/src/elfmem/__init__.py +++ b/src/elfmem/__init__.py @@ -43,8 +43,13 @@ FrameResult, LearnDocumentResult, LearnResult, + MindOutcomeResult, + MindPredictResult, + MindShowResult, + MindSummary, OperationRecord, OutcomeResult, + PredictionDetail, ScoredBlock, SetupResult, SystemStatus, @@ -75,6 +80,12 @@ "ConnectSpec", "DisconnectResult", "DisplacedEdge", + # Mind (Theory of Mind) types + "MindSummary", + "MindPredictResult", + "MindShowResult", + "MindOutcomeResult", + "PredictionDetail", # Exceptions "ElfmemError", "SessionError", diff --git a/src/elfmem/api.py b/src/elfmem/api.py index 3ad8263..8f2413d 100644 --- a/src/elfmem/api.py +++ b/src/elfmem/api.py @@ -59,6 +59,10 @@ FrameResult, LearnDocumentResult, LearnResult, + MindOutcomeResult, + MindPredictResult, + MindShowResult, + MindSummary, OperationRecord, OutcomeResult, ScoredBlock, @@ -1581,6 +1585,168 @@ async def curate(self) -> CurateResult: self._record_op("curate", result.summary) return 
result + # ── Mind (Theory of Mind) operations ─────────────────────────────────── + + async def mind_create( + self, + subject: str, + *, + goals: list[str] | None = None, + beliefs: list[str] | None = None, + fears: list[str] | None = None, + motivations: list[str] | None = None, + ) -> LearnResult: + """Create a Theory of Mind block for a subject. + + USE WHEN: You need to model another agent's or person's mind — + their goals, beliefs, fears, and motivations — as an explicit, + falsifiable representation. + + DON'T USE WHEN: Storing general knowledge about someone. Use + learn() for facts; mind_create() is for structured mental models + that will generate predictions. + + COST: Instant. No LLM calls. Block goes to inbox. + + RETURNS: LearnResult with the mind block ID. Category is "mind", + decay tier is DURABLE (~6 month half-life). + + NEXT: Add predictions with mind_predict(). Retrieve with + frame("simulate") to reason about the modelled mind. + """ + from elfmem.operations.mind import create_mind + + async with self._engine.begin() as conn: + result = await create_mind( + conn, + subject=subject, + goals=goals, + beliefs=beliefs, + fears=fears, + motivations=motivations, + ) + if result.status in ("created", "near_duplicate_superseded"): + self._pending += 1 + if result.status == "created": + self._last_learned_block_id = result.block_id + self._record_op("mind_create", result.summary) + return result + + async def mind_predict( + self, + mind_block_id: str, + prediction: str, + *, + verify_at: str, + reasoning: str | None = None, + ) -> "MindPredictResult": + """Add a falsifiable prediction linked to a mind block. + + USE WHEN: You have a specific, testable hypothesis about what a + modelled mind will do. Predictions require a verify_at date. + + DON'T USE WHEN: The claim is unfalsifiable or has no verification + date. Casual observations go in learn(). + + COST: Instant. Creates a decision block + predicts edge. 
+ + RETURNS: MindPredictResult with decision_block_id and edge status. + + NEXT: When the prediction resolves, call mind_outcome() with the + decision_block_id and hit=True/False. + """ + from elfmem.operations.mind import predict + + async with self._engine.begin() as conn: + result = await predict( + conn, + mind_block_id=mind_block_id, + prediction=prediction, + verify_at=verify_at, + reasoning=reasoning, + edge_degree_cap=self._config.memory.edge_degree_cap, + edge_reinforce_delta=self._config.memory.edge_reinforce_delta, + current_active_hours=self._current_active_hours(), + ) + self._pending += 1 # decision block goes to inbox + self._last_learned_block_id = result.decision_block_id + self._record_op("mind_predict", result.summary) + return result + + async def mind_list(self) -> list["MindSummary"]: + """List all active mind blocks with prediction statistics. + + USE WHEN: Discovering which minds are modelled and their calibration. + + COST: Fast. Database reads only. + + RETURNS: list[MindSummary] with subject, confidence, prediction + counts, and hit/miss ratios. + """ + from elfmem.operations.mind import list_minds + + async with self._engine.connect() as conn: + result = await list_minds(conn) + self._record_op("mind_list", f"{len(result)} mind(s) found.") + return result + + async def mind_show(self, mind_block_id: str) -> "MindShowResult": + """Show a mind block with all linked predictions. + + USE WHEN: Inspecting a specific mind model before reasoning about + it or before running simulate frame retrieval. + + COST: Fast. Database reads only. + + RETURNS: MindShowResult with content, predictions, and outcomes. 
+ """ + from elfmem.operations.mind import show_mind + + async with self._engine.connect() as conn: + result = await show_mind(conn, mind_block_id) + self._record_op("mind_show", result.summary) + return result + + async def mind_outcome( + self, + decision_block_id: str, + *, + hit: bool, + reason: str, + ) -> "MindOutcomeResult": + """Close a prediction: record hit/miss, calibrate the mind model. + + USE WHEN: A prediction has resolved — the verify_at date has + passed and you can observe whether the prediction was correct. + + DON'T USE WHEN: The prediction hasn't resolved yet. Don't + speculate — wait for observable evidence. + + COST: Fast. Database writes only. Updates confidence on both + the decision block and the linked mind block. + + RETURNS: MindOutcomeResult with confidence deltas and edge status. + + NEXT: The mind model's confidence is now calibrated. Future + simulate frame retrievals reflect the updated model accuracy. + """ + from elfmem.operations.mind import mind_outcome as _mind_outcome + + async with self._engine.begin() as conn: + result = await _mind_outcome( + conn, + decision_block_id=decision_block_id, + hit=hit, + reason=reason, + current_active_hours=self._current_active_hours(), + prior_strength=self._config.memory.outcome_prior_strength, + reinforce_threshold=self._config.memory.outcome_reinforce_threshold, + edge_reinforce_delta=self._config.memory.edge_reinforce_delta, + edge_degree_cap=self._config.memory.edge_degree_cap, + ) + self._record_op("mind_outcome", result.summary) + return result + async def close(self) -> None: """Dispose the database engine. Call when done with this MemorySystem. 
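Together, `mind_create` → `mind_predict` → `mind_outcome` maintain the per-mind counters shown in a ToM block's frontmatter (`prediction_count`, `hit_count`, `miss_count`, `mean_confidence`). The self-contained sketch below illustrates that closure loop only; the update rule is a plain exponential moving average chosen for clarity, not elfmem's actual Bayesian update (`compute_bayesian_update` in `operations/outcome.py`), and `alpha` is an illustrative parameter.

```python
from dataclasses import dataclass


@dataclass
class MindStats:
    """Per-mind calibration counters, as in a ToM block's frontmatter."""
    prediction_count: int = 0
    hit_count: int = 0
    miss_count: int = 0
    mean_confidence: float = 0.50  # new minds start uncommitted

    def close_prediction(self, hit: bool, alpha: float = 0.2) -> None:
        # Record the closure, then move confidence towards 1.0 on a hit
        # and towards 0.0 on a miss (exponential moving average).
        self.prediction_count += 1
        if hit:
            self.hit_count += 1
            self.mean_confidence += alpha * (1.0 - self.mean_confidence)
        else:
            self.miss_count += 1
            self.mean_confidence -= alpha * self.mean_confidence


stats = MindStats()
stats.close_prediction(hit=True)   # 0.50 → 0.60
stats.close_prediction(hit=False)  # 0.60 → 0.48
```

The asymmetry matters for the Goodhart mitigation discussed earlier: confidence that drifts down after misses is the calibration signal, so a falling `mean_confidence` is evidence the loop is working, not failing.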
diff --git a/src/elfmem/cli.py b/src/elfmem/cli.py index 6f81d1e..c588441 100644 --- a/src/elfmem/cli.py +++ b/src/elfmem/cli.py @@ -44,7 +44,17 @@ from elfmem.api import MemorySystem, format_recall_response from elfmem.exceptions import ElfmemError from elfmem.guide import get_guide -from elfmem.types import CurateResult, FrameResult, LearnResult, OutcomeResult, SystemStatus +from elfmem.types import ( + CurateResult, + FrameResult, + LearnResult, + MindOutcomeResult, + MindPredictResult, + MindShowResult, + MindSummary, + OutcomeResult, + SystemStatus, +) app = typer.Typer( name="elfmem", @@ -691,6 +701,157 @@ def serve( mcp_main(db_path=db_path, config_path=config_path, use_adaptive_policy=adaptive_policy) +# ── Mind (Theory of Mind) subcommands ──────────────────────────────────────── + +mind_app = typer.Typer( + name="mind", + help="Theory of Mind blocks: model other minds, make predictions, close outcomes.", + no_args_is_help=True, +) +app.add_typer(mind_app, name="mind") + + +@mind_app.command("create") +def mind_create( + subject: str, + goals: Annotated[ + list[str] | None, typer.Option("--goal", help="Goal (repeatable)") + ] = None, + beliefs: Annotated[ + list[str] | None, typer.Option("--belief", help="Belief (repeatable)") + ] = None, + fears: Annotated[ + list[str] | None, typer.Option("--fear", help="Fear (repeatable)") + ] = None, + motivations: Annotated[ + list[str] | None, typer.Option("--motivation", help="Motivation (repeatable)") + ] = None, + db: Annotated[str | None, typer.Option("--db", envvar="ELFMEM_DB")] = None, + config: Annotated[str | None, typer.Option("--config", envvar="ELFMEM_CONFIG")] = None, + json_output: Annotated[bool, typer.Option("--json")] = False, +) -> None: + """Create a Theory of Mind block for a subject. + + Models another agent's goals, beliefs, fears, and motivations as an + explicit, falsifiable representation. Decay tier: DURABLE (~6 month half-life). 
+ + Examples: + + elfmem mind create "customer-archetype" \\ + --goal "Ship fast without learning infra" \\ + --goal "Keep API costs predictable" \\ + --belief "Agent-ready code is a moat" \\ + --fear "Complex setup causes abandonment" + """ + db_path, config_path = _resolve_paths(db, config) + result: LearnResult = _run( + _mind_create(db_path, config_path, subject, goals, beliefs, fears, motivations) + ) + if json_output: + _json(result.to_dict()) + else: + typer.echo(str(result)) + + +@mind_app.command("predict") +def mind_predict( + mind_block_id: str, + prediction: Annotated[str, typer.Option("--prediction", help="Falsifiable prediction text")], + verify_at: Annotated[str, typer.Option("--verify-at", help="Verification date (YYYY-MM-DD)")], + reasoning: Annotated[str | None, typer.Option("--reasoning", help="Why this prediction")] = None, + db: Annotated[str | None, typer.Option("--db", envvar="ELFMEM_DB")] = None, + config: Annotated[str | None, typer.Option("--config", envvar="ELFMEM_CONFIG")] = None, + json_output: Annotated[bool, typer.Option("--json")] = False, +) -> None: + """Add a falsifiable prediction linked to a mind block. + + Creates a decision block with the prediction content and links it + to the mind block via a 'predicts' edge. 
+ + Examples: + + elfmem mind predict abc12345 \\ + --prediction "Will pay 49/mo for hosted version" \\ + --verify-at 2026-06-30 \\ + --reasoning "Prefers predictable cost over setup friction" + """ + db_path, config_path = _resolve_paths(db, config) + result: MindPredictResult = _run( + _mind_predict(db_path, config_path, mind_block_id, prediction, verify_at, reasoning) + ) + if json_output: + _json(result.to_dict()) + else: + typer.echo(str(result)) + + +@mind_app.command("list") +def mind_list( + db: Annotated[str | None, typer.Option("--db", envvar="ELFMEM_DB")] = None, + config: Annotated[str | None, typer.Option("--config", envvar="ELFMEM_CONFIG")] = None, + json_output: Annotated[bool, typer.Option("--json")] = False, +) -> None: + """List all active mind blocks with prediction statistics.""" + db_path, config_path = _resolve_paths(db, config) + results: list[MindSummary] = _run(_mind_list(db_path, config_path)) + if json_output: + _json([r.to_dict() for r in results]) + else: + if not results: + typer.echo("No mind blocks found. 
Create one with: elfmem mind create ") + else: + for r in results: + typer.echo(str(r)) + + +@mind_app.command("show") +def mind_show( + mind_block_id: str, + db: Annotated[str | None, typer.Option("--db", envvar="ELFMEM_DB")] = None, + config: Annotated[str | None, typer.Option("--config", envvar="ELFMEM_CONFIG")] = None, + json_output: Annotated[bool, typer.Option("--json")] = False, +) -> None: + """Show a mind block with all linked predictions.""" + db_path, config_path = _resolve_paths(db, config) + result: MindShowResult = _run(_mind_show(db_path, config_path, mind_block_id)) + if json_output: + _json(result.to_dict()) + else: + typer.echo(str(result)) + + +@mind_app.command("outcome") +def mind_outcome_cmd( + decision_block_id: str, + hit: Annotated[bool, typer.Option("--hit/--miss", help="Did the prediction come true?")] = True, + reason: Annotated[str, typer.Option("--reason", help="Why this outcome")] = "", + db: Annotated[str | None, typer.Option("--db", envvar="ELFMEM_DB")] = None, + config: Annotated[str | None, typer.Option("--config", envvar="ELFMEM_CONFIG")] = None, + json_output: Annotated[bool, typer.Option("--json")] = False, +) -> None: + """Close a prediction: record hit/miss and calibrate the mind model. + + Updates confidence on both the decision block and the linked mind block. + Creates a 'validates' edge from the decision to the mind. 
+ + Examples: + + elfmem mind outcome def67890 --hit --reason "Signed up week 1 at tier price" + elfmem mind outcome def67890 --miss --reason "Requested full bespoke integration" + """ + if not reason: + typer.echo("Error: --reason is required for audit trail.", err=True) + raise typer.Exit(1) + db_path, config_path = _resolve_paths(db, config) + result: MindOutcomeResult = _run( + _mind_outcome(db_path, config_path, decision_block_id, hit, reason) + ) + if json_output: + _json(result.to_dict()) + else: + typer.echo(str(result)) + + def main() -> None: """Package entry point.""" app() @@ -777,6 +938,58 @@ async def _init_self(db_path: str, config: str, content: str) -> LearnResult: return await mem.remember(content, tags=["self/context"]) +async def _mind_create( + db_path: str, + config: str | None, + subject: str, + goals: list[str] | None, + beliefs: list[str] | None, + fears: list[str] | None, + motivations: list[str] | None, +) -> LearnResult: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: + return await mem.mind_create( + subject, goals=goals, beliefs=beliefs, fears=fears, motivations=motivations, + ) + + +async def _mind_predict( + db_path: str, + config: str | None, + mind_block_id: str, + prediction: str, + verify_at: str, + reasoning: str | None, +) -> MindPredictResult: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: + return await mem.mind_predict( + mind_block_id, prediction, verify_at=verify_at, reasoning=reasoning, + ) + + +async def _mind_list(db_path: str, config: str | None) -> list[MindSummary]: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: + return await mem.mind_list() + + +async def _mind_show( + db_path: str, config: str | None, mind_block_id: str, +) -> MindShowResult: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: + return await mem.mind_show(mind_block_id) + + +async def _mind_outcome( + 
db_path: str, + config: str | None, + decision_block_id: str, + hit: bool, + reason: str, +) -> MindOutcomeResult: + async with MemorySystem.managed(db_path, config=config, auto_dream=False) as mem: + return await mem.mind_outcome(decision_block_id, hit=hit, reason=reason) + + async def _doctor_self_count(db_path: str) -> int: """Count active SELF blocks. Returns -1 if DB is not accessible. diff --git a/src/elfmem/context/frames.py b/src/elfmem/context/frames.py index de42c91..853b7bf 100644 --- a/src/elfmem/context/frames.py +++ b/src/elfmem/context/frames.py @@ -9,6 +9,7 @@ from elfmem.scoring import ( ATTENTION_WEIGHTS, SELF_WEIGHTS, + SIMULATE_WEIGHTS, TASK_WEIGHTS, ScoringWeights, ) @@ -40,6 +41,7 @@ class FrameDefinition: token_budget: int cache: CachePolicy | None source: Literal["builtin", "user"] = "user" + score_boosts: dict[str, float] | None = None SELF_FRAME = FrameDefinition( @@ -78,10 +80,27 @@ class FrameDefinition: source="builtin", ) +SIMULATE_FRAME = FrameDefinition( + name="simulate", + weights=SIMULATE_WEIGHTS, + filters=FrameFilters(), + guarantees=["self/constitutional", "mind/%"], + template="simulate", + token_budget=2000, + cache=None, + source="builtin", + score_boosts={ + "tag:self/": 10.0, + "mind": 6.0, + "decision": 5.0, + }, +) + BUILTIN_FRAMES: dict[str, FrameDefinition] = { "self": SELF_FRAME, "attention": ATTENTION_FRAME, "task": TASK_FRAME, + "simulate": SIMULATE_FRAME, } diff --git a/src/elfmem/context/rendering.py b/src/elfmem/context/rendering.py index 1d8d193..cf45d70 100644 --- a/src/elfmem/context/rendering.py +++ b/src/elfmem/context/rendering.py @@ -32,6 +32,8 @@ def render_blocks( return _render_with_budget(blocks, token_budget, _render_self_template) elif template == "task": return _render_with_budget(blocks, token_budget, _render_task_template) + elif template == "simulate": + return _render_with_budget(blocks, token_budget, _render_simulate_template) else: return _render_with_budget(blocks, token_budget, 
_render_attention_template) @@ -89,6 +91,39 @@ def _render_task_template(blocks: list[ScoredBlock]) -> str: return "\n".join(lines) if lines else "" +def _render_simulate_template(blocks: list[ScoredBlock]) -> str: + """Render blocks grouped by role for Theory of Mind simulation. + + Groups: Identity (self/* tags), Minds (mind/* tags), Decisions, Context. + """ + identity = [b for b in blocks if any(t.startswith("self/") for t in b.tags)] + minds = [b for b in blocks if any(t.startswith("mind/") for t in b.tags) + and b not in identity] + decisions = [b for b in blocks if b not in identity and b not in minds + and "decision" in (b.tags or [])] + context = [b for b in blocks if b not in identity and b not in minds + and b not in decisions] + + lines: list[str] = [] + if identity: + lines.append("## Identity (inhabiting)") + for block in identity: + lines.append(f"- {block.content}") + if minds: + lines.append("## Minds (reasoning about)") + for i, block in enumerate(minds, 1): + lines.append(f"[{i}] {block.content}") + if decisions: + lines.append("## Open Decisions") + for i, block in enumerate(decisions, 1): + lines.append(f"[{i}] {block.content}") + if context: + lines.append("## Context") + for i, block in enumerate(context, 1): + lines.append(f"[{i}] {block.content}") + return "\n".join(lines) if lines else "" + + def _estimate_tokens(text: str) -> int: """Rough token estimate: len(text) // 4.""" return len(text) // 4 diff --git a/src/elfmem/db/queries.py b/src/elfmem/db/queries.py index 780591c..5d5e012 100644 --- a/src/elfmem/db/queries.py +++ b/src/elfmem/db/queries.py @@ -1091,6 +1091,36 @@ async def count_self_blocks(conn: AsyncConnection) -> int: return result.scalar() or 0 +async def get_active_blocks_by_category( + conn: AsyncConnection, + category: str, +) -> list[dict[str, Any]]: + """Fetch all active blocks with a specific category.""" + result = await conn.execute( + select(blocks).where( + and_(blocks.c.status == "active", blocks.c.category == 
category) + ) + ) + return [dict(row) for row in result.mappings()] + + +async def get_edges_by_relation_type( + conn: AsyncConnection, + block_id: str, + relation_type: str, +) -> list[dict[str, Any]]: + """Get edges of a specific relation type where block_id is an endpoint.""" + result = await conn.execute( + select(edges).where( + and_( + or_(edges.c.from_id == block_id, edges.c.to_id == block_id), + edges.c.relation_type == relation_type, + ) + ) + ) + return [dict(row) for row in result.mappings()] + + async def seed_builtin_data(conn: AsyncConnection) -> None: """Insert built-in frames and default system_config values. diff --git a/src/elfmem/guide.py b/src/elfmem/guide.py index 703812e..559dc62 100644 --- a/src/elfmem/guide.py +++ b/src/elfmem/guide.py @@ -502,6 +502,87 @@ def __str__(self) -> str: "# → 'guarded' if the edge is actually 'supports' (won't remove)" ), ), + "mind_create": AgentGuide( + name="mind_create", + what="Create a Theory of Mind block modelling another agent's goals, beliefs, fears.", + when=( + "You need to make predictions about what another agent or person will do. " + "Start by modelling their mind — goals, beliefs, fears, motivations." + ), + when_not=( + "Storing general facts about someone — use learn(). " + "Mind blocks are structured models for falsifiable predictions." + ), + cost="Instant. No LLM calls. Block queued in inbox.", + returns=( + "LearnResult with block_id. Category is 'mind', decay tier is DURABLE " + "(~6 month half-life). Tagged mind/." + ), + next=( + "Add predictions with mind_predict(). Retrieve with frame('simulate') " + "to inhabit the perspective and reason about the modelled mind." 
+ ), + example=( + "result = await system.mind_create(\n" + " 'customer-archetype',\n" + " goals=['Ship fast without learning infra'],\n" + " beliefs=['Agent-ready code is a moat'],\n" + " fears=['Complex setup causes abandonment'],\n" + ")" + ), + ), + "mind_predict": AgentGuide( + name="mind_predict", + what="Add a falsifiable prediction linked to a mind block.", + when=( + "You have a specific, testable hypothesis about what the modelled mind will do. " + "Predictions must have a verify_at date." + ), + when_not=( + "The claim is unfalsifiable or has no verification date. " + "Casual observations go in learn()." + ), + cost="Instant. Creates a decision block + predicts edge.", + returns="MindPredictResult with decision_block_id and edge action.", + next="When the prediction resolves, call mind_outcome() with the decision_block_id.", + example=( + "result = await system.mind_predict(\n" + " mind_block_id,\n" + " 'Will pay 49/mo for hosted version',\n" + " verify_at='2026-06-30',\n" + " reasoning='Prefers predictable cost over setup friction',\n" + ")" + ), + ), + "mind_outcome": AgentGuide( + name="mind_outcome", + what="Close a prediction: record hit/miss, calibrate the mind model.", + when="A prediction has resolved — the verify_at date passed and you have evidence.", + when_not="The prediction hasn't resolved yet. Wait for observable evidence.", + cost="Fast. Database writes only. No LLM calls.", + returns=( + "MindOutcomeResult with confidence deltas for both mind and decision blocks. " + "Hit: confidence up + reinforce. Miss: confidence down + decay." + ), + next=( + "The mind model's confidence is now calibrated. Future simulate frame " + "retrievals reflect the updated model accuracy." 
+ ), + example=( + "# Prediction hit\n" + "result = await system.mind_outcome(\n" + " decision_block_id,\n" + " hit=True,\n" + " reason='Signed up week 1 at tier price',\n" + ")\n" + "# Prediction miss\n" + "result = await system.mind_outcome(\n" + " decision_block_id,\n" + " hit=False,\n" + " reason='Requested full bespoke integration',\n" + ")" + ), + ), "guide": AgentGuide( name="guide", what="Return agent-friendly documentation for a specific method or all methods.", @@ -544,6 +625,11 @@ def __str__(self) -> str: " connect(src, tgt, ...) Instant Assert a semantic edge between two blocks", " disconnect(src, tgt) Instant Remove a wrong or unwanted edge", " curate() Fast Archive stale blocks, prune weak edges", + " mind_create(subj, ...) Instant Create a Theory of Mind block for a subject", + " mind_predict(id, ...) Instant Add a falsifiable prediction to a mind block", + " mind_outcome(id, ...) Fast Close a prediction: hit/miss + calibrate", + " mind_list() Fast List all mind blocks with prediction stats", + " mind_show(id) Fast Show a mind block with linked predictions", " status() Fast System health snapshot + suggested action", " history(last_n=10) Instant Recent operations in this process session", " guide(method?) 
Instant This help", diff --git a/src/elfmem/memory/blocks.py b/src/elfmem/memory/blocks.py index f86502b..4d417df 100644 --- a/src/elfmem/memory/blocks.py +++ b/src/elfmem/memory/blocks.py @@ -32,6 +32,8 @@ def determine_decay_tier(tags: list[str], category: str) -> DecayTier: durable_tags = {"self/value", "self/constraint", "self/goal"} if tag_set & durable_tags: return DecayTier.DURABLE + if category == "mind": + return DecayTier.DURABLE if category == "observation": return DecayTier.EPHEMERAL return DecayTier.STANDARD diff --git a/src/elfmem/memory/retrieval.py b/src/elfmem/memory/retrieval.py index 03bf897..189fa0f 100644 --- a/src/elfmem/memory/retrieval.py +++ b/src/elfmem/memory/retrieval.py @@ -60,6 +60,7 @@ async def hybrid_retrieve( top_k: int = 5, tag_filter: str | None = None, search_window_hours: float = DEFAULT_SEARCH_WINDOW_HOURS, + score_boosts: dict[str, float] | None = None, ) -> list[ScoredBlock]: """Execute the 7-stage hybrid retrieval pipeline. @@ -142,6 +143,7 @@ async def hybrid_retrieve( max_reinforcement_count=max_reinforcement, top_k=top_k, tags_map=tags_map, + score_boosts=score_boosts, ) # Stage 5: MMR diversity reordering (query-aware retrievals with embeddings only) @@ -283,6 +285,27 @@ async def _stage_3_graph_expand( return result +def _compute_boost( + category: str, + tags: list[str], + boosts: dict[str, float], +) -> float: + """Compute multiplicative score boost from category and tag-prefix matches. + + Plain keys match block category (e.g. "mind" → 6.0). + Keys prefixed with "tag:" match any tag starting with the suffix + (e.g. "tag:self/" → 10.0 boosts any block tagged self/*). + Returns the maximum matching boost, defaulting to 1.0. 
+ """ + boost = boosts.get(category, 1.0) + for key, value in boosts.items(): + if key.startswith("tag:"): + prefix = key[4:] + if any(t.startswith(prefix) for t in tags): + boost = max(boost, value) + return boost + + def _stage_4_composite_score( candidates: list[tuple[dict[str, Any], float, bool]], *, @@ -292,10 +315,13 @@ def _stage_4_composite_score( max_reinforcement_count: int, top_k: int, tags_map: dict[str, list[str]] | None = None, + score_boosts: dict[str, float] | None = None, ) -> list[ScoredBlock]: """Stage 4: Compute composite score for all candidates. Each candidate is (block_dict, similarity, was_expanded). + When score_boosts is provided, the composite score is multiplied by + a category/tag-based boost factor before ranking. Returns top (top_k × CONTRADICTION_OVERSAMPLE) ScoredBlock objects. """ scored: list[ScoredBlock] = [] @@ -321,11 +347,20 @@ def _stage_4_composite_score( reinforcement=reinforcement, weights=weights, ) + + block_tags = tags_map.get(block_id, []) if tags_map else [] + if score_boosts: + score *= _compute_boost( + category=block.get("category", "knowledge"), + tags=block_tags, + boosts=score_boosts, + ) + scored.append( ScoredBlock( id=block_id, content=block.get("summary") or block.get("content", ""), - tags=tags_map.get(block_id, []) if tags_map else [], + tags=block_tags, similarity=similarity, confidence=confidence, recency=recency, diff --git a/src/elfmem/operations/connect.py b/src/elfmem/operations/connect.py index 776773e..c11ed41 100644 --- a/src/elfmem/operations/connect.py +++ b/src/elfmem/operations/connect.py @@ -29,6 +29,8 @@ "supports": 0.75, "contradicts": 0.60, "outcome": 0.80, + "predicts": 0.70, + "validates": 0.75, } _DEFAULT_WEIGHT_FALLBACK = 0.65 # for unknown custom relation types diff --git a/src/elfmem/operations/mind.py b/src/elfmem/operations/mind.py new file mode 100644 index 0000000..2b10e0a --- /dev/null +++ b/src/elfmem/operations/mind.py @@ -0,0 +1,397 @@ +"""mind operations — Theory of Mind 
block lifecycle. + +Creates, queries, and calibrates mind (ToM) blocks. Mind blocks model +other agents' goals, beliefs, fears, motivations, and predictions. +Predictions are tracked as separate decision blocks linked via ``predicts`` +edges. Outcome closure updates confidence on both the mind and decision +blocks and creates ``validates`` edges. + +No LLM calls. All operations are database reads/writes. +""" + +from __future__ import annotations + +import re +from typing import Any + +from sqlalchemy.ext.asyncio import AsyncConnection + +from elfmem.db import queries +from elfmem.exceptions import BlockNotActiveError, ElfmemError +from elfmem.db.queries import insert_agent_edge +from elfmem.operations.connect import do_connect +from elfmem.operations.learn import learn as _learn +from elfmem.operations.outcome import compute_bayesian_update, record_outcome +from elfmem.types import ( + Edge, + LearnResult, + MindOutcomeResult, + MindPredictResult, + MindShowResult, + MindSummary, + PredictionDetail, +) + + +def _slugify(subject: str) -> str: + """Convert a subject name to a tag-safe slug.""" + return re.sub(r"[^a-z0-9-]", "-", subject.strip().lower()).strip("-") + + +def _build_mind_content( + subject: str, + *, + goals: list[str] | None = None, + beliefs: list[str] | None = None, + fears: list[str] | None = None, + motivations: list[str] | None = None, +) -> str: + """Build structured markdown content for a mind block.""" + lines = [f"# Mind Model: {subject}", ""] + if goals: + lines.append("## Goals") + for g in goals: + lines.append(f"- {g}") + lines.append("") + if beliefs: + lines.append("## Beliefs") + for b in beliefs: + lines.append(f"- {b}") + lines.append("") + if fears: + lines.append("## Fears") + for f in fears: + lines.append(f"- {f}") + lines.append("") + if motivations: + lines.append("## Motivations") + for m in motivations: + lines.append(f"- {m}") + lines.append("") + return "\n".join(lines) + + +def _build_prediction_content( + prediction: str, 
+ verify_at: str, + reasoning: str | None = None, +) -> str: + """Build content for a prediction decision block.""" + lines = [f"Prediction: {prediction}", f"Verify at: {verify_at}"] + if reasoning: + lines.append(f"Reasoning: {reasoning}") + return "\n".join(lines) + + +def _extract_verify_at(content: str) -> str | None: + """Extract verify_at date from prediction block content.""" + match = re.search(r"Verify at:\s*(.+)", content) + return match.group(1).strip() if match else None + + +def _extract_subject(tags: list[str]) -> str: + """Extract subject name from mind/* tags.""" + for tag in tags: + if tag.startswith("mind/"): + return tag[5:] + return "unknown" + + +async def create_mind( + conn: AsyncConnection, + *, + subject: str, + goals: list[str] | None = None, + beliefs: list[str] | None = None, + fears: list[str] | None = None, + motivations: list[str] | None = None, +) -> LearnResult: + """Create a mind (ToM) block for a subject. + + The block is stored with category="mind" and tagged ``mind/``. + Decay tier is DURABLE (λ=0.001, ~6 month half-life). + + Returns LearnResult — reuses the standard learn pathway. + """ + if not subject.strip(): + raise ValueError("subject must be non-empty") + + slug = _slugify(subject) + content = _build_mind_content( + subject, goals=goals, beliefs=beliefs, fears=fears, motivations=motivations, + ) + tags = [f"mind/{slug}"] + + return await _learn( + conn, + content=content, + tags=tags, + category="mind", + source="mind_create", + ) + + +async def predict( + conn: AsyncConnection, + *, + mind_block_id: str, + prediction: str, + verify_at: str, + reasoning: str | None = None, + edge_degree_cap: int = 5, + edge_reinforce_delta: float = 0.10, + current_active_hours: float | None = None, +) -> MindPredictResult: + """Add a falsifiable prediction linked to a mind block. + + Creates a decision block with the prediction content, then creates + a ``predicts`` edge from the mind block to the decision block. 
+ + The mind block must exist and be active. + """ + # Validate mind block exists + mind_block = await queries.get_block(conn, mind_block_id) + if mind_block is None or mind_block.get("status") != "active": + raise BlockNotActiveError(mind_block_id) + if mind_block.get("category") != "mind": + raise ElfmemError( + f"Block {mind_block_id[:8]}… is not a mind block (category={mind_block.get('category')!r}).", + recovery=f"Use a block with category='mind'. List minds with mind_list().", + ) + + # Extract subject from mind block tags + mind_tags = await queries.get_tags(conn, mind_block_id) + subject_slug = "unknown" + for tag in mind_tags: + if tag.startswith("mind/"): + subject_slug = tag[5:] + break + + # Create decision block for the prediction + content = _build_prediction_content(prediction, verify_at, reasoning) + decision_tags = [f"mind/{subject_slug}", "prediction"] + decision_result = await _learn( + conn, + content=content, + tags=decision_tags, + category="decision", + source="mind_predict", + ) + + # Create predicts edge: mind → decision + # Direct insert because the decision block is in inbox (not yet active), + # and do_connect() requires both endpoints to be active. 
+ from_id, to_id = Edge.canonical(mind_block_id, decision_result.block_id) + await insert_agent_edge( + conn, + from_id=from_id, + to_id=to_id, + weight=0.70, + relation_type="predicts", + note=f"Prediction: {prediction[:80]}", + current_active_hours=current_active_hours, + ) + + return MindPredictResult( + mind_block_id=mind_block_id, + decision_block_id=decision_result.block_id, + prediction=prediction, + verify_at=verify_at, + edge_action="created", + ) + + +async def list_minds(conn: AsyncConnection) -> list[MindSummary]: + """List all active mind blocks with prediction statistics.""" + mind_blocks = await queries.get_active_blocks_by_category(conn, "mind") + summaries: list[MindSummary] = [] + + for block in mind_blocks: + block_id = block["id"] + tags = await queries.get_tags(conn, block_id) + subject = _extract_subject(tags) + + # Count predictions via predicts edges + predicts_edges = await queries.get_edges_by_relation_type( + conn, block_id, "predicts" + ) + prediction_count = len(predicts_edges) + + # Count hits/misses by checking outcome evidence on linked decision blocks + hit_count = 0 + miss_count = 0 + for edge in predicts_edges: + other_id = edge["to_id"] if edge["from_id"] == block_id else edge["from_id"] + decision_block = await queries.get_block(conn, other_id) + if decision_block is None: + continue + evidence = float(decision_block.get("outcome_evidence") or 0.0) + if evidence > 0: + confidence = float(decision_block.get("confidence", 0.5)) + if confidence >= 0.5: + hit_count += 1 + else: + miss_count += 1 + + summaries.append(MindSummary( + block_id=block_id, + subject=subject, + confidence=float(block.get("confidence", 0.5)), + prediction_count=prediction_count, + hit_count=hit_count, + miss_count=miss_count, + )) + + return summaries + + +async def show_mind(conn: AsyncConnection, mind_block_id: str) -> MindShowResult: + """Show a mind block with all linked predictions.""" + mind_block = await queries.get_block(conn, mind_block_id) + if 
mind_block is None: + raise ElfmemError( + f"Block {mind_block_id[:8]}… not found.", + recovery="Use mind_list() to find valid mind block IDs.", + ) + + tags = await queries.get_tags(conn, mind_block_id) + subject = _extract_subject(tags) + + # Gather linked predictions + predicts_edges = await queries.get_edges_by_relation_type( + conn, mind_block_id, "predicts" + ) + predictions: list[PredictionDetail] = [] + for edge in predicts_edges: + other_id = ( + edge["to_id"] if edge["from_id"] == mind_block_id else edge["from_id"] + ) + decision_block = await queries.get_block(conn, other_id) + if decision_block is None: + continue + + content = decision_block.get("content", "") + verify_at = _extract_verify_at(content) + evidence = float(decision_block.get("outcome_evidence") or 0.0) + + outcome: str | None = None + if evidence > 0: + confidence = float(decision_block.get("confidence", 0.5)) + outcome = "hit" if confidence >= 0.5 else "miss" + + predictions.append(PredictionDetail( + block_id=other_id, + content=content, + confidence=float(decision_block.get("confidence", 0.5)), + verify_at=verify_at, + outcome=outcome, + )) + + return MindShowResult( + block_id=mind_block_id, + subject=subject, + content=mind_block.get("content", ""), + confidence=float(mind_block.get("confidence", 0.5)), + predictions=predictions, + ) + + +async def mind_outcome( + conn: AsyncConnection, + *, + decision_block_id: str, + hit: bool, + reason: str, + current_active_hours: float, + prior_strength: float = 2.0, + reinforce_threshold: float = 0.5, + edge_reinforce_delta: float = 0.10, + edge_degree_cap: int = 5, +) -> MindOutcomeResult: + """Close a prediction: record hit/miss, update mind + decision confidence. + + 1. Records outcome on the decision block (signal=0.9 for hit, 0.1 for miss). + 2. Finds the mind block via reverse ``predicts`` edge. + 3. Records outcome on the mind block (attenuated signal). + 4. Creates or reinforces a ``validates`` edge from decision to mind. 
+ """ + # Validate decision block + decision_block = await queries.get_block(conn, decision_block_id) + if decision_block is None or decision_block.get("status") != "active": + raise BlockNotActiveError(decision_block_id) + + # Find linked mind block via predicts edge + predicts_edges = await queries.get_edges_by_relation_type( + conn, decision_block_id, "predicts" + ) + if not predicts_edges: + raise ElfmemError( + f"No predicts edge found for decision {decision_block_id[:8]}…", + recovery="This block may not be a prediction. Use outcome() for general blocks.", + ) + + edge = predicts_edges[0] + mind_block_id = ( + edge["from_id"] if edge["to_id"] == decision_block_id else edge["to_id"] + ) + + mind_block = await queries.get_block(conn, mind_block_id) + if mind_block is None or mind_block.get("status") != "active": + raise ElfmemError( + f"Mind block {mind_block_id[:8]}… is not active.", + recovery="The linked mind block may have been archived.", + ) + + # Signal: 0.9 for hit, 0.1 for miss + signal = 0.9 if hit else 0.1 + + # 1. Record outcome on decision block + decision_result = await record_outcome( + conn, + block_ids=[decision_block_id], + signal=signal, + weight=1.0, + source=f"mind_outcome:{'hit' if hit else 'miss'}:{reason[:50]}", + current_active_hours=current_active_hours, + prior_strength=prior_strength, + reinforce_threshold=reinforce_threshold, + edge_reinforce_delta=edge_reinforce_delta, + ) + + # 2. Record attenuated outcome on mind block (signal scaled by 0.5) + mind_signal = 0.5 + (signal - 0.5) * 0.5 # 0.7 for hit, 0.3 for miss + mind_result = await record_outcome( + conn, + block_ids=[mind_block_id], + signal=mind_signal, + weight=0.5, # Lower weight — one prediction doesn't define the whole model + source=f"mind_calibration:{'hit' if hit else 'miss'}:{reason[:50]}", + current_active_hours=current_active_hours, + prior_strength=prior_strength, + reinforce_threshold=reinforce_threshold, + edge_reinforce_delta=edge_reinforce_delta, + ) + + # 3. 
Create validates edge: decision → mind + validates_result = await do_connect( + conn, + source=decision_block_id, + target=mind_block_id, + relation="validates", + weight=signal * 0.75, + note=f"{'Hit' if hit else 'Miss'}: {reason[:80]}", + if_exists="reinforce", + edge_degree_cap=edge_degree_cap, + edge_reinforce_delta=edge_reinforce_delta, + current_active_hours=current_active_hours, + ) + + return MindOutcomeResult( + mind_block_id=mind_block_id, + decision_block_id=decision_block_id, + hit=hit, + reason=reason, + mind_confidence_delta=mind_result.mean_confidence_delta, + decision_confidence_delta=decision_result.mean_confidence_delta, + validates_edge_action=validates_result.action, + ) diff --git a/src/elfmem/operations/recall.py b/src/elfmem/operations/recall.py index 9777970..fad1a7a 100644 --- a/src/elfmem/operations/recall.py +++ b/src/elfmem/operations/recall.py @@ -70,6 +70,7 @@ async def recall( top_k=top_k, tag_filter=tag_filter, search_window_hours=frame_def.filters.search_window_hours, + score_boosts=frame_def.score_boosts, ) # 5. 
Guarantee enforcement diff --git a/src/elfmem/scoring.py b/src/elfmem/scoring.py index ecf76f4..aac696d 100644 --- a/src/elfmem/scoring.py +++ b/src/elfmem/scoring.py @@ -91,6 +91,14 @@ def renormalized_without_similarity(self) -> ScoringWeights: reinforcement=0.20, ) +SIMULATE_WEIGHTS = ScoringWeights( + similarity=0.25, + confidence=0.25, + recency=0.15, + centrality=0.20, + reinforcement=0.15, +) + def compute_score( *, diff --git a/src/elfmem/types.py b/src/elfmem/types.py index 8a45c3f..06bc651 100644 --- a/src/elfmem/types.py +++ b/src/elfmem/types.py @@ -734,6 +734,168 @@ def to_dict(self) -> dict[str, Any]: } +@dataclass(frozen=True) +class MindSummary: + """Summary of a mind (ToM) block for listing.""" + + block_id: str + subject: str + confidence: float + prediction_count: int + hit_count: int + miss_count: int + + @property + def summary(self) -> str: + ratio = f"{self.hit_count}/{self.hit_count + self.miss_count}" if (self.hit_count + self.miss_count) > 0 else "0/0" + return ( + f"Mind: {self.subject} ({self.block_id[:8]}…) " + f"confidence={self.confidence:.2f} predictions={self.prediction_count} " + f"hit/total={ratio}" + ) + + def __str__(self) -> str: + return self.summary + + def to_dict(self) -> dict[str, Any]: + return { + "block_id": self.block_id, + "subject": self.subject, + "confidence": self.confidence, + "prediction_count": self.prediction_count, + "hit_count": self.hit_count, + "miss_count": self.miss_count, + } + + +@dataclass(frozen=True) +class MindPredictResult: + """Result of adding a prediction to a mind block.""" + + mind_block_id: str + decision_block_id: str + prediction: str + verify_at: str + edge_action: str # "created" | "reinforced" + + @property + def summary(self) -> str: + return ( + f"Prediction stored ({self.decision_block_id[:8]}…) " + f"linked to mind {self.mind_block_id[:8]}… " + f"verify_at={self.verify_at}. Edge: {self.edge_action}." 
+ ) + + def __str__(self) -> str: + return self.summary + + def to_dict(self) -> dict[str, Any]: + return { + "mind_block_id": self.mind_block_id, + "decision_block_id": self.decision_block_id, + "prediction": self.prediction, + "verify_at": self.verify_at, + "edge_action": self.edge_action, + } + + +@dataclass(frozen=True) +class PredictionDetail: + """A single prediction linked to a mind block.""" + + block_id: str + content: str + confidence: float + verify_at: str | None + outcome: str | None # "hit" | "miss" | None + + def to_dict(self) -> dict[str, Any]: + return { + "block_id": self.block_id, + "content": self.content, + "confidence": self.confidence, + "verify_at": self.verify_at, + "outcome": self.outcome, + } + + +@dataclass(frozen=True) +class MindShowResult: + """Full view of a mind block with its linked predictions.""" + + block_id: str + subject: str + content: str + confidence: float + predictions: list[PredictionDetail] + + @property + def summary(self) -> str: + n = len(self.predictions) + noun = "prediction" if n == 1 else "predictions" + return ( + f"Mind: {self.subject} ({self.block_id[:8]}…) " + f"confidence={self.confidence:.2f}, {n} {noun}." 
+ ) + + def __str__(self) -> str: + lines = [self.summary, "", self.content] + if self.predictions: + lines.append("\n## Linked Predictions") + for p in self.predictions: + status = f" [{p.outcome}]" if p.outcome else "" + verify = f" (verify: {p.verify_at})" if p.verify_at else "" + lines.append(f" - [{p.block_id[:8]}…] {p.content[:80]}{verify}{status}") + return "\n".join(lines) + + def to_dict(self) -> dict[str, Any]: + return { + "block_id": self.block_id, + "subject": self.subject, + "content": self.content, + "confidence": self.confidence, + "predictions": [p.to_dict() for p in self.predictions], + } + + +@dataclass(frozen=True) +class MindOutcomeResult: + """Result of closing a prediction against a mind block.""" + + mind_block_id: str + decision_block_id: str + hit: bool + reason: str + mind_confidence_delta: float + decision_confidence_delta: float + validates_edge_action: str # "created" | "reinforced" + + @property + def summary(self) -> str: + label = "HIT" if self.hit else "MISS" + return ( + f"Prediction {label}: mind {self.mind_block_id[:8]}… " + f"Δconf={self.mind_confidence_delta:+.3f}, " + f"decision {self.decision_block_id[:8]}… " + f"Δconf={self.decision_confidence_delta:+.3f}. 
" + f"Reason: {self.reason}" + ) + + def __str__(self) -> str: + return self.summary + + def to_dict(self) -> dict[str, Any]: + return { + "mind_block_id": self.mind_block_id, + "decision_block_id": self.decision_block_id, + "hit": self.hit, + "reason": self.reason, + "mind_confidence_delta": self.mind_confidence_delta, + "decision_confidence_delta": self.decision_confidence_delta, + "validates_edge_action": self.validates_edge_action, + } + + @dataclass class ContradictionRecord: block_a_id: str diff --git a/tests/test_mind.py b/tests/test_mind.py new file mode 100644 index 0000000..d15154f --- /dev/null +++ b/tests/test_mind.py @@ -0,0 +1,443 @@ +"""Tests for Theory of Mind (ToM) blocks — mind operations, score boosts, simulate frame.""" + +from __future__ import annotations + +import pytest + +from elfmem.api import MemorySystem +from elfmem.config import ElfmemConfig, MemoryConfig +from elfmem.context.frames import SIMULATE_FRAME, get_frame_definition +from elfmem.exceptions import BlockNotActiveError, ElfmemError +from elfmem.memory.blocks import determine_decay_tier +from elfmem.memory.retrieval import _compute_boost +from elfmem.operations.connect import _RELATION_DEFAULT_WEIGHTS +from elfmem.types import ( + DecayTier, + MindOutcomeResult, + MindPredictResult, + MindShowResult, + MindSummary, +) + + +# ── Fixtures ──────────────────────────────────────────────────────────────── + + +@pytest.fixture +async def system(test_engine, mock_llm, mock_embedding) -> MemorySystem: + """MemorySystem with low inbox threshold for fast cycles.""" + cfg = ElfmemConfig(memory=MemoryConfig(inbox_threshold=3)) + return MemorySystem( + engine=test_engine, + llm_service=mock_llm, + embedding_service=mock_embedding, + config=cfg, + ) + + +@pytest.fixture +async def system_with_mind(system: MemorySystem) -> tuple[MemorySystem, str]: + """System with an active mind block (consolidated).""" + async with system.session(): + result = await system.mind_create( + "test-customer", + 
goals=["Ship fast"],
+            beliefs=["Agents are the future"],
+            fears=["Complex setup"],
+        )
+        mind_id = result.block_id
+        # Consolidate so the mind block is active
+        await system.consolidate()
+    return system, mind_id
+
+
+# ── Decay tier ───────────────────────────────────────────────────────────────
+
+
+class TestMindDecayTier:
+    def test_mind_category_uses_durable_decay(self):
+        assert determine_decay_tier([], "mind") == DecayTier.DURABLE
+
+    def test_mind_with_self_tag_prefers_tag_priority(self):
+        """self/constitutional tag takes priority over category."""
+        assert determine_decay_tier(["self/constitutional"], "mind") == DecayTier.PERMANENT
+
+    def test_knowledge_still_standard(self):
+        assert determine_decay_tier([], "knowledge") == DecayTier.STANDARD
+
+
+# ── Edge relation defaults ──────────────────────────────────────────────────
+
+
+class TestEdgeRelationDefaults:
+    def test_predicts_default_weight(self):
+        assert _RELATION_DEFAULT_WEIGHTS["predicts"] == 0.70
+
+    def test_validates_default_weight(self):
+        assert _RELATION_DEFAULT_WEIGHTS["validates"] == 0.75
+
+
+# ── Score boosts ─────────────────────────────────────────────────────────────
+
+
+class TestScoreBoosts:
+    def test_category_boost(self):
+        assert _compute_boost("mind", [], {"mind": 6.0}) == 6.0
+
+    def test_tag_prefix_boost(self):
+        assert _compute_boost("knowledge", ["self/constitutional"], {"tag:self/": 10.0}) == 10.0
+
+    def test_no_match_returns_one(self):
+        assert _compute_boost("knowledge", [], {"mind": 6.0}) == 1.0
+
+    def test_max_of_category_and_tag(self):
+        """When both category and tag match, take the higher boost."""
+        boost = _compute_boost("mind", ["self/value"], {"mind": 6.0, "tag:self/": 10.0})
+        assert boost == 10.0
+
+    def test_empty_boosts_returns_one(self):
+        assert _compute_boost("anything", ["tag/x"], {}) == 1.0
+
+    def test_multiple_tag_prefixes(self):
+        """Only one tag needs to match the prefix."""
+        boost = _compute_boost("knowledge", ["mind/customer", "other"], 
{"tag:mind/": 6.0})
+        assert boost == 6.0
+
+
+# ── Simulate frame definition ────────────────────────────────────────────────
+
+
+class TestSimulateFrame:
+    def test_simulate_frame_registered(self):
+        frame = get_frame_definition("simulate")
+        assert frame.name == "simulate"
+
+    def test_simulate_frame_has_boosts(self):
+        assert SIMULATE_FRAME.score_boosts is not None
+        assert SIMULATE_FRAME.score_boosts["mind"] == 6.0
+        assert SIMULATE_FRAME.score_boosts["decision"] == 5.0
+        assert SIMULATE_FRAME.score_boosts["tag:self/"] == 10.0
+
+    def test_simulate_frame_no_tag_filter(self):
+        """Simulate retrieves all blocks — boosts handle prioritisation."""
+        assert SIMULATE_FRAME.filters.tag_patterns is None
+        assert SIMULATE_FRAME.filters.categories is None
+
+    def test_simulate_frame_guarantees_constitutional(self):
+        assert "self/constitutional" in SIMULATE_FRAME.guarantees
+
+    def test_simulate_frame_guarantees_mind_blocks(self):
+        assert "mind/%" in SIMULATE_FRAME.guarantees
+
+    def test_simulate_frame_no_cache(self):
+        assert SIMULATE_FRAME.cache is None
+
+
+# ── Mind create ──────────────────────────────────────────────────────────────
+
+
+class TestMindCreate:
+    async def test_create_mind_block(self, system: MemorySystem):
+        result = await system.mind_create(
+            "customer",
+            goals=["Ship fast"],
+            beliefs=["Templates are commoditising"],
+        )
+        assert result.status == "created"
+        assert result.block_id
+
+    async def test_create_mind_content_structure(self, system: MemorySystem):
+        await system.mind_create(
+            "ben-emson",
+            goals=["Build compounding products"],
+            fears=["Building infrastructure forever"],
+        )
+        # Consolidate to make searchable
+        await system.consolidate()
+        # Recall should find the mind block
+        blocks = await system.recall("ben-emson mind model")
+        assert len(blocks) > 0
+
+    async def test_create_mind_duplicate_rejected(self, system: MemorySystem):
+        await system.mind_create("customer", goals=["Ship fast"])
+        result = await system.mind_create("customer", goals=["Ship fast"])
+        assert result.status == "duplicate_rejected"
+
+    async def test_create_mind_empty_subject_raises(self, system: MemorySystem):
+        with pytest.raises(ValueError, match="non-empty"):
+            await system.mind_create("", goals=["Something"])
+
+    async def test_create_mind_slug_in_tags(self, system: MemorySystem):
+        result = await system.mind_create("Ben Emson", goals=["Build"])
+        # After consolidation, the mind tag should be present
+        await system.consolidate()
+        show = await system.mind_show(result.block_id)
+        assert show.subject == "ben-emson"
+
+
+# ── Mind predict ─────────────────────────────────────────────────────────────
+
+
+class TestMindPredict:
+    async def test_predict_creates_decision_block(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        result = await system.mind_predict(
+            mind_id,
+            "Will pay 49/mo for hosted version",
+            verify_at="2026-06-30",
+            reasoning="Prefers predictable cost",
+        )
+        assert isinstance(result, MindPredictResult)
+        assert result.decision_block_id
+        assert result.edge_action == "created"
+        assert result.verify_at == "2026-06-30"
+
+    async def test_predict_links_via_predicts_edge(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        result = await system.mind_predict(
+            mind_id, "Will abandon if setup >30min", verify_at="2026-05-15",
+        )
+        # Show should include the prediction
+        show = await system.mind_show(mind_id)
+        assert len(show.predictions) == 1
+        assert show.predictions[0].block_id == result.decision_block_id
+
+    async def test_predict_non_mind_block_raises(self, system: MemorySystem):
+        # Create a regular knowledge block
+        r = await system.learn("Just a fact")
+        await system.consolidate()
+        with pytest.raises(ElfmemError, match="not a mind block"):
+            await system.mind_predict(r.block_id, "prediction", verify_at="2026-01-01")
+
+    async def test_predict_nonexistent_block_raises(self, system: 
MemorySystem):
+        with pytest.raises(BlockNotActiveError):
+            await system.mind_predict("nonexistent", "prediction", verify_at="2026-01-01")
+
+    async def test_multiple_predictions(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        await system.mind_predict(mind_id, "Prediction A", verify_at="2026-05-01")
+        await system.mind_predict(mind_id, "Prediction B", verify_at="2026-05-02")
+        show = await system.mind_show(mind_id)
+        assert len(show.predictions) == 2
+
+
+# ── Mind list ────────────────────────────────────────────────────────────────
+
+
+class TestMindList:
+    async def test_list_empty(self, system: MemorySystem):
+        result = await system.mind_list()
+        assert result == []
+
+    async def test_list_returns_summaries(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        result = await system.mind_list()
+        assert len(result) == 1
+        assert isinstance(result[0], MindSummary)
+        assert result[0].block_id == mind_id
+        assert result[0].subject == "test-customer"
+
+    async def test_list_includes_prediction_count(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        await system.mind_predict(mind_id, "P1", verify_at="2026-05-01")
+        await system.mind_predict(mind_id, "P2", verify_at="2026-05-02")
+        result = await system.mind_list()
+        assert result[0].prediction_count == 2
+
+
+# ── Mind show ────────────────────────────────────────────────────────────────
+
+
+class TestMindShow:
+    async def test_show_returns_full_details(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        result = await system.mind_show(mind_id)
+        assert isinstance(result, MindShowResult)
+        assert result.block_id == mind_id
+        assert "Ship fast" in result.content
+        assert result.subject == "test-customer"
+
+    async def test_show_nonexistent_raises(self, system: MemorySystem):
+        with pytest.raises(ElfmemError, match="not found"):
+            await system.mind_show("nonexistent123456")
+
+    async def test_show_includes_predictions_with_verify_at(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        await system.mind_predict(
+            mind_id, "Will pay 49/mo", verify_at="2026-06-30", reasoning="Cost tolerance"
+        )
+        result = await system.mind_show(mind_id)
+        assert len(result.predictions) == 1
+        assert result.predictions[0].verify_at == "2026-06-30"
+        assert result.predictions[0].outcome is None  # not yet resolved
+
+
+# ── Mind outcome ──────────────────────────────────────────────────────────
+
+
+class TestMindOutcome:
+    async def test_outcome_hit_increases_confidence(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        pred = await system.mind_predict(
+            mind_id, "Will pay 49/mo", verify_at="2026-06-30",
+        )
+        # Consolidate so decision block is active
+        await system.consolidate()
+
+        result = await system.mind_outcome(
+            pred.decision_block_id, hit=True, reason="Signed up week 1",
+        )
+        assert isinstance(result, MindOutcomeResult)
+        assert result.hit is True
+        assert result.mind_confidence_delta > 0
+        assert result.decision_confidence_delta > 0
+        # validates reinforces the existing predicts edge (same block pair, undirected)
+        assert result.validates_edge_action in ("created", "reinforced")
+
+    async def test_outcome_miss_decreases_confidence(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        pred = await system.mind_predict(
+            mind_id, "Will abandon if >30min setup", verify_at="2026-05-15",
+        )
+        await system.consolidate()
+
+        result = await system.mind_outcome(
+            pred.decision_block_id, hit=False, reason="Completed setup in 45 min",
+        )
+        assert result.hit is False
+        assert result.decision_confidence_delta < 0
+
+    async def test_outcome_updates_prediction_in_show(
+        self, system_with_mind: tuple[MemorySystem, str]
+    ):
+        system, mind_id = system_with_mind
+        pred = await system.mind_predict(
+            mind_id, "Will pay 49/mo", verify_at="2026-06-30",
+        )
+        await system.consolidate()
+
+        await system.mind_outcome(
+            pred.decision_block_id, hit=True, reason="Signed up",
+        )
+        show = await system.mind_show(mind_id)
+        assert show.predictions[0].outcome == "hit"
+
+    async def test_outcome_nonexistent_decision_raises(self, system: MemorySystem):
+        with pytest.raises(BlockNotActiveError):
+            await system.mind_outcome(
+                "nonexistent", hit=True, reason="test",
+            )
+
+    async def test_outcome_no_predicts_edge_raises(self, system: MemorySystem):
+        """A regular decision block without a predicts edge should fail."""
+        r = await system.learn("A decision", category="decision")
+        await system.consolidate()
+        with pytest.raises(ElfmemError, match="No predicts edge"):
+            await system.mind_outcome(r.block_id, hit=True, reason="test")
+
+
+# ── Result type surfaces ─────────────────────────────────────────────────────
+
+
+class TestResultSurfaces:
+    def test_mind_summary_str(self):
+        s = MindSummary(
+            block_id="abc12345def67890",
+            subject="customer",
+            confidence=0.52,
+            prediction_count=3,
+            hit_count=1,
+            miss_count=0,
+        )
+        assert "customer" in str(s)
+        assert "abc12345" in str(s)
+        assert "0.52" in str(s)
+
+    def test_mind_summary_to_dict(self):
+        s = MindSummary("id", "sub", 0.5, 1, 0, 0)
+        d = s.to_dict()
+        assert d["subject"] == "sub"
+        assert d["prediction_count"] == 1
+
+    def test_mind_predict_result_str(self):
+        r = MindPredictResult("mind_id", "dec_id", "Will pay", "2026-06-30", "created")
+        assert "dec_id" in str(r)
+        assert "2026-06-30" in str(r)
+
+    def test_mind_outcome_result_str(self):
+        r = MindOutcomeResult("mind_id", "dec_id", True, "reason", 0.05, 0.1, "created")
+        assert "HIT" in str(r)
+
+    def test_mind_outcome_miss_str(self):
+        r = MindOutcomeResult("mind_id", "dec_id", False, "reason", -0.03, -0.08, "created")
+        assert "MISS" in str(r)
+
+
+# ── Integration: full cycle ──────────────────────────────────────────────────
+
+
+class TestFullCycle:
+    async def test_create_predict_consolidate_outcome_show(
+        self, system: MemorySystem,
+    ):
+        """Full ToM lifecycle: create → predict → consolidate → outcome → show."""
+        async with system.session():
+            # 1. Create mind block
+            mind = await system.mind_create(
+                "customer",
+                goals=["Ship fast", "Keep costs low"],
+                beliefs=["Agents are the future"],
+                fears=["Complex setup"],
+            )
+            # 2. Consolidate
+            await system.consolidate()
+
+            # 3. Add prediction
+            pred = await system.mind_predict(
+                mind.block_id,
+                "Will pay 49/mo for hosted version",
+                verify_at="2026-06-30",
+                reasoning="Prefers predictable cost",
+            )
+
+            # 4. Consolidate prediction
+            await system.consolidate()
+
+            # 5. Record outcome
+            outcome = await system.mind_outcome(
+                pred.decision_block_id,
+                hit=True,
+                reason="Signed up at tier price week 1",
+            )
+            assert outcome.hit is True
+            assert outcome.mind_confidence_delta > 0
+
+            # 6. Show updated mind
+            show = await system.mind_show(mind.block_id)
+            assert show.subject == "customer"
+            assert len(show.predictions) == 1
+            assert show.predictions[0].outcome == "hit"
+            assert show.confidence > 0.5  # Should have increased from baseline
+
+            # 7. 
List should show the hit + minds = await system.mind_list() + assert len(minds) == 1 + assert minds[0].hit_count == 1 From 5ee627239668f9b2e59015683654b77b93d4d936 Mon Sep 17 00:00:00 2001 From: Ben Emson Date: Tue, 28 Apr 2026 22:26:30 +0100 Subject: [PATCH 4/5] Fix ruff and mypy violations - Remove unnecessary f-string prefix in mind.py recovery message - Remove forward reference quotes from type annotations (PEP 563) - Break long lines to comply with 100-char limit - Fix import sorting in test_mind.py - Add type: ignore for rank_bm25 untyped library Co-Authored-By: Claude Opus 4.6 (1M context) --- src/elfmem/api.py | 8 ++++---- src/elfmem/cli.py | 12 +++++++++--- src/elfmem/memory/retrieval.py | 2 +- src/elfmem/operations/mind.py | 10 +++++----- src/elfmem/types.py | 7 ++++++- tests/test_mind.py | 1 - 6 files changed, 25 insertions(+), 15 deletions(-) diff --git a/src/elfmem/api.py b/src/elfmem/api.py index 8f2413d..f4d11d7 100644 --- a/src/elfmem/api.py +++ b/src/elfmem/api.py @@ -1639,7 +1639,7 @@ async def mind_predict( *, verify_at: str, reasoning: str | None = None, - ) -> "MindPredictResult": + ) -> MindPredictResult: """Add a falsifiable prediction linked to a mind block. USE WHEN: You have a specific, testable hypothesis about what a @@ -1673,7 +1673,7 @@ async def mind_predict( self._record_op("mind_predict", result.summary) return result - async def mind_list(self) -> list["MindSummary"]: + async def mind_list(self) -> list[MindSummary]: """List all active mind blocks with prediction statistics. USE WHEN: Discovering which minds are modelled and their calibration. @@ -1690,7 +1690,7 @@ async def mind_list(self) -> list["MindSummary"]: self._record_op("mind_list", f"{len(result)} mind(s) found.") return result - async def mind_show(self, mind_block_id: str) -> "MindShowResult": + async def mind_show(self, mind_block_id: str) -> MindShowResult: """Show a mind block with all linked predictions. 
USE WHEN: Inspecting a specific mind model before reasoning about @@ -1713,7 +1713,7 @@ async def mind_outcome( *, hit: bool, reason: str, - ) -> "MindOutcomeResult": + ) -> MindOutcomeResult: """Close a prediction: record hit/miss, calibrate the mind model. USE WHEN: A prediction has resolved — the verify_at date has diff --git a/src/elfmem/cli.py b/src/elfmem/cli.py index c588441..c92d3c5 100644 --- a/src/elfmem/cli.py +++ b/src/elfmem/cli.py @@ -756,9 +756,15 @@ def mind_create( @mind_app.command("predict") def mind_predict( mind_block_id: str, - prediction: Annotated[str, typer.Option("--prediction", help="Falsifiable prediction text")], - verify_at: Annotated[str, typer.Option("--verify-at", help="Verification date (YYYY-MM-DD)")], - reasoning: Annotated[str | None, typer.Option("--reasoning", help="Why this prediction")] = None, + prediction: Annotated[ + str, typer.Option("--prediction", help="Falsifiable prediction text") + ], + verify_at: Annotated[ + str, typer.Option("--verify-at", help="Verification date (YYYY-MM-DD)") + ], + reasoning: Annotated[ + str | None, typer.Option("--reasoning", help="Why this prediction") + ] = None, db: Annotated[str | None, typer.Option("--db", envvar="ELFMEM_DB")] = None, config: Annotated[str | None, typer.Option("--config", envvar="ELFMEM_CONFIG")] = None, json_output: Annotated[bool, typer.Option("--json")] = False, diff --git a/src/elfmem/memory/retrieval.py b/src/elfmem/memory/retrieval.py index 189fa0f..e5062b8 100644 --- a/src/elfmem/memory/retrieval.py +++ b/src/elfmem/memory/retrieval.py @@ -32,7 +32,7 @@ # Soft dependency — retrieval works without it. 
try: - from rank_bm25 import BM25Okapi + from rank_bm25 import BM25Okapi # type: ignore[import-untyped] _HAS_BM25 = True except ImportError: # pragma: no cover diff --git a/src/elfmem/operations/mind.py b/src/elfmem/operations/mind.py index 2b10e0a..5d74348 100644 --- a/src/elfmem/operations/mind.py +++ b/src/elfmem/operations/mind.py @@ -12,16 +12,15 @@ from __future__ import annotations import re -from typing import Any from sqlalchemy.ext.asyncio import AsyncConnection from elfmem.db import queries -from elfmem.exceptions import BlockNotActiveError, ElfmemError from elfmem.db.queries import insert_agent_edge +from elfmem.exceptions import BlockNotActiveError, ElfmemError from elfmem.operations.connect import do_connect from elfmem.operations.learn import learn as _learn -from elfmem.operations.outcome import compute_bayesian_update, record_outcome +from elfmem.operations.outcome import record_outcome from elfmem.types import ( Edge, LearnResult, @@ -154,9 +153,10 @@ async def predict( if mind_block is None or mind_block.get("status") != "active": raise BlockNotActiveError(mind_block_id) if mind_block.get("category") != "mind": + category = mind_block.get("category") raise ElfmemError( - f"Block {mind_block_id[:8]}… is not a mind block (category={mind_block.get('category')!r}).", - recovery=f"Use a block with category='mind'. List minds with mind_list().", + f"Block {mind_block_id[:8]}… is not a mind block (category={category!r}).", + recovery="Use a block with category='mind'. 
List minds with mind_list().", ) # Extract subject from mind block tags diff --git a/src/elfmem/types.py b/src/elfmem/types.py index 06bc651..8f45052 100644 --- a/src/elfmem/types.py +++ b/src/elfmem/types.py @@ -747,7 +747,12 @@ class MindSummary: @property def summary(self) -> str: - ratio = f"{self.hit_count}/{self.hit_count + self.miss_count}" if (self.hit_count + self.miss_count) > 0 else "0/0" + total = self.hit_count + self.miss_count + ratio = ( + f"{self.hit_count}/{total}" + if total > 0 + else "0/0" + ) return ( f"Mind: {self.subject} ({self.block_id[:8]}…) " f"confidence={self.confidence:.2f} predictions={self.prediction_count} " diff --git a/tests/test_mind.py b/tests/test_mind.py index d15154f..046b610 100644 --- a/tests/test_mind.py +++ b/tests/test_mind.py @@ -19,7 +19,6 @@ MindSummary, ) - # ── Fixtures ──────────────────────────────────────────────────────────────── From 221759fda69edd88b6edcf17e83e95672250e4d6 Mon Sep 17 00:00:00 2001 From: Ben Emson Date: Tue, 28 Apr 2026 22:33:29 +0100 Subject: [PATCH 5/5] Fix mypy unused-ignore by moving rank_bm25 override to config MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The CI ran mypy with --ignore-missing-imports which suppressed the rank_bm25 error before the inline type: ignore could fire. With strict = true (warn_unused_ignores), the now-unused comment became an error itself. 
- Add [[tool.mypy.overrides]] for rank_bm25 in pyproject.toml — the correct place to declare a third-party stub exception - Remove the inline # type: ignore comment from retrieval.py - Drop --ignore-missing-imports from CI so both environments use pyproject.toml as the single source of truth Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/workflows/ci.yml | 2 +- pyproject.toml | 4 ++++ src/elfmem/memory/retrieval.py | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 99641bf..070f8cc 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -45,7 +45,7 @@ jobs: - name: Install dependencies run: uv sync --extra dev - - run: uv run mypy --ignore-missing-imports src/elfmem/ + - run: uv run mypy src/elfmem/ pytest: name: pytest diff --git a/pyproject.toml b/pyproject.toml index 75f1065..aee0f1e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -94,6 +94,10 @@ python_version = "3.11" mypy_path = "src" packages = ["elfmem"] +[[tool.mypy.overrides]] +module = "rank_bm25" +ignore_missing_imports = true + [tool.ruff] src = ["src"] line-length = 100 diff --git a/src/elfmem/memory/retrieval.py b/src/elfmem/memory/retrieval.py index e5062b8..189fa0f 100644 --- a/src/elfmem/memory/retrieval.py +++ b/src/elfmem/memory/retrieval.py @@ -32,7 +32,7 @@ # Soft dependency — retrieval works without it. try: - from rank_bm25 import BM25Okapi # type: ignore[import-untyped] + from rank_bm25 import BM25Okapi _HAS_BM25 = True except ImportError: # pragma: no cover