Mastra memory provider for Hermes Agent.
The full power of @mastra/memory plugged into Hermes through one shared local server. Each Hermes profile becomes a Mastra resourceId; each session becomes a thread; every Hermes hook routes to the Mastra primitive that naturally fits — and nothing on Hermes' main thread ever blocks on Mastra's network.
┌────────────────────────────────────────────────────────┐
Hermes default ────┤ │
│ Bun + Hono server ───▶ libSQL DB │
Hermes profile a ───┤ localhost:4191 ~/.hermes/mastra.db │
│ │
Hermes profile b ───┤ @mastra/memory (Memory + Observer + Reflector) │
│ @mastra/libsql (Storage + PromptBlocks domain) │
└────────────────────────────────────────────────────────┘
Eight model-driven memory tools alongside Hermes' built-in `memory` and `session_search`:
| Tool | What it does | Cost |
|---|---|---|
| `mastra_recall` | Latest observation log for the current session | ⚡ cheap (cache-served) |
| `mastra_search` | Keyword search across observations in this profile | ⚡ cheap |
| `mastra_semantic_search` | Vector / semantic search across observations | 💸 pays for vector query |
| `mastra_observe` | Persist a manual observation | ⚡ cheap |
| `mastra_working_memory` | Read the resource-scoped working-memory mirror | ⚡ cheap |
| `mastra_artifact_get` | Read the canonical version of SOUL.md / MEMORY.md / USER.md / AGENTS.md | ⚡ cheap |
| `mastra_artifact_history` | Show edit history (versioned via Mastra prompt-blocks) | ⚡ cheap |
| `mastra_artifact_revert` | Append a new version with old content (history preserved, no rewinds) | ⚡ cheap |
One opt-in ContextEngine wrapper that injects cached observations as a protected system message right before compression and bumps `recall_top_k` when prompt tokens cross 50% of the compressor's threshold. Enable per profile with `hermes config set context.engine mastra`.
Per-profile isolation, one shared server. A single Bun process at localhost:4191 serves every Hermes profile. Each profile = one Mastra resourceId = `hermes:<profile>`. Zero cross-profile leakage, by design.
Background Observer + Reflector (per Mastra's reference docs). The Observer watches conversations and creates new observations from raw turns and tool results; the Reflector restructures the log by combining related items, surfacing overarching patterns, and condensing where possible. Together they replace raw message history with a dense observation log.
Versioned identity files (NEW). SOUL.md, MEMORY.md, USER.md, and per-project AGENTS.md snapshots are stored as Mastra prompt-blocks — sanctioned, schema-validated, versioned storage from @mastra/core. The on-disk file becomes an atomic cache of the active version; Mastra is the source of truth. Every edit creates a new version with a change-message and timestamp. The system prompt keeps working when the server is unreachable because the file always exists.
Non-blocking by contract. Every public hook returns within a documented latency budget (typically 2–50 ms) even when the underlying Mastra HTTP call hangs for 5 full seconds. Verified by tests/test_non_blocking_hooks.py — the test fixture forces a 5-second hang on every HTTP call and asserts each hook still returns under budget.
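The fire-and-forget half of that contract can be illustrated with a minimal sketch (hypothetical names; the real implementation lives in `async_runner.py`): the hook only enqueues work onto a bounded queue and returns immediately, and a full queue sheds the item rather than blocking the caller.

```python
import queue

class BoundedRunner:
    """Minimal sketch of a non-blocking work queue (illustrative only)."""

    def __init__(self, maxsize: int = 256) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)

    def enqueue(self, job) -> bool:
        """Return immediately; True if accepted, False if the queue is full."""
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            return False  # shed load instead of blocking the hot path

runner = BoundedRunner(maxsize=2)
assert runner.enqueue({"kind": "sync_turn"}) is True
assert runner.enqueue({"kind": "sync_turn"}) is True
assert runner.enqueue({"kind": "sync_turn"}) is False  # bounded: third enqueue is shed
```

The point of the bound is exactly what the deadline test exercises: even with a wedged HTTP client, no hook ever waits on the network.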
Hermes-led design. Every Mastra primitive is mapped to the Hermes hook that naturally triggers it. Read docs/HERMES_INTEGRATION_MAP.md for the per-hook contract — what fires when, what's deliberately not integrated, and why.
```shell
# 1. Install the plugin into Hermes
hermes plugins install srinitude/hermes-mastra
hermes config set memory.provider mastra

# 2. Optional: also wire the context-engine wrapper (Mastra-aware compression)
hermes config set context.engine mastra

# 3. One-time setup: install Bun deps + start the local server
hermes mastra setup

# 4. Verify
hermes mastra status
hermes memory status   # should show: "Provider: mastra ← active"

# 5. Use Hermes normally — observations start flowing after a few turns
hermes
```

Requires Python ≥ 3.10 and Bun (`mise use -g bun@latest` or `curl -fsSL https://bun.sh/install | bash`).
Full table in docs/HERMES_INTEGRATION_MAP.md. Highlights:
| Hermes hook | Budget | Mastra primitive | What we do |
|---|---|---|---|
| `system_prompt_block()` | 0 ms | none | Stable text only — keeps Anthropic prompt-cache intact |
| `prefetch(query)` | ≤ 5 ms | observation cache | Read last cached snapshot; refresh in background |
| `sync_turn(user, asst)` | ≤ 5 ms | `POST /messages` | Fire-and-forget enqueue |
| `on_pre_compress(messages)` | ≤ 50 ms | observation injection (via engine wrapper) | Cached observations land inside the compressor's protected zone |
| `on_memory_write(action, target, content)` | ≤ 5 ms | working memory + artifact mirror | Both targets enqueued; file cache updated atomically |
| `on_session_end(messages)` | ≤ 50 ms | `POST /flush` | Best-effort drain |
| `handle_tool_call(name, args)` | per-tool | direct HTTP | Only deliberately blocking surface — the model asked for it |
The plugin exposes 8 tools to the model, all prefixed mastra_. Each tool has a single, well-defined responsibility — there's no overlap, and the schema descriptions cross-reference siblings so the model picks the right one. Tool names, parameters, and limits below are the authoritative spec, taken directly from tool_schemas.py.
All tools are model-driven. None of these run automatically on Hermes' hot path — they fire only when the agent decides to call them. The hot-path side (the cached observation block injected into every system prompt) is separate; see docs/HERMES_INTEGRATION_MAP.md §2 for the per-hook contract.
| Tool | Purpose | Required params | Optional params | Cost |
|---|---|---|---|---|
| `mastra_recall` | Read THIS thread's distilled observation log | — | `limit` (1–32, default 8) | ⚡ cache-served |
| `mastra_search` | Keyword search across all observations in this profile | `query` | `limit` (1–20, default 8) | ⚡ cheap (substring match) |
| `mastra_semantic_search` | Vector / meaning-based search across observations | `query` | `limit` (1–20, default 8) | 💸 vector query + re-rank |
| `mastra_observe` | Persist a manual observation (correction, decision, preference) | `text` | `kind` (free-form tag) | ⚡ cheap |
| `mastra_working_memory` | Read the resource-scoped working-memory mirror | — | — | ⚡ cheap |
| `mastra_artifact_get` | Read the canonical version of an identity file | `kind` | — | ⚡ cheap |
| `mastra_artifact_history` | List version history of an identity file | `kind` | `limit` (1–50, default 20) | ⚡ cheap |
| `mastra_artifact_revert` | Restore an old version (history preserved, new version appended) | `kind`, `version` | — | ⚡ cheap (one upsert) |
`kind ∈ {soul, memory, user, agents}` for artifact tools. See Versioned artifacts below for what each kind maps to.
`mastra_recall(limit?: 1..32 = 8) → { profile, thread, observations: string }`
What it does. Returns the dense observation log the Observer + Reflector have produced for the current thread, formatted as numbered bullet points. The observation log is what gets injected into every Hermes system prompt's recall block — this tool just exposes it on demand at higher fidelity.
How it works. Hits GET /api/memory/recall?thread=<sid>&profile=<p>&limit=<n>. The Bun server reads the latest cached observation set for the active session from libSQL; no LLM call is made.
When to use. Every turn where you want fresher detail than the (cache-served) system-prompt block already shows. The system prompt's recall is intentionally tiny (default 4 observations); this tool can pull up to 32.
Pitfalls.
- The Observer fires on token thresholds, not per-message. A handful of test pings won't produce observations — wait for real conversation volume.
- This is the current thread only. For cross-thread search, use `mastra_search` or `mastra_semantic_search`.
`mastra_search(query: string, limit?: 1..20 = 8) → { count, observations: [{thread, text, kind}] }`
What it does. Substring-match search across all observations in every thread under the active profile. Each result includes the originating thread ID so you can correlate hits with session_search matches.
How it works. Hits GET /api/memory/search?query=<q>&profile=<p>&limit=<n>. The server iterates threads, scans each thread's observations, returns matches in document order. No vector search; pure case-insensitive substring.
When to use.
- "What was decided about X?" — keyword finds it cheaply
- "Have I ever discussed Y?" — quick existence check
- Prefer this over `mastra_semantic_search` whenever exact keywords would work — it's 10× cheaper.
Pitfalls.
- Substring match: a query of "deadlines" won't match text that only says "deadline" — choose the exact keyword deliberately.
- For synonym / paraphrase / conceptual matches, escalate to `mastra_semantic_search`.
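The matching rule is simple enough to sketch in a few lines (an illustrative model, not the server code): a case-insensitive substring scan over observation texts, in document order, capped at `limit`.

```python
# Illustrative sketch of mastra_search's matching rule: case-insensitive
# substring scan, document order, capped at `limit`. Hypothetical helper,
# not the actual server implementation.
def keyword_search(observations: list[dict], query: str, limit: int = 8) -> list[dict]:
    q = query.lower()
    hits = [o for o in observations if q in o["text"].lower()]
    return hits[:limit]

obs = [
    {"thread": "t1", "text": "Decided on Postgres for storage", "kind": "decision"},
    {"thread": "t2", "text": "Deadline moved to Friday", "kind": "note"},
]
assert keyword_search(obs, "postgres") == [obs[0]]
assert keyword_search(obs, "deadlines") == []  # substring, not stemmed
```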
`mastra_semantic_search(query: string, limit?: 1..20 = 8) → { count, observations: [{thread, text, score}] }`
What it does. Vector / semantic search across observations using Mastra's vectorSearchString recall path. Returns the same shape as mastra_search plus a relevance score per hit.
How it works. Hits GET /api/memory/semantic_search?query=<q>&profile=<p>&limit=<n>. The server picks an anchor thread under the active resource, calls memory.recall({ vectorSearchString, threadConfig: { semanticRecall: { topK } } }) from @mastra/memory. The vector store + embedder need to be configured (default off — falls back to empty results otherwise).
When to use.
- Keyword search returned nothing useful but you suspect the concept exists under different wording.
- "Anything related to deadline pressure?" (would match "tight schedule" / "Q3 crunch" / "shipping anxiety")
Pitfalls.
- Costs more than `mastra_search` (vector query + re-rank, ~50 ms vs <5 ms).
- Falls back to empty results when the vector store is unconfigured — the schema description tells the model so it knows to retry with `mastra_search`.
- Disabled by default in the `Memory` config to avoid auto-firing on every save — only fires when this tool is called.
`mastra_observe(text: string, kind?: string) → { ok, profile, thread }`
What it does. Writes a synthetic system-message tagged [OBSERVATION:<kind>] into the active thread's history. The Observer ingests it on its next pass and folds it into the dense observation log.
How it works. Hits POST /api/memory/observation with the text + optional kind tag. The server saves it as a system role message under the thread; the Observer later picks it up.
When to use.
- The user corrects you ("no, we use Postgres, not MySQL") — `kind="correction"`
- The user makes a decision ("we're going with TanStack Query") — `kind="decision"`
- The user states a durable preference ("always commit, never push") — `kind="preference"`
- A delegation result worth remembering — `kind="delegation"`
Pitfalls.
- Don't observe routine commentary. The Observer auto-distills raw turns. This tool is for facts you want guaranteed to land — not for narrating what just happened.
- Larger / session-specific facts go here. Smaller / always-relevant facts (≤2,200 chars total) belong in Hermes' built-in `memory` tool, which is frozen into the system prompt at session start.
`mastra_working_memory() → { profile, working_memory: string }`
What it does. Reads the resource-scoped working-memory document Mastra keeps in sync with the agent's MEMORY.md / USER.md edits.
How it works. Hits GET /api/memory/working_memory?profile=<p>. The server calls memory.getWorkingMemory({ resourceId }) and returns the markdown text.
When to use.
- The agent suspects the on-disk MEMORY.md / USER.md and the Mastra mirror diverged.
- The agent wants to inspect what was mirrored before deciding whether to write more.
- Audit / debugging.
Pitfalls.
- Built-in `MEMORY.md` / `USER.md` is canonical — already injected into the system prompt at layer 5. Don't loop the agent through this tool for routine recall; the system prompt already has the relevant content.
- Working memory is resource-scoped (per profile, across threads), unlike observations, which are thread-scoped.
`mastra_artifact_get(kind: 'soul' | 'memory' | 'user' | 'agents') → { profile, kind, exists, version, content }`
What it does. Returns the canonical current version of a Hermes identity file, sourced from Mastra's prompt-blocks storage domain. The on-disk file (~/.hermes/SOUL.md, ~/.hermes/memories/MEMORY.md, etc.) is just a cache of this version, kept fresh atomically.
How it works. Hits GET /api/memory/artifact?kind=<k>&profile=<p>. The server calls promptBlocks.getById('hermes:<kind>:<profile>') from @mastra/libsql. Returns the resolved content of the active version.
When to use.
- Verify the on-disk file matches the database (especially after another process edited the mirror).
- Read the canonical text when the file may be stale.
- Diff against a previous version (chain with `mastra_artifact_history`).
Pitfalls.
- The on-disk file is always the source the system prompt reads — this tool just exposes the canonical version. They should normally agree.
- For per-project AGENTS.md, the artifact is keyed by `sha256(absolute_cwd_path)`, not profile name. The plugin captures snapshots automatically when Hermes loads an AGENTS.md; you don't address them by `kind=agents` from a different working directory.
`mastra_artifact_history(kind: 'soul' | 'memory' | 'user' | 'agents', limit?: 1..50 = 20) → { profile, kind, count, versions: [{version, created_at, change_message, content}] }`
What it does. Returns the version history (newest first) for a Hermes identity file. Each version row includes its content, change message, and timestamp.
How it works. Hits GET /api/memory/artifact/history?kind=<k>&profile=<p>&per_page=<n>. The server calls promptBlocks.listVersions({ blockId: 'hermes:<kind>:<profile>', orderBy: 'versionNumber', direction: 'DESC' }).
When to use.
- Audit changes ("what did MEMORY.md look like yesterday?").
- Diff revisions before a revert.
- Debug "who/what overwrote this?" — change messages tell you.
Pitfalls.
- Versions are append-only. Even reverts create a new version (never rewinds) — see `mastra_artifact_revert`.
- `limit` caps at 50 per call. For older versions, paginate by adjusting `limit` and re-querying (pagination params not yet exposed; planned).
`mastra_artifact_revert(kind: 'soul' | 'memory' | 'user' | 'agents', version: integer ≥ 1) → { ok, profile, kind, reverted_to }`
What it does. Appends a new version of the artifact whose content matches version N (the version param). History is preserved — the older version stays in place, and a fresh version-bump records the revert with change_message: "Reverted to vN". The on-disk file cache is refreshed atomically right after.
How it works. Hits POST /api/memory/artifact/revert with { kind, profile, version }. The server lists versions, finds the target by version number, then calls promptBlocks.update({ id, content: target.content, changeMessage }) — which Mastra's prompt-blocks domain implements as a version bump, not a destructive overwrite.
When to use.
- The agent or user just made a bad edit to MEMORY.md / USER.md / SOUL.md and wants the previous text back.
- A/B-comparing identity tweaks.
- Recovering from a corruption cascade.
Pitfalls.
- Returns 404 if `version` doesn't exist in history. Always pair with `mastra_artifact_history` first.
- The newly appended version gets the next version number (latest + 1); it does not reuse the old `version` number. The history-with-content trail keeps both.
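The append-only contract is easy to model in a few lines (an illustrative in-memory sketch, not the prompt-blocks implementation): reverting to version N copies its content into a brand-new version numbered latest + 1, and nothing in the history is rewound or deleted.

```python
# Illustrative in-memory model of the append-only revert contract.
# Hypothetical helper; the real logic lives server-side in prompt-blocks.
def revert(versions: list[dict], target: int) -> dict:
    old = next(v for v in versions if v["version"] == target)  # raises if missing (≈ the 404 case)
    new = {
        "version": versions[-1]["version"] + 1,
        "content": old["content"],
        "change_message": f"Reverted to v{target}",
    }
    versions.append(new)
    return new

history = [
    {"version": 1, "content": "v1 text", "change_message": "seed"},
    {"version": 2, "content": "v2 text", "change_message": "edit"},
]
new = revert(history, 1)
assert new["version"] == 3 and new["content"] == "v1 text"
assert len(history) == 3  # older versions are still present
```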
| Question | Best tool | Why |
|---|---|---|
| "What's already in this thread's summary?" | `mastra_recall` | Cheapest, current-thread only, no query |
| "What was decided about X (exact word)?" | `mastra_search` | Substring match, fastest cross-thread search |
| "Anything related to <topic>, even paraphrased?" | `mastra_semantic_search` | Vector match, catches synonyms keyword would miss |
| "What were the exact spoken words?" | `session_search` (Hermes built-in) | Raw transcript FTS5, complementary to observations |
| "Persist this fact for future sessions" | `mastra_observe` | If big or session-specific |
| "Persist this fact for every session" | `memory` (Hermes built-in) | If small + always relevant; goes into MEMORY.md / USER.md and is frozen into the system prompt |
| "Is the MEMORY.md mirror up to date?" | `mastra_working_memory` | Reads the resource-scoped mirror |
| "What did SOUL.md look like 3 versions ago?" | `mastra_artifact_history kind=soul` | Lists every version with content |
| "Restore that old MEMORY.md" | `mastra_artifact_revert kind=memory version=N` | Appends a new version with the old content |
| "Is the on-disk file canonical right now?" | `mastra_artifact_get kind=<k>` | Returns the active version from the database |
- Recall hierarchy. Observations (Mastra, distilled) live above transcripts (`session_search`, raw). Reach for `mastra_*` first when you want what was decided; reach for `session_search` when you want what was said.
- Persistence hierarchy. Built-in `memory` (small, prompt-frozen) → `mastra_observe` (large, surfaces via recall/search) → `mastra_artifact_*` (versioned, structured identity files).
- Cost hierarchy. All ⚡ tools are <5 ms in normal operation. `mastra_semantic_search` is the only 💸 tool — use it when keyword search fails.
The plugin doesn't replace Hermes' built-in memory — it complements it.
| Surface | Authority | Size | Persistence |
|---|---|---|---|
| Built-in `memory` tool → MEMORY.md / USER.md | canonical for tiny core facts | 2,200 / 1,375 chars | Frozen into system prompt at session start |
| `mastra_observe` → observation log | overflow + session-specific | unlimited | Surfaces via recall/search next turn |
| Mastra prompt-blocks → SOUL/MEMORY/USER/AGENTS | canonical with full version history | unlimited | On-disk file is the cache; full edit history queryable |
Built-in memory file format. Entries in MEMORY.md and USER.md are delimited by a single line containing only the section sign `§` (U+00A7). This is what makes `memory(action="remove", old_text="...")` work — each §-delimited block is one independently-addressable entry. Don't substitute `---` / `===` / blank lines — the parser only recognises `§`.
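The delimiter rule can be sketched in a few lines (an illustrative parser, not the one Hermes ships): split on lines whose only content is `§`, and each resulting block is one addressable entry.

```python
# Illustrative parser for the §-delimited entry format described above.
# Hypothetical helper; Hermes' actual parser may differ in details.
SECTION_SIGN = "\u00a7"  # §

def parse_entries(text: str) -> list[str]:
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == SECTION_SIGN:  # a line containing only §
            entries.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    entries.append("\n".join(current).strip())
    return [e for e in entries if e]

memory_md = "User prefers Postgres.\n\u00a7\nAlways commit, never push."
assert parse_entries(memory_md) == ["User prefers Postgres.", "Always commit, never push."]
```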
Capacity-aware hint. When MEMORY.md or USER.md crosses 50% of its cap, the system-prompt block adds a hint pointing the agent at mastra_observe for overflow. Half-full is the early-warning point — waiting until 80% leaves only a few turns of headroom.
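The half-full trigger is a one-line check; this sketch assumes the character caps quoted elsewhere in this README (2,200 for MEMORY.md, 1,375 for USER.md) and a hypothetical helper name.

```python
# Illustrative capacity check for the 50% off-load hint. The cap values
# come from this README; the function name is hypothetical.
CAPS = {"MEMORY.md": 2200, "USER.md": 1375}

def needs_offload_hint(filename: str, content: str, fraction: float = 0.5) -> bool:
    return len(content) >= CAPS[filename] * fraction

assert needs_offload_hint("MEMORY.md", "x" * 1100) is True   # exactly 50% of 2,200
assert needs_offload_hint("USER.md", "x" * 400) is False
```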
Off-load anchor. On first activation, the plugin appends a single [mastra-rule] entry to both files telling the agent which surface to use for what. The rule is anchor-detected so re-installs are idempotent and uninstalls are clean.
SOUL.md, MEMORY.md, USER.md, and per-project AGENTS.md snapshots are stored as versioned Mastra prompt-blocks — the PromptBlocksStorage domain already exposed by LibSQLStore. We don't invent a custom table.
Source-of-truth direction. Mastra is canonical; the on-disk file is a cache.
- Read path: Hermes' system prompt assembly reads the file from disk as it always has. The file always exists with the latest content because the plugin keeps it fresh atomically (temp + `os.replace`). The system prompt keeps working even when the Bun server is unreachable — the non-blocking contract holds.
- Write path: `on_memory_write` (and `mastra_artifact_revert`) enqueue an `upsert_artifact` to the prompt-blocks store via the bounded background queue, then refresh the file cache atomically.
- Seed path: On first activation per profile, existing on-disk content is uploaded as version 1 of each block. Idempotent — content equality is detected server-side.
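The atomic cache refresh follows the standard temp-file-plus-`os.replace` pattern; a minimal sketch (hypothetical function name), in which readers always see either the old version or the new one, never a partial write:

```python
import os
import pathlib
import tempfile

# Sketch of the atomic file-cache refresh (temp file + os.replace).
# Hypothetical helper name; the plugin's version lives in artifacts.py.
def refresh_cache(path: str, content: str) -> None:
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

p = pathlib.Path(tempfile.mkdtemp()) / "MEMORY.md"
refresh_cache(str(p), "version 2")
assert p.read_text() == "version 2"
```

Writing the temp file into the destination directory (not `/tmp`) matters: `os.replace` is only atomic within one filesystem.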
Per-block identity:

```text
hermes:soul:<profile>          ← was SOUL.md
hermes:memory:<profile>        ← was MEMORY.md
hermes:user:<profile>          ← was USER.md
hermes:agents:<sha256(cwd)>    ← per-project AGENTS.md snapshots
```
AGENTS.md is project-scoped (one per repo), so snapshots are keyed on `sha256(absolute_path)`. The plugin observes them via `do_context_files_loaded` — when Hermes loads an AGENTS.md, we upsert a snapshot. We don't write back to the project file (it's the user's, not ours).
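The key derivation is a plain SHA-256 over the absolute path; a minimal sketch (hypothetical helper name), included because the keying scheme is what keeps two repos' AGENTS.md histories separate:

```python
import hashlib

# Sketch of the per-project AGENTS.md block key: hash the absolute working
# directory so each repo gets its own snapshot lineage. Hypothetical helper.
def agents_block_id(cwd: str) -> str:
    digest = hashlib.sha256(cwd.encode("utf-8")).hexdigest()
    return f"hermes:agents:{digest}"

key = agents_block_id("/home/user/projects/demo")
assert key.startswith("hermes:agents:")
assert len(key.split(":")[-1]) == 64  # full sha256 hex digest
```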
All config lives in ~/.hermes/mastra.json. Edit by re-running hermes mastra setup or hand-editing the JSON.
| Key | Default | Purpose |
|---|---|---|
| `server_url` | `http://127.0.0.1:4191` | Where the Mastra server listens |
| `server_port` | `4191` | Port for the Bun server |
| `auto_start` | `true` | Start Bun on first use if not running |
| `observer_url` / `observer_name` / `observer_api_key_env` | Venice / Gemini Flash | Observer model |
| `reflector_url` / `reflector_name` / `reflector_api_key_env` | Venice / Gemini Pro | Reflector model |
| `recall_top_k` | `4` | Observations injected per turn |
| `temporal_markers` | `true` | Insert temporal-gap markers (10+ min idle) |
| `auth_token` | (none) | Optional bearer token via `MASTRA_API_KEY` |
| `context_engine_wrapper` | `true` | Install the ContextEngine wrapper. Active only when `context.engine: mastra`. |
| `context_engine_pressure_fraction` | `0.50` | Boost `recall_top_k` once prompt tokens cross this fraction of the compressor's threshold |
| `context_engine_boosted_top_k` | `8` | Value to boost `recall_top_k` to under pressure |
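The pressure rule wired by the last three keys reduces to one comparison; a hedged sketch using the defaults above (hypothetical function name):

```python
# Sketch of the context-pressure rule: once prompt tokens cross the
# configured fraction of the compressor's threshold, recall_top_k jumps
# from its default to the boosted value. Defaults mirror the config table.
def effective_top_k(
    prompt_tokens: int,
    compress_threshold: int,
    base_top_k: int = 4,          # recall_top_k
    boosted_top_k: int = 8,       # context_engine_boosted_top_k
    pressure_fraction: float = 0.50,  # context_engine_pressure_fraction
) -> int:
    if prompt_tokens >= compress_threshold * pressure_fraction:
        return boosted_top_k
    return base_top_k

assert effective_top_k(30_000, 100_000) == 4   # 30% of threshold: no pressure yet
assert effective_top_k(60_000, 100_000) == 8   # past 50%: boosted
```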
Two roles, configured independently. The Observer runs frequently and should be cheap; the Reflector runs less often and should be stronger.
```shell
# Apply a known-good preset
hermes mastra models preset venice        # default: gemini-3-flash-preview → gemini-3-1-pro-preview
hermes mastra models preset openai        # gpt-4o-mini → gpt-4o
hermes mastra models preset openrouter    # llama-3.1-8b → claude-3.5-sonnet
hermes mastra models preset anthropic-or  # haiku → sonnet (via OpenRouter)
hermes mastra models preset hermes-local  # reuse Hermes's own logged-in provider

# Or set each role manually
hermes mastra models set observer \
  --name gpt-4o-mini \
  --base-url https://api.openai.com/v1 \
  --api-key-env OPENAI_API_KEY
hermes mastra models set reflector \
  --name claude-3-5-sonnet-latest \
  --base-url https://api.anthropic.com/v1 \
  --api-key-env ANTHROPIC_API_KEY

hermes mastra models           # show current
hermes mastra models presets   # list all built-in presets
```

Anything documented at mastra.ai/reference/memory/Memory can be set via dotted-key config — the plugin is a JSON courier. The TS server deep-merges your overrides over its built-in defaults before constructing `new Memory({ options })`.
```python
import mastra_options as mo

mo.set_option("lastMessages", 50)
mo.set_option("workingMemory.scope", "thread")
mo.set_option("observationalMemory.observation.messageTokens", 4000)
mo.set_option("observationalMemory.reflection.observationTokens", 12000)

mo.resolve_options()   # see resolved options shipped to the Bun server
mo.reset_options()     # wipe overrides (defaults remain)
```

```shell
# Server lifecycle
hermes mastra setup                      # interactive: write config, install bun deps, start server
hermes mastra status                     # health probe + pid + recent log lines
hermes mastra server start               # / stop / restart / install
hermes mastra logs [-n 80]               # tail $HERMES_HOME/logs/mastra.log

# Inspect
hermes mastra resources                  # list resourceIds (one per profile)
hermes mastra threads --profile <name>   # threads for a profile
hermes mastra observations <thread_id>   # dump observations for a thread

# Destructive (with confirmation)
hermes mastra reset --profile <name>        # wipe all threads/observations for a profile
hermes mastra reset --profile <name> --yes  # skip confirmation
```

┌──────────────────────────────────────────────────────────────────────────┐
│ Hermes Agent (Python) │
│ │
│ MemoryManager ──→ MastraMemoryProvider │
│ ├── tool_schemas.py (8 tool JSON schemas) │
│ ├── provider_lifecycle.py (every hook as a fn) │
│ ├── tool_observers.py (todo / skill / snapshot)│
│ ├── lifecycle_helpers.py (alive / profile / safe) │
│ ├── lifecycle_observer_reexports.py (compat shim) │
│ ├── recall_cache.py (cached snapshot) │
│ ├── async_runner.py (bounded work queue) │
│ ├── client.py (httpx wrapper) │
│ ├── server_config.py (paths + config) │
│ ├── server_env.py (env builder) │
│ ├── server_process.py (Bun spawn / stop) │
│ ├── model_config.py (Observer / Reflector) │
│ ├── model_presets.py (Venice / OpenAI / etc) │
│ ├── mastra_options.py (flexible passthrough) │
│ ├── memory_rules.py ([mastra-rule] anchor) │
│ ├── artifacts.py (seed + file-cache write)│
│ ├── artifact_tools.py (artifact tool dispatch) │
│ ├── cli.py + cli_commands.py (hermes mastra) │
│ └── config_schema.py (memory setup wizard) │
│ │
│ ContextEngineManager ──→ MastraContextEngine *(opt-in wrapper)* │
│ ├── agent_context_engine.py (engine wrapper class) │
│ └── engine_install.py (register-time wiring) │
└──────────────────────────────────┬───────────────────────────────────────┘
│ HTTP (httpx, bounded async work queue)
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Bun + Hono server (server/src/) │
│ │
│ Routes: │
│ GET /health │
│ POST /api/memory/messages (turn ingestion) │
│ GET /api/memory/recall (current thread's observations) │
│ GET /api/memory/search (keyword search) │
│ GET /api/memory/semantic_search (vector / semantic search) │
│ POST /api/memory/working_memory (mirror MEMORY.md / USER.md write) │
│ GET /api/memory/working_memory (read working-memory mirror) │
│ POST /api/memory/observation (manual observe + lineage + snaps) │
│ POST /api/memory/flush (force Observer drain) │
│ GET /api/memory/resources (list profiles seen) │
│ GET /api/memory/threads (list threads for a profile) │
│ GET /api/memory/observations (raw observation log dump) │
│ POST /api/memory/reset (wipe a profile's memory) │
│ GET /api/memory/artifact (read canonical artifact) │
│ POST /api/memory/artifact (upsert artifact, version-bumps) │
│ GET /api/memory/artifact/history (list versions) │
│ POST /api/memory/artifact/revert (append new version with old text) │
│ │
│ @mastra/memory → Memory{ workingMemory, observationalMemory, ... } │
│ @mastra/memory → Observer + Reflector agents │
│ @mastra/libsql → LibSQLStore (memory + prompt-blocks domains) │
│ Storage: → file:~/.hermes/mastra.db │
└──────────────────────────────────────────────────────────────────────────┘
The full Hermes ↔ Mastra mapping — every hook, every primitive, every latency budget — lives in docs/HERMES_INTEGRATION_MAP.md.
Project conventions keep every Python file under 200 LOC, every function/class under 30 LOC, and max nesting depth ≤ 3. Enforced by tests/test_code_size_policy.py.
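A check like that is straightforward to express with Python's `ast` module; a hypothetical sketch (the real gate lives in `tests/test_code_size_policy.py` and may differ):

```python
import ast

# Hypothetical sketch of a size-policy check: count the source lines each
# function or class body spans and flag anything over the 30-LOC budget.
def oversized_defs(source: str, max_loc: int = 30) -> list[str]:
    tree = ast.parse(source)
    bad = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            span = node.end_lineno - node.lineno + 1  # inclusive line span
            if span > max_loc:
                bad.append(node.name)
    return bad

ok = "def tiny():\n    return 1\n"
big = "def huge():\n" + "\n".join(f"    x{i} = {i}" for i in range(40))
assert oversized_defs(ok) == []
assert oversized_defs(big) == ["huge"]
```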
mise is the only user-facing CLI — every workflow goes through mise run <task>. Tasks are grouped here by what role they play in the dev loop. The full task table at the end shows dependencies + side effects so you can pick the right entrypoint.
First time? Run `mise run install` and you're done. It chains `setup` → `setup:opensrc` → installs the Python venv + Bun deps + opensrc CLI in one shot. Every command below assumes you've done that once.
| Task | Role | Side effects |
|---|---|---|
| `mise run setup` | Verifies `bun` and `python` are registered globally with mise. Exits 3 with a remediation message if not. | None — read-only check. |
| `mise run setup:opensrc` | Installs the opensrc CLI used by `compat:*` to clone upstream source for static API checks. Idempotent — skips if already present, exits 0 even on offline failure (compat tests skip). | May download opensrc to a directory on your PATH (e.g. `/usr/local/bin` or `/opt/homebrew/bin`). |
| `mise run install` | The one-shot bootstrap. Depends on `setup` + `setup:opensrc`. Creates `.venv/`, installs `.[dev]`, drops `server/bun.lock`, and runs `bun install` so `latest` actually walks forward against upstream Mastra/Hermes. | Creates `.venv/`, refreshes `server/node_modules/` and `server/bun.lock`. |
| Task | Role | Side effects |
|---|---|---|
| `mise run dev` | Foreground Bun server with `--watch` hot reload. Use this while editing `server/src/`. | Listens on port 4191 (or whatever `MASTRA_PORT` is set to). |
| `mise run server` | Foreground Bun server without hot reload (closer to production). | Same port. |
| `mise run compile` | TS compile to `server/dist/` — sanity build, used by the prod `hermes mastra server start` path. | Writes `server/dist/index.js`. |
| `mise run build` | No-op for the Python plugin; chains `typecheck` for the TS server so a single command verifies "this builds". | Same as `typecheck`. |
| Task | Role | Side effects |
|---|---|---|
| `mise run test:py` | The fastest gate — only the Python pytest suite. Most edits should run this. ~3 sec. | None. |
| `mise run test:ts` | Bun TS tests (server-side, when present). | None. |
| `mise run test` | Full Python suite (425+ tests). Includes `test_code_size_policy.py` (LOC + nesting limits) and `test_non_blocking_hooks.py` (5-second-stuck-client deadline tests). ~2–3 min. | None. |
`format`, `lint`, `typecheck`, `test`, `security:audit`, and `validate` each enforce one quality dimension. The `quality` task runs all of them in dependency order:
| Task | Role | Side effects |
|---|---|---|
| `mise run format` | `ruff format` (Python) + biome write (TS) + `ruff check --fix` for safe auto-fixes. | Edits files in place — run before committing, not on a dirty WIP. |
| `mise run lint` | `ruff check` + biome lint. Read-only. | None. |
| `mise run typecheck` | TS typecheck on the Bun server. Advisory — Bun runs TS permissively at runtime, so this is a CI guard, not a runtime gate. | None. |
| `mise run security:audit` | `pip-audit` + `bun pm audit` for known CVEs. | None. |
| `mise run validate` | Confirms `plugin.yaml` + `pyproject.toml` parse cleanly. | None. |
| `mise run quality` | The canonical CI gate. Depends on all five above. Run this before opening a PR. | Files may be edited by `format`. |
| Task | Role | Side effects |
|---|---|---|
| `mise run sync` | rsync runtime files into `~/.hermes/hermes-agent/plugins/memory/mastra/`, then verifies the plugin still loads under Hermes' venv. Skips dev-only files (`tests/`, `scripts/`, `_*.py`, `mise.toml`, etc.). | Writes to `~/.hermes/`. Does NOT bounce the running Bun server — that's a separate decision. |
| `mise run quality:full` | The local dev sweet spot: `quality` + `sync`. Use this as your default loop while iterating. | Same as `quality` + `sync`. |
These tasks use opensrc to fetch upstream source code and statically verify the APIs the plugin depends on still exist. They run nightly in CI.
| Task | Role | Side effects |
|---|---|---|
| `mise run compat:hermes` | Verifies every MemoryProvider hook the plugin implements still exists in NousResearch/hermes-agent at HEAD. | Clones the repo via opensrc (cached). |
| `mise run compat:mastra` | Verifies every Memory / PromptBlocks method the plugin calls still exists in mastra-ai/mastra at HEAD. | Same. |
| `mise run compat` | Both checks. CI runs this. | Same. Set `GITHUB_TOKEN` to avoid unauth rate limits. |
| `mise run deps:refresh` | Drops `server/bun.lock` and re-runs `bun install --silent` so transitive `latest` deps actually walk forward. | Edits `server/bun.lock` + `server/node_modules/`. |
These use Firecrawl to crawl upstream docs sites and write a JSON map of all URLs to references/. Used to source-ground future implementation decisions. Requires FIRECRAWL_API_KEY (read from ~/.hermes/.env).
| Task | Role | Side effects |
|---|---|---|
| `mise run docs:map-mastra` | Refresh `references/mastra-ai-docs-map.json` (mastra.ai/docs). | Writes to `references/`. |
| `mise run docs:map-hermes` | Refresh `references/hermes-agent-nousresearch-com-map.json`. | Same. |
| `mise run docs:map-mise` | Refresh `references/mise-jdx-dev-map.json`. | Same. |
| `mise run docs:map` | All three. | Same. |
| Task | Role | Side effects |
|---|---|---|
| `mise run bench` | Plugin's own benchmarks: hot-path latency, runner throughput, recall-cache freshness. | Writes results to stdout. |
| `mise run bench:compare` | Comparative benchmark against the 8 other Hermes memory provider plugins. | Requires those plugins installed. |
| `mise run bench:all` | Both. | Same. |
| Task | Role | Side effects |
|---|---|---|
| `mise run env` | Dump the mise-resolved environment as JSON (debug `.env` cascade). | None. |
| `mise run tasks` | List every task this project exposes (this is the live source of truth). | None. |
| `mise run doctor` | mise's own diagnostics + project-specific checks. | None. |
CI runs mise run quality on every PR across three platforms (Ubuntu, macOS, Windows) via .github/workflows/quality.yml. The upstream-watch.yml workflow runs nightly to track Hermes + Mastra at HEAD, opening a rolling sync PR when drift is detected.
Agentic workflows via GitHub Agentic Workflows (gh-aw) handle higher-level tasks:
- `aw-failure-investigator.md` — auto-investigates failed CI runs and creates issues with root-cause analysis
- `aw-daily-quality-audit.md` — weekday audit of code-size limits, stale refs, dependency freshness, and doc drift
- `aw-stale-ref-sweeper.md` — reviews every PR diff for old-name remnants and code-size violations
All workflows reuse mise run setup:opensrc and mise run install so local-dev parity is exact.
| Rule | Enforced by |
|---|---|
| 200 LOC max per file (Python + TypeScript) | `tests/test_code_size_policy.py` |
| 30 LOC max per function / class body | same |
| Max nesting depth 3 (`.py`/`.ts`/`.sh` only) | same |
| Max cognitive complexity 8 | biome (TS) + ruff (Python) |
| Mastra/AI-SDK pinned to `latest` (HEAD-tracking) | `tests/test_latest_pinning.py` |
| Every public hook has a deadline test | `tests/test_non_blocking_hooks.py` |
| Every commit authored as Kiren Srinivasan only — no AI co-authors, no bot identities | git config in CI workflows + commit hooks |
| mise is the only user-facing CLI surface | `mise.toml` (no direct bun/bunx/biome/tsc/prettier in user-facing surfaces) |
| Pre-emptive code review via coderabbit.ai on every PR | `.coderabbit.yml` |
| Dependency | Pin | Why |
|---|---|---|
| `@mastra/core` | `latest` | Plugin tracks Mastra at HEAD; `mise run compat:mastra` verifies API surface every CI run |
| `@mastra/libsql` | `latest` | same |
| `@mastra/memory` | `latest` | same |
| `@ai-sdk/openai-compatible` | `latest` | same |
| `hono` | `latest` | Mastra's transitive HTTP layer |
| `zod` | `latest` | Mastra's transitive validator |
`mise run compat` runs daily against NousResearch/hermes-agent (verifies all MemoryProvider hooks still exist) and mastra-ai/mastra (verifies the Memory + PromptBlocks APIs we depend on).
```yaml
# ~/.hermes/config.yaml
memory:
  provider: mastra          # activate this plugin

# Optional: route compression through the Mastra-aware ContextEngine wrapper
# context:
#   engine: mastra          # observation injection + token-aware recall_top_k
```

```shell
# Activate per profile. Either install profile-aware wrapper aliases
# (hermes profile alias <name>) and run them directly, or switch profiles
# via `hermes profile use <name>` first and then call `hermes config set`.
hermes config set memory.provider mastra    # default profile
hermes profile use <profile-a> && hermes config set memory.provider mastra
hermes profile use <profile-b> && hermes config set memory.provider mastra

# Verify (each `resourceId` is one profile that has activated the plugin)
hermes mastra resources
# → ['hermes:default', 'hermes:<profile-a>', 'hermes:<profile-b>']
```

Profile credentials: each profile that uses the provider must be able to resolve `observer_api_key_env` / `reflector_api_key_env`. Set these in `~/.hermes/.env` (auto-inherited by every profile) or in the profile's own `~/.hermes/profiles/<name>/.env`.
| Symptom | Fix |
|---|---|
| `bun not found in PATH` | Install Bun: `mise use -g bun@latest` (recommended) or `curl -fsSL https://bun.sh/install \| bash` |
| Bun install fails: peer dependency 'quansync' | `cd server && rm -f bun.lock && bun install` |
| recall returns 0 observations | Normal early in a session — the Observer fires on token thresholds, not per-message. Check `hermes mastra threads --profile <name>` to confirm ingestion. |
| mastra server is not running | `hermes mastra server restart`, then `hermes mastra logs` |
| `mise run compat` reports "opensrc not found" | `mise run setup:opensrc` to install it (idempotent — skips if already present). The CI workflows run this best-effort and skip compat checks when offline. |
| GitHub API rate limit exceeded (during `mise run compat`) | Set `GITHUB_TOKEN` in env so opensrc authenticates |
| Profile observations leaking across profiles | Should never happen — verified by `tests/test_hermes_link.py`. If it does, file an issue with `hermes mastra resources` output. |
For more, see `hermes mastra logs` and `~/.hermes/logs/agent.log`.
MIT — see LICENSE.
Author: Kiren Srinivasan <kiren@fantasymetals.com>.
This plugin is independent of Mastra and Nous Research. Built against @mastra/memory and hermes-agent at HEAD.