Core Architecture and MCP Implementation for Cerebro #213
Description
Product Architecture & Specification: Cerebro
1. Core Philosophy
- Agent-Agnostic: Designed to work with any AI agent or LLM that supports the Model Context Protocol (MCP).
- Single Binary: Built in Rust for performance, memory safety, and easy distribution.
- Proactive Memory: Cerebro trusts the agent to decide what is worth remembering. It is not a passive firehose of raw logs; it requires the agent to synthesize and save meaningful data.
- Token-Efficient (Drill-In Strategy): Prevents context window bloat by using a two-step retrieval process. Agents search for summaries first, then fetch full contents by ID only when needed.
- Progressive Enhancement: Works offline out of the box as a fast structured database. Optional LLM integrations can be enabled for "smart" background tasks (vector embeddings, knowledge graphs).
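The drill-in strategy can be illustrated with a small in-memory sketch. Everything here is hypothetical and not taken from the Cerebro codebase: the `Store`, `Memory`, and `SearchHit` types, the substring matching, and the character-based preview limit stand in for the real SurrealDB-backed search.

```rust
use std::collections::HashMap;

/// Full memory record (hypothetical shape, for illustration only).
#[derive(Debug, Clone)]
pub struct Memory {
    pub id: u64,
    pub title: String,
    pub content: String,
}

/// Compact search hit: an ID plus a short preview, never the full content.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub id: u64,
    pub title: String,
    pub preview: String,
}

pub struct Store {
    memories: HashMap<u64, Memory>,
}

impl Store {
    pub fn new() -> Self {
        Store { memories: HashMap::new() }
    }

    pub fn insert(&mut self, memory: Memory) {
        self.memories.insert(memory.id, memory);
    }

    /// Step 1: case-insensitive substring search that returns truncated previews.
    pub fn search(&self, query: &str, preview_chars: usize) -> Vec<SearchHit> {
        let needle = query.to_lowercase();
        let mut hits: Vec<SearchHit> = self
            .memories
            .values()
            .filter(|m| {
                m.title.to_lowercase().contains(&needle)
                    || m.content.to_lowercase().contains(&needle)
            })
            .map(|m| SearchHit {
                id: m.id,
                title: m.title.clone(),
                preview: m.content.chars().take(preview_chars).collect(),
            })
            .collect();
        // Sort by ID so the result order is deterministic.
        hits.sort_by_key(|h| h.id);
        hits
    }

    /// Step 2: fetch the full, untruncated record by ID.
    pub fn get(&self, id: u64) -> Option<&Memory> {
        self.memories.get(&id)
    }
}
```

An agent would call `search` first, inspect the previews, and call `get` only for the IDs it actually needs, which keeps most of the stored content out of the context window.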
2. Tech Stack
- Language: Rust
- Database: SurrealDB (Embedded mode). Chosen for its multi-model capabilities (Document + Graph + Vector Search) within a single engine.
- Concurrency: `tokio` (for async runtime and message passing).
- User Interface: `ratatui` + `crossterm` (for a rich Terminal User Interface).
- Protocol: MCP (Model Context Protocol) via JSON-RPC.
3. Architecture & Data Flow
Cerebro utilizes a Sync API + Async Worker pattern to ensure the MCP server never blocks the
agent while performing heavy "smart" tasks.
- MCP Server (Frontend): Receives the tool call (e.g., `mem_save`), writes the document to SurrealDB, and immediately returns a success response with the new Memory ID to the agent.
- Message Queue: Upon a successful save, an event (`MemoryCreated{id}`) is sent down a `tokio::sync::mpsc` channel.
- Smart Background Worker (Optional): Listens to the channel.
  - If an LLM provider (OpenAI, Ollama, Anthropic) is configured, it fetches the memory, generates vector embeddings, extracts entities, and creates Graph Edges (`RELATES_TO`) in SurrealDB.
  - If no LLM is configured, it safely ignores the event.
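The save-then-notify flow above can be sketched as follows. To stay dependency-free, this sketch swaps in `std::sync::mpsc` and an OS thread for the tokio channel and task that the spec calls for. The `Server`, `Event`, and `spawn_worker` names, the in-memory document list, and the `llm_configured` flag are all assumptions made for illustration.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Event emitted after a successful save (mirrors `MemoryCreated{id}`).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    MemoryCreated { id: u64 },
}

/// Synchronous front end: stores the document, queues an event, returns the ID.
pub struct Server {
    next_id: u64,
    documents: Vec<(u64, String)>,
    events: Sender<Event>,
}

impl Server {
    pub fn new(events: Sender<Event>) -> Self {
        Server { next_id: 1, documents: Vec::new(), events }
    }

    /// Hypothetical `mem_save` handler: write first, notify second, never wait on the worker.
    pub fn mem_save(&mut self, content: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.documents.push((id, content.to_string()));
        // A send error only means no worker is listening; the save still succeeds.
        let _ = self.events.send(Event::MemoryCreated { id });
        id
    }

    pub fn document_count(&self) -> usize {
        self.documents.len()
    }
}

/// Background worker: records processed IDs when `llm_configured`, otherwise drops events.
pub fn spawn_worker(
    rx: Receiver<Event>,
    llm_configured: bool,
    processed: Arc<Mutex<Vec<u64>>>,
) -> JoinHandle<()> {
    thread::spawn(move || {
        // The loop ends once every Sender has been dropped.
        for event in rx {
            let Event::MemoryCreated { id } = event;
            if llm_configured {
                // Placeholder for embedding generation and RELATES_TO edge creation.
                processed.lock().unwrap().push(id);
            }
        }
    })
}
```

Because `mem_save` returns as soon as the event is queued, slow enrichment work does not delay the response to the agent. A tokio version would follow the same shape with an async receive loop.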
4. Data Model (SurrealDB Schema)
- Nodes (Documents):
  - `session`: Tracks session lifecycle and `session_summary`.
  - `memory` (Engram): Stores title, type (bugfix, arch, concept), content (What/Why/Where/Learned), `topic_key`, `scope`, and vector embeddings (if enabled).
  - `prompt`: Explicitly saved user prompts.
- Edges (Graph Relations):
  - `memory->CREATED_IN->session`
  - `memory->RELATES_TO->memory` (Generated by the smart worker).
  - `session->FOLLOWS->session` (Chronological thread).
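As a rough illustration, the nodes and edges above could map onto Rust types like the ones below. The field names, the use of `String` record IDs, and the optionality of `topic_key`, `embedding`, and `summary` are assumptions rather than the actual schema. The `location` field stands in for "Where" because `where` is a reserved word in Rust.

```rust
/// Memory categories named in the spec; the real set may be larger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryType {
    Bugfix,
    Arch,
    Concept,
}

/// What/Why/Where/Learned body of a memory.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Content {
    pub what: String,
    pub why: String,
    pub location: String, // the "Where" part
    pub learned: String,
}

/// `memory` node (Engram).
#[derive(Debug, Clone, PartialEq)]
pub struct Memory {
    pub id: String,
    pub title: String,
    pub kind: MemoryType,
    pub content: Content,
    pub topic_key: Option<String>,
    pub scope: String,
    pub embedding: Option<Vec<f32>>, // present only if the smart worker is enabled
}

/// `session` node.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub id: String,
    pub summary: Option<String>,
}

/// The three graph relations, each holding the IDs of its two endpoints.
#[derive(Debug, Clone, PartialEq)]
pub enum Edge {
    CreatedIn { memory: String, session: String },
    RelatesTo { from: String, to: String },
    Follows { from: String, to: String },
}

impl Edge {
    /// Renders the edge in the arrow notation used by the spec.
    pub fn describe(&self) -> String {
        match self {
            Edge::CreatedIn { memory, session } => format!("{}->CREATED_IN->{}", memory, session),
            Edge::RelatesTo { from, to } => format!("{}->RELATES_TO->{}", from, to),
            Edge::Follows { from, to } => format!("{}->FOLLOWS->{}", from, to),
        }
    }
}
```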
5. MCP Tools API
The 13 tools exposed to the agent, grouped by logical domain:
| Tool Name | Purpose |
|---|---|
| Session Management | |
| `mem_session_start` | Registers a new session start. |
| `mem_session_end` | Marks the active session as completed. |
| `mem_session_summary` | Saves the end-of-session summary (Goal/Discoveries/Accomplished). |
| `mem_context` | Fetches recent context automatically at the start of a session. |
| Memory Operations | |
| `mem_save` | Saves a structured observation. Supports `scope` and `topic_key`. |
| `mem_update` | Updates an existing observation by ID. |
| `mem_delete` | Soft-deletes an observation (hard-delete optional). |
| `mem_suggest_topic_key` | Suggests a stable `topic_key` for evolving topics before saving. |
| Exploration (Drill-in) | |
| `mem_search` | Full-text/Semantic search. Returns compact results (~100 tokens). |
| `mem_get_observation` | Fetches the full, untruncated content of a specific memory ID. |
| `mem_timeline` | Returns chronological context (what happened before/after an ID). |
| System Utilities | |
| `mem_save_prompt` | Saves a user prompt for future context. |
| `mem_stats` | Returns database sizes, node counts, and worker status. |
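One possible way to keep the tool names and their domains in a single place is a static registry, sketched below. The `Domain` enum, the `TOOLS` table, and the two helper functions are illustrative names, not part of the spec. Only the 13 tool names and their grouping come from the table above.

```rust
/// The four logical domains from the tool table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Domain {
    SessionManagement,
    MemoryOperations,
    Exploration,
    SystemUtilities,
}

/// All 13 tools with their domain, in table order.
pub static TOOLS: [(&str, Domain); 13] = [
    ("mem_session_start", Domain::SessionManagement),
    ("mem_session_end", Domain::SessionManagement),
    ("mem_session_summary", Domain::SessionManagement),
    ("mem_context", Domain::SessionManagement),
    ("mem_save", Domain::MemoryOperations),
    ("mem_update", Domain::MemoryOperations),
    ("mem_delete", Domain::MemoryOperations),
    ("mem_suggest_topic_key", Domain::MemoryOperations),
    ("mem_search", Domain::Exploration),
    ("mem_get_observation", Domain::Exploration),
    ("mem_timeline", Domain::Exploration),
    ("mem_save_prompt", Domain::SystemUtilities),
    ("mem_stats", Domain::SystemUtilities),
];

/// Looks up the domain of a tool; unknown names yield `None`.
pub fn domain_of(tool_name: &str) -> Option<Domain> {
    TOOLS
        .iter()
        .find(|(name, _)| *name == tool_name)
        .map(|(_, domain)| *domain)
}

/// Lists the tools in one domain, preserving table order.
pub fn tools_in(domain: Domain) -> Vec<&'static str> {
    TOOLS
        .iter()
        .filter(|(_, d)| *d == domain)
        .map(|(name, _)| *name)
        .collect()
}
```

A dispatcher could use `domain_of` to reject unknown tool names before any handler runs.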
6. Memory Hygiene & Business Logic
- Exact Deduplication: Prevents spam in a rolling window by hashing `SHA256(project + scope + type + title)`. Duplicates update metadata (`duplicate_count`, `last_seen_at`) instead of creating rows.
- Topic Upserts: If `mem_save` includes a `topic_key` that matches an existing memory, it updates that memory and increments `revision_count` instead of creating a new one.
- Global Filters: `mem_search`, `mem_context`, and `mem_timeline` automatically ignore records where `deleted_at IS NOT NULL`.
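The three hygiene rules can be sketched together in a small in-memory store. Several choices here are assumptions, not confirmed behavior:

- `DefaultHasher` stands in for SHA-256, since the standard library has none.
- The rolling window is measured in seconds from `last_seen_at`.
- A matching `topic_key` takes precedence over exact deduplication.
- `revision_count` starts at 1.

All type and function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Input to a hypothetical `mem_save` call.
#[derive(Debug, Clone, Copy)]
pub struct NewMemory<'a> {
    pub project: &'a str,
    pub scope: &'a str,
    pub kind: &'a str,
    pub title: &'a str,
    pub content: &'a str,
    pub topic_key: Option<&'a str>,
}

/// Stored row with the hygiene metadata named in the spec.
#[derive(Debug, Clone)]
pub struct Record {
    pub id: u64,
    pub content: String,
    pub topic_key: Option<String>,
    pub revision_count: u32,
    pub duplicate_count: u32,
    pub last_seen_at: u64,
    pub deleted_at: Option<u64>,
    dedup_key: u64,
}

#[derive(Debug, PartialEq)]
pub enum SaveOutcome {
    Created(u64),
    Duplicate(u64),
    Revised(u64),
}

pub struct Store {
    records: Vec<Record>,
    window_secs: u64,
}

/// Stand-in for SHA256(project + scope + type + title).
fn dedup_key(m: &NewMemory) -> u64 {
    let mut hasher = DefaultHasher::new();
    (m.project, m.scope, m.kind, m.title).hash(&mut hasher);
    hasher.finish()
}

impl Store {
    pub fn new(window_secs: u64) -> Self {
        Store { records: Vec::new(), window_secs }
    }

    pub fn save(&mut self, m: NewMemory, now: u64) -> SaveOutcome {
        // Topic upsert: assumed to take precedence over exact deduplication.
        if let Some(topic) = m.topic_key {
            let existing = self.records.iter_mut()
                .find(|r| r.deleted_at.is_none() && r.topic_key.as_deref() == Some(topic));
            if let Some(r) = existing {
                r.content = m.content.to_string();
                r.revision_count += 1;
                r.last_seen_at = now;
                return SaveOutcome::Revised(r.id);
            }
        }
        // Exact deduplication: same key seen within the rolling window.
        let key = dedup_key(&m);
        let window = self.window_secs;
        let duplicate = self.records.iter_mut().find(|r| {
            r.deleted_at.is_none() && r.dedup_key == key
                && now.saturating_sub(r.last_seen_at) <= window
        });
        if let Some(r) = duplicate {
            r.duplicate_count += 1;
            r.last_seen_at = now;
            return SaveOutcome::Duplicate(r.id);
        }
        let id = self.records.len() as u64 + 1;
        self.records.push(Record {
            id,
            content: m.content.to_string(),
            topic_key: m.topic_key.map(str::to_string),
            revision_count: 1,
            duplicate_count: 0,
            last_seen_at: now,
            deleted_at: None,
            dedup_key: key,
        });
        SaveOutcome::Created(id)
    }

    /// Soft delete: marks the row instead of removing it.
    pub fn soft_delete(&mut self, id: u64, now: u64) -> bool {
        match self.records.iter_mut().find(|r| r.id == id && r.deleted_at.is_none()) {
            Some(r) => { r.deleted_at = Some(now); true }
            None => false,
        }
    }

    /// Global filter: only rows where `deleted_at` is unset.
    pub fn visible(&self) -> Vec<&Record> {
        self.records.iter().filter(|r| r.deleted_at.is_none()).collect()
    }
}
```

Checking the topic key first means a revised save is not mistaken for a duplicate when the title is unchanged, so the new content is kept. The opposite order is equally possible, and the spec does not say which applies.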
7. Terminal User Interface (TUI)
Built with ratatui (https://ratatui.rs/) to provide humans with real-time observability into the
agent's mind.
- Dashboard: DB stats, session counts, and background worker status.
- Memory Explorer: Interactive list to browse and read full memories.
- Session Timeline: Visual chronological view of sessions and their summaries.
- Live Logs: Real-time stream of incoming MCP tool calls (e.g., watching the agent perform a `mem_search`).
8. Agent Integration (prompt_template.md)
A markdown file provided in the repository containing the "System Prompt" instructions for users to
paste into their AI agents. It instructs the agent on how to use Cerebro proactively, emphasizing
the drill-in strategy and the What/Why/Where/Learned format.