Core Architecture and MCP Implementation for Cerebro #213

# Product Architecture & Specification: Cerebro

## 1. Core Philosophy

* **Agent-Agnostic:** Designed to work with any AI agent or LLM that supports the Model Context
  Protocol (MCP).
* **Single Binary:** Built in Rust for maximum performance, memory safety, and easy distribution.
* **Proactive Memory:** Cerebro trusts the agent to decide what is worth remembering. It is not a
  passive firehose of raw logs; it requires the agent to synthesize and save meaningful data.
* **Token-Efficient (Drill-In Strategy):** Prevents context window bloat by using a two-step
  retrieval process. Agents search for summaries first, then fetch full contents by ID only when
  needed.
* **Progressive Enhancement:** Works offline and fast out of the box as a structured database.
  Optional LLM integrations can be enabled for "smart" background tasks (vector embeddings,
  knowledge graphs).
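The drill-in strategy can be sketched as two calls: a search that returns only IDs, titles, and short previews, and a fetch-by-ID that returns the full record. This is a minimal sketch that assumes an in-memory `HashMap` in place of SurrealDB and a character budget as a rough proxy for a token budget. `Store`, `SearchHit`, and `PREVIEW_CHARS` are hypothetical names, and the search is a plain substring match rather than the full-text or semantic search the spec describes.

```rust
use std::collections::HashMap;

/// Full memory record (hypothetical shape; the real `memory` node has more fields).
#[derive(Debug, Clone)]
struct Memory {
    id: u64,
    title: String,
    content: String,
}

/// Step-one result: just enough for the agent to decide whether to drill in.
#[derive(Debug, Clone, PartialEq)]
struct SearchHit {
    id: u64,
    title: String,
    preview: String,
}

/// Hypothetical preview budget in characters, a rough proxy for a token budget.
const PREVIEW_CHARS: usize = 80;

#[derive(Default)]
struct Store {
    memories: HashMap<u64, Memory>,
}

impl Store {
    fn insert(&mut self, id: u64, title: &str, content: &str) {
        let memory = Memory { id, title: title.to_string(), content: content.to_string() };
        self.memories.insert(id, memory);
    }

    /// Step 1 (`mem_search`-like): case-insensitive substring match, summaries only.
    fn search(&self, query: &str) -> Vec<SearchHit> {
        let q = query.to_lowercase();
        let mut hits: Vec<SearchHit> = self
            .memories
            .values()
            .filter(|m| m.title.to_lowercase().contains(&q) || m.content.to_lowercase().contains(&q))
            .map(|m| SearchHit {
                id: m.id,
                title: m.title.clone(),
                // Truncate by chars, not bytes, so multi-byte UTF-8 is never split.
                preview: m.content.chars().take(PREVIEW_CHARS).collect(),
            })
            .collect();
        hits.sort_by_key(|h| h.id); // HashMap order is arbitrary; sort for stable output
        hits
    }

    /// Step 2 (`mem_get_observation`-like): full, untruncated content by ID.
    fn get(&self, id: u64) -> Option<&Memory> {
        self.memories.get(&id)
    }
}

fn main() {
    let mut store = Store::default();
    store.insert(1, "Auth token race", &"What: token refresh raced logout. ".repeat(20));
    store.insert(2, "DB pool sizing", "Why: pool starved under load.");

    // Cheap first pass: the agent sees previews only.
    let hits = store.search("raced");
    for h in &hits {
        println!("#{} {} | {}", h.id, h.title, h.preview);
    }
    // Drill in only on the hit that is actually needed.
    if let Some(full) = hits.first().and_then(|h| store.get(h.id)) {
        println!("full content: {} chars", full.content.len());
    }
}
```

The saving comes from the agent paying the full-content cost only for the hits it chooses to expand, instead of for every match.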

## 2. Tech Stack

* **Language:** Rust
* **Database:** SurrealDB (Embedded mode). Chosen for its multi-model capabilities (Document +
  Graph + Vector Search) within a single engine.
* **Concurrency:** `tokio` (for async runtime and message passing).
* **User Interface:** `ratatui` + `crossterm` (for a rich Terminal User Interface).
* **Protocol:** MCP over JSON-RPC 2.0.

## 3. Architecture & Data Flow

Cerebro uses a **Sync API + Async Worker** pattern so that the MCP server never blocks the agent
while heavy "smart" tasks run.

1. **MCP Server (Frontend):** Receives the tool call (e.g., `mem_save`), writes the document to
   SurrealDB, and immediately returns a success response containing the new Memory ID.
2. **Message Queue:** After a successful save, a `MemoryCreated { id }` event is sent down a
   `tokio::sync::mpsc` channel.
3. **Smart Background Worker (Optional):** Listens on the channel.
   * If an LLM provider (OpenAI, Ollama, Anthropic) is configured, it fetches the memory,
     generates vector embeddings, extracts entities, and creates graph edges (`RELATES_TO`) in
     SurrealDB.
   * If no LLM is configured, it safely ignores the event.
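The flow above can be sketched with the standard library alone. This sketch swaps `tokio::sync::mpsc` and async tasks for `std::sync::mpsc` and an OS thread so it runs without dependencies, and uses a `HashMap` behind a `Mutex` as a stand-in for SurrealDB. `mem_save`, `spawn_worker`, and `Db` are hypothetical names, and the enrichment step is a placeholder that only records which IDs it would have processed.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Event emitted after every successful save.
#[derive(Debug)]
enum Event {
    MemoryCreated { id: u64 },
}

/// Stand-in for the embedded SurrealDB handle: memory ID -> content.
type Db = Arc<Mutex<HashMap<u64, String>>>;

/// Frontend path: write, enqueue, and return the ID without waiting on the worker.
fn mem_save(db: &Db, tx: &Sender<Event>, next_id: &mut u64, content: &str) -> u64 {
    let id = *next_id;
    *next_id += 1;
    db.lock().unwrap().insert(id, content.to_string());
    // Ignore send errors: if the worker has stopped, the save itself still succeeded.
    let _ = tx.send(Event::MemoryCreated { id });
    id
}

/// Background path: enrich when an LLM is configured, otherwise drain and ignore.
/// Returns the IDs it "enriched" so the behavior is observable.
fn spawn_worker(db: Db, rx: Receiver<Event>, llm_enabled: bool) -> JoinHandle<Vec<u64>> {
    thread::spawn(move || {
        let mut enriched = Vec::new();
        // The loop ends once every Sender has been dropped.
        for event in rx {
            match event {
                Event::MemoryCreated { id } if llm_enabled => {
                    if db.lock().unwrap().contains_key(&id) {
                        // Placeholder: embeddings, entity extraction, RELATES_TO edges.
                        enriched.push(id);
                    }
                }
                Event::MemoryCreated { .. } => {} // no LLM configured: safely ignore
            }
        }
        enriched
    })
}

fn main() {
    let db: Db = Arc::new(Mutex::new(HashMap::new()));
    let (tx, rx) = mpsc::channel();
    let worker = spawn_worker(Arc::clone(&db), rx, true);

    let mut next_id = 1;
    let id = mem_save(&db, &tx, &mut next_id, "What/Why/Where/Learned ...");
    println!("agent got memory {id} immediately");

    drop(tx); // close the channel so the worker loop can finish
    println!("worker enriched {:?}", worker.join().unwrap());
}
```

The property that matters is that `mem_save` returns as soon as the write and the `send` complete, so enrichment latency never reaches the agent. With tokio, the same shape would typically use `tokio::sync::mpsc::channel(capacity)` and a spawned task looping on `recv().await`.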

## 4. Data Model (SurrealDB Schema)

* **Nodes (Documents):**
  * `session`: Tracks the session lifecycle and its `session_summary`.
  * `memory` (Engram): Stores the title, type (bugfix, arch, concept), content
    (What/Why/Where/Learned), `topic_key`, `scope`, and vector embeddings (if enabled).
  * `prompt`: Explicitly saved user prompts.

* **Edges (Graph Relations):**
  * `memory` -> `CREATED_IN` -> `session`
  * `memory` -> `RELATES_TO` -> `memory` (generated by the smart worker)
  * `session` -> `FOLLOWS` -> `session` (chronological thread)
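One way to mirror this schema in Rust types, as a sketch. The field layout (What/Why/Where/Learned split into four fields), the `"table:id"` record-ID strings, and the `Edge::new` endpoint check are assumptions rather than part of the spec; the check simply encodes which node tables each edge type is meant to connect.

```rust
/// Memory categories named in the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryType {
    Bugfix,
    Arch,
    Concept,
}

/// `session` node.
#[derive(Debug, Clone)]
struct Session {
    id: String,                      // record ID, e.g. "session:7"
    completed: bool,                 // set by mem_session_end
    session_summary: Option<String>, // set by mem_session_summary
}

/// `memory` node (Engram). Splitting What/Why/Where/Learned into fields is an assumption.
#[derive(Debug, Clone)]
struct Memory {
    id: String, // e.g. "memory:42"
    title: String,
    kind: MemoryType,
    what: String,
    why: String,
    location: String, // "Where"; `where` is a Rust keyword
    learned: String,
    topic_key: Option<String>,
    scope: String,
    embedding: Option<Vec<f32>>, // None unless the smart worker is enabled
}

/// `prompt` node.
#[derive(Debug, Clone)]
struct Prompt {
    id: String,
    text: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    CreatedIn, // memory -> session
    RelatesTo, // memory -> memory (smart worker)
    Follows,   // session -> session
}

impl EdgeKind {
    /// (from-table, to-table) that each edge kind may connect.
    fn endpoints(self) -> (&'static str, &'static str) {
        match self {
            EdgeKind::CreatedIn => ("memory", "session"),
            EdgeKind::RelatesTo => ("memory", "memory"),
            EdgeKind::Follows => ("session", "session"),
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Edge {
    from: String,
    kind: EdgeKind,
    to: String,
}

/// Table part of a "table:id" record ID, if present.
fn table_of(record_id: &str) -> Option<&str> {
    record_id.split_once(':').map(|(table, _)| table)
}

impl Edge {
    /// Builds an edge, rejecting endpoints that live in the wrong tables.
    fn new(from: &str, kind: EdgeKind, to: &str) -> Result<Edge, String> {
        let (want_from, want_to) = kind.endpoints();
        if table_of(from) != Some(want_from) || table_of(to) != Some(want_to) {
            return Err(format!("{kind:?} must link {want_from} -> {want_to}, got {from} -> {to}"));
        }
        Ok(Edge { from: from.to_string(), kind, to: to.to_string() })
    }
}

fn main() {
    let session = Session { id: "session:7".into(), completed: false, session_summary: None };
    let memory = Memory {
        id: "memory:42".into(), title: "Pool starvation".into(), kind: MemoryType::Bugfix,
        what: "Requests hung".into(), why: "Pool size was 1".into(),
        location: "db/pool.rs".into(), learned: "Size pool to workers".into(),
        topic_key: None, scope: "project".into(), embedding: None,
    };
    println!("{:?}", Edge::new(&memory.id, EdgeKind::CreatedIn, &session.id)); // Ok
    println!("{:?}", Edge::new(&session.id, EdgeKind::CreatedIn, &memory.id)); // Err
}
```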

## 5. MCP Tools API

The 13 tools exposed to the agent, grouped by logical domain:

| Tool Name                  | Purpose                                                           |
|----------------------------|-------------------------------------------------------------------|
| **Session Management**     |                                                                   |
| `mem_session_start`        | Registers a new session start.                                    |
| `mem_session_end`          | Marks the active session as completed.                            |
| `mem_session_summary`      | Saves the end-of-session summary (Goal/Discoveries/Accomplished). |
| `mem_context`              | Fetches recent context automatically at the start of a session.   |
| **Memory Operations**      |                                                                   |
| `mem_save`                 | Saves a structured observation. Supports `scope` and `topic_key`. |
| `mem_update`               | Updates an existing observation by ID.                            |
| `mem_delete`               | Soft-deletes an observation (hard-delete optional).               |
| `mem_suggest_topic_key`    | Suggests a stable `topic_key` for evolving topics before saving.  |
| **Exploration (Drill-in)** |                                                                   |
| `mem_search`               | Full-text/semantic search. Returns compact results (~100 tokens). |
| `mem_get_observation`      | Fetches the full, untruncated content of a specific memory ID.    |
| `mem_timeline`             | Returns chronological context (what happened before/after an ID). |
| **System Utilities**       |                                                                   |
| `mem_save_prompt`          | Saves a user prompt for future context.                           |
| `mem_stats`                | Returns database sizes, node counts, and worker status.           |
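A sketch of mapping the tool names above onto a Rust enum, so the `name` field in the `params` of an incoming MCP `tools/call` request can be parsed once and then handled with an exhaustive `match`. The `Tool` and `Domain` types are hypothetical; only the tool names and their groupings come from the table.

```rust
use std::str::FromStr;

/// Logical groupings from the tools table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Domain {
    Session,
    Memory,
    Exploration,
    System,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tool {
    SessionStart,
    SessionEnd,
    SessionSummary,
    Context,
    Save,
    Update,
    Delete,
    SuggestTopicKey,
    Search,
    GetObservation,
    Timeline,
    SavePrompt,
    Stats,
}

impl Tool {
    const ALL: [Tool; 13] = [
        Tool::SessionStart, Tool::SessionEnd, Tool::SessionSummary, Tool::Context,
        Tool::Save, Tool::Update, Tool::Delete, Tool::SuggestTopicKey,
        Tool::Search, Tool::GetObservation, Tool::Timeline,
        Tool::SavePrompt, Tool::Stats,
    ];

    /// Wire name, as it would appear in `tools/list` and `tools/call`.
    fn name(self) -> &'static str {
        match self {
            Tool::SessionStart => "mem_session_start",
            Tool::SessionEnd => "mem_session_end",
            Tool::SessionSummary => "mem_session_summary",
            Tool::Context => "mem_context",
            Tool::Save => "mem_save",
            Tool::Update => "mem_update",
            Tool::Delete => "mem_delete",
            Tool::SuggestTopicKey => "mem_suggest_topic_key",
            Tool::Search => "mem_search",
            Tool::GetObservation => "mem_get_observation",
            Tool::Timeline => "mem_timeline",
            Tool::SavePrompt => "mem_save_prompt",
            Tool::Stats => "mem_stats",
        }
    }

    fn domain(self) -> Domain {
        match self {
            Tool::SessionStart | Tool::SessionEnd | Tool::SessionSummary | Tool::Context => {
                Domain::Session
            }
            Tool::Save | Tool::Update | Tool::Delete | Tool::SuggestTopicKey => Domain::Memory,
            Tool::Search | Tool::GetObservation | Tool::Timeline => Domain::Exploration,
            Tool::SavePrompt | Tool::Stats => Domain::System,
        }
    }
}

impl FromStr for Tool {
    type Err = String;

    /// Parses the tool name from a `tools/call` request.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Tool::ALL
            .iter()
            .copied()
            .find(|t| t.name() == s)
            .ok_or_else(|| format!("unknown tool: {s}"))
    }
}

fn main() {
    for tool in Tool::ALL.iter().copied() {
        println!("{:<12} {}", format!("{:?}", tool.domain()), tool.name());
    }
    match "mem_search".parse::<Tool>() {
        Ok(tool) => println!("dispatching {tool:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Parsing into an enum up front means adding a fourteenth tool becomes a compile error everywhere a `match` on `Tool` forgets to handle it.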

## 6. Memory Hygiene & Business Logic

* **Exact Deduplication:** Within a rolling time window, repeated saves are detected by hashing
  `SHA256(project + scope + type + title)`. A duplicate updates the existing record's metadata
  (`duplicate_count`, `last_seen_at`) instead of creating a new record.
* **Topic Upserts:** If `mem_save` includes a `topic_key` that matches an existing memory, that
  memory is updated in place and its `revision_count` is incremented.
* **Global Filters:** `mem_search`, `mem_context`, and `mem_timeline` automatically exclude
  soft-deleted records (those where `deleted_at IS NOT NULL`).
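A sketch of the three hygiene rules over an in-memory index, under several assumptions the spec leaves open: the window length is a parameter, the window is measured from `last_seen_at`, and a matching `topic_key` is checked before the dedup hash. It also swaps std's `DefaultHasher` (SipHash) in for SHA-256 to avoid dependencies; `DefaultHasher` output is not guaranteed stable across Rust releases, so a persisted key would need a real SHA-256 (for example, from the `sha2` crate). Hashing the fields as a tuple avoids the ambiguity of plain concatenation, where `"ab" + "c"` and `"a" + "bc"` produce the same input. `Hygiene`, `Record`, and `SaveOutcome` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Dedup key over (project, scope, type, title). DefaultHasher (SipHash) stands in for
/// SHA-256; its output may change between Rust releases, so it must not be persisted.
fn dedup_key(project: &str, scope: &str, kind: &str, title: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Hashing a tuple is field-by-field and prefix-free, unlike raw concatenation.
    (project, scope, kind, title).hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug)]
struct Record {
    id: u64,
    title: String,
    duplicate_count: u32,
    revision_count: u32,
    last_seen_at: Instant,
    deleted: bool, // stands in for `deleted_at IS NOT NULL`
}

#[derive(Debug, PartialEq)]
enum SaveOutcome { Created(u64), Duplicate(u64), Revised(u64) }

#[derive(Default)]
struct Hygiene {
    window: Duration,               // hypothetical rolling window, measured from last_seen_at
    records: HashMap<u64, Record>,
    by_hash: HashMap<u64, u64>,     // dedup key -> record ID
    by_topic: HashMap<String, u64>, // topic_key -> record ID
    next_id: u64,
}

impl Hygiene {
    fn new(window: Duration) -> Self {
        Hygiene { window, next_id: 1, ..Default::default() }
    }

    fn save(&mut self, project: &str, scope: &str, kind: &str, title: &str,
            topic_key: Option<&str>, now: Instant) -> SaveOutcome {
        let hash = dedup_key(project, scope, kind, title);
        // 1. Topic upsert (checked first, an assumption): revise the live owner of the key.
        if let Some(&id) = topic_key.and_then(|k| self.by_topic.get(k)) {
            if let Some(r) = self.records.get_mut(&id).filter(|r| !r.deleted) {
                r.title = title.to_string();
                r.revision_count += 1;
                r.last_seen_at = now;
                self.by_hash.insert(hash, id);
                return SaveOutcome::Revised(id);
            }
        }
        // 2. Exact dedup: same key on a live record seen within the window.
        if let Some(&id) = self.by_hash.get(&hash) {
            if let Some(r) = self.records.get_mut(&id).filter(|r| !r.deleted) {
                if now.saturating_duration_since(r.last_seen_at) <= self.window {
                    r.duplicate_count += 1;
                    r.last_seen_at = now;
                    return SaveOutcome::Duplicate(id);
                }
            }
        }
        // 3. Otherwise create and index a new record.
        let id = self.next_id;
        self.next_id += 1;
        let record = Record { id, title: title.to_string(), duplicate_count: 0,
                              revision_count: 0, last_seen_at: now, deleted: false };
        self.records.insert(id, record);
        self.by_hash.insert(hash, id);
        if let Some(k) = topic_key {
            self.by_topic.insert(k.to_string(), id);
        }
        SaveOutcome::Created(id)
    }

    /// Soft delete: the record stays, but filtered reads skip it.
    fn delete(&mut self, id: u64) {
        if let Some(r) = self.records.get_mut(&id) { r.deleted = true; }
    }

    /// Roughly what mem_search / mem_context / mem_timeline would be allowed to see.
    fn visible(&self) -> Vec<&Record> {
        let mut live: Vec<&Record> = self.records.values().filter(|r| !r.deleted).collect();
        live.sort_by_key(|r| r.id);
        live
    }
}

fn main() {
    let (t0, mut h) = (Instant::now(), Hygiene::new(Duration::from_secs(600)));
    println!("{:?}", h.save("cerebro", "project", "bugfix", "Fix race", None, t0));
    let later = t0 + Duration::from_secs(30);
    println!("{:?}", h.save("cerebro", "project", "bugfix", "Fix race", None, later));
    h.delete(1);
    println!("visible after soft delete: {}", h.visible().len());
}
```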

## 7. Terminal User Interface (TUI)

Built with `ratatui` (https://ratatui.rs/) to give humans real-time observability into what the
agent is saving and retrieving.

* **Dashboard:** DB stats, session counts, and background worker status.
* **Memory Explorer:** Interactive list to browse and read full memories.
* **Session Timeline:** Visual chronological view of sessions and their summaries.
* **Live Logs:** Real-time stream of incoming MCP tool calls (e.g., watching the agent perform a
  `mem_search`).
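A possible backing structure for the Live Logs pane, sketched under the assumption that the MCP server pushes one entry per tool call and the TUI renders the newest entries on each frame. `LiveLog` and `ToolCallLog` are hypothetical names, and the `ratatui` rendering itself is omitted. A bounded `VecDeque` keeps memory use flat no matter how long the agent runs.

```rust
use std::collections::VecDeque;

/// One row in the Live Logs pane.
#[derive(Debug, Clone, PartialEq)]
struct ToolCallLog {
    tool: String,
    summary: String,
}

/// Bounded log: once full, the oldest entry is evicted for each new one.
struct LiveLog {
    capacity: usize,
    entries: VecDeque<ToolCallLog>,
}

impl LiveLog {
    fn new(capacity: usize) -> Self {
        LiveLog { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    /// Called (hypothetically) by the MCP server for every incoming tool call.
    fn push(&mut self, tool: &str, summary: &str) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back(ToolCallLog { tool: tool.to_string(), summary: summary.to_string() });
    }

    /// Newest-first lines for a pane that is `height` rows tall.
    fn visible_lines(&self, height: usize) -> Vec<String> {
        self.entries
            .iter()
            .rev()
            .take(height)
            .map(|e| format!("{} {}", e.tool, e.summary))
            .collect()
    }
}

fn main() {
    let mut log = LiveLog::new(100);
    log.push("mem_session_start", "session:7");
    log.push("mem_search", "query=\"pool starvation\"");
    log.push("mem_get_observation", "id=memory:42");
    for line in log.visible_lines(2) {
        println!("{line}");
    }
}
```

In a running server the buffer would need to be shared between the MCP side and the render loop, for example behind a `Mutex` or fed through a channel.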

## 8. Agent Integration (`prompt_template.md`)

A Markdown file in the repository containing "System Prompt" instructions that users paste into
their AI agents. It tells the agent *how* to use Cerebro proactively, emphasizing the drill-in
strategy and the What/Why/Where/Learned format.
