Core Architecture and MCP Implementation for Cerebro #213

@yacosta738

Product Architecture & Specification: Cerebro

1. Core Philosophy

  • Agent-Agnostic: Designed to work with any AI agent or LLM that supports the Model Context
    Protocol (MCP).
  • Single Binary: Built in Rust for maximum performance, memory safety, and easy distribution.
  • Proactive Memory: Cerebro trusts the agent to decide what is worth remembering. It is not a
    passive firehose of raw logs; it requires the agent to synthesize and save meaningful data.
  • Token-Efficient (Drill-In Strategy): Prevents context window bloat by using a two-step
    retrieval process. Agents search for summaries first, then fetch full contents by ID only when
    needed.
  • Progressive Enhancement: Works offline and fast out of the box as a structured database.
    Optional LLM integrations can be enabled for "smart" background tasks (vector embeddings,
    knowledge graphs).
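As a rough illustration of the drill-in strategy, the std-only Rust sketch below separates a compact search step from a full fetch by ID. It is not Cerebro's code: the type and function names (Memory, SearchHit, search, get_observation, PREVIEW_CHARS) are hypothetical, an in-memory Vec stands in for SurrealDB, substring matching stands in for full-text/semantic search, and the preview is capped in characters rather than tokens.

```rust
// Two-step drill-in retrieval over an in-memory Vec standing in for SurrealDB.
// All names here are hypothetical, chosen only for this illustration.

#[derive(Debug)]
struct Memory {
    id: u64,
    title: String,
    content: String,
}

/// Compact search hit: just enough for the agent to decide whether to drill in.
#[derive(Debug)]
struct SearchHit {
    id: u64,
    title: String,
    preview: String,
}

/// Hypothetical cap that keeps each hit small (the spec targets ~100 tokens).
const PREVIEW_CHARS: usize = 80;

/// Step 1: case-insensitive substring match that returns truncated previews only.
fn search(store: &[Memory], query: &str) -> Vec<SearchHit> {
    let q = query.to_lowercase();
    store
        .iter()
        .filter(|m| m.title.to_lowercase().contains(&q) || m.content.to_lowercase().contains(&q))
        .map(|m| SearchHit {
            id: m.id,
            title: m.title.clone(),
            preview: m.content.chars().take(PREVIEW_CHARS).collect(),
        })
        .collect()
}

/// Step 2: fetch the full, untruncated memory by ID, only when the agent asks.
fn get_observation(store: &[Memory], id: u64) -> Option<&Memory> {
    store.iter().find(|m| m.id == id)
}

fn main() {
    let store = vec![Memory {
        id: 1,
        title: "Fix flaky auth test".into(),
        content: "What: ... Why: ... Where: ... Learned: ... ".repeat(10),
    }];
    let hits = search(&store, "auth");
    for hit in &hits {
        println!("#{} {} ({} chars shown)", hit.id, hit.title, hit.preview.chars().count());
    }
    if let Some(first) = hits.first() {
        if let Some(full) = get_observation(&store, first.id) {
            println!("full content: {} chars", full.content.chars().count());
        }
    }
}
```

The agent pays for the full content only after a compact hit has convinced it the memory is relevant, which is the point of keeping search results small.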

2. Tech Stack

  • Language: Rust
  • Database: SurrealDB (Embedded mode). Chosen for its multi-model capabilities (Document +
    Graph + Vector Search) within a single engine.
  • Concurrency: tokio (for async runtime and message passing).
  • User Interface: ratatui + crossterm (for a rich Terminal User Interface).
  • Protocol: MCP (Model Context Protocol) via JSON-RPC.

3. Architecture & Data Flow

Cerebro uses a Sync API + Async Worker pattern so that the MCP server never blocks the agent while
heavy "smart" tasks run.

  1. MCP Server (Frontend): Receives the tool call (e.g., mem_save), writes the document to
     SurrealDB, and immediately returns a success response with the new Memory ID to the agent.
  2. Message Queue: Upon a successful save, an event (MemoryCreated{id}) is sent down a
     tokio::mpsc channel.
  3. Smart Background Worker (Optional): Listens to the channel.
      • If an LLM provider (OpenAI, Ollama, Anthropic) is configured, it fetches the memory,
        generates vector embeddings, extracts entities, and creates graph edges (RELATES_TO) in
        SurrealDB.
      • If no LLM is configured, it safely ignores the event.
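The data flow above can be approximated with the standard library alone. In this sketch, std::sync::mpsc and a plain thread stand in for tokio::mpsc and a tokio task, a HashMap behind a Mutex stands in for SurrealDB, and Event, Store, mem_save, and run_worker are hypothetical names. The enrichment step is only a counter at the point where embeddings, entity extraction, and RELATES_TO edges would be produced.

```rust
// "Sync API + Async Worker" sketch. std::sync::mpsc and a plain thread stand in
// for tokio::mpsc and a tokio task; Event, Store, mem_save, and run_worker are
// hypothetical names, and the enrichment step is a placeholder.
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug)]
enum Event {
    MemoryCreated { id: u64 },
}

/// Shared stand-in for the embedded database: memory ID -> content.
type Store = Arc<Mutex<HashMap<u64, String>>>;

/// Frontend path: write the document, enqueue an event, return the ID right away.
fn mem_save(store: &Store, tx: &Sender<Event>, next_id: &mut u64, content: &str) -> u64 {
    let id = *next_id;
    *next_id += 1;
    store.lock().unwrap().insert(id, content.to_string());
    // A send error only means the worker has stopped; the save itself already succeeded.
    let _ = tx.send(Event::MemoryCreated { id });
    id
}

/// Background worker: enrich new memories if an LLM is configured, otherwise
/// drain and ignore events. Returns how many memories it enriched.
fn run_worker(store: Store, rx: Receiver<Event>, llm_configured: bool) -> usize {
    let mut enriched = 0;
    // The loop ends once every Sender has been dropped.
    for event in rx {
        match event {
            Event::MemoryCreated { id } => {
                if !llm_configured {
                    continue; // progressive enhancement: nothing to do without an LLM
                }
                let guard = store.lock().unwrap();
                if let Some(_content) = guard.get(&id) {
                    // Placeholder: embeddings, entity extraction, RELATES_TO edges.
                    enriched += 1;
                }
            }
        }
    }
    enriched
}

fn main() {
    let store: Store = Arc::new(Mutex::new(HashMap::new()));
    let (tx, rx) = channel();
    let worker_store = Arc::clone(&store);
    let worker = thread::spawn(move || run_worker(worker_store, rx, true));

    let mut next_id = 1;
    for text in ["What: added backoff", "What: pinned toolchain"] {
        let id = mem_save(&store, &tx, &mut next_id, text);
        println!("saved memory {id}; mem_save did not wait for the worker");
    }
    drop(tx); // close the channel so the worker loop can finish
    println!("worker enriched {} memories", worker.join().unwrap());
}
```

Because mem_save only writes and enqueues, its latency does not depend on whether an LLM is configured; the worker absorbs that cost, and with llm_configured set to false it simply drains the channel.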

4. Data Model (SurrealDB Schema)

  • Nodes (Documents):
      • session: Tracks the session lifecycle and its session_summary.
      • memory (Engram): Stores title, type (bugfix, arch, concept), content
        (What/Why/Where/Learned), topic_key, scope, and vector embeddings (if enabled).
      • prompt: Explicitly saved user prompts.
  • Edges (Graph Relations):
      • memory -> CREATED_IN -> session
      • memory -> RELATES_TO -> memory (generated by the smart worker)
      • session -> FOLLOWS -> session (chronological thread)
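One way to mirror this schema in Rust types is sketched below. Field names follow the spec where it names them; the Rust type names, the string record IDs (e.g. "memory:1"), the render_content helper, and the direction chosen for FOLLOWS (the newer session pointing at the older one) are assumptions for illustration, not Cerebro's actual model.

```rust
// Hypothetical Rust view of the schema. Field names follow the spec where it
// names them; record IDs are plain strings like "memory:1" for illustration.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryType {
    Bugfix,
    Arch,
    Concept,
}

#[derive(Debug)]
struct Session {
    id: String,
    session_summary: Option<String>, // set by mem_session_summary
}

#[derive(Debug)]
struct Memory {
    id: String,
    title: String,
    kind: MemoryType, // "type" in the spec; `type` is a Rust keyword
    content: String,  // What/Why/Where/Learned
    topic_key: Option<String>,
    scope: String,
    embedding: Option<Vec<f32>>, // only filled when the smart worker is enabled
}

#[derive(Debug)]
struct Prompt {
    id: String,
    text: String,
}

/// Graph edges; directions mirror the arrows in the spec.
#[derive(Debug, PartialEq, Eq)]
enum Edge {
    CreatedIn { memory: String, session: String },
    RelatesTo { from: String, to: String },     // written by the smart worker
    Follows { later: String, earlier: String }, // assumes newer -> older
}

/// Render the four-part content format stored in memory.content.
fn render_content(what: &str, why: &str, where_: &str, learned: &str) -> String {
    format!("What: {what}\nWhy: {why}\nWhere: {where_}\nLearned: {learned}")
}

fn main() {
    let session = Session { id: "session:1".into(), session_summary: None };
    let memory = Memory {
        id: "memory:1".into(),
        title: "Retry on 429".into(),
        kind: MemoryType::Bugfix,
        content: render_content("added backoff", "rate limits", "http client", "honor Retry-After"),
        topic_key: Some("http/retries".into()),
        scope: "project".into(),
        embedding: None,
    };
    let prompt = Prompt { id: "prompt:1".into(), text: "Remember to retry".into() };
    let edge = Edge::CreatedIn { memory: memory.id.clone(), session: session.id.clone() };
    println!("{session:?}\n{memory:?}\n{prompt:?}\n{edge:?}");
}
```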

5. MCP Tools API

The 13 tools exposed to the agent, grouped by logical domain:

| Tool Name | Purpose |
| --- | --- |
| **Session Management** | |
| mem_session_start | Registers the start of a new session. |
| mem_session_end | Marks the active session as completed. |
| mem_session_summary | Saves the end-of-session summary (Goal/Discoveries/Accomplished). |
| mem_context | Fetches recent context automatically at the start of a session. |
| **Memory Operations** | |
| mem_save | Saves a structured observation. Supports scope and topic_key. |
| mem_update | Updates an existing observation by ID. |
| mem_delete | Soft-deletes an observation (hard delete optional). |
| mem_suggest_topic_key | Suggests a stable topic_key for an evolving topic before saving. |
| **Exploration (Drill-in)** | |
| mem_search | Full-text or semantic search. Returns compact results (~100 tokens). |
| mem_get_observation | Fetches the full, untruncated content of a specific memory ID. |
| mem_timeline | Returns chronological context (what happened before/after an ID). |
| **System Utilities** | |
| mem_save_prompt | Saves a user prompt for future context. |
| mem_stats | Returns database sizes, node counts, and worker status. |
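For reference, the sketch below lists the 13 tools as a hypothetical Rust registry and shows the general shape of an MCP tools/call request, which MCP sends as a JSON-RPC 2.0 message carrying the tool name and its arguments under params. The registry, the helper names (Domain, TOOLS, tools_in, tools_call_request), and the mem_save argument names used in main are assumptions, not Cerebro's published interface.

```rust
// Hypothetical registry of the 13 tools plus the shape of an MCP tools/call
// request (JSON-RPC 2.0). The argument names used for mem_save are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Domain {
    Session,
    Memory,
    Exploration,
    System,
}

static TOOLS: [(&str, Domain); 13] = [
    ("mem_session_start", Domain::Session),
    ("mem_session_end", Domain::Session),
    ("mem_session_summary", Domain::Session),
    ("mem_context", Domain::Session),
    ("mem_save", Domain::Memory),
    ("mem_update", Domain::Memory),
    ("mem_delete", Domain::Memory),
    ("mem_suggest_topic_key", Domain::Memory),
    ("mem_search", Domain::Exploration),
    ("mem_get_observation", Domain::Exploration),
    ("mem_timeline", Domain::Exploration),
    ("mem_save_prompt", Domain::System),
    ("mem_stats", Domain::System),
];

/// Tool names in one domain, in registry order.
fn tools_in(domain: Domain) -> Vec<&'static str> {
    TOOLS.iter().filter(|(_, d)| *d == domain).map(|(name, _)| *name).collect()
}

/// Build a tools/call request. `arguments_json` must already be a JSON object;
/// tool names are plain identifiers, so no escaping is done here.
fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{tool}","arguments":{arguments_json}}}}}"#
    )
}

fn main() {
    for domain in [Domain::Session, Domain::Memory, Domain::Exploration, Domain::System] {
        println!("{domain:?}: {}", tools_in(domain).join(", "));
    }
    // Hypothetical mem_save arguments; the real input schema may differ.
    let args = r#"{"title":"Retry on 429","type":"bugfix","scope":"project","topic_key":"http/retries","content":"What: ..."}"#;
    println!("{}", tools_call_request(1, "mem_save", args));
}
```

A real server would build these messages with a JSON library rather than format!, which is used here only to keep the sketch dependency-free.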

6. Memory Hygiene & Business Logic

  • Exact Deduplication: Within a rolling time window, identical saves are detected by hashing
    SHA256(project + scope + type + title). A duplicate updates metadata (duplicate_count,
    last_seen_at) instead of creating a new row.
  • Topic Upserts: If mem_save includes a topic_key that already exists, it updates that memory
    in place and increments its revision_count instead of creating a new one.
  • Global Filters: mem_search, mem_context, and mem_timeline automatically ignore soft-deleted
    records (where deleted_at IS NOT NULL).
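The three rules can be sketched over an in-memory store as follows. This is a hypothetical illustration: the 15-minute window, second-based timestamps, the field separator, and the ordering (topic upsert checked before exact dedup) are assumptions. std's DefaultHasher stands in for SHA-256 only because the standard library has no SHA-256; its output is not guaranteed stable across Rust releases, so a real build would use a dedicated SHA-256 crate for persisted keys.

```rust
// Hygiene sketch; Store, draft, and dedup_key are hypothetical. Assumptions, not
// spec: a 15-minute window, plain-second timestamps, and std's DefaultHasher in
// place of SHA-256 (a real build would use a SHA-256 crate). Fields are joined
// with a separator so ("ab", "c") and ("a", "bc") produce different keys.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const DEDUP_WINDOW_SECS: u64 = 15 * 60;

#[derive(Debug, Default)]
struct Memory {
    id: u64,
    project: String,
    scope: String,
    kind: String, // "type" in the spec
    title: String,
    topic_key: Option<String>,
    dedup_key: u64,
    duplicate_count: u32,
    revision_count: u32,
    last_seen_at: u64,
    deleted_at: Option<u64>,
}

fn draft(project: &str, scope: &str, kind: &str, title: &str, topic: Option<&str>) -> Memory {
    Memory {
        project: project.into(),
        scope: scope.into(),
        kind: kind.into(),
        title: title.into(),
        topic_key: topic.map(String::from),
        ..Default::default()
    }
}

fn dedup_key(project: &str, scope: &str, kind: &str, title: &str) -> u64 {
    let mut h = DefaultHasher::new();
    [project, scope, kind, title].join("\u{1f}").hash(&mut h);
    h.finish()
}

#[derive(Default)]
struct Store {
    rows: Vec<Memory>,
    next_id: u64,
}

impl Store {
    fn save(&mut self, m: Memory, now: u64) -> u64 {
        let key = dedup_key(&m.project, &m.scope, &m.kind, &m.title);
        // 1. Topic upsert: revise the live row that already owns this topic_key.
        if let Some(topic) = &m.topic_key {
            let found = self.rows.iter_mut().find(|r| {
                r.deleted_at.is_none() && r.topic_key.as_ref() == Some(topic)
            });
            if let Some(row) = found {
                row.title = m.title;
                row.dedup_key = key;
                row.revision_count += 1;
                row.last_seen_at = now;
                return row.id;
            }
        }
        // 2. Exact dedup inside the rolling window: bump metadata, no new row.
        let dup = self.rows.iter_mut().find(|r| {
            r.deleted_at.is_none()
                && r.dedup_key == key
                && now.saturating_sub(r.last_seen_at) <= DEDUP_WINDOW_SECS
        });
        if let Some(row) = dup {
            row.duplicate_count += 1;
            row.last_seen_at = now;
            return row.id;
        }
        // 3. Otherwise insert a new row.
        self.next_id += 1;
        self.rows.push(Memory { id: self.next_id, dedup_key: key, last_seen_at: now, ..m });
        self.next_id
    }

    // Soft delete (mem_delete): stamp deleted_at instead of removing the row.
    fn soft_delete(&mut self, id: u64, now: u64) {
        if let Some(row) = self.rows.iter_mut().find(|r| r.id == id) {
            row.deleted_at = Some(now);
        }
    }

    // Global filter for search/context/timeline: hide soft-deleted rows.
    fn visible(&self) -> impl Iterator<Item = &Memory> + '_ {
        self.rows.iter().filter(|r| r.deleted_at.is_none())
    }
}

fn main() {
    let mut s = Store::default();
    let a = s.save(draft("cerebro", "project", "bugfix", "Retry on 429", None), 100);
    let b = s.save(draft("cerebro", "project", "bugfix", "Retry on 429", None), 160);
    s.soft_delete(a, 200);
    println!("same id: {}, visible rows: {}", a == b, s.visible().count());
}
```

Joining the fields with a separator avoids ambiguous keys that plain concatenation would allow, where ("ab", "c") and ("a", "bc") would hash identically.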

7. Terminal User Interface (TUI)

Built with ratatui (https://ratatui.rs/) to provide humans with real-time observability into the
agent's mind.

  • Dashboard: DB stats, session counts, and background worker status.
  • Memory Explorer: Interactive list to browse and read full memories.
  • Session Timeline: Visual chronological view of sessions and their summaries.
  • Live Logs: Real-time stream of incoming MCP tool calls (e.g., watching the agent perform a
    mem_search).

8. Agent Integration (prompt_template.md)

A Markdown file in the repository containing the "System Prompt" instructions that users paste into
their AI agents. It instructs the agent on how to use Cerebro proactively, emphasizing the drill-in
strategy and the What/Why/Where/Learned format.
