Official website & docs: https://memtomem.com
🚧 Alpha – APIs and defaults may change between 0.1.x releases. Feedback and issue reports are especially welcome: Issues · Discussions.
Spend fewer tokens. Remember more. Ship faster.
memtomem-stm is an MCP proxy that typically cuts token usage by 20–80% and gives your agent memory across sessions – with no changes to your upstream MCP servers.
It sits between your AI agent and its upstream MCP servers, compressing bloated tool responses, caching repeated calls, and automatically surfacing relevant context from prior sessions via a memtomem LTM server.
You need this if:
- Your agent burns tokens re-reading the same files and search results – STM compresses and caches them (Claude Code, Cursor, Claude Desktop, or any MCP client)
- Your coding sessions lose context and the agent re-discovers decisions it already made – STM surfaces prior context automatically via memtomem LTM
- You run custom MCP servers and want compression, caching, and observability without changing upstream code – STM is a drop-in proxy layer
```mermaid
flowchart TB
    Agent["Agent<br/>(Claude Code, Cursor, …)"]
    subgraph STM["memtomem-stm (STM)"]
        Pipe["CLEAN → COMPRESS → SURFACE → INDEX"]
    end
    LTM[("memtomem LTM<br/>(MCP server)")]
    FS["filesystem<br/>MCP server"]
    GH["github<br/>MCP server"]
    Other["…any MCP server"]
    Agent -->|MCP| STM
    STM <-->|MCP: stdio / SSE / HTTP| FS
    STM <-->|MCP| GH
    STM <-->|MCP| Other
    STM <-.->|surfacing<br/>via MCP| LTM
```
```bash
pip install memtomem-stm
```

Or with uv:

```bash
uv tool install memtomem-stm   # install mms / memtomem-stm as global CLI tools
uvx memtomem-stm --help        # or run without installing
uv pip install memtomem-stm    # or install into the active environment
```

memtomem-stm is independent: it has no Python-level dependency on memtomem core. To enable proactive memory surfacing, point STM at a running memtomem MCP server (or any compatible MCP server) – communication happens entirely through the MCP protocol.
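For illustration only – the exact surfacing wiring is covered in docs/surfacing.md – one way to make an LTM server reachable over MCP is to register it with the `mms add` command introduced below. The `memtomem-mcp` launch command here is purely hypothetical; substitute however you start yours:

```bash
# Hypothetical launch command – replace with whatever starts your memtomem LTM server
mms add memtomem \
  --command memtomem-mcp \
  --prefix ltm
```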
`mms` is the short alias for `memtomem-stm-proxy` – both commands are identical; use whichever you prefer.
For first-time setup, run the guided wizard – it prompts for name/prefix/command, optionally probes the server, and then offers to register STM with Claude Code (or generate .mcp.json) in the same flow:
```bash
mms init
```

Or add servers non-interactively:

```bash
mms add filesystem \
  --command npx \
  --args "-y @modelcontextprotocol/server-filesystem /home/user/projects" \
  --prefix fs
```

`--prefix` is required: it's the namespace under which the upstream server's tools will appear (e.g. `fs__read_file`). Repeat for each MCP server you want to proxy, as in the sketch below.
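For example, a second upstream can be proxied the same way – a sketch assuming the reference GitHub MCP server's npx package name:

```bash
# Package name is an assumption (the reference GitHub MCP server); adjust to the one you use
mms add github \
  --command npx \
  --args "-y @modelcontextprotocol/server-github" \
  --prefix gh
```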
If you've already configured MCP servers in Claude Desktop, Claude Code, or a project .mcp.json, `mms add --import` (alias `--from-clients`) reuses the init wizard to bulk-select them – skipping anything already registered.
```bash
mms list     # show what you've added
mms status   # show full config + connectivity
```

`mms init` ends with a 3-way prompt – pick option 1 and it shells out to `claude mcp add` for you. If you skipped that step or want to register with a different client later, run:
```bash
mms register
```

To register manually, use `claude` directly:
```bash
claude mcp add mms -s user -- mms
```

Or add it to a JSON MCP config for Cursor / Windsurf / Claude Desktop / Gemini:
```json
{
  "mcpServers": {
    "mms": {
      "command": "mms"
    }
  }
}
```

**Why `mms` and not `memtomem-stm`?** Either name works (the three entry points are interchangeable), but the MCP client composes proxied tool names as `mcp__<server>__<prefix>__<tool>`. The short alias `mms` (3 chars) saves 9 bytes vs `memtomem-stm` (12 chars), which is exactly enough headroom to keep upstreams with long tool names under the 64-char MCP limit. If you registered under a different name and want the `mms add` overflow check (#261) to match exactly, export `MMS_CLIENT_SERVER_NAME=<name>` in your shell – otherwise the default assumption is conservative and at worst causes a few false-positive warnings on borderline prefixes.
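To see the headroom concretely, here's a throwaway Python check (the long tool name is made up):

```python
LIMIT = 64  # MCP limit on composed tool-name length

def composed(server: str, prefix: str, tool: str) -> str:
    return f"mcp__{server}__{prefix}__{tool}"

tool = "search_repositories_by_topic_and_language_v2"  # hypothetical 44-char upstream tool
for server in ("mms", "memtomem-stm"):
    name = composed(server, "gh", tool)
    status = "ok" if len(name) <= LIMIT else "over limit"
    print(f"{len(name):2d} chars  {status:10s} {name}")
```

With `mms` the composed name is 58 chars; with `memtomem-stm` it is 67 and would blow the limit.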
Your agent now sees proxied tools (`fs__read_file`, `gh__search_repositories`, etc.). Every call goes through the 4-stage pipeline automatically – responses are cleaned, compressed, cached, and (when an LTM server is configured) enriched with relevant memories.
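As a mental model only – every name below is illustrative, not memtomem-stm's real internals – the pipeline behaves roughly like this:

```python
from typing import Callable, Optional

def clean(text: str) -> str:
    return text.strip()                  # stand-in for noise/boilerplate removal

def compress(text: str, budget: int = 2000) -> str:
    return text[:budget]                 # stand-in for query-aware compression

def handle_call(call_upstream: Callable[[], str],
                cache: dict[str, str],
                key: str,
                surface: Optional[Callable[[str], str]] = None) -> str:
    if key not in cache:
        # Cache miss: forward to the upstream server over MCP, then CLEAN + COMPRESS
        cache[key] = compress(clean(call_upstream()))
    out = cache[key]
    if surface is not None:
        # SURFACE runs even on cache hits, so injected memories stay fresh
        out = surface(out)
    # INDEX (omitted here) would record the response for later recall
    return out
```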
To check what's happening, ask the agent to call `stm_proxy_stats`.
Try it without wiring into your AI client first. A quickstart Jupyter notebook registers an upstream MCP server, calls a proxied tool, and reads `stm_proxy_stats` end-to-end. Clone the repo, `uv sync`, and `uv run jupyter lab notebooks/` – no external services needed.
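Outside the notebook, the same round-trip can be driven with the official MCP Python SDK – a minimal sketch, assuming `mms` is on your PATH with at least one upstream registered:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the proxy itself as a stdio MCP server, exactly as a client would
    params = StdioServerParameters(command="mms", args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()        # proxied + built-in STM tools
            print([t.name for t in tools.tools])
            stats = await session.call_tool("stm_proxy_stats", {})
            print(stats.content)

asyncio.run(main())
```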
- 🗜️ Typically 20–80% fewer tokens per tool call – 10 compression strategies with auto-selection by content type, query-aware budget, and zero-loss progressive delivery – docs/compression.md
- 🧠 Your agent remembers – proactive memory surfacing from prior sessions, gated by relevance threshold, rate limit, dedup, and circuit breaker – docs/surfacing.md
- 💾 Repeated calls are free – response cache with TTL and eviction; surfacing re-applied on cache hit so injected memories stay fresh – docs/caching.md
- 🛡️ Production-safe – circuit breaker, retry with backoff, write-tool skip, query cooldown, dedup, sensitive content auto-detection, Langfuse tracing, horizontal scaling via PendingStore
| Guide | Topic |
|---|---|
| Surfacing | How agents recall prior context automatically |
| Compression | All 10 strategies – pick the right one for your content |
| Caching | Skip repeated work with response caching |
| Configuration | Tune settings without touching code |
| CLI | CLI commands and the 11 MCP tools |
```bash
uv sync            # install dev deps
uv run pytest -m "not ollama and not bench_qa_meta and not bench_qa_llm_judge"  # tests (CI filter)
uv run ruff check src && uv run ruff format --check src  # lint (required)
uv run mypy src    # typecheck (advisory)
```

CI runs the same commands on every PR via .github/workflows/ci.yml. Lint (`ruff check` + `ruff format --check`) and tests must pass; mypy is advisory.
Apache License 2.0. Contributions are accepted under the terms of the Contributor License Agreement.