
Usage Augmented Retrieval

Note: This is a proof of concept, vibe-coded with Claude. It works, but expect rough edges.

A usage-augmented retrieval (UAR) system for Claude Code. Unlike traditional RAG, which is stateless, this system learns from usage outcomes to improve retrieval over time.

Core Idea

The Problem with Traditional RAG

Standard RAG is stateless. It retrieves content based on semantic similarity, but it doesn't learn from outcomes. A chunk that has consistently led you astray gets retrieved just as readily as one that has saved you dozens of times—as long as the embeddings are close enough.

The result: sarcastic Reddit comments presented as factual claims, outdated Stack Overflow answers weighted equally with current documentation, and no sense of "this source is great for practical tips but terrible for authoritative statements."

What UAR Does Differently

RAG asks: "What content is semantically similar to this query?"

UAR asks: "What content is semantically similar AND has actually helped with tasks like this before?"

Every time you retrieve and use a piece of knowledge, the system records:

  • What task were you trying to accomplish? (debugging, factual lookup, conceptual understanding, etc.)
  • Did it help, partially help, miss entirely, or actively mislead?
  • Why?
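
For illustration, one recorded trace might look roughly like this (the field names are a sketch, not the exact schema in schema.py):

trace = {
    "chunk_id": "chunk_0042",
    "task_type": "debugging",          # what you were trying to accomplish
    "outcome": "win",                  # win / partial / miss / misleading
    "notes": "pointed straight at the failing WAL pragma",
    "recorded_at": "2025-01-15T10:32:00Z",
}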

This creates a usage history for each chunk. Over time, the system learns what each piece of knowledge is actually good for—not through a reliability score, but through role-casting: the same content might be excellent for debugging but useless for theoretical understanding.

Rather than pre-computing a score that bakes in these signals, the raw usage history is surfaced directly to the LLM. It sees "this chunk won 3x for debugging, missed 2x for factual lookup" and reasons about whether to trust it for the current task. The judgment stays with the reasoning system.
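
As a concrete sketch, a search hit might carry that history alongside the content; the shape below is illustrative, not the tool's exact output:

result = {
    "chunk_id": "chunk_0042",
    "content": "...",
    "usage_history": {
        "debugging": {"win": 3, "partial": 1},     # "won 3x for debugging"
        "factual_lookup": {"miss": 2},             # "missed 2x for factual lookup"
    },
}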

Relation to Agentic Memory Systems

UAR fits into the emerging landscape of agentic RAG and agent memory systems:

Concept             UAR Implementation
Episodic Memory     Usage traces (timestamped outcomes with context)
Semantic Memory     Chunks + functional profiles
Memory Evolution    Consolidation (traces → profiles over time)
Write Capability    kb_ingest + kb_record
Retrieval as Tool   kb_search invoked by research sub-agent

The progression from traditional RAG → Agentic RAG → Agent Memory is about adding write capabilities so systems can learn from interactions. UAR takes this further with explicit outcome attribution: not just "what happened" but "did it actually help, and for what kind of task?"

This is similar to patterns like A-Mem (Zettelkasten-inspired agent memory) where notes evolve as new related memories arrive. Our consolidation process serves the same purpose—converting episodic usage traces into semantic functional profiles that inform future retrieval.

Feedback Attribution

A key challenge in learning systems is attribution: when multiple sources contribute to an answer, which ones deserve credit?

UAR solves this with self-evaluation: after synthesizing a response, the sub-agent evaluates which chunks actually contributed to the summary versus which were retrieved but unused. The user provides a single piece of feedback ("helpful" / "not useful"), and that feedback is recorded precisely:

  • Contributing chunks → user's feedback (win/partial/miss/misleading)
  • Non-contributing chunks → automatic "miss" (retrieved but not useful)

This gives per-source learning without burdening the user with per-source feedback. The system learns "Neo4j docs are great for implementation" and "Wikipedia was retrieved but didn't help" from a single thumbs-up.
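
A minimal sketch of that fan-out (apply_feedback and the record callable are hypothetical names; in the real system the sub-agent does this via kb_record):

def apply_feedback(retrieved_chunk_ids, contributing_ids, user_outcome, task_type, record):
    """Fan one piece of user feedback out into per-chunk usage records."""
    for chunk_id in retrieved_chunk_ids:
        if chunk_id in contributing_ids:
            # chunks the sub-agent judged as contributing inherit the user's verdict
            record(chunk_id=chunk_id, task_type=task_type, outcome=user_outcome)
        else:
            # retrieved-but-unused chunks are automatically recorded as a miss
            record(chunk_id=chunk_id, task_type=task_type, outcome="miss")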

Features

  • Multi-strategy retrieval: keyword, semantic (MiniLM embeddings), concept, usage history, recency
  • Usage tracking: Record outcomes (win/partial/miss/misleading) per task type
  • Role-casting: Same content can be good for debugging but bad for factual lookup—tracked separately
  • LLM-in-the-loop: Usage history is surfaced directly, letting the LLM reason about trust
  • Local embeddings: Uses all-MiniLM-L6-v2 via sentence-transformers, no external API needed (see the sketch after this list)
  • SQLite storage: Single-file database with WAL mode for concurrent access
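
The local-embeddings feature relies on the sentence-transformers library; embedding a chunk looks roughly like this (a minimal sketch, not necessarily the exact code in embeddings.py):

from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is a small local model: 384-dimensional embeddings, no API calls
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["How do I enable WAL mode in SQLite?"])
print(vectors.shape)  # (1, 384)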

Installation

Requires Python 3.10+ and uv.

cd knowledge_base
uv sync

Usage with Claude Code

Add to your ~/.claude.json for global access:

{
  "mcpServers": {
    "knowledge": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "/path/to/knowledge_base", "python", "mcp_server.py"]
    }
  }
}

Restart Claude Code. The knowledge tools will be available.

MCP Tools

Core Tools

Tool                Description
kb_search           Multi-strategy search with rich usage history
kb_semantic_search  Pure embedding similarity search
kb_ingest           Add content (auto-chunked and embedded)
kb_record           Record usage outcome for retrieved chunks
kb_reflect          Comprehensive analysis of what's working
kb_quick_insights   Fast health check

Utility Tools (not advertised to LLM)

These are available but not included in the server instructions—useful for debugging and admin:

Tool                Description
kb_stats            Database statistics
kb_get_chunk        Get chunk details with full usage history
kb_list_documents   List all documents
kb_delete_document  Remove a document and all its chunks
kb_consolidate      Convert usage traces into functional profiles

Consolidation

Consolidation converts episodic memory (individual usage traces) into semantic memory (functional profiles). It runs when chunks or sources have accumulated enough usage data:

  • Chunks: 5+ traces → generates a functional profile describing what the chunk is good/bad for
  • Sources: 10+ traces → generates a source-level profile

When to run it: kb_quick_insights includes a consolidation status:

{
  "consolidation": {
    "status": "recommended",
    "chunks_ready_for_profiling": 3,
    "sources_ready_for_profiling": 1
  }
}

When status is "recommended", run kb_consolidate to generate profiles. These profiles are then surfaced in future kb_search results, helping the LLM reason about whether to trust each chunk.
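
As an illustration of what such a profile might contain (the field names and wording are hypothetical, not the actual consolidate.py output):

profile = {
    "chunk_id": "chunk_0042",
    "good_for": ["debugging", "implementation_howto"],
    "bad_for": ["factual_lookup"],
    "summary": "Walks through a concrete fix and is useful when reproducing errors, "
               "but its version-specific details are often stale for fact checks.",
}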

Task Types

When recording usage, specify the task type:

  • factual_lookup - Checking facts, definitions, specifications
  • implementation_howto - How to build/code something
  • conceptual_understanding - Understanding concepts/theory
  • debugging - Fixing errors or issues
  • decision_support - Choosing between options
  • exploratory_research - Open-ended exploration

Usage Outcomes

  • win - Directly solved the problem
  • partial - Helped but needed more
  • miss - Retrieved but not useful
  • misleading - Led to wrong conclusions
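
Putting the two vocabularies together, a recording call might look like this (the argument names are an assumption about the tool's interface, not its documented signature):

kb_record(
    chunk_ids=["chunk_0042", "chunk_0117"],
    task_type="implementation_howto",
    outcome="partial",
    notes="Covered the setup steps but missed the WAL-mode pragma.",
)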

Example Workflow

  1. Search for relevant knowledge before answering a question
  2. Use the retrieved content to help with the task
  3. Record the outcome with task type and notes
  4. Over time, retrieval improves based on what actually worked
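
In tool terms, one pass through that loop might look like this (kb_search, kb_record, and kb_consolidate are the real tool names, while the argument shapes and the synthesize step are placeholders):

results = kb_search(query="sqlite 'database is locked' under concurrent writes")  # 1. retrieve
answer = synthesize(results)                                                       # 2. use the content
kb_record(                                                                         # 3. record the outcome
    chunk_ids=[r["chunk_id"] for r in results],
    task_type="debugging",
    outcome="win",
)
# 4. later, kb_consolidate folds accumulated traces into functional profiles
#    that future kb_search calls surface back to the LLM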

Recommended Usage: Research Sub-Agent

For best results, use a sub-agent pattern for research tasks:

┌─────────────────────────────────────────────────────────────┐
│                      MAIN CLAUDE                            │
│  - Focuses on user conversation                             │
│  - Spawns research agent when needed                        │
│  - Provides feedback on results                             │
└─────────────────────────────────────────────────────────────┘
        │                                   ▲
        │ query + context                   │ results + feedback request
        ▼                                   │
┌─────────────────────────────────────────────────────────────┐
│                    RESEARCH SUB-AGENT                       │
│  - Searches KB first                                        │
│  - Searches web if KB insufficient                          │
│  - Ingests valuable findings                                │
│  - Records outcome based on main Claude's feedback          │
└─────────────────────────────────────────────────────────────┘

This pattern:

  • Keeps the main conversation focused
  • Ensures the feedback loop is always closed
  • Lets the sub-agent handle KB complexity

The MCP server instructions include a template for spawning research sub-agents.

Architecture

┌─────────────────────────────────────────────────────────┐
│                     THE UAR LOOP                        │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   RETRIEVE ──────► USE ──────► RECORD                   │
│       ▲                           │                     │
│       │                           │                     │
│       │         CONSOLIDATE ◄─────┘                     │
│       │              │                                  │
│       └──────────────┘                                  │
│                                                         │
│   Retrieval is augmented by usage history.              │
│   Usage history consolidates into functional profiles.  │
│   Profiles inform future retrieval.                     │
│                                                         │
└─────────────────────────────────────────────────────────┘

Files

  • schema.py - Data models and SQLite database
  • harness.py - Unified API
  • ingest.py - Content ingestion pipeline
  • retrieve.py - Multi-strategy retrieval
  • record.py - Usage tracking
  • embeddings.py - Local MiniLM embeddings
  • context.py - Chunk dossier assembly for LLM reasoning
  • consolidate.py - Episodic to semantic consolidation
  • reflect.py - Knowledge base analysis
  • mcp_server.py - MCP server for Claude Code integration

License

MIT
