Summary
This issue documents an investigation into using chromadb/context-1 as a search backend for Elroy's memory and document retrieval.
Conclusion: Not yet viable for integration, but worth tracking. When the agent harness is released, it could be a strong fit for `search_documents` on large corpora.
What is context-1?
Context-1 is a 20B-parameter agentic search model published by Chroma in March 2026 (research paper). It is derived from `gpt-oss-20B` and achieves retrieval performance comparable to frontier LLMs at significantly lower cost, with up to 10x faster inference.
It is NOT a traditional embedding model. Instead, it's a retrieval subagent that:
- Operates in an observe-reason-act loop, calling a `search_corpus` tool repeatedly (see the sketch after this list)
- Decomposes complex queries into targeted subqueries (averages 2.56 tool calls/turn)
- Self-edits its context mid-search — pruning irrelevant results to free space for further exploration (0.94 prune accuracy)
- Returns a ranked list of relevant documents to a downstream answering model
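The harness internals aren't public, so the loop can only be sketched. A minimal sketch of its rough shape, where every name (`Action`, `model_step`, `run_search_loop`) is invented for illustration and none comes from Chroma's release:

```python
# Hypothetical sketch of an observe-reason-act retrieval loop. None of
# these names come from Chroma's release; the real agent harness
# (tool execution, token budgets, pruning, dedup) is not yet public.
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                                       # "search" | "prune" | "finish"
    subquery: str = ""
    to_drop: set[int] = field(default_factory=set)  # result indices to prune


def model_step(query: str, kept: list[str]) -> Action:
    """Placeholder for one model pass over the current context."""
    raise NotImplementedError("requires the unreleased agent harness")


def search_corpus(subquery: str) -> list[str]:
    """Placeholder for the hybrid search tool (sketched further below)."""
    raise NotImplementedError


def run_search_loop(query: str, max_steps: int = 8) -> list[str]:
    kept: list[str] = []
    for _ in range(max_steps):
        action = model_step(query, kept)    # observe + reason
        if action.kind == "search":         # act: issue a targeted subquery
            kept.extend(search_corpus(action.subquery))
        elif action.kind == "prune":        # act: self-edit the context
            kept = [r for i, r in enumerate(kept) if i not in action.to_drop]
        else:                               # "finish"
            break
    return kept                             # ranked docs for the answering model
```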
The `search_corpus` tool it calls internally combines:
- Sparse vector search (keyword)
- Dense embedding search (semantic)
- Reciprocal Rank Fusion (RRF) to merge results (sketched after this list)
- A reranker to select top results within a token budget
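The RRF step is a standard published technique (Cormack et al., 2009), so its core is easy to show. Chroma's exact parameters aren't published; `k = 60` below is just the conventional default:

```python
# Standard Reciprocal Rank Fusion. Each document scores
# sum(1 / (k + rank)) over the ranked lists it appears in, so documents
# ranked well by multiple retrievers float to the top.
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: merge keyword and embedding rankings for one subquery.
sparse = ["doc3", "doc1", "doc7"]  # BM25-style keyword ranking
dense = ["doc1", "doc5", "doc3"]   # embedding-similarity ranking
print(rrf_merge([sparse, dense]))  # doc1 and doc3 rank highest
```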
How Elroy currently does search
Elroy uses a single-pass embedding lookup:
- Query text → embedding via LiteLLM (`text-embedding-3-small`, 1536 dims)
- Embedding stored/queried in ChromaDB (L2 distance)
- Returns top results under `l2_memory_relevance_distance_threshold`
Relevant files:
- `elroy/llm/client.py:155` — `get_embedding()`
- `elroy/repository/recall/queries.py` — `query_vector()`, `get_most_relevant_memories()`, `search_documents()`
- `elroy/db/db_session.py:165` — `query_vector()` ChromaDB call
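Condensed, the flow those files implement looks roughly like this. Names and the threshold constant are simplified for illustration; the real code differs in detail:

```python
# Simplified sketch of Elroy's single-pass lookup; the real logic lives
# in the files listed above.
import chromadb
import litellm

L2_THRESHOLD = 1.0  # stand-in for l2_memory_relevance_distance_threshold


def search_memories(query: str, collection: chromadb.Collection) -> list[str]:
    # 1. Query text -> embedding via LiteLLM (1536-dim vector).
    resp = litellm.embedding(model="text-embedding-3-small", input=[query])
    embedding = resp.data[0]["embedding"]
    # 2. One nearest-neighbor query against ChromaDB (L2 distance).
    results = collection.query(query_embeddings=[embedding], n_results=5)
    # 3. Keep only results under the distance threshold.
    return [
        doc
        for doc, dist in zip(results["documents"][0], results["distances"][0])
        if dist < L2_THRESHOLD
    ]
```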
Why integration isn't viable today
1. Agent harness is not yet public
The paper explicitly states: "Context-1 is designed to operate within a specific agent harness that manages tool execution, token budgets, context pruning, and deduplication — that harness is not yet public." Without it, the model weights alone won't reproduce the reported results.
2. No hosted inference API
As of March 2026, context-1 is not deployed by any HuggingFace inference provider. Self-hosting a 20B model requires substantial GPU resources.
3. Wrong paradigm for Elroy's primary use case
Elroy's main retrieval pattern is simple, low-latency, single-hop: retrieve 2 relevant memories per conversation turn. context-1's multi-turn agentic loop would add significant latency and cost to every message.
4. Not a drop-in replacement
It cannot substitute for the embedding model in `EmbeddingModel` / `get_embedding()`. The existing ChromaDB + L2 search pipeline would need to be redesigned.
Where context-1 could fit (future)
The `search_documents` tool (`elroy/repository/recall/queries.py:72`) is the most natural fit. It's used for complex document queries — exactly the multi-hop retrieval scenario context-1 excels at.
A potential future architecture:
- Keep existing embedding search for memory/reminder recall (simple, latency-sensitive)
- Use context-1 as an optional deep search backend for `search_documents` on large ingested corpora
- Expose as a configurable option (e.g., `document_search_backend: context1 | embedding`)
This would require:
- Chroma releasing the agent harness
- An accessible inference API (Chroma Cloud, HuggingFace, or self-hosted)
- Building a context-1 client that wraps the observe-reason-act loop and exposes a `search(query) -> List[DocumentExcerpt]` interface (sketched after this list)
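A rough sketch of what that interface could look like, including dispatch on the backend option above. This is purely a proposal; none of these classes exist in Elroy or Chroma today:

```python
# Proposed shape only; neither Elroy nor Chroma ships any of this today.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class DocumentExcerpt:
    source: str      # path or URL of the ingested document
    content: str     # excerpt text
    relevance: float


class DocumentSearchBackend(Protocol):
    def search(self, query: str) -> List[DocumentExcerpt]: ...


class EmbeddingBackend:
    """Today's behavior: one embedding lookup against ChromaDB."""

    def search(self, query: str) -> List[DocumentExcerpt]:
        ...  # existing query_vector() path


class Context1Backend:
    """Future: drive context-1's observe-reason-act loop to completion
    and map its ranked results into DocumentExcerpt objects."""

    def search(self, query: str) -> List[DocumentExcerpt]:
        ...  # requires the agent harness + an inference API


def get_backend(name: str) -> DocumentSearchBackend:
    # e.g. document_search_backend: context1 | embedding
    return {"embedding": EmbeddingBackend, "context1": Context1Backend}[name]()
```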
Action items
- Prototype a context-1 backend for `search_documents` as an opt-in feature, once the harness and an inference API are available