Summary
This issue documents an investigation into using chromadb/context-1 as a search backend for Elroy's memory and document retrieval.
Conclusion: Not yet viable for integration, but worth tracking. When the agent harness is released, it could be a strong fit for `search_documents` on large corpora.
What is context-1?
Context-1 is a 20B-parameter agentic search model published by Chroma in March 2026 (research paper). It is derived from `gpt-oss-20B` and achieves retrieval performance comparable to frontier LLMs at significantly lower cost, with up to 10x faster inference.
It is NOT a traditional embedding model. Instead, it's a retrieval subagent that:
- Operates in an observe-reason-act loop, calling a `search_corpus` tool repeatedly (see the sketch after this list)
- Decomposes complex queries into targeted subqueries (averages 2.56 tool calls/turn)
- Self-edits its context mid-search — pruning irrelevant results to free space for further exploration (0.94 prune accuracy)
- Returns a ranked list of relevant documents to a downstream answering model
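The harness internals aren't public, so the loop can only be sketched. A minimal sketch of its rough shape, where every name (`Action`, `model_step`, `run_search_loop`) is invented for illustration and none comes from Chroma's release:

```python
# Hypothetical sketch of an observe-reason-act retrieval loop. None of
# these names come from Chroma's release; the real agent harness
# (tool execution, token budgets, pruning, dedup) is not yet public.
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                                       # "search" | "prune" | "finish"
    subquery: str = ""
    to_drop: set[int] = field(default_factory=set)  # result indices to prune


def model_step(query: str, kept: list[str]) -> Action:
    """Placeholder for one model pass over the current context."""
    raise NotImplementedError("requires the unreleased agent harness")


def search_corpus(subquery: str) -> list[str]:
    """Placeholder for the hybrid search tool (sketched further below)."""
    raise NotImplementedError


def run_search_loop(query: str, max_steps: int = 8) -> list[str]:
    kept: list[str] = []
    for _ in range(max_steps):
        action = model_step(query, kept)    # observe + reason
        if action.kind == "search":         # act: issue a targeted subquery
            kept.extend(search_corpus(action.subquery))
        elif action.kind == "prune":        # act: self-edit the context
            kept = [r for i, r in enumerate(kept) if i not in action.to_drop]
        else:                               # "finish"
            break
    return kept                             # ranked docs for the answering model
```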
The `search_corpus` tool it calls internally combines:
- Sparse vector search (keyword)
- Dense embedding search (semantic)
- Reciprocal Rank Fusion (RRF) to merge results (sketched after this list)
- A reranker to select top results within a token budget
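The RRF step is a standard published technique (Cormack et al., 2009), so its core is easy to show. Chroma's exact parameters aren't published; `k = 60` below is just the conventional default:

```python
# Standard Reciprocal Rank Fusion. Each document scores
# sum(1 / (k + rank)) over the ranked lists it appears in, so documents
# ranked well by multiple retrievers float to the top.
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: merge keyword and embedding rankings for one subquery.
sparse = ["doc3", "doc1", "doc7"]  # BM25-style keyword ranking
dense = ["doc1", "doc5", "doc3"]   # embedding-similarity ranking
print(rrf_merge([sparse, dense]))  # doc1 and doc3 rank highest
```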
How Elroy currently does search
Elroy uses a single-pass embedding lookup:
- Query text → embedding via LiteLLM (`text-embedding-3-small`, 1536 dims)
- Embedding stored/queried in ChromaDB (L2 distance)
- Returns top results under `l2_memory_relevance_distance_threshold`
Relevant files:
- `elroy/llm/client.py:155` — `get_embedding()`
- `elroy/repository/recall/queries.py` — `query_vector()`, `get_most_relevant_memories()`, `search_documents()`
- `elroy/db/db_session.py:165` — `query_vector()` ChromaDB call
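Condensed, the flow those files implement looks roughly like this. Names and the threshold constant are simplified for illustration; the real code differs in detail:

```python
# Simplified sketch of Elroy's single-pass lookup; the real logic lives
# in the files listed above.
import chromadb
import litellm

L2_THRESHOLD = 1.0  # stand-in for l2_memory_relevance_distance_threshold


def search_memories(query: str, collection: chromadb.Collection) -> list[str]:
    # 1. Query text -> embedding via LiteLLM (1536-dim vector).
    resp = litellm.embedding(model="text-embedding-3-small", input=[query])
    embedding = resp.data[0]["embedding"]
    # 2. One nearest-neighbor query against ChromaDB (L2 distance).
    results = collection.query(query_embeddings=[embedding], n_results=5)
    # 3. Keep only results under the distance threshold.
    return [
        doc
        for doc, dist in zip(results["documents"][0], results["distances"][0])
        if dist < L2_THRESHOLD
    ]
```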
Why integration isn't viable today
1. Agent harness is not yet public
The paper explicitly states: "Context-1 is designed to operate within a specific agent harness that manages tool execution, token budgets, context pruning, and deduplication — that harness is not yet public." Without it, the model weights alone won't reproduce the reported results.
2. No hosted inference API
As of March 2026, context-1 is not deployed by any HuggingFace inference provider. Self-hosting a 20B model requires substantial GPU resources.
3. Wrong paradigm for Elroy's primary use case
Elroy's main retrieval pattern is simple, low-latency, single-hop: retrieve 2 relevant memories per conversation turn. context-1's multi-turn agentic loop would add significant latency and cost to every message.
4. Not a drop-in replacement
It cannot substitute for the embedding model in `EmbeddingModel` / `get_embedding()`. The existing ChromaDB + L2 search pipeline would need to be redesigned.
Where context-1 could fit (future)
The `search_documents` tool (`elroy/repository/recall/queries.py:72`) is the most natural fit. It's used for complex document queries — exactly the multi-hop retrieval scenario context-1 excels at.
A potential future architecture:
- Keep existing embedding search for memory/reminder recall (simple, latency-sensitive)
- Use context-1 as an optional deep search backend for `search_documents` on large ingested corpora
- Expose as a configurable option (e.g., `document_search_backend: context1 | embedding`)
This would require:
- Chroma releasing the agent harness
- An accessible inference API (Chroma Cloud, HuggingFace, or self-hosted)
- Building a context-1 client that wraps the observe-reason-act loop and exposes a `search(query) -> List[DocumentExcerpt]` interface (sketched after this list)
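A rough sketch of what that interface could look like, including dispatch on the backend option above. This is purely a proposal; none of these classes exist in Elroy or Chroma today:

```python
# Proposed shape only; neither Elroy nor Chroma ships any of this today.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class DocumentExcerpt:
    source: str      # path or URL of the ingested document
    content: str     # excerpt text
    relevance: float


class DocumentSearchBackend(Protocol):
    def search(self, query: str) -> List[DocumentExcerpt]: ...


class EmbeddingBackend:
    """Today's behavior: one embedding lookup against ChromaDB."""

    def search(self, query: str) -> List[DocumentExcerpt]:
        ...  # existing query_vector() path


class Context1Backend:
    """Future: drive context-1's observe-reason-act loop to completion
    and map its ranked results into DocumentExcerpt objects."""

    def search(self, query: str) -> List[DocumentExcerpt]:
        ...  # requires the agent harness + an inference API


def get_backend(name: str) -> DocumentSearchBackend:
    # e.g. document_search_backend: context1 | embedding
    return {"embedding": EmbeddingBackend, "context1": Context1Backend}[name]()
```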
Action items
- Prototype a context-1 backend for `search_documents` as an opt-in feature, once the harness and an inference API are available