Investigation: context-1 agentic search model for document/memory retrieval #552

@tombedor

Description

Summary

This issue documents an investigation into using chromadb/context-1 as a search backend for Elroy's memory and document retrieval.

Conclusion: Not yet viable for integration, but worth tracking. When the agent harness is released, it could be a strong fit for search_documents on large corpora.


What is context-1?

Context-1 is a 20B-parameter agentic search model published by Chroma in March 2026 (see the accompanying research paper). It is derived from gpt-oss-20B and achieves retrieval performance comparable to frontier LLMs at significantly lower cost, with up to 10x faster inference.

It is NOT a traditional embedding model. Instead, it's a retrieval subagent that:

  • Operates in an observe-reason-act loop, calling a search_corpus tool repeatedly
  • Decomposes complex queries into targeted subqueries (averages 2.56 tool calls/turn)
  • Self-edits its context mid-search — pruning irrelevant results to free space for further exploration (0.94 prune accuracy)
  • Returns a ranked list of relevant documents to a downstream answering model
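The loop described above can be sketched roughly as follows. This is an illustrative reconstruction from the paper's description, not the released harness: `search_corpus` is a toy stand-in for the hybrid retrieval tool, and the self-editing step is simplified to a single relevance filter.

```python
# Hypothetical sketch of an observe-reason-act retrieval loop.
# All names here (search_corpus, agentic_search, SearchState) are
# illustrative, not the actual context-1 API.
from dataclasses import dataclass, field


@dataclass
class SearchState:
    query: str
    context: list = field(default_factory=list)  # retained excerpts


def search_corpus(subquery: str) -> list:
    # Toy stand-in for the hybrid sparse+dense search tool.
    corpus = {
        "vector search": ["doc-embeddings", "doc-ann-index"],
        "rank fusion": ["doc-rrf", "doc-reranker"],
    }
    return corpus.get(subquery, ["noise-result"])


def agentic_search(query: str, subqueries: list, budget: int = 4) -> list:
    state = SearchState(query)
    for sub in subqueries[:budget]:   # reason: pick the next targeted subquery
        hits = search_corpus(sub)     # act: call the search tool
        state.context.extend(hits)    # observe: add results to context
        # self-edit: prune excerpts judged irrelevant to free context space
        state.context = [d for d in state.context if d.startswith("doc-")]
    return state.context              # ranked docs for the answering model


print(agentic_search("how does hybrid retrieval work?",
                     ["vector search", "rank fusion", "unrelated"]))
# → ['doc-embeddings', 'doc-ann-index', 'doc-rrf', 'doc-reranker']
```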

The search_corpus tool it calls internally combines:

  • Sparse vector search (keyword)
  • Dense embedding search (semantic)
  • Reciprocal Rank Fusion (RRF) to merge results
  • A reranker to select top results within a token budget
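Of the pieces above, RRF is simple enough to show inline: each document's fused score is the sum of 1/(k + rank) over every ranking it appears in, with k conventionally set to 60. A minimal sketch (not Chroma's implementation):

```python
# Reciprocal Rank Fusion: merge multiple ranked lists into one.
def rrf(rankings: list, k: int = 60) -> list:
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # Documents ranked highly in any list accumulate more score;
            # k damps the influence of top ranks.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


sparse = ["a", "b", "c"]   # keyword-search ranking
dense = ["b", "c", "d"]    # embedding-search ranking
print(rrf([sparse, dense]))
# → ['b', 'c', 'a', 'd']  (b and c appear in both lists, so they win)
```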

How Elroy currently does search

Elroy uses a single-pass embedding lookup:

  1. Query text → embedding via LiteLLM (text-embedding-3-small, 1536 dims)
  2. Embedding stored/queried in ChromaDB (L2 distance)
  3. Returns top results under l2_memory_relevance_distance_threshold
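The three steps above can be sketched as follows, with a stub in place of the LiteLLM call and toy 2-dimensional vectors standing in for real 1536-dim text-embedding-3-small outputs (the function names mirror the real ones, but the bodies are illustrative):

```python
# Minimal sketch of Elroy's single-pass lookup. get_embedding and
# query_vector mirror the real function names; the implementations here
# are toy stand-ins, not the actual Elroy code.
import math


def get_embedding(text: str) -> list:
    # Stand-in: the real code calls text-embedding-3-small via LiteLLM.
    return [float(len(text)), float(text.count(" "))]


def l2(a: list, b: list) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_vector(query: str, stored: dict, threshold: float) -> list:
    # Step 1: embed the query; Step 2: compare against stored vectors;
    # Step 3: keep only hits under the L2 distance threshold.
    q = get_embedding(query)
    hits = [(l2(q, vec), doc) for doc, vec in stored.items()]
    return sorted((d, doc) for d, doc in hits if d < threshold)


stored = {
    "note on cats": get_embedding("cats are great"),
    "note on dbs": get_embedding("chromadb stores embeddings efficiently"),
}
print(query_vector("cats are nice", stored, threshold=5.0))
# → [(1.0, 'note on cats')]
```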

Relevant files:

  • get_embedding() in elroy/llm/client.py:155
  • query_vector(), get_most_relevant_memories(), search_documents() in elroy/repository/recall/queries.py
  • query_vector() ChromaDB call in elroy/db/db_session.py:165

Why integration isn't viable today

1. Agent harness is not yet public

The paper explicitly states: "Context-1 is designed to operate within a specific agent harness that manages tool execution, token budgets, context pruning, and deduplication — that harness is not yet public." Without it, the model weights alone won't reproduce the reported results.

2. No hosted inference API

As of March 2026, context-1 is not deployed by any HuggingFace inference provider. Self-hosting a 20B model requires substantial GPU resources.

3. Wrong paradigm for Elroy's primary use case

Elroy's main retrieval pattern is simple, low-latency, single-hop: retrieve 2 relevant memories per conversation turn. context-1's multi-turn agentic loop would add significant latency and cost to every message.

4. Not a drop-in replacement

It cannot substitute for the embedding model in EmbeddingModel / get_embedding(). The existing ChromaDB + L2 search pipeline would need to be redesigned.


Where context-1 could fit (future)

The search_documents tool (elroy/repository/recall/queries.py:72) is the most natural fit. It's used for complex document queries — exactly the multi-hop retrieval scenario context-1 excels at.

A potential future architecture:

  • Keep existing embedding search for memory/reminder recall (simple, latency-sensitive)
  • Use context-1 as an optional deep search backend for search_documents on large ingested corpora
  • Expose as a configurable option (e.g., document_search_backend: context1 | embedding)

This would require:

  1. Chroma releasing the agent harness
  2. An accessible inference API (Chroma Cloud, HuggingFace, or self-hosted)
  3. Building a context-1 client that wraps the observe-reason-act loop and exposes a search(query) -> List[DocumentExcerpt] interface
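One way the backend-selection point from item 3 could look, assuming a hypothetical DocumentSearchBackend protocol. Every class name here is an assumption for illustration, not an existing Elroy or Chroma API:

```python
# Hypothetical backend abstraction for search_documents; all names
# (DocumentExcerpt, DocumentSearchBackend, Context1Backend) are
# illustrative, not existing Elroy or Chroma APIs.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class DocumentExcerpt:
    source: str
    text: str


class DocumentSearchBackend(Protocol):
    def search(self, query: str) -> List[DocumentExcerpt]: ...


class EmbeddingBackend:
    """Today's path: single-pass embedding lookup (stubbed)."""

    def search(self, query: str) -> List[DocumentExcerpt]:
        return [DocumentExcerpt("memory", f"embedding hit for {query!r}")]


class Context1Backend:
    """Future path: would wrap the observe-reason-act loop."""

    def search(self, query: str) -> List[DocumentExcerpt]:
        raise NotImplementedError("blocked on agent harness + hosted inference")


def get_backend(name: str) -> DocumentSearchBackend:
    # Maps a config value like document_search_backend to an implementation.
    return {"embedding": EmbeddingBackend, "context1": Context1Backend}[name]()


print(get_backend("embedding").search("large corpus question")[0].source)
# → memory
```

Keeping both backends behind one search(query) interface would let the context-1 path ship as opt-in without touching memory/reminder recall.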

Action items

  • Watch chroma-core/chroma for agent harness release
  • Re-evaluate once Chroma Cloud or a hosted API becomes available
  • When available: prototype context-1 backend for search_documents as opt-in feature
