
[EPIC] The Rust API for CocoIndex v1.0's engine #132

@bashandbone


With upstream's move to v1.0.0, Recoco needs a complete Rust API

Recoco is building the first and only pure-Rust API for the incremental data processing engine that powers CocoIndex. Because upstream has committed fully to Python for its v1 user-facing API, Recoco must provide a native Rust experience: typed operations, proc macros, explicit context management, and zero Python dependencies.

If you're looking for a Rust API for CocoIndex, this is it.

Upstream context

The CocoIndex community has expressed clear demand for a Rust API:

Upstream's v1 milestone is exclusively Python-focused. Neither Rust API issue is milestoned. The v1 branch has rebuilt all API and integration layers in Python, with the Rust engine serving only as an internal runtime.

Recoco's v1.0.0 branch has full parity with upstream's Rust engine: it shares the same high-performance core (LMDB-backed state, component memoization, target reconciliation, incremental processing). With upstream diverging, though, we need to plan a proper all-Rust API while staying in sync with, and benefiting from, upstream's engine development. I would have preferred more alignment, but it is what it is: upstream's motivations and needs differ from mine. Their direction makes sense for them, but it doesn't give me the foundation I need for Thread.

This issue will serve as the tracking and planning issue for Recoco's v1 design and API.

What we're building

Goals:

  • Continue to match upstream's engine in recoco-core, optimizing for Rust where it makes sense and contributing improvements upstream where our directions and needs align. (I've admittedly been bad about this; I need to do better.)
  • Continue our obsessive focus on feature gating and performance optimization.
  • Build the Rust API for upstream's engine. To the extent practical, this means carrying the idioms and API surfaces of upstream's Python API over to Rust. There will necessarily be large divergences: Rust is a very different language, and Rustaceans expect different things from their APIs.
  • Aim for parity with upstream's integrations in Rust (what were functions, sources, and targets before v1), granularly feature gated without compromise. I'm hoping we pick up some community interest to help in this department.
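
```rust
// Proposed API (not yet implemented): a memoized component function.
#[recoco::function(memo)]
async fn process_file(ctx: &Ctx, file: FileEntry, table: &TableTarget) -> Result<()> {
    let text = file.read_text().await?;
    // Split the file into chunks, then embed each chunk with a shared resource from the context.
    let chunks = RecursiveSplitter::default().split(&text, SplitOptions::default());
    let embedder = ctx.use_resource(&EMBEDDER);
    for chunk in &chunks {
        let embedding = embedder.embed(&chunk.text).await?;
        // Declare the desired target row; the engine reconciles it with the target.
        table.declare_row(/* ... */)?;
    }
    Ok(())
}
```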

Programming model: persistent-state-driven. Transformations are plain Rust async functions (async fn). The engine handles incrementality, memoization, lineage, and fault tolerance transparently. There is no DSL, no graph builder, and no string-based dispatch.
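A minimal, std-only sketch of the bookkeeping behind that incrementality, assuming each source item has a stable key and a content fingerprint. A HashMap stands in for the LMDB-backed state, DefaultHasher stands in for a stable hash such as blake3, and the names Action and plan_run are hypothetical illustrations, not Recoco's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Hypothetical: what the engine decides to do for one component key.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Process(String), // new or changed: re-run the user function
    Skip(String),    // fingerprint unchanged: reuse previous outputs
    Remove(String),  // key disappeared from the source: retract its outputs
}

/// Stand-in fingerprint. DefaultHasher is only stable within one build;
/// persisted state would need a stable hash (e.g. blake3).
fn fingerprint(content: &str) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

/// Compare the current source snapshot with fingerprints saved by the last run,
/// updating the saved state in place.
fn plan_run(state: &mut HashMap<String, u64>, source: &BTreeMap<String, String>) -> Vec<Action> {
    let mut actions = Vec::new();
    for (key, content) in source {
        let fp = fingerprint(content);
        match state.insert(key.clone(), fp) {
            Some(prev) if prev == fp => actions.push(Action::Skip(key.clone())),
            _ => actions.push(Action::Process(key.clone())),
        }
    }
    // Keys saved earlier but absent now were deleted from the source.
    let mut stale: Vec<String> = state
        .keys()
        .filter(|k| !source.contains_key(*k))
        .cloned()
        .collect();
    stale.sort();
    for key in stale {
        state.remove(&key);
        actions.push(Action::Remove(key));
    }
    actions
}

fn main() {
    let mut state = HashMap::new();
    let mut source = BTreeMap::new();
    source.insert("a.md".to_string(), "hello".to_string());
    source.insert("b.md".to_string(), "world".to_string());
    // [Process("a.md"), Process("b.md")]
    println!("{:?}", plan_run(&mut state, &source));

    source.insert("b.md".to_string(), "world, edited".to_string());
    source.remove("a.md");
    source.insert("c.md".to_string(), "new".to_string());
    // [Process("b.md"), Process("c.md"), Remove("a.md")]
    println!("{:?}", plan_run(&mut state, &source));
}
```

In this sketch the second run reprocesses only the edited and new items and retracts the deleted one; unchanged items would be reported as Skip.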

Key features:

  • #[recoco::function] proc macro with memo, batching, and code-hash-based cache invalidation (using our blake3 implementation rather than upstream's blake2)
  • Environment + App for LMDB-backed persistent state
  • Ctx with typed ContextKey<T> for explicit resource management
  • mount_each() for parallel, incremental component processing
  • Sources as iterators, functions as direct method calls, targets as declarative mounts
  • Feature-gated everything: compile only what you use
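The memo and code-hash invalidation idea can be sketched with the standard library alone: the cache key mixes a fingerprint of the function's code with a fingerprint of its input, so editing the function invalidates old entries even when the input is unchanged. This assumes the proc macro would supply the code fingerprint; here a hand-written version string stands in for it, DefaultHasher stands in for blake3, and memo_key and MemoCache are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Combine a code fingerprint and an input fingerprint into one cache key.
/// DefaultHasher is a stand-in: it is not stable across Rust releases, so a
/// persisted cache would need a stable hash such as blake3.
fn memo_key<I: Hash>(code_fingerprint: &str, input: &I) -> u64 {
    let mut h = DefaultHasher::new();
    code_fingerprint.hash(&mut h);
    input.hash(&mut h);
    h.finish()
}

/// Hypothetical memo cache keyed by (code, input) fingerprints.
struct MemoCache<O> {
    entries: HashMap<u64, O>,
    misses: usize, // how many times the wrapped function actually ran
}

impl<O: Clone> MemoCache<O> {
    fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Return the cached output if this (code, input) pair was seen; otherwise run `f`.
    fn call<I: Hash>(&mut self, code_fingerprint: &str, input: &I, f: impl FnOnce(&I) -> O) -> O {
        let key = memo_key(code_fingerprint, input);
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.misses += 1;
        let out = f(input);
        self.entries.insert(key, out.clone());
        out
    }
}

fn main() {
    let mut cache = MemoCache::new();
    let count_words = |s: &String| s.split_whitespace().count();

    cache.call("count_words@v1", &"a b c".to_string(), count_words);
    cache.call("count_words@v1", &"a b c".to_string(), count_words); // cache hit
    cache.call("count_words@v2", &"a b c".to_string(), count_words); // code changed: re-run
    println!("misses = {}", cache.misses); // misses = 2
}
```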

Design ancestry

Our API design draws directly from @tomz-alt's proposal in cocoindex-io/cocoindex#1667, refined with feedback from upstream maintainers and adapted for Recoco's pure-Rust context. We diverge where Rust idioms demand it (e.g., ContextKey<T> over type-erased lookup, Environment/App separation, richer target abstractions).
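To illustrate the ContextKey<T> choice: storage can stay type-erased internally while every access goes through a key whose type parameter fixes the value type, so callers never downcast by hand. This is a minimal sketch; Ctx::provide and the panic-on-missing behavior are assumptions, and use_resource and EMBEDDER only mirror the names in the process_file example.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::marker::PhantomData;
use std::sync::Arc;

/// A typed handle: `name` identifies the slot, `T` fixes the value's type at compile time.
pub struct ContextKey<T> {
    name: &'static str,
    _ty: PhantomData<fn() -> T>, // keeps the key Send + Sync regardless of T
}

impl<T> ContextKey<T> {
    pub const fn new(name: &'static str) -> Self {
        Self { name, _ty: PhantomData }
    }
}

/// Hypothetical context: type-erased storage, typed access only.
#[derive(Default, Clone)]
pub struct Ctx {
    resources: HashMap<&'static str, Arc<dyn Any + Send + Sync>>,
}

impl Ctx {
    pub fn provide<T: Send + Sync + 'static>(&mut self, key: &ContextKey<T>, value: T) {
        self.resources.insert(key.name, Arc::new(value));
    }

    /// Panics if the resource is missing; a real API might return a Result instead.
    pub fn use_resource<T: Send + Sync + 'static>(&self, key: &ContextKey<T>) -> &T {
        self.resources
            .get(key.name)
            .and_then(|v| v.downcast_ref::<T>())
            .unwrap_or_else(|| panic!("resource `{}` not provided or has a different type", key.name))
    }
}

// Hypothetical resource type for illustration.
struct Embedder {
    dims: usize,
}

static EMBEDDER: ContextKey<Embedder> = ContextKey::new("embedder");

fn main() {
    let mut ctx = Ctx::default();
    ctx.provide(&EMBEDDER, Embedder { dims: 384 });
    let embedder: &Embedder = ctx.use_resource(&EMBEDDER); // no manual downcast at the call site
    println!("dims = {}", embedder.dims);
}
```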

@tomz-alt — if you're interested in contributing to a Rust implementation of the ideas you proposed, we'd welcome you. Your design work and understanding of the engine are exactly what this project needs. You mentioned having a partial implementation; we'd love to build on that foundation.

Current state

| Component | Status | Location |
|---|---|---|
| Engine (LMDB, components, memoization) | Done | v1.0.0 branch, crates/core/ |
| Operations (sources, functions, targets) | Done (old arch) | main branch, crates/recoco-core/src/ops/ |
| Concrete RecocoProfile | Not started | |
| Ctx / ContextKey / Environment / App API | Not started | |
| #[recoco::function] proc macro | Not started | |
| Operation port (old arch → new API) | Not started | |

The engine is ready. The operations exist but are wired for the pre-v1 FlowBuilder architecture. This epic tracks bridging them with a Rust-native API layer.

Implementation plan

Phase 1 (Foundation): concrete RecocoProfile, Ctx, ContextKey<T>, Environment, App
Phase 2 (Proc macros): recoco-macros crate, #[recoco::function(memo, batching)]
Phase 3 (Port operations): sources, functions, and targets as standalone Rust types
Phase 4 (Transient mode): one-shot execution without LMDB persistence
Phase 5 (Query handlers): search endpoints, graph DB mappings, reusable transforms
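Phase 4 implies that the engine's state access sits behind some abstraction with both a persistent (LMDB) and a transient backend. A hypothetical sketch of that shape, assuming a StateStore trait; only an in-memory transient store is shown, since an LMDB backend would need an external crate.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over where per-component state is kept.
trait StateStore {
    fn get(&self, key: &str) -> Option<u64>;
    fn put(&mut self, key: &str, fingerprint: u64);
}

/// Transient backend: state lives only as long as this value.
#[derive(Default)]
struct TransientStore {
    map: HashMap<String, u64>,
}

impl StateStore for TransientStore {
    fn get(&self, key: &str) -> Option<u64> {
        self.map.get(key).copied()
    }
    fn put(&mut self, key: &str, fingerprint: u64) {
        self.map.insert(key.to_string(), fingerprint);
    }
}

// A persistent (e.g. LMDB-backed) store would implement the same trait.

/// Pipeline code is generic over the store; returns how many items were (re)processed.
fn run<S: StateStore>(store: &mut S, items: &[(&str, u64)]) -> usize {
    let mut processed = 0;
    for &(key, fp) in items {
        if store.get(key) != Some(fp) {
            processed += 1; // the user's function would run here
            store.put(key, fp);
        }
    }
    processed
}

fn main() {
    let items = [("a", 1u64), ("b", 2u64)];
    // One-shot: a fresh transient store processes everything.
    println!("{}", run(&mut TransientStore::default(), &items)); // 2
    // Reusing a store across runs (as a persistent backend would) skips unchanged items.
    let mut store = TransientStore::default();
    run(&mut store, &items);
    println!("{}", run(&mut store, &items)); // 0
}
```

Because the pipeline is generic over S, a one-shot run and a persistent run can share the same user code; only the store differs.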

Areas requiring deep design

Each of these will get its own issue before implementation begins:

  • RecocoProfile concrete types: 8 associated types that define the entire API surface
  • Proc macro design: #[recoco::function] code generation, error quality, compile-fail tests
  • Ctx lifetime and ownership: thread safety, child contexts, reference ergonomics
  • Target state reconciliation: serialization format, row identity, diffing, backend mapping
  • Incremental source change detection: engine integration, CDC, file watching
  • Crate structure: workspace organization, feature flag design, public API paths
  • Error handling: thiserror types for the public API, propagation through the engine
  • Operation porting strategy: priority order, standard patterns, config migration
  • Docs: update the v1 docs to reflect the new API surface.
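The diffing step of target state reconciliation, sketched under the assumption that each row has a stable identity key and comparable values. BTreeMap stands in for the target's current contents, and reconcile and Mutation are hypothetical names; serialization and backend mapping, which this issue leaves open, are not modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical mutation emitted for a target backend.
#[derive(Debug, PartialEq)]
enum Mutation<K, V> {
    Upsert(K, V),
    Delete(K),
}

/// Diff the rows a run declared (`desired`) against what the target holds
/// (`existing`), keyed by row identity. Unchanged rows produce no mutation.
fn reconcile<K: Ord + Clone, V: PartialEq + Clone>(
    existing: &BTreeMap<K, V>,
    desired: &BTreeMap<K, V>,
) -> Vec<Mutation<K, V>> {
    let mut out = Vec::new();
    // New or changed rows become upserts.
    for (k, v) in desired {
        if existing.get(k) != Some(v) {
            out.push(Mutation::Upsert(k.clone(), v.clone()));
        }
    }
    // Rows no longer declared become deletes.
    for k in existing.keys() {
        if !desired.contains_key(k) {
            out.push(Mutation::Delete(k.clone()));
        }
    }
    out
}

fn main() {
    let existing = BTreeMap::from([(1, "old"), (2, "same"), (3, "gone")]);
    let desired = BTreeMap::from([(1, "new"), (2, "same"), (4, "added")]);
    // Upsert(1, "new"), Upsert(4, "added"), Delete(3)
    for m in reconcile(&existing, &desired) {
        println!("{:?}", m);
    }
}
```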

Related links

Upstream Rust API demand:

Upstream v1 direction (Python-only):

Recoco design documents:

Metadata

Assignees: no one assigned
Labels: documentation, enhancement, help wanted, needs-design, upstream-sync, v1.0.0
Status: Backlog
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests