performance: cache sentence embeddings during chunking #11
Description

@kordless

The Monte Carlo boundary search in `chunker.rs` calls the ONNX embedding model 10–50 times per document to locate semantic split points between sentences. Sentence embeddings computed during candidate sampling are discarded, and the same sentences are re-embedded during boundary refinement.

Caching sentence embeddings (keyed by sentence text or index) within a single `chunk()` call would eliminate redundant ONNX inference and significantly reduce ingest latency for long documents, where chunking is currently the dominant cost.
