The Monte Carlo boundary search in `chunker.rs` calls the ONNX embedding model 10–50 times per document to locate semantic split points between sentences. Sentence embeddings computed during candidate sampling are discarded, and the same sentences are re-embedded during boundary refinement.
Caching sentence embeddings (keyed by sentence text or index) within a single `chunk()` call would eliminate redundant ONNX inference and significantly reduce ingest latency for long documents, where chunking is currently the dominant cost.
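A minimal sketch of the idea, assuming nothing about the actual `chunker.rs` internals: the `Embedder` stand-in and `EmbeddingCache` below are hypothetical names, with a counter in place of real ONNX inference so the effect of the cache is visible. The cache lives only for the duration of one `chunk()`-style call and is keyed by sentence index, so repeated passes (candidate sampling, then refinement) embed each sentence at most once.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the ONNX embedding call; counts invocations
/// so the number of avoided redundant inferences is observable.
struct Embedder {
    calls: usize,
}

impl Embedder {
    fn embed(&mut self, sentence: &str) -> Vec<f32> {
        self.calls += 1;
        // Toy "embedding" (byte sum), purely illustrative.
        vec![sentence.bytes().map(|b| b as f32).sum()]
    }
}

/// Per-call cache keyed by sentence index; dropped when chunking returns,
/// so it adds no long-lived memory overhead.
struct EmbeddingCache {
    cached: HashMap<usize, Vec<f32>>,
}

impl EmbeddingCache {
    fn new() -> Self {
        Self { cached: HashMap::new() }
    }

    /// Return the embedding for sentence `idx`, computing it at most once.
    fn get_or_embed(
        &mut self,
        idx: usize,
        sentence: &str,
        embedder: &mut Embedder,
    ) -> &Vec<f32> {
        self.cached
            .entry(idx)
            .or_insert_with(|| embedder.embed(sentence))
    }
}

fn main() {
    let sentences = ["First sentence.", "Second sentence.", "Third sentence."];
    let mut embedder = Embedder { calls: 0 };
    let mut cache = EmbeddingCache::new();

    // Candidate sampling and boundary refinement both revisit the same
    // sentences; with the cache, each sentence is embedded exactly once
    // no matter how many passes run.
    for _pass in 0..5 {
        for (i, s) in sentences.iter().enumerate() {
            let _emb = cache.get_or_embed(i, s, &mut embedder);
        }
    }

    println!("embed calls: {}", embedder.calls);
    assert_eq!(embedder.calls, sentences.len());
}
```

Keying by index is cheaper than hashing full sentence text, but keying by text would additionally deduplicate identical sentences appearing at different positions; either works as long as the key is stable across sampling and refinement.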