perf: Batch embeddings in RAGIndexer instead of per-file encode #150

@omsherikar

Description

Scope

refactron/rag/indexer.py: add_chunks / index_repository

Problem

Each file's path through _index_file -> add_chunks calls self.embedding_model.encode(documents, ...) for only that file's chunks. SentenceTransformer/GPU throughput is much higher with large batches (e.g. hundreds of texts per call) than with many small calls.
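A toy illustration of the call-count difference (the CountingEncoder stub and the file/chunk data below are hypothetical stand-ins, not the real SentenceTransformer or indexer):

```python
class CountingEncoder:
    """Fake embedding model that counts how often encode() is invoked."""
    def __init__(self):
        self.calls = 0

    def encode(self, texts):
        self.calls += 1
        return [[float(len(t))] for t in texts]  # fake embeddings

files = {"a.py": ["chunk1", "chunk2"], "b.py": ["chunk3"], "c.py": ["chunk4"]}

# Current pattern: one encode() call per file, however small.
per_file = CountingEncoder()
for chunks in files.values():
    per_file.encode(chunks)

# Batched pattern: a single encode() call over all accumulated texts.
batched = CountingEncoder()
batched.encode([c for chunks in files.values() for c in chunks])

print(per_file.calls, batched.calls)  # 3 vs 1
```

Each encode() call pays fixed overhead (tokenization setup, kernel launch, host/device transfer on GPU), which is why fewer, larger calls win.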

Suggested direction

  • Accumulate (chunks, documents) across files (with a max batch size / memory cap), then encode in batches and call collection.add on each corresponding batch.
  • Optional: configurable --batch-size for index CLI.
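The accumulation idea above could be sketched roughly as follows. All names here are hypothetical; encoder.encode(list[str]) and collection.add(ids=..., documents=..., embeddings=...) are assumed to mirror the SentenceTransformer and Chroma-style collection interfaces the issue describes:

```python
def index_files(files, encoder, collection, batch_size=256):
    """Accumulate chunk texts across files, flushing every `batch_size` texts.

    files: mapping of path -> list of chunk texts (hypothetical shape).
    """
    ids, docs = [], []

    def flush():
        if not docs:
            return
        embeddings = encoder.encode(docs)  # one large encode call per batch
        collection.add(ids=list(ids), documents=list(docs), embeddings=embeddings)
        ids.clear()
        docs.clear()

    for path, chunks in files.items():
        for i, text in enumerate(chunks):
            ids.append(f"{path}:{i}")
            docs.append(text)
            if len(docs) >= batch_size:
                flush()
    flush()  # don't drop the final partial batch
```

The batch_size cap doubles as the memory cap mentioned above, and would map naturally onto the proposed --batch-size CLI option.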

Acceptance

  • Benchmark: indexing N files shows fewer encode invocations and lower wall-clock time on representative repos (document rough numbers in PR).
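One way to produce the "fewer encode invocations" numbers for the PR would be to wrap the model's encode with a counting/timing decorator. The instrument helper and the stub model below are hypothetical, not part of the codebase:

```python
import functools
import time

def instrument(encode_fn, stats):
    """Wrap an encode function to record call count, text count, and time."""
    @functools.wraps(encode_fn)
    def wrapped(texts, *args, **kwargs):
        stats["calls"] += 1
        stats["texts"] += len(texts)
        t0 = time.perf_counter()
        out = encode_fn(texts, *args, **kwargs)
        stats["seconds"] += time.perf_counter() - t0
        return out
    return wrapped

stats = {"calls": 0, "texts": 0, "seconds": 0.0}
encode = instrument(lambda texts: [[0.0]] * len(texts), stats)  # stub model
encode(["a", "b"])
encode(["c"])
print(stats["calls"], stats["texts"])  # 2 calls over 3 texts
```

Running the same repo through the per-file and batched paths with this wrapper in place would give the rough call-count and wall-clock numbers the acceptance criterion asks for.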
