
perf: Share SentenceTransformer between RAGIndexer and ContextRetriever#157

Open
shrutu0929 wants to merge 1 commit into Refactron-ai:main from shrutu0929:perf/share-sentence-transformer

Conversation


shrutu0929 commented Apr 7, 2026

Solves #151

This change reduces memory footprint and startup latency in the RAG subsystem. Previously, RAGIndexer and ContextRetriever each constructed their own SentenceTransformer instance, duplicating a heavyweight model in memory and slowing workflows that index and then immediately query. The fix is a module-level LRU-cached factory, get_sentence_transformer(), that loads the model weights once per model name. Both constructors now accept either a model name string, which is resolved through the cache so the two classes share a single process-wide instance, or a pre-warmed model object injected directly, bypassing the string lookup entirely. The existing public interface is unchanged.
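As a rough sketch of the caching pattern described above (using a stand-in model class so the example runs without downloading real weights; the actual code wraps SentenceTransformer):

```python
from functools import lru_cache
from typing import Any, List


class _FakeModel:
    """Stand-in for SentenceTransformer; the real loader fetches actual weights."""

    def __init__(self, name: str) -> None:
        self.name = name

    def encode(self, texts: List[str]) -> List[List[float]]:
        # Toy embedding: a one-dimensional vector of text lengths,
        # just to exercise the .encode() interface.
        return [[float(len(t))] for t in texts]


@lru_cache(maxsize=2)
def get_sentence_transformer(model_name: str) -> Any:
    """Return a process-wide shared model instance per model name."""
    return _FakeModel(model_name)  # real code: SentenceTransformer(model_name)


a = get_sentence_transformer("all-MiniLM-L6-v2")
b = get_sentence_transformer("all-MiniLM-L6-v2")
print(a is b)  # True: both callers share one instance
```

Because lru_cache keys on the model name, repeated construction of either class with the same string resolves to the same object, which is where the memory saving comes from.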

Summary by CodeRabbit

  • Refactor
    • Enhanced RAG system to accept embedding models as either string identifiers or pre-constructed objects.
    • Improved memory efficiency through embedding model reuse and caching.


coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

The changes refactor the embedding model initialization in RAGIndexer and ContextRetriever to accept either a string model name or a pre-constructed embedding model object. A module-level caching function get_sentence_transformer() was added to reuse SentenceTransformer instances across initializations, reducing duplicate memory allocation.

Changes

Cohort: Embedding Model Caching and Flexibility
Files: refactron/rag/indexer.py, refactron/rag/retriever.py
Summary: Added a cached get_sentence_transformer() function in the indexer. Updated both classes to accept embedding_model: Any instead of str, with conditional logic that routes string model names through the cached loader and uses pre-constructed objects directly. Imports updated to include the Any type annotation.
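The conditional logic described above might look like the following sketch; the real ContextRetriever constructor takes more parameters, and _load_model is a hypothetical stand-in for the cached get_sentence_transformer() loader:

```python
from typing import Any, List


def _load_model(name: str) -> Any:
    """Hypothetical stand-in for the cached get_sentence_transformer() loader."""

    class _Model:
        def encode(self, texts: List[str]) -> List[List[float]]:
            # Toy embedding: one-dimensional vector of text lengths.
            return [[float(len(t))] for t in texts]

    return _Model()


class ContextRetriever:
    """Sketch of the dual-mode constructor; the real signature differs."""

    def __init__(self, embedding_model: Any = "all-MiniLM-L6-v2") -> None:
        # A string is resolved through the shared cache; any other object is
        # assumed to be a pre-constructed model and used as-is.
        if isinstance(embedding_model, str):
            self.embedding_model = _load_model(embedding_model)
        else:
            self.embedding_model = embedding_model
```

Injected objects bypass the cache entirely, so callers who pre-warm a model keep full control over its lifecycle.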

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Poem

🐰 A rabbit rejoiced at the caching so neat,
No duplicate models, just one bittersweet,
With Any so flexibly wrapping the call,
The memory rejoices—less waste after all! 🌟

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title 'perf: Share SentenceTransformer between RAGIndexer and ContextRetriever' accurately and specifically summarizes the main change, introducing shared caching of SentenceTransformer instances across the two classes.
  • Docstring Coverage ✅ Passed: coverage is 100.00%, above the required 80.00% threshold.



coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
refactron/rag/indexer.py (1)

101-104: ⚠️ Potential issue | 🔴 Critical

Use the model name string instead of the model object in collection metadata.

Storing the raw embedding_model parameter in metadata will fail if a non-string model object is injected, as ChromaDB only supports JSON-serializable primitives in metadata. Use self.embedding_model_name instead, which is guaranteed to be a string.

Suggested patch
         self.collection = self.client.get_or_create_collection(
             name=collection_name,
-            metadata={"embedding_model": embedding_model, "hnsw:space": "cosine"},
+            metadata={"embedding_model": self.embedding_model_name, "hnsw:space": "cosine"},
         )
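The underlying constraint is that ChromaDB metadata values must be JSON-serializable primitives; a plain json.dumps check (with a hypothetical Model class standing in for an injected model object) shows why an object fails where the name string works:

```python
import json


class Model:
    """Stand-in for an injected, non-serializable embedding model object."""
    pass


metadata_bad = {"embedding_model": Model(), "hnsw:space": "cosine"}
metadata_ok = {"embedding_model": "all-MiniLM-L6-v2", "hnsw:space": "cosine"}

try:
    json.dumps(metadata_bad)
except TypeError:
    print("model object is not JSON-serializable")

print(json.dumps(metadata_ok))  # the name string serializes cleanly
```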
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/rag/indexer.py` around lines 101 - 104, The collection metadata
currently stores the raw embedding_model parameter which can be a
non-JSON-serializable object; update the call to
self.client.get_or_create_collection (used to set self.collection) to use the
string property self.embedding_model_name instead of embedding_model in the
metadata (e.g., metadata={"embedding_model": self.embedding_model_name,
"hnsw:space": "cosine"}) so only a JSON-serializable string is stored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 39751ae5-dd2a-41d8-b5dc-4f9539743a5a

📥 Commits

Reviewing files that changed from the base of the PR and between a9659f5 and 1c30b6e.

📒 Files selected for processing (2)
  • refactron/rag/indexer.py
  • refactron/rag/retriever.py

Comment on lines +9 to 10
from functools import lru_cache


⚠️ Potential issue | 🟠 Major

Fix import order/placement to clear pre-commit gates.

Line 35/36 imports appear after executable module code, which triggers flake8 E402 and isort churn in this file. Move refactron.rag imports to the top import block (before function definitions).

Suggested patch
 from __future__ import annotations
 
+from functools import lru_cache
 import json
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Any, Dict, List, Optional, cast
-from functools import lru_cache
+from refactron.rag.chunker import CodeChunk
+from refactron.rag.parser import CodeParser
@@
-@lru_cache(maxsize=2)
-def get_sentence_transformer(model_name: str) -> Any:
+@lru_cache(maxsize=2)
+def get_sentence_transformer(model_name: str) -> Any:
@@
-        )
-    return SentenceTransformer(model_name)
-
-from refactron.rag.chunker import CodeChunk
-from refactron.rag.parser import CodeParser
+        )
+    return SentenceTransformer(model_name)
As per coding guidelines, "Use isort with profile set to 'black' for import sorting".

Also applies to: 33-36

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/rag/indexer.py` around lines 9 - 10, Move the imports for
refactron.rag up into the top import block to satisfy flake8 E402 and isort;
specifically, relocate any late imports currently at lines 35-36 (the refs to
refactron.rag) so they appear with the other imports (e.g., alongside "from
functools import lru_cache") before any function or executable module code
(before function definitions in this module), then run isort (profile=black) /
pre-commit to ensure ordering and remove the E402/isort churn.

Comment on lines +7 to 10
from typing import List, Optional, Any

from refactron.rag.indexer import get_sentence_transformer


⚠️ Potential issue | 🟠 Major

Normalize import ordering to satisfy isort.

This block is out of isort order (typing members should be alphabetized). That’s consistent with the reported pre-commit failure.

Suggested patch
-from typing import List, Optional, Any
+from typing import Any, List, Optional
As per coding guidelines, "Use isort with profile set to 'black' for import sorting".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from typing import List, Optional, Any
+from typing import Any, List, Optional
 from refactron.rag.indexer import get_sentence_transformer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/rag/retriever.py` around lines 7 - 10, The import block is not
isort-compliant; reorder the typing imports alphabetically and place
standard/library imports before local package imports so that the line "from
typing import List, Optional, Any" becomes alphabetized as "from typing import
Any, List, Optional" (or otherwise sorted by isort with black profile) and
ensure "from refactron.rag.indexer import get_sentence_transformer" remains
after the typing import; run isort (profile=black) to verify.

Comment on lines +63 to +66
if isinstance(embedding_model, str):
self.embedding_model = get_sentence_transformer(embedding_model)
else:
self.embedding_model = embedding_model

⚠️ Potential issue | 🟠 Major

Validate injected embedding model interface at initialization.

With embedding_model: Any, non-conforming objects pass constructor checks and fail later at Line 95 (.encode). Fail fast in __init__ with a clear error.

Suggested patch
         if isinstance(embedding_model, str):
             self.embedding_model = get_sentence_transformer(embedding_model)
         else:
+            if not hasattr(embedding_model, "encode") or not callable(embedding_model.encode):
+                raise TypeError("embedding_model must be a model name or an object with encode()")
             self.embedding_model = embedding_model
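A quick illustration of the fail-fast behavior this patch adds, using a toy Retriever class (the real constructor differs, and the cache lookup is stubbed out):

```python
from typing import Any


class Retriever:
    """Toy version of ContextRetriever's injection path with the fail-fast check."""

    def __init__(self, embedding_model: Any) -> None:
        if isinstance(embedding_model, str):
            self.embedding_model = embedding_model  # stand-in for cache lookup
        else:
            if not hasattr(embedding_model, "encode") or not callable(
                embedding_model.encode
            ):
                raise TypeError(
                    "embedding_model must be a model name or an object with encode()"
                )
            self.embedding_model = embedding_model


try:
    Retriever(embedding_model=object())  # no .encode, so rejected at construction
except TypeError as exc:
    print(exc)
```

Any duck-typed object with a callable encode method still passes, so dependency injection keeps working; only non-conforming objects are rejected early instead of failing later at the .encode call site.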
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/rag/retriever.py` around lines 63 - 66, In __init__, validate the
injected embedding_model interface immediately: if embedding_model is a str, set
self.embedding_model = get_sentence_transformer(embedding_model) and then verify
the resulting object (or the provided object when not a str) exposes an encode
attribute that is callable (i.e., hasattr(..., "encode") and
callable(self.embedding_model.encode)); if the check fails raise a TypeError
with a clear message indicating that embedding_model must provide a callable
.encode method so callers don’t hit a runtime failure at the later .encode
usage.
