perf: Share SentenceTransformer between RAGIndexer and ContextRetriever #157
shrutu0929 wants to merge 1 commit into Refactron-ai:main
Conversation
📝 Walkthrough
The changes refactor the embedding model initialization in RAGIndexer and ContextRetriever.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
refactron/rag/indexer.py (1)
Lines 101-104: ⚠️ Potential issue | 🔴 Critical: Use the model name string instead of the model object in collection metadata.

Storing the raw `embedding_model` parameter in metadata will fail if a non-string model object is injected, as ChromaDB only supports JSON-serializable primitives in metadata. Use `self.embedding_model_name` instead, which is guaranteed to be a string.

Suggested patch

```diff
 self.collection = self.client.get_or_create_collection(
     name=collection_name,
-    metadata={"embedding_model": embedding_model, "hnsw:space": "cosine"},
+    metadata={"embedding_model": self.embedding_model_name, "hnsw:space": "cosine"},
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/rag/indexer.py` around lines 101 - 104, The collection metadata currently stores the raw embedding_model parameter which can be a non-JSON-serializable object; update the call to self.client.get_or_create_collection (used to set self.collection) to use the string property self.embedding_model_name instead of embedding_model in the metadata (e.g., metadata={"embedding_model": self.embedding_model_name, "hnsw:space": "cosine"}) so only a JSON-serializable string is stored.
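The failure mode the reviewer describes is plain JSON serialization. A quick standalone check (no ChromaDB required; the `FakeModel` class is invented here purely for illustration) shows why only the name string is safe to store as metadata:

```python
import json

class FakeModel:  # hypothetical stand-in for an injected model object
    pass

# A string model name serializes fine as collection metadata.
ok = {"embedding_model": "all-MiniLM-L6-v2", "hnsw:space": "cosine"}
print(json.dumps(ok))

# A raw model object is not a JSON primitive, so serialization is rejected.
try:
    json.dumps({"embedding_model": FakeModel(), "hnsw:space": "cosine"})
except TypeError as exc:
    print("rejected:", exc)
```

This is why the patch above stores `self.embedding_model_name`: it is a string regardless of whether the caller passed a name or a model object.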
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@refactron/rag/indexer.py`:
- Around line 9-10: Move the imports for refactron.rag up into the top import
block to satisfy flake8 E402 and isort; specifically, relocate any late imports
currently at lines 35-36 (the refs to refactron.rag) so they appear with the
other imports (e.g., alongside "from functools import lru_cache") before any
function or executable module code (before function definitions in this module),
then run isort (profile=black) / pre-commit to ensure ordering and remove the
E402/isort churn.
In `@refactron/rag/retriever.py`:
- Around line 7-10: The import block is not isort-compliant; reorder the typing
imports alphabetically and place standard/library imports before local package
imports so that the line "from typing import List, Optional, Any" becomes
alphabetized as "from typing import Any, List, Optional" (or otherwise sorted by
isort with black profile) and ensure "from refactron.rag.indexer import
get_sentence_transformer" remains after the typing import; run isort
(profile=black) to verify.
- Around line 63-66: In __init__, validate the injected embedding_model
interface immediately: if embedding_model is a str, set self.embedding_model =
get_sentence_transformer(embedding_model) and then verify the resulting object
(or the provided object when not a str) exposes an encode attribute that is
callable (i.e., hasattr(..., "encode") and
callable(self.embedding_model.encode)); if the check fails raise a TypeError
with a clear message indicating that embedding_model must provide a callable
.encode method so callers don’t hit a runtime failure at the later .encode
usage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39751ae5-dd2a-41d8-b5dc-4f9539743a5a
📒 Files selected for processing (2)
- refactron/rag/indexer.py
- refactron/rag/retriever.py
```python
from functools import lru_cache
```
Fix import order/placement to clear pre-commit gates.
The imports at lines 35-36 appear after executable module code, which triggers flake8 E402 and isort churn in this file. Move the refactron.rag imports to the top import block (before function definitions).
Suggested patch

```diff
 from __future__ import annotations
+from functools import lru_cache
 import json
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Any, Dict, List, Optional, cast
-from functools import lru_cache
+from refactron.rag.chunker import CodeChunk
+from refactron.rag.parser import CodeParser
@@
-@lru_cache(maxsize=2)
-def get_sentence_transformer(model_name: str) -> Any:
+@lru_cache(maxsize=2)
+def get_sentence_transformer(model_name: str) -> Any:
@@
-    )
-    return SentenceTransformer(model_name)
-
-from refactron.rag.chunker import CodeChunk
-from refactron.rag.parser import CodeParser
+    )
+    return SentenceTransformer(model_name)
```

Also applies to: 33-36
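The shared-factory pattern this function implements can be sketched in isolation. The function name and `lru_cache(maxsize=2)` mirror the PR; a stub class stands in for `sentence_transformers.SentenceTransformer` so the sketch runs without the heavy dependency:

```python
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=2)
def get_sentence_transformer(model_name: str) -> Any:
    # In the real module this constructs SentenceTransformer(model_name);
    # a stub is used here so the sketch runs without the dependency.
    class _StubModel:
        def __init__(self, name: str) -> None:
            self.name = name

        def encode(self, texts: list) -> list:
            # One dummy embedding per input text.
            return [[0.0] for _ in texts]

    return _StubModel(model_name)


# Repeated lookups with the same name return the same cached instance,
# which is how RAGIndexer and ContextRetriever end up sharing one model.
a = get_sentence_transformer("all-MiniLM-L6-v2")
b = get_sentence_transformer("all-MiniLM-L6-v2")
assert a is b
```

Because `lru_cache` keys on the model-name string, the expensive construction happens once per name per process; a second name simply occupies the other cache slot.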
```python
from typing import List, Optional, Any

from refactron.rag.indexer import get_sentence_transformer
```
Normalize import ordering to satisfy isort.
This block is out of isort order (typing members should be alphabetized). That’s consistent with the reported pre-commit failure.
Suggested patch

```diff
-from typing import List, Optional, Any
+from typing import Any, List, Optional
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from typing import Any, List, Optional
from refactron.rag.indexer import get_sentence_transformer
```
```python
if isinstance(embedding_model, str):
    self.embedding_model = get_sentence_transformer(embedding_model)
else:
    self.embedding_model = embedding_model
```
Validate injected embedding model interface at initialization.
With embedding_model: Any, non-conforming objects pass constructor checks and fail later at Line 95 (.encode). Fail fast in __init__ with a clear error.
Suggested patch
if isinstance(embedding_model, str):
self.embedding_model = get_sentence_transformer(embedding_model)
else:
+ if not hasattr(embedding_model, "encode") or not callable(embedding_model.encode):
+ raise TypeError("embedding_model must be a model name or an object with encode()")
self.embedding_model = embedding_model🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/rag/retriever.py` around lines 63 - 66, In __init__, validate the
injected embedding_model interface immediately: if embedding_model is a str, set
self.embedding_model = get_sentence_transformer(embedding_model) and then verify
the resulting object (or the provided object when not a str) exposes an encode
attribute that is callable (i.e., hasattr(..., "encode") and
callable(self.embedding_model.encode)); if the check fails raise a TypeError
with a clear message indicating that embedding_model must provide a callable
.encode method so callers don’t hit a runtime failure at the later .encode
usage.
Solves #151
The recent updates significantly reduce the memory footprint and startup latency of the retrieval-augmented generation (RAG) subsystem. Previously, the RAGIndexer and ContextRetriever classes each initialized their own instance of the heavyweight SentenceTransformer model, causing redundant memory allocation and delayed workload execution, which was especially noticeable in workflows that index and then immediately query.

We resolved this by implementing a module-level LRU-cached factory that centrally manages the heavy model weights. Both constructors now accept either a model-name string, which resolves through this cache so the two classes safely share a single process-wide model instance, or a pre-warmed model object injected directly, bypassing the string lookup entirely. Backward compatibility with the existing public interface is strictly preserved.
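Under the constructor shape the description outlines, the two usage paths look roughly like the sketch below. Class and parameter names are taken from the PR, but the bodies are simplified and a stub replaces the real SentenceTransformer so the sketch is self-contained:

```python
from functools import lru_cache
from typing import Any, Union


@lru_cache(maxsize=2)
def get_sentence_transformer(model_name: str) -> Any:
    class _StubModel:  # stands in for SentenceTransformer(model_name)
        def __init__(self, name: str) -> None:
            self.name = name

        def encode(self, texts: list) -> list:
            return [[0.0] for _ in texts]

    return _StubModel(model_name)


class RAGIndexer:
    def __init__(self, embedding_model: Union[str, Any] = "all-MiniLM-L6-v2"):
        # A string keeps the old behaviour but resolves through the shared
        # cache; a pre-warmed object is used as-is (dependency injection).
        if isinstance(embedding_model, str):
            self.embedding_model = get_sentence_transformer(embedding_model)
        else:
            self.embedding_model = embedding_model


class ContextRetriever:
    def __init__(self, embedding_model: Union[str, Any] = "all-MiniLM-L6-v2"):
        if isinstance(embedding_model, str):
            self.embedding_model = get_sentence_transformer(embedding_model)
        else:
            self.embedding_model = embedding_model


# Both classes resolve the same string to one process-wide instance.
indexer = RAGIndexer("all-MiniLM-L6-v2")
retriever = ContextRetriever("all-MiniLM-L6-v2")
assert indexer.embedding_model is retriever.embedding_model
```

The injection path simply skips the cache: passing an already-constructed model to either constructor stores that exact object, which is useful for tests or for callers that manage model lifetimes themselves.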