21 changes: 18 additions & 3 deletions refactron/rag/indexer.py
@@ -6,6 +6,7 @@
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional, cast
from functools import lru_cache

Comment on lines +9 to 10
⚠️ Potential issue | 🟠 Major

Fix import order/placement to clear pre-commit gates.

Line 35/36 imports appear after executable module code, which triggers flake8 E402 and isort churn in this file. Move refactron.rag imports to the top import block (before function definitions).

Suggested patch

 from __future__ import annotations
 
 import json
 from dataclasses import dataclass
+from functools import lru_cache
 from pathlib import Path
 from typing import Any, Dict, List, Optional, cast
-from functools import lru_cache
+
+from refactron.rag.chunker import CodeChunk
+from refactron.rag.parser import CodeParser
@@
         )
     return SentenceTransformer(model_name)
-
-from refactron.rag.chunker import CodeChunk
-from refactron.rag.parser import CodeParser
As per coding guidelines, "Use isort with profile set to 'black' for import sorting".

Also applies to: 33-36

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `refactron/rag/indexer.py` around lines 9-10, move the refactron.rag imports up into the top import block to satisfy flake8 E402 and isort. Specifically, relocate the late imports currently at lines 35-36 (the refactron.rag references) so they sit with the other imports (e.g., alongside "from functools import lru_cache") before any function definitions or other executable module code, then run isort (profile=black) / pre-commit to confirm the ordering and clear the E402/isort churn.

try:
import chromadb
@@ -19,6 +20,16 @@
SentenceTransformer = None
CHROMA_AVAILABLE = False

@lru_cache(maxsize=2)
def get_sentence_transformer(model_name: str) -> Any:
"""Module-level LRU cache for SentenceTransformer to avoid duplicate memory allocation."""
if not SentenceTransformer:
raise RuntimeError(
"sentence-transformers is not available. "
"Install with: pip install sentence-transformers"
)
return SentenceTransformer(model_name)

from refactron.rag.chunker import CodeChunk
from refactron.rag.parser import CodeParser
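The cached loader above can be exercised in isolation. A minimal sketch, using a stand-in class instead of the real SentenceTransformer (which is heavy to load), showing that `lru_cache` returns the same instance for repeated model names:

```python
from functools import lru_cache


class FakeModel:
    """Stand-in for SentenceTransformer, which allocates a large model in memory."""

    def __init__(self, name: str) -> None:
        self.name = name


@lru_cache(maxsize=2)
def get_model(name: str) -> FakeModel:
    # One instance per model name; repeated calls return the cached object.
    return FakeModel(name)


a = get_model("all-MiniLM-L6-v2")
b = get_model("all-MiniLM-L6-v2")
assert a is b  # no duplicate allocation for the same model name
```

This is why a module-level function (rather than a method) carries the cache: the memoized instances are shared across every RAGIndexer and ContextRetriever in the process.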

@@ -46,7 +57,7 @@ class RAGIndexer:
def __init__(
self,
workspace_path: Path,
embedding_model: str = "all-MiniLM-L6-v2",
embedding_model: Any = "all-MiniLM-L6-v2",
collection_name: str = "code_chunks",
llm_client: Optional[GroqClient] = None,
):
@@ -74,8 +85,12 @@ def __init__(
self.llm_client = llm_client

# Initialize embedding model
self.embedding_model_name = embedding_model
self.embedding_model = SentenceTransformer(embedding_model)
if isinstance(embedding_model, str):
self.embedding_model_name = embedding_model
self.embedding_model = get_sentence_transformer(embedding_model)
else:
self.embedding_model_name = "custom_model"
self.embedding_model = embedding_model

# Initialize ChromaDB
self.client = chromadb.PersistentClient(
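The widened `embedding_model: Any` parameter enables injecting a pre-built model object. A toy sketch of the new branch, under the assumption that any object with a callable `.encode()` qualifies (the names `resolve_embedding_model` and `HashEmbedder` are illustrative, not part of the codebase):

```python
from typing import Any, Optional, Tuple


class HashEmbedder:
    """Hypothetical duck-typed embedder: anything exposing a callable .encode()."""

    def encode(self, texts):
        # Toy embedding: one scalar per text; a real model returns dense vectors.
        return [[float(len(t))] for t in texts]


def resolve_embedding_model(embedding_model: Any) -> Tuple[str, Optional[Any]]:
    """Mirror of the branch added to __init__ (helper name is illustrative)."""
    if isinstance(embedding_model, str):
        # The real constructor calls get_sentence_transformer(embedding_model) here.
        return embedding_model, None
    return "custom_model", embedding_model


name, model = resolve_embedding_model(HashEmbedder())
assert name == "custom_model"
assert model.encode(["abc"]) == [[3.0]]
```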
11 changes: 8 additions & 3 deletions refactron/rag/retriever.py
@@ -4,7 +4,9 @@

from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional
from typing import List, Optional, Any

from refactron.rag.indexer import get_sentence_transformer

Comment on lines +7 to 10
⚠️ Potential issue | 🟠 Major

Normalize import ordering to satisfy isort.

This block is out of isort order (typing members should be alphabetized). That’s consistent with the reported pre-commit failure.

Suggested patch
-from typing import List, Optional, Any
+from typing import Any, List, Optional
As per coding guidelines, "Use isort with profile set to 'black' for import sorting".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-from typing import List, Optional, Any
+from typing import Any, List, Optional
 
 from refactron.rag.indexer import get_sentence_transformer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `refactron/rag/retriever.py` around lines 7-10, the import block is not isort-compliant. Alphabetize the typing imports so that "from typing import List, Optional, Any" becomes "from typing import Any, List, Optional", keep standard-library imports before local package imports, and ensure "from refactron.rag.indexer import get_sentence_transformer" remains after the typing import; run isort (profile=black) to verify.
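The cited guideline corresponds to a one-line tool setting. Assuming the project keeps tool configuration in pyproject.toml (the file location here is an assumption), it would read:

```toml
[tool.isort]
profile = "black"
```

With this in place, `isort .` and the pre-commit hook agree on the ordering that black's formatting expects.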

try:
import chromadb
@@ -38,7 +40,7 @@ class ContextRetriever:
def __init__(
self,
workspace_path: Path,
embedding_model: str = "all-MiniLM-L6-v2",
embedding_model: Any = "all-MiniLM-L6-v2",
collection_name: str = "code_chunks",
):
"""Initialize the context retriever.
@@ -58,7 +60,10 @@ def __init__(
self.index_path = self.workspace_path / ".rag"

# Initialize embedding model
self.embedding_model = SentenceTransformer(embedding_model)
if isinstance(embedding_model, str):
self.embedding_model = get_sentence_transformer(embedding_model)
else:
self.embedding_model = embedding_model
Comment on lines +63 to +66
⚠️ Potential issue | 🟠 Major

Validate injected embedding model interface at initialization.

With embedding_model: Any, non-conforming objects pass constructor checks and fail later at Line 95 (.encode). Fail fast in __init__ with a clear error.

Suggested patch
         if isinstance(embedding_model, str):
             self.embedding_model = get_sentence_transformer(embedding_model)
         else:
+            if not hasattr(embedding_model, "encode") or not callable(embedding_model.encode):
+                raise TypeError("embedding_model must be a model name or an object with encode()")
             self.embedding_model = embedding_model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `refactron/rag/retriever.py` around lines 63-66, validate the injected embedding_model interface immediately in __init__: if embedding_model is a str, set self.embedding_model = get_sentence_transformer(embedding_model); then verify that the resulting object (or the provided object when it is not a str) exposes an encode attribute that is callable (hasattr(..., "encode") and callable(self.embedding_model.encode)). If the check fails, raise a TypeError with a clear message stating that embedding_model must provide a callable .encode method, so callers don't hit a runtime failure at the later .encode usage.
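The fail-fast check suggested above can be sketched standalone; the helper name `validate_embedding_model` and the `GoodModel` class are illustrative, not part of the codebase:

```python
from typing import Any


def validate_embedding_model(model: Any) -> Any:
    """Fail fast in __init__ instead of at the later .encode() call."""
    if not hasattr(model, "encode") or not callable(model.encode):
        raise TypeError(
            "embedding_model must be a model name or an object with encode()"
        )
    return model


class GoodModel:
    def encode(self, texts):
        return [[0.0] for _ in texts]


validate_embedding_model(GoodModel())  # accepted: has a callable .encode()

try:
    validate_embedding_model(object())  # no .encode, rejected at construction time
    raised = False
except TypeError:
    raised = True
assert raised
```

Rejecting a bad object in the constructor surfaces the misconfiguration at the call site that created it, rather than deep inside a later retrieval query.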


# Initialize ChromaDB client
self.client = chromadb.PersistentClient(