Perf/symbol table cache invalidation #166
shrutu0929 wants to merge 5 commits into Refactron-ai:main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
Added per-file cache metadata and incremental rebuild to the SymbolTable: files are fingerprinted (size, mtime, sha256), changed files are re-analyzed, removed files purge symbols/exports, and cache persistence now includes the per-file metadata.
Sequence Diagram
sequenceDiagram
participant BuilderAPI as SymbolTableBuilder
participant Cache as Cache Storage
participant FileSystem as File System
participant Analyzer as Analyzer/_analyze_file
BuilderAPI->>Cache: Load cached symbols + file_metadata
BuilderAPI->>FileSystem: List current *.py files
BuilderAPI->>BuilderAPI: Determine removed / added / existing files
loop Removed files
BuilderAPI->>BuilderAPI: remove_file(file_path) — purge symbols & exports
end
loop For each existing or new file
BuilderAPI->>FileSystem: stat (size, mtime)
alt size/mtime indicates possible change
BuilderAPI->>FileSystem: read file bytes
BuilderAPI->>BuilderAPI: _calculate_hash()
BuilderAPI->>BuilderAPI: _has_file_changed()?
end
alt File changed or new
BuilderAPI->>Analyzer: _analyze_file(path)
Analyzer-->>BuilderAPI: Symbols
BuilderAPI->>BuilderAPI: clear prior symbols for file
BuilderAPI->>BuilderAPI: _update_file_metadata(path)
else Unchanged
BuilderAPI->>Cache: reuse cached symbols
end
end
BuilderAPI->>Cache: _save_cache() with updated file_metadata
Cache-->>BuilderAPI: persisted
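The change-detection step in the diagram above can be sketched as a small standalone helper. This is an illustrative sketch, not the PR's actual code: the `fingerprint`/`has_changed` names and the flat metadata dict are assumptions, but the cheap-check-then-hash ordering matches the flow shown.

```python
import hashlib
import os
from typing import Dict, Optional, Union

# Hypothetical metadata shape; the PR stores {"mtime", "size", "sha256"} per file.
Meta = Dict[str, Union[int, float, str]]


def fingerprint(path: str) -> Meta:
    """Fingerprint one file as (size, mtime, sha256)."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"size": st.st_size, "mtime": st.st_mtime, "sha256": digest}


def has_changed(path: str, cached: Optional[Meta]) -> bool:
    """Cheap size/mtime checks first; hash the content only when they disagree."""
    if cached is None:
        return True  # new file: always analyze
    st = os.stat(path)
    if st.st_size != cached.get("size"):
        return True  # size differs: definitely changed
    if st.st_mtime == cached.get("mtime"):
        return False  # fast path: same size and same timestamp
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() != cached.get("sha256")
```

The hash comparison only runs when the mtime differs but the size matches, so unchanged files never pay the read-and-hash cost.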
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
fe47718 to 95eb501
Actionable comments posted: 8
🧹 Nitpick comments (1)
tests/test_config_management.py (1)
1213-1214: Consider using underscore prefix for unused unpacked variables.
Static analysis (Ruff RUF059) flags several unpacked variables that are never used. While not blocking, prefixing unused variables with `_` improves clarity and silences linter warnings.
♻️ Example fix for a few cases

  results, errors, skips = pp.process_files([], success_func)
  assert results == [] and errors == [] and skips == []
- results, errors, skips = pp.process_files([Path("a.py")], raises_func)
+ _results, errors, _skips = pp.process_files([Path("a.py")], raises_func)
  assert len(errors) == 1
- results, errors, skips = pp.process_files(files, success_func)
+ results, _errors, skips = pp.process_files(files, success_func)
  assert len(results) == 2
  assert len(skips) == 0

Apply similar changes where variables are unpacked but not subsequently used in assertions.
Also applies to: 1219-1220, 1224-1225, 1229-1230, 1247-1249, 1253-1254, 1268-1269, 1278-1279, 1284-1284
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_config_management.py` around lines 1213 - 1214, The unpacked variables from pp.process_files (results, errors, skips) are unused and flagged by Ruff RUF059; rename any unused unpacked variables to start with an underscore (for example _results, _errors, _skips or _ for single-use) in the call sites (including the shown line and the other locations noted: around lines 1219-1220, 1224-1225, 1229-1230, 1247-1249, 1253-1254, 1268-1269, 1278-1279, 1284) so the linter stops warning while preserving the assertions that actually check values where needed; update only the variable names in unpackings for functions like pp.process_files and any similar unpack patterns (do not change function signatures or logic).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 195-200: The _calculate_hash function currently returns an empty
string on exception which gets persisted and weakens rollback-safe invalidation;
change its signature to return Optional[str] (or None on failure) instead of "",
log or capture the underlying exception, and ensure callers (the code that
stores metadata) treat None as "no hash available" and skip persisting any hash
value; apply the same change to the other similar try/except block referenced
(lines 206-210) so no empty string hash is ever written to metadata.
- Line 12: The import list in symbol_table.py includes an unused symbol `Set`
causing a flake8 F401; remove `Set` from the typing import line (the line
importing Any, Dict, List, Optional, Set) so it becomes only the actually used
types (e.g., Any, Dict, List, Optional), ensuring no other references to `Set`
exist in the file (check functions/classes in this module) before committing.
- Around line 150-156: The deletion cleanup currently uses only
self.symbol_table.file_metadata to build cached_files, so when an older cache
has symbols but no file_metadata deleted files aren't found; update the logic in
the cleanup loop to compute cached_files by first using
self.symbol_table.file_metadata if present and non-empty, otherwise fall back to
deriving file paths from other symbol stores on the symbol_table (e.g., keys
from self.symbol_table.exported_symbols or any self.symbol_table.files /
self.symbol_table.symbols mapping), then proceed to call
self.symbol_table.remove_file(cached_path) for paths not in current_file_paths;
ensure you reference the existing symbols (exported_symbols, remove_file,
file_metadata) to locate the right structures.
- Around line 65-67: Define a concrete TypedDict (e.g., FileMetadata with keys
"mtime": float, "size": int, "sha256": str) and replace the loose type Dict[str,
Dict[str, Any]] on file_metadata with Dict[str, FileMetadata]; update any
imports (from typing import TypedDict) and usages so comparisons that read
file_metadata[...] .get("mtime"/"size"/"sha256") yield the correct types
(float/int/str) instead of Any, ensuring functions in symbol_table.py that
compare metadata values (references to the file_metadata variable) now return
proper bools for mypy.
In `@refactron/core/parallel.py`:
- Line 10: The import ordering in refactron.core.parallel is incorrect for
isort: reorder the module-level imports so they follow alphabetical order within
their group; specifically ensure the import line referencing FileMetrics and
AnalysisSkipWarning from refactron.core.models is positioned and ordered
according to isort rules (alphabetize the imported symbols and place the import
in the correct import group) so that FileMetrics and AnalysisSkipWarning are
sorted and the file passes isort.
- Around line 61-63: The long type hints on the signature of process_files (and
similarly in _process_sequential, _process_parallel_threads,
_process_parallel_processes) exceed 100 chars; refactor each signature to break
the Callable[...] type and return Tuple[...] across multiple lines by putting
the Callable parameter and Tuple return types on their own indented lines (e.g.,
split the process_func: Callable[[Path], Tuple[Optional[FileMetrics],
Optional[FileAnalysisError], Optional[AnalysisSkipWarning]]], and the
progress_callback/return Tuple onto separate lines) so each line is <=100 chars
and conforms to black/flake8 formatting while keeping the same types and
parameter order.
In `@refactron/core/refactron.py`:
- Around line 268-270: The return type annotation on process_file_wrapper is too
long and causes black/line-length failures; update the function signature in
refactron.core.refactron by breaking the return type across multiple lines or
introducing a short type alias (e.g., AnalysisResult) used in the annotation, so
the annotation for Tuple[Optional[FileMetrics], Optional[FileAnalysisError],
Optional[AnalysisSkipWarning]] is wrapped or replaced to satisfy the 100-char
limit while preserving the same types.
In `@tests/test_symbol_table_incremental.py`:
- Around line 3-4: The test file imports unused symbols causing flake8 F401
failures; remove the unused imports Path and SymbolType from the import line so
only the used SymbolTableBuilder is imported from
refactron.analysis.symbol_table (i.e., adjust the import in
tests/test_symbol_table_incremental.py to drop Path and SymbolType).
---
Nitpick comments:
In `@tests/test_config_management.py`:
- Around line 1213-1214: The unpacked variables from pp.process_files (results,
errors, skips) are unused and flagged by Ruff RUF059; rename any unused unpacked
variables to start with an underscore (for example _results, _errors, _skips or
_ for single-use) in the call sites (including the shown line and the other
locations noted: around lines 1219-1220, 1224-1225, 1229-1230, 1247-1249,
1253-1254, 1268-1269, 1278-1279, 1284) so the linter stops warning while
preserving the assertions that actually check values where needed; update only
the variable names in unpackings for functions like pp.process_files and any
similar unpack patterns (do not change function signatures or logic).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9a6ad5d8-b291-45f1-829b-69c8f4b50792
📒 Files selected for processing (6)
refactron/analysis/symbol_table.py
refactron/core/parallel.py
refactron/core/refactron.py
tests/test_config_management.py
tests/test_performance_optimization.py
tests/test_symbol_table_incremental.py
# Map: file_path -> { "mtime": float, "size": int, "sha256": str }
file_metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify Any-typed metadata declaration
rg -n 'file_metadata:\s*Dict\[str,\s*Dict\[str,\s*Any\]\]' refactron/analysis/symbol_table.py
# Verify bool-returning function currently returns expressions based on Any lookups
rg -n 'return current_hash != stored_hash|return stat\.st_mtime != metadata\.get\("mtime"\)' \
  refactron/analysis/symbol_table.py
Repository: Refactron-ai/Refactron_lib
Length of output: 264
🏁 Script executed:
# View the import statements and file_metadata declaration (around line 66)
head -70 refactron/analysis/symbol_table.py | tail -25
Repository: Refactron-ai/Refactron_lib
Length of output: 1035
🏁 Script executed:
# View the function containing the problematic return statements (around lines 176-189)
sed -n '170,195p' refactron/analysis/symbol_table.py
Repository: Refactron-ai/Refactron_lib
Length of output: 1005
🏁 Script executed:
# Check Python version targets and type annotations in the file
head -20 refactron/analysis/symbol_table.py
Repository: Refactron-ai/Refactron_lib
Length of output: 533
🏁 Script executed:
# Check if TypedDict is already imported or if we need typing_extensions
grep -n "TypedDict\|typing_extensions" refactron/analysis/symbol_table.py
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check Python version targets in project configuration
fd -e pyproject.toml -e setup.py -e setup.cfg | head -3 | xargs grep -l "python_requires\|target-version" 2>/dev/null | head -1 | xargs cat | grep -A5 -B5 "python\|version" | head -30
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Find and read pyproject.toml to check Python version targets
find . -name "pyproject.toml" -type f | head -1 | xargs cat | grep -A10 "python" | head -15
Repository: Refactron-ai/Refactron_lib
Length of output: 679
🏁 Script executed:
# Also verify the complete context of the problematic lines one more time with broader view
sed -n '176,195p' refactron/analysis/symbol_table.py
Repository: Refactron-ai/Refactron_lib
Length of output: 775
Tighten file_metadata typing to resolve mypy no-any-return errors.
Dict[str, Dict[str, Any]] types metadata values as Any, causing comparisons at lines 186 and 189 to return Any instead of bool, violating the function's return type annotation.
Suggested fix
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, TypedDict
+
+
+class FileMetadata(TypedDict, total=False):
+ mtime: float
+ size: int
+ sha256: str
@@
- file_metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)
+ file_metadata: Dict[str, FileMetadata] = field(default_factory=dict)
@@
- metadata = self.symbol_table.file_metadata[file_path_str]
+ metadata: FileMetadata = self.symbol_table.file_metadata[file_path_str]
@@
- stored_hash = metadata.get("sha256")
- if stored_hash:
+ stored_hash = metadata.get("sha256")
+ if isinstance(stored_hash, str) and stored_hash:
current_hash = self._calculate_hash(file_path)
- return current_hash != stored_hash
+ return bool(current_hash != stored_hash)
@@
- return stat.st_mtime != metadata.get("mtime")
+ stored_mtime = metadata.get("mtime")
+ return bool(stored_mtime is None or stat.st_mtime != stored_mtime)Required to satisfy mypy disallow_untyped_defs in refactron/ per coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/analysis/symbol_table.py` around lines 65 - 67, Define a concrete
TypedDict (e.g., FileMetadata with keys "mtime": float, "size": int, "sha256":
str) and replace the loose type Dict[str, Dict[str, Any]] on file_metadata with
Dict[str, FileMetadata]; update any imports (from typing import TypedDict) and
usages so comparisons that read file_metadata[...] .get("mtime"/"size"/"sha256")
yield the correct types (float/int/str) instead of Any, ensuring functions in
symbol_table.py that compare metadata values (references to the file_metadata
variable) now return proper bools for mypy.
# 1. Remove deleted files
cached_files = list(self.symbol_table.file_metadata.keys())
for cached_path in cached_files:
    if cached_path not in current_file_paths:
        logger.debug(f"Removing deleted file from symbol table: {cached_path}")
        self.symbol_table.remove_file(cached_path)
Deleted-file cleanup misses stale symbols when old cache lacks file_metadata.
Line 151 derives cached_files only from self.symbol_table.file_metadata. If an older cache has symbols but no
metadata, deleted files are never purged and stale exports can persist.
Suggested fix
- cached_files = list(self.symbol_table.file_metadata.keys())
+ cached_files = set(self.symbol_table.file_metadata.keys()) | set(
+ self.symbol_table.symbols.keys()
+ )
for cached_path in cached_files:
if cached_path not in current_file_paths:
logger.debug(f"Removing deleted file from symbol table: {cached_path}")
            self.symbol_table.remove_file(cached_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/analysis/symbol_table.py` around lines 150 - 156, The deletion
cleanup currently uses only self.symbol_table.file_metadata to build
cached_files, so when an older cache has symbols but no file_metadata deleted
files aren't found; update the logic in the cleanup loop to compute cached_files
by first using self.symbol_table.file_metadata if present and non-empty,
otherwise fall back to deriving file paths from other symbol stores on the
symbol_table (e.g., keys from self.symbol_table.exported_symbols or any
self.symbol_table.files / self.symbol_table.symbols mapping), then proceed to
call self.symbol_table.remove_file(cached_path) for paths not in
current_file_paths; ensure you reference the existing symbols (exported_symbols,
remove_file, file_metadata) to locate the right structures.
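The suggested fallback boils down to taking the union of both key sets. A minimal sketch, assuming hypothetical store shapes (`file_metadata` keyed by path, `symbols_by_file` keyed by path) rather than the PR's actual attributes:

```python
from typing import Dict, List, Set


def stale_files(file_metadata: Dict[str, dict],
                symbols_by_file: Dict[str, list],
                current_files: Set[str]) -> List[str]:
    """Return cached paths that no longer exist on disk.

    Union both stores so caches written before file_metadata existed
    still get their deleted files purged.
    """
    cached = set(file_metadata) | set(symbols_by_file)
    return sorted(p for p in cached if p not in current_files)
```

Each returned path would then be handed to the table's `remove_file()` so stale symbols and exports are dropped.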
def _calculate_hash(self, file_path: Path) -> str:
    """Calculate SHA-256 hash of file content."""
    try:
        return hashlib.sha256(file_path.read_bytes()).hexdigest()
    except Exception:
        return ""
Do not persist empty SHA-256 values; it weakens rollback-safe invalidation.
If _calculate_hash() fails, it returns "", and that value is stored in metadata. On later runs this disables the
hash path and falls back to mtime, which can miss changes after timestamp rollback.
Suggested fix
-def _calculate_hash(self, file_path: Path) -> str:
+def _calculate_hash(self, file_path: Path) -> Optional[str]:
@@
- except Exception:
- return ""
+ except OSError as e:
+ logger.warning(f"Failed to hash {file_path}: {e}")
+ return None
@@
- self.symbol_table.file_metadata[file_path_str] = {
+ file_hash = self._calculate_hash(file_path)
+ if not file_hash:
+ self.symbol_table.file_metadata.pop(file_path_str, None)
+ return
+ self.symbol_table.file_metadata[file_path_str] = {
"mtime": stat.st_mtime,
"size": stat.st_size,
- "sha256": self._calculate_hash(file_path),
+ "sha256": file_hash,
}Also applies to: 206-210
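The Optional-returning variant can be shown as a runnable sketch (a standalone function rather than the class method, with an assumed module-level logger):

```python
import hashlib
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def calculate_hash(file_path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest, or None when the file can't be read.

    Returning None instead of "" lets callers skip persisting a bogus
    hash, so the hash-based change check stays trustworthy even after
    a timestamp rollback.
    """
    try:
        return hashlib.sha256(file_path.read_bytes()).hexdigest()
    except OSError as e:
        logger.warning("Failed to hash %s: %s", file_path, e)
        return None
```

Catching `OSError` (rather than bare `Exception`) also keeps Ruff's BLE001 warning quiet, since read failures are the only expected error here.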
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 199-199: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/analysis/symbol_table.py` around lines 195 - 200, The
_calculate_hash function currently returns an empty string on exception which
gets persisted and weakens rollback-safe invalidation; change its signature to
return Optional[str] (or None on failure) instead of "", log or capture the
underlying exception, and ensure callers (the code that stores metadata) treat
None as "no hash available" and skip persisting any hash value; apply the same
change to the other similar try/except block referenced (lines 206-210) so no
empty string hash is ever written to metadata.
from refactron.core.analysis_result import FileAnalysisError
from refactron.core.models import FileMetrics
from refactron.core.models import FileMetrics, AnalysisSkipWarning
Import ordering needs adjustment.
The pipeline indicates isort failed. Imported symbols should be in alphabetical order within the import statement.
🔧 Proposed fix
-from refactron.core.models import FileMetrics, AnalysisSkipWarning
+from refactron.core.models import AnalysisSkipWarning, FileMetrics
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from refactron.core.models import FileMetrics, AnalysisSkipWarning
from refactron.core.models import AnalysisSkipWarning, FileMetrics
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/core/parallel.py` at line 10, The import ordering in
refactron.core.parallel is incorrect for isort: reorder the module-level imports
so they follow alphabetical order within their group; specifically ensure the
import line referencing FileMetrics and AnalysisSkipWarning from
refactron.core.models is positioned and ordered according to isort rules
(alphabetize the imported symbols and place the import in the correct import
group) so that FileMetrics and AnalysisSkipWarning are sorted and the file
passes isort.
from pathlib import Path
from refactron.analysis.symbol_table import SymbolTableBuilder, SymbolType
Drop unused imports to fix pre-commit failures.
Path and SymbolType are unused and currently break flake8 (F401).
Suggested fix
import json
import time
-from pathlib import Path
-from refactron.analysis.symbol_table import SymbolTableBuilder, SymbolType
+from refactron.analysis.symbol_table import SymbolTableBuilder
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from pathlib import Path
from refactron.analysis.symbol_table import SymbolTableBuilder, SymbolType
import json
import time
from refactron.analysis.symbol_table import SymbolTableBuilder
🧰 Tools
🪛 GitHub Actions: Pre-commit
[error] 3-3: flake8: F401 'pathlib.Path' imported but unused
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_symbol_table_incremental.py` around lines 3 - 4, The test file
imports unused symbols causing flake8 F401 failures; remove the unused imports
Path and SymbolType from the import line so only the used SymbolTableBuilder is
imported from refactron.analysis.symbol_table (i.e., adjust the import in
tests/test_symbol_table_incremental.py to drop Path and SymbolType).
Actionable comments posted: 2
♻️ Duplicate comments (1)
refactron/analysis/symbol_table.py (1)
12-13: ⚠️ Potential issue | 🟠 Major
Give `file_metadata` a concrete type and collapse the typing imports.
`Dict[str, Dict[str, Any]]` is still the source of the mypy no-any-return failures in `_has_file_changed()`, and the duplicated typing imports still leave `Set` unused. Defining a `TypedDict` here fixes the type hole and clears the import noise in one pass.
Suggested fix

-from typing import Any, Dict, List, Optional, Set
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, TypedDict
+
+
+class FileMetadata(TypedDict, total=False):
+    mtime: float
+    size: int
+    sha256: str
@@
-    file_metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)
+    file_metadata: Dict[str, FileMetadata] = field(default_factory=dict)

As per coding guidelines, refactron/**/*.py: Type annotations are required in refactron/ with mypy disallow_untyped_defs = true enabled.
Also applies to: 66-67
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/analysis/symbol_table.py` around lines 12 - 13, Replace the loose Dict[str, Dict[str, Any]] type for file_metadata with a concrete TypedDict (e.g., FileMetadata or FileMeta specifying the inner keys and value types) and update the signature/uses in _has_file_changed to return/accept that TypedDict instead of Any; also collapse the duplicated typing imports into a single line (only import the actually used names) and remove the unused Set import so mypy no-any-return errors disappear and imports are tidy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 148-165: The code compares and removes files using absolute paths
but calls _analyze_file/_has_file_changed/_update_file_metadata with the Path
object or non-normalized string, causing mismatches; normalize each file path
once into a single canonical string (e.g., file_path_str =
str(file_path.absolute())) before any checks and pass that same file_path_str
into _has_file_changed (or adapt _has_file_changed to accept the canonical
string), _analyze_file (modify _analyze_file to accept the canonical path or
pass it alongside the Path), remove_file, and _update_file_metadata so all
operations use the exact same normalized identifier and stale entries cannot
survive renames/relative roots.
- Around line 141-165: The incremental update in build_for_project currently
falls through into the legacy build and returns a stale symbols.json; fix by
short-circuiting: after loading cache (via _load_cache) and performing the
incremental delete/analyze/update loop (using _has_file_changed, _analyze_file,
_update_file_metadata and symbol_table.remove_file), save the updated table
(call _save_cache if available) and return self.symbol_table immediately when
self.cache_dir is truthy so the function does not continue into the old
implementation that reloads the stale file.
---
Duplicate comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 12-13: Replace the loose Dict[str, Dict[str, Any]] type for
file_metadata with a concrete TypedDict (e.g., FileMetadata or FileMeta
specifying the inner keys and value types) and update the signature/uses in
_has_file_changed to return/accept that TypedDict instead of Any; also collapse
the duplicated typing imports into a single line (only import the actually used
names) and remove the unused Set import so mypy no-any-return errors disappear
and imports are tidy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ed2eef62-9e3b-47f9-8fbd-dd0c57f385b4
📒 Files selected for processing (2)
refactron/analysis/symbol_table.py
tests/test_symbol_table_incremental.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_symbol_table_incremental.py
dfc7bb2 to 56f6932
Actionable comments posted: 4
♻️ Duplicate comments (5)
refactron/analysis/symbol_table.py (5)
66-67: ⚠️ Potential issue | 🟠 Major
Give `file_metadata` a concrete shape.
`Dict[str, Dict[str, Any]]` lets mtime/size/sha256 flow through as `Any`, so `_has_file_changed()` loses a real metadata contract and mypy can't validate those comparisons.

#!/bin/bash
set -euo pipefail
sed -n '60,75p' refactron/analysis/symbol_table.py
fd -a '^(pyproject\.toml|mypy\.ini|setup\.cfg)$' | xargs -r rg -n 'disallow_untyped_defs|mypy'

As per coding guidelines: refactron/**/*.py: Type annotations are required in refactron/ with mypy disallow_untyped_defs = true enabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/analysis/symbol_table.py` around lines 66 - 67, Replace the loose Dict[str, Dict[str, Any]] type for file_metadata with a concrete metadata type (e.g., a TypedDict or small dataclass named FileMetadata with fields mtime: float, size: int, sha256: str) and change the annotation of file_metadata to Dict[str, FileMetadata]; then update the uses (notably the _has_file_changed function) to rely on those typed fields so mypy can validate comparisons and avoid Any flowing through. Ensure you import typing.TypedDict (or dataclasses) and update any related helper code to access metadata.mtime/metadata["mtime"] consistent with your chosen concrete type.
221-235: ⚠️ Potential issue | 🟠 Major
Don't persist a blank hash.
When `_calculate_hash()` fails it returns `""`, and Line 235 stores that value. On the next run the hash branch is skipped and change detection falls back to mtime, so timestamp rollback can hide a real content change.
Suggested fix

-    def _calculate_hash(self, file_path: Path) -> str:
+    def _calculate_hash(self, file_path: Path) -> Optional[str]:
         """Calculate SHA-256 hash of file content."""
         try:
             return hashlib.sha256(file_path.read_bytes()).hexdigest()
-        except Exception:
-            return ""
+        except OSError as e:
+            logger.warning(f"Failed to hash {file_path}: {e}")
+            return None
@@
-        self.symbol_table.file_metadata[file_path_str] = {
+        file_hash = self._calculate_hash(file_path)
+        if file_hash is None:
+            self.symbol_table.file_metadata.pop(file_path_str, None)
+            return
+
+        self.symbol_table.file_metadata[file_path_str] = {
             "mtime": stat.st_mtime,
             "size": stat.st_size,
-            "sha256": self._calculate_hash(file_path),
+            "sha256": file_hash,
         }
Verify each finding against the current code and only fix it if needed. In `@refactron/analysis/symbol_table.py` around lines 221 - 235, The _calculate_hash() currently returns an empty string on failure and _update_file_metadata() persists that blank value, which hides content changes; change _update_file_metadata() to call _calculate_hash(file_path) and only set the "sha256" key in self.symbol_table.file_metadata[file_path_str] when the returned hash is non-empty (or set it to None/omit the key) so that a failed hash does not overwrite/replace a previous valid hash; ensure the rest of the metadata ("mtime", "size") is still stored and reference the methods _calculate_hash and _update_file_metadata and the dict self.symbol_table.file_metadata when making the conditional change.
166-170: ⚠️ Potential issue | 🟠 Major
Include cached symbol files in deleted-file cleanup.
If an older `symbols.json` has `symbols` but no `file_metadata`, `cached_files` is empty here and deleted modules keep exporting stale symbols forever. Build the candidate set from metadata plus the existing symbol store before calling `remove_file()`.
Verify each finding against the current code and only fix it if needed. In `@refactron/analysis/symbol_table.py` around lines 166 - 170, The cleanup loop only uses self.symbol_table.file_metadata keys so old symbols stored in self.symbol_table.symbols can be missed; change the candidate set used for cached_files in the cleanup (the code that currently sets cached_files = list(self.symbol_table.file_metadata.keys()) and then calls self.symbol_table.remove_file(cached_path)) to instead build a union of file paths from file_metadata and any file paths referenced in the existing symbol store (e.g. gather keys from self.symbol_table.file_metadata plus file paths extracted from self.symbol_table.symbols values), then iterate that combined set when calling remove_file to ensure stale symbols are removed.
181-190: ⚠️ Potential issue | 🔴 Critical
Return after the incremental pass.
Once Lines 165-180 finish, this legacy block reloads and returns the old cache on cache hits; on cold runs it analyzes every file again. That defeats the incremental path and can surface stale data.
Suggested fix

     def build_for_project(self, project_root: Path) -> SymbolTable:
         """Scan project and build symbol table incrementally."""
         if self.cache_dir:
             cached_table = self._load_cache()
             if cached_table:
@@
                 if self._has_file_changed(file_path_all, file_path_str):
                     logger.debug(f"Analyzing changed file: {file_path_str}")
                     self.symbol_table.remove_file(file_path_str)
                     self._analyze_file(file_path_all)
                     self._update_file_metadata(file_path_all, file_path_str)
-        """Scan project and build symbol table."""
-        if self.cache_dir:
-            cached = self._load_cache()
-            if cached:
-                # TODO: Implement incremental update logic here
-                return cached
-
-        python_files = list(project_root.rglob("*.py"))
-        for file_path in python_files:
-            self._analyze_file(file_path)
+
+                self._save_cache()
+                return self.symbol_table
+
+        for file_path in project_root.rglob("*.py"):
+            self._analyze_file(file_path.resolve())
         if self.cache_dir:
             self._save_cache()
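The short-circuit shape this fix aims for can be sketched independently of the class (all names here are hypothetical; `analyze` and `has_changed` stand in for `_analyze_file` and `_has_file_changed`):

```python
from pathlib import Path
from typing import Callable, Dict, Set


def incremental_build(
    project_root: Path,
    cache: Dict[str, dict],
    analyze: Callable[[str], dict],
    has_changed: Callable[[str, dict], bool],
) -> Dict[str, dict]:
    """Update `cache` in place from the current *.py files and return it.

    Crucially, the function returns here; there is no fallthrough into a
    legacy full-rebuild path that would resurrect stale data.
    """
    current: Set[str] = {str(p.resolve()) for p in project_root.rglob("*.py")}
    # Purge entries for files that no longer exist on disk.
    for cached_path in list(cache):
        if cached_path not in current:
            del cache[cached_path]
    # Re-analyze only new or changed files; unchanged entries are reused.
    for path in sorted(current):
        if path not in cache or has_changed(path, cache[path]):
            cache[path] = analyze(path)
    return cache
```

The caller would persist the returned table immediately, so a cache hit never routes back through the cold-start scan.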
Verify each finding against the current code and only fix it if needed. In `@refactron/analysis/symbol_table.py` around lines 181 - 190, The current early return on cache hit prevents any incremental update and causes stale or full re-analysis; instead of returning cached immediately, call an incremental update routine (e.g., implement and invoke self._incremental_update(cached, project_root)) that compares cached metadata to on-disk files and calls self._analyze_file(file_path) only for changed/new files, then merge updates into cached and persist via the existing cache save path before returning the updated symbol table; if you cannot implement incremental logic yet, remove the early return so the full scan runs consistently (i.e., do not return cached from _load_cache() without applying incremental update).
12-13: ⚠️ Potential issue | 🟡 Minor

Remove the duplicate typing import.

Line 12 reintroduces `Set` and duplicates the same import from Line 13, so pre-commit stays red with F401.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/analysis/symbol_table.py` around lines 12 - 13, Remove the duplicated typing import: keep a single import that includes Set (e.g., the line importing Any, Dict, List, Optional, Set) and delete the redundant "from typing import Any, Dict, List, Optional" line so the module only imports typing names once and F401 is resolved.
🧹 Nitpick comments (2)
tests/test_config_management.py (1)
909-909: Clean up unused unpacked variables to avoid Ruff RUF059 noise.

Several tests unpack values they never use. Prefer `_`/`_name` placeholders.

🔧 Example cleanup

```diff
-    _, errors, skips = p_seq.process_files(files, process_func)
+    _, errors, _ = p_seq.process_files(files, process_func)
-    results, errors, skips = p_thr.process_files(files, lambda p: (None, None, None))
+    results, errors, _ = p_thr.process_files(files, lambda p: (None, None, None))
-    results, errors, skips = pp.process_files([Path("a.py")], raises_func)
+    _, errors, _ = pp.process_files([Path("a.py")], raises_func)
```

Also applies to: 915-915, 1229-1229, 1247-1247, 1253-1253, 1268-1268, 1278-1278
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_config_management.py` at line 909, Tests are unpacking return values from p_seq.process_files into variables that aren't used (e.g., "_, errors, skips = p_seq.process_files(files, process_func)"); change the unused names to underscore placeholders (e.g., "_", "_errors", or "_skips") so unused results don't trigger Ruff RUF059; update all occurrences mentioned (the lines using p_seq.process_files and similar unpacking at the listed locations) to use appropriate "_" placeholders while leaving any actually used variables intact.

refactron/core/refactron.py (1)

268-304: The nested `process_file_wrapper` function has a pickling limitation that affects multiprocessing, but this is mitigated by current defaults and fallback handling.

While `process_file_wrapper` is not picklable for `ProcessPoolExecutor`, the default configuration uses threading (`use_multiprocessing=False`) and ParallelProcessor already documents this constraint in its docstring (lines 199-201). Additionally, if process pool creation fails, the code falls back gracefully to sequential processing (lines 242-246 in parallel.py).

If multiprocessing support is intended as a feature, refactor the nested function to a module-level function or static method to ensure picklability. Otherwise, this is a non-issue under current defaults.
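The picklability constraint can be demonstrated without the codebase. A minimal sketch (the worker names below are illustrative, not Refactron's actual API): a module-level function pickles by reference to its importable name, while a function defined inside another function has no such name and cannot be shipped to a child process:

```python
import pickle


def analyze_one(path):
    """Module-level worker: picklable, so ProcessPoolExecutor can send it to children."""
    return (path, None, None)


def make_nested_worker():
    def worker(path):  # local object: pickle has no importable name for it
        return (path, None, None)
    return worker


# The module-level function round-trips through pickle...
restored = pickle.loads(pickle.dumps(analyze_one))
print(restored("a.py"))

# ...while the nested one raises (AttributeError or PicklingError, depending on version).
try:
    pickle.dumps(make_nested_worker())
except Exception as exc:
    print(f"nested worker is not picklable: {type(exc).__name__}")
```

Thread pools sidestep this entirely because threads share the parent's address space, which is why the default `use_multiprocessing=False` configuration is unaffected.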
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/core/refactron.py` around lines 268 - 304, The nested process_file_wrapper function inside Refactron.process_files prevents pickling for ProcessPoolExecutor; either lift it to a module-level function or make it a `@staticmethod` on the Refactron class so it is picklable, update it to accept explicit parameters it currently closes over (e.g., file_path, self/_analyze_file, self.incremental_tracker, and logger) and pass those through from the call site where ParallelProcessor.process_files is invoked, and keep the existing exception handling and return shape so ParallelProcessor.process_files can continue to work unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 240-252: Remove the duplicate _analyze_file method so the
POSIX-normalizing implementation is the one used: delete the second definition
that uses str(file_path) and keep the first definition which creates
file_path_str via file_path.resolve().as_posix(), calls
self.inference_engine.parse_file(file_path_str) and then self._visit_node(tree,
file_path_str, "global"); ensure no other _analyze_file override remains so
symbol table keys use normalized paths.
In `@refactron/core/inference.py`:
- Around line 77-86: The code currently calls the non-existent
modutils.modname_from_path and silently falls back to an empty module_name which
loses package context; replace that call with
modutils.modpath_from_file(abs_path) and convert the returned module path to a
dotted module name for astroid.parse (assign to modname) while handling
ImportError/ValueError for files outside sys.path; alternatively, if resolution
fails, either accept a repository-derived module name passed into parse_file or
call manager.ast_from_file(abs_path) as a fallback instead of parsing with an
empty module_name—update the block around modutils usage and the return calling
astroid.parse(code, module_name=modname, path=abs_path) accordingly.
In `@refactron/core/parallel.py`:
- Line 75: The docstring line containing "(FileMetrics, None, skip_warn) on
success or (None, FileAnalysisError, None) on error." is over 100 characters and
triggers flake8 E501; split this sentence into two or more shorter lines (e.g.,
break before "on success" or before "or") in the docstring in
refactron/core/parallel.py so each line <=100 chars, keeping the same wording
and the tuple literals "(FileMetrics, None, skip_warn)" and "(None,
FileAnalysisError, None)" intact.
In `@refactron/core/refactron.py`:
- Around line 268-281: The helper function process_file_wrapper is missing its
try: block which leaves the except AnalysisError handler orphaned; wrap the body
that calls self._analyze_file(file_path), updates
self.incremental_tracker.update_file_state(file_path), and returns results
inside a try: so the existing except AnalysisError (and any other
except/finally) match; ensure indentation of the lines starting with
file_metrics, skip_warn = self._analyze_file(file_path) and the
incremental_tracker update are one level deeper under try and that the return
remains inside the try so exceptions propagate to the existing except
AnalysisError handler.
---
Duplicate comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 66-67: Replace the loose Dict[str, Dict[str, Any]] type for
file_metadata with a concrete metadata type (e.g., a TypedDict or small
dataclass named FileMetadata with fields mtime: float, size: int, sha256: str)
and change the annotation of file_metadata to Dict[str, FileMetadata]; then
update the uses (notably the _has_file_changed function) to rely on those typed
fields so mypy can validate comparisons and avoid Any flowing through. Ensure
you import typing.TypedDict (or dataclasses) and update any related helper code
to access metadata.mtime/metadata["mtime"] consistent with your chosen concrete
type.
- Around line 221-235: The _calculate_hash() currently returns an empty string
on failure and _update_file_metadata() persists that blank value, which hides
content changes; change _update_file_metadata() to call
_calculate_hash(file_path) and only set the "sha256" key in
self.symbol_table.file_metadata[file_path_str] when the returned hash is
non-empty (or set it to None/omit the key) so that a failed hash does not
overwrite/replace a previous valid hash; ensure the rest of the metadata
("mtime", "size") is still stored and reference the methods _calculate_hash and
_update_file_metadata and the dict self.symbol_table.file_metadata when making
the conditional change.
- Around line 166-170: The cleanup loop only uses
self.symbol_table.file_metadata keys so old symbols stored in
self.symbol_table.symbols can be missed; change the candidate set used for
cached_files in the cleanup (the code that currently sets cached_files =
list(self.symbol_table.file_metadata.keys()) and then calls
self.symbol_table.remove_file(cached_path)) to instead build a union of file
paths from file_metadata and any file paths referenced in the existing symbol
store (e.g. gather keys from self.symbol_table.file_metadata plus file paths
extracted from self.symbol_table.symbols values), then iterate that combined set
when calling remove_file to ensure stale symbols are removed.
- Around line 181-190: The current early return on cache hit prevents any
incremental update and causes stale or full re-analysis; instead of returning
cached immediately, call an incremental update routine (e.g., implement and
invoke self._incremental_update(cached, project_root)) that compares cached
metadata to on-disk files and calls self._analyze_file(file_path) only for
changed/new files, then merge updates into cached and persist via the existing
cache save path before returning the updated symbol table; if you cannot
implement incremental logic yet, remove the early return so the full scan runs
consistently (i.e., do not return cached from _load_cache() without applying
incremental update).
- Around line 12-13: Remove the duplicated typing import: keep a single import
that includes Set (e.g., the line importing Any, Dict, List, Optional, Set) and
delete the redundant "from typing import Any, Dict, List, Optional" line so the
module only imports typing names once and F401 is resolved.
---
Nitpick comments:
In `@refactron/core/refactron.py`:
- Around line 268-304: The nested process_file_wrapper function inside
Refactron.process_files prevents pickling for ProcessPoolExecutor; either lift
it to a module-level function or make it a `@staticmethod` on the Refactron class
so it is picklable, update it to accept explicit parameters it currently closes
over (e.g., file_path, self/_analyze_file, self.incremental_tracker, and logger)
and pass those through from the call site where ParallelProcessor.process_files
is invoked, and keep the existing exception handling and return shape so
ParallelProcessor.process_files can continue to work unchanged.
In `@tests/test_config_management.py`:
- Line 909: Tests are unpacking return values from p_seq.process_files into
variables that aren't used (e.g., "_, errors, skips = p_seq.process_files(files,
process_func)"); change the unused names to underscore placeholders (e.g., "_",
"_errors", or "_skips") so unused results don't trigger Ruff RUF059; update all
occurrences mentioned (the lines using p_seq.process_files and similar unpacking
at the listed locations) to use appropriate "_" placeholders while leaving any
actually used variables intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5246dca8-b5b3-4597-b300-aa84c97a5bb9
📒 Files selected for processing (7)
refactron/analysis/symbol_table.py
refactron/core/inference.py
refactron/core/parallel.py
refactron/core/refactron.py
tests/test_config_management.py
tests/test_performance_optimization.py
tests/test_symbol_table_incremental.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_symbol_table_incremental.py
```python
modname = ""
try:
    from astroid import modutils

    modname = modutils.modname_from_path(abs_path)
except Exception:
    pass

# Use string_build via parse to avoid manager.ast_from_file's internal caching
return astroid.parse(code, module_name=modname, path=abs_path)
```
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "inference.py" | head -20
Repository: Refactron-ai/Refactron_lib
Length of output: 99
🏁 Script executed:
# Look for refactron directory structure
fd -type f "inference\.py" | grep refactron
Repository: Refactron-ai/Refactron_lib
Length of output: 240
🏁 Script executed:
# Get the file location and content
ls -la refactron/core/inference.py 2>/dev/null && wc -l refactron/core/inference.py
Repository: Refactron-ai/Refactron_lib
Length of output: 178
🏁 Script executed:
cat -n refactron/core/inference.py | sed -n '65,100p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1757
🏁 Script executed:
# Also check the full function signature and any related context
cat -n refactron/core/inference.py | sed -n '1,90p'
Repository: Refactron-ai/Refactron_lib
Length of output: 4199
🏁 Script executed:
# Check if BackupManager is imported or used in this file
grep -n "BackupManager\|validate_backup_integrity" refactron/core/inference.py
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check the broader context - what calls parse_file and expects?
rg -n "parse_file" --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 2087
🏁 Script executed:
# Look at modutils to understand what modname_from_path returns
rg -A 5 "modutils.modname_from_path" --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 518
🏁 Script executed:
# Check how symbol_table.py uses the result from parse_file()
cat -n refactron/analysis/symbol_table.py | sed -n '240,270p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1510
🏁 Script executed:
# Look at the test cases to understand expected behavior
cat -n tests/test_rag_parser.py | sed -n '84,110p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1352
🏁 Script executed:
# Check if there are any files with relative imports that would be problematic
rg "from \.\." --type py | head -20
Repository: Refactron-ai/Refactron_lib
Length of output: 208
🏁 Script executed:
# Search for any relative import issues, bugs, or TODOs related to module_name
rg "relative import|package context" --type py -i
Repository: Refactron-ai/Refactron_lib
Length of output: 694
🏁 Script executed:
# Check for any test failures or issues related to module_name or modutils
rg "modname|module_name" refactron/core/inference.py -B 2 -A 2
Repository: Refactron-ai/Refactron_lib
Length of output: 1297
🏁 Script executed:
# Look for git history or comments about this change
git log --oneline -20 -- refactron/core/inference.py 2>/dev/null || echo "Git history unavailable"
Repository: Refactron-ai/Refactron_lib
Length of output: 176
🏁 Script executed:
# Check if there are any conftest or fixture files that test relative imports
rg "from \.\." tests/ --type py | head -5
Repository: Refactron-ai/Refactron_lib
Length of output: 208
🏁 Script executed:
# Look at the test for relative import detection
cat -n tests/test_analyzer_coverage_supplement.py | grep -A 10 "Test relative import"
Repository: Refactron-ai/Refactron_lib
Length of output: 414
🏁 Script executed:
# Check what dependency_analyzer does with relative imports
cat -n refactron/analyzers/dependency_analyzer.py | sed -n '1,50p'
Repository: Refactron-ai/Refactron_lib
Length of output: 2084
🏁 Script executed:
# Search for any error logs or issues with module inference
rg "Failed to.*infer|infer.*failed" --type py -i
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check if there's documentation about astroid's parse function behavior
rg "astroid.parse" refactron/ --type py -B 2 -A 2
Repository: Refactron-ai/Refactron_lib
Length of output: 859
🏁 Script executed:
# Check if there are actual test cases that would fail with empty module_name
rg "from \.\." tests/ --type py -B 2 -A 2
Repository: Refactron-ai/Refactron_lib
Length of output: 435
🏁 Script executed:
# Look at the actual test for relative import detection more closely
cat -n tests/test_analyzer_coverage_supplement.py | sed -n '195,220p'
Repository: Refactron-ai/Refactron_lib
Length of output: 965
🏁 Script executed:
# Check if symbol_table.py actually needs to resolve relative imports
rg "infer\|relative import\|package" refactron/analysis/symbol_table.py --type py | head -20
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Look for any actual issues in tests related to module resolution
rg "modname|module_name" tests/ --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check if modutils.modname_from_path can actually fail, and in what scenarios
python3 << 'EOF'
try:
    from astroid import modutils
    # Test with a random path
    result = modutils.modname_from_path("/some/random/path/not/in/sys/path.py")
    print(f"Result for path not in sys.path: {result}")
except Exception as e:
    print(f"Exception type: {type(e).__name__}, message: {e}")
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 173
🏁 Script executed:
# Check the actual astroid documentation or source comments about module_name and path
python3 << 'EOF'
import inspect
import astroid
# Get the signature and docstring of astroid.parse
sig = inspect.signature(astroid.parse)
print("astroid.parse signature:")
print(sig)
print("\nDocstring:")
print(astroid.parse.__doc__)
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 599
🏁 Script executed:
# Look for any actual reported issues or TODOs in the code about this
rg "TODO|FIXME|BUG|XXX" refactron/core/inference.py --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check if there are any test failures or skip markers related to this
rg "@pytest.mark.skip|@pytest.mark.xfail|@unittest.skip" tests/ --type py | grep -i "import\|module\|inference"
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check what version of astroid is being used and what functions are available
python3 << 'EOF'
import astroid
print(f"Astroid version: {astroid.__version__}")
print(f"modutils attributes:")
from astroid import modutils
print([attr for attr in dir(modutils) if 'modname' in attr.lower() or 'path' in attr.lower()])
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 486
🏁 Script executed:
# Check if there's a requirements file or setup.py that specifies astroid version
cat setup.py pyproject.toml requirements.txt 2>/dev/null | grep -A 2 -B 2 "astroid" | head -20
Repository: Refactron-ai/Refactron_lib
Length of output: 228
🏁 Script executed:
# Look at what the code is actually importing from modutils
rg "from astroid import modutils|from astroid.modutils import" --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 142
🏁 Script executed:
# Check if there's any version compatibility handling in the codebase
rg "astroid.__version__|astroid.version" --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 195
🏁 Script executed:
# Verify the actual modpath_from_file function and how it works
python3 << 'EOF'
from astroid import modutils
import inspect
print("modpath_from_file signature:")
print(inspect.signature(modutils.modpath_from_file))
print("\nDocstring:")
print(modutils.modpath_from_file.__doc__)
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 715
🏁 Script executed:
# Test modpath_from_file to see what it returns
python3 << 'EOF'
from astroid import modutils
try:
    result = modutils.modpath_from_file("/some/random/path.py")
    print(f"modpath_from_file result: {result}")
except Exception as e:
    print(f"Exception: {type(e).__name__}: {e}")
# Try with an actual file
import tempfile
import os
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
    temp_path = f.name
try:
    result = modutils.modpath_from_file(temp_path)
    print(f"modpath_from_file for temp file: {result}")
except Exception as e:
    print(f"Exception for temp file: {type(e).__name__}: {e}")
finally:
    os.unlink(temp_path)
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 544
🏁 Script executed:
# Check if modname_from_path was available in older astroid versions
git log -p --all -S "modname_from_path" -- "*.py" | head -50 2>/dev/null || echo "Not found in git history"
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check the actual error handling in parse_file - what exceptions are expected?
cat -n refactron/core/inference.py | sed -n '71,95p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1226
🏁 Script executed:
# Test if astroid can resolve relative imports with empty module_name but valid path
python3 << 'EOF'
import astroid
import tempfile
import os
from pathlib import Path
# Create a test package structure
temp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(temp_dir, "mypackage")
os.makedirs(pkg_dir)
# Create __init__.py
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("# init\n")
# Create utils.py
with open(os.path.join(pkg_dir, "utils.py"), "w") as f:
    f.write("def helper(): return 42\n")
# Create test.py with relative import
test_file = os.path.join(pkg_dir, "test.py")
with open(test_file, "w") as f:
    f.write("""from . import utils
def main():
    return utils.helper()
""")
# Test 1: Parse with empty module_name
print("Test 1: Parse with empty module_name and path")
with open(test_file) as f:
    code = f.read()
try:
    module1 = astroid.parse(code, module_name="", path=test_file)
    print(f" Success. Module name: '{module1.name}'")
    # Try to infer the import
    for node in module1.nodes_of_class(astroid.ImportFrom):
        print(f" Found ImportFrom: {node.as_string()}")
except Exception as e:
    print(f" Error: {e}")
# Test 2: Parse with correct module_name
print("\nTest 2: Parse with correct module_name='mypackage.test' and path")
try:
    module2 = astroid.parse(code, module_name="mypackage.test", path=test_file)
    print(f" Success. Module name: '{module2.name}'")
    for node in module2.nodes_of_class(astroid.ImportFrom):
        print(f" Found ImportFrom: {node.as_string()}")
except Exception as e:
    print(f" Error: {e}")
# Cleanup
import shutil
shutil.rmtree(temp_dir)
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 646
🏁 Script executed:
# Check what the correct way to get module name from file path should be
python3 << 'EOF'
from astroid import modutils
import os
# Show the correct usage of modpath_from_file
print("Correct function: modpath_from_file()")
print("It returns a list of module path components")
print("\nExample of how to convert modpath to modname:")
modpath = ["mypackage", "test"]
modname = ".".join(modpath)
print(f" modpath: {modpath}")
print(f" modname: {modname}")
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 257
🏁 Script executed:
# Test if inference actually works with empty module_name for relative imports
python3 << 'EOF'
import astroid
import tempfile
import os
from pathlib import Path
# Create a test package structure
temp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(temp_dir, "mypackage")
os.makedirs(pkg_dir)
# Create __init__.py
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("# init\n")
# Create utils.py
with open(os.path.join(pkg_dir, "utils.py"), "w") as f:
    f.write("def helper(): return 42\n")
# Create test.py with relative import
test_file = os.path.join(pkg_dir, "test.py")
with open(test_file) as f:
    code = f.read()
# Parse with empty module_name (current behavior)
print("Test: Inference with empty module_name")
module1 = astroid.parse(code, module_name="", path=test_file)
print(f"Module name: '{module1.name}'")
# Try to infer the imported symbol
for node in module1.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        inferred = list(node.infer_name("utils", None))
        print(f" Inferred 'utils': {inferred}")
    except Exception as e:
        print(f" Inference failed: {type(e).__name__}: {e}")
print("\n---\n")
# Parse with correct module_name (desired behavior)
print("Test: Inference with correct module_name='mypackage.test'")
module2 = astroid.parse(code, module_name="mypackage.test", path=test_file)
print(f"Module name: '{module2.name}'")
for node in module2.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        inferred = list(node.infer_name("utils", None))
        print(f" Inferred 'utils': {inferred}")
    except Exception as e:
        print(f" Inference failed: {type(e).__name__}: {e}")
# Cleanup
import shutil
shutil.rmtree(temp_dir)
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 236
🏁 Script executed:
# Check if symbol_table.py actually needs to infer relative imports
cat -n refactron/analysis/symbol_table.py | sed -n '1,50p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1841
🏁 Script executed:
# Check if there are any actual test failures when running tests
rg "test.*parse_file|test.*inference" tests/ --type py -l
Repository: Refactron-ai/Refactron_lib
Length of output: 94
🏁 Script executed:
# Fix the test - create the file first, then read it
python3 << 'EOF'
import astroid
import tempfile
import os
from pathlib import Path
# Create a test package structure
temp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(temp_dir, "mypackage")
os.makedirs(pkg_dir)
# Create __init__.py
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("# init\n")
# Create utils.py
with open(os.path.join(pkg_dir, "utils.py"), "w") as f:
    f.write("def helper(): return 42\n")
# Create test.py with relative import - BEFORE reading it
test_file = os.path.join(pkg_dir, "test.py")
with open(test_file, "w") as f:
    f.write("""from . import utils
def main():
    return utils.helper()
""")
# NOW read it
with open(test_file) as f:
    code = f.read()
# Parse with empty module_name (current behavior)
print("Test: Inference with empty module_name")
module1 = astroid.parse(code, module_name="", path=test_file)
print(f"Module name: '{module1.name}'")
# Try to infer the imported symbol
for node in module1.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        inferred = list(node.infer_name("utils", None))
        print(f" Inferred 'utils': {inferred}")
    except Exception as e:
        print(f" Inference failed: {type(e).__name__}: {e}")
print("\n---\n")
# Parse with correct module_name (desired behavior)
print("Test: Inference with correct module_name='mypackage.test'")
module2 = astroid.parse(code, module_name="mypackage.test", path=test_file)
print(f"Module name: '{module2.name}'")
for node in module2.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        inferred = list(node.infer_name("utils", None))
        print(f" Inferred 'utils': {inferred}")
    except Exception as e:
        print(f" Inference failed: {type(e).__name__}: {e}")
# Cleanup
import shutil
shutil.rmtree(temp_dir)
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 490
🏁 Script executed:
# Check what the test_rag_parser tests actually do
cat -n tests/test_rag_parser.py | sed -n '84,120p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1826
🏁 Script executed:
# Look for any test that calls InferenceEngine.parse_file directly
rg "InferenceEngine\|parse_file" tests/ --type py -B 2 -A 5 | head -50
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check if there are any actual test failures by examining test output or fixtures
cat -n tests/test_rag_parser.py | sed -n '1,50p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1516
🏁 Script executed:
# Look at what parse_file returns and how it's used in RAG parser
cat -n refactron/rag/parser.py | sed -n '185,210p'
Repository: Refactron-ai/Refactron_lib
Length of output: 1234
🏁 Script executed:
# The key question: does symbol_table need correct module_name for its use case?
cat -n refactron/analysis/symbol_table.py | sed -n '240,280p'
Repository: Refactron-ai/Refactron_lib
Length of output: 2045
🏁 Script executed:
# Verify if modname_from_path is truly non-existent or if it's from a different import
rg "modname_from_path" . --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 162
🏁 Script executed:
# Check if there's maybe a compatibility layer or if this is just dead code
rg "def modname_from_path" . --type py
Repository: Refactron-ai/Refactron_lib
Length of output: 52
🏁 Script executed:
# Check the git commit history for this specific function
git log --oneline -p refactron/core/inference.py | grep -A 10 -B 5 "modname_from_path" | head -40
Repository: Refactron-ai/Refactron_lib
Length of output: 731
🏁 Script executed:
# Check if maybe there was an older version of astroid that had this function
python3 << 'EOF'
# See if there's any way to resolve module name correctly from a path
from astroid import modutils
import os
import sys
test_path = "/some/nonexistent/path/mymodule.py"
print("Available functions in modutils related to module/path:")
for attr in sorted(dir(modutils)):
    if not attr.startswith('_'):
        print(f"  - {attr}")
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 1044
🏁 Script executed:
# Check if the code even runs - try to import and use it
python3 << 'EOF'
import sys
sys.path.insert(0, '.')
try:
    from refactron.core.inference import InferenceEngine
    print("Import successful")
    # Try to use parse_file
    import tempfile
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write("x = 1\n")
        f.flush()
    try:
        result = InferenceEngine.parse_file(f.name)
        print(f"parse_file succeeded: {result}")
    except Exception as e:
        print(f"parse_file failed: {type(e).__name__}: {e}")
    finally:
        import os
        os.unlink(f.name)
except Exception as e:
    print(f"Import failed: {e}")
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 341
🏁 Script executed:
# Final verification: Check if symbol_table actually needs correct module_name
# by checking what operations it does on the parsed tree
cat -n refactron/analysis/symbol_table.py | sed -n '264,320p'
Repository: Refactron-ai/Refactron_lib
Length of output: 2504
🏁 Script executed:
# Verify the actual impact: does empty module_name cause problems for inference in practice?
python3 << 'EOF'
import astroid
import tempfile
import os
# Create a simple test
temp_dir = tempfile.mkdtemp()
test_file = os.path.join(temp_dir, "test.py")
with open(test_file, "w") as f:
    f.write("""
class MyClass:
    def method(self):
        return 42
def my_function():
    pass
""")
# Parse with empty module_name (current broken behavior)
with open(test_file) as f:
    code = f.read()
module = astroid.parse(code, module_name="", path=test_file)
print(f"Module name: '{module.name}'")
print(f"Module file: {module.file}")
print(f"Classes: {[c.name for c in module.nodes_of_class(astroid.nodes.ClassDef)]}")
print(f"Functions: {[f.name for f in module.nodes_of_class(astroid.nodes.FunctionDef)]}")
# Can we visit the nodes like symbol_table does?
for node in module.body:
    if isinstance(node, (astroid.nodes.ClassDef, astroid.nodes.FunctionDef)):
        print(f"Node: {node.name}, line: {node.lineno}")
import shutil
shutil.rmtree(temp_dir)
EOF
Repository: Refactron-ai/Refactron_lib
Length of output: 231
Fix the non-existent function call and properly resolve module names.
The code calls modutils.modname_from_path() which does not exist in astroid 4.0.4 (or any version). This causes an AttributeError that gets silently caught, leaving modname as an empty string. The correct approach is to use modpath_from_file() to derive the module name from the file path.
With empty module_name, astroid parses the code but loses package context. This breaks relative import resolution and package-aware inference for files outside sys.path, undermining the cache-correctness goal.
Instead of silently falling back to empty module name, either:
- Use `modutils.modpath_from_file()` to properly resolve the module path and convert it to a module name (handling `ImportError` for files outside sys.path), or
- Pass a repository-derived module name into `parse_file`, or
- Fall back to `manager.ast_from_file()` only when module resolution fails.
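For the second option, a repository-derived module name can be computed with plain pathlib. This is a hedged sketch — the helper name and the `__init__.py` handling are assumptions, loosely mirroring what `modutils.modpath_from_file` yields for files on sys.path:

```python
from pathlib import Path


def repo_module_name(file_path: str, project_root: str) -> str:
    """Turn a path under project_root into a dotted module name.

    package/__init__.py collapses to the package name itself.
    """
    rel = Path(file_path).resolve().relative_to(Path(project_root).resolve())
    parts = list(rel.with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


print(repo_module_name("/repo/mypackage/utils.py", "/repo"))     # mypackage.utils
print(repo_module_name("/repo/mypackage/__init__.py", "/repo"))  # mypackage
```

Passing a name derived like this into `parse_file` would keep relative imports resolvable even for files outside sys.path, without relying on astroid's import machinery.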
🧰 Tools
🪛 GitHub Actions: CI
[error] Command failed with exit code 1: flake8 refactron --count --select=E9,F63,F7,F82 --show-source --statistics
🪛 Ruff (0.15.9)
[error] 82-83: try-except-pass detected, consider logging the exception
(S110)
[warning] 82-82: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/core/inference.py` around lines 77 - 86, The code currently calls
the non-existent modutils.modname_from_path and silently falls back to an empty
module_name which loses package context; replace that call with
modutils.modpath_from_file(abs_path) and convert the returned module path to a
dotted module name for astroid.parse (assign to modname) while handling
ImportError/ValueError for files outside sys.path; alternatively, if resolution
fails, either accept a repository-derived module name passed into parse_file or
call manager.ast_from_file(abs_path) as a fallback instead of parsing with an
empty module_name—update the block around modutils usage and the return calling
astroid.parse(code, module_name=modname, path=abs_path) accordingly.
  files: List of file paths to process.
  process_func: Function to process a single file. Should return
- (FileMetrics, None) on success or (None, FileAnalysisError) on error.
+ (FileMetrics, None, skip_warn) on success or (None, FileAnalysisError, None) on error.
Line 75 exceeds the max line length and is currently failing flake8.
This is causing the pre-commit failure (E501).
🔧 Proposed fix
- process_func: Function to process a single file. Should return
- (FileMetrics, None, skip_warn) on success or (None, FileAnalysisError, None) on error.
+ process_func: Function to process a single file.
+ Should return (FileMetrics, None, skip_warn) on success
+ or (None, FileAnalysisError, None) on error.
🧰 Tools
🪛 GitHub Actions: CI
[error] Command failed with exit code 1: flake8 refactron --count --select=E9,F63,F7,F82 --show-source --statistics
🪛 GitHub Actions: Pre-commit
[error] 75-75: pre-commit hook 'flake8' failed (exit code 1): E501 line too long (111 > 100 characters)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/core/parallel.py` at line 75, The docstring line containing
"(FileMetrics, None, skip_warn) on success or (None, FileAnalysisError, None) on
error." is over 100 characters and triggers flake8 E501; split this sentence
into two or more shorter lines (e.g., break before "on success" or before "or")
in the docstring in refactron/core/parallel.py so each line <=100 chars, keeping
the same wording and the tuple literals "(FileMetrics, None, skip_warn)" and
"(None, FileAnalysisError, None)" intact.
♻️ Duplicate comments (1)
refactron/core/refactron.py (1)
341-341: ⚠️ Potential issue | 🟠 Major
Fix flake8 E501 at Line 341 (current CI blocker).
The recovery suggestion string is over 100 chars and fails pre-commit (E501), so this should be wrapped.
🔧 Proposed fix
- recovery_suggestion="Check the file for syntax errors or encoding issues",
+ recovery_suggestion=(
+     "Check the file for syntax errors or encoding issues"
+ ),
As per coding guidelines, `**/*.py` must use a maximum line length of 100 characters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/core/refactron.py` at line 341, The long literal passed as recovery_suggestion exceeds the 100-char line length; locate the call that supplies recovery_suggestion in refactron.core.refactron (the argument named recovery_suggestion) and split the message across multiple shorter string literals (using implicit concatenation inside parentheses or concatenated with +) so no single source line is longer than 100 chars, preserving the same text and punctuation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@refactron/core/refactron.py`:
- Line 341: The long literal passed as recovery_suggestion exceeds the 100-char
line length; locate the call that supplies recovery_suggestion in
refactron.core.refactron (the argument named recovery_suggestion) and split the
message across multiple shorter string literals (using implicit concatenation
inside parentheses or concatenated with +) so no single source line is longer
than 100 chars, preserving the same text and punctuation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73c612e7-b837-4a08-b849-a02e11ee4a44
📒 Files selected for processing (1)
refactron/core/refactron.py
Solves #162.
In this task, I implemented an incremental invalidation system for the SymbolTable cache to optimize the performance of project-wide symbol analysis. By introducing per-file metadata—comprising the last modified time, file size, and an authoritative SHA-256 content hash—into the symbols.json cache, the SymbolTableBuilder can now accurately detect which files have been modified, added, or deleted since the previous run. This allows the engine to re-analyze only the changed source files and merge their symbols back into the existing table, rather than performing an all-or-nothing full project scan on every cache miss. This approach significantly reduces analysis latency for large repositories and ensures the correctness of the symbol data even if file timestamps are rolled back, satisfying both the performance and safety requirements of the implementation.
Summary by CodeRabbit
New Features
Tests