
Perf/symbol table cache invalidation #166

Open
shrutu0929 wants to merge 5 commits into Refactron-ai:main from shrutu0929:perf/symbol-table-cache-invalidation

Conversation

@shrutu0929

@shrutu0929 shrutu0929 commented Apr 10, 2026

Solves #162.
In this task, I implemented an incremental invalidation system for the SymbolTable cache to optimize the performance of project-wide symbol analysis. By introducing per-file metadata (last modified time, file size, and an authoritative SHA-256 content hash) into the symbols.json cache, the SymbolTableBuilder can now accurately detect which files have been modified, added, or deleted since the previous run.

This allows the engine to re-analyze only the changed source files and merge their symbols back into the existing table, rather than performing an all-or-nothing full project scan on every cache miss. The approach significantly reduces analysis latency for large repositories and keeps the symbol data correct even if file timestamps are rolled back, satisfying both the performance and safety requirements of the task.

Summary by CodeRabbit

  • New Features

    • Incremental symbol analysis with per-file cache/metadata — only changed or new files are re-analyzed; exports from deleted files are removed.
    • Content-based change detection (hashing) to detect edits even when timestamps are unchanged.
    • Parsing improvements via normalized file paths and stronger cache invalidation to reduce stale parses.
    • Async functions are now recognized as regular function symbols.
    • Parallel analysis now returns and aggregates skip warnings alongside results and errors.
  • Tests

    • Added tests validating incremental builds, cache persistence, deletion handling, and content-based invalidation.
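To illustrate, a per-file entry persisted in symbols.json under this scheme might look like the following; the mtime/size/sha256 keys come from the walkthrough, while the surrounding JSON structure and values are assumed for the example:

```json
{
  "file_metadata": {
    "src/app.py": {
      "mtime": 1712724780.123,
      "size": 1834,
      "sha256": "9c56cc51b374c3ba189210d5b6d4bf57790d351c96c47c02190ecf1e430635ab"
    }
  }
}
```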

@coderabbitai

coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Added per-file cache metadata and incremental rebuild to the SymbolTable: files are fingerprinted (size, mtime, sha256), changed files are re-analyzed, removed files purge symbols/exports, and cache persistence now includes file_metadata.

Changes

Cohort / File(s) Summary
Symbol Table Incremental Support
refactron/analysis/symbol_table.py
Added file_metadata: Dict[str, Dict[str, Any]], _normalize_path(), _calculate_hash(), _has_file_changed(), _update_file_metadata(), remove_file(file_path), and incremental build_for_project() flow that loads/saves symbols.json, re-analyzes only changed/added files, removes deleted-file symbols, and treats AsyncFunctionDef as a function. Also adjusted cache load/save to normalize paths.
Incremental Rebuild Tests
tests/test_symbol_table_incremental.py
New tests covering cache initialization, unchanged-file skipping (no re-analysis), SHA-256 content-based invalidation (mtime/size spoofing), file addition re-analysis, and file deletion cleanup of exports & metadata.
Parsing cache invalidation
refactron/core/inference.py
parse_file() now normalizes/resolves file_path, manually clears relevant entries from astroid.MANAGER caches, reads file bytes and invokes astroid.parse(..., path=abs_path) with a derived module name, and falls back to manager.ast_from_file() on read/parse failures.
Parallel processing contract change
refactron/core/parallel.py
Expanded process_func return contract to (Optional[FileMetrics], Optional[FileAnalysisError], Optional[AnalysisSkipWarning]). process_files() and _process_* internals now return (results, errors, skips) and collect skip warnings across sequential, threaded, and process-pool paths.
Parallel usage updates
refactron/core/refactron.py
Per-file processing wrapper updated to return (file_metrics, error, skip_warn); Refactron.analyze() now captures third return value and extends result.semantic_skip_warnings with collected skip warnings.
Tests updated for new parallel contract
tests/test_config_management.py, tests/test_performance_optimization.py
Updated test-local process_func return tuples and assertions to expect a third skips collection; added assertions that skips is empty where applicable.
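A sketch of the expanded (results, errors, skips) contract on the sequential path; the dataclasses below are illustrative stand-ins for the real refactron.core.models types, not their actual definitions:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional, Tuple


# Stand-ins for refactron.core.models types (illustrative only).
@dataclass
class FileMetrics:
    path: Path
    loc: int


@dataclass
class FileAnalysisError:
    path: Path
    message: str


@dataclass
class AnalysisSkipWarning:
    path: Path
    reason: str


ProcessResult = Tuple[
    Optional[FileMetrics], Optional[FileAnalysisError], Optional[AnalysisSkipWarning]
]


def process_files(
    files: List[Path],
    process_func: Callable[[Path], ProcessResult],
) -> Tuple[List[FileMetrics], List[FileAnalysisError], List[AnalysisSkipWarning]]:
    """Sequential reference for the new (results, errors, skips) contract."""
    results: List[FileMetrics] = []
    errors: List[FileAnalysisError] = []
    skips: List[AnalysisSkipWarning] = []
    for path in files:
        metrics, error, skip = process_func(path)
        if metrics is not None:
            results.append(metrics)
        if error is not None:
            errors.append(error)
        if skip is not None:
            skips.append(skip)
    return results, errors, skips
```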

Sequence Diagram

sequenceDiagram
    participant BuilderAPI as SymbolTableBuilder
    participant Cache as Cache Storage
    participant FileSystem as File System
    participant Analyzer as Analyzer/_analyze_file

    BuilderAPI->>Cache: Load cached symbols + file_metadata
    BuilderAPI->>FileSystem: List current *.py files
    BuilderAPI->>BuilderAPI: Determine removed / added / existing files

    loop Removed files
        BuilderAPI->>BuilderAPI: remove_file(file_path) — purge symbols & exports
    end

    loop For each existing or new file
        BuilderAPI->>FileSystem: stat (size, mtime)
        alt size/mtime indicates possible change
            BuilderAPI->>FileSystem: read file bytes
            BuilderAPI->>BuilderAPI: _calculate_hash()
            BuilderAPI->>BuilderAPI: _has_file_changed()? 
        end
        alt File changed or new
            BuilderAPI->>Analyzer: _analyze_file(path)
            Analyzer-->>BuilderAPI: Symbols
            BuilderAPI->>BuilderAPI: clear prior symbols for file
            BuilderAPI->>BuilderAPI: _update_file_metadata(path)
        else Unchanged
            BuilderAPI->>Cache: reuse cached symbols
        end
    end

    BuilderAPI->>Cache: _save_cache() with updated file_metadata
    Cache-->>BuilderAPI: persisted
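The diagrammed flow condenses to roughly the following skeleton; the callback parameters stand in for the builder's private methods, and the bodies are illustrative rather than the real implementation:

```python
from pathlib import Path
from typing import Callable, Dict, List, Set


def incremental_build(
    cached_paths: Set[str],               # keys of the loaded file_metadata
    current_files: List[Path],
    has_changed: Callable[[Path], bool],  # stand-in for _has_file_changed
    analyze: Callable[[Path], None],      # stand-in for _analyze_file
    remove_file: Callable[[str], None],   # purges a file's symbols/exports
) -> Dict[str, str]:
    """Skeleton of the diagrammed flow: purge deletions, re-analyze changes."""
    current = {str(p.resolve()) for p in current_files}
    actions: Dict[str, str] = {}
    # 1. Removed files: purge their symbols and exports from the table.
    for cached in sorted(cached_paths - current):
        remove_file(cached)
        actions[cached] = "removed"
    # 2. Existing or new files: re-analyze only when changed or unseen.
    for path in current_files:
        key = str(path.resolve())
        if key not in cached_paths or has_changed(path):
            analyze(path)
            actions[key] = "analyzed"
        else:
            actions[key] = "reused"
    return actions
```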

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested labels

enhancement, testing, size: large

Suggested reviewers

  • omsherikar

Poem

🐇 I nibble hashes, mtimes, and size,

I hop through bytes with keen little eyes.
I keep what’s fresh, I drop what’s gone,
rebuild small and sprint along. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 47.62%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'Perf/symbol table cache invalidation' directly relates to the main changeset, which implements incremental cache invalidation for the SymbolTable to optimize project-wide symbol analysis by detecting and re-analyzing only modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@shrutu0929 shrutu0929 force-pushed the perf/symbol-table-cache-invalidation branch from fe47718 to 95eb501 on April 10, 2026 at 05:53

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
tests/test_config_management.py (1)

1213-1214: Consider using underscore prefix for unused unpacked variables.

Static analysis (Ruff RUF059) flags several unpacked variables that are never used. While not blocking, prefixing unused variables with _ improves clarity and silences linter warnings.

♻️ Example fix for a few cases
-        results, errors, skips = pp.process_files([], success_func)
-        assert results == [] and errors == [] and skips == []
+        results, errors, skips = pp.process_files([], success_func)
+        assert results == [] and errors == [] and skips == []

-        results, errors, skips = pp.process_files([Path("a.py")], raises_func)
+        _results, errors, _skips = pp.process_files([Path("a.py")], raises_func)
         assert len(errors) == 1

-        results, errors, skips = pp.process_files(files, success_func)
+        results, _errors, skips = pp.process_files(files, success_func)
         assert len(results) == 2
         assert len(skips) == 0

Apply similar changes where variables are unpacked but not subsequently used in assertions.

Also applies to: 1219-1220, 1224-1225, 1229-1230, 1247-1249, 1253-1254, 1268-1269, 1278-1279, 1284-1284

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_config_management.py` around lines 1213 - 1214, The unpacked
variables from pp.process_files (results, errors, skips) are unused and flagged
by Ruff RUF059; rename any unused unpacked variables to start with an underscore
(for example _results, _errors, _skips or _ for single-use) in the call sites
(including the shown line and the other locations noted: around lines 1219-1220,
1224-1225, 1229-1230, 1247-1249, 1253-1254, 1268-1269, 1278-1279, 1284) so the
linter stops warning while preserving the assertions that actually check values
where needed; update only the variable names in unpackings for functions like
pp.process_files and any similar unpack patterns (do not change function
signatures or logic).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 195-200: The _calculate_hash function currently returns an empty
string on exception which gets persisted and weakens rollback-safe invalidation;
change its signature to return Optional[str] (or None on failure) instead of "",
log or capture the underlying exception, and ensure callers (the code that
stores metadata) treat None as "no hash available" and skip persisting any hash
value; apply the same change to the other similar try/except block referenced
(lines 206-210) so no empty string hash is ever written to metadata.
- Line 12: The import list in symbol_table.py includes an unused symbol `Set`
causing a flake8 F401; remove `Set` from the typing import line (the line
importing Any, Dict, List, Optional, Set) so it becomes only the actually used
types (e.g., Any, Dict, List, Optional), ensuring no other references to `Set`
exist in the file (check functions/classes in this module) before committing.
- Around line 150-156: The deletion cleanup currently uses only
self.symbol_table.file_metadata to build cached_files, so when an older cache
has symbols but no file_metadata deleted files aren't found; update the logic in
the cleanup loop to compute cached_files by first using
self.symbol_table.file_metadata if present and non-empty, otherwise fall back to
deriving file paths from other symbol stores on the symbol_table (e.g., keys
from self.symbol_table.exported_symbols or any self.symbol_table.files /
self.symbol_table.symbols mapping), then proceed to call
self.symbol_table.remove_file(cached_path) for paths not in current_file_paths;
ensure you reference the existing symbols (exported_symbols, remove_file,
file_metadata) to locate the right structures.
- Around line 65-67: Define a concrete TypedDict (e.g., FileMetadata with keys
"mtime": float, "size": int, "sha256": str) and replace the loose type Dict[str,
Dict[str, Any]] on file_metadata with Dict[str, FileMetadata]; update any
imports (from typing import TypedDict) and usages so comparisons that read
file_metadata[...] .get("mtime"/"size"/"sha256") yield the correct types
(float/int/str) instead of Any, ensuring functions in symbol_table.py that
compare metadata values (references to the file_metadata variable) now return
proper bools for mypy.

In `@refactron/core/parallel.py`:
- Line 10: The import ordering in refactron.core.parallel is incorrect for
isort: reorder the module-level imports so they follow alphabetical order within
their group; specifically ensure the import line referencing FileMetrics and
AnalysisSkipWarning from refactron.core.models is positioned and ordered
according to isort rules (alphabetize the imported symbols and place the import
in the correct import group) so that FileMetrics and AnalysisSkipWarning are
sorted and the file passes isort.
- Around line 61-63: The long type hints on the signature of process_files (and
similarly in _process_sequential, _process_parallel_threads,
_process_parallel_processes) exceed 100 chars; refactor each signature to break
the Callable[...] type and return Tuple[...] across multiple lines by putting
the Callable parameter and Tuple return types on their own indented lines (e.g.,
split the process_func: Callable[[Path], Tuple[Optional[FileMetrics],
Optional[FileAnalysisError], Optional[AnalysisSkipWarning]]], and the
progress_callback/return Tuple onto separate lines) so each line is <=100 chars
and conforms to black/flake8 formatting while keeping the same types and
parameter order.

In `@refactron/core/refactron.py`:
- Around line 268-270: The return type annotation on process_file_wrapper is too
long and causes black/line-length failures; update the function signature in
refactron.core.refactron by breaking the return type across multiple lines or
introducing a short type alias (e.g., AnalysisResult) used in the annotation, so
the annotation for Tuple[Optional[FileMetrics], Optional[FileAnalysisError],
Optional[AnalysisSkipWarning]] is wrapped or replaced to satisfy the 100-char
limit while preserving the same types.

In `@tests/test_symbol_table_incremental.py`:
- Around line 3-4: The test file imports unused symbols causing flake8 F401
failures; remove the unused imports Path and SymbolType from the import line so
only the used SymbolTableBuilder is imported from
refactron.analysis.symbol_table (i.e., adjust the import in
tests/test_symbol_table_incremental.py to drop Path and SymbolType).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a6ad5d8-b291-45f1-829b-69c8f4b50792

📥 Commits

Reviewing files that changed from the base of the PR and between a9659f5 and fe47718.

📒 Files selected for processing (6)
  • refactron/analysis/symbol_table.py
  • refactron/core/parallel.py
  • refactron/core/refactron.py
  • tests/test_config_management.py
  • tests/test_performance_optimization.py
  • tests/test_symbol_table_incremental.py

Comment on lines +65 to 67
# Map: file_path -> { "mtime": float, "size": int, "sha256": str }
file_metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify Any-typed metadata declaration
rg -n 'file_metadata:\s*Dict\[str,\s*Dict\[str,\s*Any\]\]' refactron/analysis/symbol_table.py

# Verify bool-returning function currently returns expressions based on Any lookups
rg -n 'return current_hash != stored_hash|return stat\.st_mtime != metadata\.get\("mtime"\)' \
  refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 264


🏁 Script executed:

# View the import statements and file_metadata declaration (around line 66)
head -70 refactron/analysis/symbol_table.py | tail -25

Repository: Refactron-ai/Refactron_lib

Length of output: 1035


🏁 Script executed:

# View the function containing the problematic return statements (around lines 176-189)
sed -n '170,195p' refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 1005


🏁 Script executed:

# Check Python version targets and type annotations in the file
head -20 refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 533


🏁 Script executed:

# Check if TypedDict is already imported or if we need typing_extensions
grep -n "TypedDict\|typing_extensions" refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check Python version targets in project configuration
fd -e pyproject.toml -e setup.py -e setup.cfg | head -3 | xargs grep -l "python_requires\|target-version" 2>/dev/null | head -1 | xargs cat | grep -A5 -B5 "python\|version" | head -30

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Find and read pyproject.toml to check Python version targets
find . -name "pyproject.toml" -type f | head -1 | xargs cat | grep -A10 "python" | head -15

Repository: Refactron-ai/Refactron_lib

Length of output: 679


🏁 Script executed:

# Also verify the complete context of the problematic lines one more time with broader view
sed -n '176,195p' refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 775


Tighten file_metadata typing to resolve mypy no-any-return errors.

Dict[str, Dict[str, Any]] types metadata values as Any, causing comparisons at lines 186 and 189 to return Any instead of bool, violating the function's return type annotation.

Suggested fix
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, TypedDict
+
+
+class FileMetadata(TypedDict, total=False):
+    mtime: float
+    size: int
+    sha256: str
@@
-    file_metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)
+    file_metadata: Dict[str, FileMetadata] = field(default_factory=dict)
@@
-        metadata = self.symbol_table.file_metadata[file_path_str]
+        metadata: FileMetadata = self.symbol_table.file_metadata[file_path_str]
@@
-            stored_hash = metadata.get("sha256")
-            if stored_hash:
+            stored_hash = metadata.get("sha256")
+            if isinstance(stored_hash, str) and stored_hash:
                 current_hash = self._calculate_hash(file_path)
-                return current_hash != stored_hash
+                return bool(current_hash != stored_hash)
@@
-            return stat.st_mtime != metadata.get("mtime")
+            stored_mtime = metadata.get("mtime")
+            return bool(stored_mtime is None or stat.st_mtime != stored_mtime)

Required to satisfy mypy disallow_untyped_defs in refactron/ per coding guidelines.
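For reference, a self-contained illustration of the suggested TypedDict approach; the `mtime_changed` helper is hypothetical, added only to show that `.get()` on the typed metadata now yields concrete types that mypy can check:

```python
from typing import Dict, TypedDict


class FileMetadata(TypedDict, total=False):
    mtime: float
    size: int
    sha256: str


# Keys may be absent (total=False), but present values have known types.
file_metadata: Dict[str, FileMetadata] = {
    "a.py": {"mtime": 1712724780.0, "size": 42, "sha256": "deadbeef"},
}


def mtime_changed(meta: FileMetadata, current_mtime: float) -> bool:
    # .get("mtime") is typed Optional[float] rather than Any, so mypy
    # can verify this function really returns bool.
    stored = meta.get("mtime")
    return stored is None or current_mtime != stored
```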

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 65 - 67, Define a concrete
TypedDict (e.g., FileMetadata with keys "mtime": float, "size": int, "sha256":
str) and replace the loose type Dict[str, Dict[str, Any]] on file_metadata with
Dict[str, FileMetadata]; update any imports (from typing import TypedDict) and
usages so comparisons that read file_metadata[...] .get("mtime"/"size"/"sha256")
yield the correct types (float/int/str) instead of Any, ensuring functions in
symbol_table.py that compare metadata values (references to the file_metadata
variable) now return proper bools for mypy.

Comment on lines +150 to +156
# 1. Remove deleted files
cached_files = list(self.symbol_table.file_metadata.keys())
for cached_path in cached_files:
    if cached_path not in current_file_paths:
        logger.debug(f"Removing deleted file from symbol table: {cached_path}")
        self.symbol_table.remove_file(cached_path)


⚠️ Potential issue | 🟠 Major

Deleted-file cleanup misses stale symbols when old cache lacks file_metadata.

Line 151 derives cached_files only from self.symbol_table.file_metadata. If an older cache has symbols but no
metadata, deleted files are never purged and stale exports can persist.

Suggested fix
-        cached_files = list(self.symbol_table.file_metadata.keys())
+        cached_files = set(self.symbol_table.file_metadata.keys()) | set(
+            self.symbol_table.symbols.keys()
+        )
         for cached_path in cached_files:
             if cached_path not in current_file_paths:
                 logger.debug(f"Removing deleted file from symbol table: {cached_path}")
                 self.symbol_table.remove_file(cached_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 150 - 156, The deletion
cleanup currently uses only self.symbol_table.file_metadata to build
cached_files, so when an older cache has symbols but no file_metadata deleted
files aren't found; update the logic in the cleanup loop to compute cached_files
by first using self.symbol_table.file_metadata if present and non-empty,
otherwise fall back to deriving file paths from other symbol stores on the
symbol_table (e.g., keys from self.symbol_table.exported_symbols or any
self.symbol_table.files / self.symbol_table.symbols mapping), then proceed to
call self.symbol_table.remove_file(cached_path) for paths not in
current_file_paths; ensure you reference the existing symbols (exported_symbols,
remove_file, file_metadata) to locate the right structures.

Comment on lines +195 to +200
def _calculate_hash(self, file_path: Path) -> str:
    """Calculate SHA-256 hash of file content."""
    try:
        return hashlib.sha256(file_path.read_bytes()).hexdigest()
    except Exception:
        return ""

⚠️ Potential issue | 🟠 Major

Do not persist empty SHA-256 values; it weakens rollback-safe invalidation.

If _calculate_hash() fails, it returns "", and that value is stored in metadata. On later runs this disables the
hash path and falls back to mtime, which can miss changes after timestamp rollback.

Suggested fix
-def _calculate_hash(self, file_path: Path) -> str:
+def _calculate_hash(self, file_path: Path) -> Optional[str]:
@@
-        except Exception:
-            return ""
+        except OSError as e:
+            logger.warning(f"Failed to hash {file_path}: {e}")
+            return None
@@
-            self.symbol_table.file_metadata[file_path_str] = {
+            file_hash = self._calculate_hash(file_path)
+            if not file_hash:
+                self.symbol_table.file_metadata.pop(file_path_str, None)
+                return
+            self.symbol_table.file_metadata[file_path_str] = {
                 "mtime": stat.st_mtime,
                 "size": stat.st_size,
-                "sha256": self._calculate_hash(file_path),
+                "sha256": file_hash,
             }

Also applies to: 206-210

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 199-199: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 195 - 200, The
_calculate_hash function currently returns an empty string on exception which
gets persisted and weakens rollback-safe invalidation; change its signature to
return Optional[str] (or None on failure) instead of "", log or capture the
underlying exception, and ensure callers (the code that stores metadata) treat
None as "no hash available" and skip persisting any hash value; apply the same
change to the other similar try/except block referenced (lines 206-210) so no
empty string hash is ever written to metadata.


from refactron.core.analysis_result import FileAnalysisError
from refactron.core.models import FileMetrics
from refactron.core.models import FileMetrics, AnalysisSkipWarning

⚠️ Potential issue | 🟡 Minor

Import ordering needs adjustment.

The pipeline indicates isort failed. The import should maintain alphabetical order within the module import.

🔧 Proposed fix
-from refactron.core.models import FileMetrics, AnalysisSkipWarning
+from refactron.core.models import AnalysisSkipWarning, FileMetrics
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from refactron.core.models import FileMetrics, AnalysisSkipWarning
+from refactron.core.models import AnalysisSkipWarning, FileMetrics
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/core/parallel.py` at line 10, The import ordering in
refactron.core.parallel is incorrect for isort: reorder the module-level imports
so they follow alphabetical order within their group; specifically ensure the
import line referencing FileMetrics and AnalysisSkipWarning from
refactron.core.models is positioned and ordered according to isort rules
(alphabetize the imported symbols and place the import in the correct import
group) so that FileMetrics and AnalysisSkipWarning are sorted and the file
passes isort.

Comment on lines +3 to +4
from pathlib import Path
from refactron.analysis.symbol_table import SymbolTableBuilder, SymbolType

⚠️ Potential issue | 🟡 Minor

Drop unused imports to fix pre-commit failures.

Path and SymbolType are unused and currently break flake8 (F401).

Suggested fix
 import json
 import time
-from pathlib import Path
-from refactron.analysis.symbol_table import SymbolTableBuilder, SymbolType
+from refactron.analysis.symbol_table import SymbolTableBuilder
📝 Committable suggestion


Suggested change
-from pathlib import Path
-from refactron.analysis.symbol_table import SymbolTableBuilder, SymbolType
+import json
+import time
+from refactron.analysis.symbol_table import SymbolTableBuilder
🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 3-3: flake8: F401 'pathlib.Path' imported but unused

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_symbol_table_incremental.py` around lines 3 - 4, The test file
imports unused symbols causing flake8 F401 failures; remove the unused imports
Path and SymbolType from the import line so only the used SymbolTableBuilder is
imported from refactron.analysis.symbol_table (i.e., adjust the import in
tests/test_symbol_table_incremental.py to drop Path and SymbolType).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
refactron/analysis/symbol_table.py (1)

12-13: ⚠️ Potential issue | 🟠 Major

Give file_metadata a concrete type and collapse the typing imports.

Dict[str, Dict[str, Any]] is still the source of the mypy no-any-return failures in _has_file_changed(), and the duplicated typing imports still leave Set unused. Defining a TypedDict here fixes the type hole and clears the import noise in one pass.

Suggested fix
-from typing import Any, Dict, List, Optional, Set
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, TypedDict
+
+
+class FileMetadata(TypedDict, total=False):
+    mtime: float
+    size: int
+    sha256: str
@@
-    file_metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)
+    file_metadata: Dict[str, FileMetadata] = field(default_factory=dict)

As per coding guidelines, refactron/**/*.py: Type annotations are required in
refactron/ with mypy disallow_untyped_defs = true enabled.

Also applies to: 66-67

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 12 - 13, Replace the loose
Dict[str, Dict[str, Any]] type for file_metadata with a concrete TypedDict
(e.g., FileMetadata or FileMeta specifying the inner keys and value types) and
update the signature/uses in _has_file_changed to return/accept that TypedDict
instead of Any; also collapse the duplicated typing imports into a single line
(only import the actually used names) and remove the unused Set import so mypy
no-any-return errors disappear and imports are tidy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 148-165: The code compares and removes files using absolute paths
but calls _analyze_file/_has_file_changed/_update_file_metadata with the Path
object or non-normalized string, causing mismatches; normalize each file path
once into a single canonical string (e.g., file_path_str =
str(file_path.absolute())) before any checks and pass that same file_path_str
into _has_file_changed (or adapt _has_file_changed to accept the canonical
string), _analyze_file (modify _analyze_file to accept the canonical path or
pass it alongside the Path), remove_file, and _update_file_metadata so all
operations use the exact same normalized identifier and stale entries cannot
survive renames/relative roots.
- Around line 141-165: The incremental update in build_for_project currently
falls through into the legacy build and returns a stale symbols.json; fix by
short-circuiting: after loading cache (via _load_cache) and performing the
incremental delete/analyze/update loop (using _has_file_changed, _analyze_file,
_update_file_metadata and symbol_table.remove_file), save the updated table
(call _save_cache if available) and return self.symbol_table immediately when
self.cache_dir is truthy so the function does not continue into the old
implementation that reloads the stale file.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ed2eef62-9e3b-47f9-8fbd-dd0c57f385b4

📥 Commits

Reviewing files that changed from the base of the PR and between fe47718 and dfc7bb2.

📒 Files selected for processing (2)
  • refactron/analysis/symbol_table.py
  • tests/test_symbol_table_incremental.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_symbol_table_incremental.py

@shrutu0929 shrutu0929 force-pushed the perf/symbol-table-cache-invalidation branch from dfc7bb2 to 56f6932 Compare April 10, 2026 06:06
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (5)
refactron/analysis/symbol_table.py (5)

66-67: ⚠️ Potential issue | 🟠 Major

Give file_metadata a concrete shape.

Dict[str, Dict[str, Any]] lets mtime/size/sha256 flow through as Any, so _has_file_changed()
loses a real metadata contract and mypy can't validate those comparisons.

#!/bin/bash
set -euo pipefail

sed -n '60,75p' refactron/analysis/symbol_table.py
fd -a '^(pyproject\.toml|mypy\.ini|setup\.cfg)$' | xargs -r rg -n 'disallow_untyped_defs|mypy'

As per coding guidelines: refactron/**/*.py: Type annotations are required in refactron/ with mypy disallow_untyped_defs = true enabled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 66 - 67, Replace the loose
Dict[str, Dict[str, Any]] type for file_metadata with a concrete metadata type
(e.g., a TypedDict or small dataclass named FileMetadata with fields mtime:
float, size: int, sha256: str) and change the annotation of file_metadata to
Dict[str, FileMetadata]; then update the uses (notably the _has_file_changed
function) to rely on those typed fields so mypy can validate comparisons and
avoid Any flowing through. Ensure you import typing.TypedDict (or dataclasses)
and update any related helper code to access metadata.mtime/metadata["mtime"]
consistent with your chosen concrete type.
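A minimal sketch of the concrete metadata type the comment asks for; the field names mirror the cache keys, while `has_changed` is an illustrative helper, not the project's `_has_file_changed` signature:

```python
from typing import Dict, TypedDict


class FileMetadata(TypedDict):
    """Per-file cache entry persisted in symbols.json."""

    mtime: float
    size: int
    sha256: str


# The table-level annotation then tightens from Dict[str, Dict[str, Any]] to:
file_metadata: Dict[str, FileMetadata] = {}


def has_changed(meta: FileMetadata, mtime: float, size: int, sha256: str) -> bool:
    # With a TypedDict, mypy validates each comparison instead of letting
    # Any flow through the mtime/size/sha256 fields.
    return meta["sha256"] != sha256 or meta["mtime"] != mtime or meta["size"] != size
```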

221-235: ⚠️ Potential issue | 🟠 Major

Don't persist a blank hash.

When _calculate_hash() fails it returns "", and Line 235 stores that value. On the next run the hash branch
is skipped and change detection falls back to mtime, so timestamp rollback can hide a real content change.

Suggested fix
-    def _calculate_hash(self, file_path: Path) -> str:
+    def _calculate_hash(self, file_path: Path) -> Optional[str]:
         """Calculate SHA-256 hash of file content."""
         try:
             return hashlib.sha256(file_path.read_bytes()).hexdigest()
-        except Exception:
-            return ""
+        except OSError as e:
+            logger.warning(f"Failed to hash {file_path}: {e}")
+            return None
@@
-            self.symbol_table.file_metadata[file_path_str] = {
+            file_hash = self._calculate_hash(file_path)
+            if file_hash is None:
+                self.symbol_table.file_metadata.pop(file_path_str, None)
+                return
+
+            self.symbol_table.file_metadata[file_path_str] = {
                 "mtime": stat.st_mtime,
                 "size": stat.st_size,
-                "sha256": self._calculate_hash(file_path),
+                "sha256": file_hash,
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 221 - 235, The
_calculate_hash() currently returns an empty string on failure and
_update_file_metadata() persists that blank value, which hides content changes;
change _update_file_metadata() to call _calculate_hash(file_path) and only set
the "sha256" key in self.symbol_table.file_metadata[file_path_str] when the
returned hash is non-empty (or set it to None/omit the key) so that a failed
hash does not overwrite/replace a previous valid hash; ensure the rest of the
metadata ("mtime", "size") is still stored and reference the methods
_calculate_hash and _update_file_metadata and the dict
self.symbol_table.file_metadata when making the conditional change.

166-170: ⚠️ Potential issue | 🟠 Major

Include cached symbol files in deleted-file cleanup.

If an older symbols.json has symbols but no file_metadata, cached_files is empty here and deleted
modules keep exporting stale symbols forever. Build the candidate set from metadata plus the existing symbol
store before calling remove_file().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 166 - 170, The cleanup loop
only uses self.symbol_table.file_metadata keys so old symbols stored in
self.symbol_table.symbols can be missed; change the candidate set used for
cached_files in the cleanup (the code that currently sets cached_files =
list(self.symbol_table.file_metadata.keys()) and then calls
self.symbol_table.remove_file(cached_path)) to instead build a union of file
paths from file_metadata and any file paths referenced in the existing symbol
store (e.g. gather keys from self.symbol_table.file_metadata plus file paths
extracted from self.symbol_table.symbols values), then iterate that combined set
when calling remove_file to ensure stale symbols are removed.

181-190: ⚠️ Potential issue | 🔴 Critical

Return after the incremental pass.

Once Lines 165-180 finish, this legacy block reloads and returns the old cache on cache hits; on cold runs it
analyzes every file again. That defeats the incremental path and can surface stale data.

Suggested fix
     def build_for_project(self, project_root: Path) -> SymbolTable:
         """Scan project and build symbol table incrementally."""
         if self.cache_dir:
             cached_table = self._load_cache()
             if cached_table:
@@
             if self._has_file_changed(file_path_all, file_path_str):
                 logger.debug(f"Analyzing changed file: {file_path_str}")
                 self.symbol_table.remove_file(file_path_str)
                 self._analyze_file(file_path_all)
                 self._update_file_metadata(file_path_all, file_path_str)
-        """Scan project and build symbol table."""
-        if self.cache_dir:
-            cached = self._load_cache()
-            if cached:
-                # TODO: Implement incremental update logic here
-                return cached
-
-        python_files = list(project_root.rglob("*.py"))
-        for file_path in python_files:
-            self._analyze_file(file_path)
+
+            self._save_cache()
+            return self.symbol_table
+
+        for file_path in project_root.rglob("*.py"):
+            self._analyze_file(file_path.resolve())
 
         if self.cache_dir:
             self._save_cache()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 181 - 190, The current early
return on cache hit prevents any incremental update and causes stale or full
re-analysis; instead of returning cached immediately, call an incremental update
routine (e.g., implement and invoke self._incremental_update(cached,
project_root)) that compares cached metadata to on-disk files and calls
self._analyze_file(file_path) only for changed/new files, then merge updates
into cached and persist via the existing cache save path before returning the
updated symbol table; if you cannot implement incremental logic yet, remove the
early return so the full scan runs consistently (i.e., do not return cached from
_load_cache() without applying incremental update).

12-13: ⚠️ Potential issue | 🟡 Minor

Remove the duplicate typing import.

Line 12 reintroduces Set and duplicates the same import from Line 13, so pre-commit stays red with F401.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 12 - 13, Remove the
duplicated typing import: keep a single import that includes Set (e.g., the line
importing Any, Dict, List, Optional, Set) and delete the redundant "from typing
import Any, Dict, List, Optional" line so the module only imports typing names
once and F401 is resolved.
🧹 Nitpick comments (2)
tests/test_config_management.py (1)

909-909: Clean up unused unpacked variables to avoid Ruff RUF059 noise.

Several tests unpack values they never use. Prefer _/_name placeholders.

🔧 Example cleanup
-    _, errors, skips = p_seq.process_files(files, process_func)
+    _, errors, _ = p_seq.process_files(files, process_func)

-    results, errors, skips = p_thr.process_files(files, lambda p: (None, None, None))
+    results, errors, _ = p_thr.process_files(files, lambda p: (None, None, None))

-        results, errors, skips = pp.process_files([Path("a.py")], raises_func)
+        _, errors, _ = pp.process_files([Path("a.py")], raises_func)

Also applies to: 915-915, 1229-1229, 1247-1247, 1253-1253, 1268-1268, 1278-1278

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_config_management.py` at line 909, Tests are unpacking return
values from p_seq.process_files into variables that aren't used (e.g., "_,
errors, skips = p_seq.process_files(files, process_func)"); change the unused
names to underscore placeholders (e.g., "_", "_errors", or "_skips") so unused
results don't trigger Ruff RUF059; update all occurrences mentioned (the lines
using p_seq.process_files and similar unpacking at the listed locations) to use
appropriate "_" placeholders while leaving any actually used variables intact.
refactron/core/refactron.py (1)

268-304: The nested process_file_wrapper function has a pickling limitation that affects multiprocessing, but this is mitigated by current defaults and fallback handling.

While process_file_wrapper is not picklable for ProcessPoolExecutor, the default configuration uses threading (use_multiprocessing=False) and ParallelProcessor already documents this constraint in its docstring (lines 199-201). Additionally, if process pool creation fails, the code falls back gracefully to sequential processing (line 242-246 in parallel.py).

If multiprocessing support is intended as a feature, refactor the nested function to a module-level function or static method to ensure picklability. Otherwise, this is a non-issue under current defaults.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/core/refactron.py` around lines 268 - 304, The nested
process_file_wrapper function inside Refactron.process_files prevents pickling
for ProcessPoolExecutor; either lift it to a module-level function or make it a
`@staticmethod` on the Refactron class so it is picklable, update it to accept
explicit parameters it currently closes over (e.g., file_path,
self/_analyze_file, self.incremental_tracker, and logger) and pass those through
from the call site where ParallelProcessor.process_files is invoked, and keep
the existing exception handling and return shape so
ParallelProcessor.process_files can continue to work unchanged.
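A module-level variant that stays picklable, with the closed-over state passed in explicitly; the function and parameter names here are illustrative, not the project's API:

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


def process_file_task(
    file_path: Path,
    analyze: Callable[[Path], Tuple[Any, Optional[str]]],
) -> Tuple[Any, Optional[Exception], Optional[str]]:
    """Top-level (hence picklable) worker returning (result, error, skip_warning)."""
    try:
        metrics, skip_warn = analyze(file_path)
        return metrics, None, skip_warn
    except Exception as e:  # stands in for AnalysisError in this sketch
        return None, e, None
```

A bound method like `self._analyze_file` can be passed as `analyze` as long as the owning instance itself pickles; otherwise the state it needs must also be passed as plain arguments.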
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 240-252: Remove the duplicate _analyze_file method so the
POSIX-normalizing implementation is the one used: delete the second definition
that uses str(file_path) and keep the first definition which creates
file_path_str via file_path.resolve().as_posix(), calls
self.inference_engine.parse_file(file_path_str) and then self._visit_node(tree,
file_path_str, "global"); ensure no other _analyze_file override remains so
symbol table keys use normalized paths.

In `@refactron/core/inference.py`:
- Around line 77-86: The code currently calls the non-existent
modutils.modname_from_path and silently falls back to an empty module_name which
loses package context; replace that call with
modutils.modpath_from_file(abs_path) and convert the returned module path to a
dotted module name for astroid.parse (assign to modname) while handling
ImportError/ValueError for files outside sys.path; alternatively, if resolution
fails, either accept a repository-derived module name passed into parse_file or
call manager.ast_from_file(abs_path) as a fallback instead of parsing with an
empty module_name—update the block around modutils usage and the return calling
astroid.parse(code, module_name=modname, path=abs_path) accordingly.

In `@refactron/core/parallel.py`:
- Line 75: The docstring line containing "(FileMetrics, None, skip_warn) on
success or (None, FileAnalysisError, None) on error." is over 100 characters and
triggers flake8 E501; split this sentence into two or more shorter lines (e.g.,
break before "on success" or before "or") in the docstring in
refactron/core/parallel.py so each line <=100 chars, keeping the same wording
and the tuple literals "(FileMetrics, None, skip_warn)" and "(None,
FileAnalysisError, None)" intact.

In `@refactron/core/refactron.py`:
- Around line 268-281: The helper function process_file_wrapper is missing its
try: block, which leaves the except AnalysisError handler orphaned; wrap the body
that calls self._analyze_file(file_path), updates
self.incremental_tracker.update_file_state(file_path), and returns results
inside a try: so the existing except AnalysisError (and any other
except/finally) match; ensure indentation of the lines starting with
file_metrics, skip_warn = self._analyze_file(file_path) and the
incremental_tracker update are one level deeper under try and that the return
remains inside the try so exceptions propagate to the existing except
AnalysisError handler.
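The repository-derived module-name fallback mentioned for `inference.py` can be sketched without astroid, deriving a dotted name from a known repo root; treating the root as the package base is an assumption, not the library's API:

```python
from pathlib import Path


def module_name_from_root(file_path: Path, repo_root: Path) -> str:
    """Derive a dotted module name from the path relative to the repo root."""
    rel = file_path.resolve().relative_to(repo_root.resolve())
    parts = list(rel.with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts = parts[:-1]  # a package __init__ maps to the package name itself
    return ".".join(parts)
```

With a non-empty `module_name`, astroid can resolve relative imports (`from . import utils`) against the package context instead of failing silently.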

---

Duplicate comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 66-67: Replace the loose Dict[str, Dict[str, Any]] type for
file_metadata with a concrete metadata type (e.g., a TypedDict or small
dataclass named FileMetadata with fields mtime: float, size: int, sha256: str)
and change the annotation of file_metadata to Dict[str, FileMetadata]; then
update the uses (notably the _has_file_changed function) to rely on those typed
fields so mypy can validate comparisons and avoid Any flowing through. Ensure
you import typing.TypedDict (or dataclasses) and update any related helper code
to access metadata.mtime/metadata["mtime"] consistent with your chosen concrete
type.
- Around line 221-235: The _calculate_hash() currently returns an empty string
on failure and _update_file_metadata() persists that blank value, which hides
content changes; change _update_file_metadata() to call
_calculate_hash(file_path) and only set the "sha256" key in
self.symbol_table.file_metadata[file_path_str] when the returned hash is
non-empty (or set it to None/omit the key) so that a failed hash does not
overwrite/replace a previous valid hash; ensure the rest of the metadata
("mtime", "size") is still stored and reference the methods _calculate_hash and
_update_file_metadata and the dict self.symbol_table.file_metadata when making
the conditional change.
- Around line 166-170: The cleanup loop only uses
self.symbol_table.file_metadata keys so old symbols stored in
self.symbol_table.symbols can be missed; change the candidate set used for
cached_files in the cleanup (the code that currently sets cached_files =
list(self.symbol_table.file_metadata.keys()) and then calls
self.symbol_table.remove_file(cached_path)) to instead build a union of file
paths from file_metadata and any file paths referenced in the existing symbol
store (e.g. gather keys from self.symbol_table.file_metadata plus file paths
extracted from self.symbol_table.symbols values), then iterate that combined set
when calling remove_file to ensure stale symbols are removed.
- Around line 181-190: The current early return on cache hit prevents any
incremental update and causes stale or full re-analysis; instead of returning
cached immediately, call an incremental update routine (e.g., implement and
invoke self._incremental_update(cached, project_root)) that compares cached
metadata to on-disk files and calls self._analyze_file(file_path) only for
changed/new files, then merge updates into cached and persist via the existing
cache save path before returning the updated symbol table; if you cannot
implement incremental logic yet, remove the early return so the full scan runs
consistently (i.e., do not return cached from _load_cache() without applying
incremental update).
- Around line 12-13: Remove the duplicated typing import: keep a single import
that includes Set (e.g., the line importing Any, Dict, List, Optional, Set) and
delete the redundant "from typing import Any, Dict, List, Optional" line so the
module only imports typing names once and F401 is resolved.

---

Nitpick comments:
In `@refactron/core/refactron.py`:
- Around line 268-304: The nested process_file_wrapper function inside
Refactron.process_files prevents pickling for ProcessPoolExecutor; either lift
it to a module-level function or make it a `@staticmethod` on the Refactron class
so it is picklable, update it to accept explicit parameters it currently closes
over (e.g., file_path, self/_analyze_file, self.incremental_tracker, and logger)
and pass those through from the call site where ParallelProcessor.process_files
is invoked, and keep the existing exception handling and return shape so
ParallelProcessor.process_files can continue to work unchanged.

In `@tests/test_config_management.py`:
- Line 909: Tests are unpacking return values from p_seq.process_files into
variables that aren't used (e.g., "_, errors, skips = p_seq.process_files(files,
process_func)"); change the unused names to underscore placeholders (e.g., "_",
"_errors", or "_skips") so unused results don't trigger Ruff RUF059; update all
occurrences mentioned (the lines using p_seq.process_files and similar unpacking
at the listed locations) to use appropriate "_" placeholders while leaving any
actually used variables intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5246dca8-b5b3-4597-b300-aa84c97a5bb9

📥 Commits

Reviewing files that changed from the base of the PR and between dfc7bb2 and d0a04a6.

📒 Files selected for processing (7)
  • refactron/analysis/symbol_table.py
  • refactron/core/inference.py
  • refactron/core/parallel.py
  • refactron/core/refactron.py
  • tests/test_config_management.py
  • tests/test_performance_optimization.py
  • tests/test_symbol_table_incremental.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_symbol_table_incremental.py

Comment on lines +77 to +86
modname = ""
try:
from astroid import modutils

modname = modutils.modname_from_path(abs_path)
except Exception:
pass

# Use string_build via parse to avoid manager.ast_from_file's internal caching
return astroid.parse(code, module_name=modname, path=abs_path)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "inference.py" | head -20

Repository: Refactron-ai/Refactron_lib

Length of output: 99


🏁 Script executed:

# Look for refactron directory structure
fd -type f "inference\.py" | grep refactron

Repository: Refactron-ai/Refactron_lib

Length of output: 240


🏁 Script executed:

# Get the file location and content
ls -la refactron/core/inference.py 2>/dev/null && wc -l refactron/core/inference.py

Repository: Refactron-ai/Refactron_lib

Length of output: 178


🏁 Script executed:

cat -n refactron/core/inference.py | sed -n '65,100p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1757


🏁 Script executed:

# Also check the full function signature and any related context
cat -n refactron/core/inference.py | sed -n '1,90p'

Repository: Refactron-ai/Refactron_lib

Length of output: 4199


🏁 Script executed:

# Check if BackupManager is imported or used in this file
grep -n "BackupManager\|validate_backup_integrity" refactron/core/inference.py

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check the broader context - what calls parse_file and expects?
rg -n "parse_file" --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 2087


🏁 Script executed:

# Look at modutils to understand what modname_from_path returns
rg -A 5 "modutils.modname_from_path" --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 518


🏁 Script executed:

# Check how symbol_table.py uses the result from parse_file()
cat -n refactron/analysis/symbol_table.py | sed -n '240,270p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1510


🏁 Script executed:

# Look at the test cases to understand expected behavior
cat -n tests/test_rag_parser.py | sed -n '84,110p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1352


🏁 Script executed:

# Check if there are any files with relative imports that would be problematic
rg "from \.\." --type py | head -20

Repository: Refactron-ai/Refactron_lib

Length of output: 208


🏁 Script executed:

# Search for any relative import issues, bugs, or TODOs related to module_name
rg "relative import|package context" --type py -i

Repository: Refactron-ai/Refactron_lib

Length of output: 694


🏁 Script executed:

# Check for any test failures or issues related to module_name or modutils
rg "modname|module_name" refactron/core/inference.py -B 2 -A 2

Repository: Refactron-ai/Refactron_lib

Length of output: 1297


🏁 Script executed:

# Look for git history or comments about this change
git log --oneline -20 -- refactron/core/inference.py 2>/dev/null || echo "Git history unavailable"

Repository: Refactron-ai/Refactron_lib

Length of output: 176


🏁 Script executed:

# Check if there are any conftest or fixture files that test relative imports
rg "from \.\." tests/ --type py | head -5

Repository: Refactron-ai/Refactron_lib

Length of output: 208


🏁 Script executed:

# Look at the test for relative import detection
cat -n tests/test_analyzer_coverage_supplement.py | grep -A 10 "Test relative import"

Repository: Refactron-ai/Refactron_lib

Length of output: 414


🏁 Script executed:

# Check what dependency_analyzer does with relative imports
cat -n refactron/analyzers/dependency_analyzer.py | sed -n '1,50p'

Repository: Refactron-ai/Refactron_lib

Length of output: 2084


🏁 Script executed:

# Search for any error logs or issues with module inference
rg "Failed to.*infer|infer.*failed" --type py -i

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check if there's documentation about astroid's parse function behavior
rg "astroid.parse" refactron/ --type py -B 2 -A 2

Repository: Refactron-ai/Refactron_lib

Length of output: 859


🏁 Script executed:

# Check if there are actual test cases that would fail with empty module_name
rg "from \.\." tests/ --type py -B 2 -A 2

Repository: Refactron-ai/Refactron_lib

Length of output: 435


🏁 Script executed:

# Look at the actual test for relative import detection more closely
cat -n tests/test_analyzer_coverage_supplement.py | sed -n '195,220p'

Repository: Refactron-ai/Refactron_lib

Length of output: 965


🏁 Script executed:

# Check if symbol_table.py actually needs to resolve relative imports
rg "infer\|relative import\|package" refactron/analysis/symbol_table.py --type py | head -20

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Look for any actual issues in tests related to module resolution
rg "modname|module_name" tests/ --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check if modutils.modname_from_path can actually fail, and in what scenarios
python3 << 'EOF'
try:
    from astroid import modutils
    # Test with a random path
    result = modutils.modname_from_path("/some/random/path/not/in/sys/path.py")
    print(f"Result for path not in sys.path: {result}")
except Exception as e:
    print(f"Exception type: {type(e).__name__}, message: {e}")
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 173


🏁 Script executed:

# Check the actual astroid documentation or source comments about module_name and path
python3 << 'EOF'
import inspect
import astroid
# Get the signature and docstring of astroid.parse
sig = inspect.signature(astroid.parse)
print("astroid.parse signature:")
print(sig)
print("\nDocstring:")
print(astroid.parse.__doc__)
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 599


🏁 Script executed:

# Look for any actual reported issues or TODOs in the code about this
rg "TODO|FIXME|BUG|XXX" refactron/core/inference.py --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check if there are any test failures or skip markers related to this
rg "@pytest.mark.skip|@pytest.mark.xfail|@unittest.skip" tests/ --type py | grep -i "import\|module\|inference"

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check what version of astroid is being used and what functions are available
python3 << 'EOF'
import astroid
print(f"Astroid version: {astroid.__version__}")
print(f"modutils attributes:")
from astroid import modutils
print([attr for attr in dir(modutils) if 'modname' in attr.lower() or 'path' in attr.lower()])
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 486


🏁 Script executed:

# Check if there's a requirements file or setup.py that specifies astroid version
cat setup.py pyproject.toml requirements.txt 2>/dev/null | grep -A 2 -B 2 "astroid" | head -20

Repository: Refactron-ai/Refactron_lib

Length of output: 228


🏁 Script executed:

# Look at what the code is actually importing from modutils
rg "from astroid import modutils|from astroid.modutils import" --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 142


🏁 Script executed:

# Check if there's any version compatibility handling in the codebase
rg "astroid.__version__|astroid.version" --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 195


🏁 Script executed:

# Verify the actual modpath_from_file function and how it works
python3 << 'EOF'
from astroid import modutils
import inspect
print("modpath_from_file signature:")
print(inspect.signature(modutils.modpath_from_file))
print("\nDocstring:")
print(modutils.modpath_from_file.__doc__)
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 715


🏁 Script executed:

# Test modpath_from_file to see what it returns
python3 << 'EOF'
from astroid import modutils
try:
    result = modutils.modpath_from_file("/some/random/path.py")
    print(f"modpath_from_file result: {result}")
except Exception as e:
    print(f"Exception: {type(e).__name__}: {e}")

# Try with an actual file
import tempfile
import os
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
    temp_path = f.name
try:
    result = modutils.modpath_from_file(temp_path)
    print(f"modpath_from_file for temp file: {result}")
except Exception as e:
    print(f"Exception for temp file: {type(e).__name__}: {e}")
finally:
    os.unlink(temp_path)
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 544


🏁 Script executed:

# Check if modname_from_path was available in older astroid versions
git log -p --all -S "modname_from_path" -- "*.py" | head -50 2>/dev/null || echo "Not found in git history"

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check the actual error handling in parse_file - what exceptions are expected?
cat -n refactron/core/inference.py | sed -n '71,95p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1226


🏁 Script executed:

# Test if astroid can resolve relative imports with empty module_name but valid path
python3 << 'EOF'
import astroid
import tempfile
import os
from pathlib import Path

# Create a test package structure
temp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(temp_dir, "mypackage")
os.makedirs(pkg_dir)

# Create __init__.py
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("# init\n")

# Create utils.py
with open(os.path.join(pkg_dir, "utils.py"), "w") as f:
    f.write("def helper(): return 42\n")

# Create test.py with relative import
test_file = os.path.join(pkg_dir, "test.py")
with open(test_file, "w") as f:
    f.write("""from . import utils
def main():
    return utils.helper()
""")

# Test 1: Parse with empty module_name
print("Test 1: Parse with empty module_name and path")
with open(test_file) as f:
    code = f.read()
try:
    module1 = astroid.parse(code, module_name="", path=test_file)
    print(f"  Success. Module name: '{module1.name}'")
    # Try to infer the import
    for node in module1.nodes_of_class(astroid.ImportFrom):
        print(f"  Found ImportFrom: {node.as_string()}")
except Exception as e:
    print(f"  Error: {e}")

# Test 2: Parse with correct module_name
print("\nTest 2: Parse with correct module_name='mypackage.test' and path")
try:
    module2 = astroid.parse(code, module_name="mypackage.test", path=test_file)
    print(f"  Success. Module name: '{module2.name}'")
    for node in module2.nodes_of_class(astroid.ImportFrom):
        print(f"  Found ImportFrom: {node.as_string()}")
except Exception as e:
    print(f"  Error: {e}")

# Cleanup
import shutil
shutil.rmtree(temp_dir)
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 646


🏁 Script executed:

# Check what the correct way to get module name from file path should be
python3 << 'EOF'
from astroid import modutils
import os

# Show the correct usage of modpath_from_file
print("Correct function: modpath_from_file()")
print("It returns a list of module path components")
print("\nExample of how to convert modpath to modname:")
modpath = ["mypackage", "test"]
modname = ".".join(modpath)
print(f"  modpath: {modpath}")
print(f"  modname: {modname}")
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 257


🏁 Script executed:

# Test if inference actually works with empty module_name for relative imports
python3 << 'EOF'
import astroid
import tempfile
import os
from pathlib import Path

# Create a test package structure
temp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(temp_dir, "mypackage")
os.makedirs(pkg_dir)

# Create __init__.py
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("# init\n")

# Create utils.py
with open(os.path.join(pkg_dir, "utils.py"), "w") as f:
    f.write("def helper(): return 42\n")

# Create test.py with relative import
test_file = os.path.join(pkg_dir, "test.py")
with open(test_file) as f:
    code = f.read()

# Parse with empty module_name (current behavior)
print("Test: Inference with empty module_name")
module1 = astroid.parse(code, module_name="", path=test_file)
print(f"Module name: '{module1.name}'")

# Try to infer the imported symbol
for node in module1.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        # Try to infer what 'utils' refers to
        inferred = list(node.infer_name("utils", None))
        print(f"  Inferred 'utils': {inferred}")
    except Exception as e:
        print(f"  Inference failed: {type(e).__name__}: {e}")

print("\n---\n")

# Parse with correct module_name (desired behavior)
print("Test: Inference with correct module_name='mypackage.test'")
module2 = astroid.parse(code, module_name="mypackage.test", path=test_file)
print(f"Module name: '{module2.name}'")

for node in module2.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        inferred = list(node.infer_name("utils", None))
        print(f"  Inferred 'utils': {inferred}")
    except Exception as e:
        print(f"  Inference failed: {type(e).__name__}: {e}")

# Cleanup
import shutil
shutil.rmtree(temp_dir)
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 236


🏁 Script executed:

# Check if symbol_table.py actually needs to infer relative imports
cat -n refactron/analysis/symbol_table.py | sed -n '1,50p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1841


🏁 Script executed:

# Check if there are any actual test failures when running tests
rg "test.*parse_file|test.*inference" tests/ --type py -l

Repository: Refactron-ai/Refactron_lib

Length of output: 94


🏁 Script executed:

# Fix the test - create the file first, then read it
python3 << 'EOF'
import astroid
import tempfile
import os
from pathlib import Path

# Create a test package structure
temp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(temp_dir, "mypackage")
os.makedirs(pkg_dir)

# Create __init__.py
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("# init\n")

# Create utils.py
with open(os.path.join(pkg_dir, "utils.py"), "w") as f:
    f.write("def helper(): return 42\n")

# Create test.py with relative import - BEFORE reading it
test_file = os.path.join(pkg_dir, "test.py")
with open(test_file, "w") as f:
    f.write("""from . import utils
def main():
    return utils.helper()
""")

# NOW read it
with open(test_file) as f:
    code = f.read()

# Parse with empty module_name (current behavior)
print("Test: Inference with empty module_name")
module1 = astroid.parse(code, module_name="", path=test_file)
print(f"Module name: '{module1.name}'")

# Try to infer the imported symbol
for node in module1.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        inferred = list(node.infer_name("utils", None))
        print(f"  Inferred 'utils': {inferred}")
    except Exception as e:
        print(f"  Inference failed: {type(e).__name__}: {e}")

print("\n---\n")

# Parse with correct module_name (desired behavior)
print("Test: Inference with correct module_name='mypackage.test'")
module2 = astroid.parse(code, module_name="mypackage.test", path=test_file)
print(f"Module name: '{module2.name}'")

for node in module2.nodes_of_class(astroid.nodes.ImportFrom):
    print(f"ImportFrom node: level={node.level}, names={node.names}")
    try:
        inferred = list(node.infer_name("utils", None))
        print(f"  Inferred 'utils': {inferred}")
    except Exception as e:
        print(f"  Inference failed: {type(e).__name__}: {e}")

# Cleanup
import shutil
shutil.rmtree(temp_dir)
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 490


🏁 Script executed:

# Check what the test_rag_parser tests actually do
cat -n tests/test_rag_parser.py | sed -n '84,120p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1826


🏁 Script executed:

# Look for any test that calls InferenceEngine.parse_file directly
rg "InferenceEngine\|parse_file" tests/ --type py -B 2 -A 5 | head -50

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check if there are any actual test failures by examining test output or fixtures
cat -n tests/test_rag_parser.py | sed -n '1,50p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1516


🏁 Script executed:

# Look at what parse_file returns and how it's used in RAG parser
cat -n refactron/rag/parser.py | sed -n '185,210p'

Repository: Refactron-ai/Refactron_lib

Length of output: 1234


🏁 Script executed:

# The key question: does symbol_table need correct module_name for its use case?
cat -n refactron/analysis/symbol_table.py | sed -n '240,280p'

Repository: Refactron-ai/Refactron_lib

Length of output: 2045


🏁 Script executed:

# Verify if modname_from_path is truly non-existent or if it's from a different import
rg "modname_from_path" . --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 162


🏁 Script executed:

# Check if there's maybe a compatibility layer or if this is just dead code
rg "def modname_from_path" . --type py

Repository: Refactron-ai/Refactron_lib

Length of output: 52


🏁 Script executed:

# Check the git commit history for this specific function
git log --oneline -p refactron/core/inference.py | grep -A 10 -B 5 "modname_from_path" | head -40

Repository: Refactron-ai/Refactron_lib

Length of output: 731


🏁 Script executed:

# Check if maybe there was an older version of astroid that had this function
python3 << 'EOF'
# See if there's any way to resolve module name correctly from a path
from astroid import modutils
import os
import sys

test_path = "/some/nonexistent/path/mymodule.py"
print("Available functions in modutils related to module/path:")
for attr in sorted(dir(modutils)):
    if not attr.startswith('_'):
        print(f"  - {attr}")
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 1044


🏁 Script executed:

# Check if the code even runs - try to import and use it
python3 << 'EOF'
import sys
sys.path.insert(0, '.')
try:
    from refactron.core.inference import InferenceEngine
    print("Import successful")
    # Try to use parse_file
    import tempfile
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write("x = 1\n")
        f.flush()
        try:
            result = InferenceEngine.parse_file(f.name)
            print(f"parse_file succeeded: {result}")
        except Exception as e:
            print(f"parse_file failed: {type(e).__name__}: {e}")
        finally:
            import os
            os.unlink(f.name)
except Exception as e:
    print(f"Import failed: {e}")
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 341


🏁 Script executed:

# Final verification: Check if symbol_table actually needs correct module_name
# by checking what operations it does on the parsed tree
cat -n refactron/analysis/symbol_table.py | sed -n '264,320p'

Repository: Refactron-ai/Refactron_lib

Length of output: 2504


🏁 Script executed:

# Verify the actual impact: does empty module_name cause problems for inference in practice?
python3 << 'EOF'
import astroid
import tempfile
import os

# Create a simple test
temp_dir = tempfile.mkdtemp()
test_file = os.path.join(temp_dir, "test.py")
with open(test_file, "w") as f:
    f.write("""
class MyClass:
    def method(self):
        return 42

def my_function():
    pass
""")

# Parse with empty module_name (current broken behavior)
with open(test_file) as f:
    code = f.read()
    
module = astroid.parse(code, module_name="", path=test_file)
print(f"Module name: '{module.name}'")
print(f"Module file: {module.file}")
print(f"Classes: {[c.name for c in module.nodes_of_class(astroid.nodes.ClassDef)]}")
print(f"Functions: {[f.name for f in module.nodes_of_class(astroid.nodes.FunctionDef)]}")

# Can we visit the nodes like symbol_table does?
for node in module.body:
    if isinstance(node, (astroid.nodes.ClassDef, astroid.nodes.FunctionDef)):
        print(f"Node: {node.name}, line: {node.lineno}")

import shutil
shutil.rmtree(temp_dir)
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 231


Fix the non-existent function call and properly resolve module names.

The code calls modutils.modname_from_path(), which does not exist in astroid 4.0.4 (or any other version). The resulting AttributeError is silently caught, leaving modname as an empty string. The correct approach is to use modpath_from_file() to derive the module name from the file path.

With empty module_name, astroid parses the code but loses package context. This breaks relative import resolution and package-aware inference for files outside sys.path, undermining the cache-correctness goal.

Instead of silently falling back to an empty module name, do one of the following:

  1. Use modutils.modpath_from_file() to properly resolve the module path and convert it to a module name (handling ImportError for files outside sys.path), or
  2. Pass a repository-derived module name into parse_file, or
  3. Fall back to manager.ast_from_file() only when module resolution fails.
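To illustrate option 1, here is a minimal stdlib-only sketch of the dotted-name derivation that ".".join(modutils.modpath_from_file(path)) would perform; modname_from_path and the root parameter are hypothetical helpers for illustration, not the astroid API:

```python
import os

def modname_from_path(abs_path: str, root: str) -> str:
    # Hypothetical helper: mirrors what ".".join(modutils.modpath_from_file(p))
    # produces for a file under a known project root.
    rel = os.path.relpath(abs_path, root)
    parts = os.path.splitext(rel)[0].split(os.sep)
    if parts[-1] == "__init__":
        parts = parts[:-1]  # a package's __init__.py maps to the package itself
    return ".".join(parts)

print(modname_from_path("/proj/mypackage/test.py", "/proj"))      # mypackage.test
print(modname_from_path("/proj/mypackage/__init__.py", "/proj"))  # mypackage
```

The real fix should prefer modutils.modpath_from_file(), which also walks sys.path and raises ImportError for unresolvable files; this sketch only shows the path-to-dotted-name mapping.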
🧰 Tools
🪛 GitHub Actions: CI

[error] Command failed with exit code 1: flake8 refactron --count --select=E9,F63,F7,F82 --show-source --statistics

🪛 Ruff (0.15.9)

[error] 82-83: try-except-pass detected, consider logging the exception

(S110)


[warning] 82-82: Do not catch blind exception: Exception

(BLE001)
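The S110/BLE001 findings point at the same block: catch only the expected ImportError and log it rather than passing silently. A minimal sketch — resolve_modpath is a stand-in for modutils.modpath_from_file, not the real astroid call:

```python
import logging

logger = logging.getLogger("refactron.inference")

def resolve_modpath(abs_path):
    # Stand-in for astroid's modutils.modpath_from_file, which raises
    # ImportError when the file cannot be located relative to sys.path.
    raise ImportError(f"{abs_path} is not importable")

def safe_modname(abs_path: str) -> str:
    try:
        return ".".join(resolve_modpath(abs_path))
    except ImportError as exc:  # narrow except instead of a blind Exception
        logger.debug("module resolution failed for %s: %s", abs_path, exc)
        return ""

print(repr(safe_modname("/tmp/orphan.py")))  # ''
```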

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/core/inference.py` around lines 77 - 86, The code currently calls
the non-existent modutils.modname_from_path and silently falls back to an empty
module_name which loses package context; replace that call with
modutils.modpath_from_file(abs_path) and convert the returned module path to a
dotted module name for astroid.parse (assign to modname) while handling
ImportError/ValueError for files outside sys.path; alternatively, if resolution
fails, either accept a repository-derived module name passed into parse_file or
call manager.ast_from_file(abs_path) as a fallback instead of parsing with an
empty module_name—update the block around modutils usage and the return calling
astroid.parse(code, module_name=modname, path=abs_path) accordingly.

        files: List of file paths to process.
        process_func: Function to process a single file. Should return
-                     (FileMetrics, None) on success or (None, FileAnalysisError) on error.
+                     (FileMetrics, None, skip_warn) on success or (None, FileAnalysisError, None) on error.

⚠️ Potential issue | 🟡 Minor

Line 75 exceeds the max line length and is currently failing flake8.

This is causing the pre-commit failure (E501).

🔧 Proposed fix
-            process_func: Function to process a single file. Should return
-                         (FileMetrics, None, skip_warn) on success or (None, FileAnalysisError, None) on error.
+            process_func: Function to process a single file.
+                Should return (FileMetrics, None, skip_warn) on success
+                or (None, FileAnalysisError, None) on error.
As per coding guidelines `**/*.py`: Use line length of 100 characters, enforced by black, isort, and flake8.
🧰 Tools
🪛 GitHub Actions: CI

[error] Command failed with exit code 1: flake8 refactron --count --select=E9,F63,F7,F82 --show-source --statistics

🪛 GitHub Actions: Pre-commit

[error] 75-75: pre-commit hook 'flake8' failed (exit code 1): E501 line too long (111 > 100 characters)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/core/parallel.py` at line 75, The docstring line containing
"(FileMetrics, None, skip_warn) on success or (None, FileAnalysisError, None) on
error." is over 100 characters and triggers flake8 E501; split this sentence
into two or more shorter lines (e.g., break before "on success" or before "or")
in the docstring in refactron/core/parallel.py so each line <=100 chars, keeping
the same wording and the tuple literals "(FileMetrics, None, skip_warn)" and
"(None, FileAnalysisError, None)" intact.


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
refactron/core/refactron.py (1)

341-341: ⚠️ Potential issue | 🟠 Major

Fix flake8 E501 at Line 341 (current CI blocker).

The recovery suggestion string is over 100 chars and fails pre-commit (E501), so this should be wrapped.

🔧 Proposed fix
-                            recovery_suggestion="Check the file for syntax errors or encoding issues",
+                            recovery_suggestion=(
+                                "Check the file for syntax errors or encoding issues"
+                            ),

As per coding guidelines, **/*.py must use a maximum line length of 100 characters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/core/refactron.py` at line 341, The long literal passed as
recovery_suggestion exceeds the 100-char line length; locate the call that
supplies recovery_suggestion in refactron.core.refactron (the argument named
recovery_suggestion) and split the message across multiple shorter string
literals (using implicit concatenation inside parentheses or concatenated with
+) so no single source line is longer than 100 chars, preserving the same text
and punctuation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@refactron/core/refactron.py`:
- Line 341: The long literal passed as recovery_suggestion exceeds the 100-char
line length; locate the call that supplies recovery_suggestion in
refactron.core.refactron (the argument named recovery_suggestion) and split the
message across multiple shorter string literals (using implicit concatenation
inside parentheses or concatenated with +) so no single source line is longer
than 100 chars, preserving the same text and punctuation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 73c612e7-b837-4a08-b849-a02e11ee4a44

📥 Commits

Reviewing files that changed from the base of the PR and between 13bfe96 and 753d176.

📒 Files selected for processing (1)
  • refactron/core/refactron.py
