perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob … by shrutu0929 · Pull Request #159 · Refactron-ai/Refactron_lib

shrutu0929 · 2026-04-07T13:03:43Z

solve #153
The latest update aggressively drastically optimizes the time TestSuiteGate spends dynamically discovering test coverage bindings, fixing a significant performance bottleneck scaling heavily with overall project sizes. Previously, the verification check blindly executed a full recursive search (rglob("*.py")) across the entire workspace structure for every single fix candidate, constantly rescanning bloated internal dependency folders (like .venv or node_modules) and redundantly parsing AST layouts on unchanged assets. We solved this by instituting three layers of restriction: first, we instituted strict semantic path exclusions to bypass generic, noisy directories entirely; second, the search mechanism now explicitly pivots to prioritize established structural endpoints like tests/ or test/; finally, we configured a persisting dictionary state-cache at the class level. By securely storing the mapped outcome of which files internally import which module branches, subsequent checks against identical files immediately short-circuit without re-triggering disk I/O, ensuring verification iteration speeds reliably parallel your actual test scope density instead of being bogged down by sheer repository weight.

Summary by CodeRabbit

Release Notes

Refactor
- Test discovery and caching mechanisms have been enhanced for improved performance. The verification system now intelligently caches discovered test files and test results per module, eliminating redundant file system scans. This optimization improves the overall efficiency of verification check processes and reduces scan times for projects with multiple test suites.

…per verify)

coderabbitai · 2026-04-07T13:03:59Z

📝 Walkthrough

Walkthrough

The TestSuiteGate class now implements caching for test file discovery and per-module relevant test results. The first call to _find_relevant_tests scans designated test directories for Python files matching test naming conventions while excluding specified directories, with subsequent calls reusing cached results.

Changes

Cohort / File(s)	Summary
Test File Discovery Caching `refactron/verification/checks/test_gate.py`	Implements caching of discovered test files and per-module test results in `TestSuiteGate._find_relevant_tests`. Adds `_all_test_files` cache built on first call by scanning `tests/` and `test/` directories (or `search_root` fallback), filtering for `test_.py` / `_test.py` patterns while excluding directories like `.git`, `__pycache__`, `venv`, and `node_modules`. Subsequent calls reuse `_test_file_cache[module_name]` to avoid redundant test file lookup.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

perf: cache or index TestSuiteGate relevant-test discovery #139: PR directly addresses this issue by implementing caching of discovered test files and per-module relevant test results, along with skipping non-test directories during discovery to improve performance.

Poem

🐰 A rabbit hops through test files fast,
With caches built to make them last,
No .git or venv slows the way,
Tests are found in half the day! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly and clearly describes the main performance improvement: caching and narrowing test discovery in TestSuiteGate to avoid full rglob scans.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/verification/checks/test_gate.py`:
- Line 118: Remove the trailing whitespace in the test file (test_gate.py) that
is causing the pre-commit hook to fail: open the file and delete the trailing
spaces at the ends of the affected lines, ensure your editor is set to trim
trailing whitespace on save (or run a trim command), then re-run the pre-commit
hooks (or git commit) to verify the hook passes before pushing.
- Around line 112-113: The cache key currently uses module_name = file_path.stem
which can collide across packages; instead compute a module-scoped identifier
based on the file path relative to the search_root/project_root (use search_root
or self.project_root to derive file_path.relative_to(search_root) and strip the
suffix, converting separators to a dotted module path or a normalized path
string) and replace all uses of module_name in the cache lookup/write sites (the
cache read around where reuse occurs and the cache write where stale writes
happen) to use this new module_path; also defend against files not under the
search_root by falling back to file_path.as_posix() to avoid exceptions.
- Around line 21-24: The __init__ method in the TestGate class is missing an
explicit return type; add an explicit "-> None" annotation to the __init__
signature (the constructor defined as def __init__(...)) to comply with the
typed-def policy and mypy disallow_untyped_defs; update the signature only (no
body changes) so the method becomes explicitly typed.
- Around line 119-120: The current discovery narrows search_dirs to only
test_dirs when a tests/ or test/ exists, which misses colocated tests; modify
the logic in test_gate.py so that search_dirs includes both the found test
directories and the repository root for fallback (e.g., make search_dirs =
test_dirs + [search_root] with deduplication/ordering) instead of replacing the
root entirely, ensuring colocated tests like pkg/foo_test.py are still
discovered while retaining the fast-path of scanning tests/ or test/.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ef1b33d-49fc-433e-92ef-dc73a802ab3a

📥 Commits

Reviewing files that changed from the base of the PR and between a9659f5 and 42e9694.

📒 Files selected for processing (1)

refactron/verification/checks/test_gate.py

coderabbitai · 2026-04-07T13:08:17Z

refactron/verification/checks/test_gate.py

    def __init__(self, project_root: Optional[Path] = None):
        self.project_root = project_root
+        self._test_file_cache: Optional[Dict[str, List[Path]]] = None
+        self._all_test_files: Optional[List[Path]] = None


🛠️ Refactor suggestion | 🟠 Major

Add explicit -> None on __init__ to satisfy typed-def policy.

At Line 21, __init__ is missing an explicit return type, which violates the typed-def rule under refactron/.

Proposed fix

- def __init__(self, project_root: Optional[Path] = None): + def __init__(self, project_root: Optional[Path] = None) -> None: self.project_root = project_root self._test_file_cache: Optional[Dict[str, List[Path]]] = None self._all_test_files: Optional[List[Path]] = None

As per coding guidelines, "Type annotations are required in refactron/ with mypy disallow_untyped_defs = true enabled."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def __init__(self, project_root: Optional[Path] = None):

self.project_root = project_root

self._test_file_cache: Optional[Dict[str, List[Path]]] = None

self._all_test_files: Optional[List[Path]] = None

def __init__(self, project_root: Optional[Path] = None) -> None:

self.project_root = project_root

self._test_file_cache: Optional[Dict[str, List[Path]]] = None

self._all_test_files: Optional[List[Path]] = None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@refactron/verification/checks/test_gate.py` around lines 21 - 24, The __init__ method in the TestGate class is missing an explicit return type; add an explicit "-> None" annotation to the __init__ signature (the constructor defined as def __init__(...)) to comply with the typed-def policy and mypy disallow_untyped_defs; update the signature only (no body changes) so the method becomes explicitly typed.

coderabbitai · 2026-04-07T13:08:17Z

refactron/verification/checks/test_gate.py

        module_name = file_path.stem
        search_root = self.project_root or file_path.parent


⚠️ Potential issue | 🟠 Major

Cache key should include module path, not only file stem.

At Line 112, using file_path.stem as the cache key can collide across packages (a/utils.py vs b/utils.py),
causing incorrect reuse from Line 131 and stale writes at Line 145.

Proposed fix

module_name = file_path.stem + module_cache_key = str(file_path.resolve()) search_root = self.project_root or file_path.parent @@ - if module_name in self._test_file_cache: - return self._test_file_cache[module_name] + if module_cache_key in self._test_file_cache: + return self._test_file_cache[module_cache_key] @@ - self._test_file_cache[module_name] = test_files + self._test_file_cache[module_cache_key] = test_files return test_files

Also applies to: 131-132, 145-146

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@refactron/verification/checks/test_gate.py` around lines 112 - 113, The cache key currently uses module_name = file_path.stem which can collide across packages; instead compute a module-scoped identifier based on the file path relative to the search_root/project_root (use search_root or self.project_root to derive file_path.relative_to(search_root) and strip the suffix, converting separators to a dotted module path or a normalized path string) and replace all uses of module_name in the cache lookup/write sites (the cache read around where reuse occurs and the cache write where stale writes happen) to use this new module_path; also defend against files not under the search_root by falling back to file_path.as_posix() to avoid exceptions.

coderabbitai · 2026-04-07T13:08:17Z

refactron/verification/checks/test_gate.py

+        if self._test_file_cache is None:
+            self._test_file_cache = {}
+            self._all_test_files = []
+


⚠️ Potential issue | 🟡 Minor

Remove trailing whitespace to unblock pre-commit.

Lines around Line 118/122/130/144 include trailing spaces, and pre-commit already failed on this hook.

Also applies to: 122-122, 130-130, 144-144

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@refactron/verification/checks/test_gate.py` at line 118, Remove the trailing whitespace in the test file (test_gate.py) that is causing the pre-commit hook to fail: open the file and delete the trailing spaces at the ends of the affected lines, ensure your editor is set to trim trailing whitespace on save (or run a trim command), then re-run the pre-commit hooks (or git commit) to verify the hook passes before pushing.

coderabbitai · 2026-04-07T13:08:17Z

refactron/verification/checks/test_gate.py

+            test_dirs = [d for d in [search_root / "tests", search_root / "test"] if d.is_dir()]
+            search_dirs = test_dirs if test_dirs else [search_root]


⚠️ Potential issue | 🟠 Major

Discovery narrowing can miss valid tests in mixed layouts.

At Line 120, when tests/ or test/ exists, discovery skips all other directories. Repos with colocated tests
(e.g., pkg/foo_test.py) will be silently excluded, which can let regressions pass verification.

Safer fallback pattern (keep fast path, avoid false negatives)

test_dirs = [d for d in [search_root / "tests", search_root / "test"] if d.is_dir()] search_dirs = test_dirs if test_dirs else [search_root] @@ test_files: List[Path] = [] for py_file in self._all_test_files: # type: ignore @@ if self._imports_module(source, module_name): test_files.append(py_file) except Exception: continue + + # Mixed-layout fallback: if canonical test dirs exist but yielded nothing relevant, + # do one broader pass to catch colocated tests. + if not test_files and test_dirs: + for py_file in search_root.rglob("*.py"): + if any(excluded in py_file.parts for excluded in excluded_dirs): + continue + name = py_file.name + if not (name.startswith("test_") or name.endswith("_test.py")): + continue + try: + source = py_file.read_text(encoding="utf-8") + if self._imports_module(source, module_name): + test_files.append(py_file) + except Exception: + continue

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@refactron/verification/checks/test_gate.py` around lines 119 - 120, The current discovery narrows search_dirs to only test_dirs when a tests/ or test/ exists, which misses colocated tests; modify the logic in test_gate.py so that search_dirs includes both the found test directories and the repository root for fallback (e.g., make search_dirs = test_dirs + [search_root] with deduplication/ordering) instead of replacing the root entirely, ensuring colocated tests like pkg/foo_test.py are still discovered while retaining the fast-path of scanning tests/ or test/.

perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob …

42e9694

…per verify)

coderabbitai bot reviewed Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob …#159

perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob …#159
shrutu0929 wants to merge 1 commit intoRefactron-ai:mainfrom
shrutu0929:perf/test-suite-gate-discovery

shrutu0929 commented Apr 7, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 7, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 7, 2026

Uh oh!

coderabbitai bot Apr 7, 2026

Uh oh!

coderabbitai bot Apr 7, 2026

Uh oh!

coderabbitai bot Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		module_name = file_path.stem
		search_root = self.project_root or file_path.parent

		test_dirs = [d for d in [search_root / "tests", search_root / "test"] if d.is_dir()]
		search_dirs = test_dirs if test_dirs else [search_root]

Conversation

shrutu0929 commented Apr 7, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shrutu0929 commented Apr 7, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 7, 2026 •

edited

Loading