Skip to content

perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob …#159

Open
shrutu0929 wants to merge 1 commit intoRefactron-ai:mainfrom
shrutu0929:perf/test-suite-gate-discovery
Open

perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob …#159
shrutu0929 wants to merge 1 commit intoRefactron-ai:mainfrom
shrutu0929:perf/test-suite-gate-discovery

Conversation

@shrutu0929
Copy link
Copy Markdown

@shrutu0929 shrutu0929 commented Apr 7, 2026

solve #153
The latest update aggressively drastically optimizes the time TestSuiteGate spends dynamically discovering test coverage bindings, fixing a significant performance bottleneck scaling heavily with overall project sizes. Previously, the verification check blindly executed a full recursive search (rglob("*.py")) across the entire workspace structure for every single fix candidate, constantly rescanning bloated internal dependency folders (like .venv or node_modules) and redundantly parsing AST layouts on unchanged assets. We solved this by instituting three layers of restriction: first, we instituted strict semantic path exclusions to bypass generic, noisy directories entirely; second, the search mechanism now explicitly pivots to prioritize established structural endpoints like tests/ or test/; finally, we configured a persisting dictionary state-cache at the class level. By securely storing the mapped outcome of which files internally import which module branches, subsequent checks against identical files immediately short-circuit without re-triggering disk I/O, ensuring verification iteration speeds reliably parallel your actual test scope density instead of being bogged down by sheer repository weight.

Summary by CodeRabbit

Release Notes

  • Refactor
    • Test discovery and caching mechanisms have been enhanced for improved performance. The verification system now intelligently caches discovered test files and test results per module, eliminating redundant file system scans. This optimization improves the overall efficiency of verification check processes and reduces scan times for projects with multiple test suites.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

The TestSuiteGate class now implements caching for test file discovery and per-module relevant test results. The first call to _find_relevant_tests scans designated test directories for Python files matching test naming conventions while excluding specified directories, with subsequent calls reusing cached results.

Changes

Cohort / File(s) Summary
Test File Discovery Caching
refactron/verification/checks/test_gate.py
Implements caching of discovered test files and per-module test results in TestSuiteGate._find_relevant_tests. Adds _all_test_files cache built on first call by scanning tests/ and test/ directories (or search_root fallback), filtering for test_*.py / *_test.py patterns while excluding directories like .git, __pycache__, venv, and node_modules. Subsequent calls reuse _test_file_cache[module_name] to avoid redundant test file lookup.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Poem

🐰 A rabbit hops through test files fast,
With caches built to make them last,
No .git or venv slows the way,
Tests are found in half the day! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and clearly describes the main performance improvement: caching and narrowing test discovery in TestSuiteGate to avoid full rglob scans.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/verification/checks/test_gate.py`:
- Line 118: Remove the trailing whitespace in the test file (test_gate.py) that
is causing the pre-commit hook to fail: open the file and delete the trailing
spaces at the ends of the affected lines, ensure your editor is set to trim
trailing whitespace on save (or run a trim command), then re-run the pre-commit
hooks (or git commit) to verify the hook passes before pushing.
- Around line 112-113: The cache key currently uses module_name = file_path.stem
which can collide across packages; instead compute a module-scoped identifier
based on the file path relative to the search_root/project_root (use search_root
or self.project_root to derive file_path.relative_to(search_root) and strip the
suffix, converting separators to a dotted module path or a normalized path
string) and replace all uses of module_name in the cache lookup/write sites (the
cache read around where reuse occurs and the cache write where stale writes
happen) to use this new module_path; also defend against files not under the
search_root by falling back to file_path.as_posix() to avoid exceptions.
- Around line 21-24: The __init__ method in the TestGate class is missing an
explicit return type; add an explicit "-> None" annotation to the __init__
signature (the constructor defined as def __init__(...)) to comply with the
typed-def policy and mypy disallow_untyped_defs; update the signature only (no
body changes) so the method becomes explicitly typed.
- Around line 119-120: The current discovery narrows search_dirs to only
test_dirs when a tests/ or test/ exists, which misses colocated tests; modify
the logic in test_gate.py so that search_dirs includes both the found test
directories and the repository root for fallback (e.g., make search_dirs =
test_dirs + [search_root] with deduplication/ordering) instead of replacing the
root entirely, ensuring colocated tests like pkg/foo_test.py are still
discovered while retaining the fast-path of scanning tests/ or test/.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ef1b33d-49fc-433e-92ef-dc73a802ab3a

📥 Commits

Reviewing files that changed from the base of the PR and between a9659f5 and 42e9694.

📒 Files selected for processing (1)
  • refactron/verification/checks/test_gate.py

Comment on lines 21 to +24
def __init__(self, project_root: Optional[Path] = None):
self.project_root = project_root
self._test_file_cache: Optional[Dict[str, List[Path]]] = None
self._all_test_files: Optional[List[Path]] = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add explicit -> None on __init__ to satisfy typed-def policy.

At Line 21, __init__ is missing an explicit return type, which violates the typed-def rule under refactron/.

Proposed fix
-    def __init__(self, project_root: Optional[Path] = None):
+    def __init__(self, project_root: Optional[Path] = None) -> None:
         self.project_root = project_root
         self._test_file_cache: Optional[Dict[str, List[Path]]] = None
         self._all_test_files: Optional[List[Path]] = None

As per coding guidelines, "Type annotations are required in refactron/ with mypy disallow_untyped_defs = true enabled."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, project_root: Optional[Path] = None):
self.project_root = project_root
self._test_file_cache: Optional[Dict[str, List[Path]]] = None
self._all_test_files: Optional[List[Path]] = None
def __init__(self, project_root: Optional[Path] = None) -> None:
self.project_root = project_root
self._test_file_cache: Optional[Dict[str, List[Path]]] = None
self._all_test_files: Optional[List[Path]] = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 21 - 24, The
__init__ method in the TestGate class is missing an explicit return type; add an
explicit "-> None" annotation to the __init__ signature (the constructor defined
as def __init__(...)) to comply with the typed-def policy and mypy
disallow_untyped_defs; update the signature only (no body changes) so the method
becomes explicitly typed.

Comment on lines 112 to 113
module_name = file_path.stem
search_root = self.project_root or file_path.parent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Cache key should include module path, not only file stem.

At Line 112, using file_path.stem as the cache key can collide across packages (a/utils.py vs b/utils.py),
causing incorrect reuse from Line 131 and stale writes at Line 145.

Proposed fix
         module_name = file_path.stem
+        module_cache_key = str(file_path.resolve())
         search_root = self.project_root or file_path.parent
@@
-        if module_name in self._test_file_cache:
-            return self._test_file_cache[module_name]
+        if module_cache_key in self._test_file_cache:
+            return self._test_file_cache[module_cache_key]
@@
-        self._test_file_cache[module_name] = test_files
+        self._test_file_cache[module_cache_key] = test_files
         return test_files

Also applies to: 131-132, 145-146

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 112 - 113, The cache
key currently uses module_name = file_path.stem which can collide across
packages; instead compute a module-scoped identifier based on the file path
relative to the search_root/project_root (use search_root or self.project_root
to derive file_path.relative_to(search_root) and strip the suffix, converting
separators to a dotted module path or a normalized path string) and replace all
uses of module_name in the cache lookup/write sites (the cache read around where
reuse occurs and the cache write where stale writes happen) to use this new
module_path; also defend against files not under the search_root by falling back
to file_path.as_posix() to avoid exceptions.

if self._test_file_cache is None:
self._test_file_cache = {}
self._all_test_files = []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove trailing whitespace to unblock pre-commit.

Lines around Line 118/122/130/144 include trailing spaces, and pre-commit already failed on this hook.

Also applies to: 122-122, 130-130, 144-144

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` at line 118, Remove the trailing
whitespace in the test file (test_gate.py) that is causing the pre-commit hook
to fail: open the file and delete the trailing spaces at the ends of the
affected lines, ensure your editor is set to trim trailing whitespace on save
(or run a trim command), then re-run the pre-commit hooks (or git commit) to
verify the hook passes before pushing.

Comment on lines +119 to +120
test_dirs = [d for d in [search_root / "tests", search_root / "test"] if d.is_dir()]
search_dirs = test_dirs if test_dirs else [search_root]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Discovery narrowing can miss valid tests in mixed layouts.

At Line 120, when tests/ or test/ exists, discovery skips all other directories. Repos with colocated tests
(e.g., pkg/foo_test.py) will be silently excluded, which can let regressions pass verification.

Safer fallback pattern (keep fast path, avoid false negatives)
             test_dirs = [d for d in [search_root / "tests", search_root / "test"] if d.is_dir()]
             search_dirs = test_dirs if test_dirs else [search_root]
@@
         test_files: List[Path] = []
         for py_file in self._all_test_files:  # type: ignore
@@
                 if self._imports_module(source, module_name):
                     test_files.append(py_file)
             except Exception:
                 continue
+
+        # Mixed-layout fallback: if canonical test dirs exist but yielded nothing relevant,
+        # do one broader pass to catch colocated tests.
+        if not test_files and test_dirs:
+            for py_file in search_root.rglob("*.py"):
+                if any(excluded in py_file.parts for excluded in excluded_dirs):
+                    continue
+                name = py_file.name
+                if not (name.startswith("test_") or name.endswith("_test.py")):
+                    continue
+                try:
+                    source = py_file.read_text(encoding="utf-8")
+                    if self._imports_module(source, module_name):
+                        test_files.append(py_file)
+                except Exception:
+                    continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 119 - 120, The
current discovery narrows search_dirs to only test_dirs when a tests/ or test/
exists, which misses colocated tests; modify the logic in test_gate.py so that
search_dirs includes both the found test directories and the repository root for
fallback (e.g., make search_dirs = test_dirs + [search_root] with
deduplication/ordering) instead of replacing the root entirely, ensuring
colocated tests like pkg/foo_test.py are still discovered while retaining the
fast-path of scanning tests/ or test/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant