perf: Reuse CodeChunker in RAGIndexer._index_file by shrutu0929 · Pull Request #158 · Refactron-ai/Refactron_lib

shrutu0929 · 2026-04-07T12:53:39Z

Resolves #152
This change addresses an inefficiency in large-repository indexing: _index_file built a new CodeChunker for every file it processed. The chunker is effectively stateless apart from its reference to the underlying CodeParser, so the instantiation is hoisted out of _index_file and done once in RAGIndexer.__init__. The indexer now reuses self.chunker for every file in the directory crawl. This avoids one object allocation, and its later garbage collection, per file. Behavior is unchanged for existing tests and callers.
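The hoisting described above can be sketched with stand-in classes. This is a minimal illustration, not the actual refactron code: the simplified `CodeParser`, the `chunk` method, the `instances_created` counter, and the `PerFileIndexer` / `ReusingIndexer` names are all assumptions made for the example.

```python
from typing import List


class CodeParser:
    """Stand-in for the real CodeParser (hypothetical, heavily simplified)."""

    def parse(self, source: str) -> List[str]:
        # Keep non-empty lines; the real parser is assumed to do far more.
        return [line for line in source.splitlines() if line.strip()]


class CodeChunker:
    """Stand-in chunker whose only state is a reference to the parser."""

    instances_created = 0  # illustration-only counter, not part of the real class

    def __init__(self, parser: CodeParser) -> None:
        CodeChunker.instances_created += 1
        self.parser = parser

    def chunk(self, source: str) -> List[str]:  # hypothetical method name
        return self.parser.parse(source)


class PerFileIndexer:
    """Before: a new chunker is built on every _index_file call."""

    def __init__(self) -> None:
        self.parser = CodeParser()

    def _index_file(self, source: str) -> List[str]:
        chunker = CodeChunker(self.parser)
        return chunker.chunk(source)


class ReusingIndexer:
    """After: the chunker is built once in __init__ and reused."""

    def __init__(self) -> None:
        self.parser = CodeParser()
        self.chunker = CodeChunker(self.parser)

    def _index_file(self, source: str) -> List[str]:
        return self.chunker.chunk(source)


def count_chunkers(indexer_cls, sources):
    """Index every source and report how many chunkers were constructed."""
    CodeChunker.instances_created = 0
    indexer = indexer_cls()
    results = [indexer._index_file(s) for s in sources]
    return CodeChunker.instances_created, results
```

With three inputs, the per-file variant constructs three chunkers and the reusing variant constructs one, while both return the same chunks. Reuse is only safe as long as the chunker keeps no per-file state between calls.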

Summary by CodeRabbit

Refactor
- Optimized the file indexing process for improved performance.
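Whether the improvement is measurable depends on how expensive the chunker's constructor is. One way to check is a small `timeit` comparison. The sketch below uses hypothetical stand-in classes (`Parser`, `Chunker`) with a trivial constructor, so its timings say nothing about the real CodeChunker; it only shows the shape of such a measurement.

```python
import timeit


class Parser:
    """Hypothetical stand-in for the parser the chunker binds to."""


class Chunker:
    """Hypothetical stand-in with a trivial constructor."""

    def __init__(self, parser):
        self.parser = parser

    def chunk(self, text):
        return text.split("\n")


def measure(n_files=10_000, repeat=3):
    """Return (per_file_seconds, reused_seconds), best of `repeat` runs."""
    parser = Parser()
    shared = Chunker(parser)
    text = "x = 1\ny = 2"

    def per_file():
        # Mirrors the old behavior: construct, then chunk.
        Chunker(parser).chunk(text)

    def reused():
        # Mirrors the new behavior: chunk with the shared instance.
        shared.chunk(text)

    t_per_file = min(timeit.repeat(per_file, number=n_files, repeat=repeat))
    t_reused = min(timeit.repeat(reused, number=n_files, repeat=repeat))
    return t_per_file, t_reused


if __name__ == "__main__":
    per_file_s, reused_s = measure()
    print(f"per-file: {per_file_s:.4f}s  reused: {reused_s:.4f}s")
```

For a constructor this small, the gap is likely to be modest. Running the same comparison against the real classes in the repository would show how much the change actually saves.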

coderabbitai · 2026-04-07T12:56:58Z

📝 Walkthrough

RAGIndexer now creates a single persistent CodeChunker instance in its constructor instead of creating a new one per file. The _index_file method reuses the stored instance, which eliminates the redundant allocations while preserving the existing control flow.
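A regression test for this behavior could assert that the chunker is constructed exactly once and then used once per file. The sketch below is one possible shape for such a test using `unittest.mock`. `RAGIndexerSketch`, the injected `chunker_factory`, and the `chunk_file` method name are assumptions for illustration and do not reflect the real RAGIndexer signature.

```python
from unittest.mock import Mock


class RAGIndexerSketch:
    """Hypothetical stand-in with the shape described in the walkthrough."""

    def __init__(self, chunker_factory, parser):
        self.parser = parser
        # Built once here and reused by every _index_file call.
        self.chunker = chunker_factory(self.parser)

    def _index_file(self, path):
        return self.chunker.chunk_file(path)  # assumed method name


def check_single_construction(paths):
    """Index `paths` and verify one construction plus one chunk call per file."""
    parser = Mock(name="parser")
    chunker = Mock(name="chunker")
    chunker.chunk_file.side_effect = lambda p: [f"chunk-of-{p}"]
    factory = Mock(name="CodeChunker", return_value=chunker)

    indexer = RAGIndexerSketch(factory, parser)
    results = [indexer._index_file(p) for p in paths]

    factory.assert_called_once_with(parser)
    assert chunker.chunk_file.call_count == len(paths)
    return results
```

Injecting the factory keeps the sketch self-contained. Against the real module, patching the CodeChunker name where the indexer looks it up would serve the same purpose.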

Changes

Cohort / File(s)	Summary
CodeChunker Initialization Optimization `refactron/rag/indexer.py`	Moved `CodeChunker` initialization from `_index_file` (per-file creation) to `RAGIndexer.__init__` (single persistent instance). Removed local import and instantiation logic; `_index_file` now reuses `self.chunker`.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related issues

perf: Reuse CodeChunker in RAGIndexer._index_file #152: The persistent CodeChunker initialization strategy directly implements the optimization goal of avoiding per-file CodeChunker allocation described in this issue.

Poem

🐰 A bunny hops with glee,
One chunker now, not three!
Reuse and cache with care,
Less work, more time to spare. 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main optimization: reusing a single CodeChunker instance instead of creating one per file, which matches the changeset that hoists CodeChunker instantiation to RAGIndexer.__init__ and stores it on self.chunker.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

coderabbitai

🧹 Nitpick comments (1)

refactron/rag/indexer.py (1)

92-93: Consider moving the import to the module level.

CodeChunk is already imported from refactron.rag.chunker at line 22. Consolidating CodeChunker into that same import statement improves readability and follows the existing pattern in this file.

Suggested refactor

At line 22:

-from refactron.rag.chunker import CodeChunk
+from refactron.rag.chunker import CodeChunk, CodeChunker

At lines 92-93:

         self.parser = CodeParser()
-        from refactron.rag.chunker import CodeChunker
         self.chunker = CodeChunker(self.parser)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@refactron/rag/indexer.py` around lines 92-93, the local import of
CodeChunker inside the RAGIndexer constructor should be moved to the
module-level import alongside the existing CodeChunk import from
refactron.rag.chunker. Update the top-of-file import statement to include
CodeChunker and remove the inline "from refactron.rag.chunker import
CodeChunker" in the constructor, leaving "self.chunker =
CodeChunker(self.parser)" intact so the class still instantiates the chunker.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9193efb1-7ada-4db4-bae5-c95147cacaf9

📥 Commits

Reviewing files that changed from the base of the PR and between a9659f5 and c843c8c.

📒 Files selected for processing (1)

refactron/rag/indexer.py

perf: Reuse CodeChunker in RAGIndexer._index_file

c843c8c

shrutu0929 wants to merge 1 commit into Refactron-ai:main from shrutu0929:perf/reuse-codechunker
