perf: SymbolTableBuilder skips venv/git and supports incremental cache #149

@omsherikar

Description

Scope

refactron/analysis/symbol_table.py, SymbolTableBuilder.build_for_project

Problem

  • python_files = list(project_root.rglob("*.py")) walks the entire tree with no excludes, so large trees (e.g. .venv, node_modules if misnamed, copied vendor trees) inflate work.
  • When cache_dir is set and cache loads, the code returns the cached table and never refreshes; the TODO at line ~127 acknowledges missing incremental updates, so users either get stale symbols or pay full rebuild cost after deleting cache.

Suggested direction

  • Reuse the same directory filtering pattern as RAGIndexer.index_repository (exclude .git, .venv, venv, __pycache__, etc.) or centralize "discover Python sources" in one helper.
  • Incremental cache: track per-file hashes/mtimes in metadata; merge updated files into the in-memory SymbolTable instead of all-or-nothing reload.
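A minimal sketch of the centralized "discover Python sources" helper suggested above. The function name `discover_python_sources` and the constant `DEFAULT_EXCLUDED_DIRS` are hypothetical, and the exclude set contains only the directories named in this issue; the actual list used by RAGIndexer.index_repository may differ. It relies on pruning `dirnames` in place so that `os.walk` never descends into excluded directories, unlike `rglob("*.py")` followed by filtering.

```python
import os
from pathlib import Path
from typing import FrozenSet, Iterator

# Assumption: only the directories named in the issue; extend as needed.
DEFAULT_EXCLUDED_DIRS: FrozenSet[str] = frozenset(
    {".git", ".venv", "venv", "__pycache__"}
)


def discover_python_sources(
    project_root: Path,
    excluded_dirs: FrozenSet[str] = DEFAULT_EXCLUDED_DIRS,
) -> Iterator[Path]:
    """Yield *.py files under project_root, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Mutating dirnames in place tells os.walk not to descend into
        # the removed entries, so excluded trees are never traversed.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded_dirs)
        for name in sorted(filenames):
            if name.endswith(".py"):
                yield Path(dirpath) / name
```

With a helper of this shape, `build_for_project` could call `list(discover_python_sources(project_root))` in place of the unfiltered `rglob`, and the same helper could be shared with the indexer.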

Acceptance

  • Medium/large fixtures with an artificial venv/ tree no longer dominate symbol build time.
  • Documented behavior when cache is partial vs full rebuild.
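One way to make the partial vs full rebuild behavior explicit is to compute a delta between the files on disk and the per-file digests stored in cache metadata. The sketch below is only an illustration under assumptions: `CacheDelta`, `file_digest`, and `diff_against_cache` are hypothetical names, the metadata is assumed to be a plain mapping from relative path to SHA-256 digest, and how changed files are merged into the SymbolTable is left out because the issue does not describe that API.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, List


def file_digest(path: Path) -> str:
    """Content hash of one file; mtime could be used as a cheaper pre-check."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


@dataclass
class CacheDelta:
    # New or modified files that need to be re-parsed and merged.
    changed: List[str] = field(default_factory=list)
    # Files recorded in metadata that no longer exist on disk.
    removed: List[str] = field(default_factory=list)

    @property
    def is_clean(self) -> bool:
        """True when the cached table can be returned unchanged."""
        return not self.changed and not self.removed


def diff_against_cache(
    project_root: Path,
    current_files: Iterable[Path],
    cached_digests: Dict[str, str],
) -> CacheDelta:
    """Compare files on disk with the digests stored in cache metadata."""
    delta = CacheDelta()
    seen = set()
    for path in current_files:
        key = path.relative_to(project_root).as_posix()
        seen.add(key)
        # A missing key (new file) or a differing digest both count as changed.
        if cached_digests.get(key) != file_digest(path):
            delta.changed.append(key)
    delta.changed.sort()
    delta.removed = sorted(set(cached_digests) - seen)
    return delta
```

Under this scheme, a clean delta means the cached table is returned as-is, a non-empty delta means only the listed files are re-parsed or dropped, and a missing or unreadable metadata file falls back to a full rebuild.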
