Skip to content

fix(loader): repo-relative ignore matching + deterministic local defaults#12

Closed
Bbowlby22 wants to merge 1 commit intoHKUDS:mainfrom
Bbowlby22:omnilore-pinned-fixes
Closed

fix(loader): repo-relative ignore matching + deterministic local defaults#12
Bbowlby22 wants to merge 1 commit intoHKUDS:mainfrom
Bbowlby22:omnilore-pinned-fixes

Conversation

@Bbowlby22
Copy link
Copy Markdown

@Bbowlby22 Bbowlby22 commented Feb 27, 2026

Summary

  • fix ignore pattern matching to use repo-relative paths in RepositoryLoader
  • avoid indexing generated/runtime directories in large workspaces
  • switch retrieval/query enhancement defaults to deterministic local behavior for stability

Why

OmniLore-scale repositories were being over-indexed due to absolute-path matching misses (e.g. .venv/, output/, docs/). This patch makes ignore behavior deterministic and reduces indexing noise/cost.

Validation

  • local compile check for fastcode/loader.py
  • YAML parse check for config/config.yaml
  • verified stable indexing behavior in local OmniLore FastCode integration

Notes

This PR originates from Bbowlby22/FastCode:omnilore-pinned-fixes and is currently pinned in OmniLore submodule for reproducibility.

Copilot AI review requested due to automatic review settings February 27, 2026 06:37
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes ignore pattern matching in the repository loader and adjusts default configuration for deterministic local behavior. The core fix changes directory filtering to use repo-relative paths instead of absolute paths, which was causing gitignore-style patterns to fail to match. The PR also adds extensive OmniLore-specific ignore patterns and disables LLM-enhanced features by default.

Changes:

  • Fixed ignore pattern matching to use repo-relative paths in directory filtering logic
  • Added comprehensive ignore patterns for OmniLore workspace directories (build artifacts, caches, virtual environments)
  • Removed documentation file extensions (.md, .txt, .rst, .json, .html, .css, .xml) from supported_extensions
  • Disabled LLM enhancement features (agency_mode, use_llm_enhancement) for deterministic local behavior

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
fastcode/loader.py Fixed directory filtering to use repo-relative paths with normalize_path, enabling proper gitignore-style pattern matching; removed trailing empty line
config/config.yaml Added extensive OmniLore-specific ignore patterns; removed documentation/markup file extensions; disabled LLM features for deterministic behavior
Comments suppressed due to low confidence (2)

config/config.yaml:64

  • The ignore pattern list includes both directory names (e.g., ".venv") and wildcard patterns (e.g., ".venv/"). In gitignore semantics, ".venv" matches the directory itself, while ".venv/" matches its contents. For effective directory pruning during os.walk, you typically want to match the directory name itself (without trailing slash), which is why line 208 checks rel_dir_path. However, some patterns in the list like "output/" are designed to match contents but not necessarily prevent traversal into "output". Ensure that your ignore patterns are correctly formatted for your intended behavior—if you want to skip traversing into "output", the pattern should be "output" or "output/", not just "output/".
    - "output"
    - "output/*"
    - "out"
    - "out/*"
    - "reports"
    - "reports/*"
    - "docs"
    - "docs/*"
    - "external"
    - "external/*"
    - "node_modules/*"
    - ".mypy_cache"
    - ".mypy_cache/*"
    - ".pytest_cache"
    - ".pytest_cache/*"
    - ".hypothesis"
    - ".hypothesis/*"
    - "htmlcov"
    - "htmlcov/*"
    - "logs"
    - "logs/*"
    - "artifacts"
    - "artifacts/*"
    - "archives"
    - "archives/*"
    - "data"
    - "data/*"
    - "chroma"
    - "chroma/*"

fastcode/loader.py:206

  • The use of os.path.relpath without validation could potentially return paths starting with ".." if a symlink or other mechanism allows accessing files outside the repo_path. While os.walk typically won't follow symlinks by default, consider adding validation to ensure rel_dir_path doesn't start with ".." as a defense-in-depth measure, especially since this affects security-sensitive ignore pattern matching.
                rel_dir_path = normalize_path(
                    os.path.relpath(abs_dir_path, self.repo_path)
                )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread config/config.yaml
- .toml
- .md
- .txt
- .yaml
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of documentation and markup file extensions (.md, .txt, .rst, .json, .html, .css, .xml) is a potentially breaking change that will prevent these file types from being indexed. This appears intentional based on the PR description's focus on "avoiding indexing generated/runtime directories," but it also excludes important documentation files like README.md and configuration files like package.json. Consider whether this tradeoff aligns with the intended use case, especially if users expect documentation files to be searchable.

Suggested change
- .yaml
- .yaml
- .md
- .txt
- .rst
- .json
- .html
- .css
- .xml

Copilot uses AI. Check for mistakes.
Comment thread fastcode/loader.py
Comment on lines +208 to +210
if should_ignore_path(rel_dir_path, self.ignore_patterns) or should_ignore_path(
rel_dir_with_trailing, self.ignore_patterns
):
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directory filtering now calls should_ignore_path twice per directory. Since should_ignore_path creates a new PathSpec object from ignore_patterns on each call (see utils.py:81-87), this doubles the PathSpec creation overhead. For repositories with many directories and long ignore pattern lists, this could impact performance. Consider refactoring should_ignore_path to accept a pre-compiled PathSpec, or create the PathSpec once at the class level and reuse it throughout the scan.

Suggested change
if should_ignore_path(rel_dir_path, self.ignore_patterns) or should_ignore_path(
rel_dir_with_trailing, self.ignore_patterns
):
# Use the trailing-slash form to correctly match directory patterns
# like "output/" while avoiding redundant should_ignore_path calls.
if should_ignore_path(rel_dir_with_trailing, self.ignore_patterns):

Copilot uses AI. Check for mistakes.
Comment thread config/config.yaml
Comment on lines +42 to +45
- "docs"
- "docs/*"
- "external"
- "external/*"
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ignore pattern "docs" (line 42) will match any directory or file named "docs" anywhere in the repository tree, which may be overly broad. This could unintentionally skip important documentation directories in subdirectories. Consider using "docs/*" or "/docs" (with a leading slash for root-level only) if you only want to ignore a specific docs directory. The same applies to other broad patterns like "data", "logs", "external", etc.

Copilot uses AI. Check for mistakes.
@Bbowlby22
Copy link
Copy Markdown
Author

Superseded by #13, which is rebased on current main and narrowed to the loader-only bug fix. Closing this broader/conflicting version to keep review clean.

@Bbowlby22 Bbowlby22 closed this Feb 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants