fix(loader): repo-relative ignore matching + deterministic local defaults#12
fix(loader): repo-relative ignore matching + deterministic local defaults#12Bbowlby22 wants to merge 1 commit intoHKUDS:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request fixes ignore pattern matching in the repository loader and adjusts default configuration for deterministic local behavior. The core fix changes directory filtering to use repo-relative paths instead of absolute paths, which was causing gitignore-style patterns to fail to match. The PR also adds extensive OmniLore-specific ignore patterns and disables LLM-enhanced features by default.
Changes:
- Fixed ignore pattern matching to use repo-relative paths in directory filtering logic
- Added comprehensive ignore patterns for OmniLore workspace directories (build artifacts, caches, virtual environments)
- Removed documentation file extensions (.md, .txt, .rst, .json, .html, .css, .xml) from supported_extensions
- Disabled LLM enhancement features (agency_mode, use_llm_enhancement) for deterministic local behavior
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| fastcode/loader.py | Fixed directory filtering to use repo-relative paths with normalize_path, enabling proper gitignore-style pattern matching; removed trailing empty line |
| config/config.yaml | Added extensive OmniLore-specific ignore patterns; removed documentation/markup file extensions; disabled LLM features for deterministic behavior |
Comments suppressed due to low confidence (2)
config/config.yaml:64
- The ignore pattern list includes both directory names (e.g., ".venv") and wildcard patterns (e.g., ".venv/"). In gitignore semantics, ".venv" matches the directory itself, while ".venv/" matches its contents. For effective directory pruning during os.walk, you typically want to match the directory name itself (without trailing slash), which is why line 208 checks rel_dir_path. However, some patterns in the list like "output/" are designed to match contents but not necessarily prevent traversal into "output". Ensure that your ignore patterns are correctly formatted for your intended behavior—if you want to skip traversing into "output", the pattern should be "output" or "output/", not just "output/".
- "output"
- "output/*"
- "out"
- "out/*"
- "reports"
- "reports/*"
- "docs"
- "docs/*"
- "external"
- "external/*"
- "node_modules/*"
- ".mypy_cache"
- ".mypy_cache/*"
- ".pytest_cache"
- ".pytest_cache/*"
- ".hypothesis"
- ".hypothesis/*"
- "htmlcov"
- "htmlcov/*"
- "logs"
- "logs/*"
- "artifacts"
- "artifacts/*"
- "archives"
- "archives/*"
- "data"
- "data/*"
- "chroma"
- "chroma/*"
fastcode/loader.py:206
- The use of os.path.relpath without validation could potentially return paths starting with ".." if a symlink or other mechanism allows accessing files outside the repo_path. While os.walk typically won't follow symlinks by default, consider adding validation to ensure rel_dir_path doesn't start with ".." as a defense-in-depth measure, especially since this affects security-sensitive ignore pattern matching.
rel_dir_path = normalize_path(
os.path.relpath(abs_dir_path, self.repo_path)
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - .toml | ||
| - .md | ||
| - .txt | ||
| - .yaml |
There was a problem hiding this comment.
The removal of documentation and markup file extensions (.md, .txt, .rst, .json, .html, .css, .xml) is a potentially breaking change that will prevent these file types from being indexed. This appears intentional based on the PR description's focus on "avoiding indexing generated/runtime directories," but it also excludes important documentation files like README.md and configuration files like package.json. Consider whether this tradeoff aligns with the intended use case, especially if users expect documentation files to be searchable.
| - .yaml | |
| - .yaml | |
| - .md | |
| - .txt | |
| - .rst | |
| - .json | |
| - .html | |
| - .css | |
| - .xml |
| if should_ignore_path(rel_dir_path, self.ignore_patterns) or should_ignore_path( | ||
| rel_dir_with_trailing, self.ignore_patterns | ||
| ): |
There was a problem hiding this comment.
The directory filtering now calls should_ignore_path twice per directory. Since should_ignore_path creates a new PathSpec object from ignore_patterns on each call (see utils.py:81-87), this doubles the PathSpec creation overhead. For repositories with many directories and long ignore pattern lists, this could impact performance. Consider refactoring should_ignore_path to accept a pre-compiled PathSpec, or create the PathSpec once at the class level and reuse it throughout the scan.
| if should_ignore_path(rel_dir_path, self.ignore_patterns) or should_ignore_path( | |
| rel_dir_with_trailing, self.ignore_patterns | |
| ): | |
| # Use the trailing-slash form to correctly match directory patterns | |
| # like "output/" while avoiding redundant should_ignore_path calls. | |
| if should_ignore_path(rel_dir_with_trailing, self.ignore_patterns): |
| - "docs" | ||
| - "docs/*" | ||
| - "external" | ||
| - "external/*" |
There was a problem hiding this comment.
The ignore pattern "docs" (line 42) will match any directory or file named "docs" anywhere in the repository tree, which may be overly broad. This could unintentionally skip important documentation directories in subdirectories. Consider using "docs/*" or "/docs" (with a leading slash for root-level only) if you only want to ignore a specific docs directory. The same applies to other broad patterns like "data", "logs", "external", etc.
|
Superseded by #13, which is rebased on current main and narrowed to the loader-only bug fix. Closing this broader/conflicting version to keep review clean. |
Summary
RepositoryLoaderWhy
OmniLore-scale repositories were being over-indexed due to absolute-path matching misses (e.g.
.venv/,output/,docs/). This patch makes ignore behavior deterministic and reduces indexing noise/cost.Validation
fastcode/loader.pyconfig/config.yamlNotes
This PR originates from
Bbowlby22/FastCode:omnilore-pinned-fixesand is currently pinned in OmniLore submodule for reproducibility.