From 457520d33a6f58513392ae2b667744ae2531d3e4 Mon Sep 17 00:00:00 2001
From: Stephen Bryant
Date: Fri, 6 Feb 2026 12:40:49 +0100
Subject: [PATCH 1/7] Add documentation for .otignore feature (OT-1318)

Comprehensive documentation for the .otignore and .opentraceignore files that allow users to exclude files and directories from OpenTrace analysis.

## Changes

- Add docs/otignore.md with complete feature documentation
- Update mkdocs.yml navigation to include "Excluding Files" section

## Documentation Includes

- Overview and comparison with .gitignore
- Quick start guide
- Pattern syntax reference with examples
- Common use cases (generated code, dependencies, test fixtures)
- Troubleshooting guide
- Technical implementation details
- Best practices

## Verification

All documentation verified against implementation in:

- insight-agent/src/source_analyzers/sources/otignore.py
- insight-agent/src/source_analyzers/sources/directory/parser.py
- insight-agent/src/source_analyzers/sources/code/parser.py
- Unit and integration tests

Relates to: OT-1318, OT-1364

Co-Authored-By: Claude Sonnet 4.5
---
 docs/otignore.md | 345 +++++++++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml       |   1 +
 2 files changed, 346 insertions(+)
 create mode 100644 docs/otignore.md

diff --git a/docs/otignore.md b/docs/otignore.md
new file mode 100644
index 0000000..ac9b995
--- /dev/null
+++ b/docs/otignore.md
@@ -0,0 +1,345 @@
+# Excluding Files with .otignore
+
+## Overview
+
+OpenTrace analyzes your repository to build a knowledge graph of your codebase. While this analysis is valuable, you may want to exclude certain files or directories from being analyzed. The `.otignore` file allows you to specify which files and directories OpenTrace should skip during analysis.
+
+### Why Use .otignore?
+
+Common scenarios for using `.otignore`:
+
+- **Generated code**: Protobuf files, OpenAPI specs, or code generated by tools
+- **Third-party dependencies**: Vendor directories, node_modules, or other package directories
+- **Build artifacts**: Compiled code, distribution directories, or temporary build files
+- **Large binary files**: Archives, images, or other non-code assets
+- **Performance optimization**: Reduce analysis time for large repositories
+
+### Difference from .gitignore
+
+`.otignore` serves a different purpose than `.gitignore`:
+
+- **`.gitignore`**: Controls which files Git tracks and commits
+- **`.otignore`**: Controls which *committed* files OpenTrace analyzes
+
+Since OpenTrace only has access to files that are already committed to your repository, `.otignore` helps you exclude committed files that you don't want analyzed (like vendored dependencies or generated code that you choose to commit).
+
+## Quick Start
+
+To start using `.otignore`:
+
+1. Create a file named `.otignore` (or `.opentraceignore`) in your repository root
+2. Add patterns for files or directories to exclude (uses gitignore syntax)
+3. Commit the file to your repository
+4. The next OpenTrace analysis will automatically respect your exclusions
+
+**Example `.otignore` file:**
+
+```
+# Exclude generated protobuf code
+*.pb.go
+*.pb.py
+
+# Exclude vendored dependencies
+/vendor/
+/node_modules/
+
+# Exclude build outputs
+/dist/
+/build/
+```
+
+## Configuration
+
+### Supported Filenames
+
+OpenTrace recognizes two filenames for ignore patterns:
+
+- `.otignore` (recommended)
+- `.opentraceignore` (alternative)
+
+If both files exist, patterns from both will be applied. Patterns from `.otignore` are loaded first, followed by patterns from `.opentraceignore`.
+
+### File Location
+
+The ignore file must be placed in the **repository root directory**. Ignore files in subdirectories are not currently supported.
+
+## Pattern Syntax
+
+`.otignore` uses the same pattern syntax as `.gitignore`, powered by the `pathspec` library. This ensures familiar and predictable behavior.
+
+### Basic Patterns
+
+| Pattern | Matches |
+|---------|---------|
+| `file.txt` | Specific file named `file.txt` in any directory |
+| `*.log` | All files ending with `.log` |
+| `test` | Any file or directory named `test` |
+
+### Directory Patterns
+
+| Pattern | Matches |
+|---------|---------|
+| `logs/` | Directory named `logs` (trailing slash required) |
+| `/build/` | Directory named `build` only in repository root |
+| `**/temp/` | Directory named `temp` at any depth |
+
+### Wildcards
+
+| Wildcard | Meaning |
+|----------|---------|
+| `*` | Matches any characters except `/` |
+| `**` | Matches any characters including `/` (any depth) |
+| `?` | Matches exactly one character |
+
+**Examples:**
+
+```
+# Match all .log files
+*.log
+
+# Match all files in any __pycache__ directory
+**/__pycache__/**
+
+# Match .js files in any test directory
+**/test/*.js
+```
+
+### Negation Patterns
+
+Use `!` to re-include files that were previously excluded:
+
+```
+# Exclude all .txt files
+*.txt
+
+# But include this important one
+!important.txt
+```
+
+### Comments
+
+Lines starting with `#` are treated as comments and ignored:
+
+```
+# This is a comment explaining the next pattern
+*.tmp
+```
+
+## Examples
+
+### Example 1: Python Project
+
+```
+# Exclude Python generated files
+**/__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+
+# Exclude virtual environments
+/venv/
+/.venv/
+/env/
+
+# Exclude generated protobuf code
+*_pb2.py
+*_pb2_grpc.py
+```
+
+### Example 2: JavaScript/TypeScript Project
+
+```
+# Exclude dependencies
+/node_modules/
+/bower_components/
+
+# Exclude build outputs
+/dist/
+/build/
+/.next/
+/out/
+
+# Exclude generated files
+*.generated.ts
+```
+
+### Example 3: Go Project
+
+```
+# Exclude vendored dependencies
+/vendor/
+
+# Exclude generated protobuf code
+*.pb.go
+
+# Exclude compiled binaries
+/bin/
+*.exe
+```
+
+### Example 4: Monorepo
+
+```
+# Exclude all node_modules in monorepo
+**/node_modules/
+
+# Exclude all dist directories
+**/dist/
+
+# Exclude specific generated service code
+services/api/generated/
+services/auth/generated/
+
+# But include the schema definitions
+!services/*/generated/schema.yaml
+```
+
+## Common Use Cases
+
+### Excluding Protobuf/OpenAPI Generated Code
+
+Generated API code often creates noise in analysis results:
+
+```
+# Protocol Buffers
+*.pb.go
+*.pb.py
+*_pb2.py
+*_pb2_grpc.py
+
+# OpenAPI/Swagger
+/generated/openapi/
+**/swagger-generated/
+```
+
+### Ignoring Vendored Dependencies
+
+Third-party code adds unnecessary complexity to your knowledge graph:
+
+```
+# Go vendor directory
+/vendor/
+
+# JavaScript/Node
+/node_modules/
+
+# Ruby gems
+/vendor/bundle/
+
+# Python packages
+/site-packages/
+```
+
+### Excluding Test Fixtures and Mocks
+
+Large test data files can slow down analysis:
+
+```
+# Test fixtures
+**/fixtures/
+**/testdata/
+
+# Mock data
+**/mocks/
+**/__mocks__/
+
+# But keep test code
+!**/*_test.go
+!**/*_test.py
+```
+
+### Performance Optimization for Large Repos
+
+For very large repositories, exclude non-essential directories:
+
+```
+# Documentation site builds
+/docs/site/
+/docusaurus/build/
+
+# IDE configurations (if committed)
+/.vscode/
+/.idea/
+
+# Large binary assets
+/assets/images/
+/public/uploads/
+```
+
+## Troubleshooting
+
+### Pattern Not Working?
+
+**Check your syntax:**
+- Ensure patterns follow gitignore format
+- Directory patterns need a trailing `/`
+- Use `/` as the path separator (even on Windows)
+- Test your patterns with a gitignore validator
+
+**Example issue:**
+```
+# ❌ Won't match directories
+node_modules
+
+# ✅ Correctly matches directories
+node_modules/
+```
+
+### File Still Appears in Analysis?
+
+**Verify the file is committed:**
+- `.otignore` only excludes files that are already in your Git repository
+- Check with `git ls-files | grep `
+- If the file isn't committed, OpenTrace won't see it anyway
+
+**Check file location:**
+- Ensure `.otignore` is in the repository root, not a subdirectory
+- Verify the file is named exactly `.otignore` or `.opentraceignore`
+
+### Changes Not Taking Effect?
+
+**Commit your .otignore file:**
+- Changes only apply after the `.otignore` file is committed
+- OpenTrace reads the ignore file from the committed repository
+- Run `git add .otignore && git commit -m "Update otignore patterns"`
+
+**Wait for next analysis:**
+- Changes apply to new analysis runs, not retroactively
+- Trigger a new repository sync in OpenTrace to apply changes
+
+## Technical Details
+
+### Pattern Matching Implementation
+
+OpenTrace uses the `pathspec` library with gitignore-style pattern matching, ensuring behavior identical to `.gitignore` files. Patterns are applied to relative paths from the repository root.
+
+### Scope of Exclusions
+
+The `.otignore` file affects all Git-based integrations:
+- GitHub repository analysis
+- GitLab repository analysis
+- Any future Git-based source integrations
+
+Files excluded by `.otignore` will not appear in:
+- The knowledge graph
+- Code search results
+- Dependency analysis
+- Investigation context
+
+### Performance Impact
+
+Using `.otignore` to exclude large directories can significantly improve:
+- Analysis speed
+- Memory usage during analysis
+- Knowledge graph size and query performance
+
+For repositories with thousands of files, excluding generated code and dependencies can reduce analysis time by 50% or more.
+
+## Best Practices
+
+1. **Commit your .otignore early**: Add it when you first set up OpenTrace for your repository
+2. **Start broad, refine later**: Begin by excluding obvious directories like `node_modules/`, then add more specific patterns as needed
+3. **Document your patterns**: Use comments to explain why certain paths are excluded
+4. **Review periodically**: As your repository evolves, update `.otignore` to reflect new generated code or dependencies
+5. **Keep it simple**: Don't over-optimize - exclude only what meaningfully impacts analysis quality or performance
diff --git a/mkdocs.yml b/mkdocs.yml
index 28d1fbb..f8464a0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -79,6 +79,7 @@ nav:
     - GitHub: integrations/github.md
     - GitLab: integrations/gitlab.md
     - AWS EKS (Early Access): integrations/aws-eks.md
+    - Excluding Files: otignore.md
   - What You Can Do: capabilities.md
   - Example Workflows: workflows.md
   - Privacy Policy: privacy-policy.md

From 12fa2ddc38ffb487905d214bdaa345361732095e Mon Sep 17 00:00:00 2001
From: Stephen Bryant
Date: Fri, 6 Feb 2026 12:54:00 +0100
Subject: [PATCH 2/7] Add links to .otignore documentation in GitHub and GitLab pages

- Add reference to Excluding Files page in 'What Gets Synced' sections
- Helps users discover how to exclude files from analysis
---
 docs/integrations/github.md | 2 ++
 docs/integrations/gitlab.md | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/docs/integrations/github.md b/docs/integrations/github.md
index 27f5042..613eec8 100644
--- a/docs/integrations/github.md
+++ b/docs/integrations/github.md
@@ -21,6 +21,8 @@ Connect GitHub to sync repository data and analyze code structure.
 - Code symbols and dependencies
 - Issues
 
+You can exclude specific files or directories from analysis using [`.otignore` files](../otignore.md).
+
 ## Permissions Required
 
 OpenTrace requests read-only access to:
diff --git a/docs/integrations/gitlab.md b/docs/integrations/gitlab.md
index eebc4b5..cf0336e 100644
--- a/docs/integrations/gitlab.md
+++ b/docs/integrations/gitlab.md
@@ -21,6 +21,8 @@ Connect GitLab to sync repository data and analyze code structure.
 - Code symbols and dependencies
 - Issues
 
+You can exclude specific files or directories from analysis using [`.otignore` files](../otignore.md).
+
 ## Self-Hosted GitLab
 
 For self-hosted GitLab instances, you may need to configure the GitLab URL in your OpenTrace settings before connecting.

From f4af3287e9c86141e0acf0b4eaa4d72f29101055 Mon Sep 17 00:00:00 2001
From: Stephen Bryant
Date: Fri, 6 Feb 2026 14:57:41 +0100
Subject: [PATCH 3/7] Changed order, placing 'ignore' doc next to the Git integration docs

---
 mkdocs.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mkdocs.yml b/mkdocs.yml
index f8464a0..a387903 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -78,8 +78,8 @@ nav:
   - Data Sources:
     - GitHub: integrations/github.md
     - GitLab: integrations/gitlab.md
+    - Excluding Files from Data Sources: otignore.md
     - AWS EKS (Early Access): integrations/aws-eks.md
-    - Excluding Files: otignore.md
   - What You Can Do: capabilities.md
   - Example Workflows: workflows.md
   - Privacy Policy: privacy-policy.md

From 1a7f89ff2ac1ce5c20da946b6be5e8016cef0ea2 Mon Sep 17 00:00:00 2001
From: Stephen Bryant
Date: Fri, 6 Feb 2026 14:59:15 +0100
Subject: [PATCH 4/7] Removed negation example using `**` as files cannot be re-included if their parent directories are already excluded

---
 docs/otignore.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/otignore.md b/docs/otignore.md
index ac9b995..16b23d1 100644
--- a/docs/otignore.md
+++ b/docs/otignore.md
@@ -243,10 +243,6 @@ Large test data files can slow down analysis:
 # Mock data
 **/mocks/
 **/__mocks__/
-
-# But keep test code
-!**/*_test.go
-!**/*_test.py
 ```
 
 ### Performance Optimization for Large Repos

From 1c5f94ab4378750f4f3609e66651d91655f5464d Mon Sep 17 00:00:00 2001
From: Stephen Bryant
Date: Fri, 6 Feb 2026 15:20:26 +0100
Subject: [PATCH 5/7] Removed paragraph from 'File still appears' section that talks about files not being visible unless they're committed.
 It's not the same thing, and is already covered elsewhere.

---
 docs/otignore.md | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/docs/otignore.md b/docs/otignore.md
index 16b23d1..99eadd4 100644
--- a/docs/otignore.md
+++ b/docs/otignore.md
@@ -284,11 +284,6 @@ node_modules/
 
 ### File Still Appears in Analysis?
 
-**Verify the file is committed:**
-- `.otignore` only excludes files that are already in your Git repository
-- Check with `git ls-files | grep `
-- If the file isn't committed, OpenTrace won't see it anyway
-
 **Check file location:**
 - Ensure `.otignore` is in the repository root, not a subdirectory
 - Verify the file is named exactly `.otignore` or `.opentraceignore`

From a51818e90302dc38655d0a67789aa1c0d68df9eb Mon Sep 17 00:00:00 2001
From: Stephen Bryant
Date: Fri, 6 Feb 2026 15:22:59 +0100
Subject: [PATCH 6/7] Document default directory exclusions in .otignore guide

- Add 'Default Exclusions' section listing always-excluded directories
- Explains that .git, node_modules, build, etc. are automatically skipped
- Notes that default exclusions work alongside custom .otignore patterns
- Adds rationale for why these directories are excluded by default
---
 docs/otignore.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/otignore.md b/docs/otignore.md
index 99eadd4..b096692 100644
--- a/docs/otignore.md
+++ b/docs/otignore.md
@@ -63,6 +63,27 @@ If both files exist, patterns from both will be applied. Patterns from `.otignor
 
 The ignore file must be placed in the **repository root directory**. Ignore files in subdirectories are not currently supported.
 
+### Default Exclusions
+
+OpenTrace automatically excludes certain directories from analysis, even without an `.otignore` file. These directories are always skipped:
+
+- `.git` - Git repository metadata
+- `.venv`, `venv` - Python virtual environments
+- `node_modules` - Node.js dependencies
+- `__pycache__` - Python bytecode cache
+- `.pytest_cache` - Pytest cache
+- `.mypy_cache` - MyPy type checker cache
+- `.tox` - Tox testing environments
+- `dist` - Distribution/build output
+- `build` - Build artifacts
+- `.eggs` - Python egg artifacts
+- `vendor` - Vendored dependencies
+
+These default exclusions work alongside your `.otignore` patterns. You don't need to add these directories to your `.otignore` file - they're already excluded automatically.
+
+!!! note "Why these defaults?"
+    These directories typically contain dependencies, build artifacts, or caching data that adds noise to analysis without providing value. Excluding them by default improves performance and keeps your knowledge graph focused on your actual source code.
+
 ## Pattern Syntax
 
 `.otignore` uses the same pattern syntax as `.gitignore`, powered by the `pathspec` library. This ensures familiar and predictable behavior.

From c65223a6997156c4f573a1c6191c1030b557f5b3 Mon Sep 17 00:00:00 2001
From: Stephen Bryant
Date: Fri, 6 Feb 2026 17:54:17 +0100
Subject: [PATCH 7/7] Updated the 'why' message, and added some blank lines to help MD->HTML formatting.

---
 docs/otignore.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/docs/otignore.md b/docs/otignore.md
index b096692..4602105 100644
--- a/docs/otignore.md
+++ b/docs/otignore.md
@@ -8,13 +8,13 @@ OpenTrace analyzes your repository to build a knowledge graph of your codebase. 
 
 Common scenarios for using `.otignore`:
 
-- **Generated code**: Protobuf files, OpenAPI specs, or code generated by tools
-- **Third-party dependencies**: Vendor directories, node_modules, or other package directories
-- **Build artifacts**: Compiled code, distribution directories, or temporary build files
+- **Generated code**: Protobuf files, specs, or code generated by tools
+- **Third-party dependencies**: Vendor directories, or other package directories
+- **Secrets**: API tokens, keys or similar pieces of information
 - **Large binary files**: Archives, images, or other non-code assets
 - **Performance optimization**: Reduce analysis time for large repositories
 
-### Difference from .gitignore
+### Difference to .gitignore
 
 `.otignore` serves a different purpose than `.gitignore`:
 
@@ -289,6 +289,7 @@ For very large repositories, exclude non-essential directories:
 ### Pattern Not Working?
 
 **Check your syntax:**
+
 - Ensure patterns follow gitignore format
 - Directory patterns need a trailing `/`
 - Use `/` as the path separator (even on Windows)
 - Test your patterns with a gitignore validator
@@ -306,17 +307,20 @@ node_modules/
 ### File Still Appears in Analysis?
 
 **Check file location:**
+
 - Ensure `.otignore` is in the repository root, not a subdirectory
 - Verify the file is named exactly `.otignore` or `.opentraceignore`
 
 ### Changes Not Taking Effect?
 
 **Commit your .otignore file:**
+
 - Changes only apply after the `.otignore` file is committed
 - OpenTrace reads the ignore file from the committed repository
 - Run `git add .otignore && git commit -m "Update otignore patterns"`
 
 **Wait for next analysis:**
+
 - Changes apply to new analysis runs, not retroactively
 - Trigger a new repository sync in OpenTrace to apply changes
@@ -329,11 +333,13 @@ OpenTrace uses the `pathspec` library with gitignore-style pattern matching, ens
 ### Scope of Exclusions
 
 The `.otignore` file affects all Git-based integrations:
+
 - GitHub repository analysis
 - GitLab repository analysis
 - Any future Git-based source integrations
 
 Files excluded by `.otignore` will not appear in:
+
 - The knowledge graph
 - Code search results
 - Dependency analysis
 - Investigation context
@@ -342,6 +348,7 @@ Files excluded by `.otignore` will not appear in:
 ### Performance Impact
 
 Using `.otignore` to exclude large directories can significantly improve:
+
 - Analysis speed
 - Memory usage during analysis
 - Knowledge graph size and query performance
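
As an illustration of the behaviour this series documents, here is a minimal Python sketch of two of the stated rules: patterns are read from `.otignore` first and then from `.opentraceignore`, with `#` comment lines skipped, and the directories listed under "Default Exclusions" are skipped whether or not an ignore file exists. It does not reproduce the code in `otignore.py`. The function names `load_ignore_patterns` and `is_default_excluded` are hypothetical, and matching the default directory names at any depth is an assumption that the documentation does not state.

```python
from pathlib import Path, PurePosixPath

# Filenames in the order the documentation says they are loaded.
IGNORE_FILENAMES = (".otignore", ".opentraceignore")

# Directory names listed under "Default Exclusions" in patch 6/7.
DEFAULT_EXCLUDED_DIRS = frozenset({
    ".git", ".venv", "venv", "node_modules", "__pycache__",
    ".pytest_cache", ".mypy_cache", ".tox", "dist", "build",
    ".eggs", "vendor",
})


def load_ignore_patterns(repo_root):
    """Collect patterns from both ignore files, .otignore first (hypothetical helper)."""
    patterns = []
    for name in IGNORE_FILENAMES:
        ignore_file = Path(repo_root) / name
        if not ignore_file.is_file():
            continue
        for line in ignore_file.read_text(encoding="utf-8").splitlines():
            # Whitespace handling is simplified for this sketch.
            stripped = line.strip()
            # Skip blank lines and comment lines.
            if not stripped or stripped.startswith("#"):
                continue
            patterns.append(stripped)
    return patterns


def is_default_excluded(relative_path):
    """Return True if any parent directory has a default-excluded name.

    Assumption: the names match at any depth. The last path component is
    treated as a file name and is not checked.
    """
    parts = PurePosixPath(relative_path).parts
    return any(part in DEFAULT_EXCLUDED_DIRS for part in parts[:-1])
```

The collected patterns would then be handed to a gitignore-style matcher such as `pathspec`. That step is left out so the sketch needs only the standard library.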