
Feature: Improve file discovery and tool usage for Grok agent #31

@KHAEntertainment

Description

Feature Request: Improve File Discovery and Tool Usage for Grok Agent

Problem

Currently, Grok has limited file discovery and tool usage capabilities that make it less effective as an autonomous agent:

Issue 1: No Glob Pattern Expansion

The --files argument requires explicit paths. Glob patterns like *.py or **/*.py are passed as literal strings.

```shell
# Current behavior - FAILS:
--files "src/**/*.py"

# Workaround required:
--files src/foo.py src/bar.py src/baz.py  # must list every file
```
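Until the bridge expands globs itself, the caller can pre-expand them before building the `--files` argument; a minimal sketch using Python's standard `glob` module (the helper name is illustrative):

```python
import glob

def expand_file_args(patterns):
    """Expand glob patterns into explicit file paths before passing them to --files."""
    paths = []
    for pattern in patterns:
        # recursive=True lets ** cross directory levels
        paths.extend(glob.glob(pattern, recursive=True))
    return sorted(set(paths))
```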

Issue 2: No Recursive Directory Discovery

Grok cannot discover relevant files in a directory tree on its own. Claude Code must pre-discover files and pass them explicitly.

Issue 3: No Path Filtering

Even if Grok could scan directories, there's no mechanism to:

  • Filter by file type (e.g., only .py files)
  • Exclude patterns (e.g., **/test_*.py, **/node_modules/**)
  • Limit file count or size
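Such a filter could be layered on top with the standard `fnmatch` module; a minimal sketch (the default patterns are illustrative, not proposed defaults):

```python
from fnmatch import fnmatch

def keep_file(path, include=("*.py",), exclude=("*test_*", "*node_modules*")):
    """Keep a path only if it matches an include pattern and no exclude pattern."""
    p = path.replace("\\", "/")  # normalize Windows separators
    return any(fnmatch(p, pat) for pat in include) and \
        not any(fnmatch(p, pat) for pat in exclude)
```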

Issue 4: Single File Context Mode

The current architecture sends all files as a single concatenated context. For large codebases, this:

  • Consumes tokens inefficiently
  • Makes Grok parse through irrelevant files
  • Prevents selective file analysis

Impact

These limitations make it harder to use Grok as a true sub-agent because:

  1. Claude Code must do file discovery work that Grok could do itself
  2. Claude Code must pass file paths explicitly, increasing prompt size
  3. Large codebase analysis requires manual file selection

Proposed Solutions

Solution 1: Directory Scanning Mode

```python
# New CLI argument
parser.add_argument("--scan-dir", type=str,
                    help="Scan directory recursively for files matching patterns")
```

Example usage:

```shell
--scan-dir ./src --include "*.py" --exclude "**/test_*.py" --max-files 100
```
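For that usage line to parse, the companion flags would need registering too; a sketch of one way to do it (the `action="append"` choice and defaults are assumptions, not settled design):

```python
import argparse

parser = argparse.ArgumentParser(prog="grok_bridge")
parser.add_argument("--scan-dir", type=str,
                    help="Scan directory recursively for files matching patterns")
parser.add_argument("--include", action="append", default=[],
                    help="Glob pattern to include (repeatable)")
parser.add_argument("--exclude", action="append", default=[],
                    help="Glob pattern to exclude (repeatable)")
parser.add_argument("--max-files", type=int, default=100,
                    help="Cap on the number of discovered files")
```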

Implementation in grok_bridge.py:

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List, Optional

def scan_directory(
    path: str,
    include_patterns: Optional[List[str]] = None,
    exclude_patterns: Optional[List[str]] = None,
    max_files: int = 100,
    extensions: Optional[List[str]] = None,
) -> List[str]:
    """Recursively discover files matching criteria."""
    discovered = []
    path_obj = Path(path)

    if not path_obj.exists():
        return []

    # An empty default suffix lets include_patterns alone drive matching
    for ext in (extensions or [""]):
        for pattern in (include_patterns or ["*"]):
            for match in path_obj.glob(f"**/{pattern}{ext}"):
                if match.is_file():
                    # fnmatch so glob-style excludes like **/test_*.py actually match
                    if any(fnmatch(str(match), excl) for excl in (exclude_patterns or [])):
                        continue
                    discovered.append(str(match))

    return discovered[:max_files]
```
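The core of the approach (recursive glob plus fnmatch excludes) can be exercised in isolation; a self-contained sanity-check sketch, with the function name chosen here for illustration:

```python
from fnmatch import fnmatch
from pathlib import Path

def discover(root, pattern="*.py", excludes=("**/test_*.py",), max_files=100):
    """Minimal stand-in for scan_directory: recursive glob with fnmatch excludes."""
    hits = [
        str(p) for p in Path(root).glob(f"**/{pattern}")
        if p.is_file() and not any(fnmatch(str(p), e) for e in excludes)
    ]
    return sorted(hits)[:max_files]
```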

Solution 2: MCP Server Integration

Instead of file-based context, expose an MCP server that Grok can call to:

  • List files in directory
  • Read specific files on demand
  • Write files to specified locations

```python
from typing import List, Optional

# Conceptual MCP server for Grok bridge
class GrokBridgeMCP:
    def list_files(self, path: str, pattern: str = "*") -> List[str]:
        """List files matching pattern in directory."""

    def read_file(self, path: str, offset: int = 0, limit: Optional[int] = None) -> str:
        """Read file content with optional pagination."""

    def write_file(self, path: str, content: str) -> bool:
        """Write content to file."""

    def glob(self, pattern: str, root: str = ".") -> List[str]:
        """Glob pattern matching."""
```

This would give Grok tool-calling ability while keeping file I/O local to the bridge.
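As one concrete sketch of how read_file's pagination might behave (line-based offset/limit is an assumption here, not settled API):

```python
from pathlib import Path
from typing import Optional

def read_file(path: str, offset: int = 0, limit: Optional[int] = None) -> str:
    """Return up to `limit` lines of a file, starting at line `offset` (0-based)."""
    lines = Path(path).read_text().splitlines(keepends=True)
    end = None if limit is None else offset + limit
    return "".join(lines[offset:end])
```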

Solution 3: Intelligent File Selection

Add a --auto-discover flag that uses heuristics:

  • Look for package.json, pyproject.toml, go.mod to identify project root
  • Use language-specific patterns (e.g., **/*.py for Python projects)
  • Apply .gitignore-like exclusions
  • Limit based on token budget

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List

def auto_discover_files(target: str, token_budget: int = 500_000) -> List[str]:
    """Automatically discover relevant files based on project type."""
    path = Path(target)

    # Detect project type
    if (path / "pyproject.toml").exists():
        extensions = [".py"]
        exclude = ["**/test_*.py", "**/__pycache__/**", "**/.venv/**"]
    elif (path / "package.json").exists():
        extensions = [".js", ".ts", ".jsx", ".tsx"]
        exclude = ["**/node_modules/**", "**/dist/**"]
    # ... etc
    else:
        extensions, exclude = [], []

    # Discover and filter
    files = []
    for ext in extensions:
        files.extend(path.glob(f"**/*{ext}"))

    # fnmatch so the glob-style exclusions above actually match paths
    files = [f for f in files if not any(fnmatch(str(f), excl) for excl in exclude)]
    files = sort_by_relevance(files)  # prioritize main/source over tests

    # Trim to token budget
    selected = []
    total_tokens = 0
    for f in files:
        content = f.read_text(errors="ignore")
        tokens = len(content) // 4  # rough estimate: ~4 chars per token
        if total_tokens + tokens > token_budget:
            break
        selected.append(f)
        total_tokens += tokens

    return selected
```
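sort_by_relevance is referenced but not defined above; one hedged sketch of the heuristic (the rank buckets are illustrative):

```python
def sort_by_relevance(files):
    """Order paths so likely-source files come before tests and vendored code."""
    def rank(path):
        p = str(path)
        if "node_modules" in p or ".venv" in p:
            return 3          # vendored/venv code last
        if "test" in p or "__pycache__" in p:
            return 2          # deprioritize tests and caches
        return 1              # plain source first
    return sorted(files, key=lambda f: (rank(f), str(f)))
```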

Priority

Medium: the current workaround (manual file listing) works but is inconvenient; this feature would improve agent autonomy.

Labels

  • enhancement
  • agent
  • bridge
  • file-discovery
