
fix: grep false negatives, output mangling, and truncation annotations#791

Open
BadassBison wants to merge 2 commits into rtk-ai:master from BadassBison:fix/grep-false-negatives-and-truncation-annotations

Conversation


BadassBison commented Mar 23, 2026

Summary

Fixes three issues where RTK's output filtering causes AI agents (Claude Code) to burn extra tokens on retry loops, producing net-negative token impact during analysis-heavy workflows.

  • grep: add --no-ignore to rg — prevents false negatives in repos with .gitignore
  • grep: passthrough for small results (<=50 matches) — preserves standard file:line:content format AI agents can parse
  • smart_truncate: clean truncation — removes synthetic // ... N lines omitted annotations that break AI parsing

Problem

Observed in a real session across a large Rails monorepo (~83K files, 1,633 RTK commands):

| Issue | Root Cause | Impact |
| --- | --- | --- |
| grep returns "0 matches" for existing files | rg respects .gitignore by default, grep -r doesn't | ~10 false negatives led to wrong analysis conclusions |
| grep output in "217 matches in 1F:" format | Always reformatted, even for 4 matches | AI agents can't parse it, retry 2-4 times each |
| `// ... 81 lines omitted` in file reads | smart_truncate inserts synthetic comment markers | AI treats annotations as code, retries with alternative commands |

Quantified impact: grep had the lowest savings rate (9.3%) but the highest retry cost. Estimated 200-500K tokens burned on retries across ~15 retry patterns, each requiring 2-4 extra tool calls.

Evidence

Screenshots from real Claude Code sessions showing the retry pattern:

  • Claude detects mangled output: "RTK is eating the grep results. Let me use find + xargs instead"
  • Claude switches to Python: "RTK is interfering with grep. Let me use a direct python approach"
  • Claude retries sed: "The sed didn't work with RTK. Let me use python"

Session metrics:

Total RTK commands:    1,633
Tokens saved by RTK:   5.5M (44.5%)
Est. tokens on retries: 200-500K
grep savings rate:     9.3% (lowest of all commands)
grep retry instances:  ~15 (highest retry cost)

The retry loop in detail

When Claude runs grep -rn "def test_" apps/ through RTK, the output gets reformatted to:

217 matches in 1F:

[file] /.../some_file.rb (217):
     2: [] 1420, appfolio-developers
    24: [] 1496, appfolio-developers

Claude can't extract the data it needs from this format. It then retries with workarounds:

# Attempt 1 (rewritten by RTK, output mangled):
grep -rn "def test_" apps/tportal/test/selenium/

# Attempt 2 (Claude tries find + xargs to bypass RTK):
find apps/tportal/test -name "*.rb" -exec grep -l "def test_" {} \;

# Attempt 3 (Claude switches to python):
python3 -c "import os, re; ..."

# Finally gets actual results on attempt 3-4

Each retry burns 500-2000 tokens. Across 15 instances in a single session, this adds up to 200-500K wasted tokens.

False negatives — the most damaging case

RTK's grep returned "0 matches" for patterns that actually existed because rg respects .gitignore while grep -r does not. In a large monorepo, this caused:

  • 13 test mutes marked as "file not found" when 10 actually existed in different subdirectories
  • Hours of rework to correct the false conclusions
  • Multiple agent retries across 10 parallel subagents, each hitting the same filtering issue

Truncation annotations break file parsing

When head -5 file.rb gets rewritten to rtk read file.rb --max-lines 5, the old smart_truncate function inserted synthetic comment markers like // ... 79 lines omitted in the middle of the output. AI agents treated these as actual file content, got confused about the file structure, and retried with alternative commands (Read tool, python, etc.), doubling the token cost.

Changes

1. src/grep_cmd.rs — --no-ignore flag

Added --no-ignore to the rg invocation so it doesn't skip files listed in .gitignore. This matches grep -r behavior and eliminates false negatives in repos where test files, build artifacts, or generated code live in gitignored directories.
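A minimal sketch of the changed invocation; the function name and exact argument order here are illustrative, not RTK's actual code:

```rust
use std::process::Command;

/// Hypothetical sketch of the rg invocation built in grep_cmd.rs.
fn build_rg_command(pattern: &str, path: &str) -> Command {
    let mut cmd = Command::new("rg");
    cmd.arg("-n")            // line numbers, as in `grep -n`
        .arg("--no-heading") // one `file:line:content` record per match
        .arg("--no-ignore")  // the fix: also search gitignored files, like `grep -r`
        .arg(pattern)
        .arg(path);
    cmd
}
```

Building the command without spawning it keeps the flag change easy to unit-test.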

2. src/grep_cmd.rs — Passthrough for small results

Results with <=50 matches now output raw file:line:content format (standard grep output that AI agents already know how to parse). The grouped "X matches in Y files:" format is preserved only for >50 matches where token savings are meaningful. For small result sets, the token savings from grouping are negligible (~9.3%) but the retry cost from mangling is high (500-2000 tokens per retry).
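The threshold branch can be sketched as follows (the constant and function names are hypothetical; the grouped-format stand-in is a placeholder for RTK's real grouping logic):

```rust
/// Illustrative threshold; RTK hardcodes 50.
const PASSTHROUGH_THRESHOLD: usize = 50;

/// Sketch: emit raw `file:line:content` lines for small result sets,
/// use a grouped summary only when grouping actually saves tokens.
fn format_grep_output(raw_lines: &[String]) -> String {
    if raw_lines.len() <= PASSTHROUGH_THRESHOLD {
        // Passthrough: exactly what rg printed, no reformatting.
        raw_lines.join("\n")
    } else {
        // Stand-in for RTK's real grouped "X matches in Y files:" format.
        format!("{} matches:\n{}", raw_lines.len(), raw_lines.join("\n"))
    }
}
```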

3. src/filter.rs — Clean truncation in smart_truncate

Replaced the "smart" truncation logic that scattered " // ... N lines omitted" markers throughout file content with clean first-N-lines truncation. A single [X more lines] marker appears at the end only. The old annotations were treated as actual code by AI agents, causing parsing confusion and retry loops.
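The new behavior can be sketched as below, assuming a signature of this shape (the real smart_truncate in filter.rs may take different parameters):

```rust
/// Sketch of clean first-N-lines truncation with a single trailing marker.
fn smart_truncate(content: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = content.lines().collect();
    if lines.len() <= max_lines {
        return content.to_string(); // fits entirely: no marker at all
    }
    let omitted = lines.len() - max_lines;
    // First N lines verbatim, then one metadata marker at the end only.
    format!("{}\n[{} more lines]", lines[..max_lines].join("\n"), omitted)
}
```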

Why this fix

--no-ignore on rg: tracing the exact code path

When Claude runs grep "def test_" apps/tportal/test/selenium/:

grep "def test_" apps/tportal/test/selenium/
  → hook rewrites to: rtk grep "def test_" apps/tportal/test/selenium/
  → Clap parses: pattern="def test_", path="apps/tportal/test/selenium/"
  → grep_cmd::run() executes: rg -n --no-heading "def test_" apps/tportal/test/selenium/

If any files under that path are covered by a .gitignore entry, rg silently skips them. grep -r would not. The output is empty, and grep_cmd.rs:60-65 prints "0 matches".

Four alternative explanations were considered and ruled out:

  1. Regex difference (BRE vs PCRE): Already handled on line 26 (BRE \| is translated to |). The user's patterns (def test_, simple strings) have no BRE/PCRE divergence.
  2. Wrong cwd: RTK doesn't change cwd. Both rg and grep would fail on a bad path, but the files were confirmed to exist.
  3. Clap parse failure falling through to raw grep: This would give correct results (raw grep runs in the fallback path), not false negatives. The false negative only happens when grep_cmd::run() executes — meaning Clap succeeded.
  4. Explicit file path vs directory search: When rg is given an explicit file path, it does NOT apply gitignore rules — only during directory traversal. So the false negatives specifically affect directory searches.

Only the gitignore explanation produces the observed behavior: rg returns empty stdout on a directory search where grep -r returns matches.

Honest tradeoff: --no-ignore is broad — it disables .gitignore, .ignore, AND .rgignore. This means rg will now traverse node_modules/, vendor/, etc., which is potentially slower. A more surgical option is --no-ignore-vcs (only disables .gitignore/.hgignore). However, the goal is to match grep -r behavior exactly, and grep -r also traverses node_modules. Performance-wise, rg is fast enough that the hit is negligible compared to the cost of a false negative (2-4 retry commands at 500-2000 tokens each).

Passthrough for <=50 matches: the quantitative argument

For a 10-match result (typical small search):

  • Raw output: ~500 tokens (10 lines × ~50 tokens)
  • Grouped format: ~450 tokens (headers + formatting + content)
  • Savings: ~50 tokens (10%)
  • Cost of ONE retry when agent can't parse grouped format: 500-2000 tokens
  • Net: -450 to -1950 tokens

For a 200-match result (large search):

  • Raw output: ~10,000 tokens
  • Grouped format: ~3,000 tokens
  • Savings: ~7,000 tokens (70%)
  • Cost of one retry: ~1,500 tokens
  • Net: +5,500 tokens even with a retry

The crossover point is where savings exceed retry cost. From the session data, if even 30% of small-result grep calls trigger retries:

142 grep calls × 30% retry rate = ~43 retries
43 retries × 1,000 tokens avg = 43,000 tokens burned
142 calls × 9.3% savings × ~500 avg tokens = ~6,600 tokens saved
Net: -36,400 tokens

At 50 matches (~2,500 tokens raw, ~1,500 grouped, ~1,000 saved), savings-per-call roughly equals single-retry cost. Below 50, retry risk outweighs savings. Above 50, savings dominate.
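The worked examples above can be re-checked with a one-line model. The savings fractions (10%, 40%, 70%) and retry costs are the estimates quoted above, not measurements:

```rust
/// Net token impact of grouping one result set: tokens saved by the
/// grouped format minus the cost of a parse-failure retry.
fn net_tokens(raw_tokens: f64, savings_fraction: f64, retry_cost: f64) -> f64 {
    raw_tokens * savings_fraction - retry_cost
}
```

Plugging in the three cases: 10 matches (500 raw, 10% savings, one 1,000-token retry) is firmly negative; 50 matches (2,500 raw, ~40% savings) roughly breaks even against one retry; 200 matches (10,000 raw, 70% savings) stays positive even after a 1,500-token retry.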

Why not disable grep filtering entirely? Because large results DO benefit from grouping. The user's rtk find saved 3.2M tokens at 73%. Filtering works when there's genuinely large output. The fix targets the specific range where it's counterproductive.

Clean truncation: the old behavior was fundamentally broken

The old code inserted " // ... 87 lines omitted" as a code comment mid-file. This is wrong for two independent reasons:

Language mismatch. The marker uses // comment syntax, but truncated files could be Ruby (# comments), Python (#), YAML (#), Shell (#), etc. An AI reading a Ruby file sees // ... 87 lines omitted and interprets it as invalid syntax, not a truncation marker.

Unpredictable placement. The old code scattered annotations throughout the output based on "structural importance" heuristics. An AI agent encounters these markers at unpredictable positions, breaking any line-by-line parsing logic.

The new [X more lines] format is:

  • Not valid syntax in any programming language (unambiguously metadata)
  • At the end only (predictable position)
  • Parseable by simple regex if needed

Why not truncate silently with no marker? An AI agent needs to know truncation occurred. Without a marker, it might assume it saw the full file and draw incorrect conclusions. [X more lines] communicates truncation without being confused with file content.
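Because the marker is suffix-only and fixed-form, a consumer can detect it with trivial string handling. A hypothetical helper (not part of RTK) illustrating the "parseable by simple regex" claim without even needing a regex:

```rust
/// Returns the omitted-line count if the output ends with a
/// `[X more lines]` marker, else None. Hypothetical helper, not RTK code.
fn parse_truncation_marker(output: &str) -> Option<usize> {
    let last = output.lines().last()?;
    last.strip_prefix('[')?
        .strip_suffix(" more lines]")?
        .parse()
        .ok()
}
```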

What was NOT changed (and why)

  • sed is already in IGNORED_PREFIXES — never rewritten by RTK. The user's sed issues are downstream effects of grep/cat problems.
  • cat → rtk read still uses FilterLevel::Minimal — the minimal filter (strips blank lines/comments) had 28.8% savings across 214 calls (~1.5M tokens saved). The user's complaint was about the // ... omitted annotations (fixed), not the filtering itself.
  • The 50-match threshold is hardcoded — a config option would be over-engineering for a reasonable default. If tuning is needed later, it's a one-line change.

Tests

  • test_smart_truncate_no_annotations — verifies no // ... markers in output
  • test_smart_truncate_no_truncation_when_under_limit — no truncation when content fits
  • test_smart_truncate_exact_limit — edge case at exact line count
  • test_rg_no_ignore_flag_accepted — verifies rg accepts the new flag

Test plan

  • cargo fmt --all && cargo clippy --all-targets && cargo test --all
  • Manual: rtk grep "fn run" src/ with <50 results outputs raw file:line:content format
  • Manual: rtk read src/main.rs --max-lines 5 shows clean truncation without // ... markers
  • Manual: verify grep finds files in .gitignored directories

Three fixes for issues causing AI agents to burn tokens on retry loops:

1. grep: add --no-ignore to rg invocation (src/grep_cmd.rs)
   rg respects .gitignore by default while grep -r does not. In large
   monorepos (~83K files), this caused rg to return 0 matches for files
   in gitignored directories, producing false negatives. AI agents then
   concluded files/methods didn't exist and drew wrong conclusions.

2. grep: passthrough for small result sets (src/grep_cmd.rs)
   Results with <=50 matches now output raw file:line:content format
   instead of the grouped "X matches in YF:" format. The grouped format
   confused AI agents which couldn't parse it, triggering 2-4 retry
   attempts per search (each burning 500-2000 tokens). The grouped
   format is preserved for >50 matches where token savings matter.

3. smart_truncate: clean truncation without annotations (src/filter.rs)
   Replaced the "smart" truncation that inserted synthetic
   "// ... N lines omitted" comment markers throughout file content.
   AI agents treated these as actual code, got confused, and retried
   with alternative commands. New behavior: clean first-N-lines
   truncation with "[X more lines]" at the end only.

Evidence from a real session (1,633 RTK commands):
- grep had lowest savings rate (9.3%) but highest retry cost
- ~15 retry patterns observed, each 2-4 extra tool calls
- ~10 false negative searches led to wrong analysis conclusions
- Estimated 200-500K tokens burned on retries
- Net token impact was negative for grep-heavy workflows

CLAassistant commented Mar 23, 2026

CLA assistant check
All committers have signed the CLA.

pszymkowiak added bug (Something isn't working), effort-medium (1-2 days, a few files), filter-quality (Filter produces incorrect/truncated signal) labels Mar 23, 2026
pszymkowiak (Collaborator) commented:

[w] wshm · Automated triage by AI

📊 Automated PR Analysis

🐛 Type bug-fix
🟡 Risk medium

Summary

Fixes three issues in grep and smart_truncate that caused AI agents to waste tokens on retry loops: adds --no-ignore to rg so gitignored files aren't silently skipped, passes through raw grep output for small result sets (<=50 matches) instead of a grouped format that confused AI parsers, and replaces synthetic '// ... N lines omitted' truncation markers with clean first-N-lines truncation plus a single '[X more lines]' suffix.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Analyzed automatically by wshm · This is an automated analysis, not a human review.

Update all docs referencing grep's output strategy to reflect the new
behavior: raw passthrough for <=50 matches, grouped format for >50.

Files updated: CLAUDE.md, README.md, README_{fr,es,ja,ko,zh}.md,
INSTALL.md, ARCHITECTURE.md, docs/AUDIT_GUIDE.md, docs/FEATURES.md
