Skip to content

fix(rewrite): skip filtering when output is redirected to a file#323

Merged
mpecan merged 2 commits intomainfrom
feat/322-skip-redirected-commands
Apr 9, 2026
Merged

fix(rewrite): skip filtering when output is redirected to a file#323
mpecan merged 2 commits intomainfrom
feat/322-skip-redirected-commands

Conversation

@mpecan
Copy link
Copy Markdown
Owner

@mpecan mpecan commented Apr 9, 2026

Summary

When an agent runs git diff > /tmp/foo.txt, tokf was rewriting it to tokf run git diff > /tmp/foo.txt, applying the git/diff --stat filter, and then bash redirected the filtered output into the file. The agent — who explicitly used > precisely because they wanted the raw form for downstream processing — got the wrong data.

This PR adds has_toplevel_output_redirect() to bash_ast.rs, mirroring the existing has_toplevel_heredoc() helper, and wires it into should_skip() in rules.rs. Commands with output redirection are now passed through unchanged.

Closes #322

Design

The walker descends into command redirects, pipelines, subshells, brace groups, and the redirects field of compound constructs (if/while/until/for/select/case/(( ))). It deliberately does NOT descend into:

  • CommandSubstitution / ProcessSubstitution — separate scopes; an inner redirect is a side effect of the substitution, not a property of the outer command. So echo $(git diff > /tmp/foo) does not skip the outer echo.
  • Function — a definition produces no output; the body runs at call time. Skipping the definition would silently disable filtering at every call site of the function.
  • List — segments are handled by the per-segment loop in mod.rs:307, so the whole-command check at mod.rs:280 returns false for compound commands and only the redirected segment is skipped instead of the whole compound. This asymmetry vs heredoc handling is intentional and documented in the walker's doc comment — heredocs bind lexically across compound segments, file redirects don't.

The redirect-vs-fd discriminator (is_file_output_op) handles every bash output operator:

Operator Example Skip?
>, >>, >|, &>, &>> cmd > file yes
1>, 2>, 1>>, 2>> cmd 2> err yes
<> exec 3<> sock yes (read+write)
>& with file target cmd >& log (bash ext) yes
>& with digit target cmd 2>&1 no (fd merge)
>&- (also <&-, 2>&-) cmd >&- no (close fd)
<, <<< cmd < input no (input)

For non-Word / variable-expansion targets like cmd >&\$fd we cannot statically resolve the target, so we err on the side of skipping (safer than filtering raw bytes into a file the agent will read back).

Headline behaviour

$ tokf rewrite 'git diff > /tmp/foo.txt'
git diff > /tmp/foo.txt
# unchanged — agent gets the raw diff in the file

$ tokf rewrite 'git diff > foo.txt; git status'
git diff > foo.txt; tokf run git status
# only the redirected segment is skipped; the other still gets filtered

$ tokf rewrite 'git diff 2>&1'
tokf run git diff 2>&1
# fd merge only, no file involved — filter still applies

$ tokf rewrite 'echo \$(git diff > /tmp/foo)'
echo \$(git diff > /tmp/foo)
# substitution is a separate scope — outer echo unaffected

Test plan

  • 31 new AST tests in bash_ast_tests.rs covering every redirect operator (>, >>, >|, &>, &>>, 1>, 2>, <>, >& with digit and file targets, >&-, 2>&-, <, <<<), every walker arm (Command, Pipeline, Subshell outer + inner, BraceGroup, If/While/For compound constructs), and every "should not descend" scope (CommandSubstitution, Function, list at top level, double-quoted >, single-quoted >, herestring).
  • 9 new should_skip tests in rules.rs covering the end-to-end path through the public API.
  • 5 new integration tests in tests_compound.rs exercising the real rewrite pipeline:
    • rewrite_skips_command_with_output_redirect — the basic case
    • rewrite_skips_only_redirected_segment_in_compoundgit diff > foo.txt; git status → only diff skipped
    • rewrite_skips_redirected_segment_after_andgit status && git diff > foo.txt → only diff skipped
    • rewrite_does_not_skip_fd_merge_onlygit status 2>&1 → still wrapped
    • rewrite_skips_command_with_pipeline_redirectgit diff > foo.txt | head → unchanged
  • cargo test --workspace2010 passed, 0 failed, 166 ignored
  • cargo clippy --workspace --all-targets -- -D warnings → clean
  • cargo fmt --check → clean
  • cargo run -p tokf -- verify --scope stdlib134/134 stdlib filters passed (no regressions)
  • Manual tokf rewrite "..." sanity checks confirmed all four headline cases behave as documented above

Files

  • crates/tokf-cli/src/rewrite/bash_ast.rs — new public API + walker + helpers
  • crates/tokf-cli/src/rewrite/rules.rs — wired into should_skip after the heredoc check
  • crates/tokf-cli/src/rewrite/bash_ast_tests.rs — 31 new AST-level tests
  • crates/tokf-cli/src/rewrite/tests_compound.rs — 5 new integration tests
  • docs/rewrites-config.md — new "Implicit skip rules" section covering both heredocs (existing) and output redirects (new)
  • README.md — regenerated from docs/

Out of scope

  • tee in pipelinestee is a command argument, not a redirect operator. Worth a follow-up issue.
  • Process substitution (>(grep ...)) — separate ProcessSubstitution node; deferred.
  • Compound commands with mixed segments and a heredoc — current heredoc handling skips the whole compound; this PR keeps that asymmetry.

🤖 Generated with Claude Code

When an agent runs `git diff > /tmp/foo.txt`, tokf was rewriting it to
`tokf run git diff > /tmp/foo.txt`, applying the `git/diff --stat` filter,
and then bash redirected the **filtered** output into the file. The agent
— who explicitly used `>` precisely because they wanted the **raw** form
for downstream processing — got the wrong data.

Add `has_toplevel_output_redirect()` to `bash_ast.rs`, mirroring the
existing `has_toplevel_heredoc()` helper, and wire it into `should_skip()`
in `rules.rs` after the heredoc check. The walker descends into command
redirects, pipelines, subshells, brace groups, and the `redirects` field
of compound constructs (`if/while/until/for/select/case/((  ))`). It
deliberately does NOT descend into:

- `CommandSubstitution` / `ProcessSubstitution` — separate scopes; an
  inner redirect is a side effect of the substitution, not a property of
  the outer command.
- `Function` — a definition produces no output; the body runs at call
  time, where the call site is what we'd evaluate.
- `List` — segments are handled by the per-segment loop in `mod.rs:307`,
  so the whole-command check at `mod.rs:280` returns false for compound
  commands and only the redirected segment is skipped instead of the
  whole compound. This asymmetry vs heredoc handling is intentional and
  documented in the walker's doc comment.

The redirect-vs-fd discriminator (`is_file_output_op`):
- Operators not containing `>` (input redirects: `<`, `<<`, `<<<`, `<&`)
  → never a file write.
- `op == ">&-"` (close fd; rable normalises both `>&-` and `<&-` here)
  → not a file write.
- `op == ">&"` or `<&` with a digit-only target (`2>&1`, `1>&2`) → fd
  duplication, not a file write.
- `op == ">&"` with a non-digit target (`>& filename`, bash extension)
  → file write.
- Everything else with `>` (`>`, `>>`, `>|`, `&>`, `&>>`, `1>`, `2>`,
  `<>`, etc.) → file write. For non-Word / variable-expansion targets
  like `cmd >&$fd` we cannot statically resolve the target, so we err on
  the side of skipping (safer than filtering raw bytes into a file).

Tests
- 31 new AST tests in `bash_ast_tests.rs` covering every redirect
  operator (`>`, `>>`, `>|`, `&>`, `&>>`, `1>`, `2>`, `<>`, `>&` with
  digit and file targets, `>&-`, `2>&-`, `<`, `<<<`), every walker arm
  (`Command`, `Pipeline`, `Subshell` outer + inner, `BraceGroup`,
  `If/While/For` compound constructs), and every "should not descend"
  scope (`CommandSubstitution`, `Function`, list at top level, double
  and single quoted `>`, herestring).
- 9 should_skip integration tests in `rules.rs` covering the end-to-end
  path through the public API.
- 5 integration tests in `tests_compound.rs` exercising the real
  rewrite pipeline:
    rewrite_skips_command_with_output_redirect
    rewrite_skips_only_redirected_segment_in_compound
    rewrite_skips_redirected_segment_after_and
    rewrite_does_not_skip_fd_merge_only
    rewrite_skips_command_with_pipeline_redirect
- `docs/rewrites-config.md` gains a new "Implicit skip rules" section
  covering both heredocs (existing) and output redirects (new) with a
  full example block, including the per-segment compound behaviour and
  a note on the deferred `tee` follow-up.

Verified
- cargo test --workspace                                  -> 2010 passed
- cargo clippy --workspace --all-targets -- -D warnings   -> clean
- cargo fmt --check                                       -> clean
- cargo run -p tokf -- verify --scope stdlib              -> 134/134 passed
- Manual `tokf rewrite "..."` sanity checks confirmed expected behaviour
  for the four headline cases (single redirect, compound per-segment,
  fd merge only, redirect inside command substitution).

Closes #322

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@repository-butler
Copy link
Copy Markdown
Contributor

Filter Verification Report

Changed Filters

No filter files changed in this PR.

All Filters Summary

✅ 134/134 test cases passed across 49 filters


Generated by tokf verify

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the tokf rewrite pipeline to avoid rewriting commands whose output is redirected to a file, ensuring agents capture raw, unfiltered bytes when they explicitly use shell redirection (>, >>, &>, <>, etc.). This aligns rewrite behavior with strong user intent and prevents silently corrupted downstream processing.

Changes:

  • Add has_toplevel_output_redirect() (AST-based) and supporting redirect classification logic in bash_ast.rs.
  • Wire the new check into should_skip() so redirected segments pass through unchanged.
  • Add extensive unit + integration tests and document the new implicit skip rule (alongside heredocs).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
crates/tokf-cli/src/rewrite/bash_ast.rs Adds AST walker + redirect operator classification for “output to file” detection.
crates/tokf-cli/src/rewrite/rules.rs Uses the new AST check to skip rewriting when output is redirected to a file.
crates/tokf-cli/src/rewrite/bash_ast_tests.rs Adds AST-level test coverage for redirect operators and scope boundaries.
crates/tokf-cli/src/rewrite/tests_compound.rs Adds integration tests proving correct per-segment behavior in compound commands.
docs/rewrites-config.md Documents the new implicit skip rule for output redirection.
README.md Propagates the documentation update into the generated README.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- bash_ast.rs walker doc: replace hard-coded `mod.rs:280` line reference
  with a control-flow description (top-level skip check runs before
  `split_compound`; segments are re-checked in the per-segment loop).
  Line numbers go stale; behavioural descriptions don't.
- bash_ast.rs `is_file_output_op`: drop the dead `<&` arm in the
  fd-merge branch. The early return on `!op.contains('>')` already
  excludes `<&`, so listing it here was unreachable and misleading
  about which cases the function actually handles. Comment updated to
  call out the early-return guard explicitly.
- tests_compound.rs `rewrite_skips_command_with_pipeline_redirect`:
  fix the test comment that incorrectly claimed bash errors on
  `git diff > foo.txt | head`. Bash doesn't error — the redirect sends
  stdout to the file and the piped command just reads EOF.

No behaviour change. Doc/comment cleanups only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mpecan mpecan merged commit dd66681 into main Apr 9, 2026
5 checks passed
@mpecan mpecan deleted the feat/322-skip-redirected-commands branch April 9, 2026 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(rewrite): skip filtering when command output is redirected to a file

2 participants