fix(rewrite): skip filtering when output is redirected to a file#323
Merged
fix(rewrite): skip filtering when output is redirected to a file#323
Conversation
When an agent runs `git diff > /tmp/foo.txt`, tokf was rewriting it to
`tokf run git diff > /tmp/foo.txt`, applying the `git/diff --stat` filter,
and then bash redirected the **filtered** output into the file. The agent
— who explicitly used `>` precisely because they wanted the **raw** form
for downstream processing — got the wrong data.
Add `has_toplevel_output_redirect()` to `bash_ast.rs`, mirroring the
existing `has_toplevel_heredoc()` helper, and wire it into `should_skip()`
in `rules.rs` after the heredoc check. The walker descends into command
redirects, pipelines, subshells, brace groups, and the `redirects` field
of compound constructs (`if/while/until/for/select/case/(( ))`). It
deliberately does NOT descend into:
- `CommandSubstitution` / `ProcessSubstitution` — separate scopes; an
inner redirect is a side effect of the substitution, not a property of
the outer command.
- `Function` — a definition produces no output; the body runs at call
time, where the call site is what we'd evaluate.
- `List` — segments are handled by the per-segment loop in `mod.rs:307`,
so the whole-command check at `mod.rs:280` returns false for compound
commands and only the redirected segment is skipped instead of the
whole compound. This asymmetry vs heredoc handling is intentional and
documented in the walker's doc comment.
The redirect-vs-fd discriminator (`is_file_output_op`):
- Operators not containing `>` (input redirects: `<`, `<<`, `<<<`, `<&`)
→ never a file write.
- `op == ">&-"` (close fd; rable normalises both `>&-` and `<&-` here)
→ not a file write.
- `op == ">&"` or `<&` with a digit-only target (`2>&1`, `1>&2`) → fd
duplication, not a file write.
- `op == ">&"` with a non-digit target (`>& filename`, bash extension)
→ file write.
- Everything else with `>` (`>`, `>>`, `>|`, `&>`, `&>>`, `1>`, `2>`,
`<>`, etc.) → file write. For non-Word / variable-expansion targets
like `cmd >&$fd` we cannot statically resolve the target, so we err on
the side of skipping (safer than filtering raw bytes into a file).
Tests
- 31 new AST tests in `bash_ast_tests.rs` covering every redirect
operator (`>`, `>>`, `>|`, `&>`, `&>>`, `1>`, `2>`, `<>`, `>&` with
digit and file targets, `>&-`, `2>&-`, `<`, `<<<`), every walker arm
(`Command`, `Pipeline`, `Subshell` outer + inner, `BraceGroup`,
`If/While/For` compound constructs), and every "should not descend"
scope (`CommandSubstitution`, `Function`, list at top level, double
and single quoted `>`, herestring).
- 9 should_skip integration tests in `rules.rs` covering the end-to-end
path through the public API.
- 5 integration tests in `tests_compound.rs` exercising the real
rewrite pipeline:
rewrite_skips_command_with_output_redirect
rewrite_skips_only_redirected_segment_in_compound
rewrite_skips_redirected_segment_after_and
rewrite_does_not_skip_fd_merge_only
rewrite_skips_command_with_pipeline_redirect
- `docs/rewrites-config.md` gains a new "Implicit skip rules" section
covering both heredocs (existing) and output redirects (new) with a
full example block, including the per-segment compound behaviour and
a note on the deferred `tee` follow-up.
Verified
- cargo test --workspace -> 2010 passed
- cargo clippy --workspace --all-targets -- -D warnings -> clean
- cargo fmt --check -> clean
- cargo run -p tokf -- verify --scope stdlib -> 134/134 passed
- Manual `tokf rewrite "..."` sanity checks confirmed expected behaviour
for the four headline cases (single redirect, compound per-segment,
fd merge only, redirect inside command substitution).
Closes #322
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Filter Verification ReportChanged FiltersNo filter files changed in this PR. All Filters Summary✅ 134/134 test cases passed across 49 filters Generated by |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the tokf rewrite pipeline to avoid rewriting commands whose output is redirected to a file, ensuring agents capture raw, unfiltered bytes when they explicitly use shell redirection (>, >>, &>, <>, etc.). This aligns rewrite behavior with strong user intent and prevents silently corrupted downstream processing.
Changes:
- Add
has_toplevel_output_redirect()(AST-based) and supporting redirect classification logic inbash_ast.rs. - Wire the new check into
should_skip()so redirected segments pass through unchanged. - Add extensive unit + integration tests and document the new implicit skip rule (alongside heredocs).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
crates/tokf-cli/src/rewrite/bash_ast.rs |
Adds AST walker + redirect operator classification for “output to file” detection. |
crates/tokf-cli/src/rewrite/rules.rs |
Uses the new AST check to skip rewriting when output is redirected to a file. |
crates/tokf-cli/src/rewrite/bash_ast_tests.rs |
Adds AST-level test coverage for redirect operators and scope boundaries. |
crates/tokf-cli/src/rewrite/tests_compound.rs |
Adds integration tests proving correct per-segment behavior in compound commands. |
docs/rewrites-config.md |
Documents the new implicit skip rule for output redirection. |
README.md |
Propagates the documentation update into the generated README. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- bash_ast.rs walker doc: replace hard-coded `mod.rs:280` line reference
with a control-flow description (top-level skip check runs before
`split_compound`; segments are re-checked in the per-segment loop).
Line numbers go stale; behavioural descriptions don't.
- bash_ast.rs `is_file_output_op`: drop the dead `<&` arm in the
fd-merge branch. The early return on `!op.contains('>')` already
excludes `<&`, so listing it here was unreachable and misleading
about which cases the function actually handles. Comment updated to
call out the early-return guard explicitly.
- tests_compound.rs `rewrite_skips_command_with_pipeline_redirect`:
fix the test comment that incorrectly claimed bash errors on
`git diff > foo.txt | head`. Bash doesn't error — the redirect sends
stdout to the file and the piped command just reads EOF.
No behaviour change. Doc/comment cleanups only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When an agent runs
git diff > /tmp/foo.txt, tokf was rewriting it totokf run git diff > /tmp/foo.txt, applying thegit/diff --statfilter, and then bash redirected the filtered output into the file. The agent — who explicitly used>precisely because they wanted the raw form for downstream processing — got the wrong data.This PR adds
has_toplevel_output_redirect()tobash_ast.rs, mirroring the existinghas_toplevel_heredoc()helper, and wires it intoshould_skip()inrules.rs. Commands with output redirection are now passed through unchanged.Closes #322
Design
The walker descends into command redirects, pipelines, subshells, brace groups, and the
redirectsfield of compound constructs (if/while/until/for/select/case/(( ))). It deliberately does NOT descend into:CommandSubstitution/ProcessSubstitution— separate scopes; an inner redirect is a side effect of the substitution, not a property of the outer command. Soecho $(git diff > /tmp/foo)does not skip the outerecho.Function— a definition produces no output; the body runs at call time. Skipping the definition would silently disable filtering at every call site of the function.List— segments are handled by the per-segment loop inmod.rs:307, so the whole-command check atmod.rs:280returns false for compound commands and only the redirected segment is skipped instead of the whole compound. This asymmetry vs heredoc handling is intentional and documented in the walker's doc comment — heredocs bind lexically across compound segments, file redirects don't.The redirect-vs-fd discriminator (
is_file_output_op) handles every bash output operator:>,>>,>|,&>,&>>cmd > file1>,2>,1>>,2>>cmd 2> err<>exec 3<> sock>&with file targetcmd >& log(bash ext)>&with digit targetcmd 2>&1>&-(also<&-,2>&-)cmd >&-<,<<<cmd < inputFor non-Word / variable-expansion targets like
cmd >&\$fdwe cannot statically resolve the target, so we err on the side of skipping (safer than filtering raw bytes into a file the agent will read back).Headline behaviour
Test plan
bash_ast_tests.rscovering every redirect operator (>,>>,>|,&>,&>>,1>,2>,<>,>&with digit and file targets,>&-,2>&-,<,<<<), every walker arm (Command,Pipeline,Subshellouter + inner,BraceGroup,If/While/Forcompound constructs), and every "should not descend" scope (CommandSubstitution,Function, list at top level, double-quoted>, single-quoted>, herestring).should_skiptests inrules.rscovering the end-to-end path through the public API.tests_compound.rsexercising the real rewrite pipeline:rewrite_skips_command_with_output_redirect— the basic caserewrite_skips_only_redirected_segment_in_compound—git diff > foo.txt; git status→ only diff skippedrewrite_skips_redirected_segment_after_and—git status && git diff > foo.txt→ only diff skippedrewrite_does_not_skip_fd_merge_only—git status 2>&1→ still wrappedrewrite_skips_command_with_pipeline_redirect—git diff > foo.txt | head→ unchangedcargo test --workspace→ 2010 passed, 0 failed, 166 ignoredcargo clippy --workspace --all-targets -- -D warnings→ cleancargo fmt --check→ cleancargo run -p tokf -- verify --scope stdlib→ 134/134 stdlib filters passed (no regressions)tokf rewrite "..."sanity checks confirmed all four headline cases behave as documented aboveFiles
crates/tokf-cli/src/rewrite/bash_ast.rs— new public API + walker + helperscrates/tokf-cli/src/rewrite/rules.rs— wired intoshould_skipafter the heredoc checkcrates/tokf-cli/src/rewrite/bash_ast_tests.rs— 31 new AST-level testscrates/tokf-cli/src/rewrite/tests_compound.rs— 5 new integration testsdocs/rewrites-config.md— new "Implicit skip rules" section covering both heredocs (existing) and output redirects (new)README.md— regenerated fromdocs/Out of scope
teein pipelines —teeis a command argument, not a redirect operator. Worth a follow-up issue.>(grep ...)) — separateProcessSubstitutionnode; deferred.🤖 Generated with Claude Code