[gardener/classifiers] raise DIFF_CAP and filter noise — 20KB is 6% of Haiku's window #338

## Problem

Both classifiers (`anthropic.ts:40`, `claude-cli.ts:36`) cap diff input at 20,000 bytes with no content awareness. Two issues compound:

1. **20KB is far too small.** The default model is `claude-haiku-4-5` (200K context window). Current total prompt is ~55KB (~13K tokens) — only 6% of available context. The cap truncates real PRs at 300-400 lines of code. A typical feature PR is 800-1500 lines; a refactor or migration is multiples of that.

2. **Lockfile noise eats the budget.** A PR that regenerates `pnpm-lock.yaml` blows through 20KB of lock diff before a single line of real code reaches the classifier. `tree sync` already solved this (`sync.ts:665` `formatPrDiffForPrompt`) by filtering noise first, then truncating.
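For reference, here is a minimal sketch of a byte cap that is content-aware at the byte level. It assumes the classifiers receive the diff as one string. It measures the budget in UTF-8 bytes rather than JavaScript string length, which counts UTF-16 code units. It also cuts at the last newline that fits, so no diff line or multi-byte character is split. `truncateToBytes` is a hypothetical name, not an existing helper in the codebase.

```typescript
// Hypothetical helper (not in the codebase): truncate `text` to at most
// `capBytes` UTF-8 bytes, preferring to cut just after a newline so that
// no diff line and no multi-byte character is split.
export function truncateToBytes(text: string, capBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= capBytes) return text;

  // Walk back from the cap to the last '\n' (0x0a) inside the budget.
  // '\n' is one byte in UTF-8 and never occurs inside a multi-byte
  // sequence, so cutting right after it is always a valid boundary.
  let cut = capBytes;
  while (cut > 0 && bytes[cut - 1] !== 0x0a) cut--;

  // No newline fits: fall back to the last complete character.
  if (cut === 0) {
    cut = capBytes;
    // Continuation bytes look like 10xxxxxx; back off until bytes[cut]
    // starts a character, then keep everything before it.
    while (cut > 0 && (bytes[cut] & 0xc0) === 0x80) cut--;
  }
  return new TextDecoder().decode(bytes.subarray(0, cut));
}
```

In a filter-then-truncate pipeline like the one `tree sync` uses, this step would run on the already-filtered diff.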

## Proposal

### 1. Raise caps

| Constant | Current (bytes) | New (bytes) | Approx. share of 200K-token window (at ~4 bytes/token) |
|---|---|---|---|
| `DIFF_CAP` (both classifiers) | 20,000 | **200,000** | ~25% |
| `DIGEST_BUDGET_BYTES` (`tree-digest.ts:25`) | 30,000 | **100,000** | ~12% |
| Combined with system + metadata | ~55KB | ~310KB | ~38% |

Leaves ~60% of window for prompt-caching overhead, tokenizer variance on mixed Chinese/code content, and Anthropic's soft ~180-190K prompt-length threshold. Works for Sonnet/Opus upgrades without re-tuning.
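The table's percentages follow from converting bytes to tokens at roughly 4 bytes per token. That ratio is consistent with the table's numbers. As noted above, though, mixed Chinese/code content can tokenize quite differently, so treat the ratio as an assumption. Here is a minimal sketch of the arithmetic. The constant names other than the two issue constants are illustrative.

```typescript
// Assumed conversion ratio; real tokenization varies with content.
const BYTES_PER_TOKEN = 4;
// Context window size, in tokens.
const WINDOW_TOKENS = 200_000;

/** Fraction of the context window consumed by `bytes` of prompt text. */
function windowShare(bytes: number): number {
  return bytes / BYTES_PER_TOKEN / WINDOW_TOKENS;
}

const DIFF_CAP = 200_000;
const DIGEST_BUDGET_BYTES = 100_000;
const FULL_PROMPT_BYTES = 310_000; // diff + digest + system + metadata

for (const [name, bytes] of [
  ["DIFF_CAP", DIFF_CAP],
  ["DIGEST_BUDGET_BYTES", DIGEST_BUDGET_BYTES],
  ["full prompt", FULL_PROMPT_BYTES],
] as const) {
  console.log(`${name}: ${(windowShare(bytes) * 100).toFixed(1)}% of window`);
}

// Whatever the full prompt doesn't use is headroom.
const headroom = 1 - windowShare(FULL_PROMPT_BYTES);
console.log(`headroom: ${(headroom * 100).toFixed(1)}%`);
```

At that ratio, the proposed numbers come to 25%, 12.5%, and just under 39% of the window. That leaves roughly 61% free, in line with the ~60% headroom figure.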

### 2. Filter noise before truncating

Drop from both classifiers' diff input:

- `*.lock`, `package-lock.json`, `pnpm-lock.yaml`, `yarn.lock`, `Cargo.lock`, `poetry.lock`, `Gemfile.lock`
- `*.min.js`, `*.min.css`, `*.map`
- `dist/`, `build/`, `node_modules/`, `coverage/`, `__pycache__/`, `out/`

Regex list already exists in `sync.ts:626-630`. Extract to shared `engine/classifiers/diff-filter.ts` so sync and both classifiers share one source of truth.
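Here is a minimal sketch of what the shared `engine/classifiers/diff-filter.ts` could look like. It assumes the classifiers receive a standard `git diff` string. The regexes are written from the bullet list above, not copied from `sync.ts:626-630`, which should remain the starting point when extracting. `NOISE_PATTERNS`, `isNoisePath`, and `filterDiff` are hypothetical names.

```typescript
// Hypothetical sketch of engine/classifiers/diff-filter.ts.
// Patterns follow the noise list in this issue; the real list is in sync.ts.
export const NOISE_PATTERNS: readonly RegExp[] = [
  /\.lock$/, // yarn.lock, Cargo.lock, poetry.lock, Gemfile.lock, ...
  /(^|\/)package-lock\.json$/,
  /(^|\/)pnpm-lock\.yaml$/,
  /\.min\.(js|css)$/,
  /\.map$/,
  // Build/vendor directories at any depth, e.g. packages/web/dist/index.js
  /(^|\/)(dist|build|node_modules|coverage|__pycache__|out)\//,
];

export function isNoisePath(path: string): boolean {
  return NOISE_PATTERNS.some((re) => re.test(path));
}

// Split a unified diff into per-file chunks at each line starting with
// "diff --git ", then drop chunks whose post-image path is noise.
export function filterDiff(diff: string): string {
  const chunks = diff.split(/^(?=diff --git )/m);
  return chunks
    .filter((chunk) => {
      const header = /^diff --git a\/(\S+) b\/(\S+)/.exec(chunk);
      // Preamble text, or headers this simple regex can't parse (e.g. paths
      // with spaces, which git quotes), are kept rather than guessed at.
      if (!header) return true;
      return !isNoisePath(header[2]);
    })
    .join("");
}
```

Filtering happens per file, before any byte cap. A regenerated `pnpm-lock.yaml` therefore consumes none of the budget, and the code diff after it survives. That is the property the lockfile-heavy acceptance test asks for.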

### 3. Single default — no flag

One set of numbers for all scenarios (daemon sweeps, CI, manual invocations). Keeping it simple; we can add an override later only if a real workload demands it.

## Cost implication

Haiku 4.5 input pricing is cheap enough that raising per-classify cost from ~\$0.004 → ~\$0.02 is negligible next to the gain in verdict quality. Prompt caching (#315) keeps repeated classifies on the same tree cheap.
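Cost scales linearly with prompt size. A small estimator makes the tradeoff easy to re-check as prices change. This sketch takes the per-million-token input price as a parameter instead of hard-coding one, and it reuses the assumed ~4 bytes/token ratio. It also ignores prompt-caching discounts, so it likely overestimates repeated classifies on the same tree. `estimateInputCostUSD` is a hypothetical name.

```typescript
/**
 * Rough input-token cost of one classify call, in USD.
 * Assumes ~bytesPerToken bytes per token and ignores prompt caching.
 */
function estimateInputCostUSD(
  promptBytes: number,
  usdPerMillionInputTokens: number,
  bytesPerToken = 4,
): number {
  const tokens = promptBytes / bytesPerToken;
  return (tokens / 1_000_000) * usdPerMillionInputTokens;
}

// Placeholder price; substitute the current list price when re-checking.
const pricePerMTok = 1.0;
const before = estimateInputCostUSD(55_000, pricePerMTok); // today's ~55KB prompt
const after = estimateInputCostUSD(310_000, pricePerMTok); // prompt at the new caps
console.log(
  `before ≈ $${before.toFixed(4)}, after ≈ $${after.toFixed(4)}, ` +
    `ratio ≈ ${(after / before).toFixed(1)}x`,
);
```

Whatever the price, a prompt that fills the new caps costs about 5.6x the current ~55KB prompt in input tokens. PRs that don't fill the cap cost proportionally less.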

## Acceptance

- [ ] `DIFF_CAP` raised to 200,000 in both `anthropic.ts` and `claude-cli.ts`
- [ ] `DIGEST_BUDGET_BYTES` raised to 100,000 in `tree-digest.ts`
- [ ] Noise filter extracted to shared module and applied in both classifiers before truncation
- [ ] Existing `gardener-*-classifier.test.ts` tests still pass, with fixtures updated if needed
- [ ] One new test: lockfile-heavy PR produces non-empty real-code diff in prompt
