A pluggable framework for using a cheap aux-LLM as a judge over agent traces. The same primitive feeds risk scoring (#619), leak-detection enrichment (#465), brand-voice scoring (#551), trajectory rating for fine-tuning (#553), and periodic agent performance reviews.
Priority: High Priority (P1) — foundational | Manifesto: Principles VII (audit + reversibility) and VIII (doesn't break overnight).
Why foundational: without this, every "LLM-as-judge" use case in the roadmap (risk, leak, brand, trajectory selection, performance review) reinvents the same primitive — model dispatch, trace windowing, redaction, and an eval harness for the judge itself. Centralizing it is a 2-3x leverage win.
Children:
DoD: any feature that needs an LLM rating of a trace plugs in with one subscriber registration; judge model + prompt are versioned; judge accuracy is measurable.
Depends on: F2 (#415) ✅, F4 (#417) ✅, F5 (#418) ✅.
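To make the "one subscriber registration" DoD concrete, here is a minimal sketch of what the pluggable primitive could look like. Everything here is hypothetical — `JudgeSpec`, `JudgeRegistry`, the windowing and redaction steps, and the stand-in `score` callable are illustrative names, not the actual implementation; the real judge would call the cheap aux-LLM instead of a lambda.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class JudgeSpec:
    """A versioned judge: model id + prompt version, per the DoD (hypothetical shape)."""
    name: str
    model: str
    prompt_version: str
    prompt_template: str
    score: Callable[[str], float]  # stand-in for the aux-LLM call

class JudgeRegistry:
    """Central primitive: each use case (risk, leak, brand, ...) plugs in one subscriber."""
    def __init__(self) -> None:
        self._judges: dict[str, JudgeSpec] = {}

    def register(self, spec: JudgeSpec) -> None:
        self._judges[spec.name] = spec

    def judge(self, name: str, trace: str, window: int = 2000) -> dict:
        spec = self._judges[name]
        windowed = trace[-window:]  # naive trace windowing: keep the tail
        redacted = windowed.replace("sk-", "sk-[REDACTED]")  # toy redaction pass
        return {
            "judge": spec.name,
            "model": spec.model,
            "prompt_version": spec.prompt_version,  # versioned, so accuracy is measurable per version
            "score": spec.score(redacted),
        }

# One registration is all a new use case needs:
registry = JudgeRegistry()
registry.register(JudgeSpec(
    name="risk",
    model="cheap-aux-model",
    prompt_version="v1",
    prompt_template="Rate the risk of this trace 0-1:\n{trace}",
    score=lambda trace: 0.9 if "rm -rf" in trace else 0.0,
))

result = registry.judge("risk", "agent ran: rm -rf /tmp/scratch")
print(result["score"], result["prompt_version"])  # → 0.9 v1
```

Keeping model dispatch, windowing, and redaction inside the registry (rather than in each subscriber) is what makes the judge's own accuracy measurable: every consumer goes through the same versioned path.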