
feat: activate LLM judges for self-evolution engine #6

Merged
mcheemaa merged 3 commits into main from feat/activate-evolution-judges
Mar 31, 2026

Conversation

@mcheemaa
Member

Summary

The self-evolution LLM judges were built and tested (Phase 3.5) but never activated in production. The `EvolutionEngine` constructor defaulted `useLLMJudges = false`, and `enableLLMJudges()` was never called anywhere in the 27K-line codebase. The heuristic regex fallback was running as the primary path, violating the Cardinal Rule.

This PR:

  • Auto-detects ANTHROPIC_API_KEY at construction time and enables Sonnet-powered judges when available
  • Removes dead API surface (enableLLMJudges() / disableLLMJudges()) that was never called and could cause inconsistent state
  • Upgrades memory consolidation to the LLM path for structured fact extraction and contradiction detection
  • Fixes Zod v3/v4 compatibility so judge schemas work with the Anthropic SDK's zodOutputFormat
  • Fixes model ID constants to use short aliases (claude-sonnet-4-6) instead of dated versions that returned 404
  • Adds cost controls ($50/day cap) and golden suite pruning (50-entry cap)
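To illustrate the first bullet, here is a minimal sketch of what resolving the judge mode once at construction time could look like. The names (`JudgeMode`, `resolveJudgeMode`) are illustrative, not the engine's actual API; only the `auto`/`always`/`never` modes and the `ANTHROPIC_API_KEY` check come from the PR.

```typescript
// Illustrative sketch, not the PR's real code: resolve the judge mode once,
// from the configured mode plus the environment, at construction time.
type JudgeMode = "auto" | "always" | "never";

function resolveJudgeMode(
  configured: JudgeMode,
  env: Record<string, string | undefined>,
): boolean {
  switch (configured) {
    case "always":
      return true; // force LLM judges regardless of environment
    case "never":
      return false; // force the heuristic path
    case "auto":
      // Enable Sonnet-powered judges only when an API key is present.
      return Boolean(env.ANTHROPIC_API_KEY);
  }
}
```

With `auto` as the default, a missing key means the heuristic path runs, which is also why the existing test suite (no API key in the test environment) stays in heuristic mode.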

What the LLM judges do vs the heuristic path

| Capability | Heuristic (before) | LLM Judges (after) |
| --- | --- | --- |
| Observation extraction | 19 regex patterns | Sonnet analyzes full session transcript |
| Constitution gate | File name matching | Triple Sonnet with minority veto |
| Safety gate | 9 regex patterns | Triple Sonnet with minority veto |
| Regression gate | Keyword overlap | Cascaded Haiku to Sonnet |
| Quality assessment | Not running | Sonnet session scorer |
| Memory consolidation | Regex fact extraction | Sonnet structured fact extraction |
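The "minority veto" used by the constitution and safety gates can be sketched as follows: three independent Sonnet verdicts are collected, and a single rejection is enough to block the change. The types and function name here are assumptions for illustration, not the engine's real API.

```typescript
// Illustrative sketch of minority-veto aggregation over three judge verdicts.
interface JudgeVerdict {
  judge: string;
  verdict: "approve" | "reject";
  reason?: string;
}

function aggregateWithMinorityVeto(verdicts: JudgeVerdict[]): {
  approved: boolean;
  vetoes: JudgeVerdict[];
} {
  const vetoes = verdicts.filter((v) => v.verdict === "reject");
  // Unanimous approval is required; any single dissenting judge vetoes.
  return { approved: vetoes.length === 0, vetoes };
}
```

Requiring unanimity makes the gate conservative by design: an unsafe change only needs to alarm one of the three judges to be rejected.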

Example: the heuristic path extracted "always use Rust for CLIs. That's what I prefer." (a raw text dump).

Example: the LLM judges extracted:

  • "User communicates casually and informally ('Hey man'), suggesting they prefer a conversational tone over formal responses."
  • "User appears to be a developer comfortable with multiple languages and CLI tooling concepts."

The LLM catches implicit signals (tone, expertise level) that regex cannot detect.

Safety verification

On cheema.ghostwright.dev, the triple-judge constitution gate correctly rejected an unsafe evolution change. When told "always use Postgres, never suggest anything else", the Sonnet judges analyzed this against the constitution's Honesty principle and rejected it because forcing a single recommendation in all cases would mean giving dishonest technical guidance.

The heuristic path would have blindly appended the raw text to user-profile.md.

Test plan

  • 785 tests pass, 0 failures (15 new tests added)
  • Typecheck clean
  • Lint clean on changed files
  • Verified on cheema.ghostwright.dev (existing VM, Docker Hub mode)
  • Verified on cheem.ghostwright.dev (fresh VM, first boot E2E)
  • Constitution gate correctly rejects unsafe changes
  • Judges gracefully fall back to heuristics when API is unavailable
  • Zero-config migration (missing judges section defaults to auto)
  • Existing tests unaffected (no API key in test env = heuristic mode)

The self-evolution LLM judges were built and tested but never activated.
The heuristic regex fallback was running as the primary path, which
violates the Cardinal Rule (TypeScript doing reasoning work that should
be delegated to the LLM).

This change auto-detects ANTHROPIC_API_KEY at construction time and
enables Sonnet-powered judges when available. The heuristic path
remains as a fallback for environments without an API key.

What changed:
- EvolutionEngine constructor resolves judge mode at startup via
  config setting (auto/always/never) + API key detection
- Removed enableLLMJudges() and disableLLMJudges() runtime toggles
  that were never called and could cause inconsistent state
- Added judges config section to evolution.yaml with daily cost cap
  ($50/day safety net) and golden suite size cap (50 entries)
- Upgraded memory consolidation to use LLM path when judges enabled,
  with existingFacts from evolved config for contradiction detection
- Fixed Zod v3/v4 compatibility: judge schemas now import from
  zod/v4 to match the Anthropic SDK's zodOutputFormat expectations
- Fixed model ID constants to use short aliases (claude-sonnet-4-6)
  instead of dated versions that returned 404
- Golden suite pruning enforces the 50-entry cap
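The two safety caps described above can be sketched as follows. The constants match the PR ($50/day, 50 golden entries), but the class and method names are assumptions for illustration.

```typescript
// Illustrative sketch of the daily cost cap and golden suite pruning.
const DAILY_COST_CAP_USD = 50;
const GOLDEN_SUITE_MAX_ENTRIES = 50;

class DailyCostTracker {
  private spentTodayUsd = 0;

  // Would the next judge call stay under the daily budget?
  isWithinCap(estimatedCostUsd: number): boolean {
    return this.spentTodayUsd + estimatedCostUsd <= DAILY_COST_CAP_USD;
  }

  track(costUsd: number): void {
    this.spentTodayUsd += costUsd;
  }
}

// Pruning keeps only the newest entries once the suite exceeds the cap.
function pruneGoldenSuite<T>(entries: T[]): T[] {
  return entries.slice(-GOLDEN_SUITE_MAX_ENTRIES);
}
```

The cap acts as a safety net rather than a budget target: normal operation should never approach it, but a runaway loop cannot spend past it.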

When judges are enabled, every session gets:
- Sonnet observation extraction (catches implicit corrections, inferred
  preferences, sentiment signals that regex misses)
- Triple-judge constitution and safety gates with minority veto
- Cascaded Haiku-to-Sonnet regression gate
- Session quality assessment
- LLM-powered memory consolidation with structured fact extraction

Verified on two production VMs:
- cheema.ghostwright.dev: judges correctly rejected an unsafe evolution
  change ("never suggest anything else") based on constitutional
  analysis of the Honesty principle
- cheem.ghostwright.dev (fresh VM): full E2E from zero to working
  judges in 90 seconds, extracted implicit signals like communication
  style preferences from casual conversation

785 tests pass, 0 failures. Typecheck clean. Lint clean.

Replace `delete process.env.X` with `process.env.X = undefined` to
satisfy Biome's noDelete rule, and fix import ordering. These were
pre-existing lint failures unrelated to the judge activation work.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a2d3be560


…ementally

Addresses two review findings:

1. Memory consolidation now checks the daily cost cap before invoking
   the LLM judge, and tracks the returned cost toward the daily total.
   Added isWithinCostCap() and trackExternalJudgeCost() to the engine.

2. Cost tracking within afterSession() is now incremental. Each LLM
   stage updates the daily counter immediately, so later stages see
   prior costs and fall back to heuristics when the cap is reached.
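The incremental accounting described in the two findings can be sketched as follows: each LLM stage checks the cap before running and records its cost immediately, so a later stage within the same `afterSession()` pass sees earlier spend and can fall back to heuristics. The method names follow the commit message (`isWithinCostCap`, `trackExternalJudgeCost`), but the signatures and surrounding class are assumptions.

```typescript
// Illustrative sketch of incremental per-stage cost accounting.
class CostAwareEngine {
  private dailySpendUsd = 0;

  constructor(private readonly dailyCapUsd: number) {}

  isWithinCostCap(estimatedUsd: number): boolean {
    return this.dailySpendUsd + estimatedUsd <= this.dailyCapUsd;
  }

  trackExternalJudgeCost(costUsd: number): void {
    this.dailySpendUsd += costUsd;
  }

  // Run one judge stage: LLM when budget allows, heuristic fallback otherwise.
  runStage(estimatedUsd: number, actualUsd: number): { usedLLM: boolean } {
    if (!this.isWithinCostCap(estimatedUsd)) {
      return { usedLLM: false }; // cap reached: heuristic fallback
    }
    this.trackExternalJudgeCost(actualUsd); // record before the next stage runs
    return { usedLLM: true };
  }
}
```

Because the counter updates inside each stage rather than once at the end, a pass that starts under budget cannot blow through the cap across its remaining stages.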
mcheemaa merged commit 9d76ff0 into main on Mar 31, 2026
1 check passed