feat: activate LLM judges for self-evolution engine by mcheemaa · Pull Request #6 · ghostwright/phantom

mcheemaa · 2026-03-31T04:06:45Z

Summary

The self-evolution LLM judges were built and tested (Phase 3.5) but never activated in production. The EvolutionEngine constructor defaulted useLLMJudges = false and enableLLMJudges() was never called anywhere in the 27K-line codebase. The heuristic regex fallback was running as the primary path, violating the Cardinal Rule.

This PR:

Auto-detects ANTHROPIC_API_KEY at construction time and enables Sonnet-powered judges when available
Removes dead API surface (enableLLMJudges() / disableLLMJudges()) that was never called and could cause inconsistent state
Upgrades memory consolidation to the LLM path for structured fact extraction and contradiction detection
Fixes Zod v3/v4 compatibility so judge schemas work with the Anthropic SDK's zodOutputFormat
Fixes model ID constants to use short aliases (claude-sonnet-4-6) instead of dated versions that returned 404
Adds cost controls ($50/day cap) and golden suite pruning (50-entry cap)
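The 50-entry golden suite cap can be illustrated with a small sketch. The `GoldenEntry` shape, the `pruneGoldenSuite` name, and the oldest-first eviction policy are assumptions made for illustration; the PR states only that pruning enforces a 50-entry cap.

```typescript
// Hypothetical entry shape; the real golden suite type is not shown in this PR.
interface GoldenEntry {
  id: string;
  addedAt: number; // epoch milliseconds
}

// Keep at most `cap` entries, evicting the oldest first (assumed policy).
// Returns a new array sorted oldest-to-newest; the input is not mutated.
function pruneGoldenSuite(entries: GoldenEntry[], cap = 50): GoldenEntry[] {
  if (cap <= 0) return [];
  const sorted = [...entries].sort((a, b) => a.addedAt - b.addedAt);
  return sorted.slice(-cap);
}
```

Returning a fresh array keeps the pruning step free of side effects, so it is safe to call after every append.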

What the LLM judges do vs the heuristic path

Capability	Heuristic (before)	LLM Judges (after)
Observation extraction	19 regex patterns	Sonnet analyzes full session transcript
Constitution gate	File name matching	Triple Sonnet with minority veto
Safety gate	9 regex patterns	Triple Sonnet with minority veto
Regression gate	Keyword overlap	Cascaded Haiku to Sonnet
Quality assessment	Not running	Sonnet session scorer
Memory consolidation	Regex fact extraction	Sonnet structured fact extraction
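To make "triple judge with minority veto" concrete, here is a minimal sketch of the aggregation step only. It assumes that a minority veto means a single rejecting judge blocks the change; the `Verdict` and `GateDecision` types and the function name are hypothetical, and the actual Sonnet calls are left out.

```typescript
// Hypothetical verdict shape returned by each of the three judges.
interface Verdict {
  approve: boolean;
  reason: string;
}

interface GateDecision {
  approved: boolean;
  vetoes: string[]; // reasons given by the rejecting judges
}

// Assumed semantics: one rejection is enough to block the change.
// With no verdicts at all the gate fails closed.
function decideWithMinorityVeto(verdicts: Verdict[]): GateDecision {
  if (verdicts.length === 0) {
    return { approved: false, vetoes: ["no judge verdicts available"] };
  }
  const vetoes = verdicts.filter((v) => !v.approve).map((v) => v.reason);
  return { approved: vetoes.length === 0, vetoes };
}
```

Compared with majority voting, this trades some false rejections for a lower chance that an unsafe change slips through when two of three judges miss it.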

Example: heuristic extracted "always use Rust for CLIs. That's what I prefer." (raw text dump)

Example: LLM judges extracted:

"User communicates casually and informally ('Hey man'), suggesting they prefer a conversational tone over formal responses."
"User appears to be a developer comfortable with multiple languages and CLI tooling concepts."

The LLM catches implicit signals (tone, expertise level) that regex cannot detect.

Safety verification

On cheema.ghostwright.dev, the triple-judge constitution gate correctly rejected an unsafe evolution change. When told "always use Postgres, never suggest anything else", the Sonnet judges analyzed this against the constitution's Honesty principle and rejected it because forcing a single recommendation in all cases would mean giving dishonest technical guidance.

The heuristic path would have blindly appended the raw text to user-profile.md.

Test plan

785 tests pass, 0 failures (15 new tests added)
Typecheck clean
Lint clean on changed files
Verified on cheema.ghostwright.dev (existing VM, Docker Hub mode)
Verified on cheem.ghostwright.dev (fresh VM, first boot E2E)
Constitution gate correctly rejects unsafe changes
Judges gracefully fall back to heuristics when API is unavailable
Zero-config migration (missing judges section defaults to auto)
Existing tests unaffected (no API key in test env = heuristic mode)
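The last two items follow from how the judge mode is resolved at construction time. A rough sketch under stated assumptions: the config field name `judges.enabled` and the function name are hypothetical, and the behavior of `always` without an API key is not specified in this PR (here it simply returns true).

```typescript
type JudgeSetting = "auto" | "always" | "never";

// Hypothetical slice of the evolution config; the field name is an assumption.
interface EvolutionConfigSketch {
  judges?: { enabled?: JudgeSetting };
}

// Resolve once at startup whether LLM judges should run.
// A missing judges section is treated as "auto" (zero-config migration).
function resolveUseLLMJudges(
  config: EvolutionConfigSketch,
  apiKey: string | undefined,
): boolean {
  const setting: JudgeSetting = config.judges?.enabled ?? "auto";
  if (setting === "always") return true; // assumption: no key check in this mode
  if (setting === "never") return false;
  // "auto": enable only when a non-empty API key is present.
  return typeof apiKey === "string" && apiKey.trim().length > 0;
}
```

In the engine the second argument would presumably be the value of ANTHROPIC_API_KEY, which is why a test environment without the key stays in heuristic mode.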

The self-evolution LLM judges were built and tested but never activated. The heuristic regex fallback was running as the primary path, which violates the Cardinal Rule (TypeScript doing reasoning work that should be delegated to the LLM).

This change auto-detects ANTHROPIC_API_KEY at construction time and enables Sonnet-powered judges when available. The heuristic path remains as a fallback for environments without an API key.

What changed:
- EvolutionEngine constructor resolves judge mode at startup via config setting (auto/always/never) + API key detection
- Removed enableLLMJudges() and disableLLMJudges() runtime toggles that were never called and could cause inconsistent state
- Added judges config section to evolution.yaml with daily cost cap ($50/day safety net) and golden suite size cap (50 entries)
- Upgraded memory consolidation to use LLM path when judges enabled, with existingFacts from evolved config for contradiction detection
- Fixed Zod v3/v4 compatibility: judge schemas now import from zod/v4 to match the Anthropic SDK's zodOutputFormat expectations
- Fixed model ID constants to use short aliases (claude-sonnet-4-6) instead of dated versions that returned 404
- Golden suite pruning enforces the 50-entry cap

When judges are enabled, every session gets:
- Sonnet observation extraction (catches implicit corrections, inferred preferences, sentiment signals that regex misses)
- Triple-judge constitution and safety gates with minority veto
- Cascaded Haiku-to-Sonnet regression gate
- Session quality assessment
- LLM-powered memory consolidation with structured fact extraction

Verified on two production VMs:
- cheema.ghostwright.dev: judges correctly rejected an unsafe evolution change ("never suggest anything else") based on constitutional analysis of the Honesty principle
- cheem.ghostwright.dev (fresh VM): full E2E from zero to working judges in 90 seconds, extracted implicit signals like communication style preferences from casual conversation

785 tests pass, 0 failures. Typecheck clean. Lint clean.
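The cascaded Haiku-to-Sonnet regression gate can be pictured as a cheap judge that escalates to a stronger one only when needed. The escalation criterion is not described in this PR; the sketch below assumes a confidence threshold, and every name in it (`JudgeResult`, `needsEscalation`, `cascadedRegressionGate`, the 0.8 default) is hypothetical.

```typescript
// Hypothetical judge output: a regression flag plus a confidence in [0, 1].
interface JudgeResult {
  regressed: boolean;
  confidence: number;
}

type Judge = (input: string) => Promise<JudgeResult>;

// Assumed escalation rule: escalate when the cheap judge is not confident enough.
function needsEscalation(result: JudgeResult, minConfidence: number): boolean {
  return result.confidence < minConfidence;
}

// Run the cheap judge first; call the strong judge only on low confidence.
async function cascadedRegressionGate(
  input: string,
  cheap: Judge,
  strong: Judge,
  minConfidence = 0.8,
): Promise<JudgeResult & { escalated: boolean }> {
  const first = await cheap(input);
  if (!needsEscalation(first, minConfidence)) {
    return { ...first, escalated: false };
  }
  const second = await strong(input);
  return { ...second, escalated: true };
}
```

The point of the cascade is cost: most inputs are settled by the cheaper model, and the more expensive model is paid for only on ambiguous cases.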

Replace `delete process.env.X` with `process.env.X = undefined` to satisfy Biome's noDelete rule, and fix import ordering. These were pre-existing lint failures unrelated to the judge activation work.
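One caveat: Node.js documents that assigning to a `process.env` property implicitly converts the value to a string, so on Node `process.env.X = undefined` leaves the string "undefined" instead of unsetting the variable; other runtimes may behave differently. A minimal sketch of a cleanup helper that avoids the `delete` operator by using `Reflect.deleteProperty`; the helper name and the `EnvLike` type are hypothetical and not part of this PR.

```typescript
// Env-like object; process.env has this general shape.
type EnvLike = Record<string, string | undefined>;

// Temporarily set (or unset, when value is undefined) one variable,
// run fn, then restore the previous state even if fn throws.
function withEnvVar<T>(
  env: EnvLike,
  key: string,
  value: string | undefined,
  fn: () => T,
): T {
  const had = Object.prototype.hasOwnProperty.call(env, key);
  const previous = env[key];
  if (value === undefined) {
    Reflect.deleteProperty(env, key); // truly removes the key
  } else {
    env[key] = value;
  }
  try {
    return fn();
  } finally {
    if (had && previous !== undefined) {
      env[key] = previous;
    } else {
      Reflect.deleteProperty(env, key);
    }
  }
}
```

In a test, `process.env` could be passed as `env`, which keeps the variable genuinely absent for the duration of the callback.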

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a2d3be560


src/index.ts

src/evolution/engine.ts

…ementally

Addresses two review findings:

1. Memory consolidation now checks the daily cost cap before invoking the LLM judge, and tracks the returned cost toward the daily total. Added isWithinCostCap() and trackExternalJudgeCost() to the engine.
2. Cost tracking within afterSession() is now incremental. Each LLM stage updates the daily counter immediately, so later stages see prior costs and fall back to heuristics when the cap is reached.
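The incremental tracking described above might look roughly like the following. Only the method names isWithinCostCap() and trackExternalJudgeCost() come from the commit message; the `DailyCostTracker` class, the UTC day rollover, the `StageSketch` type, and `planStages` are assumptions for illustration.

```typescript
// Hypothetical stand-alone tracker; in the PR these methods live on the engine.
class DailyCostTracker {
  private day = "";
  private spentUsd = 0;
  private readonly capUsd: number;
  private readonly now: () => Date;

  constructor(capUsd: number, now: () => Date = () => new Date()) {
    this.capUsd = capUsd;
    this.now = now;
  }

  // Reset the counter when the UTC calendar day changes (assumed policy).
  private rollover(): void {
    const today = this.now().toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.spentUsd = 0;
    }
  }

  isWithinCostCap(): boolean {
    this.rollover();
    return this.spentUsd < this.capUsd;
  }

  trackExternalJudgeCost(usd: number): void {
    this.rollover();
    this.spentUsd += Math.max(0, usd);
  }
}

interface StageSketch {
  name: string;
  costUsd: number; // cost reported by the stage after it ran
}

// Each stage checks the cap first and records its cost immediately,
// so later stages see earlier spend and fall back to heuristics.
function planStages(
  tracker: DailyCostTracker,
  stages: StageSketch[],
): Array<"llm" | "heuristic"> {
  return stages.map((stage): "llm" | "heuristic" => {
    if (!tracker.isWithinCostCap()) return "heuristic";
    tracker.trackExternalJudgeCost(stage.costUsd);
    return "llm";
  });
}
```

Because the check happens before a stage and the cost is known only after it, a single stage can overshoot the cap; the cap then takes effect for the stages that follow.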

mcheemaa added 2 commits March 30, 2026 21:06

fix: resolve pre-existing lint errors in dynamic-handlers test

2bca825


chatgpt-codex-connector bot reviewed Mar 31, 2026

src/index.ts (outdated, resolved)

src/evolution/engine.ts (outdated, resolved)

mcheemaa merged commit 9d76ff0 into main Mar 31, 2026
1 check passed

mcheemaa mentioned this pull request Mar 31, 2026

[codex] Fix dynamic handler test env cleanup #5

Closed

mcheemaa merged 3 commits into main from feat/activate-evolution-judges
