[codex] Add foundation/spec compiler scaffolding and eval runner #1

Merged
jeevanpillay merged 30 commits into main from codex/foundation-spec-evals
Apr 23, 2026
@jeevanpillay jeevanpillay commented Apr 20, 2026

What Changed

This PR adds the first foundation/spec compiler and eval system for the skills repo.

  • Adds foundation-creator as a separate skill lane for thesis, primitive, and company/product foundation documents.
  • Adds BAML compiler contracts for both foundation-creator and spec-creator.
  • Adds fixture-driven eval packets for foundation creation, spec creation, update-mode edits, ambiguous founder notes, and non-dev-domain care coordination.
  • Migrates local eval execution to Bun with .env loading and Vercel AI Gateway via AI_GATEWAY_API_KEY.
  • Adds a modular TypeScript eval runner with local JSON artifacts as the source of truth.
  • Adds deterministic validators for template shape, update preservation, open questions, taxonomy metadata, and source-bound document constraints.
  • Adds Braintrust-ready reporting while keeping local artifacts canonical.
  • Adds CI smoke coverage for static checks plus live foundation/spec smoke evals.
  • Upgrades CI/runtime targeting to Node 24 via .node-version.
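As an illustration of what deterministic validators for template shape and update preservation might look like, here is a minimal sketch. It is not the PR's actual code: the `ValidationResult` type, the function names, and the example heading names (`Purpose`, `Open Questions`) are all assumptions made for this example.

```typescript
// Hypothetical result shape; the PR does not show the real validator interfaces.
interface ValidationResult {
  validator: string;
  passed: boolean;
  details: string[];
}

// Collect level-2 markdown headings ("## Title") in document order.
function extractHeadings(markdown: string): string[] {
  return markdown
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());
}

// Return the trimmed body under a level-2 heading, or null if the heading is absent.
function sectionBody(markdown: string, heading: string): string | null {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === `## ${heading}`);
  if (start === -1) return null;
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((line) => line.startsWith("## "));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n").trim();
}

// Template-shape check: every required heading must be present.
function validateTemplateShape(markdown: string, required: string[]): ValidationResult {
  const present = new Set(extractHeadings(markdown));
  const missing = required.filter((h) => !present.has(h));
  return {
    validator: "template-shape",
    passed: missing.length === 0,
    details: missing.map((h) => `missing heading: ${h}`),
  };
}

// Update-preservation check: sections an update was not asked to touch
// must have the same body before and after the edit.
function validateUpdatePreservation(
  before: string,
  after: string,
  untouched: string[],
): ValidationResult {
  const changed = untouched.filter((h) => sectionBody(before, h) !== sectionBody(after, h));
  return {
    validator: "update-preservation",
    passed: changed.length === 0,
    details: changed.map((h) => `section changed: ${h}`),
  };
}
```

Because both checks are pure string comparisons, the same input always yields the same verdict, which is the property that distinguishes them from an LLM judge.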

Why

The goal is to make skill quality measurable before these skills are used broadly:

  • Separate strategic foundation-doc generation from concrete behavioral SPEC.md generation.
  • Turn messy notes into typed intermediate briefs before rendering final documents.
  • Catch prompt drift with deterministic checks instead of relying only on LLM judges.
  • Preserve ambiguity explicitly instead of inventing product or implementation decisions.
  • Establish a reusable eval runner shape that can later support more skills and projects.
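The idea of a typed intermediate brief that preserves ambiguity can be sketched as follows. The PR defines its contracts in BAML; this TypeScript version is only an approximation for illustration, and the `BriefField` and `Brief` types, the `renderBrief` function, and the section names are hypothetical.

```typescript
// Hypothetical brief shape: each field is either stated in the source notes
// or recorded as an open question. Nothing is filled in by guessing.
type BriefField =
  | { kind: "stated"; value: string; source: string }
  | { kind: "open"; question: string };

interface Brief {
  title: string;
  fields: Record<string, BriefField>;
}

// Render the brief to markdown. Open fields are surfaced under
// "Open Questions" rather than being turned into decisions.
function renderBrief(brief: Brief): string {
  const stated: string[] = [];
  const open: string[] = [];
  for (const [name, field] of Object.entries(brief.fields)) {
    if (field.kind === "stated") {
      stated.push(`- ${name}: ${field.value} (source: ${field.source})`);
    } else {
      open.push(`- ${name}: ${field.question}`);
    }
  }
  const sections = [`# ${brief.title}`, "## Decisions", ...stated];
  if (open.length > 0) {
    sections.push("## Open Questions", ...open);
  }
  return sections.join("\n");
}
```

Modeling "open" as its own variant means the renderer cannot emit a decision without a value and a source, so ambiguity in the notes has to appear explicitly in the output.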

Validation

Latest local validation:

  • bun run ci:check
  • bun run eval:spec -- create-from-lightfast-founder-notes-packet
  • bun run eval:spec:smoke passed 4/4
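A summary such as `passed 4/4` could plausibly be derived from a local result artifact along these lines. This is a sketch under assumptions: the `CaseResult` and `RunArtifact` shapes and the `summarize` and `exitCode` helpers are hypothetical, not the runner's actual format.

```typescript
// Hypothetical artifact shape for one eval run.
interface CaseResult {
  caseId: string;
  passed: boolean;
}

interface RunArtifact {
  suite: string;
  results: CaseResult[];
}

// Build a one-line summary like "<suite> passed 3/4".
function summarize(artifact: RunArtifact): string {
  const passed = artifact.results.filter((r) => r.passed).length;
  return `${artifact.suite} passed ${passed}/${artifact.results.length}`;
}

// Map the run to a process exit code so CI fails when any case fails.
function exitCode(artifact: RunArtifact): number {
  return artifact.results.every((r) => r.passed) ? 0 : 1;
}
```

Computing the summary from the artifact, rather than from console output, keeps the JSON record as the single source of truth for both local runs and any external reporting.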

Latest GitHub Actions validation:

  • Evals / Typecheck and smoke evals passed on run 24827490425 in 6m21s

Notes:

  • The PR is intentionally not squashed; commit history reflects the iterative prompt/eval tuning process.
  • The remaining Node 20 GitHub Actions annotation is informational: repository CI forces Node 24 for the affected actions.

@jeevanpillay jeevanpillay marked this pull request as ready for review April 23, 2026 10:14
@jeevanpillay jeevanpillay merged commit 0a10e79 into main Apr 23, 2026
1 check passed
@jeevanpillay jeevanpillay deleted the codex/foundation-spec-evals branch April 23, 2026 10:21