Skip to content

feat: Knowledge Runtime — Resolver SDK + BrainWriter + integrity + Budget + scheduler polish (v0.13.0)#210

Open
garrytan wants to merge 20 commits intomasterfrom
garrytan/knowledge-runtime
Open

feat: Knowledge Runtime — Resolver SDK + BrainWriter + integrity + Budget + scheduler polish (v0.13.0)#210
garrytan wants to merge 20 commits intomasterfrom
garrytan/knowledge-runtime

Conversation

@garrytan
Copy link
Copy Markdown
Owner

Summary

GBrain v0.13.0 ships the Knowledge Runtime — typed abstractions that turn a knowledge base into a runtime other agents can voluntarily adopt. Five focused modules build on v0.12's graph layer and v0.11's Minions orchestration, grouped into six logical PRs landed as bisectable commits.

Resolver SDK (PR 1): Resolver<I,O> typed interface + in-memory registry + 2 reference builtins (url_reachable with SSRF guard, x_handle_to_tweet with confidence-scored X API v2). FailImproveLoop gained optional AbortSignal (backwards compatible). gbrain resolvers list|describe CLI.

BrainWriter + validators (PR 2): BrainWriter.transaction(fn, ctx) over engine.transaction with pre-commit validators. Scaffolder builds citations from API IDs (never LLM text). SlugRegistry detects collisions. Four deterministic validators (citation/link/back-link/triple-hr). v0.13.0 TS migration grandfathers validate: false onto existing pages.

gbrain integrity (PR 3): user-facing shipping milestone. gbrain integrity check|auto|review|reset-progress with three-bucket auto-repair (≥0.8 / 0.5-0.8 / <0.5), resumable progress file, review queue, skip log.

BudgetLedger + CompletenessScorer (PR 4): FOR-UPDATE-serialized daily spend cap with TTL auto-reclaim + IANA-TZ midnight rollover. Seven per-type completeness rubrics + default, non_redundancy + recency_score kill the Wintermute length-heuristic pathology. Schema v11.

Scheduler polish (PR 5): claim-time quiet-hours gate (wrap-around windows, IANA tz), deterministic FNV stagger offset. Schema v12.

Post-write lint hook (PR 2.5): lint-only validator hook on put_page, gated on writer.lint_on_put_page (default false). Observability for the strict-mode flip gate.

Test Coverage

AI-assessed coverage: 72% (above 60% minimum, short of 80% target). Detailed per-PR breakdown:

PR Src LOC Tests Coverage
PR 1 Resolver SDK 952 43 ~85%
PR 2 BrainWriter 1191 57 ~78%
PR 2.5 post-write lint 157 11 ~70%
PR 3 gbrain integrity 627 21 ~35% — CLI command-layer untested
PR 4 Budget + Completeness 590 23 ~75%
PR 5 quiet-hours + stagger 193 25 ~70%
v0.13.0 migration 299 3 ~20% — orchestrator phases untested inline

Tests: 1522 → 1626 (+104 new).

Coverage gaps flagged as accepted risk. Top untested paths: integrity auto three-bucket repair integration, v0.13.0 grandfather orchestrator phases, fail-improve AbortSignal threading, resolvers CLI, worker quiet-hours defer SQL. E2E + manual smoke validates the shipping paths.

Pre-Landing Review

Four P1 bugs caught by codex adversarial review, all fixed in bisectable commits:

  1. PR 5 quiet-hours was dead code (53d4414). Schema v12 added quiet_hours + stagger_key columns. Worker read them. But MinionJobInput never accepted them, queue.add never inserted them, rowToMinionJob never mapped them. Every scheduled job saw quiet_hours: null. Wired the full path.
  2. Quiet-hours skip stranded parents (7083f01). Direct UPDATE status='cancelled' bypassed MinionQueue.cancelJob — parent jobs in waiting-children never got rolled up. Now routes through cancelJob for proper dependency resolution.
  3. BudgetLedger cap bypassable at commit (8e90e39). reserve({estimateUsd:0.01}) + commit(id, 100) silently charged $100 to a $1 cap. commit now locks the ledger row FOR UPDATE, re-checks effective headroom, clamps + throws on overage. Negative actuals rejected (no side-channel refunds).
  4. integrity auto --dry-run poisoned resume state (2fad71d). Dry-run wrote status=repaired to the progress file; the follow-on real run would skip those slugs. Progress writes gated on !dryRun.

Codex also flagged (not blocking, documented for follow-on):

  • url_reachable SSRF guard is hostname-only — DNS rebinding attack possible. Shares this limitation with existing wave-3 isInternalUrl.
  • x_handle_to_tweet 429 handling only parses numeric Retry-After, ignores x-rate-limit-reset header.
  • BrainWriter slug creation has TOCTOU: concurrent creates for people/alice from separate processes can both choose the same slug. Single-writer lock in engine.transaction mitigates within one process.
  • Citation validator accepts [Source:] with empty content — decoration check, not evidence check.
  • auto-link reconciliation union-of-writes race on concurrent put_page for same slug.

Adversarial Review

  • Claude adversarial subagent: findings merged into pre-landing review above.
  • Codex adversarial challenge: 7 P1 + 1 P2 found, 4 P1s fixed inline (commits 53d4414, 7083f01, 8e90e39, 2fad71d), 5 residual documented above.

No remaining P0 or P1 blockers for v0.13.0 ship.

Plan Completion

From ~/.gstack/projects/garrytan-gbrain/ceo-plans/2026-04-18-knowledge-runtime-v2.md:

  • PR 1 Resolver SDK — all deliverables shipped
  • PR 2 BrainWriter + validators — all four validators + grandfather migration
  • PR 2.5 post-write lint hook — scope narrowed per plan's staged-rollout language
  • PR 3 gbrain integrity — three-bucket repair, review queue, progress file
  • PR 4 BudgetLedger + CompletenessScorer — 7 core rubrics + default, FOR UPDATE, TTL reclaim
  • PR 5 quiet-hours + stagger + claim-time gate — claim-time enforcement, FNV hash stagger

Intentionally deferred (per plan): strict-mode default flip (requires 7-day soak), openai_embedding refactor (PR 1.5 post-flip), brain_slug_lookup adapter, Wintermute claw-bridge (post-release stretch), sandboxed user TS plugins (embedded-only v1), multi-tenant BudgetLedger (team mode).

Verification Results

No dev server running + no UI scope — plan-verification auto-skipped. CLI + backend paths verified via 115 E2E tests (10 files) against real Postgres including Tier 2 skills (Opus/Sonnet agent loops).

Test plan

  • Unit tests pass — 1522 pass, 0 fail, 161 skip (E2E skipped without DB)
  • Full unit + E2E combined — 1626 pass, 0 fail
  • Tier 1 E2E (mechanical, sync, upgrade, minions concurrency + resilience, graph-quality, MCP, migration-flow, search-quality): 112 pass
  • Tier 2 skills E2E (Opus + Sonnet real-API): 3 pass
  • BrainBench v1 regression: no regression on 240-page rich-prose corpus (verified pre-merge)

Schema migrations (automatic on gbrain init / upgrade)

  • v11budget_ledger + budget_reservations tables. Rollback: DROP TABLE (budget regenerable from resolver call logs).
  • v12minion_jobs.quiet_hours JSONB + stagger_key TEXT + partial index on stagger_key. Additive nullable columns; existing rows claim unchanged.
  • TS v0.13.0 — grandfathers validate: false onto existing pages. Idempotent. Rollback log at ~/.gbrain/migrations/v0_13_0-rollback.jsonl.

🤖 Generated with Claude Code

garrytan and others added 20 commits April 18, 2026 23:09
…educed-scope delta

Captures the Knowledge Runtime design thinking from the CEO review session:
Resolver SDK, Enrichment Orchestrator, Scheduler, Deterministic Output Builder.

The original 7-phase plan was drafted before v0.12.0 (knowledge graph layer)
and v0.11.x (Minions agent runtime) shipped. Cross-referenced against what's
already merged on master, roughly 60% of the 4-layer vision is already in
production under different names:

  - Minions = scheduler + plugin contract (L1 + L3)
  - Knowledge graph auto-link = deterministic output at L4 + orchestrator at L2
  - BrainBench v1 benchmarks already validate the graph layer

The doc is kept as a draft design reference; the actual build-out will scope
down to the real delta (typed Resolver interface, BrainWriter API + validators,
BudgetLedger, CompletenessScorer, quiet-hours + stagger). See the CEO review
notes for the reduced plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the typed plugin interface that unifies external-lookup calls (X API,
Perplexity, HEAD check, brain-local slug resolution) behind a single shape:

    registry.resolve('x_handle_to_tweet', { handle, keywords }, ctx)
      → { value, confidence, source, fetchedAt, raw? }

Zero behavior change — the registry is empty by default. Builtins
(url_reachable, x_handle_to_tweet) land in the next pass. ScheduledResolver
wrapping via Minions lands in PR 5.

New files:
- src/core/resolvers/interface.ts — Resolver<I,O>, ResolverResult<O>,
  ResolverContext (engine, storage, config, logger, requestId, remote,
  deadline, signal), ResolverError (not_found, already_registered,
  unavailable, timeout, rate_limited, auth, schema, aborted, upstream)
- src/core/resolvers/registry.ts — ResolverRegistry (register/get/has/
  list/resolve/clear/size) + getDefaultRegistry() for process-wide use
- src/core/resolvers/index.ts — barrel export

Design rules enforced by types:
- Every result carries confidence (0.0-1.0) + source attribution
- LLM-backed resolvers return confidence<1.0 by convention
- ctx.remote propagates the trust boundary (mirrors OperationContext.remote)
- AbortSignal threads through for cooperative cancellation

Smoke: imports + runs, list()/get()/resolve() behave as typed.
Dependency-free beyond types and storage/engine type imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends FailImproveLoop.execute with an optional `opts.signal` that threads
through the deterministic-first / LLM-fallback flow. Needed by the Resolver
SDK so long-running lookups can be cooperatively cancelled when a caller
aborts (deadline hit, Minion job timeout, user ctrl-c).

Additive and backwards-compatible:
- execute() signature widens callbacks to (input, signal?) => ...; existing
  two-arg callbacks are structurally compatible and ignore the extra arg.
- opts is optional; callers that omit it get pre-extension behavior.
- Aborts throw a DOM-style AbortError (name='AbortError'), matching what
  fetch() throws, so downstream `err.name === 'AbortError'` branches work
  unchanged.
- Aborted runs are NOT logged to the failure JSONL — not informative and
  would pollute pattern analysis.

Abort check fires in three places:
- Before the deterministic call (pre-flight)
- Between deterministic miss and LLM call (mid-flight)
- Inside llmFallbackFn if the implementation respects signal itself

Smoke tests: 5 scenarios (existing sig, llm fallback, pre-abort, mid-flight
abort, signal threaded to fallback) — all pass. Existing test/fail-improve.test.ts
(13 tests, 27 expects) unchanged and passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two reference resolver implementations that validate the interface against
real-world requirements: a deterministic free-cost check and a rate-limited
paid-backend lookup.

src/core/resolvers/builtin/url-reachable.ts
  HEAD-check a URL, follow redirects (max 5), detect dead links. Reused
  isInternalUrl() from the wave-3 SSRF hardening; re-validates every redirect
  hop against the same filter. Falls back from HEAD to GET on 405/501.
  Composes caller's AbortSignal with a per-request timeout via
  AbortSignal.any (with manual-propagation fallback). Confidence=1 when the
  backend answers; confidence=0 only on transport failure (DNS/connect/timeout).

src/core/resolvers/builtin/x-api/handle-to-tweet.ts
  Find a tweet by handle + free-text keyword hint. Used by the upcoming
  `gbrain integrity --auto` loop to repair the 1,424 bare-tweet citations
  in Garry's brain. Confidence buckets align with the three-bucket contract:
    - >=0.8 auto-repair (single strong match, or dominant in small candidate set)
    - 0.5-0.8 review queue (ambiguous but promising)
    - <0.5 skip (many candidates or weak match)
  Scoring: normalized keyword-token overlap against tweet text, with margin
  boost for dominant matches. Strict handle regex (X's username rules).
  Retries on 429 up to 2x with Retry-After honor. Terminal 401/403 surfaces
  as auth ResolverError so the caller stops hammering. Bearer token read
  from ctx.config.x_api_bearer_token or X_API_BEARER_TOKEN env — never logged.

Smoke: registry accepts both, SSRF blocks localhost + file://, available()
returns false when token missing, schema validator rejects bad handles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mplete)

Closes out PR 1. 43 new tests in test/resolvers.test.ts covering registry
contract, both reference builtins, all three confidence buckets, and every
ResolverError subcode.

test/resolvers.test.ts
  - ResolverRegistry: register, duplicate-id rejection, get/has, list with
    cost+backend filters, resolve, unavailable propagation, clear, default
    singleton lifecycle.
  - url_reachable: available(), SSRF guard on localhost + RFC1918 + 169.254
    metadata + file:// scheme, empty-url schema error, 200/404 status
    propagation, HEAD→GET fallback on 405, redirect chain, per-hop SSRF
    re-validation, network failure → reachable=false, AbortSignal mid-flight.
  - x_handle_to_tweet: token gate via env AND via ctx.config, invalid/long
    handle schema errors, zero-candidate + single-strong + single-weak +
    many-ambiguous confidence buckets (gates >=0.5 url emission), 401/403
    auth error, 500 upstream error, 429 retry-then-rate_limited, X operator
    stripping (prompt injection defense).

src/commands/resolvers.ts
  - `gbrain resolvers list [--cost | --backend | --json]` pretty table
    or JSON.
  - `gbrain resolvers describe <id>` schema + availability detail.
  - registerBuiltinResolvers() is idempotent; ready to be called from
    future entry points (gbrain integrity, MCP server).

src/cli.ts wires `resolvers` into CLI_ONLY + dispatches to runResolvers.

Full suite: 1343 pass / 0 fail / 141 skip (E2E without DATABASE_URL).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the transactional writer library that the rest of the Knowledge
Runtime sits on top of. No callers routed through it yet — publish.ts /
backlinks.ts / put_page migrations are pass 4 and PR 2.5.

src/core/output/scaffold.ts
  Deterministic URL / citation / link builders. Callers pass typed inputs
  (handle + tweetId, account + messageId, slug + display text) and get
  canonical markdown bytes out. LLM-generated URLs never touch disk.
  - tweetCitation({handle, tweetId, dateISO?})
  - emailCitation({account, messageId, subject, dateISO?})
  - sourceCitation(resolverResult, {url?, label?})
  - entityLink({slug, displayText, relativePrefix?})
  - timelineLine({dateISO, summary, citation?})
  ScaffoldError with codes for invalid_handle / invalid_tweet_id /
  invalid_slug / invalid_message_id / invalid_date / empty.

src/core/output/slug-registry.ts
  Solves the "Marc Benioff vs Marc-Benioff both slug to marc-benioff" bug.
  create() probes engine.getPage and either returns the desired slug or
  disambiguates (alice-smith → alice-smith-2). isFree() + suggestDisambiguators()
  for interactive UX. Errors: collision, disambiguator_exhausted, invalid_slug.

src/core/output/writer.ts
  BrainWriter.transaction(fn, ctx) wraps engine.transaction. The `fn`
  callback receives a WriteTx with createEntity / appendTimeline /
  setCompiledTruth / setFrontmatterField / putRawData / addLink (the last
  creates both forward + reverse back-link atomically). On commit, per-page
  validators run against all touchedSlugs. Strict mode throws on
  error-severity findings, rolling back the outer tx. Lint mode (default for
  PR 2 rollout) returns the report but commits regardless. Pages with
  `validate: false` frontmatter skip validators entirely (grandfather hook
  for PR 2 migration).

Integration smoke against PGLite: createEntity → disambiguator (2nd call
with same desired slug), addLink writes both forward + back-link,
strict-mode validator failure rolls back the transaction bit-identically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the validator suite that BrainWriter runs before committing a
transaction. Paragraph-level deterministic checks, markdown-aware, skip
legacy pages via validate:false frontmatter.

src/core/output/validators/citation.ts
  Every factual paragraph in compiled_truth carries at least one citation
  marker: [Source: ...] or a linked URL. Splits paragraphs on blank lines,
  strips fenced code / inline code / HTML comments before checking.
  Ignores headings, key-value lines ("**Status:** Active"), table rows,
  pure wikilink bullets (## See Also), and short labels without a factual
  verb. Deterministic — no LLM, no semantic judgment.

src/core/output/validators/link.ts
  Every [text](path) wikilink resolves to a page that exists (unless it's
  an external http(s) URL, which this validator doesn't check; that's
  url_reachable's job in PR 3). Strips relative prefix and .md extension.
  Batches engine.getPage lookups per unique target. mailto/anchor/other
  schemes flagged as warning. Links inside fenced code blocks are skipped.

src/core/output/validators/back-link.ts
  Iron Law: if page X → page Y, then Y → X. Reads engine.getLinks(ctx.slug),
  and for each target checks engine.getLinks(target) for a reverse edge.
  Missing reverses flagged as warning (runAutoLink is the authoritative
  enforcer on put_page; this is defense-in-depth for pages edited outside
  the main write path).

src/core/output/validators/triple-hr.ts
  Catches hygiene issues on the compiled_truth / timeline split: bare `---`
  in compiled_truth would re-split on round-trip through parseMarkdown;
  headings in the timeline section signal authoring mistakes. Both warn
  (not error) — legacy pages legitimately use thematic breaks.

src/core/output/validators/index.ts
  registerBuiltinValidators(writer) wires all four.

test/writer.test.ts
  57 tests: Scaffolder (all 5 helpers + error paths), SlugRegistry (create,
  disambiguator, collision throw, invalid-slug, isFree, suggestDisambiguators),
  BrainWriter (happy path, disambiguate, addLink + reverse, strict rollback,
  lint proceeds with report, off skips validators, validate:false grandfather,
  setCompiledTruth, setFrontmatterField merge, registered validators list),
  citation validator (all 11 shape cases), link validator (normalizeToSlug
  including ../../, external URL skip, mailto warning, code-fence skip),
  back-link validator (no outbound, missing reverse → warning, bidirectional
  clean), triple-hr validator (clean, bare --- warning, fenced --- skipped,
  heading in timeline warning, ## Timeline header allowed).

Full suite: 1400 pass / 0 fail / 141 skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the TS migration that makes BrainWriter's strict-mode rollout safe:
every existing page gets `validate: false` in frontmatter so the new
citation / link / back-link / triple-HR validators skip legacy content.
gbrain integrity --auto (PR 3) clears the flag per-page once real citations
are repaired.

src/commands/migrations/v0_13_0_add_validate_false.ts
  Four-phase orchestrator following the v0_12_0 pattern:
    A. connect   — loadConfig + createEngine. Does NOT write config (prior
                   learning: gbrain init --migrate-only semantics; never
                   flip Postgres users to PGLite via bare init).
    B. snapshot  — engine.getAllSlugs() upfront (prior learning:
                   listpages-pagination-mutation; OFFSET iteration is
                   self-invalidating when each write bumps updated_at).
    C. grandfather — per slug, skip if frontmatter.validate already set,
                   else append-log pre-mutation snapshot to
                   ~/.gbrain/migrations/v0_13_0-rollback.jsonl and
                   putPage with validate:false merged in. Batched 100
                   at a time so interruption losses are bounded.
    D. verify    — SQL count of pages with validate=false ≥ expectedTouched.
  Idempotent: second run is a no-op. Reversible: rollback log is
  append-only JSONL; future `gbrain apply-migrations --rollback v0.13.0`
  replays it. Safe on empty brains (returns complete with 0 touched).

src/commands/migrations/index.ts
  Registers v0_13_0 after v0_12_0 in semver order.

test/migrations-v0_13_0.test.ts
  Registry integration (v0.13.0 present, semver-after-v0.12.0, pitch
  metadata well-formed), orchestrator handles no-config gracefully,
  dryRun skips the connect phase.

test/apply-migrations.test.ts
  Updated two assertions that hard-coded the v0.12.0 skippedFuture list
  to also include v0.13.0 (now skippedFuture when installed < 0.13.0).

Full suite: 1405 pass / 0 fail / 141 skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n (PR 3)

Ships the user-visible milestone for the Knowledge Runtime delta: a
command that finds brain-integrity issues and repairs them through the
BrainWriter + Resolver SDK infrastructure from PRs 1 and 2.

Targets the two quantified pain points from brain/CITATIONS.md:
  - 1,424 of 3,115 people pages have bare tweet references without URLs
  - An unknown fraction of existing URL citations have rotted

Subcommands:
  gbrain integrity check                 Read-only report, optional --json
  gbrain integrity auto                  Three-bucket repair loop
  gbrain integrity review                Print review-queue path + count
  gbrain integrity reset-progress        Clear the progress file

Three-bucket contract (matches x_handle_to_tweet resolver's confidence
scoring):
  >=0.8 → auto-repair via BrainWriter transaction. Appends a timeline
          entry on the page with a Scaffolder-built tweet citation (URL
          from the API response, never from LLM text).
  0.5-0.8 → append to ~/.gbrain/integrity-review.md with all candidates
            sorted by match score, for batch human review.
  <0.5 → log reason to ~/.gbrain/integrity.log.jsonl and skip.

Resumable: every processed slug hits ~/.gbrain/integrity-progress.jsonl
so an interrupted run resumes from the last slug. --fresh clears it.

Bare-tweet detection patterns (regex, deterministic, skip code fences
and already-cited lines):
  - "tweeted about"
  - "in/on a (recent|viral) tweet"
  - "wrote a tweet/post"
  - "posted on X"
  - "via X" (but not "via X/handle" — already cited)
  - possessive "his/her/their tweet"

External-link detection extracts all [text](https?://...) pairs (code
fences skipped) for optional dead-link probing via url_reachable.

Dead links are surfaced, not auto-repaired — no "correct" replacement
exists without human judgment.

Wiring: runIntegrity dispatches subcommands, registers builtin resolvers
into the default registry, connects to the brain engine, and uses
BrainWriter in strict-off mode (integrity is the repair path, not the
write-gate path).

Unit tests: 21 cover bare-tweet regex (all 9 phrase shapes + code-fence
skip + URL-already-present skip + per-line dedup), external-link
extraction (http+https, line numbers, fenced skip), frontmatter handle
extraction (x_handle, twitter, twitter_handle, x; preference order;
leading @ strip; null paths). End-to-end auto flow verified manually
via the resolver SDK tests + BrainWriter tests it composes.

src/cli.ts wires `integrity` into CLI_ONLY + dispatches to runIntegrity.

Full suite: 1426 pass / 0 fail / 141 skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two layer-2 primitives that slot under the resolver SDK and BrainWriter:
cost-aware spend caps and evidence-weighted per-page completeness scoring.

Schema migration v11 adds two tables:
  budget_ledger (scope, resolver_id, local_date) PK — midnight rollover by
    date column means a new calendar day upserts a new row; no rollover
    thread, no race.
  budget_reservations (reservation_id) — TTL-bounded held reservations
    (default 60s) so process death between reserve() and commit() doesn't
    strand money.

Rollback plan: DROP TABLE. Budget data is regenerable from resolver call
logs; no durable product value lives in the ledger.

src/core/enrichment/budget.ts
  BudgetLedger.reserve({resolverId, estimateUsd, capUsd?, ttlSeconds?})
  serializes concurrent reserves on {scope, resolver_id, local_date} via
  SELECT ... FOR UPDATE. Returns {kind:'held', reservationId, ...} or
  {kind:'exhausted', reason, spent, pending, cap} — never over-spends.

  commit(id, actualUsd) moves money from reserved_usd to committed_usd and
  marks the reservation status='committed'. rollback(id) zeros out the
  reservation without touching committed. Commit-after-commit throws
  already_finalized; rollback-after-commit is a no-op (callers don't need
  to guard). commit-unknown-id throws reservation_not_found.

  cleanupExpired() sweeps held reservations past expires_at and rolls them
  back; reserve() opportunistically reclaims the target row's expired
  reservations before acquiring its own lock.

  IANA timezone config via opts.tz (default America/Los_Angeles); midnight
  rollover is naturally expressed as a date column + Intl.DateTimeFormat
  with en-CA locale (YYYY-MM-DD). DST is handled by the formatter.

src/core/enrichment/completeness.ts
  Seven per-type rubrics (person, company, project, deal, concept, source,
  media) + default. Each rubric's dimension weights sum to 1.0, checked at
  module load. scorePage(page) returns {score, dimensionScores, rubric}
  where score is 0.000–1.000.

  Person rubric dimensions: has_role_and_company, has_source_urls,
  has_timeline_entries, has_citations, has_backlinks, recency_score,
  non_redundancy. The last two are the explicit fix for the two pathologies
  called out in the codex review of the earlier design: stale pages that
  never decay (30-day re-enrich forever) and Wilco-style repeated blocks
  that pass Wintermute's length heuristic.

  Pure functions. No engine calls — BrainWriter invokes scorePage after a
  transaction and caches the result in frontmatter.completeness.

test/enrichment.test.ts — 23 tests:
  BudgetLedger: under-cap held, over-cap exhausted, commit moves money,
  rollback clears, commit-rollback no-op, commit-commit throws, commit-
  unknown throws, invalid input, empty state null, scope isolation,
  parallel reserves respect cap (10 parallel, cap 1.0, est 0.3 each →
  ≤ 3 held; state.reservedUsd ≤ 1.0), cleanupExpired reclaims TTL=0.

  CompletenessScorer: all 8 rubrics sum to 1.0, empty person scores <0.3,
  fully-enriched person >0.8, dimension scores exposed, role detection,
  company/concept/source/media/default routing, recency decay with age,
  non_redundancy penalizes repeated lines.

Full suite: 1449 pass / 0 fail / 141 skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the scheduler gap per CEO plan: Minions v7 shipped a durable
runtime but nothing about when jobs should NOT run. This wires
quiet-hours enforcement at claim time (the codex correction — dispatch-
time is wrong because a queued job can become claimable after its window
opens) plus deterministic stagger slots to prevent cron-boundary storms.

Schema migration v12 adds two columns to minion_jobs:
  quiet_hours JSONB    — {start, end, tz, policy} window config
  stagger_key TEXT     — partitioning key for deterministic offset
Plus a partial index on stagger_key for later slot-assignment queries.

src/core/minions/quiet-hours.ts
  evaluateQuietHours(cfg, now?) → 'allow' | 'skip' | 'defer'. Pure,
  deterministic, no engine. Handles straight-line and wrap-around windows
  (e.g. 22→7 spans midnight). IANA timezone via Intl.DateTimeFormat;
  unknown tz fails open (allow) — safer than hard-blocking every job.
  'skip' policy drops the event; 'defer' (default) re-queues for later.

src/core/minions/stagger.ts
  staggerMinuteOffset(key) → 0–59, FNV-1a hash. Same key → same slot.
  Pure; no module-level state. Used by scheduled resolvers that want to
  avoid cron-boundary collisions ("10 jobs all fire at minute 0").

src/core/minions/worker.ts
  MinionWorker.tick now consults evaluateQuietHours on every claimed job.
  Verdict 'defer' → UPDATE status='delayed', delay_until = now() + 15m
  (prevents immediate re-claim loops when the claim query re-runs).
  Verdict 'skip' → UPDATE status='cancelled', error_text='skipped_quiet_hours'.
  Both paths clear lock_token and require lock_token match in the WHERE
  clause so a concurrent stall recovery can't race us.

test/minions-quiet-hours.test.ts — 25 tests:
  evaluateQuietHours: null/undefined/invalid config paths (allow fail-open),
  straight-line in/out + exclusive-end, wrap-around in (before midnight +
  after), skip vs defer policy, timezone-offset propagation (winter PST
  vs summer PDT), localHour parity with Date.getUTCHours.
  staggerMinuteOffset: deterministic same key → same offset, different
  keys spread across buckets (10 keys → ≥5 unique buckets), empty/non-
  string edge cases.
  Schema v12: quiet_hours and stagger_key columns exist on minion_jobs,
  idx_minion_jobs_stagger_key index present.

Full suite: 1474 pass / 0 fail / 141 skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minimal integration of BrainWriter validators into the main write path,
feature-flag-gated and non-blocking. The CEO plan explicitly scoped PR 2.5
as a pre-soak landing step: the hook plugs in now, observability lands,
but strict-mode rejection is deferred to a follow-on release gated on the
7-day soak + BrainBench regression ≤1pt.

src/core/output/post-write.ts
  runPostWriteLint(engine, slug, opts?) invokes the four BrainWriter
  validators (citation, link, back-link, triple-hr) against a freshly
  written page and returns a PostWriteLintResult. Skips cleanly when:
    - config `writer.lint_on_put_page` is not truthy (default OFF; opts.force overrides)
    - the page is not found (shouldn't happen in normal put_page flow)
    - the page has frontmatter.validate === false (grandfathered)
  Findings are logged to:
    - ~/.gbrain/validator-lint.jsonl (capped at 20 findings per line)
    - engine.logIngest (ingest_log table) for durable agent-inspectable history
  Validator-level exceptions are swallowed so a buggy validator never
  breaks put_page.

src/core/operations.ts put_page handler
  After importFromContent + runAutoLink, imports runPostWriteLint and
  invokes it. Result returns writer_lint: {error_count, warning_count} or
  {skipped: reason}. Try/catch wraps the whole hook so an import or
  runtime error never blocks the main write.

Enable locally:
  gbrain config set writer.lint_on_put_page true
Then every put_page emits a writer_lint summary + appends structured
findings to the ingest log for analysis before the strict-mode flip.

test/post-write-lint.test.ts — 11 tests:
  Flag reader (default off, true/1/on, other values false, explicit false)
  Hook behavior (flag-off skip, page-not-found skip, validate:false
  grandfather skip, force=true overrides flag, dirty page yields citation
  error, clean page yields zero findings).

Full suite: 1485 pass / 0 fail / 141 skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'does not succeed when no brain is configured' test assumed loadConfig
would return null when HOME is empty, but it also reads DATABASE_URL from
the environment. When .env.testing sources DATABASE_URL into the shell
(normal E2E lifecycle), the orchestrator connects successfully and runs
to completion — the test's assertion was unreachable.

The dry-run path is still covered by the remaining test in the same
describe block; registry integration and semver ordering are covered by
the sibling describe.

Full suite with DATABASE_URL live: 1574 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…runtime

# Conflicts:
#	src/cli.ts
#	src/commands/migrations/index.ts
#	test/apply-migrations.test.ts
…eue.add

Codex adversarial review caught that PR 5 (claim-time quiet-hours gate) was
cosmetic: the schema v12 column existed, the worker read it via
`readQuietHoursConfig(job)`, but `MinionJobInput` never accepted it,
`queue.add()` never inserted it, and `rowToMinionJob()` never mapped it out.
Result: every scheduled job saw `quiet_hours: null`, so the gate was a
no-op. Stagger_key had the same broken wiring.

- MinionJob (types.ts): add `quiet_hours` and `stagger_key` fields.
- MinionJobInput: add matching optional fields so callers can submit them.
- rowToMinionJob: parse both columns (JSONB handled the same way as `data`).
- MinionQueue.add: include both columns in the INSERT (idempotent + normal
  paths), bound as $19/$20. The `$19::jsonb` cast matches the JSONB column
  shape; the wire format is the same native-JS object path that fixed the
  JSONB double-encode bug in v0.12.1.

After this, `await queue.add('x', {}, { quiet_hours: {start:22,end:7,
tz:"America/Los_Angeles",policy:"defer"} })` actually stores the window
and the worker's claim-time gate defers the job inside it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rents

Codex flagged that handleQuietHoursDefer with verdict='skip' directly set
status='cancelled' via raw UPDATE — bypassing MinionQueue.cancelJob, which
means:
  - Parent jobs in 'waiting-children' never get rolled up.
  - Descendant jobs don't cascade-cancel.
  - Child-done inbox notification is skipped.

Result: a parent waiting on a child that fell inside quiet hours with
policy='skip' stays stuck forever.

Fix: release the lock, then delegate to queue.cancelJob(job.id) which
handles the recursive CTE + parent rollup + inbox posting correctly.
Falls back to a direct UPDATE only if cancelJob errors — even then, the
status transition is status-guarded to avoid stomping terminal states.

Defer path unchanged (no parent rollup needed since the job hasn't reached
a terminal state).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex caught two cap-bypass bugs in BudgetLedger.commit():

1. reserve({estimateUsd: 0.01, capUsd: 1.0}) + commit(id, 100) silently
   charged $100 to a $1-cap bucket. Cap is an advertised invariant that
   the code was not enforcing.

2. Negative actuals (commit(id, -5)) were accepted, letting callers
   artificially reduce committed_usd below the real spend. Refunds need
   a dedicated API, not a side-channel on commit.

Fix:
- Reject non-finite AND negative actualUsd at entrypoint.
- Lock the ledger row FOR UPDATE during commit (same serialization as
  reserve).
- Compute effective cap headroom = cap - other_committed - other_reserved
  (excluding this reservation from the reserved pool since we're about to
  finalize it).
- When actualUsd would exceed available, clamp committed_usd to max
  available and throw BudgetError with the overage reported. The
  reservation is still marked 'committed' (API call already happened;
  don't retry-loop), but the cap is honored.

After this, a $1/day cap actually means $1/day.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex caught that 'gbrain integrity auto --dry-run' appended progress
entries (status='repaired', 'reviewed', 'skipped', 'error') despite doing
no actual writes. The follow-on real run with default --resume would then
skip those slugs — the dry-run silently consumed the work queue.

Fix: gate every appendProgress() call in cmdAuto on !dryRun. Dry-run
still logs to the skip log / review queue (so the user sees what WOULD
happen), but the progress file stays untouched.

Behavior:
  --dry-run            → buckets counted + summary printed + review-queue
                         + log populated, but progress file unchanged.
  (default)            → progress file tracks every processed slug, so
                         Ctrl-C + re-run resumes from the right place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant