feat(agents): Phase 5 Oracle Activation v1 — assembly line + provenance receipts#659

Merged
wileland merged 1 commit into develop from phase5/oracle-activation-v1-impl on Mar 3, 2026
Conversation

wileland (Owner) commented Mar 3, 2026

Summary

  • Replace all 4 agent fn stubs (scribe, enrichment, reflection, archivist) with real implementations wired to Whisper, GPT-4o-mini, Atlas $vectorSearch, and MongoDB
  • Add 4-stage assembly line pipeline: extractStage → validateStage → compareStage → generateStage
  • Upgrade ReflectionOutput to z.discriminatedUnion on receiptStatus (Path A: receipts_found, Path B: no_receipts_available)
  • Content guard enforces provenance: historical claims must match receipt quote field or throw INVALID_OUTPUT
  • SHA-256 integrity check drops tampered chunks with sanitized warnings (no user content in logs)
  • userId tenancy enforced on every vector query and every MongoDB write
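The four stages above chain linearly. A minimal sketch of the orchestration, assuming hypothetical stage signatures (the real modules live in server/src/agents/pipeline/ and their contracts may differ):

```javascript
// Minimal sketch of the 4-stage assembly line. Stage names come from this PR;
// the signatures below are illustrative, not the actual module contracts.
async function reflectionAssemblyLine(input, stages) {
  // 1. extractStage: embed the transcript (text-embedding-3-small, 1536 dims)
  const { queryVector } = await stages.extractStage(input);
  // 2. validateStage: Atlas $vectorSearch with userId tenancy, score >= 0.72
  const { chunks } = await stages.validateStage({ ...input, queryVector });
  // 3. compareStage: SHA-256 integrity check; tampered chunks are dropped
  const { receipts } = await stages.compareStage({ ...input, chunks });
  // 4. generateStage: GPT-4o-mini reflection grounded in surviving receipts
  return stages.generateStage({ ...input, receipts });
}

module.exports = { reflectionAssemblyLine };
```

Because each stage only consumes the previous stage's output plus the original input, the stages can be unit-tested in isolation with stubs, which is what the test plan below does.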

Hard Locks Respected

  • Embedding model: text-embedding-3-small (1536 dims)
  • Atlas index: default, cluster: ech0log-main, DB: AE, collection: memorychunks
  • Retrieval threshold: score >= 0.72, top-k: 3
  • OpenAI model: gpt-4o-mini, temperature: 0
  • Langfuse: .span() only, no .generation()
  • ReceiptSchema fields: quote, textHash, sourceEntryId (verbatim from schema)
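Those hard locks pin down the shape of the retrieval query. A sketch of the aggregation pipeline they imply — buildReceiptQuery is a hypothetical helper; the real stage may assemble this differently:

```javascript
// Hypothetical helper assembling the hard-locked Atlas $vectorSearch pipeline:
// index "default", path "embedding", top-k 3, score >= 0.72, and a mandatory
// userId tenancy filter on every vector query.
const RETRIEVAL_TOP_K = 3;
const RETRIEVAL_SCORE_THRESHOLD = 0.72;

function buildReceiptQuery({ queryVector, userId }) {
  return [
    {
      $vectorSearch: {
        index: 'default',
        path: 'embedding',
        queryVector,                       // 1536-dim text-embedding-3-small vector
        numCandidates: RETRIEVAL_TOP_K * 10,
        limit: RETRIEVAL_TOP_K,
        filter: { userId },                // tenancy enforced in the search itself
      },
    },
    { $addFields: { score: { $meta: 'vectorSearchScore' } } },
    { $match: { score: { $gte: RETRIEVAL_SCORE_THRESHOLD } } },
  ];
}

module.exports = { buildReceiptQuery };
```

It would run as something like MemoryChunk.aggregate(buildReceiptQuery({ queryVector, userId })) against the memorychunks collection.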

Three Core Principles

  1. Provenance is Sacred — every AI insight traceable to actual user words
  2. Mirror not Oracle — reflect intent, never predict or diagnose
  3. Determinism Über Alles — same input → same output, always

Test plan

  • 425 passed, 0 failed, 5 skipped (31 new Phase 5 tests)
  • All 394 pre-existing tests still pass (0 regressions)
  • extractStage: mock embeddings, assert queryVector length = 1536, userId passthrough
  • validateStage: score >= 0.72 included, < 0.72 excluded, exactly 0.72 boundary, userId in filter
  • compareStage: hash match → receipt, hash mismatch → drop + WARNING (no user content)
  • generateStage: temperature=0, correct system prompt per receipt status
  • reflectionFn: Path A happy path, content guard throws on unsupported claims, Path B orbXp=0
  • archivistFn: MongoDB write with userId tenancy, archiveId is string
  • git diff --name-only shows ONLY files in repo scope (no sealed files touched)
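The validateStage boundary cases above (>= 0.72 included, < 0.72 excluded, exactly 0.72 included) reduce to an inclusive comparison. A sketch under the assumption the gate is a plain >= filter — filterByScore is a hypothetical helper, since the real stage gates inside the aggregation:

```javascript
// Score gate the validateStage tests exercise: inclusive at the 0.72 boundary.
const RETRIEVAL_SCORE_THRESHOLD = 0.72;

function filterByScore(chunks, threshold = RETRIEVAL_SCORE_THRESHOLD) {
  return chunks.filter((chunk) => chunk.score >= threshold); // >=, not >
}

module.exports = { filterByScore };
```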

🤖 Generated with Claude Code

…ce receipts (#659)

Replace all 4 agent fn stubs with real implementations:
- scribeFn: Whisper transcription + SHA-256 provenance anchor
- enrichmentFn: GPT-4o-mini extraction (Mirror not Oracle)
- reflectionFn: Assembly line (extract → validate → compare → generate)
- archivistFn: MongoDB write with full provenance trail

Add 4 pipeline stages (server/src/agents/pipeline/):
- extractStage: embed transcript via text-embedding-3-small
- validateStage: Atlas $vectorSearch with userId tenancy + score >= 0.72
- compareStage: SHA-256 integrity check, drops tampered chunks
- generateStage: GPT-4o-mini reflection grounded in receipt chunks

Upgrade ReflectionOutput to z.discriminatedUnion on receiptStatus:
- Path A (receipts_found): grounded reflection + content guard + orbXp
- Path B (no_receipts_available): present-tense only, orbXp=0

Content guard enforces provenance: historical claims must match receipt
quote field or throw INVALID_OUTPUT. Hash integrity drops mismatched
chunks with sanitized warnings (no user content in logs).

425 passed, 0 failed, 5 skipped (31 new tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot commented Mar 3, 2026

Audit

Audit artifacts not found.

Schema

No drift detected ✅

Codex

No Codex outputs yet.

@wileland wileland merged commit e8daef3 into develop Mar 3, 2026
3 checks passed
chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c27de430d3


Comment on lines +17 to +20
const response = await openai.audio.transcriptions.create({
model: whisperModel,
file: audioUrl,
});


P1 Badge Upload an audio file object, not the raw URL string

This passes audioUrl directly to openai.audio.transcriptions.create as file, but inline orchestration supplies entry.audioUrl as a plain string. Unlike the worker path (server/src/workers/scribe.worker.js), which resolves the audio and sends a stream, this call never converts the URL/path into an uploadable file object, so Whisper requests can fail at runtime and the inline pipeline aborts before transcription.

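A sketch of the invariant behind this finding: Whisper's file parameter must be an uploadable object, never a raw URL string. The guard below is hypothetical; the commented fix mirrors the worker path using the toFile helper exported by the official openai Node SDK:

```javascript
// Guard sketching the invariant Codex calls out. In the real fix, the inline
// path would resolve the URL first, roughly like the worker path does:
//   const res = await fetch(audioUrl);                      // Node 18+ fetch
//   const file = await toFile(Buffer.from(await res.arrayBuffer()), 'audio.webm');
//   await openai.audio.transcriptions.create({ model: whisperModel, file });
// (toFile is the upload helper from the openai Node SDK.)
function isUploadableAudio(value) {
  if (value == null || typeof value === 'string') return false; // raw URL/path: reject
  return Buffer.isBuffer(value) ||
    typeof value.pipe === 'function' ||        // Node readable stream
    typeof value.arrayBuffer === 'function';   // File/Blob-like object
}

module.exports = { isUploadableAudio };
```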

});

const transcript = response.text;
const durationMs = response.duration != null ? Math.round(response.duration * 1000) : 0;


P1 Badge Avoid returning zero duration for schema-validated output

When response.duration is absent, durationMs is set to 0, but ScribeOutput requires durationMs to be strictly positive. AgentRunner will therefore raise INVALID_OUTPUT and fail the scribe stage for those responses, causing inline pipeline failures even when the transcription text is otherwise valid.

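A minimal sketch of one way to keep durationMs schema-valid when Whisper omits duration. This gap was addressed separately in PR #660; the fallback value here is an assumption for illustration, not the shipped fix:

```javascript
// Hypothetical fallback: round the API duration to milliseconds, but never
// return a non-positive value, since ScribeOutput requires durationMs > 0.
function resolveDurationMs(apiDurationSeconds, fallbackMs = 1) {
  const ms = apiDurationSeconds != null ? Math.round(apiDurationSeconds * 1000) : 0;
  return ms > 0 ? ms : fallbackMs;
}

module.exports = { resolveDurationMs };
```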

Comment on lines +34 to +38
if (receipts.length === 0) {
const { reflection } = await generateStage({ transcript, enrichment, receipts });
return {
reflection,
receiptStatus: 'no_receipts_available',


P1 Badge Enforce historical-claim guard in no-receipts path

The early return for receipts.length === 0 skips the historical-claim guard entirely. If generation nonetheless produces text like "you said" under receiptStatus: 'no_receipts_available', unsupported past-tense claims reach users with no provenance check, breaking the stated no-receipts safety contract and potentially surfacing hallucinated history.

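One way to close the gap Codex describes is to run a (necessarily weaker) historical-claim check on the no-receipts path too. The marker patterns and helper names below are hypothetical illustrations, not the repo's actual guard, which compares claims against receipt quote fields:

```javascript
// Hypothetical guard for the no-receipts path: reject past-tense claims about
// what the user previously said, since there are no receipts to back them.
const HISTORICAL_MARKERS = [/\byou said\b/i, /\byou mentioned\b/i, /\blast time\b/i];

function containsHistoricalClaim(text) {
  return HISTORICAL_MARKERS.some((re) => re.test(text));
}

function guardNoReceiptsReflection(reflection) {
  if (containsHistoricalClaim(reflection)) {
    const err = new Error('INVALID_OUTPUT: historical claim without receipts');
    err.code = 'INVALID_OUTPUT';
    throw err;
  }
  return { reflection, receiptStatus: 'no_receipts_available', orbXp: 0 };
}

module.exports = { guardNoReceiptsReflection };
```

Pattern-based checks like this are lossy, which is why the receipts-found path grounds claims against verbatim quote fields instead.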

@wileland wileland mentioned this pull request Mar 4, 2026
wileland added a commit that referenced this pull request Mar 4, 2026
* feat(enrichment): add Phase 2 enrichment relay stage (#648)

* chore(codex): phase2 enrichment relay task

* feat(enrichment): add Phase 2 relay stage

* fix(enrichment): allow retries; fail only on final attempt (#649)

* feat: oracle foundation (MemoryChunk receipts + context-pack-v0 + artifact-registry-v0) (#650)

* feat: oracle foundation (MemoryChunk receipts + context-pack-v0 + artifact-registry-v0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(specs): remove leading --- from context-pack-v0 and artifact-registry-v0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(docs): mark legacy FAISS vector index; clarify Atlas is canonical (#651)

* feat: wire enrichmentData into reflection prompt (phase 3 track A) (#652)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(docs): clarify enrichment terminal semantics in isUpstreamReady (#653)

* feat(oracle): wire Atlas $vectorSearch retrieval into reflection context pack (#654)

* feat(oracle): wire Atlas $vectorSearch retrieval into reflection context pack

Implements Phase 3 of Oracle retrieval (context-pack-v0.md):
- In reflectEntryWithContext (already-async layer), embed the entry
  transcript via generateEmbedding and run MemoryChunk.aggregate with
  $vectorSearch (index=VECTOR_INDEX_NAME, path=embedding, top-k=5,
  userId tenancy filter).
- Hallucination firewall: only chunks scoring >= 0.72 are mapped into
  retrievedReceipts ({content, messageIds, score}).
- Soft-fail: try/catch around the entire retrieval block; reflection
  continues unconditionally if Oracle retrieval throws.
- buildMCPContext signature unchanged (sealed caller in reflection.worker.js).
- 8 new Oracle retrieval tests covering score gating, soft-fail paths,
  userId filter assertion, and messageIds fallback.

All 285 tests pass (5 skipped).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(memory): persist userId tenancy on MemoryChunk ingest

* fix(librarian): pass userId to knowledge-store for tenant-scoped chunks

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(archive): Phase 4A sovereign archive + batch ingestion pipeline (#655)

* feat(archive): Phase 4A sovereign archive pipeline

Implements the full sovereign archive ingestion pipeline for Echo Doj0:

- Task 0: MemoryChunk.js schema — adds 5 Phase 4A fields (chunkId,
  densityScore, sourceRole, supabaseUserId, sourceFileSha256) plus
  two new indexes: { userId, sourceRole } and { chunkId } unique sparse.

- Task 7: TDD test suite (parseChatGptExport, chunkConversations,
  scoreDensity) — 342 passed, 5 skipped, 0 failed.

- Task 1: parseChatGptExport.js — pure fn, user-only (doctrine),
  SHA-256 textHash, sourceId = chatgpt:{convId}:{msgId}.

- Task 2: chunkConversations.js — scene merging ≤4000 chars,
  deterministic sort (createdAt + messageId tie-break), stable chunkId,
  MIN_CHUNK_CHARS filter, full provenance on every chunk.

- Task 3: scoreDensity.js + applyKeepRatio — density scoring with
  length/unique/sentence factors; keepRatio slice with tie-break sort.

- Task 4: embedAndStore.js — library module (no dotenv), concurrency-5
  worker pool, 3-attempt retry with exponential backoff, upsert on
  { chunkId } for true idempotency.

- Task 5: importChatGptExport.js — CLI entrypoint; normalizes array
  or {conversations:[]} export shapes; dryRun + quarantineOut modes;
  dynamic import of embedAndStore to avoid ESM hoisting / openai init
  race with dotenv.

- Task 6: validateRetrieval.js — CLI smoke test; Atlas $vectorSearch
  with userId filter; prints receipts with score ≥ 0.72.

ESM dotenv fix: config/openai.js evaluates OPENAI_API_KEY at module
load time. CLI entrypoints use dynamic import() for OpenAI-dependent
modules so dotenv.config() runs first in the module body.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(archive): hard-split oversized chunks

Add splitMessage() with layered split strategy (paragraph → sentence → hard-slice)
so every output chunk satisfies content.length <= MAX_CHUNK_CHARS. Single messages
larger than MAX_CHUNK_CHARS (e.g. the 54,595-char quarantine case) are expanded into
sub-parts before the scene-merging loop, closing the bypass that occurred when buffer
was empty.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(archive): --dir batch ingestion + manifest + resume

Add --dir batch mode to importChatGptExport.js:
- Glob conversations-*.json in target dir (lexical, no recursion)
- Process shards sequentially, never in parallel
- Write JSONL manifest entry per file (APPEND, never overwrite):
  {file, sourceFileSha256, startedAt, endedAt, parsedCount,
   chunkedCount, promotedCount, storedCount, skippedCount,
   errorCount, status, error}
- --resume=true: skip paths already status=success in manifest,
  logging "⏭️  Skipping (already success): <file>"
- File-level failures write failed entry and continue; never abort
- --quarantineOut APPENDs across all shards (unified JSONL)
- Prints summary: ✅ Completed | ⏭️ Skipped | ❌ Failed | 💾 Total
- Default manifestOut: ./.backup/import_manifest.jsonl
- --file and --dir are mutually exclusive → hard error if both
- Single-file mode (--file) unchanged; no manifest written

Dry-run verified: 16 shards, 1855 promoted chunks, 0 errors
Live import: 143 chunks stored, all manifest entries success
Atlas: max content 4000 chars, 0 oversized

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(deps): update package manifest and lockfile for archive pipeline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(agents): phase 4.5 agent runner + inline pipeline mode (#656)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix inline createScribeTask to return AgentTask contract (#657)

* Mark inline SCRIBE task completed on pipeline success (#658)

* feat(agents): Phase 5 Oracle Activation v1 — assembly line + provenance receipts (#659) (#659)

Replace all 4 agent fn stubs with real implementations:
- scribeFn: Whisper transcription + SHA-256 provenance anchor
- enrichmentFn: GPT-4o-mini extraction (Mirror not Oracle)
- reflectionFn: Assembly line (extract → validate → compare → generate)
- archivistFn: MongoDB write with full provenance trail

Add 4 pipeline stages (server/src/agents/pipeline/):
- extractStage: embed transcript via text-embedding-3-small
- validateStage: Atlas $vectorSearch with userId tenancy + score >= 0.72
- compareStage: SHA-256 integrity check, drops tampered chunks
- generateStage: GPT-4o-mini reflection grounded in receipt chunks

Upgrade ReflectionOutput to z.discriminatedUnion on receiptStatus:
- Path A (receipts_found): grounded reflection + content guard + orbXp
- Path B (no_receipts_available): present-tense only, orbXp=0

Content guard enforces provenance: historical claims must match receipt
quote field or throw INVALID_OUTPUT. Hash integrity drops mismatched
chunks with sanitized warnings (no user content in logs).

425 passed, 0 failed, 5 skipped (31 new tests).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Fix scribeFn durationMs to satisfy positive schema (#660)

* feat(chat): Phase 6 Context Pack and Companion Chat v1 (#661)

* feat(chat): Phase 6 Context Pack and Companion Chat v1

- 4 new Mongoose models: ChatSession, ChatMessage, ContextSnapshot, ContextPack
- Retrieval adapter maps chat domain to Phase 5 pipeline (text → transcript)
- Companion Chat: gpt-4o-mini, temperature 0, receipts-or-silence prompts
- Context Pack: cryptographically anchored save state (seedHash, not seedText)
- GraphQL mutations/queries in main tree (server/graphql/), not agent tree
- ContextSnapshot audit trail for every chat response and context pack
- 18 new test assertions across 3 test files (445 total passing, 0 regressions)
- Zero Entry modifications — chat is a separate domain
- Tenancy filter (userId) enforced on all vector queries via sealed pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(chat): merge typedefs + verify jwt + mutation context pack

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(chat): Phase 6.5 Companion Chat UI — receipts-first companion interface (#662)

Adds the client-side companion chat UI that calls the Phase 6 GraphQL
surface (companionChat, buildContextPack, getContextPack). Receipt
status badges distinguish grounded vs present-moment responses. The
ReceiptsDrawer renders verified user quotes with truncated textHash
and sourceEntryId — no AI-generated text inside the drawer.

New components: CompanionChat, ChatMessage, ChatInput, ReceiptsDrawer,
ContextPackPanel, ChatPage. Route at /companion with sidebar nav link.
7 frontend test cases covering optimistic render, mutation shape,
receipt badge paths, drawer content, and sessionId persistence.

No server files modified. Backend: 445 passed, 0 failed.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(arena): Phase 7 Persona Arena v1 — receipts-gated chips + invokePersona lens (#663)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* chore(prod): Phase 8 Portfolio Prod hardening — auth sweep, Render config, seed, README (#664)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(render): declare S3 env + fix static blueprint runtime

* chore(render): add canonical S3 bucket env aliases

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>