rpwalsh/scce

SCCE 2.0

The Sourced-Citation Cognitive Engine (SCCE) is a production-grade, offline-first intelligence system built for environments where trust matters more than stylistic fluency.

Executive Summary

SCCE is a local-first question-answering system designed for high-trust environments.

  • It answers from your ingested corpus, not from remote model calls.
  • It combines lexical, graph, and spectral retrieval before synthesis.
  • It exposes provenance as part of every answering workflow.
  • It is operationalized as a server plus worker with observable status and job control APIs.
  • It is built for teams that need auditable behavior under privacy, regulatory, or mission constraints.

Key characteristics:

  • Evidence is not optional. Response quality is tied to retrievable source material.
  • Runtime is local-first. Core answering paths do not depend on cloud LLM calls.
  • Provenance is product behavior, not a dashboard extra.
  • Operational behavior is inspectable: jobs, status, ingestion, and model state are exposed through APIs.

Why This Matters

SCCE is designed for teams that cannot outsource reasoning to opaque cloud models and cannot accept answers without traceable evidence. It ingests your corpus, builds local structure over that corpus, and answers questions through retrieval + reasoning + constrained synthesis with provenance as a first-class output.

If your use case includes regulated workflows, private data estates, air-gapped infrastructure, or high-cost decisions, SCCE is built for that reality.

What SCCE Does

SCCE combines five capabilities into one deployable system:

  1. Corpus ingestion across mixed sources (documents, spreadsheets, code, wiki-style corpora).
  2. Knowledge structuring via entities, relations, and spectral projections.
  3. Multi-channel retrieval (lexical, graph, spectral) with diversity-aware fusion.
  4. Planner-driven reasoning loop that tests and refines candidate claims.
  5. Local synthesis with quality gates, provenance checks, and uncertainty signaling.
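
As a rough illustration of capability 3, the sketch below merges ranked lists from several retrieval channels with reciprocal rank fusion (RRF). This is an assumption for illustration: the README does not say which fusion method SCCE uses, and plain RRF does not model the diversity-aware part. The function name, the constant k = 60, and the document ids are all hypothetical.

```typescript
// Hypothetical sketch: plain reciprocal rank fusion over ranked id lists.
// Not SCCE's actual fusion; it ignores the diversity-aware aspect.
export interface FusedHit {
  id: string;
  score: number;
}

export function reciprocalRankFusion(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Rank is 1-based; a higher rank in any channel adds a larger share.
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  const fused: FusedHit[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  // Sort by score descending, then by id so ties are deterministic.
  return fused.sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Example channel outputs (made-up ids), best match first.
const lexical = ["a", "b", "c"];
const graph = ["b", "a"];
const spectral = ["b", "d"];
const fusedOrder = reciprocalRankFusion([lexical, graph, spectral]).map((hit) => hit.id);
```

Documents that rank well in more than one channel rise to the top, without the channels needing comparable raw scores.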

End-to-End Pipeline

At a high level:

  1. Ingest files into documents/spans/chunks.
  2. Correlate entities and relations.
  3. Build and refresh spectral basis/projections.
  4. Train and load local n-gram models.
  5. Resolve queries through perception, retrieval, planning, verification, and synthesis.
  6. Return response text plus source-linked context.

The pipeline runs as a stable server runtime with background jobs, and each operational phase is visible through the API.
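
To make step 4 more concrete, here is a minimal sketch of what an n-gram model stores: bigram counts with unsmoothed maximum-likelihood probabilities. It is a deliberate simplification. The repository description mentions Kneser-Ney, which this sketch does not implement, and the class and method names are hypothetical.

```typescript
// Hypothetical sketch: unsmoothed bigram counts, not SCCE's actual model.
export class BigramModel {
  private counts = new Map<string, Map<string, number>>();

  // Count each adjacent token pair in the sequence.
  train(tokens: string[]): void {
    let prev: string | undefined;
    for (const token of tokens) {
      if (prev !== undefined) {
        const followers = this.counts.get(prev) ?? new Map<string, number>();
        followers.set(token, (followers.get(token) ?? 0) + 1);
        this.counts.set(prev, followers);
      }
      prev = token;
    }
  }

  // Maximum-likelihood estimate of P(word | prev); 0 for unseen contexts.
  probability(prev: string, word: string): number {
    const followers = this.counts.get(prev);
    if (!followers) return 0;
    let total = 0;
    followers.forEach((count) => {
      total += count;
    });
    return (followers.get(word) ?? 0) / total;
  }
}

const bigrams = new BigramModel();
bigrams.train("the cat sat on the mat".split(" "));
```

A smoothed model would also assign non-zero probability to unseen pairs, which matters once the corpus is larger than a toy sentence.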

Production Posture

SCCE is structured for real operations, not just demos.

  • Stateful service with explicit DB + model dependencies
  • Startup migration safety and controlled shutdown persistence
  • Async chat mode with SSE streaming and status events
  • Job queue control for indexing/training/spectral refresh
  • Operational endpoints for status, topology, activity, and audit export
  • Runbook coverage for backups, restore, incidents, and handoff

See full operating details in docs/OPERATIONS.md and docs/PRODUCTION_HANDOFF.md.

Architecture at a Glance

  • apps/server: Fastify API, startup/shutdown lifecycle, routes, worker orchestration
  • apps/web: React UI for chat, vault, training, artifacts, and system monitoring
  • packages/core: ingestion, correlation, retrieval, planner, synthesis, spectral logic
  • packages/db: PostgreSQL access and migration layer
  • packages/types: shared TypeScript types and contracts
  • packages/compute: parallel pipeline and compute dispatch utilities
  • packages/security: policy and audit support
  • packages/plugins: renderer and webapp template infrastructure
  • packages/sketches: probabilistic structures used by supporting workflows
  • data: local models, uploads, corpora, artifacts, and runtime state

Prerequisites

  • Node.js >= 20
  • pnpm >= 8 (via corepack)
  • PostgreSQL >= 14

Quick Start (Local)

  1. Install dependencies.
corepack enable
corepack pnpm install
  2. Set the database URL for the server process (PowerShell syntax shown).
$env:SCCE_DB_URL="postgres://scce_app:scce_app@localhost:5432/scce"
  3. Build all packages.
corepack pnpm -r build
  4. Start the server and the web app in separate terminals.
corepack pnpm dev:server
corepack pnpm dev:web
  5. Verify runtime health.
curl http://127.0.0.1:3000/health
curl http://127.0.0.1:3000/api/system/status
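
The same health verification can be scripted. The sketch below assumes only what the curl commands show (the two paths on port 3000) and treats any 2xx response as healthy; the response bodies are not described here, so they are not inspected. The function name and the injectable fetch parameter are hypothetical conveniences.

```typescript
// Minimal response shape this sketch relies on; the global fetch Response satisfies it.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Paths taken from the curl commands in the quick start.
const HEALTH_PATHS = ["/health", "/api/system/status"];

// Returns a map from path to "responded with a 2xx status".
export async function checkRuntime(
  baseUrl: string,
  fetchImpl: FetchLike = (url) => fetch(url),
): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  for (const path of HEALTH_PATHS) {
    try {
      const res = await fetchImpl(baseUrl + path);
      results[path] = res.ok;
    } catch {
      // Connection refused, DNS failure, and similar errors count as unhealthy.
      results[path] = false;
    }
  }
  return results;
}
```

With the server running, `checkRuntime("http://127.0.0.1:3000")` resolves to an object keyed by path. Passing a stub as the second argument lets the check be exercised without a live server.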

Fast Production Bootstrap

For a full local bootstrap (DB path, demo seeding, ingest, training triggers, and validation request):

corepack pnpm tsx scripts/setup-complete-system.ts

First API Interaction

Synchronous chat (no attachments):

curl -X POST http://127.0.0.1:3000/api/chat `
	-H "Content-Type: application/json" `
	-d '{"message":"What is in the vault?","conversationId":null,"attachments":[]}'
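
For callers that prefer code over curl, the request can be built as in the sketch below. The body fields mirror the curl payload above; the helper name is hypothetical, and the shape of an attachment entry is not shown in this README, so it is typed as unknown.

```typescript
// Body fields as they appear in the curl example.
export interface ChatRequestBody {
  message: string;
  conversationId: string | null;
  attachments: unknown[]; // attachment shape is not documented here
}

// Hypothetical helper: builds the URL and fetch options for a synchronous chat call.
export function buildChatRequest(
  message: string,
  conversationId: string | null = null,
  baseUrl = "http://127.0.0.1:3000",
) {
  const body: ChatRequestBody = { message, conversationId, attachments: [] };
  return {
    url: `${baseUrl}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const chatRequest = buildChatRequest("What is in the vault?");
```

The result can be passed straight to fetch as `fetch(chatRequest.url, chatRequest.init)`. The response shape is defined in docs/API_REFERENCE.md and is not assumed here.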

Asynchronous chat pattern (attachments -> SSE):

  1. POST /api/chat with attachments.
  2. Read conversationId from response.
  3. Stream events from GET /api/events/:conversationId.

See detailed contracts and payload shapes in docs/API_REFERENCE.md.
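
Step 3 of the asynchronous pattern means reading a Server-Sent Events stream. In a browser, `new EventSource(url)` handles this. Where that is not available, the text stream can be parsed by hand, as in this minimal sketch. It follows the standard SSE wire format (fields `event:` and `data:`, events separated by a blank line); the event names and payloads SCCE actually emits are not listed here, so the ones in the example are hypothetical.

```typescript
export interface SseEvent {
  event: string; // defaults to "message" when no event field is present
  data: string;
}

// Parses every complete event in the buffer and returns the unparsed remainder,
// which the caller should prepend to the next chunk it receives.
export function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last piece may be an incomplete event
  for (const block of blocks) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // skip comment lines
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one leading space is dropped
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Events without any data field are not dispatched.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Hypothetical stream content: one named event, one default event, one partial event.
const sample = 'event: status\ndata: {"phase":"retrieval"}\n\ndata: hello\ndata: world\n\ndata: par';
const parsed = parseSse(sample);
```

Keeping `rest` and feeding it back with the next chunk is what makes the parser safe for network reads that split an event in the middle.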

Core Scripts

  • corepack pnpm db-setup: create/apply database schema
  • corepack pnpm smoke-test: validate key runtime paths
  • corepack pnpm seed: seed demo corpus
  • corepack pnpm status: status script
  • corepack pnpm ingest:wiki: run wiki ingestion/training pipeline
  • corepack pnpm quality:check: headers + architecture checks
  • corepack pnpm quality:deep: quality checks + hostile audit suite

Security and Trust Model

SCCE's trust posture is layered:

  • credentials are environment-supplied, not hard-coded
  • CORS policy is constrained to localhost development origins and rejects null origin
  • upload/ingest paths are validated before filesystem operations
  • duplicate controls reduce accidental corpus bloat and replay noise
  • provenance verification is part of answer quality handling
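
Two of these layers, the CORS origin policy and path validation, can be sketched as small pure functions. This is an illustration under assumptions, not SCCE's implementation: the function names, the exact set of accepted hosts, and the choice to reject a missing Origin header are all hypothetical.

```typescript
import * as path from "node:path";

// Hypothetical origin policy: accept only http(s) origins on localhost or
// 127.0.0.1, and reject the literal "null" origin and missing values.
export function isAllowedOrigin(origin: string | undefined): boolean {
  if (!origin || origin === "null") return false;
  let url: URL;
  try {
    url = new URL(origin);
  } catch {
    return false; // not a parseable origin
  }
  const isHttp = url.protocol === "http:" || url.protocol === "https:";
  return isHttp && (url.hostname === "localhost" || url.hostname === "127.0.0.1");
}

// Hypothetical path check: true only if the candidate resolves to a location
// strictly inside root, which blocks ../ traversal and absolute paths.
export function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const rel = path.relative(resolvedRoot, resolved);
  if (rel === "" || path.isAbsolute(rel)) return false;
  return rel.split(path.sep)[0] !== "..";
}
```

Running a check like `isInsideRoot` before any filesystem call keeps a crafted filename from reaching files outside the upload directory. Symbolic links would need a separate check, which this sketch omits.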

Operating SCCE in Production

Operational priorities:

  1. keep DB and model backups current
  2. monitor chat error and timeout rates
  3. watch training/job queue health
  4. track ingestion growth and duplicate trends
  5. validate release upgrades against migration path

Use docs/OPERATIONS.md and docs/PRODUCTION_HANDOFF.md as your source of truth.

Contributing and Engineering Standards

SCCE expects disciplined, auditable changes.

  • keep changes scoped and reversible
  • preserve API contracts or document intentional changes
  • keep SQL parameterized and input validation explicit
  • update docs alongside behavior changes
  • validate with build/smoke/quality scripts before merge
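
As an illustration of the "parameterized SQL and explicit input validation" standard, the sketch below builds a query as text plus a separate values array, using PostgreSQL's `$1`, `$2` placeholders. The table, columns, limits, and function name are hypothetical, and the `{ text, values }` shape assumes a driver such as node-postgres, which this README does not name.

```typescript
// Query text and values travel separately, so user input is never spliced into SQL.
export interface QueryConfig {
  text: string;
  values: unknown[];
}

// Hypothetical query builder; the "documents" table and its columns are made up.
export function documentsByTitleQuery(title: string, limit: number): QueryConfig {
  // Explicit validation: reject bad input before it reaches the database.
  if (title.length === 0 || title.length > 512) {
    throw new RangeError("title must be 1 to 512 characters");
  }
  if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new RangeError("limit must be an integer from 1 to 100");
  }
  return {
    text: "SELECT id, title FROM documents WHERE title = $1 ORDER BY id LIMIT $2",
    values: [title, limit],
  };
}

const hostileQuery = documentsByTitleQuery("x'; DROP TABLE documents; --", 10);
```

Even with a hostile title, the SQL text stays constant and the input is sent only as a bound value.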

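Documentation Index

  • docs/OPERATIONS.md: operating details
  • docs/PRODUCTION_HANDOFF.md: production handoff details
  • docs/API_REFERENCE.md: API contracts and payload shapes
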
License

Proprietary. See LICENSE for terms.

About

LLM-independent inference engine using graph reasoning, spectral retrieval, BM25/SVD search, Kneser-Ney grounded synthesis, concept learning, provenance tracking, and deterministic answer planning without generative hallucination.
