Sourced-Citation Cognitive Engine (SCCE) is a production-grade, offline-first question-answering system for high-trust environments, where verifiable evidence matters more than stylistic fluency.
- It answers from your ingested corpus, not from remote model calls.
- It combines lexical, graph, and spectral retrieval before synthesis.
- It exposes provenance as part of every answering workflow.
- It runs as a server plus a background worker, with observable status and job-control APIs.
- It is built for teams that need auditable behavior under privacy, regulatory, or mission constraints.
Key characteristics:
- Evidence is not optional. Response quality is tied to retrievable source material.
- Runtime is local-first. Core answering paths do not depend on cloud LLM calls.
- Provenance is product behavior, not a dashboard extra.
- Operational behavior is inspectable: jobs, status, ingestion, and model state are exposed through APIs.
SCCE is designed for teams that cannot outsource reasoning to opaque cloud models and cannot accept answers without traceable evidence. It ingests your corpus, builds local structure over it, and answers questions through retrieval, reasoning, and constrained synthesis, returning provenance as a first-class output.
If your use case includes regulated workflows, private data estates, air-gapped infrastructure, or high-cost decisions, SCCE is built for that reality.
SCCE combines five capabilities into one deployable system:
- Corpus ingestion across mixed sources (documents, spreadsheets, code, wiki-style corpora).
- Knowledge structuring via entities, relations, and spectral projections.
- Multi-channel retrieval (lexical, graph, spectral) with diversity-aware fusion.
- Planner-driven reasoning loop that tests and refines candidate claims.
- Local synthesis with quality gates, provenance checks, and uncertainty signaling.
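The fusion step in multi-channel retrieval can be illustrated with a minimal sketch. It uses reciprocal rank fusion (RRF) followed by a per-document cap as a stand-in for SCCE's diversity-aware fusion. The actual scoring functions and thresholds are documented in docs/MATH_OVERVIEW.md. The Hit and Fused types, the fuse function, and its defaults are hypothetical.

```typescript
// Hypothetical hit shape; SCCE's real contracts live in packages/types.
interface Hit {
  chunkId: string;
  docId: string;
}

type Channel = "lexical" | "graph" | "spectral";

interface Fused {
  chunkId: string;
  docId: string;
  score: number;
}

// Reciprocal rank fusion: every channel adds 1 / (k + rank) for each chunk it
// returns (rank is 1-based), so chunks found by several channels rise to the top.
// A per-document cap then keeps one source from crowding out the others.
function fuse(
  rankings: Partial<Record<Channel, Hit[]>>,
  limit: number,
  perDocCap = 2,
  k = 60,
): Fused[] {
  const byChunk = new Map<string, Fused>();
  for (const hits of Object.values(rankings)) {
    if (!hits) continue;
    hits.forEach((hit, index) => {
      const entry = byChunk.get(hit.chunkId) ?? { chunkId: hit.chunkId, docId: hit.docId, score: 0 };
      entry.score += 1 / (k + index + 1);
      byChunk.set(hit.chunkId, entry);
    });
  }

  const ranked = [...byChunk.values()].sort((a, b) => b.score - a.score);
  const perDoc = new Map<string, number>();
  const selected: Fused[] = [];
  for (const candidate of ranked) {
    if (selected.length >= limit) break;
    const used = perDoc.get(candidate.docId) ?? 0;
    if (used >= perDocCap) continue; // diversity: skip over-represented documents
    perDoc.set(candidate.docId, used + 1);
    selected.push(candidate);
  }
  return selected;
}
```

k = 60 is the constant commonly used with RRF. Because the method works on ranks rather than raw scores, lexical, graph, and spectral scores never need to be normalized onto a common scale.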
At a high level:
- Ingest files into documents/spans/chunks.
- Correlate entities and relations.
- Build and refresh spectral basis/projections.
- Train and load local n-gram models.
- Resolve queries through perception, retrieval, planning, verification, and synthesis.
- Return response text plus source-linked context.
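The first step, turning files into chunks that keep their source offsets, is what allows an answer to cite exact text. The following is a minimal sketch that assumes fixed-size character windows with overlap. SCCE's real chunking rules may differ, and the Chunk type and the chunkDocument and citation functions are hypothetical.

```typescript
interface Chunk {
  docId: string;
  index: number; // position of the chunk within its document
  start: number; // inclusive character offset into the source text
  end: number; // exclusive character offset into the source text
  text: string;
}

// Splits text into windows of `size` characters, each overlapping the previous
// one by `overlap` characters so text near a boundary appears in both chunks.
function chunkDocument(docId: string, text: string, size = 800, overlap = 100): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("expected size > 0 and 0 <= overlap < size");
  }
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0, index = 0; start < text.length; start += step, index++) {
    const end = Math.min(start + size, text.length);
    chunks.push({ docId, index, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // last window reached the end of the text
  }
  return chunks;
}

// A citation that points back to the exact source span of a chunk.
function citation(chunk: Chunk): string {
  return `${chunk.docId}#${chunk.start}-${chunk.end}`;
}
```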
This is implemented as a stable server runtime with background jobs and API visibility for each operational phase.
SCCE is structured for real operations, not just demos.
- Stateful service with explicit DB + model dependencies
- Startup migration safety and controlled shutdown persistence
- Async chat mode with SSE streaming and status events
- Job queue control for indexing/training/spectral refresh
- Operational endpoints for status, topology, activity, and audit export
- Runbook coverage for backups, restore, incidents, and handoff
See full operating details in docs/OPERATIONS.md and docs/PRODUCTION_HANDOFF.md.
Repository layout:
- apps/server: Fastify API, startup/shutdown lifecycle, routes, worker orchestration
- apps/web: React UI for chat, vault, training, artifacts, and system monitoring
- packages/core: ingestion, correlation, retrieval, planner, synthesis, spectral logic
- packages/db: PostgreSQL access and migration layer
- packages/types: shared TypeScript types and contracts
- packages/compute: parallel pipeline and compute dispatch utilities
- packages/security: policy and audit support
- packages/plugins: renderer and webapp template infrastructure
- packages/sketches: probabilistic structures used by supporting workflows
- data: local models, uploads, corpora, artifacts, and runtime state
Requirements:
- Node.js >= 20
- pnpm >= 8 (via corepack)
- PostgreSQL >= 14
Quick start (commands use PowerShell syntax):
- Install dependencies.
corepack enable
corepack pnpm install
- Set the database URL for the server process.
$env:SCCE_DB_URL="postgres://scce_app:scce_app@localhost:5432/scce"
- Build all packages.
corepack pnpm -r build
- Start the server and the web app in separate terminals.
corepack pnpm dev:server
corepack pnpm dev:web
- Verify runtime health.
curl http://127.0.0.1:3000/health
curl http://127.0.0.1:3000/api/system/status
For a full local bootstrap (DB path, demo seeding, ingest, training triggers, and a validation request):
corepack pnpm tsx scripts/setup-complete-system.ts
Synchronous chat (no attachments):
curl -X POST http://127.0.0.1:3000/api/chat `
-H "Content-Type: application/json" `
-d '{"message":"What is in the vault?","conversationId":null,"attachments":[]}'
Asynchronous chat pattern (attachments -> SSE):
- POST /api/chat with attachments.
- Read conversationId from the response.
- Stream events from GET /api/events/:conversationId.
See detailed contracts and payload shapes in docs/API_REFERENCE.md.
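The asynchronous pattern can be sketched in TypeScript for Node.js 20, which provides fetch, ReadableStream, and TextDecoder as globals. This is a minimal client built on stated assumptions. Only the two endpoints and the conversationId field come from this README. The request body, event names, and payloads are placeholders to check against docs/API_REFERENCE.md. The parser also assumes that frames are separated by a blank line written with LF newlines. The parseSse and chatWithEvents names are hypothetical.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when the frame has no event: line
  data: string;
}

// Splits a buffer into complete SSE frames (separated by a blank line) and
// returns the parsed events plus any trailing, still-incomplete frame.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message";
    const data: string[] = [];
    for (const raw of frame.split("\n")) {
      const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw;
      if (line === "" || line.startsWith(":")) continue; // blank or comment/keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// POSTs a chat request, then streams events for the returned conversation
// until the server closes the stream.
async function chatWithEvents(
  baseUrl: string,
  body: unknown, // placeholder; see docs/API_REFERENCE.md for the real shape
  onEvent: (e: SseEvent) => void,
): Promise<void> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST /api/chat failed with ${res.status}`);
  const { conversationId } = (await res.json()) as { conversationId: string };

  const stream = await fetch(`${baseUrl}/api/events/${encodeURIComponent(conversationId)}`, {
    headers: { Accept: "text/event-stream" },
  });
  if (!stream.ok || !stream.body) throw new Error(`event stream failed with ${stream.status}`);

  const reader = stream.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSse(buffer);
    buffer = parsed.rest;
    parsed.events.forEach(onEvent);
  }
}
```

In a browser, the built-in EventSource API handles this framing and reconnection. The sketch uses a manual parser because Node.js 20 has no global EventSource.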
Common scripts:
- corepack pnpm db-setup: create and apply the database schema
- corepack pnpm smoke-test: validate key runtime paths
- corepack pnpm seed: seed the demo corpus
- corepack pnpm status: run the status script
- corepack pnpm ingest:wiki: run the wiki ingestion/training pipeline
- corepack pnpm quality:check: header and architecture checks
- corepack pnpm quality:deep: quality checks plus the hostile audit suite
SCCE trust posture is layered:
- credentials are environment-supplied, not hard-coded
- CORS policy is constrained to localhost development origins and rejects the null origin
- upload/ingest paths are validated before filesystem operations
- duplicate controls reduce accidental corpus bloat and replay noise
- provenance verification is part of answer quality handling
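Three of these controls can be sketched with Node.js built-ins. The origin pattern, the upload root, and the in-memory digest set are assumptions chosen for illustration, and SCCE's actual allow-list, validation rules, and duplicate storage may differ. The functions isAllowedOrigin, resolveInside, contentDigest, and isDuplicate are hypothetical. The sketch shows the shape of each check, not SCCE's code.

```typescript
import { createHash } from "node:crypto";
import { isAbsolute, relative, resolve, sep } from "node:path";

// CORS: allow only localhost development origins (any port). The literal
// origin "null" (sent by sandboxed iframes and file:// pages, among others)
// already fails the pattern; it is rejected explicitly for clarity.
const DEV_ORIGIN = /^http:\/\/(localhost|127\.0\.0\.1)(:\d+)?$/;

function isAllowedOrigin(origin: string): boolean {
  if (origin === "null") return false;
  return DEV_ORIGIN.test(origin);
}

// Upload/ingest: resolve a client-supplied relative path and refuse anything
// that would land outside the upload root (e.g. "../../etc/passwd").
function resolveInside(root: string, userPath: string): string {
  if (userPath.includes("\0")) throw new Error("NUL byte in path");
  const base = resolve(root);
  const target = resolve(base, userPath);
  const rel = relative(base, target);
  if (rel === "" || rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    throw new Error(`path escapes upload root: ${userPath}`);
  }
  return target;
}

// Duplicate control: key content by its SHA-256 digest so byte-identical
// files are ingested once. A real deployment would persist these keys.
const seenDigests = new Set<string>();

function contentDigest(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

function isDuplicate(content: string | Uint8Array): boolean {
  const digest = contentDigest(content);
  if (seenDigests.has(digest)) return true;
  seenDigests.add(digest);
  return false;
}
```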
Operational priorities:
- keep DB and model backups current
- monitor chat error and timeout rates
- watch training/job queue health
- track ingestion growth and duplicate trends
- validate release upgrades against migration path
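Chat error and timeout rates are easiest to act on when they are computed over a rolling window. The following is a minimal sketch that assumes the caller classifies each chat request as ok, error, or timeout. The RollingRates class and its window size are hypothetical. Alert thresholds should come from the SLOs in docs/PRODUCTION_HANDOFF.md.

```typescript
type Outcome = "ok" | "error" | "timeout";

interface Sample {
  at: number; // epoch milliseconds
  outcome: Outcome;
}

// Tracks chat outcomes over a sliding time window and reports error and
// timeout rates as fractions of all requests seen in that window.
class RollingRates {
  private samples: Sample[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  record(outcome: Outcome, at: number = Date.now()): void {
    this.samples.push({ at, outcome });
  }

  rates(now: number = Date.now()): { total: number; errorRate: number; timeoutRate: number } {
    // Drop samples that have aged out of the window.
    this.samples = this.samples.filter((s) => now - s.at <= this.windowMs);
    const total = this.samples.length;
    if (total === 0) return { total, errorRate: 0, timeoutRate: 0 };
    const count = (o: Outcome) => this.samples.filter((s) => s.outcome === o).length;
    return { total, errorRate: count("error") / total, timeoutRate: count("timeout") / total };
  }
}
```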
Use the docs listed under the contributor workflow references as your source of truth.
SCCE expects disciplined, auditable changes.
- keep changes scoped and reversible
- preserve API contracts or document intentional changes
- keep SQL parameterized and input validation explicit
- update docs alongside behavior changes
- validate with build/smoke/quality scripts before merge
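Parameterized SQL combined with explicit input validation could look like the following sketch. It assumes the node-postgres (pg) driver, whose query method accepts a { text, values } config with $1, $2, ... placeholders. Whether packages/db uses pg is an assumption. The documents table, its columns, and recentDocumentsQuery are hypothetical.

```typescript
// Shape accepted by node-postgres as a query config: $1, $2, ... bind to `values`.
interface QueryConfig {
  text: string;
  values: unknown[];
}

const MAX_LIMIT = 100;

// Validates inputs explicitly, then returns a parameterized query.
// Hypothetical table and columns, for illustration only.
function recentDocumentsQuery(sourceType: string, limit: number): QueryConfig {
  if (!/^[a-z_]{1,32}$/.test(sourceType)) {
    throw new Error(`invalid sourceType: ${JSON.stringify(sourceType)}`);
  }
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    throw new RangeError(`limit must be an integer in [1, ${MAX_LIMIT}]`);
  }
  return {
    text: "SELECT id, title FROM documents WHERE source_type = $1 ORDER BY created_at DESC LIMIT $2",
    values: [sourceType, limit],
  };
}
```

With pg, the result would be passed directly to client.query(recentDocumentsQuery("wiki", 20)). User input reaches the database only through values and is never concatenated into text.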
Contributor workflow references:
- docs/ARCHITECTURE.md: full system architecture and pipeline internals
- docs/DEVELOPMENT.md: development workflow and package boundaries
- docs/ONBOARDING.md: first-contribution and first-PR path
- docs/OPERATIONS.md: startup, ingest, training, backup, troubleshooting
- docs/PRODUCTION_HANDOFF.md: SLOs, monitoring, incidents, ownership transfer
- docs/API_REFERENCE.md: endpoint reference and payload examples
- docs/MATH_OVERVIEW.md: code-grounded equations, scoring functions, and thresholds
- docs/AI_SKILLS.md: repository-specific assistant guidance and guardrails
Proprietary. See LICENSE for terms.