A local-first lore bank + exploration platform for historical fantasy & sci-fi worldbuilding. Ingest your manuscripts, extract entities/relations/timelines, cross-link with curated web research, and explore narrative tropes—with human-in-the-loop control.
- Vision & Non-Goals
- Core Ideas
- Architecture Overview
- Components
- Persistence & Data Model
- Extraction Pipeline
- RAG+ (Web Research Lane)
- Trope Explorer
- Orchestration via Temporal
- MCP Tools & Public APIs
- Review Workflow & Console
- Governance: Confidence, Provenance, Lanes
- Security, Privacy, and Licenses
- Observability & Cost Controls
- Environments & Deployment
- Testing Strategy
- Roadmap
- Glossary
- Appendix A — Schemas (Concept)
- Appendix B — Prompts (Concept)
- Appendix C — Events & Policies
Vision. Build a durable, canon-first knowledge base for your fiction that:
- Ingests local drafts (PDF/DOCX/MD) continuously,
- Extracts entities (Characters, Places, Artifacts, Factions), events, relations, and timelines,
- Keeps provenance (document+char spans) for every accepted fact,
- Curates a separate, attributed research lane for real history/mythology,
- Provides an idea/trope lane for speculative brainstorms,
- Offers RAG search over curated web sources without polluting canon,
- Uses human-in-the-loop review to promote proposals into canon,
- Scales from single-machine to team use with minimal changes.
Non-Goals (for v1).
- No auto-canonization without human approval for high-impact facts (lineage, death, world rules).
- No general web scraping at scale; use whitelists and manual adds.
- No heavy knowledge graph platform required; start relational, project graph views as needed.
- Canon — accepted facts in your universe; fuels timelines/graphs/queries.
- Research — cleaned, attributed external sources (history, myth, archaeology).
- Idea/Trope — speculative “what-ifs”, trope patterns, and subversions; inspire writing but don’t become canon automatically.
Relation predicates: `parent_of`, `descendant_of`, `ally_of`, `enemy_of`, `teacher_of`, `member_of`, `appears_in`, `located_in`, `holds`, `origin_of`, `inspired_by`, `embodies`.
```mermaid
flowchart LR
A1["Strands Agents"] --> TMP["Temporal Workflows"]
A1 --- MEM["Mem0 (short-term memory)"]
W1["Doc Watcher"]
P1["Parser"]
H1["Web Harvester"]
CLN["Cleaner/Chunker"]
X1["NER + Coref"]
X2["Relation/Timeline Extractor"]
V1["Verifier (GPT-5/Claude/Gemini)"]
T1["Trope Miner"]
RQ["Review Queue API"]
LQ["Lore Query API"]
RAG["RAG API"]
UI["Lore Console"]
DB1["Canon DB"]
DB2["Research DB"]
DB3["Idea/Trope DB"]
VEC["Vector Store"]
BLOB["Blob Store"]
TMP --> W1
TMP --> H1
TMP --> X1
TMP --> X2
TMP --> V1
TMP --> T1
W1 --> P1
P1 --> DB1
H1 --> CLN
CLN --> DB2
CLN --> VEC
CLN --> BLOB
X1 --> RQ
X2 --> RQ
V1 --> RQ
RQ --> DB1
T1 --> DB3
RAG --> DB2
DB2 --> RAG
RAG --> VEC
VEC --> RAG
UI --> RQ
RQ --> UI
UI --> LQ
LQ --> UI
UI --> RAG
RAG --> UI
```
- Doc Watcher — watches a folder; emits file arrivals.
- Parser — PDF/DOCX/MD → plain text, sections, char offsets; FTS indexing.
- Web Harvester — curated search + URL fetch; stores cleaned text with metadata.
- Cleaner/Chunker — readability, boilerplate removal, chunking for RAG.
- NER + Coref — detects entities & coreference clusters; proposes mentions.
- Relation/Timeline Extractor — rules + dependency patterns; proposes relations/events/time expressions.
- Verifier — gated calls to GPT-5 / Claude 4 / Gemini for low-confidence or conflicting items; JSON-only, evidence-required.
- Trope Miner — detects trope candidates and links them to scenes/entities.
- Review Queue API — CRUD for proposals; accept/reject/merge; domain rules.
- Lore Query API — entity cards, timelines, adjacency views, search.
- RAG API — hybrid (keyword + vector) retrieval, idea synthesis endpoints.
- Lore Console (UI) — human review, browsing canon/research/idea, conflict dashboards.
- Temporal — durable workflows, retries, human signals, rate limits.
- Canon DB — relational (SQLite → Postgres).
- Research DB — relational with FTS; hybrid retrieval hooks.
- Idea/Trope DB — relational; links to scenes/entities.
- Vector Store — Chroma (local) → Qdrant or pgvector.
- Blob Store — local FS → MinIO (S3-compatible).
Canon DB:
- `entity(id, type[character|place|artifact|faction|legend|motif], display_name, attrs_json, notes, lane='canon')`
- `alias(id, entity_id, name)`
- `event(id, title, time_start?, time_end?, place_id?, notes, lane='canon')`
- `relation(id, subj_id, predicate, obj_id, time_start?, time_end?, place_id?, confidence, lane, notes)`
- `participation(event_id, entity_id, role)`
- `provenance(kind['entity'|'event'|'relation'], ref_id, source_type['local'|'web'], document_id, section_id, evidence_json, model, decided_by['auto'|'human'])`
- `proposed_entity`, `proposed_event`, `proposed_relation` (mirror the canon tables plus `status`, `created_at`, `reasons`, `conflicts[]`)
- `document(id, path_or_url, title, source_type['local'|'web'], publisher?, published_at?, retrieved_at?, era_hint?)`
- `section(id, document_id, idx, start_char, end_char, heading, text)` (FTS on `text`)
Research DB:
- Same `document`/`section` structure; each chunk is linked to its embeddings; all rows are `lane='research'` by default.
Idea/Trope DB:
- `trope(id, name, category, description, source_ref)`
- `idea_card(id, title, pitch, risks[], subversions[], seeds[], status['draft'|'kept'|'merged'], lane='idea')`
- Link tables: `trope_link(trope_id, entity_id|event_id|scene_id, tone, status)`
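Assuming SQLite for the MVP (per the storage notes), the canon-lane tables sketch directly into DDL. Column names follow the data model above; the sample rows (`Tomoe`, `Yoli`) are illustrative, and this is a sketch rather than a definitive migration:

```python
import sqlite3

# Minimal canon-lane schema sketch; column names assumed from the data-model
# notes above.
DDL = """
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,
    type TEXT CHECK (type IN ('character','place','artifact','faction','legend','motif')),
    display_name TEXT NOT NULL,
    attrs_json TEXT,
    notes TEXT,
    lane TEXT DEFAULT 'canon'
);
CREATE TABLE relation (
    id INTEGER PRIMARY KEY,
    subj_id INTEGER REFERENCES entity(id),
    predicate TEXT NOT NULL,
    obj_id INTEGER REFERENCES entity(id),
    time_start TEXT, time_end TEXT, place_id INTEGER,
    confidence REAL, lane TEXT DEFAULT 'canon', notes TEXT
);
CREATE TABLE provenance (
    kind TEXT CHECK (kind IN ('entity','event','relation')),
    ref_id INTEGER,
    source_type TEXT CHECK (source_type IN ('local','web')),
    document_id TEXT, section_id TEXT,
    evidence_json TEXT, model TEXT,
    decided_by TEXT CHECK (decided_by IN ('auto','human'))
);
"""

con = sqlite3.connect(":memory:")
con.executescript(DDL)
con.execute("INSERT INTO entity (type, display_name) VALUES ('character', 'Tomoe')")
con.execute("INSERT INTO entity (type, display_name) VALUES ('character', 'Yoli')")
con.execute(
    "INSERT INTO relation (subj_id, predicate, obj_id, confidence) "
    "VALUES (1, 'parent_of', 2, 0.73)"
)
# Entity cards and graph views are simple joins over this shape.
row = con.execute(
    "SELECT e1.display_name, r.predicate, e2.display_name "
    "FROM relation r "
    "JOIN entity e1 ON r.subj_id = e1.id "
    "JOIN entity e2 ON r.obj_id = e2.id"
).fetchone()
```

The same schema ports to Postgres later with minimal changes, which is the point of starting relational.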
Local first pass (fast & deterministic)
- NER (spaCy), Coref (fastcoref)
- Rules/dep patterns for easy edges (appears_in, located_in, teacher_of from appositives, etc.)
- `dateparser` + regex for temporal phrases
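The deterministic temporal pass can be sketched with plain regex (the project pairs `dateparser` with regex; the patterns below are illustrative, not exhaustive). The important part is that every proposal carries character-offset evidence spans:

```python
import re

# Illustrative first-pass patterns; a real pass would feed matches to
# dateparser (or the LLM normalizer) afterwards.
TIME_PAT = re.compile(
    r"\b(?:in|during|after|before)\s+"
    r"(?:the\s+)?(?:winter|spring|summer|autumn|coronation|late\s+\w+|\d{3,4})\b",
    re.IGNORECASE,
)

def propose_time_mentions(text: str) -> list[dict]:
    """Return low-confidence time-expression proposals with evidence spans."""
    return [
        {
            "text": m.group(0),
            "evidence_spans": [[m.start(), m.end()]],  # char offsets into source
            "confidence": 0.5,  # deterministic pass stays low-confidence
        }
        for m in TIME_PAT.finditer(text)
    ]

sample = "The siege began in the winter after the coronation."
props = propose_time_mentions(sample)
```

Because the spans are raw character offsets into the section text, they slot straight into the `evidence_spans` fields used everywhere else.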
LLM-assisted verification (only when needed)
- GPT-5 / Claude 4 to verify/normalize low-confidence or conflicting relations
- Gemini for temporal normalization (ISO + granularity) on fuzzy expressions
- Strict JSON schemas; evidence spans required; dual-verify for high-impact facts
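The "strict JSON, evidence required" gate might look like the hand-rolled check below. Field names follow the proposed-relation schema in Appendix A; a real deployment could swap in a JSON Schema validator:

```python
# Gate applied to verifier output before anything reaches the review queue.
REQUIRED = {"subject", "predicate", "object", "confidence", "evidence_spans"}

def admit_proposal(p: dict) -> bool:
    """Reject proposals missing required fields, a sane confidence,
    or at least one well-formed (start, end) evidence span."""
    if not REQUIRED <= p.keys():
        return False
    if not isinstance(p["confidence"], (int, float)) or not 0.0 <= p["confidence"] <= 1.0:
        return False
    spans = p["evidence_spans"]
    return bool(spans) and all(len(s) == 2 and 0 <= s[0] < s[1] for s in spans)

ok = admit_proposal({
    "subject": "Tomoe", "predicate": "parent_of", "object": "Yoli",
    "confidence": 0.73, "evidence_spans": [[1204, 1268]],
})
bad = admit_proposal({
    "subject": "Tomoe", "predicate": "parent_of", "object": "Yoli",
    "confidence": 0.73, "evidence_spans": [],  # no evidence -> rejected
})
```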
Human-in-the-loop
- All proposals route to Review Queue
- Acceptance promotes items to canon & writes provenance
Harvester
- Whitelists (academic, museum, high-quality references) + manual URLs
- Clean & chunk into `section` rows; store `publisher`, `published_at`, `retrieved_at`
- Build vectors with local embeddings; attach them to `section_id`
Retriever
- Hybrid: FTS keyword + vector similarity with filters (publisher, date, tags)
- Returns attributed snippets (never full content), each with URL and publisher
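The document leaves the fusion method for the hybrid retriever open; reciprocal rank fusion (RRF) is one common choice and is easy to sketch:

```python
# Reciprocal rank fusion: merge the FTS keyword ranking and the vector
# ranking into one list of section ids. This is a sketch of one possible
# fusion strategy, not a prescribed algorithm.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; items ranked highly anywhere float up."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, section_id in enumerate(ranking):
            scores[section_id] = scores.get(section_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["s-3", "s-12", "s-7"]   # FTS order
vector_hits = ["s-12", "s-9", "s-3"]    # cosine-similarity order
fused = rrf([keyword_hits, vector_hits])
```

Publisher/date/tag filters would be applied to each ranking before fusion, so the attributed-snippet guarantee is unaffected.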
Synthesis
- Idea cards derived from snippets are stored in Idea/Trope DB (lane='idea')
- They do not alter canon until explicitly promoted via review
Model
- Trope taxonomy (name, category, description, examples, pitfalls)
- Detected trope candidates (`trope.mine(scene_id)`) with confidence & evidence spans
- Suggestion API (`trope.suggest(context, goals[])`) to subvert or lean in, with risks
Usage
- Audit a scene to avoid cliché or intentionally embrace a pattern
- Link trope → entity/event via an `embodies` relation (lane='idea' until accepted)
Workflows (examples)
- `IngestWorkflow(doc_path)` → parse → index → propose mentions
- `ExtractWorkflow(doc_id)` → NER/Coref → relation/time passes → proposals
- `VerifyWorkflow(proposal_ids)` → model calls → enrich confidence/evidence
- `ReviewWorkflow(batch)` → wait on human signals `Accept`/`Reject`/`Merge`
- `RagHarvestWorkflow(query|url)` → fetch → clean → index → embed
Activities
- Small, idempotent calls to MCP tools & APIs
- Token budgets and rate limits per provider
- Signals for human decisions and for budget exhaustion fallback
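The budget-exhaustion fallback can be sketched in plain Python (names here — `TokenBudget`, `route_verification` — are illustrative, not part of any SDK): when the daily cap would be exceeded, the item routes to human review instead of a model call.

```python
# Sketch of a per-provider daily token budget gate.
class TokenBudget:
    def __init__(self, daily_cap: int):
        self.daily_cap = daily_cap
        self.spent = 0

    def try_spend(self, tokens: int) -> bool:
        """Atomically reserve tokens; refuse if the cap would be exceeded."""
        if self.spent + tokens > self.daily_cap:
            return False
        self.spent += tokens
        return True

def route_verification(budget: TokenBudget, est_tokens: int) -> str:
    """'model' while budget lasts; fall back to 'review' once exhausted."""
    return "model" if budget.try_spend(est_tokens) else "review"

budget = TokenBudget(daily_cap=1000)
first = route_verification(budget, 800)   # fits the cap
second = route_verification(budget, 800)  # would exceed it
```

Inside a Temporal activity this check runs before the provider call, so a paused workflow simply accumulates review items rather than failing.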
- `lore.ingest(path|folder)`
- `lore.extract(mode="hybrid", batch=8)`
- `lore.verify(kind, ids[])`
- `lore.review.next(kind, limit)` / `lore.review.accept(id)` / `lore.review.reject(id)` / `lore.review.merge(a,b)`
- `lore.query.entity(name|id)` / `lore.timeline(entity_id)` / `lore.graph(entity_id, depth)`
- `lore.search(text)` (FTS + optional semantic)
- `lore.import_web(query|url, whitelist_tag)`
- `trope.mine(scene_id)` / `trope.suggest(context, goals[])`
- `idea.create(seeds[])` / `idea.promote(id)`
- Lore Query: `/entity/:id`, `/timeline/:id`, `/graph/:id`, `/search?q=`
- Review Queue: `/proposals?kind=relation&status=pending`, `POST /accept`, `POST /merge`
- RAG: `/rag/search`, `/rag/snippets`, `/rag/synthesize`
All endpoints return JSON; pagination and filtering are handled via query params.
Views
- By Entity: proposed aliases/relations/events grouped with side-by-side evidence
- By Predicate: e.g., show all `parent_of` proposals for this batch
- Conflicts: contradictory facts with citations & char spans
Actions
- Accept → promotes to canon (writes provenance, updates confidence)
- Reject → archived with reason
- Merge → unify entities/aliases; re-point references
- Annotate → add clarifications (adoption, reincarnation, mythic metaphor flags)
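The Merge action's reference re-pointing can be sketched with in-memory dicts standing in for the canon DB (names are illustrative):

```python
# Merge sketch: unify two entities, fold aliases together, and re-point
# relation references from the dropped entity to the surviving one.
def merge_entities(entities: dict, relations: list, keep_id: int, drop_id: int) -> None:
    # The dropped entity's display name survives as an alias.
    entities[keep_id]["aliases"] |= (
        entities[drop_id]["aliases"] | {entities[drop_id]["display_name"]}
    )
    for rel in relations:  # re-point subject/object references
        if rel["subj_id"] == drop_id:
            rel["subj_id"] = keep_id
        if rel["obj_id"] == drop_id:
            rel["obj_id"] = keep_id
    del entities[drop_id]

entities = {
    1: {"display_name": "Tomoe", "aliases": {"Lady Tomoe"}},
    2: {"display_name": "Tomoe Gozen", "aliases": set()},
}
relations = [{"subj_id": 2, "predicate": "appears_in", "obj_id": 7}]
merge_entities(entities, relations, keep_id=1, drop_id=2)
```

In the real store this would be a transaction, and provenance rows for the dropped entity would be re-pointed the same way.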
- Confidence Gates (defaults)
  - ≥ 0.80: auto-publish for low-impact edges (e.g., `appears_in`)
  - 0.60–0.79: require verify or review
  - < 0.60: review only
  - High-impact edges (lineage/death/world rules): dual-verify or human approval
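The gates translate directly into a routing function. Thresholds match the stated defaults; the `HIGH_IMPACT` set is an assumption based on the lineage/death/world-rules examples:

```python
# Confidence-gate routing sketch. HIGH_IMPACT membership is illustrative;
# death and world-rule facts would be flagged by predicate or attribute.
HIGH_IMPACT = {"parent_of", "descendant_of"}

def route(predicate: str, confidence: float) -> str:
    """Map a proposal to its governance path per the default gates."""
    if predicate in HIGH_IMPACT:
        return "dual_verify_or_human"   # never auto-published
    if confidence >= 0.80:
        return "auto_publish"
    if confidence >= 0.60:
        return "verify_or_review"
    return "review"

low_impact_high_conf = route("appears_in", 0.85)
low_impact_mid_conf = route("appears_in", 0.73)
lineage = route("parent_of", 0.95)
```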
- Provenance
  - Every canon item links to document + section + evidence spans, and to the model (if any) that proposed/verified it
- Lanes
  - `canon` fuels timelines/graphs/exports
  - `research` is attributed background; never auto-bleeds into canon
  - `idea` is a sandbox; promote via review when used
- Local-first. Keep manuscripts and extracted data local by default.
- Selective escalation. Only send necessary excerpts to cloud models; consider redacting sensitive lines.
- Attribution. Store publisher, URL, published_at, retrieved_at for all web content; keep quotes short.
- Licenses. Respect site terms; store a `policy` field for each source domain; allow blocklist/allowlist.
- Metrics: ingest rate, extraction latency, acceptance rate, conflict count, token spend per provider
- Tracing: OpenTelemetry spans across MCP calls and Temporal activities
- Budgets: daily token caps per model; workflow pauses → pushes items to review
- Caching: hash(section_text) → cache model outputs; invalidate on change
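The `hash(section_text)` cache is straightforward: keying model outputs on a digest of the exact section text means any edit changes the key and invalidates the entry automatically. A sketch:

```python
import hashlib

# Content-addressed cache for model outputs, keyed by section text digest.
_cache: dict[str, dict] = {}

def cache_key(section_text: str) -> str:
    return hashlib.sha256(section_text.encode("utf-8")).hexdigest()

def verify_with_cache(section_text: str, call_model) -> dict:
    """Call the model only on cache miss; edited text hashes to a new key."""
    key = cache_key(section_text)
    if key not in _cache:
        _cache[key] = call_model(section_text)
    return _cache[key]

calls = []
def fake_model(text: str) -> dict:  # stand-in for a provider call
    calls.append(text)
    return {"verified": True}

verify_with_cache("Tomoe rode north.", fake_model)
verify_with_cache("Tomoe rode north.", fake_model)  # served from cache
```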
Local (MVP)
- SQLite (canon/research/idea), Chroma, optional MinIO, Temporal single worker, all services on localhost
Team / Scale-out
- Postgres (+ pgvector) or Qdrant, MinIO, Temporal server+workers, NATS/Redis for messaging, CI for migrations and tests
- Unit tests for parsers, rules, policy gates, schema validation
- Contract tests for MCP tool JSON I/O and public APIs
- Golden/snapshot tests for extraction outputs on small fixture docs
- Property tests for merge/alias behavior and timeline math
- Load tests for RAG retrieval and vector queries
Milestone 1 — Canon Core
- Folder watch → parse → NER/coref → propose `appears_in`/`located_in`
- Review console (basic) → accept → entity cards & timelines
- FTS search over sections & notes
Milestone 2 — Relations & Time
- Add parent/descendant/ally/enemy/teacher
- Temporal normalization + event participation
- Conflict detector; provenance everywhere
Milestone 3 — RAG+ Research Lane
- Web harvester + embeddings + hybrid retriever
- Idea cards from attributed snippets (kept in idea lane)
- Promote “inspired_by/origin_of” links via review
Milestone 4 — Trope Explorer
- Trope mining + suggestions + audit reports
- Subversion patterns; link to scenes/entities
Milestone 5 — Visualization & Exports
- Graph/adjacency views; timelines → ICS/CSV
- Graphviz export; report generators
- Canon: Accepted truth of your fictional universe.
- Research: External factual background with citation.
- Idea/Trope: Speculative brainstorms and narrative patterns.
- Evidence spans: Character offsets in source text supporting a fact.
- Dual-verify: Two independent model checks agree before auto-publish.
Proposed Relation (JSON)
```json
{
  "subject": "Tomoe",
  "predicate": "parent_of | ally_of | enemy_of | teacher_of | member_of | appears_in | located_in | holds | origin_of | inspired_by | embodies",
  "object": "Yoli",
  "time": {"start_iso": null, "end_iso": null, "text": "late Heian", "granularity": "era"},
  "place": "Kyoto",
  "confidence": 0.73,
  "evidence_spans": [[1204, 1268]],
  "source": {"type": "local", "document_id": "doc-17", "section_id": "s-3"},
  "model": {"name": "llama3.1:14b", "mode": "verify"},
  "lane": "canon"
}
```
Time Expression
```json
{
  "text": "the winter after the coronation",
  "normalized": {"start_iso": "1450-12", "end_iso": "1451-02", "granularity": "month"},
  "assumptions": ["coronation=1450-11 (doc-37 s2)"],
  "evidence_spans": [[834, 867]],
  "confidence": 0.78
}
```
Idea Card
```json
{
  "title": "Storm-pact echoes Illapa rites",
  "pitch": "What if Inari’s storm aspect mirrors Andean Illapa ceremonies, binding winds to blood debt?",
  "seeds": ["research:doc-5:s-12", "research:doc-9:s-3"],
  "risks": ["Chosen One fatigue"],
  "subversions": ["mentor’s betrayal is pragmatic, not evil"],
  "links": [{"type": "inspired_by", "from": "Event:1450-Chimu-Rite", "to": "Legend:Illapa"}],
  "lane": "idea",
  "status": "draft"
}
```
Relation Verify (system)
- “You verify proposed lore relations from a fictional manuscript. Output only JSON matching the provided schema. Include at least one evidence span (start,end char offsets) for each accepted relation. If uncertain, omit.”
Temporal Normalize (system)
- “Normalize time expressions to ISO with granularity (day|month|year|era). Include original text, evidence spans, and any assumptions.”
Trope Mine (system)
- “Detect likely narrative tropes present in the scene. Return trope ids/names, confidence, and evidence spans. Do not invent plot facts.”
Events (examples)
- `document.ingested`, `webdoc.ingested`
- `mentions.proposed`, `relations.proposed`, `events.proposed`
- `proposal.verified`, `proposal.accepted`, `proposal.rejected`, `entity.merged`
Confidence Policy (defaults)
- Low-impact edges (`appears_in`, `located_in`) with ≥ 0.80 may auto-publish
- High-impact edges (lineage/death/world rules) require dual-verify ≥ 0.70 or human acceptance
- Any item without evidence spans cannot be published
- Temporal gives you durable workflows and human gates without duct-tape cron jobs.