RightBusiness/rb-corpus-bound-answer-system

Corpus-Bound Answer Generator

This project implements a corpus-grounded question-answering system that generates natural language responses strictly limited to an ingested corpus.

It is deliberately not deterministic, and deliberately not a general-purpose assistant.

The system is designed to answer only what the corpus explicitly supports, cite its sources, and refuse to speculate when evidence is missing.

Project Status

Stable research implementation.
Not intended as a production knowledge system.


Design Goals

  • Generate natural language answers without hallucination
  • Enforce strict evidence grounding
  • Fail closed instead of guessing
  • Produce outputs that are human-readable and auditable
  • Demonstrate the practical limits of LLM inference under hard constraints

Core Principle

Evidence precedes generation.

The language model may summarize, paraphrase, or explain, but it may only do so using retrieved corpus evidence. If the corpus does not support the question, the system must return NOT_FOUND.
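The principle can be sketched as a minimal pipeline. This is an illustrative sketch, not the project's actual code: `retrieve` and `generate` are hypothetical caller-supplied hooks standing in for whatever retriever and model the system wires in.

```python
def answer_question(question, retrieve, generate):
    """Evidence-precedes-generation sketch.

    Hypothetical hooks (assumptions, not this project's API):
      retrieve(question)            -> list of corpus passages
      generate(question, passages)  -> natural-language draft

    Generation is never invoked without evidence in hand.
    """
    passages = retrieve(question)
    if not passages:
        # Fail closed: no supporting evidence means no generation at all.
        return "NOT_FOUND"
    return generate(question, passages)
```

The key property is the order of operations: retrieval gates generation, so an empty evidence set short-circuits before the model is ever called.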


What This Is Not

This project is not:

  • A deterministic query engine
  • A theology engine
  • A chatbot
  • A knowledge completion system
  • A model benchmark or leaderboard exercise

Apparent intelligence derives solely from generation over retrieved corpus text, not from inference beyond it.


System Behavior

Evidence Gating

  • The model only sees retrieved corpus passages.
  • Background knowledge and training data recall are explicitly disallowed.
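One way to enforce this gating is to construct the model's prompt exclusively from retrieved passages. The sketch below is an assumption about how such a prompt could be built; the field names and instruction wording are illustrative, not this project's actual prompt.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    """A retrieved corpus passage with a stable citation label."""
    label: str    # e.g. "E1"
    doc_id: str
    locator: str  # hypothetical locator format, e.g. "ch3:para12"
    text: str

def build_gated_prompt(question: str, passages: list[Passage]) -> str:
    """Compose a prompt whose only evidence is the retrieved passages.

    The instruction block forbids background knowledge and tells the
    model to emit NOT_FOUND when the passages do not support an answer.
    """
    evidence = "\n".join(
        f'[{p.label}] ({p.doc_id}, {p.locator}) "{p.text}"' for p in passages
    )
    return (
        "Answer using ONLY the evidence passages below. "
        "Do not use background knowledge. Cite each sentence with [E#]. "
        "If the evidence does not support an answer, reply exactly "
        "NOT_FOUND.\n\n"
        f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
    )
```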

Citation Enforcement

  • Every sentence in an answer must be supported by at least one citation.
  • Citations refer to concrete corpus locations.
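Sentence-level citation coverage can be checked mechanically. A minimal validator sketch, assuming citations use the `[E#]` style shown later in the output contract and using a naive punctuation-based sentence split:

```python
import re

CITATION = re.compile(r"\[E\d+\]")

def uncited_sentences(answer: str) -> list[str]:
    """Return the sentences of an answer that carry no [E#] citation.

    Splitting on terminal punctuation is deliberately naive; it is
    enough for a validator sketch, not production sentence detection.
    """
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", answer)
        if s.strip()
    ]
    return [s for s in sentences if not CITATION.search(s)]
```

An answer passes citation enforcement only when this list comes back empty.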

Fail-Closed Output

  • When evidence is insufficient or absent, the system outputs NOT_FOUND.
  • No attempt is made to “fill gaps” using inference or common knowledge.
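The fail-closed rule can be expressed as a final verdict check: a draft is only ever promoted to ANSWERED, never patched into one. A sketch, assuming `[E#]` citation labels and a known set of retrieved evidence labels:

```python
import re

def verdict(answer_draft: str, evidence_labels: set[str]) -> str:
    """Fail closed: emit ANSWERED only when the draft is non-empty,
    is not itself NOT_FOUND, and cites only labels that actually
    exist in the retrieved evidence set. Everything else collapses
    to NOT_FOUND rather than being repaired."""
    draft = answer_draft.strip()
    if not draft or draft == "NOT_FOUND":
        return "NOT_FOUND"
    cited = set(re.findall(r"\[(E\d+)\]", draft))
    if not cited or not cited <= evidence_labels:
        # Uncited or out-of-evidence claims are rejected, not fixed.
        return "NOT_FOUND"
    return "ANSWERED"
```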

Output Contract (No JSON)

All responses follow this structure:

Verdict: ANSWERED | NOT_FOUND

Answer: Natural language text with inline citations [E1], [E2], etc.

Evidence used:

E1 — doc_id — locator — "verbatim quote"
E2 — ...

This format is human-first, auditable, and intentionally resistant to silent failure.
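The contract above can be rendered from structured internals with a few lines. This is a sketch of one possible renderer, assuming evidence is held as (label, doc_id, locator, quote) tuples; the function name is illustrative:

```python
def render_response(verdict: str, answer: str, evidence: list[tuple]) -> str:
    """Render the plain-text output contract: verdict, answer with
    inline [E#] citations, then one evidence line per tuple."""
    lines = [
        f"Verdict: {verdict}",
        "",
        f"Answer: {answer}",
        "",
        "Evidence used:",
    ]
    for label, doc_id, locator, quote in evidence:
        lines.append(f'{label} — {doc_id} — {locator} — "{quote}"')
    return "\n".join(lines)
```

Because the output is plain text, a malformed response is visibly malformed to a human reader instead of failing silently inside a parser.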


Why No JSON

JSON is not a safety mechanism. Evidence gating is.

This project prioritizes clarity and inspection over machine convenience. Internal tooling may use structure, but external output is designed for direct reading and review.


Intended Use

  • Research and study
  • Corpus-based religious or textual analysis
  • Demonstrating failure modes of LLMs
  • Whitepapers, audits, and reproducible experiments

Out of Scope by Design

  • World knowledge completion
  • Cross-tradition harmonization
  • Implicit timelines or inferred facts
  • “Helpful” extrapolation beyond text
  • Conversational UX optimization

License

Copyright © 2025 Right Business Pte Ltd. All rights reserved.

See LICENSE for details.
