
HelpmateAI


HelpmateAI is a document-aware RAG system for long PDFs and DOCX files. It plans retrieval over document topology instead of treating every question as a flat dense top-k search. It is built for the questions where ordinary "chat with PDF" systems break: broad thesis conclusions, research-paper contributions, policy clauses, evidence scattered across distant sections, weak retrieval, and citation-sensitive answers.

[Diagram: HelpmateAI architecture]

Live landing page: https://helpmateai.xyz; Workspace app: https://app.helpmateai.xyz

What Makes It Different

Most RAG demos retrieve the top chunks and hope the answer model can stitch them together. HelpmateAI treats retrieval as a planned workflow over a structured document map.

| Typical RAG failure | HelpmateAI behavior |
| --- | --- |
| "What are the conclusions?" returns a few random result paragraphs. | A dedicated global_summary_first route anchors overview, findings, discussion, and conclusion regions before assembling raw chunk evidence. |
| The model answers even when retrieval is weak. | Evidence is graded as strong, weak, or unsupported; unsupported questions stop before answer generation. |
| Section-scoped questions drift into the wrong chapter or policy region. | A bounded orchestrator can resolve explicit local scope to validated section IDs, with deterministic safety checks. |
| The right chunk appears in top-k but not at rank 1. | A spread-triggered, reorder-only evidence selector can promote stronger evidence without pruning away support. |
| Architecture changes are chosen by intuition. | The repo carries ADRs, ablations, and benchmark reports for retrieval, reranking, planning, abstention, and evidence selection. |
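
Two of these behaviors are easiest to see in code: the abstention gate and the reorder-only evidence selector. The sketch below is a minimal illustration, not the repo's implementation; `Evidence`, `grade_support`, `select_evidence`, the thresholds, and the stubbed answer step are all hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical evidence record; the real pipeline in src/ differs in detail."""
    chunk_id: str
    text: str
    rerank_score: float  # cross-encoder score from the reranking stage


def grade_support(evidence: list[Evidence], strong: float = 0.7, weak: float = 0.4) -> str:
    """Grade pooled evidence as strong / weak / unsupported (thresholds illustrative)."""
    if not evidence:
        return "unsupported"
    best = max(e.rerank_score for e in evidence)
    if best >= strong:
        return "strong"
    return "weak" if best >= weak else "unsupported"


def select_evidence(evidence: list[Evidence], spread_trigger: float = 0.15) -> list[Evidence]:
    """Reorder-only selection: promote stronger chunks, never prune support.

    Reordering fires only when rerank scores spread out enough that the
    first-stage ranking is likely wrong about which chunk should lead.
    """
    scores = [e.rerank_score for e in evidence]
    if scores and max(scores) - min(scores) >= spread_trigger:
        return sorted(evidence, key=lambda e: e.rerank_score, reverse=True)
    return evidence  # spread too small: keep the retrieval order untouched


def answer_or_abstain(question: str, evidence: list[Evidence]) -> str:
    grade = grade_support(evidence)
    if grade == "unsupported":
        # Stop before generation instead of letting the model improvise.
        return "No supporting evidence found in the document."
    ordered = select_evidence(evidence)
    # Stand-in for the grounded generation step.
    context = "\n\n".join(e.text for e in ordered)
    return f"[{grade} evidence] answer for {question!r} grounded in:\n{context}"
```

Note the selector never drops chunks: weak support that co-occurs with strong support still reaches the generator, which matches the "without pruning away support" claim above.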

Product Preview

Landing experience

[Screenshot: HelpmateAI landing page]

Workspace flow

[Screenshots: HelpmateAI workspace and grounded answer panel]

Evidence visibility

[Screenshot: HelpmateAI evidence panel]

Latest Validation Snapshot

The latest saved held-out product-fit run is final_eval_suite_20260429_193058.json. It used 50 fixed questions across five public documents, running HelpmateAI only in native-context mode and judging with RAGAS using Gemini 2.5 Flash plus OpenAI embeddings.

That run showed the current shape of the system clearly:

| Metric | HelpmateAI |
| --- | --- |
| Questions | 50 |
| Answerable questions | 45 |
| Unsupported questions | 5 |
| Supported rate | 0.7200 |
| Answerable supported rate | 0.8000 |
| Unsupported abstention rate | 1.0000 |
| False support rate | 0.0000 |
| False abstention rate | 0.2000 |
| RAGAS faithfulness | 0.9334 |
| RAGAS faithfulness (attempted only) | 0.9600 |
| RAGAS answer relevancy (attempted only) | 0.7892 |
| RAGAS context precision (attempted only) | 0.9093 |
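
These rates are mutually consistent under a straightforward reading: 36 of 45 answerable questions received a supported answer (0.80, and 36/50 = 0.72 overall), 9 answerable questions were wrongly refused (0.20), and all 5 unsupported questions were correctly refused. The sketch below shows that arithmetic; the record fields are hypothetical, the toy list exists only to make it runnable, and the real schema is defined by the saved JSON reports.

```python
# Per-question records with three flags; field names are hypothetical. The
# saved reports under docs/evals/reports/ define the real schema.
records = [
    {"answerable": True,  "attempted": True,  "supported": True},   # good answer
    {"answerable": True,  "attempted": False, "supported": False},  # false abstention
    {"answerable": False, "attempted": False, "supported": False},  # correct abstention
]

answerable = [r for r in records if r["answerable"]]
unsupported_qs = [r for r in records if not r["answerable"]]
supported = sum(r["attempted"] and r["supported"] for r in answerable)

metrics = {
    # Supported answers over all questions (36/50 = 0.72 in the snapshot).
    "supported_rate": supported / len(records),
    # Supported answers over answerable questions only (36/45 = 0.80).
    "answerable_supported_rate": supported / len(answerable),
    # Refusals on unanswerable questions (5/5 = 1.00).
    "unsupported_abstention_rate":
        sum(not r["attempted"] for r in unsupported_qs) / len(unsupported_qs),
    # Unanswerable questions wrongly answered anyway (0/5 = 0.00).
    "false_support_rate":
        sum(r["attempted"] for r in unsupported_qs) / len(unsupported_qs),
    # Answerable questions wrongly refused (9/45 = 0.20).
    "false_abstention_rate":
        sum(not r["attempted"] for r in answerable) / len(answerable),
}
print(metrics)
```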

The native-context scoring fixes an earlier evaluation-methodology issue where HelpmateAI generated from its full selected evidence while RAGAS judged against a clipped context payload. This latest run is the current HelpmateAI product score; earlier vendor rows remain useful as historical comparisons but should not be mixed into a single headline table without rerunning all systems under the same scoring mode.

Evaluation Methodology

Evaluation is treated as part of the architecture, not a one-off demo. The current final-eval harness uses fixed public documents, fixed question manifests, answerable and intentionally unsupported questions, per-intent reporting, and saved machine-readable reports under docs/evals/reports/.

The latest held-out suite uses:

  • public source documents recorded in final_eval_sources_20260428.md
  • fixed draft questions in final_eval_questions.draft.json
  • RAGAS scoring with a non-generator judge model where configured
  • explicit abstention metrics alongside answer-quality metrics
  • separate native-context and equalized-context modes for future product and controlled retrieval comparisons
  • documented vendor comparison settings when OpenAI File Search or Vectara baselines are run

Full protocol details live in final_eval_protocol.md, with the broader evaluation plan in next_steps_and_final_eval_plan.md.

How It Is Built

The retrieval core lives in src/ and stays framework-agnostic. backend/ exposes it through FastAPI upload, index, status, and ask endpoints. frontend/ ships the Next.js workspace UI. deploy/vps/ contains the Docker Compose and Caddy deployment path for the API, while the public app is split between landing, workspace, and backend surfaces.
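
To make the backend surface concrete, here is a minimal FastAPI sketch of the four endpoint families named above. The route paths, payload shapes, and in-memory job store are illustrative assumptions, not the repo's actual API.

```python
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()
jobs: dict[str, str] = {}  # doc_id -> indexing status (toy in-memory store)


class AskRequest(BaseModel):
    doc_id: str
    question: str


@app.post("/upload")
async def upload(file: UploadFile):
    doc_id = file.filename or "doc"
    jobs[doc_id] = "uploaded"
    return {"doc_id": doc_id}


@app.post("/index/{doc_id}")
async def index(doc_id: str):
    jobs[doc_id] = "indexed"  # real system: chunk, embed, build the document map
    return {"doc_id": doc_id, "status": jobs[doc_id]}


@app.get("/status/{doc_id}")
async def status(doc_id: str):
    return {"doc_id": doc_id, "status": jobs.get(doc_id, "unknown")}


@app.post("/ask")
async def ask(req: AskRequest):
    # Real system: plan retrieval over the document map, grade evidence,
    # then answer with citations or abstain.
    return {"answer": f"(stub) answered {req.question!r} against {req.doc_id}"}
```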

Built with Next.js, FastAPI, pypdf, python-docx, optional Docling, ChromaDB, OpenAI, sentence-transformers, scikit-learn, optional Supabase persistence, optional hosted Chroma-compatible storage, Docker, and uv.

PDF extraction defaults to HELPMATE_PDF_EXTRACTOR=pypdf for reliability. DOCX extraction defaults to HELPMATE_DOCX_EXTRACTOR=python-docx. Set either extractor to docling only for local layout-parser experiments; production stays on the predictable local extractors.

Docling OCR is disabled by default (HELPMATE_DOCLING_OCR=false) to avoid unnecessary memory pressure on born-digital PDFs. It can be enabled for scanned PDFs when the runtime has enough memory. Docling runs with expanded Markdown tables and records OCR/table-mode metadata when explicitly enabled.
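
A minimal sketch of how such an env-var toggle could dispatch during ingestion is below. Only the variable names (HELPMATE_PDF_EXTRACTOR, HELPMATE_DOCLING_OCR) and the pypdf default come from this README; the function and the Docling branch being stubbed out are illustrative assumptions.

```python
import os

from pypdf import PdfReader


def extract_pdf_text(path: str) -> str:
    """Dispatch PDF extraction on HELPMATE_PDF_EXTRACTOR (default: pypdf)."""
    extractor = os.getenv("HELPMATE_PDF_EXTRACTOR", "pypdf")
    if extractor == "docling":
        # Local layout-parser experiments only; OCR stays off unless
        # HELPMATE_DOCLING_OCR=true and the runtime has memory to spare.
        raise NotImplementedError("Docling path omitted in this sketch")
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```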

Current Limits

HelpmateAI is strongest on grounded long-document QA, policy questions, thesis/report navigation, and citation-visible answers. The hardest remaining cases are the broadest academic synthesis prompts on noisy journal-style PDFs, plus broader held-out coverage for orchestrated local-scope behavior.

About

HelpmateAI is a long-document QA app for PDFs and DOCX files with hybrid retrieval, citation-aware answers, evidence panels, and a Next.js and FastAPI stack.
