HelpmateAI is a document-aware RAG system for long PDFs and DOCX files. It plans retrieval over document topology instead of treating every question as a flat dense top-k search. It is built for the questions where ordinary "chat with PDF" systems break: broad thesis conclusions, research-paper contributions, policy clauses, scattered evidence, weak retrieval, and citation-sensitive answers.
Live landing page: https://helpmateai.xyz; Workspace app: https://app.helpmateai.xyz
Most RAG demos retrieve the top chunks and hope the answer model can stitch them together. HelpmateAI treats retrieval as a planned workflow over a structured document map.
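The planned-workflow idea can be sketched as a tiny route planner over a document map. The region labels and the `global_summary_first` route name come from this README; the `DocumentMap` shape, the keyword cues, and the function names are illustrative assumptions, not the repo's actual planner.

```python
from dataclasses import dataclass, field

@dataclass
class DocumentMap:
    # section_id -> region label, e.g. "introduction", "conclusion"
    regions: dict = field(default_factory=dict)

# Hypothetical cues that mark a question as document-global.
GLOBAL_CUES = ("conclusion", "conclusions", "contribution",
               "overall", "summary", "findings")

def plan_route(question: str, doc_map: DocumentMap) -> dict:
    """Pick a retrieval route from the question and document topology."""
    q = question.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        # Anchor overview/findings/discussion/conclusion regions first,
        # before assembling raw chunk evidence.
        anchors = [sid for sid, region in doc_map.regions.items()
                   if region in {"overview", "findings",
                                 "discussion", "conclusion"}]
        return {"route": "global_summary_first", "anchor_sections": anchors}
    # Default: ordinary dense top-k over the whole document.
    return {"route": "dense_top_k", "anchor_sections": []}
```

The point of the sketch is that the route is decided before any vector search runs, so a "what are the conclusions?" question never degenerates into a flat top-k lookup.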
| Typical RAG failure | HelpmateAI behavior |
|---|---|
| "What are the conclusions?" returns a few random result paragraphs. | A dedicated global_summary_first route anchors overview, findings, discussion, and conclusion regions before assembling raw chunk evidence. |
| The model answers even when retrieval is weak. | Evidence is graded as strong, weak, or unsupported; unsupported questions stop before answer generation. |
| Section-scoped questions drift into the wrong chapter or policy region. | A bounded orchestrator can resolve explicit local scope to validated section IDs, with deterministic safety checks. |
| The right chunk appears in top-k but not at rank 1. | A spread-triggered, reorder-only evidence selector can promote stronger evidence without pruning away support. |
| Architecture changes are chosen by intuition. | The repo carries ADRs, ablations, and benchmark reports for retrieval, reranking, planning, abstention, and evidence selection. |
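The spread-triggered, reorder-only selector in the table above can be sketched as follows. The `Chunk` fields, the threshold value, and the trigger condition (reorder only when retrieval scores are tightly clustered) are assumptions for illustration; the repo's actual selector may differ. The invariant the sketch preserves is the one the table states: evidence is only reordered, never pruned.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    retrieval_score: float  # dense retriever score
    rerank_score: float     # reranker / evidence-grader score

def reorder_evidence(chunks, spread_threshold=0.15):
    """Reorder-only evidence selection (illustrative sketch).

    If retrieval scores are tightly clustered (low spread), the dense
    ranking is ambiguous, so stronger reranker evidence is promoted to
    the front. No chunk is ever dropped: the output is always a
    permutation of the input, so supporting context survives.
    """
    if not chunks:
        return []
    scores = [c.retrieval_score for c in chunks]
    spread = max(scores) - min(scores)
    if spread >= spread_threshold:
        # Retrieval is already confident; keep the original order.
        return list(chunks)
    # Low spread: promote by reranker score, preserving every chunk.
    return sorted(chunks, key=lambda c: c.rerank_score, reverse=True)
```

This is how "the right chunk is in top-k but not at rank 1" gets fixed without risking the opposite failure of pruning away the support the answer needed.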
| Workspace | Answer panel |
|---|---|
| ![]() | ![]() |
The latest saved held-out product-fit run is final_eval_suite_20260429_193058.json. It used 50 fixed questions across five public documents, ran HelpmateAI in native-context mode only, and judged answers with RAGAS using Gemini 2.5 Flash as the judge model plus OpenAI embeddings.
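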
That run showed the current shape of the system clearly:
| Metric | HelpmateAI |
|---|---|
| Questions | 50 |
| Answerable questions | 45 |
| Unsupported questions | 5 |
| Supported rate | 0.7200 |
| Answerable supported rate | 0.8000 |
| Unsupported abstention rate | 1.0000 |
| False support rate | 0.0000 |
| False abstention rate | 0.2000 |
| RAGAS faithfulness | 0.9334 |
| RAGAS faithfulness, attempted only | 0.9600 |
| RAGAS answer relevancy, attempted only | 0.7892 |
| RAGAS context precision, attempted only | 0.9093 |
The native-context scoring fixes an earlier evaluation-methodology issue where HelpmateAI generated from its full selected evidence while RAGAS judged against a clipped context payload. This latest run is the current HelpmateAI product score; earlier vendor rows remain useful as historical comparisons but should not be mixed into a single headline table without rerunning all systems under the same scoring mode.
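The support and abstention rates in the table compose from per-question records in a straightforward way. A minimal sketch, assuming hypothetical record fields (`answerable`, `abstained`, `supported`) rather than the harness's actual saved-report schema:

```python
def abstention_metrics(records):
    """Compute support/abstention rates from per-question results.

    Each record is a dict with (illustrative field names):
      answerable: ground truth — does the document support an answer?
      abstained:  did the system refuse to answer?
      supported:  was the produced answer judged grounded in evidence?
    """
    n = len(records)
    answerable = [r for r in records if r["answerable"]]
    unsupported = [r for r in records if not r["answerable"]]
    return {
        # Grounded answers over all questions.
        "supported_rate":
            sum(r.get("supported", False) for r in records) / n,
        # Grounded answers over answerable questions only.
        "answerable_supported_rate":
            sum(r.get("supported", False) for r in answerable) / len(answerable),
        # Refusals on questions the documents cannot support.
        "unsupported_abstention_rate":
            sum(r["abstained"] for r in unsupported) / len(unsupported),
        # Answers wrongly attempted on unsupported questions.
        "false_support_rate":
            sum(not r["abstained"] for r in unsupported) / len(unsupported),
        # Refusals wrongly issued on answerable questions.
        "false_abstention_rate":
            sum(r["abstained"] for r in answerable) / len(answerable),
    }
```

Under these definitions the table's numbers are mutually consistent: 36 of 50 questions supported (0.72), 36 of 45 answerable supported (0.80), all 5 unsupported questions abstained (1.00, hence 0.00 false support), and 9 of 45 answerable questions abstained (0.20).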
Evaluation is treated as part of the architecture, not a one-off demo. The current final-eval harness uses fixed public documents, fixed question manifests, answerable and intentionally unsupported questions, per-intent reporting, and saved machine-readable reports under docs/evals/reports/.
The latest held-out suite uses:
- public source documents recorded in final_eval_sources_20260428.md
- fixed draft questions in final_eval_questions.draft.json
- RAGAS scoring with a non-generator judge model where configured
- explicit abstention metrics alongside answer-quality metrics
- separate native-context and equalized-context modes for future product and controlled retrieval comparisons
- documented vendor comparison settings when OpenAI File Search or Vectara baselines are run
Full protocol details live in final_eval_protocol.md, with the broader evaluation plan in next_steps_and_final_eval_plan.md.
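The per-intent reporting mentioned above amounts to bucketing per-question results by intent label and aggregating each bucket. A minimal sketch, with field names (`intent`, `faithfulness`) assumed for illustration rather than taken from the saved report schema:

```python
from collections import defaultdict

def per_intent_report(records):
    """Average a per-question score within each intent bucket.

    Record fields (illustrative):
      intent:       question intent label, e.g. "global_summary"
      faithfulness: per-question score in [0, 1]
    """
    buckets = defaultdict(list)
    for r in records:
        buckets[r["intent"]].append(r["faithfulness"])
    return {
        intent: sum(scores) / len(scores)
        for intent, scores in sorted(buckets.items())
    }
```

Reporting per intent rather than only in aggregate is what lets a regression in, say, section-scoped questions show up even when the overall average barely moves.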
The retrieval core lives in src/ and stays framework-agnostic. backend/ exposes it through FastAPI upload, index, status, and ask endpoints. frontend/ ships the Next.js workspace UI. deploy/vps/ contains the Docker Compose and Caddy deployment path for the API, while the public app is split between landing, workspace, and backend surfaces.
Built with Next.js, FastAPI, pypdf, python-docx, optional Docling, ChromaDB, OpenAI, sentence-transformers, scikit-learn, optional Supabase persistence, optional hosted Chroma-compatible storage, Docker, and uv.
PDF extraction defaults to HELPMATE_PDF_EXTRACTOR=pypdf for reliability. DOCX extraction defaults to HELPMATE_DOCX_EXTRACTOR=python-docx. Set either extractor to docling only for local layout-parser experiments; production stays on the predictable local extractors.
Docling OCR is disabled by default (HELPMATE_DOCLING_OCR=false) to avoid unnecessary memory pressure on born-digital PDFs. It can be enabled for scanned PDFs when the runtime has enough memory. Docling runs with expanded Markdown tables and records OCR/table-mode metadata when explicitly enabled.
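The extractor defaults above can be read from the environment with plain `os.getenv` calls. The env var names and default values come from this README; the dispatch function itself is a sketch, not the repo's actual configuration code.

```python
import os

# Allowed values per the README; docling is opt-in for local experiments.
PDF_EXTRACTORS = {"pypdf", "docling"}
DOCX_EXTRACTORS = {"python-docx", "docling"}

def pick_extractor(kind: str) -> str:
    """Resolve the configured extractor for a document kind."""
    if kind == "pdf":
        choice = os.getenv("HELPMATE_PDF_EXTRACTOR", "pypdf")
        allowed = PDF_EXTRACTORS
    elif kind == "docx":
        choice = os.getenv("HELPMATE_DOCX_EXTRACTOR", "python-docx")
        allowed = DOCX_EXTRACTORS
    else:
        raise ValueError(f"unsupported document kind: {kind!r}")
    if choice not in allowed:
        raise ValueError(f"unknown extractor {choice!r} for {kind}")
    return choice

def docling_ocr_enabled() -> bool:
    # OCR stays off by default to avoid memory pressure on
    # born-digital PDFs; enable it only for scanned input.
    return os.getenv("HELPMATE_DOCLING_OCR", "false").lower() == "true"
```

Keeping the defaults on the predictable local extractors means a fresh deployment behaves the same with no environment configured at all.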
HelpmateAI is strongest on grounded long-document QA, policy questions, thesis/report navigation, and citation-visible answers. The hardest remaining cases are the broadest academic synthesis prompts on noisy journal-style PDFs, plus broader held-out coverage for orchestrated local-scope behavior.



