Research-grade RAG platform for large-scale book knowledge bases (10k–15k books).
- High-recall retrieval across books and papers
- Citation-first answers (book/chapter/page)
- Scalable ingestion + hybrid search + reranking
- `apps/api` — HTTP API (`/search`, `/ask`, `/ingest`, `/jobs/{id}`)
- `apps/worker` — background worker process (RQ)
- shared storage/index backends: Postgres, OpenSearch, Qdrant, Redis, RustFS
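Given the endpoint list above, a minimal Python client for `/ask` might look like the following sketch. The `query`/`top_k` payload fields and the JSON response shape are assumptions, not the API's documented contract:

```python
# Minimal /ask client sketch (stdlib only). Payload field names are assumed.
import json
import urllib.request

API_BASE = "http://127.0.0.1:40007"  # API port used in the health checks below

def build_ask_payload(question: str, top_k: int = 5) -> dict:
    """Build the request body; 'query' and 'top_k' are assumed field names."""
    return {"query": question, "top_k": top_k}

def ask(question: str, top_k: int = 5) -> dict:
    """POST a question to /ask and return the parsed JSON answer."""
    data = json.dumps(build_ask_payload(question, top_k)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE}/ask",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

A citation-first response would carry book/chapter/page references alongside the answer text; check the API's actual schema before relying on specific field names.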
```bash
cp .env.example .env
docker compose up -d --build
```

This brings up everything:
- infra services
- migration job (`schemas/metadata.sql`)
- API + bridge + background worker
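After `docker compose up`, a small readiness loop can wait until the health endpoints respond before you start ingesting. This sketch assumes the two ports used in the curl health checks (40007 for the API; 40008 is assumed to be the bridge):

```python
# Wait until all /health endpoints answer HTTP 200, or give up after a timeout.
import time
import urllib.error
import urllib.request

HEALTH_URLS = [
    "http://127.0.0.1:40007/health",  # API
    "http://127.0.0.1:40008/health",  # bridge (assumed from the second port)
]

def wait_until_healthy(urls=HEALTH_URLS, timeout: float = 120.0) -> bool:
    """Return True once every URL returns 200, False if the deadline passes."""
    deadline = time.monotonic() + timeout
    pending = list(urls)
    while pending and time.monotonic() < deadline:
        still_down = []
        for url in pending:
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    if resp.status != 200:
                        still_down.append(url)
            except (urllib.error.URLError, OSError):
                still_down.append(url)  # not up yet; retry on the next pass
        pending = still_down
        if pending:
            time.sleep(2)
    return not pending
```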
Check service health:

```bash
curl -sS http://127.0.0.1:40007/health
curl -sS http://127.0.0.1:40008/health
```

Run the retrieval benchmark:

```bash
python3 scripts/benchmark_retrieval_real_case.py
```

Submit an ingest job:

```bash
curl -sS -X POST http://127.0.0.1:40007/ingest \
  -H 'Content-Type: application/json' \
  -d '{"source_uri":"file:///data/book.pdf","source_type":"book","metadata":{"lang":"vi"}}'
```
Check job status:

```bash
curl -sS http://127.0.0.1:40007/jobs/<job_id>
```

Deployment files:

- `docker-compose.yml` — unified local stack
- `infra/dokploy/docker-compose.dokploy.yml` — Dokploy stack
- `docs/DOKPLOY_RUNBOOK.md` — Dokploy deployment guide
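The submit-then-poll ingest flow above can be scripted instead of run by hand. The `status` values and terminal-state names in this sketch are assumptions about the job JSON (RQ-style states), not a documented contract:

```python
# Poll /jobs/{id} until the job leaves its queued/running states.
import json
import time
import urllib.request

API_BASE = "http://127.0.0.1:40007"

def is_terminal(status: str) -> bool:
    """Assumed non-terminal states, loosely following RQ's job lifecycle."""
    return status not in ("queued", "started", "running")

def wait_for_job(job_id: str, poll_seconds: float = 2.0,
                 timeout: float = 600.0) -> dict:
    """Return the job document once it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{API_BASE}/jobs/{job_id}", timeout=30) as resp:
            job = json.load(resp)
        if is_terminal(job.get("status", "")):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```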
The MVP has been upgraded with a cache + queue + worker baseline.
```bash
python3 scripts/ingest_pdf_full_mineru.py /path/to/book.pdf --out /tmp/aletheia_mineru_output --backend pipeline
```

This runs end-to-end:
- MinerU parse (PDF -> merged markdown)
- Markdown chunk ingest into Postgres/OpenSearch/Qdrant
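The markdown-chunk step can be illustrated with a small sketch: split MinerU's merged markdown on headings so no chunk straddles two sections, then pack oversized sections paragraph by paragraph. The `max_chars` cap is an illustrative parameter, not the repo's actual setting:

```python
# Heading-aware markdown chunker sketch (not the repo's actual implementation).
import re

def chunk_markdown(md: str, max_chars: int = 1200) -> list[str]:
    """Split markdown into heading-delimited sections, then size-capped chunks."""
    # Split before every ATX heading (zero-width lookahead keeps the heading).
    sections = re.split(r"(?m)^(?=#{1,6} )", md)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: pack paragraphs greedily up to the cap.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return chunks
```

Each resulting chunk would then be indexed into Postgres (metadata), OpenSearch (lexical), and Qdrant (vectors) as the bullet above describes.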