Research-grade RAG platform for large-scale book knowledge bases (10k–15k books).
- High-recall retrieval across books and papers
- Citation-first answers (book/chapter/page)
- Scalable ingestion + hybrid search + reranking
- `apps/api` — HTTP API (`/search`, `/ask`, `/ingest`, `/jobs/{id}`)
- `apps/worker` — background worker process (RQ)
- shared storage/index backends: Postgres, OpenSearch, Qdrant, Redis, RustFS
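Given the endpoint list above, a minimal Python client for `/ask` might look like the following sketch. The `query`/`top_k` payload fields and the JSON response shape are assumptions, not the API's documented contract:

```python
# Minimal /ask client sketch (stdlib only). Payload field names are assumed.
import json
import urllib.request

API_BASE = "http://127.0.0.1:40007"  # API port used in the health checks below

def build_ask_payload(question: str, top_k: int = 5) -> dict:
    """Build the request body; 'query' and 'top_k' are assumed field names."""
    return {"query": question, "top_k": top_k}

def ask(question: str, top_k: int = 5) -> dict:
    """POST a question to /ask and return the parsed JSON answer."""
    data = json.dumps(build_ask_payload(question, top_k)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE}/ask",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

A citation-first response would carry book/chapter/page references alongside the answer text; check the API's actual schema before relying on specific field names.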
```bash
cp .env.example .env
docker compose up -d --build
```

This brings up everything:
- infra services
- migration job (`schemas/metadata.sql`)
- API + bridge + background worker
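After `docker compose up`, a small readiness loop can wait until the health endpoints respond before you start ingesting. This sketch assumes the two ports used in the curl health checks (40007 for the API; 40008 is assumed to be the bridge):

```python
# Wait until all /health endpoints answer HTTP 200, or give up after a timeout.
import time
import urllib.error
import urllib.request

HEALTH_URLS = [
    "http://127.0.0.1:40007/health",  # API
    "http://127.0.0.1:40008/health",  # bridge (assumed from the second port)
]

def wait_until_healthy(urls=HEALTH_URLS, timeout: float = 120.0) -> bool:
    """Return True once every URL returns 200, False if the deadline passes."""
    deadline = time.monotonic() + timeout
    pending = list(urls)
    while pending and time.monotonic() < deadline:
        still_down = []
        for url in pending:
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    if resp.status != 200:
                        still_down.append(url)
            except (urllib.error.URLError, OSError):
                still_down.append(url)  # not up yet; retry on the next pass
        pending = still_down
        if pending:
            time.sleep(2)
    return not pending
```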
Check service health:

```bash
curl -sS http://127.0.0.1:40007/health
curl -sS http://127.0.0.1:40008/health
```

Run the retrieval benchmark:

```bash
python3 scripts/benchmark_retrieval_real_case.py
```

Submit an ingest job:

```bash
curl -sS -X POST http://127.0.0.1:40007/ingest \
  -H 'Content-Type: application/json' \
  -d '{"source_uri":"file:///data/book.pdf","source_type":"book","metadata":{"lang":"vi"}}'
```
Check job status:

```bash
curl -sS http://127.0.0.1:40007/jobs/<job_id>
```

Deployment files:

- `docker-compose.yml` — unified local stack
- `infra/dokploy/docker-compose.dokploy.yml` — Dokploy stack
- `docs/DOKPLOY_RUNBOOK.md` — Dokploy deployment guide
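The submit-then-poll ingest flow above can be scripted instead of run by hand. The `status` values and terminal-state names in this sketch are assumptions about the job JSON (RQ-style states), not a documented contract:

```python
# Poll /jobs/{id} until the job leaves its queued/running states.
import json
import time
import urllib.request

API_BASE = "http://127.0.0.1:40007"

def is_terminal(status: str) -> bool:
    """Assumed non-terminal states, loosely following RQ's job lifecycle."""
    return status not in ("queued", "started", "running")

def wait_for_job(job_id: str, poll_seconds: float = 2.0,
                 timeout: float = 600.0) -> dict:
    """Return the job document once it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{API_BASE}/jobs/{job_id}", timeout=30) as resp:
            job = json.load(resp)
        if is_terminal(job.get("status", "")):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```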
The MVP has been upgraded with a cache + queue + worker baseline.
```bash
python3 scripts/ingest_pdf_full_mineru.py /path/to/book.pdf --out /tmp/aletheia_mineru_output --backend pipeline
```

This runs end-to-end:
- MinerU parse (PDF -> merged markdown)
- Markdown chunk ingest into Postgres/OpenSearch/Qdrant
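The markdown-chunk step can be illustrated with a small sketch: split MinerU's merged markdown on headings so no chunk straddles two sections, then pack oversized sections paragraph by paragraph. The `max_chars` cap is an illustrative parameter, not the repo's actual setting:

```python
# Heading-aware markdown chunker sketch (not the repo's actual implementation).
import re

def chunk_markdown(md: str, max_chars: int = 1200) -> list[str]:
    """Split markdown into heading-delimited sections, then size-capped chunks."""
    # Split before every ATX heading (zero-width lookahead keeps the heading).
    sections = re.split(r"(?m)^(?=#{1,6} )", md)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: pack paragraphs greedily up to the cap.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return chunks
```

Each resulting chunk would then be indexed into Postgres (metadata), OpenSearch (lexical), and Qdrant (vectors) as the bullet above describes.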