RAG-based PDF question-answering app with:

- Frontend: React app (`frontend/`)
- Backend API: Node + Express (`server.js`)
- RAG service: FastAPI + Hugging Face + FAISS (`rag-service/`)

Upload a PDF, ask questions about its content, and generate a short summary.
- Frontend uploads the file to the Node backend (`/upload`)
- Node forwards the file path to FastAPI (`/process-pdf`)
- FastAPI loads/splits the PDF and builds a vector index with embeddings
- For `/ask` and `/summarize`, FastAPI retrieves relevant chunks and generates output with a Hugging Face model
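The load/split/retrieve flow above can be sketched as follows. This is a simplified stand-in, not the actual `rag-service` code: the real service uses Hugging Face embeddings with a FAISS index, while this toy version chunks the text and ranks chunks by word overlap with the question.

```python
# Sketch of the chunk-and-retrieve step (illustrative only; the real
# service embeds chunks and searches a FAISS index instead).

def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question
    (a crude stand-in for embedding similarity search)."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The retrieved chunks would then be passed as context to the generation model.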
├── frontend/ # React UI
├── rag-service/ # FastAPI RAG service
├── server.js # Node API gateway
├── uploads/ # Uploaded files (runtime)
└── CONTRIBUTING.md
- Node.js 18+ (LTS recommended)
- Python 3.10+
- pip
From the repository root:

```bash
npm install
cd frontend && npm install
cd ../rag-service && python -m pip install -r requirements.txt
```

Create `.env` in the repo root (or edit the existing one):
```
# Optional model override
HF_GENERATION_MODEL=google/flan-t5-base
```

Notes:
- `OPENAI_API_KEY` is not required for the current Hugging Face RAG flow.
- Keep real secrets out of git.
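The model override could be resolved in the FastAPI service roughly like this (a sketch; `get_generation_model` is a hypothetical helper, not necessarily how `rag-service` reads it):

```python
import os

def get_generation_model(default: str = "google/flan-t5-base") -> str:
    """Return the generation model name, letting the
    HF_GENERATION_MODEL environment variable override the default."""
    return os.getenv("HF_GENERATION_MODEL", default)
```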
Start the FastAPI RAG service:

```bash
cd rag-service
uvicorn main:app --host 0.0.0.0 --port 5000 --reload
```

Start the Node backend:

```bash
cd /workspaces/pdf-qa-bot
node server.js
```

Start the frontend:

```bash
cd /workspaces/pdf-qa-bot/frontend
npm start
```

Open: http://localhost:3000
Node backend (http://localhost:4000):

- `POST /upload` (multipart form-data, field: `file`)
- `POST /ask` (`{ "question": "..." }`)
- `POST /summarize` (`{}`)

FastAPI RAG service (http://localhost:5000):

- `POST /process-pdf`
- `POST /ask`
- `POST /summarize`

Interactive docs: http://localhost:5000/docs
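As an illustration, a minimal Python client for the Node backend's `/ask` endpoint might look like this. It is a standard-library sketch (not part of the project); the URL and payload shape follow the endpoint list above.

```python
import json
import urllib.request

NODE_BASE = "http://localhost:4000"  # Node API gateway

def build_ask_request(question: str, base: str = NODE_BASE) -> urllib.request.Request:
    """Build a POST /ask request with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires the backend to be running):
# with urllib.request.urlopen(build_ask_request("What is this PDF about?")) as resp:
#     print(json.load(resp))
```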
- Cannot POST `/upload` from the frontend
  - Restart the frontend dev server after config changes: `npm start`
  - Ensure the Node backend is running on port 4000
- Upload fails / connection refused
  - Ensure FastAPI is running on port 5000
- Slow first request
  - The Hugging Face model downloads on first run (this can take a while)
- Port already in use
  - Stop old processes or change the ports consistently in frontend/backend/service
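When diagnosing the port conflict, a quick probe like this can tell you whether something is already listening (a standard-library sketch, not part of the project):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success, an errno otherwise
        return s.connect_ex((host, port)) == 0

# Example: port_in_use(4000) checks the Node backend's port.
```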
- RAG index is in-memory (rebuilds after restart)
- Summarization and QA use retrieved context from the last processed PDF
See CONTRIBUTING.md.