A tiny Flask app that answers questions about your PDFs using local embeddings (MiniLM) + FAISS + an extractive QA model (Distil/Roberta SQuAD). No API keys required.
- Drop PDFs in `./data`, run `ingest.py` to index
- Ask questions at http://127.0.0.1:5000
- 100% local: no API keys or internet access needed
- Pure local pipeline (CPU-friendly)
```bash
git clone https://github.com/manasmannu/knowledge-chatbot.git
cd knowledge-chatbot
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

To rebuild the index from scratch, remove the old one first:
```bash
rm -rf vectorstore
```

Put one or more PDF files inside the `data/` folder (a sample file is already included; you can remove it and add your own PDFs).

```bash
python ingest.py   # builds ./vectorstore from PDFs in ./data
python app.py      # starts the web app at http://127.0.0.1:5000
```

- PDF Processing → Extracts text using pypdf
- Chunking → Splits text into ~1000-character segments for semantic search
- Embedding → Converts chunks into dense vectors using MiniLM
- Vector Store → Saves embeddings in FAISS for fast similarity search
- Retrieval → On user query, retrieves top-matching chunks
- QA Model → Uses Roberta (deepset/roberta-base-squad2) to extract the exact answer
- Response → Displays the best-matched answer + PDF source
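The chunking and retrieval steps above can be sketched in miniature. This is an illustrative toy, not the repo's actual code: the real app embeds chunks with MiniLM and searches them with FAISS, while this sketch substitutes a bag-of-words vector and brute-force cosine similarity so the idea is visible without any model downloads. All function names here are invented for illustration.

```python
# Toy version of the ingest -> retrieve pipeline (illustrative only;
# the real app uses MiniLM embeddings + a FAISS index).
import math
from collections import Counter

def chunk_text(text: str, size: int = 1000) -> list[str]:
    """Split text into ~size-character segments (the 'Chunking' step)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a word-count vector (stand-in for MiniLM)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks most similar to the query (the 'Retrieval' step)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

For example, `retrieve("faiss similarity search", chunks, k=1)` returns the single chunk whose vector is closest to the query; in the real app that chunk (plus its PDF source) is handed to the SQuAD QA model to extract the final answer.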


