DeepFind is a fully private, offline question-answering (QA) assistant that uses a lightweight open-source LLM (TinyLlama) to answer natural-language questions about your internal Confluence or wiki documentation.
It is designed to:
- 🛡️ Run entirely offline (no cloud dependency)
- 📚 Automatically crawl and index large internal wiki hierarchies
- 💬 Provide a simple web chat UI and a REST API for integrations
- 🧠 Use fast semantic search and local LLM to generate answers
```
.
├── ConfluenceCrawler.py           # Step 1: Wiki crawler using the REST API
├── ConfluenceEmbeddingPipeline.py # Step 2: Builds the vector index & metadata
├── ConfluenceQAPipeline.py        # Step 3: CLI semantic search + TinyLlama answers
├── DeepFind.py                    # Step 4: Web UI + REST API for local answering
├── requirements.txt               # All required Python packages
├── README.md
```
```bash
# Clone this repo
$ git clone https://github.com/Siddhanta-10/DeepFind
$ cd deepfind

# Install dependencies
$ pip install -r requirements.txt

# Optional (to suppress symlink warnings on Windows)
$ set HF_HUB_DISABLE_SYMLINKS_WARNING=1
```

Requirements:
- Python 3.8+
- RAM: 4–8 GB minimum (TinyLlama runs comfortably on CPU)
Use your REST API–based crawler to download pages into a folder. Each page should be saved as an HTML file and referenced in a metadata file.
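As an illustration of the "HTML file plus metadata file" layout the pipeline expects, the save step might look like the sketch below. The helper names and the `metadata.json` format are assumptions for this example, not the crawler's actual layout.

```python
import json
from pathlib import Path


def save_page(out_dir: Path, page_id: str, title: str, html: str, metadata: dict) -> None:
    """Write one crawled page to disk and record it in the metadata dict."""
    out_dir.mkdir(parents=True, exist_ok=True)
    filename = f"{page_id}.html"
    (out_dir / filename).write_text(html, encoding="utf-8")
    metadata[filename] = {"id": page_id, "title": title}


def write_metadata(out_dir: Path, metadata: dict) -> None:
    """Persist the page -> metadata mapping for the embedding pipeline to read."""
    (out_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
```

The metadata file is what lets the answer pipeline map a retrieved chunk back to the page it came from.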
```bash
$ python ConfluenceCrawler.py
```

```bash
$ python ConfluenceEmbeddingPipeline.py
```

The embedding pipeline:
- Splits documents into chunks
- Embeds them using MiniLM
- Saves the FAISS index to `faiss_index/index.faiss`

NOTE: These steps are already done if the `faiss_index/` folder is present.
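The chunking step can be sketched as a simple overlapping word window; the chunk size and overlap below are illustrative defaults, not the pipeline's actual settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split a document into overlapping word-window chunks so that
    context straddling a chunk boundary is not lost."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk (not each whole page) is then embedded with MiniLM, so retrieval can point at the specific passage that answers a question.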
```bash
$ python ConfluenceQAPipeline.py
```

This lets you ask questions and get LLM-based answers in the terminal.
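Under the hood, retrieval is a nearest-neighbor search over the MiniLM embeddings. FAISS does this at scale with an optimized index; the ranking idea itself is just cosine similarity, sketched here in pure Python.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: List[float],
          chunk_vecs: List[List[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return the indices and scores of the k chunks most similar to the query.
    FAISS replaces this linear scan with a fast index lookup."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

The top-ranked chunks are then passed to TinyLlama as context for generating the final answer.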
```bash
$ python DeepFind.py                           # dev mode
# OR
$ waitress-serve --port=8000 DeepFind:app      # production
```

Visit http://localhost:5000 to chat (or http://localhost:8000 when serving with waitress).
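Since `waitress-serve DeepFind:app` expects a WSGI callable, the REST side of DeepFind.py can be pictured as the dependency-free sketch below. The `/ask` path, the JSON shape, and `fake_answer` are illustrative assumptions, not the real app's interface.

```python
import json


def fake_answer(question: str) -> str:
    # Stand-in for the real retrieval + TinyLlama pipeline.
    return f"(answer for: {question})"


def app(environ, start_response):
    """Minimal WSGI application sketch of a question-answering endpoint."""
    if environ.get("PATH_INFO") == "/ask" and environ.get("REQUEST_METHOD") == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size) or b"{}")
        body = json.dumps(
            {"answer": fake_answer(payload.get("question", ""))}
        ).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Any WSGI server (waitress included) can serve a callable shaped like this, which is why the same `app` works in both dev and production modes.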
```bash
$ pip install openai
$ export OPENAI_API_KEY=your-key-here
```

Then modify the `answer()` function in the code to use the OpenAI API.
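As a sketch of that swap (the real `answer()` signature isn't shown here; the prompt format and model name below are illustrative assumptions), an OpenAI-backed replacement might look like:

```python
import os
from typing import List


def build_prompt(question: str, contexts: List[str]) -> str:
    """Assemble the retrieved chunks and the question into one prompt string."""
    joined = "\n\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, contexts: List[str]) -> str:
    # Hypothetical replacement for the local TinyLlama call; needs OPENAI_API_KEY.
    from openai import OpenAI  # deferred import so the rest works without the package
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": build_prompt(question, contexts)}],
    )
    return resp.choices[0].message.content
```

Note that this swap forfeits the offline guarantee: questions and retrieved wiki context are sent to OpenAI.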
- This system does not send any data to external services.
- You can run this fully inside your organization's firewall.