DeepFind is a fully private, offline question-answering (QA) assistant that uses a lightweight open-source LLM (TinyLlama) to answer natural-language questions about your internal Confluence or wiki documentation.
It is designed to:
- 🛡️ Run entirely offline (no cloud dependency)
- 📚 Automatically crawl and index large internal wiki hierarchies
- 💬 Provide a simple web chat UI and a REST API for integrations
- 🧠 Use fast semantic search and local LLM to generate answers
```
.
├── ConfluenceCrawler.py           # Step 1: Wiki crawler using the REST API
├── ConfluenceEmbeddingPipeline.py # Step 2: Builds the vector index & metadata
├── ConfluenceQAPipeline.py        # Step 3: CLI semantic search + TinyLlama answers
├── DeepFind.py                    # Step 4: Web UI + REST API for local answering
├── requirements.txt               # All required Python packages
├── README.md
```
```bash
# Clone this repo
$ git clone https://github.com/Siddhanta-10/DeepFind
$ cd deepfind

# Install dependencies
$ pip install -r requirements.txt

# Optional (to suppress symlink warnings on Windows)
$ set HF_HUB_DISABLE_SYMLINKS_WARNING=1
```

Requirements:
- Python 3.8+
- RAM: 4–8 GB minimum (TinyLlama runs comfortably on CPU)
Use your REST API–based crawler to download pages into a folder. Each page should be saved as an HTML file and referenced in a metadata file.
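As an illustration of the "HTML file plus metadata file" layout the pipeline expects, the save step might look like the sketch below. The helper names and the `metadata.json` format are assumptions for this example, not the crawler's actual layout.

```python
import json
from pathlib import Path


def save_page(out_dir: Path, page_id: str, title: str, html: str, metadata: dict) -> None:
    """Write one crawled page to disk and record it in the metadata dict."""
    out_dir.mkdir(parents=True, exist_ok=True)
    filename = f"{page_id}.html"
    (out_dir / filename).write_text(html, encoding="utf-8")
    metadata[filename] = {"id": page_id, "title": title}


def write_metadata(out_dir: Path, metadata: dict) -> None:
    """Persist the page -> metadata mapping for the embedding pipeline to read."""
    (out_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
```

The metadata file is what lets the answer pipeline map a retrieved chunk back to the page it came from.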
```bash
$ python ConfluenceCrawler.py
```

```bash
$ python ConfluenceEmbeddingPipeline.py
```

The embedding pipeline:
- Splits documents into chunks
- Embeds them using MiniLM
- Saves the FAISS index to `faiss_index/index.faiss`

NOTE: These steps are already done if the `faiss_index/` folder is present.
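The chunking step can be sketched as a simple overlapping word window; the chunk size and overlap below are illustrative defaults, not the pipeline's actual settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split a document into overlapping word-window chunks so that
    context straddling a chunk boundary is not lost."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk (not each whole page) is then embedded with MiniLM, so retrieval can point at the specific passage that answers a question.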
```bash
$ python ConfluenceQAPipeline.py
```

This lets you ask questions and get LLM-based answers in the terminal.
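Under the hood, retrieval is a nearest-neighbor search over the MiniLM embeddings. FAISS does this at scale with an optimized index; the ranking idea itself is just cosine similarity, sketched here in pure Python.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: List[float],
          chunk_vecs: List[List[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return the indices and scores of the k chunks most similar to the query.
    FAISS replaces this linear scan with a fast index lookup."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

The top-ranked chunks are then passed to TinyLlama as context for generating the final answer.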
```bash
$ python DeepFind.py                           # dev mode
# OR
$ waitress-serve --port=8000 DeepFind:app      # production
```

Visit http://localhost:5000 to chat (or http://localhost:8000 when serving with waitress).
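Since `waitress-serve DeepFind:app` expects a WSGI callable, the REST side of DeepFind.py can be pictured as the dependency-free sketch below. The `/ask` path, the JSON shape, and `fake_answer` are illustrative assumptions, not the real app's interface.

```python
import json


def fake_answer(question: str) -> str:
    # Stand-in for the real retrieval + TinyLlama pipeline.
    return f"(answer for: {question})"


def app(environ, start_response):
    """Minimal WSGI application sketch of a question-answering endpoint."""
    if environ.get("PATH_INFO") == "/ask" and environ.get("REQUEST_METHOD") == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size) or b"{}")
        body = json.dumps(
            {"answer": fake_answer(payload.get("question", ""))}
        ).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Any WSGI server (waitress included) can serve a callable shaped like this, which is why the same `app` works in both dev and production modes.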
```bash
$ pip install openai
$ export OPENAI_API_KEY=your-key-here
```

Then modify the `answer()` function in the code to use the OpenAI API.
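As a sketch of that swap (the real `answer()` signature isn't shown here; the prompt format and model name below are illustrative assumptions), an OpenAI-backed replacement might look like:

```python
import os
from typing import List


def build_prompt(question: str, contexts: List[str]) -> str:
    """Assemble the retrieved chunks and the question into one prompt string."""
    joined = "\n\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, contexts: List[str]) -> str:
    # Hypothetical replacement for the local TinyLlama call; needs OPENAI_API_KEY.
    from openai import OpenAI  # deferred import so the rest works without the package
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": build_prompt(question, contexts)}],
    )
    return resp.choices[0].message.content
```

Note that this swap forfeits the offline guarantee: questions and retrieved wiki context are sent to OpenAI.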
- This system does not send any data to external services.
- You can run this fully inside your organization's firewall.