🔍 RAG — Retrieval-Augmented Generation System

A lightweight RAG pipeline built with ChromaDB, HuggingFace Embeddings, and OpenAI GPT. Ask questions about your own documents and get accurate, context-grounded answers.


📐 Architecture

Your Document (.txt / .pdf)
        ↓
   [Loader] → [Chunker] → [Embedder] → [ChromaDB Vectorstore]
                                                ↓
                                User Question → [Retriever]
                                                ↓
                                           [Generator] → Answer
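The flow above can be sketched in miniature. This is a toy, dependency-free illustration of the chunk → embed → retrieve stages only: a bag-of-words counter stands in for the real HuggingFace embeddings, fixed-size character slices stand in for the real text splitter, and a linear cosine scan stands in for ChromaDB. The function names here are illustrative, not the repo's actual API.

```python
# Toy sketch of the pipeline stages: chunk a document, "embed" each chunk
# with a bag-of-words counter, and retrieve the chunk closest to a question.
import math
from collections import Counter

def chunk(text: str, size: int = 50) -> list[str]:
    """Split text into fixed-size character chunks (stand-in for a real splitter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' -- stand-in for all-MiniLM-L6-v2 vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str]) -> str:
    """Return the chunk most similar to the question (stand-in for ChromaDB search)."""
    q = embed(question)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

doc = "Mixture of Experts routes tokens to expert networks. Transformers use attention."
top = retrieve("What routes tokens to experts?", chunk(doc, size=45))
print(top)
```

In the real pipeline the retrieved chunks are passed to the generator as context for the LLM prompt; this sketch stops at retrieval.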

📁 Project Structure

RAG/
├── data/                  # ← Put YOUR documents here (.txt or .pdf)
├── src/
│   ├── api.py             # FastAPI backend
│   ├── loader.py          # Document loading (.txt, .pdf)
│   ├── chunker.py         # Text splitting into chunks
│   ├── embedder.py        # HuggingFace embeddings + ChromaDB
│   ├── retriever.py       # Similarity search
│   ├── generator.py       # OpenAI GPT answer generation
│   ├── ui.py              # Streamlit frontend
│   └── main.py            # CLI entry point (for testing)
├── tests/
├── .env                   # Your API keys (create this yourself)
├── .gitignore
└── requirements.txt

⚙️ Setup

1. Clone the repository

git clone https://github.com/your-username/RAG.git
cd RAG

2. Create and activate a virtual environment

python3 -m venv venv
source venv/bin/activate        # macOS / Linux
venv\Scripts\activate           # Windows

3. Install dependencies

pip install -r requirements.txt

4. Add your OpenAI API key

Create a .env file in the project root:

OPENAI_API_KEY=sk-...your-key-here...

⚠️ Never commit your .env file. It is already listed in .gitignore.

5. Add your documents

Place your .txt or .pdf files inside the data/ folder:

data/
└── your_document.txt

Then update the file path in src/api.py (line 9):

docs = load_documents("/absolute/path/to/your/RAG/data/your_document.txt")
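Editing a source file to switch documents is workable but brittle. One hedged alternative (not part of the repo; the env-variable name `RAG_DOC_PATH` and the helper below are hypothetical) is to resolve the path at startup, falling back to the first file found in `data/`:

```python
# Hypothetical alternative to hardcoding the path in src/api.py:
# read the document path from an environment variable, or pick the
# first .txt/.pdf file found in the data/ directory.
import os
from pathlib import Path

def resolve_doc_path(default_dir: str = "data") -> Path:
    env = os.getenv("RAG_DOC_PATH")  # hypothetical variable name
    if env:
        return Path(env)
    candidates = sorted(Path(default_dir).glob("*.txt")) + sorted(Path(default_dir).glob("*.pdf"))
    if not candidates:
        raise FileNotFoundError(f"No .txt or .pdf files found in {default_dir}/")
    return candidates[0]
```

With something like this, `docs = load_documents(str(resolve_doc_path()))` would pick up a new document without editing code.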

🚀 Running the App

Start the FastAPI backend

cd src
uvicorn api:app --reload

API will be available at: http://127.0.0.1:8000
Swagger UI (for testing): http://127.0.0.1:8000/docs

Start the Streamlit frontend (in a new terminal)

# From the project root
streamlit run src/ui.py

UI will be available at: http://localhost:8501


💬 Usage

  1. Open the Streamlit UI at http://localhost:8501
  2. Type your question in the chat input
  3. The system retrieves relevant chunks from your document and generates an answer

Or test directly via the Swagger UI at http://127.0.0.1:8000/docs by sending a POST request to /ask:

{
  "question": "What is MoE?",
  "history": ""
}
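The same request can be sent programmatically. A standard-library sketch (field names taken from the example above; the response shape depends on what api.py returns, so it is just printed here):

```python
# Build the POST /ask payload shown above and send it with urllib
# (standard library only). The FastAPI backend must be running first.
import json
import urllib.request

payload = {"question": "What is MoE?", "history": ""}
req = urllib.request.Request(
    "http://127.0.0.1:8000/ask",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the backend is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```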

🛠️ Tech Stack

Component      Technology
Backend        FastAPI
Frontend       Streamlit
Embeddings     sentence-transformers/all-MiniLM-L6-v2
Vector Store   ChromaDB (in-memory)
LLM            OpenAI GPT (via langchain-openai)
Doc Loading    LangChain TextLoader / PyPDFLoader

📦 Requirements

See requirements.txt. Key dependencies:

fastapi
uvicorn
streamlit
langchain
langchain-community
langchain-openai
chromadb
sentence-transformers
python-dotenv
pypdf

📌 Notes

  • The vectorstore is in-memory — it rebuilds on every server restart.
  • Only one document is loaded at a time. To use a different document, change the path in api.py and restart the server.
  • The history field in the API request is optional — pass an empty string if not needed.

📄 License

MIT License — feel free to use and modify.
