TALLA-RAG is a local application that lets you chat with your documents using a local LLM backed by a Cassandra cluster for vector storage.
- Upload a `.txt` or `.pdf` file via the sidebar.
- Click Ingest to Cluster — chunks are embedded and stored across Cassandra nodes.
- Ask questions in the chat: the app retrieves the most relevant chunks and answers using only the local LLM.
- Use Clear All Knowledge to wipe the vector store.
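The ingest step above splits each document into chunks before embedding them into Cassandra. TALLA-RAG's actual chunker isn't shown here; as a minimal sketch, fixed-size chunking with overlap (the `chunk_text` helper and the size/overlap values below are illustrative assumptions, not the app's real parameters) could look like:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context
    spanning a chunk boundary is not lost. The sizes here are
    illustrative, not the values TALLA-RAG actually uses."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Each resulting chunk is embedded (e.g. via nomic-embed-text) and
# written to the Cassandra vector store.
```

Overlap is what keeps a sentence that straddles two chunks retrievable from either side; without it, a query matching that sentence could miss both chunks.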
- Docker & Docker Compose
- Ollama running locally with the following models pulled:

  ```sh
  ollama pull nomic-embed-text:v1.5
  ollama pull granite3.2:2b
  ollama pull <your_model_name>
  ```
- Python 3.11 or higher
- uv (Python package manager)

  ```sh
  pip install uv
  ```
- Clone the repo

  ```sh
  git clone https://github.com/nnay29/cassandra-cluster-RAG.git
  cd cassandra-cluster-RAG
  ```

- Copy and configure the environment file

  ```sh
  cp .env.example .env
  ```

  Edit `.env` and set `DOCKER_HOST_IP` to your machine's local IP address.
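For illustration, the relevant line in `.env` might look like the following (the IP below is a placeholder, not a value to copy; use the LAN address of the machine running Docker):

```sh
# .env — example only; substitute your own machine's local IP
DOCKER_HOST_IP=192.168.1.42
```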
- Start the Cassandra cluster

  ```sh
  docker-compose up -d
  ```
- Create a virtual environment

  ```sh
  uv venv
  ```
- Install Python dependencies

  ```sh
  uv sync
  ```
- Run the Streamlit app

  ```sh
  streamlit run app.py
  ```

  The app will be available at http://localhost:8501.
See the open issues for a full list of proposed features (and known issues).
Project Link: https://github.com/nnay29/cassandra-cluster-RAG