A comprehensive Retrieval-Augmented Generation (RAG) system that uses a graph database to store and retrieve knowledge in a structured, interconnected way. The system features web scraping for data collection, document processing with semantic chunking, and a Streamlit interface for easy interaction.
Multiple Data Source Options:
- Web scraping functionality to collect data from any website
- Dedicated scraping module (Wikipedia format supported)
Advanced Text Processing:
- Semantic chunking with configurable size and overlap
- High-quality embeddings using Sentence Transformers
- Context expansion for improved relevance
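The chunking step above can be sketched as follows. This is a minimal character-based sketch using the defaults mentioned later in this README (size 500, overlap 50); the function name and exact splitting strategy are illustrative, not necessarily the project's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size characters."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# Embeddings could then be generated with Sentence Transformers, e.g.:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-MiniLM-L6-v2")
# vectors = model.encode(chunks)
```

Each chunk shares its last `overlap` characters with the start of the next one, so context is not cut off at chunk boundaries.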
Graph Database Integration:
- Neo4j backend for knowledge storage
- Semantic relationships between text chunks
- Document and chunk hierarchies with metadata
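As a rough illustration of the hierarchy above, documents and chunks could be written to Neo4j like this. The `Document`/`Chunk` labels, the `HAS_CHUNK` relationship type, and the helper name are assumptions for illustration, not the project's confirmed schema:

```python
# Illustrative Cypher; labels and relationship types are assumed, not confirmed.
CREATE_CHUNK = """
MERGE (d:Document {id: $doc_id})
CREATE (c:Chunk {id: $chunk_id, text: $text, embedding: $embedding, index: $index})
CREATE (d)-[:HAS_CHUNK]->(c)
"""

def chunk_params(doc_id, chunks, embeddings):
    """Build one parameter dict per chunk for the CREATE_CHUNK query."""
    return [
        {
            "doc_id": doc_id,
            "chunk_id": f"{doc_id}-{i}",
            "text": text,
            "embedding": list(vec),
            "index": i,
        }
        for i, (text, vec) in enumerate(zip(chunks, embeddings))
    ]

# With the official neo4j driver this would run roughly as:
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     for p in chunk_params("doc-1", chunks, vectors):
#         session.run(CREATE_CHUNK, **p)
```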
Streamlined User Interface:
- Interactive Streamlit application
- Ask questions or provide URLs to ingest data
- Visualization of the knowledge graph structure (TBD)
Customizable LLM Integration:
- Configurable to work with any OpenAI model
- Extensible design for other LLM providers
Prerequisites:
- Python 3.10
- Neo4j database (local, Docker, or cloud instance)
- OpenAI API key (or equivalent)
Clone this repository:
```shell
git clone git@github.com:MinaBeirami/nexusRAG.git
cd nexusRAG
```
Install the required dependencies:
```shell
pip install -r requirements.txt
```
Set up environment variables: edit the `.env` file with your API keys and database credentials.
Start the Neo4j database (if using a local instance). Run it via Docker Desktop, or run the following command:
```shell
# If using Docker
docker run --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/password -d neo4j
```
Launch the application:
```shell
streamlit run app.py
```
This starts the Streamlit server; you can access the application at http://localhost:8501.
The system provides several ways to collect data:
- Web Scraping: Enter URLs to scrape content from websites
- Paste Text: Directly paste text content into the application
- (TODO) Upload Files: Upload local documents (PDF, DOCX, TXT, CSV)
- (TODO) Hugging Face Datasets: Select and import datasets from Hugging Face
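For the web-scraping path, the core text extraction can be sketched with only the standard library. The class and function names here are illustrative; the project may well use a dedicated parsing library instead:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script/style contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def scrape_text(html: str) -> str:
    """Extract plain text from an HTML document string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

# In practice the HTML would be fetched first, e.g. with
# urllib.request.urlopen(url).read().decode() or the requests library.
```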
After collecting data, the system will:
- Process documents into semantic chunks
- Generate embeddings for each chunk
- Store chunks and their relationships in the Neo4j graph database
- Create semantic relationships between related chunks
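The last step, linking related chunks, comes down to comparing chunk embeddings. A minimal sketch, where the 0.8 threshold and the `SIMILAR_TO` relationship type are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similar_pairs(embeddings, threshold=0.8):
    """Index pairs of chunks whose embedding similarity meets the threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Each pair could then be written to Neo4j as a relationship (hypothetical type):
# MATCH (a:Chunk {index: $i}), (b:Chunk {index: $j}) CREATE (a)-[:SIMILAR_TO]->(b)
```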
Once the knowledge graph is built, you can:
- Ask questions in natural language
- View the retrieved context used to answer the question
- (TODO) Explore the knowledge graph visually
- Export answers and sources
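Under the hood, answering a question amounts to embedding it, retrieving the most similar chunks, and handing them to the LLM as context. A minimal retrieval sketch (function names are illustrative, not the project's API):

```python
import math

def _cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_context(question_vec, chunk_vecs, chunks, top_k=3):
    """Return the top_k chunks most similar to the question embedding."""
    order = sorted(range(len(chunks)),
                   key=lambda i: _cosine(question_vec, chunk_vecs[i]),
                   reverse=True)
    return [chunks[i] for i in order[:top_k]]

# The retrieved chunks would then go into the LLM prompt, roughly:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[
#         {"role": "system", "content": "Answer using only this context:\n" + "\n".join(context)},
#         {"role": "user", "content": question},
#     ],
# )
```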
The system can be customized through the `src/config/settings.py` file:
- `embedding_model`: change the embedding model (default: `"all-MiniLM-L6-v2"`)
- `chunk_size`: adjust the size of text chunks (default: 500)
- `chunk_overlap`: set the overlap between chunks (default: 50)
- `llm_model`: select the LLM model (default: `"gpt-3.5-turbo"`)
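Taken together, those options could be represented as a small settings object. This is a hypothetical layout mirroring the defaults listed above; the real `src/config/settings.py` may be structured differently:

```python
from dataclasses import dataclass

@dataclass
class Settings:
    # Defaults mirror the values documented above.
    embedding_model: str = "all-MiniLM-L6-v2"
    chunk_size: int = 500
    chunk_overlap: int = 50
    llm_model: str = "gpt-3.5-turbo"
```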
The system follows a modular architecture:
- `data_collector.py`: modules for acquiring data from various sources
- `text_processor.py`: text processing, chunking, and embedding generation
- `engine.py`: core RAG implementation with LLM integration
- `graph_handler.py`: Neo4j database interaction
- `app.py`: Streamlit user interface
Contributions are welcome! Please feel free to submit a Pull Request.