This project implements and compares two Retrieval-Augmented Generation (RAG) approaches:
- Standard RAG: A baseline RAG system that retrieves documents based on semantic similarity alone.
- Hierarchical RAG: An advanced RAG system that leverages hierarchical metadata (language, domain, section, topic, document type) to improve retrieval accuracy and efficiency.
The project includes a Gradio application for interactive testing and a comprehensive evaluation suite for generating performance reports.
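The core idea behind the two approaches can be sketched in plain Python. This is purely illustrative: the function names, toy similarity score, and metadata fields below are assumptions for the sketch, not the project's actual API.

```python
# Sketch: plain similarity ranking vs. metadata-filtered (hierarchical) ranking.
# All names and data here are illustrative, not the project's real code.

def similarity(query, text):
    """Toy relevance score: fraction of query terms found in the document text."""
    query_terms = set(query.lower().split())
    doc_terms = set(text.lower().split())
    return len(query_terms & doc_terms) / max(len(query_terms), 1)

def standard_rag(query, docs, k=3):
    """Standard RAG: rank every document by semantic similarity alone."""
    return sorted(docs, key=lambda d: similarity(query, d["text"]), reverse=True)[:k]

def hierarchical_rag(query, docs, metadata_filter, k=3):
    """Hierarchical RAG: narrow candidates by metadata first, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(d["meta"].get(key) == value for key, value in metadata_filter.items())
    ]
    return sorted(candidates, key=lambda d: similarity(query, d["text"]), reverse=True)[:k]

docs = [
    {"text": "reset your password in account settings", "meta": {"domain": "support"}},
    {"text": "password hashing with bcrypt", "meta": {"domain": "engineering"}},
]
hits = hierarchical_rag("password reset", docs, {"domain": "support"})
```

Filtering by hierarchical metadata shrinks the candidate set before ranking, which is where the accuracy and efficiency gains of the hierarchical approach come from.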
- Clone the repository:

  ```bash
  git clone https://github.com/ikram98ai/hierRAG.git
  cd hierRAG
  ```

- Install dependencies: This project uses `uv` for package management.

  ```bash
  uv sync
  ```

- Set up environment variables: Create a `.env` file in the root of the project and add the following environment variables. You can copy the `.env.example` file:

  ```bash
  cp .env.example .env
  ```

  Update the `.env` file with your credentials:

  ```
  HF_TOKEN=your_hugging_face_token
  OPENAI_API_KEY=your_openai_api_key
  GRADIO_MCP_SERVER=True
  ```
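How the application consumes these variables is internal to its code, but a minimal standard-library sketch shows the usual pattern. The boolean parsing of `GRADIO_MCP_SERVER` below is an assumption for illustration, not the project's actual logic:

```python
import os

# Read credentials from the environment (empty string if unset).
hf_token = os.environ.get("HF_TOKEN", "")
openai_api_key = os.environ.get("OPENAI_API_KEY", "")

def mcp_server_enabled() -> bool:
    # Environment variables are strings, so "True"/"False" must be parsed explicitly.
    return os.getenv("GRADIO_MCP_SERVER", "False").lower() == "true"
```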
To run the Gradio application for interactive testing and evaluation:

```bash
uv run gradio src/app.py
```

or using the Makefile:

```bash
make dev
```

This will launch a web interface with the following tabs:
- Document Ingestion: Upload documents and assign metadata.
- Chat with Data: Compare the performance of Standard RAG and Hierarchical RAG side-by-side.
- Evaluation: Run a full evaluation on synthetic data and generate performance reports.
To deploy this application to Hugging Face Spaces, you can push the repository to a new Space.
- Create a new Hugging Face Space.
- Push the repository to the Space:
  ```bash
  git remote add space https://huggingface.co/spaces/your-username/your-space-name
  git push --force space main
  ```
- Set the environment variables in the Space's settings.
The evaluation process can be triggered from the Evaluation tab in the Gradio application.
- Navigate to the Evaluation tab.
- Select the collections you want to evaluate.
- Click "Setup Synthetic Test Data" to ingest the synthetic data for the selected collections.
- Click "Run Full Evaluation" to start the evaluation.
The evaluation process will:
- Ingest synthetic test data.
- Run a series of predefined queries against both the Standard and Hierarchical RAG systems.
- Generate and display a summary report with key performance metrics (Hit@1, Hit@3, Hit@5, MRR, Latency).
- Provide download links for the full evaluation results in CSV and JSON formats.
- Display a detailed summary report in Markdown format.
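The reported metrics are standard retrieval measures. A minimal sketch of how Hit@k and MRR are computed over a batch of queries (illustrative only, not the project's evaluation code):

```python
def hit_at_k(ranked_ids, relevant_id, k):
    """1 if the relevant document appears in the top-k results, else 0."""
    return int(relevant_id in ranked_ids[:k])

def mrr(all_ranked, all_relevant):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit (0 if missed)."""
    total = 0.0
    for ranked_ids, relevant_id in zip(all_ranked, all_relevant):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id == relevant_id:
                total += 1.0 / rank
                break
    return total / len(all_relevant)

# Two queries: the relevant document appears at rank 1 and rank 3 respectively.
ranked = [["d1", "d2", "d3"], ["d5", "d6", "d4"]]
relevant = ["d1", "d4"]
score = mrr(ranked, relevant)  # (1/1 + 1/3) / 2 ≈ 0.667
```

Latency is simply measured wall-clock time per query, so it needs no formula; the Hit@k values at k = 1, 3, 5 correspond to the Hit@1, Hit@3, and Hit@5 columns in the report.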
You can also run the evaluation programmatically by calling the functions in `src/core/eval.py`.