Build a local RAG index from EXA search results. The pipeline targets healthcare businesses and NYC restaurants, crawls site content, chunks it, embeds it, and stores the vectors in a local Chroma DB.
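The chunking step can be sketched as a fixed-size character splitter with overlap (a hypothetical helper for illustration only; the actual chunker lives in main.py and may differ in size and strategy):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, each overlapping
    the previous one by `overlap` characters."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.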
- Python 3.12+
- uv package manager
- EXA API key
Create a virtual environment and install dependencies:
```sh
uv venv
uv pip install -e .
```

Set your EXA API key in `.env`:

```
EXA_API_KEY=your_key_here
```

Optional: add `OPENAI_API_KEY` to enable entity resolution for directory/aggregator pages and follow-up searches for official business websites.
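The `.env` file is presumably read by a dotenv-style loader at startup. A minimal sketch of that behavior (illustrative only, assuming plain `KEY=value` lines with `#` comments):

```python
import os

def load_env(path: str = ".env") -> dict[str, str]:
    """Parse a simple .env file: one KEY=value per line,
    blank lines and '#' comments ignored."""
    env: dict[str, str] = {}
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    env[key.strip()] = value.strip()
    return env
```

In practice a library such as python-dotenv handles quoting and export prefixes; this sketch only shows the shape of the lookup.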
Using `just`:
```sh
just scrape --output-dir rag_index --collection exa_rag
```

Or directly:

```sh
.venv/bin/python main.py --output-dir rag_index --collection exa_rag
```

- Crawling is limited by `--max-pages-per-domain` and `--max-total-pages`.
- Business targets can be set with `--target-healthcare` and `--target-nyc-restaurants`.
- Raw crawled pages are written to `rag_pages.json` for inspection.
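Once built, the Chroma collection answers queries by nearest-neighbor search over the stored chunk embeddings. A pure-Python sketch of that retrieval step (this is the idea, not the Chroma API; `top_k` and the toy index are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec: list[float],
          index: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

In the real pipeline, Chroma performs this ranking internally via `collection.query`, using the same embedding function that indexed the chunks.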