Retrieval-Augmented Generation system for answering factoid questions about UC Berkeley EECS.
```
question
  -> embed query (all-MiniLM-L6-v2)
  -> hybrid retrieval (FAISS dense + BM25 sparse, fused with RRF)
  -> top-k chunks injected into prompt
  -> LLM generates short answer via OpenRouter
  -> postprocess -> answer
```
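The fusion step is small enough to show in full. A minimal sketch of Reciprocal Rank Fusion, assuming the dense and sparse retrievers each return chunk ids best-first; the function name and the `k=60` constant are illustrative, not necessarily what `rag/retrieve.py` uses:

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60, top_k=5):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank(d)).

    dense_ranking / sparse_ranking: lists of chunk ids, best first
    (e.g. from FAISS inner-product search and BM25 respectively).
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; keep the top_k chunks for the prompt.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```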
```bash
conda create -n rag python=3.10.12 -y && conda activate rag
# Quote the version specs so the shell does not treat ">=" as a redirect.
pip install "torch>=2.0.0" "transformers>=4.30.0" "sentence-transformers>=2.2.2" \
    "faiss-cpu>=1.7.4" "rank-bm25>=0.2.2" "numpy>=1.24.0" "tqdm>=4.64.0" \
    "matplotlib>=3.5.0" "pandas>=1.5.0" "seaborn>=0.11.0"

# For local scripts only (not needed by autograder):
pip install requests beautifulsoup4
```

```bash
# 1. Crawl EECS pages -> data/raw_html/ + data/url_map.json
python -m scripts.crawl --max-pages 2000 --delay 0.5

# 2. Chunk + embed + build indices -> data/{chunks.jsonl, faiss.index, bm25_corpus.json, meta.json}
python -m scripts.build_index --chunk-size 800 --overlap 200
```
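A minimal sketch of what the chunk → embed → index step amounts to, assuming fixed-size character windows with overlap and normalized embeddings in an inner-product index. The helper names and details are illustrative, not the exact `build_index.py` code:

```python
import faiss
from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_size=800, overlap=200):
    """Split a page into overlapping character windows (800/200, as in build_index)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def build_dense_index(chunks):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    # Normalized embeddings make inner product equivalent to cosine similarity.
    emb = model.encode(chunks, normalize_embeddings=True, convert_to_numpy=True)
    index = faiss.IndexFlatIP(emb.shape[1])  # inner-product index, like data/faiss.index
    index.add(emb)
    return index
```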
```bash
export OPENROUTER_API_KEY="sk-..."   # for local testing
bash run.sh data/example_questions.txt data/predictions.txt
python -m scripts.evaluate data/predictions.txt data/example_answers.txt
```

From the repo root, with dependencies installed (e.g. `pip install -r requirements.txt` or `conda activate rag`):
```bash
# 1. Set your OpenRouter API key (required for real answers)
export OPENROUTER_API_KEY="sk-..."

# 2. Run the pipeline (same as autograder)
bash run.sh data/example_questions.txt data/predictions.txt

# 3. Evaluate against reference answers
python3 -m scripts.evaluate data/predictions.txt data/example_answers.txt
```

Without `OPENROUTER_API_KEY`, every answer will be "unknown" (the pipeline catches the error and falls back, so it still finishes). Use the same Python environment that has `faiss`, `sentence_transformers`, etc. installed.
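The fallback is roughly the following pattern — a sketch only, assuming `llm.py` exposes a callable for the OpenRouter request (`call_llm` is a hypothetical name):

```python
def safe_answer(prompt, call_llm):
    """Return a short answer string, or "unknown" if the LLM call fails.

    `call_llm` stands in for whatever llm.py exposes; any exception --
    missing OPENROUTER_API_KEY, network error, per-question timeout --
    falls back so the output file still gets one line per question.
    """
    try:
        raw = call_llm(prompt)
        first_line = raw.strip().splitlines()[0] if raw.strip() else ""
        return first_line.strip() or "unknown"
    except Exception:
        return "unknown"
```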
```
├── run.sh                 # autograder entrypoint (calls main.py)
├── main.py                # reads questions, writes predictions
├── llm.py                 # OpenRouter wrapper (autograder replaces this)
├── rag/
│   ├── __init__.py
│   ├── pipeline.py        # answer(question) -> string
│   ├── retrieve.py        # hybrid BM25 + FAISS retrieval
│   └── prompt.py          # prompt template + postprocessing
├── scripts/               # local-only (not used by autograder)
│   ├── crawl.py           # BFS crawler for eecs.berkeley.edu
│   ├── build_index.py     # chunk -> embed -> build indices
│   ├── generate_validation_from_corpus.py   # 100+ Q&A from chunks (for assignment)
│   └── evaluate.py        # EM + F1 evaluation
└── data/
    ├── chunks.jsonl       # one chunk per line: {url, text, title}
    ├── faiss.index        # FAISS inner-product index
    ├── bm25_corpus.json   # tokenised corpus for BM25
    ├── meta.json          # embed model, chunk config
    ├── example_questions.txt
    └── example_answers.txt
```
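The `main.py` contract implied above (read the questions file, write exactly one answer line per question line) could look like the sketch below; the actual implementation may differ, but `rag.pipeline.answer(question)` is the entry point listed in the tree:

```python
import sys
from rag.pipeline import answer

def main(questions_path, predictions_path):
    with open(questions_path, encoding="utf-8") as f:
        questions = [line.strip() for line in f]
    with open(predictions_path, "w", encoding="utf-8") as out:
        for q in questions:
            # One answer line per question line, "unknown" on any failure.
            try:
                out.write((answer(q) or "unknown").strip() + "\n")
            except Exception:
                out.write("unknown\n")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```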
```
[Offline]
  eecs.berkeley.edu   → crawl.py       → 1,492 HTML files
  HTML files          → build_index.py → 15,270 chunks + FAISS index + BM25 corpus

[Runtime]
  question            → retrieve.py              → 5 best chunks (hybrid BM25 + FAISS + RRF)
  5 chunks + question → prompt.py                → structured prompt
  prompt              → llm.py                   → raw LLM output
  raw output          → prompt.py (postprocess)  → clean short answer
```
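A rough sketch of the two `prompt.py` steps in the runtime flow — stuffing the retrieved chunks into a structured prompt and trimming the raw generation to a clean short answer. The template wording and cleanup rules here are illustrative; the real ones live in `rag/prompt.py`:

```python
def build_prompt(question, chunks):
    """Inject the top retrieved chunks as context and ask for a short factoid answer."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "Reply with a short answer, or 'unknown' if the context does not say.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def postprocess(raw):
    """Keep the first line, strip common prefixes, fall back to 'unknown'."""
    ans = raw.strip().splitlines()[0] if raw.strip() else "unknown"
    for prefix in ("Answer:", "A:"):
        if ans.startswith(prefix):
            ans = ans[len(prefix):].strip()
    return ans or "unknown"
```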
To create 100+ validation Q&A pairs whose answers appear in your corpus (so EM/F1 are meaningful), run this after `build_index`:
```bash
python -m scripts.generate_validation_from_corpus --min-pairs 100
```

This overwrites `data/validation_questions.txt` and `data/validation_answers.txt` with pairs extracted from the chunks (course numbers, faculty names, dates, buildings). Then run and evaluate:
```bash
bash run.sh data/validation_questions.txt data/predictions_validation.txt
python3 -m scripts.evaluate data/predictions_validation.txt data/validation_answers.txt
```

(Optional) Remove duplicate questions with `python data/dedupe_validation.py` before running the pipeline.
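EM and F1 here follow the usual SQuAD-style convention; a minimal sketch of the metric (the actual `scripts/evaluate.py` may normalise slightly differently):

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```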
- Generate the retrieval datastore (if not already done):
  ```bash
  python -m scripts.crawl --max-pages 2000 --delay 0.5
  python -m scripts.build_index --chunk-size 800 --overlap 200
  ```
- Build the zip (run from repo root):
  ```bash
  bash scripts/build_submission.sh submission.zip
  ```
- Submit `submission.zip` to Gradescope. The zip contains `run.sh`, the code (`main.py`, `llm.py`, `rag/`), `requirements.txt`, and `data/` (chunks.jsonl, faiss.index, bm25_corpus.json, meta.json, plus any QA files).
- `run.sh` accepts `$1` (questions) and `$2` (predictions)
- Uses `python3`, not `python`
- `llm.py` is unmodified (autograder overwrites it)
- No direct OpenRouter calls outside `llm.py`
- All paths are relative
- Output has same line count as input, one answer per line
- Works within 4 GB RAM, no GPU
- Timeout handling per question (falls back to "unknown")
- Ship `data/{chunks.jsonl, faiss.index, bm25_corpus.json, meta.json}` in the zip (via `build_submission.sh`)
In the GitHub repo: code, data/example_questions.txt, data/example_answers.txt, data/meta.json (if present). The large generated files (data/faiss.index, data/bm25_corpus.json, data/chunks.jsonl) are not in the repo (they’re in .gitignore).
To run locally, generate those data files by running the offline pipeline once (crawl, then build_index), as described in the offline steps above.