This repository demonstrates and compares two powerful approaches for knowledge injection in large language models:
- Retrieval-Augmented Generation (RAG)
- Cache-Augmented Generation (CAG)
The experiments use a quantized Llama 3.1 8B Instruct model and analyze the effectiveness and efficiency of each approach for Ukrainian pop-culture question answering.

Requirements:
- Python 3.10
- Poetry (for environment and dependency management)
- Up to 16GB VRAM (GPU highly recommended for Llama 3.1 8B Instruct)
- Hugging Face account with an accepted access request for the meta-llama/Llama-3.1-8B-Instruct model
Setup:

- Clone this repository:

  ```bash
  git clone https://github.com/Alex2135/RAG_vs_CAG_analysis
  cd RAG_vs_CAG_analysis
  ```
- Set up the Poetry virtual environment:

  ```bash
  poetry config virtualenvs.in-project true
  poetry env use python3.10
  source .venv/bin/activate
  poetry install
  ```
- Request access to the Llama 3.1 8B Instruct model:
  - Visit https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
  - Click “Access request” and wait for approval
- Log in to Hugging Face from the notebook:
  - Insert your HF token when prompted (from https://huggingface.co/settings/tokens)
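If you prefer to log in manually, a minimal login cell looks like the sketch below (assuming the `huggingface_hub` package is available in the environment):

```python
# Minimal Hugging Face login sketch (assumes huggingface_hub is installed).
from huggingface_hub import notebook_login

# Opens an input prompt inside Jupyter; paste the token from
# https://huggingface.co/settings/tokens when asked.
notebook_login()
```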
Open in Jupyter Lab (recommended for step-by-step analysis):
```bash
poetry run jupyter lab
```

The notebook:

- Loads a quantized Llama 3.1 8B Instruct model and tokenizer
- Prepares pop culture context (a biography of Stepan Giga) for the experiments
- Implements three knowledge-injection strategies (illustrative sketches follow below):
  - Direct context injection: the entire biography is passed in the prompt
  - RAG: FAISS + MiniLM-based retrieval of relevant context chunks for each question
  - CAG: one-time KV-caching of the knowledge, enabling efficient follow-up questions
- Compares the number of input tokens required by each method
- Visualizes the results (absolute and relative token usage) using matplotlib (see the token-comparison sketch below)
The experiment demonstrates the trade-offs between classic context-stuffing, RAG, and CAG in terms of token efficiency and suitability for large LLMs like Llama 3.
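For reference, the model-loading step typically looks like the sketch below. The 4-bit `bitsandbytes` configuration is an assumption about how the quantization is done; the notebook defines the exact settings it uses.

```python
# Sketch: load Llama 3.1 8B Instruct with 4-bit quantization (assumed bitsandbytes config).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers on the available GPU
)
```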
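The RAG strategy embeds the biography in chunks and retrieves only the chunks relevant to each question. A minimal sketch with `sentence-transformers` (MiniLM) and FAISS follows; the chunking scheme, the `biography_text` variable, and `k = 3` are illustrative assumptions, not the notebook's exact settings.

```python
# Sketch: MiniLM embeddings + FAISS retrieval of relevant context chunks.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Split the biography into chunks (naive paragraph split; the notebook may chunk differently).
# `biography_text` is assumed to hold the Stepan Giga biography as a string.
chunks = [p.strip() for p in biography_text.split("\n\n") if p.strip()]

# Build a FAISS index over normalized chunk embeddings (inner product == cosine similarity).
embeddings = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question."""
    q_emb = embedder.encode([question], normalize_embeddings=True)
    _, idx = index.search(np.asarray(q_emb, dtype="float32"), k)
    return [chunks[i] for i in idx[0]]

# Only the retrieved chunks go into the prompt, instead of the whole biography.
context = "\n".join(retrieve("Where was Stepan Giga born?"))
```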
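The CAG strategy runs the model over the knowledge once, keeps the resulting key/value cache, and reuses it for every follow-up question, so only the question tokens are processed per query. A rough sketch of the idea, reusing the `model`, `tokenizer`, and `biography_text` names from the sketches above (the exact cache handling depends on the installed `transformers` version):

```python
# Sketch: one-time KV-caching of the knowledge prompt, reused for follow-up questions.
import copy
import torch

knowledge_prompt = "Answer questions about Stepan Giga using this biography:\n" + biography_text
knowledge_inputs = tokenizer(knowledge_prompt, return_tensors="pt").to(model.device)

# Forward pass over the knowledge once; keep the resulting key/value cache.
with torch.no_grad():
    knowledge_cache = model(**knowledge_inputs, use_cache=True).past_key_values

def answer_with_cache(question: str, max_new_tokens: int = 100) -> str:
    # Each call reuses a copy of the precomputed cache, so only the question tokens
    # (not the whole biography) need to be processed again.
    question_ids = tokenizer(
        "\nQuestion: " + question + "\nAnswer:", return_tensors="pt"
    ).input_ids.to(model.device)
    full_ids = torch.cat([knowledge_inputs.input_ids, question_ids], dim=-1)
    out = model.generate(
        full_ids,
        past_key_values=copy.deepcopy(knowledge_cache),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(out[0, full_ids.shape[-1]:], skip_special_tokens=True)
```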
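Finally, the token comparison is straightforward: count the input tokens each method would send for the same question and plot them. A sketch, reusing names from the sketches above (the actual numbers come from the notebook's runs):

```python
# Sketch: compare input-token counts per method and plot them with matplotlib.
import matplotlib.pyplot as plt

def count_tokens(text: str) -> int:
    return len(tokenizer(text).input_ids)

question = "Where was Stepan Giga born?"
token_counts = {
    "Direct context": count_tokens(biography_text + question),        # full biography every time
    "RAG": count_tokens("\n".join(retrieve(question)) + question),    # only retrieved chunks
    "CAG": count_tokens(question),                                    # knowledge already cached
}

plt.bar(list(token_counts.keys()), list(token_counts.values()))
plt.ylabel("Input tokens per question")
plt.title("Token usage by knowledge-injection method")
plt.show()
```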